Statistics
Statistics
Statistics
Understanding Statistics
Descriptive statistics
are brief
informational coefficients that summarize a given data set, which can
be either a representation of the entire population or a sample of a
population. Descriptive statistics are broken down into measures of
central tendency and measures of variability (spread). Measures of
central tendency include the mean, median, and mode, while
measures of variability include standard deviation, variance,
minimum and maximum variables, kurtosis, and skewness.
A population is the complete set group of individuals, whether that
group comprises a nation or a group of people with a common
characteristic.
In statistics, a population is the pool of individuals from which a
statistical sample is drawn for a study. Thus, any selection of
individuals grouped by a common feature can be said to be a
population. A sample may also refer to a statistically significant
portion of a population, not an entire population. For this reason, a
statistical analysis of a sample must report the approximate standard
deviation, or standard error, of its results from the entire population.
Only an analysis of an entire population would have no standard
error.
Inferential Statistics
There are two main types of variables. First, qualitative variables are
specific attributes that are often non-numeric. Many of the examples
given in the car example are qualitative. Other examples of
qualitative variables in statistics are gender, eye color, or city of
birth. Qualitative data is most often used to determine what
percentage of an outcome occurs for any given qualitative
variable. Qualitative analysis often does not rely on numbers. For
example, trying to determine what percentage of women own a
business analyzes qualitative data.
Nominal-level Measurement
Ordinal-level Measurement
Outcomes can be arranged in an order, but all data values have the
same value or weight. Although numerical, ordinal-level
measurements can't be subtracted against each other in statistics
because only the position of the data point matters. Ordinal levels
are often incorporated into nonparametric statistics and compared
against the total variable group.
Example: American Fred Kerley was the 2nd fastest man at the
2020 Tokyo Olympics based on 100-meter sprint times.3
Olympics. "Tokyo 2020, Athletics, Men's 100m ."
Interval-level Measurement
Example: Inflation hit 8.6% in May 2022. The last time inflation was
this high was in December 1981.4
Ratio-level Measurement
Advantages
Each item within a population has an equal chance of being
selected.
Disadvantages
Incomplete population demographics may exclude certain groups
from being sampled.
Random selection means the sample may not be truly
representative of the population.
Depending on the data set size and format, random sampling may
be a time-intensive process.
Systemic Sampling
For example, if the selected starting point was 20, the 70th person
on the list would be chosen followed by the 120th, and so on. Once
the end of the list was reached and if additional participants are
required, the count loops to the beginning of the list to finish the
count.
The researcher will soon find that there were almost 200,000 MBA graduates
for the year. They might decide just to take a simple random sample of
50,000 graduates and run a survey. Better still, they could divide the
population into strata and take a random sample from the strata. To do this,
they would create population groups based on gender, age range, race,
country of nationality, and career background. A random sample from each
stratum is taken in a number proportional to the stratum’s size compared with
the population. These subsets of the strata are then pooled to form a random
sample.
Now assume that the team looks at the different attributes of the
sample participants and wonders if there are any differences in
GPAs and students’ majors. Suppose it finds that 560 students are
English majors, 1,135 are science majors, 800 are computer science
majors, 1,090 are engineering majors, and 415 are math majors. The
team wants to use a proportional stratified random sample where the
stratum of the sample is proportional to the random sample in the
population.
The team then needs to confirm that the stratum of the population is
in proportion to the stratum in the sample; however, they find the
proportions are not equal. The team then needs to resample 4,000
students from the population and randomly select 480 English, 1,120
science, 960 computer science, 840 engineering, and 600
mathematics students.