Statistics Introduction
On the one hand, student researchers frequently fear the use of quantitative methods for their
research design due to their lack of familiarity with statistics. There is a perception that one must know a
lot of advanced math and be comfortable with it in order to use statistics. Yet, even qualitative
researchers use statistical concepts in their research designs and reports. On the other hand, some new
researchers make what appear to be solid conclusions based on statistical evidence without
understanding the assumptions for or weaknesses of the statistical test or how statistics should be properly
used. The result is questionable recommendations based on flimsy evidence. The well-known phrase
attributed to Mark Twain of “Figures don’t lie, but liars figure” describes this latter group who appear to
manipulate statistical analysis to support their conclusions to an undiscerning audience.
Descriptive Statistics
We use descriptive statistics across many fields and activities – including in qualitative research.
Descriptive statistics describe a sample’s characteristics without further analysis. They give the reader
information about the sample and often describe the extent to which the sample represents a
population. A variable is a trait whose value varies from case to case (Steinberg, 4). Independent
variables (IV) are those factors that are viewed as the cause or reason for a phenomenon and will
therefore be manipulated or controlled. Dependent variables (DV) are those factors affected by one or
more independent variables. Confounding variables (XV) are factors associated with the independent
variable in such a way that they cannot be completely isolated and therefore may influence the
dependent variables to some degree. Identifying and then limiting or eliminating confounding variables
is therefore a goal within quantitative study designs (Steinberg, 152). We could consider height to be an
IV and success in basketball a DV because we assume that the amount of height influences the success
of a basketball player. An XV could be adolescent nutrition, which might influence individual growth:
better nutrition might make one person taller than another with equal basketball prowess, yet
nutrition’s effect cannot be separated from individual height.
Univariate (single variable) statistics analyze the frequency (f) a value occurs for a variable within
a data set, typically with a raw number (n) or a percentage of the value’s occurrence among the total
number of values within the data set (N) (Huck, 3). Graphs and tables augment the words used to
describe the analysis and can sometimes better convey patterns within the data, specifically the
distributional shape of the data. While the frequency of individual values is the typical method for
describing the data, some reports group values to reduce the size of the presentation of the data and its
analysis. Care should be taken not to create groups that unfairly portray the data. For example,
grouping ages in uneven bands can create the impression of a uniform or predicted distribution when
the graphing of individual ages would not.
In many phenomena, the values of variables when plotted to a graph form a pattern of
distribution. A normal distribution is when the data scores “are clustered near the middle of the
continuum of observed scores, and there is a gradual and symmetrical decrease in frequency in both
directions away from the middle area of scores” (Huck, 24). This is commonly called a bell curve because
the shape of the graphed data approximates that of a bell. The measures that help describe a
distribution curve mathematically include the mean, mode, and median (measures of central tendency)
along with the standard deviation, variance, and range (measures of dispersion). The first three
describe the center of the curve. The mean
(M, X̅, or μ) is the sum of all scores for a variable divided by the number of scores. It indicates the
mathematical value at the center of the data set. The mode (Mo) is the most frequent score within a
data set – what would be the peak of a bar chart or curve of all values arrayed from lowest to highest
scores on an axis. The median (Mdn) is the centermost score in a data set. The mean can be (but does
not have to be) a score in the data set; the median is an actual score when the number of scores is odd,
and the average of the two centermost scores when it is even. A normal distribution
occurs when the mean, mode, and median are equal (Huck, 29). A curve is said to be skewed – meaning
its shape has a longer tail to one end than the other – when the mean and the median are not equal.
A positively skewed curve is indicated by median < mean, producing a tail toward the positive end of
the measurement scale, while a negatively skewed curve is indicated by mean < median (Huck, 30).
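The relationship among these three measures can be checked directly. A minimal Python sketch using the standard library’s statistics module; the scores are hypothetical, purely for illustration:

```python
import statistics

# Hypothetical set of test scores (illustrative data, not from the text)
scores = [70, 72, 75, 75, 78, 80, 85, 95]

mean = statistics.mean(scores)      # sum of scores divided by their count
median = statistics.median(scores)  # centermost score (or average of the two centermost)
mode = statistics.mode(scores)      # most frequent score

# Comparing mean and median reveals the direction of skew:
if median < mean:
    skew = "positive (tail toward higher scores)"
elif mean < median:
    skew = "negative (tail toward lower scores)"
else:
    skew = "approximately symmetric"
```

Here the few high scores pull the mean above the median, so the comparison flags a positive skew.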
The standard deviation (SD, s, or σ) and variance (s² or σ²) help describe the dispersion of data
from the central point. The standard deviation is an expression of the mean of the differences for all of
the points in a data set from the mean score; specifically, it is the square root of the mean of the squared
deviations from the mean for the data. The variance is the square of the standard deviation. The range
expresses the difference between the highest and lowest values of a variable. These three values
help us understand how well dispersed the scores are around the mean value of a data set and
potentially predict the chance for a future score to be within that curve. The range describes the width
of the curve’s base while the standard deviation and variance describe how peaked or flat the curve is to
either side of the mean score. The standard deviation is frequently used to describe data because in a
normal distribution curve, 68% of scores fall within ± 1 SD, 95% of scores fall within ± 2 SD, and 99% of
scores fall within ± 3 SD (Steinberg, 91). This will be important later as we compare one variable’s data
set to another variable’s data set. The range is simply the lowest and highest values in a data set
expressed to show the ends of the data set (e.g. a range of 0-100 points). Including the range in the
description of the data can help identify values well outside the expected distribution – outliers that
could be incorrect or mistaken entries and should therefore be excluded from analysis.
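These dispersion measures are equally easy to compute. A short Python sketch with hypothetical data; note that the 68% figure assumes an approximately normal distribution:

```python
import statistics

# Hypothetical data set (illustrative only)
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)           # central point of the data
sd = statistics.pstdev(data)           # population standard deviation (sigma)
variance = statistics.pvariance(data)  # square of the standard deviation
low, high = min(data), max(data)       # endpoints of the range

# In a normal distribution, roughly 68% of scores fall within 1 SD of the mean
within_1_sd = [x for x in data if mean - sd <= x <= mean + sd]
```

For this small set, 6 of the 8 values fall within one standard deviation of the mean, a rough check on how tightly the scores cluster.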
In the 2024 Paris Summer Olympics, 21 female swimmers (n=21) competed in the 400-meter
freestyle across three heats. The 8 fastest swimmers across the three heats later competed in the final
race. There are 8 lanes in the pool. With 21 swimmers, one heat only had 5 swimmers while the other
two heats had 8 each. Several questions could come to mind. These are supposed to be the world’s
fastest women swimmers in the event, so would the scores result in a skewed distribution or a normal
distribution? Was one lane “faster” than the others? Was there a benefit of racing in a small group of 5
compared to a full pool of 8?
Across all three heats, the mean time was 4:10.299 and the median time was 4:08.02. Finish times
ranged from 4:02.19 to 4:37.46, and the standard deviation was 10.375 seconds. Since the
median time was faster than the mean time, we would expect a graph of the times to be positively
skewed – most times cluster at the fast end with a longer tail of slower times – suggesting a field
dominated by world-class swimmers with a few slower entrants rather than a collection of swimmers
who simply showed up to race one day. When we compare the SD to the mean, we would expect 95%
of times to fall between 3:49.55 and 4:31.05 (the mean plus or minus 2 SDs); the fastest time was less
than 1 SD from the mean, while the slowest time fell outside that range by over 6 seconds. So we can
conclude that the competitors included some of the world’s fastest female swimmers but also some of
the best from individual nations who were not in the same class as those that
progressed to the final meet. To answer the second question, we would need to analyze times against
more than one variable – the 8 lanes – so simple descriptive statistics will not help. For the last question,
we could compare the heats statistically or we could look at a simple chart and note that none of the
swimmers in the small heat of 5 qualified for the final event so an initial assumption from that data
would suggest that swimming in a smaller heat is a disadvantage – but we need more data and more
swimmers racing in small heats and in large heats to see if there is a pattern, which leads to us
considering inferential statistics.
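The ± 2 SD band used above can be reproduced from the summary statistics given in the text; the helper functions below are illustrative, not from any source:

```python
def to_seconds(t):
    """Convert a 'M:SS.mmm' swim time to seconds."""
    minutes, seconds = t.split(":")
    return int(minutes) * 60 + float(seconds)

def to_time(s):
    """Convert seconds back to a 'M:SS.mmm' string."""
    minutes, seconds = divmod(s, 60)
    return f"{int(minutes)}:{seconds:06.3f}"

mean = to_seconds("4:10.299")  # mean time across all three heats
sd = 10.375                    # standard deviation in seconds

low = to_time(mean - 2 * sd)   # fast end of the 95% band
high = to_time(mean + 2 * sd)  # slow end of the 95% band
```

Running this recovers the 3:49.55 to 4:31.05 window quoted in the paragraph (to the nearest thousandth of a second).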
Inferential Statistics
Simple random sampling is a technique that selects members of a population to participate from a list
ordered by a neutral factor (such as the first letter of the first name), using a random number generator
or table to determine which individuals on the list to select. Systematic random sampling selects every nth individual
on a list to participate (e.g. the researcher selects every 4th name on a list of the population to obtain a
sample of 25% of the population). Both (if done properly) should yield a representative sample because
we assume that there is a normal distribution for any group of participants as a sample as there is for the
population. Care should be taken to ensure the list is created in a manner that does not create a biased
sample, though. Stratified random sampling occurs when the population is first sorted into categories
before sampling. A study that compares the attitudes of male Army officers on a phenomenon to those
of female Army officers must first sort a roster of the population by gender before selecting within each
group the appropriate sample for analysis (Steinberg, 148-50).
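The two list-based techniques can be sketched in a few lines of Python; the roster names are hypothetical:

```python
import random

# Hypothetical roster of a 100-member population (names are illustrative)
roster = [f"officer_{i:03d}" for i in range(1, 101)]

# Simple random sampling: a random number generator picks 25 members outright.
random.seed(1)  # fixed seed so the draw is repeatable
simple_sample = random.sample(roster, 25)

# Systematic random sampling: every 4th name yields a 25% sample;
# the starting point within the first interval is chosen at random.
start = random.randrange(4)
systematic_sample = roster[start::4]
```

Stratified random sampling would simply repeat either draw within each category (e.g. within each gender group) after sorting the roster.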
Random sampling assumes most of those sampled will respond or participate in the study. In
many studies, that is not the case. What is left is not a true sample of the target population unless the
researcher can establish that the sample’s relevant characteristics are similar to that of the target
population. Convenience sampling often occurs even in quantitative studies. Sampling students in
the Command and General Staff College is convenient in that they are located generally in two to five
locations. Even with properly conducted statistical analysis of data from such a sample, the outcomes
cannot be generalized to the population because of the uniqueness of the students and their
experiences within the Command and General Staff College. For example, students selected to attend
the resident courses are considered to be at the top of their branch or service cohorts for performance
and potential making statistical analysis informative but erroneous if generalizing the analysis to non-
resident students as well.
Size of the sample is also important. In the swimming example above, looking at one swimmer
per lane for a maximum of two swims (one in the preliminary heat and one in the final) is logically
insufficient to make an inference. Optimally, we would want to see a swimmer in each of the lanes at
least once if not more and see them swim against individuals of both similar and dissimilar times to be
able to make a deduction about whether one lane is faster than another. Likewise, different statistical
tests will require a minimum number of samples to make a generalization to the population. The central
limit theorem informs us that as more samples are drawn from a population, the distribution of their
means more closely approximates a normal curve even when the population itself is not normally
distributed (Steinberg, 175). While an approximately normal distribution might be evident with only a
few samples, it will be more obvious with more samples from the population.
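The central limit theorem can be demonstrated with a quick simulation; this sketch draws repeated samples from a deliberately non-normal population (a fair die) using only the standard library:

```python
import random
import statistics

random.seed(42)  # repeatable simulation

# The population: uniform integers 1-6, which is flat, not bell-shaped.
def sample_mean(n):
    """Mean of one random sample of size n from the die population."""
    return statistics.mean(random.randint(1, 6) for _ in range(n))

# Collect many sample means; their distribution piles up around the
# population mean of 3.5 and looks increasingly normal as n grows.
means = [sample_mean(30) for _ in range(1000)]

grand_mean = statistics.mean(means)
spread = statistics.stdev(means)  # shrinks as the sample size n increases
```

Plotting a histogram of `means` would show the familiar bell shape even though no individual die roll is normally distributed.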
Every hypothesis will also have a corresponding null hypothesis (symbolized as H0). The
null hypothesis has its foundation in a postpositivist perspective on knowing: while there may be an
absolute form of reality, we are limited by our senses to never knowing whether we have found that
reality or truth yet we can rule out what is not reality or truth. The null hypothesis is a statement that
there is no relationship between an IV and a DV. The object of quantitative research is to determine
whether the null hypothesis can be rejected or must be retained. When a statistical test indicates a
relationship, we can rule out the null hypothesis and therefore provide indirect evidence in support of
the hypothesis. We do not state authoritatively that we have proved the hypothesis as true within a
postpositivist framework.
A common phrase in statistics – and among those who like to use statistics to prove a point – is that a
result was “statistically significant.” While a statistical test might produce a value suggesting a
relationship (or the absence of one) that can be inferred to the population, a researcher should ask
both how likely it is that the test’s outcome arose by chance (probability of error p) and whether the
effect behind the significance is meaningful (effect size d or η²). The probability of error is normally
expressed as p<x, where x indicates the chance that the outcome was due to chance rather than a true
relationship in the data. For studies in the social sciences and education, p<0.05 is typical, indicating
that the researcher wants to limit the potential for a chance finding to 5% – or, put another way, that
the outcome of the test is 95% likely to reflect a real relationship. In technical fields, the threshold
might be p<0.01. Notice that the researcher decides the acceptable level of error for rejecting a null
hypothesis in advance of testing. Similarly, the researcher should decide in advance the degree of
effect required to accept a hypothesis by establishing, on a range from small to large, how much
relative change is needed to state that a result is meaningful for the problem studied. For example, a statistical
comparison of fitness regimens to physical performance on a physical fitness test might be within the
desired p<0.05, but if the effect of change is small then the researcher may not be willing to recommend
a service adopt the better fitness regimen since the potential increase in performance is not worth the
added investment or changes to existing training programs. Statistical tests will calculate probability of
error and effect size for the researcher; the key is for those researchers to make decisions early on what
the desired thresholds should be in order to make sound interpretations and recommendations based
on the statistical results.
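Effect size is simple to compute once the group statistics are in hand. A sketch of Cohen’s d for the fitness-regimen example, with hypothetical scores:

```python
import math
import statistics

# Hypothetical fitness-test scores under two training regimens
group_a = [62, 65, 68, 70, 71, 73, 75, 78]
group_b = [60, 63, 64, 66, 68, 69, 71, 72]

mean_a, mean_b = statistics.mean(group_a), statistics.mean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)
n_a, n_b = len(group_a), len(group_b)

# Cohen's d: difference in means divided by the pooled standard deviation
pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
d = (mean_a - mean_b) / pooled_sd
# Conventional benchmarks: d of about 0.2 is small, 0.5 medium, 0.8 large
```

For these illustrative numbers d comes out around 0.77, a medium-to-large effect – the kind of value a researcher would weigh against the cost of adopting the new regimen.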
Finally, quantitative researchers must be sure that the source for data is reliable and valid.
Reliability is a statement about the consistency in an instrument or observer in providing accurate data
across repeated collections. The researcher may look at reliability in one of three ways: does the
instrument consistently provide similar data from repeated applications, do the elements of an
instrument (such as a survey) consistently provide evidence to measure a variable, or do observers or
instruments consistently measure the same phenomenon across multiple occurrences? Many studies
will provide a reliability coefficient such as Cronbach’s alpha, Pearson’s r, or Cohen’s kappa to describe
the reliability of a data collection instrument or measuring device. Validity is an expression of the
accuracy of the data. Content validity is an assessment of how well an instrument collects relevant data
related to the variable being measured. Construct validity is an assessment of how well an instrument
discriminates in the data collected, i.e. how well individual elements of the instrument measure a
variable – and only one variable – under study. While there are numerous ways to test and report
validity, many studies will rely on a face validity statement that attempts to demonstrate the validity of
the instrument to observers through logic and reason (Huck, 68-86).
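Cronbach’s alpha, one of the reliability coefficients named above, can be computed by hand from item and total variances; the survey responses here are hypothetical:

```python
import statistics

# Hypothetical survey: 5 respondents, 4 Likert-scale items
# (rows are respondents, columns are items)
responses = [
    [4, 4, 3, 4],
    [3, 3, 3, 3],
    [5, 4, 5, 5],
    [2, 3, 2, 2],
    [4, 5, 4, 4],
]

k = len(responses[0])  # number of items
item_variances = [statistics.variance(item) for item in zip(*responses)]
total_variance = statistics.variance([sum(row) for row in responses])

# Cronbach's alpha: internal consistency of the instrument's items
alpha = (k / (k - 1)) * (1 - sum(item_variances) / total_variance)
```

Because these made-up items move together across respondents, alpha lands above 0.9, which would normally be read as high internal consistency.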
So, which test do you use for your study and how do you interpret the results? One key resource
is a decision tree for which test to use based on your variables. These products begin with identifying
whether the variables are continuous or categorical. When the independent and dependent variables
are both continuous, then the researcher is looking for a degree or amount of change of relationship
between the independent and dependent variables. Correlation and regression tests help determine the
degree with which the dependent variable consistently changes with the independent variable. When
the independent variable(s) is categorical and the dependent variable(s) are continuous (e.g. race and
gender of students as independent variables and test scores as dependent variables), then the statistical
test is for the degree of differences between the variables. Possible tests are t-test, ANOVA, and
MANOVA depending on the number of independent and dependent variables. When both independent
and dependent variables are categorical, statistical tests aim to determine the degree of variation for any
one variable from the expected values. Examples include χ² (chi-squared) and goodness-of-fit tests. Once
the correct tests are selected, it is important to research through a reliable statistics text how these
tests are done and how to report the results. Software can run the numbers, but the researcher will have
to explain why the test was the best choice in chapter 3 Methodology and the results of the test in
chapter 4 Analysis consistent with what quantitative researchers expect to see.
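The decision-tree logic above reduces to a small lookup. A sketch in Python (the function name and category labels are illustrative, not drawn from any statistics package):

```python
def suggest_test_family(iv_type, dv_type):
    """Map variable types ('continuous' or 'categorical') to a family of tests."""
    if iv_type == "continuous" and dv_type == "continuous":
        return "relationship: correlation / regression"
    if iv_type == "categorical" and dv_type == "continuous":
        return "difference: t-test / ANOVA / MANOVA"
    if iv_type == "categorical" and dv_type == "categorical":
        return "variation from expected: chi-squared / goodness of fit"
    return "recheck how the variables are coded"
```

The choice within a family (e.g. t-test versus ANOVA) then depends on how many independent and dependent variables the design includes.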
Quantitative designs reduce to some simple steps. First, determine the independent
variable that you will change (or in post hoc studies will see changed in the data) and the dependent
variables influenced by the independent variable. Next, decide how the variables will be measured
understanding that certain types of variables (continuous and categorical) must lead to specific types of
relationships in the analysis. Third, develop your hypotheses about what you believe will occur in each
relationship, along with a null hypothesis for each – a statement that no relationship exists between the
variables – and an alternate hypothesis that is the opposite of the null hypothesis. Fourth, choose the appropriate test
and your threshold for both probability of error and effect size. Fifth, analyze your collected data using
that test. Finally, report the results. Know the terms and relationships for your design and analysis to
communicate them in writing in the thesis as well as describe them orally in your thesis defense.
References
Huck, Schuyler W. (2012). Reading Statistics and Research (6th ed.). Boston, MA: Pearson Education.
Steinberg, Wendy J. (2011). Statistics Alive! (2nd ed.). Los Angeles, CA: SAGE Publications.