Module 006 - Logic of Inferential Statistics
Logic of Inferential Statistics
Inferential Statistics
If you conduct a survey, the goal is to apply the conclusions to a more general
population, assuming the sample size is large enough and the sample representative
enough of the broader public. This is important because studies and experiments
need to state and conclude something about general populations and not just about
the sample that was studied.
For example, suppose there is a training program that claims to improve test scores
and an experimenter wants to verify the claims. She starts with two groups, one
taking the training program and the other not (the control). She measures the test
scores at the beginning and at the end, making sure that the starting test scores are,
on average, the same for both groups. The researcher finds that the test scores
for those who take the training are indeed higher now, and this difference
is statistically significant. She rejects her null hypothesis. Merely stating the results
for the two groups in terms of average score difference and representing this in the
form of graphs is descriptive statistics. But if she concludes that the training program
is effective in improving test scores (in general, and for all people), then this
is inferential statistics.
It's tempting to assume that descriptive statistics alone signals the end of an
experiment, or to fail to draw a distinction between the results of descriptive
statistical tests and your analysis. Statistics are powerful tools, but it's the analysis
provided afterwards by inferential statistics that explicitly makes claims about what
those results mean, why, and in what context. Remember that inference involves
moving focus from smaller and more specific to larger and more general.
It should be noted that inferential statistics always talks in terms of probability, but
this can be made highly reliable by designing the right experimental conditions. The
inferences are almost always an estimate with a confidence interval. In some cases,
however, the inference is simply the rejection of a hypothesis, as when the
experiment is designed to refute some claim.
Several models are available in inferential statistics that help in the process of
analysis. These models need to be chosen with care, since an error in assuming one
model might give faulty conclusions about the experiment.
Sampling is done usually because it is impossible to test every single individual in the
population. It is also done to save time, money and effort while conducting the
research.
Still, every researcher must keep in mind that the ideal scenario is to test all the
individuals to obtain reliable, valid and accurate results. If testing all the individuals
is impossible, that is the only time we rely on sampling techniques.
Population sampling must be conducted correctly, since errors can lead to
inaccurate and misleading data.
Types of Sampling
There are several different sampling techniques available, and they can be subdivided
into two groups: probability sampling and non-probability sampling. In probability
(random) sampling, you start with a complete sampling frame of all eligible
individuals from which you select your sample. In this way, all eligible individuals
have a chance of being chosen for the sample, and you will be more able to generalise
the results from your study. Probability sampling methods tend to be more time-
consuming and expensive than non-probability sampling. In non-probability (non-
random) sampling, you do not start with a complete sampling frame, so some
individuals have no chance of being selected. Consequently, you cannot estimate the
effect of sampling error and there is a significant risk of ending up with a non-
representative sample which produces non-generalisable results. However, non-
probability sampling methods tend to be cheaper and more convenient, and they are
useful for exploratory research and hypothesis generation.
Non-Probability Sampling
In this type of population sampling, members of the population do not have an equal
chance of being selected. Due to this, it is not safe to assume that the sample fully
represents the target population. It is also possible that the researcher deliberately
chose the individuals that will participate in the study.
1. Convenience sampling
In convenience sampling, subjects are selected simply because they are the easiest
for the researcher to reach, such as volunteers or passers-by.
2. Quota sampling
This method of sampling is often used by market researchers. Interviewers are given
a quota of subjects of a specified type to attempt to recruit. For example, an
interviewer might be told to go out and select 20 adult men, 20 adult women, 10
teenage girls and 10 teenage boys so that they could interview them about their
television viewing. Ideally the quotas chosen would proportionally represent the
characteristics of the underlying population.
Whilst this has the advantage of being relatively straightforward and potentially
representative, the chosen sample may not be representative of other characteristics
that weren’t considered (a consequence of the non-random nature of sampling). 2
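The quota procedure described above can be sketched in code. This is a minimal illustration, not a survey tool: the stream of candidates, the person identifiers and the use of a random generator to simulate passers-by are all hypothetical.

```python
import random

def quota_sample(candidates, quotas):
    """Recruit candidates in arrival order until every quota is filled.

    candidates: iterable of (person_id, group) pairs - the stream of
      people an interviewer approaches.
    quotas: dict mapping group -> number of subjects needed.
    """
    remaining = dict(quotas)
    recruited = []
    for person, group in candidates:
        if remaining.get(group, 0) > 0:
            recruited.append((person, group))
            remaining[group] -= 1
        if all(n == 0 for n in remaining.values()):
            break
    return recruited

# Hypothetical stream of passers-by, with the quotas from the text.
random.seed(0)
groups = ["adult man", "adult woman", "teenage girl", "teenage boy"]
stream = [(i, random.choice(groups)) for i in range(500)]
sample = quota_sample(stream, {"adult man": 20, "adult woman": 20,
                               "teenage girl": 10, "teenage boy": 10})
print(len(sample))  # 60 once every quota is filled
```

Note that whoever happens to arrive first fills the quota, which is exactly the non-random element that limits representativeness.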
4. Snowball sampling
In snowball sampling, existing study participants recruit further participants from
among their acquaintances, which is useful for reaching hard-to-access populations.
Probability Sampling
In probability sampling, every individual in the population has an equal chance of being
selected as a subject for the research. This method guarantees that the selection
process is completely randomized and without bias. The most basic example of
probability sampling is listing all the names of the individuals in the population in
separate pieces of paper, and then drawing a number of papers one by one from the
complete collection of names.
1. Simple random sampling
In this case each individual is chosen entirely by chance and each member of the
population has an equal chance, or probability, of being selected. One way of
obtaining a random sample is to give each individual in a population a number, and
then use a table of random numbers to decide which individuals to include.1 For
example, if you have a sampling frame of 1000 individuals, labelled 0 to 999, use
groups of three digits from the random number table to pick your sample. So, if the
first three numbers from the random number table were 094, select the individual
labelled “94”, and so on.
As with all probability sampling methods, simple random sampling allows the
sampling error to be calculated and reduces selection bias. A specific advantage is that
it is the most straightforward method of probability sampling. A disadvantage of
simple random sampling is that you may not select enough individuals with your
characteristic of interest, especially if that characteristic is uncommon. It may also be
difficult to define a complete sampling frame and inconvenient to contact them,
especially if different forms of contact are required (email, phone, post) and your
sample units are scattered over a wide geographical area.
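The random-number-table procedure can be mimicked with a pseudorandom generator. This is a sketch using only the standard library; the frame of 1000 labelled individuals follows the text, while the sample size of 10 is an arbitrary choice for illustration.

```python
import random

# Sampling frame of 1000 individuals labelled 0 to 999, as in the text.
frame = list(range(1000))

# random.sample draws without replacement, so every individual has an
# equal chance of selection, mirroring the random-number-table procedure.
random.seed(42)  # fixed seed only so the draw is reproducible
chosen = random.sample(frame, k=10)
print(sorted(chosen))
```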
2. Systematic sampling
Individuals are selected at regular intervals from the sampling frame. The intervals
are chosen to ensure an adequate sample size. If you need a sample size n from a
population of size x, you should select every x/nth individual for the sample. For
example, if you wanted a sample size of 100 from a population of 1000, select every
1000/100 = 10th member of the sampling frame.
Systematic sampling is often more convenient than simple random sampling, and it is
easy to administer. However, it may also lead to bias, for example if there are
underlying patterns in the order of the individuals in the sampling frame, such that
the sampling technique coincides with the periodicity of the underlying pattern. As a
hypothetical example, if a group of students were being sampled to gain their
opinions on college facilities, but the Student Record Department’s central list of all
students was arranged such that the sex of students alternated between male and
female, choosing an even interval (e.g. every 20th student) would result in a sample
of all males or all females. Whilst in this example the bias is obvious and should be
easily corrected, this may not always be the case.
3. Stratified sampling
In this method, the population is first divided into subgroups (or strata) that all share
a similar characteristic. It is used when we might reasonably expect the measurement
of interest to vary between the different subgroups, and we want to ensure
representation from all the subgroups. For example, in a study of stroke outcomes,
we may stratify the population by sex, to ensure equal representation of men and
women. The study sample is then obtained by taking equal sample sizes from each
stratum. In stratified sampling, it may also be appropriate to choose non-equal
sample sizes from each stratum. For example, in a study of the health outcomes of
nursing staff in a county, if there are three hospitals each with different numbers of
nursing staff (hospital A has 500 nurses, hospital B has 1000 and hospital C has 2000),
then it would be appropriate to choose the sample numbers from each
hospital proportionally (e.g. 10 from hospital A, 20 from hospital B and 40 from
hospital C). This ensures a more realistic and accurate estimation of the health
outcomes of nurses across the county, whereas simple random sampling would over-
represent nurses from hospitals A and B. The fact that the sample was stratified
should be taken into account at the analysis stage.
Stratified sampling improves the accuracy and representativeness of the results by
reducing sampling bias. However, it requires knowledge of the appropriate
characteristics of the sampling frame (the details of which are not always available),
and it can be difficult to decide which characteristic(s) to stratify by.
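The proportional allocation in the hospital example can be computed directly. This is a sketch: the rounding rule (leftover slots go to the largest remainders) is one common convention and is not taken from the text.

```python
def proportional_allocation(strata_sizes, total_n):
    """Allocate a total sample size to strata in proportion to their
    population sizes, giving leftover slots to the largest remainders."""
    pop = sum(strata_sizes.values())
    alloc = {s: total_n * size // pop for s, size in strata_sizes.items()}
    leftover = total_n - sum(alloc.values())
    by_remainder = sorted(strata_sizes,
                          key=lambda s: (total_n * strata_sizes[s]) % pop,
                          reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc

# Hospitals from the text: A has 500 nurses, B has 1000 and C has 2000.
print(proportional_allocation({"A": 500, "B": 1000, "C": 2000}, 70))
# {'A': 10, 'B': 20, 'C': 40}
```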
4. Clustered sampling
In a clustered sample, subgroups of the population are used as the sampling unit,
rather than individuals. The population is divided into subgroups, known as clusters,
which are randomly selected to be included in the study. Clusters are usually already
defined, for example individual GP practices or towns could be identified as clusters.
In single-stage cluster sampling, all members of the chosen clusters are then included
in the study. In two-stage cluster sampling, a selection of individuals from each cluster
is then randomly selected for inclusion. Clustering should be taken into account in the
analysis. The General Household survey, which is undertaken annually in England, is
a good example of a (one-stage) cluster sample. All members of the selected
households (clusters) are included in the survey.
Cluster sampling can be more efficient than simple random sampling, especially
where a study takes place over a wide geographical region. For instance, it is easier
to contact lots of individuals in a few GP practices than a few individuals in many
different GP practices. Disadvantages include an increased risk of bias, if the chosen
clusters are not representative of the population, resulting in an increased sampling
error.
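One-stage cluster sampling can be sketched as follows; the GP practices, their sizes and the patient identifiers here are hypothetical.

```python
import random

def one_stage_cluster_sample(clusters, n_clusters):
    """One-stage cluster sampling: randomly choose whole clusters and
    include every member of each chosen cluster."""
    chosen = random.sample(sorted(clusters), n_clusters)
    return [member for c in chosen for member in clusters[c]]

# Hypothetical clusters: 5 GP practices, each with 8 registered patients.
random.seed(7)
practices = {f"practice_{i}": [f"patient_{i}_{j}" for j in range(8)]
             for i in range(5)}
sample = one_stage_cluster_sample(practices, 2)
print(len(sample))  # 2 clusters x 8 members = 16
```

For two-stage cluster sampling, a further `random.sample` within each chosen cluster would replace the full membership list.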
Y = [1 / (σ√(2π))] · e^(−(x − μ)² / (2σ²))
where
X is a normal random variable,
μ is the mean,
σ is the standard deviation,
π is approximately 3.14159, and
e is approximately 2.71828.
The random variable X in the normal equation is called the normal random variable.
The normal equation is the probability density function for the normal distribution.
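The density formula can be translated into code term by term. This is a small sketch using only the standard library; the evaluation point and parameters are arbitrary illustrations.

```python
import math

def normal_pdf(x, mu, sigma):
    """Probability density of a normal distribution with mean mu and
    standard deviation sigma, following the formula above."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return coeff * math.exp(exponent)

# The density peaks at the mean, where it equals 1 / (sigma * sqrt(2*pi)).
print(round(normal_pdf(0, mu=0, sigma=1), 5))  # 0.39894
```

Increasing sigma lowers and widens the curve, matching the tall-narrow versus short-wide behaviour described in the text.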
The graph of the normal distribution depends on two factors - the mean and the
standard deviation. The mean of the distribution determines the location of the center
of the graph, and the standard deviation determines the height and width of the
graph. All normal distributions look like a symmetric, bell-shaped curve, as shown
below.
When the standard deviation is small, the curve is tall and narrow; and when the
standard deviation is big, the curve is short and wide (see above).
The probability that a normal random variable X equals any particular value is 0.
The probability that X is greater than a equals the area under the normal curve
bounded by a and plus infinity (as indicated by the non-shaded area in the figure
below).
The probability that X is less than a equals the area under the normal curve bounded
by a and minus infinity (as indicated by the shaded area in the figure below).
About 68% of the area under the curve falls within 1 standard deviation of the
mean.
About 95% of the area under the curve falls within 2 standard deviations of
the mean.
About 99.7% of the area under the curve falls within 3 standard deviations of
the mean.
Collectively, these points are known as the empirical rule or the 68-95-99.7 rule.
Clearly, given a normal distribution, most outcomes will be within 3 standard
deviations of the mean.
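The empirical rule can be checked numerically: for a normal distribution, P(|X − μ| ≤ kσ) equals erf(k/√2), and the error function is available in the standard library.

```python
import math

# For a normal distribution, P(|X - mu| <= k*sigma) = erf(k / sqrt(2)),
# which reproduces the 68-95-99.7 rule without external libraries.
for k in (1, 2, 3):
    prob = math.erf(k / math.sqrt(2))
    print(f"within {k} standard deviation(s): {prob:.4f}")
# within 1 standard deviation(s): 0.6827
# within 2 standard deviation(s): 0.9545
# within 3 standard deviation(s): 0.9973
```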
Hypothesis Testing
The second type of inference method (confidence intervals being the first) is
hypothesis testing. A hypothesis, in statistics, is a statement about a population,
typically represented by some specific numerical value. In testing a hypothesis, we
collect data in an effort to gather evidence about the hypothesis. In hypothesis
testing there are certain steps one must follow, summarized below as six steps to
conducting a test of a hypothesis.
1. Set up the hypotheses. State a null hypothesis about the population parameter
and an alternative (research) hypothesis that reflects the claim being investigated.
2. Set some level of significance called alpha. This value is used as a probability
cutoff for making decisions about the null hypothesis. As we will learn later, this
alpha value represents the probability we are willing to place on our test for
making an incorrect decision in regards to rejecting the null hypothesis. The most
common alpha value is 0.05 or 5%. Other popular choices are 0.01 (1%) and 0.1
(10%).
3. Calculate a test statistic. Gather sample data and calculate a test statistic where
the sample statistic is compared to the parameter value. The test statistic is
calculated under the assumption the null hypothesis is true, and incorporates a
measure of standard error and assumptions (conditions) related to the sampling
distribution. Such assumptions could be normality of data, independence, and
number of success and failure outcomes.
4. Determine the p-value or rejection region. Using the test statistic and the
sampling distribution under the null hypothesis, find the p-value, or compare the
statistic to the critical value defining the rejection region.
5. Make a test decision about the null hypothesis - In this step we decide to either
reject the null hypothesis or decide to fail to reject the null hypothesis. Notice we
do not make a decision where we will accept the null hypothesis.
6. State an overall conclusion - Once we have found the p-value or rejection region
and made a statistical decision about the null hypothesis (i.e. reject the null or
fail to reject the null), we summarize the results into an overall conclusion for
our test.
We will continue our discussion by considering two specific hypothesis tests: a test
of one proportion, and a test of one mean. We will provide the general set up of the
hypothesis and the test statistics for both tests. From there, we will branch off into
specific discussions on each of these tests.
In order to make a judgment about the value of a parameter, the problem can be set up
as a hypothesis testing problem.
We usually set the hypothesis that one wants to conclude as the alternative
hypothesis, also called the research hypothesis.
1. The population parameter is greater than a certain value. Referred to as a "right-
tailed test"
2. The population parameter is less than a certain value. Referred to as a "left-tailed
test"
3. The population parameter is not equal to a certain value. Referred to as a "two-
tailed test"
For all three alternatives, the null hypothesis is the population parameter is
equal to that certain value.
Since hypothesis tests are about a parameter value, the hypotheses use parameter
notation - p for proportion or μ for mean - in their arrangement. For tests of a
proportion or a test of a mean, we would choose the appropriate alternative based on
our research question. Below are the possible alternative hypotheses, from which we
would select only one based on the research question. The
symbols p0 and μ0 are just used in these general statements. In practice, these get
replaced by the parameter value being tested. The examples following will illustrate.
Ha: p > p0, or Ha: μ > μ0 (right-tailed)
Ha: p < p0, or Ha: μ < μ0 (left-tailed)
Ha: p ≠ p0, or Ha: μ ≠ μ0 (two-tailed)
In each case the null hypothesis is H0: p = p0, or H0: μ = μ0.
When debating the State Appropriation for Penn State, the following question is
asked: "Are the majority of students at Penn State from Pennsylvania?" To answer
this question, we can set it up as a hypothesis testing problem and use data collected
to answer it. This example is about a population proportion and thus we set up the
hypotheses in terms of p. Here the value p0 is 0.5 since more than 0.5 constitutes a
majority. The hypotheses set up would be a right-tailed test:
H0:p=0.5 vs. Ha:p>0.5
A consumer test agency wants to see whether the mean lifetime of a brand of tires
is less than 42,000 miles as the tire manufacturer advertises that the average lifetime
is at least 42,000 miles. In this example, we are discussing a mean and therefore set
up the hypotheses in terms of μ. Here the value of μ0 is 42,000. With the consumer
test agency wanting to research that the mean lifetime is below 42,000, we would set
up the hypotheses as a left-tailed test:
H0:μ=42,000 vs. Ha:μ<42,000
The length of a certain lumber from a national home building store is supposed to be
8.5 feet. A builder wants to check whether the shipment of lumber she receives has a
mean length different from 8.5 feet. In this example, we are discussing a mean and
therefore set up the hypotheses in terms of μ. Here the value of μ0 is 8.5. With the
builder wanting to check if the mean length is different from 8.5, she would set up the
hypotheses as a two-tailed test:
H0:μ=8.5 vs. Ha:μ≠8.5
A political news company believes the national approval rating for the current
president has fallen below 40%. In this example, we are discussing a proportion and
therefore will set up the hypothesis in terms of p. Here the p0 value is 0.4 and the
hypotheses would be set up as a left-tailed test:
H0:p=0.4 vs. Ha:p<0.4
If the conditions necessary to conduct the hypothesis test are satisfied, then we can
use the formulas below to calculate the appropriate test statistic from our sample
data. These assumptions and test statistics are as follows:
Test of One Proportion: the conditions are that np0 and n(1−p0) are at least 5. If so,
then the one proportion test statistic is:
Z* = (p̂ − p0) / √(p0(1 − p0) / n)
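The statistic can be computed directly from a sample. The Penn State figures below (285 in-state students out of 500 sampled) are hypothetical, used only to show the calculation.

```python
import math

def one_proportion_z(p_hat, p0, n):
    """Z* = (p_hat - p0) / sqrt(p0 * (1 - p0) / n).
    Valid when n * p0 and n * (1 - p0) are both at least 5."""
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

# Hypothetical data for the Penn State example: 285 of 500 sampled
# students are from Pennsylvania, testing H0: p = 0.5 vs Ha: p > 0.5.
z = one_proportion_z(p_hat=285 / 500, p0=0.5, n=500)
print(round(z, 2))  # 3.13
```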
Test of One Mean: the condition is that the data satisfy conditions similar to
those used for constructing a t-confidence interval for the mean: either the
data come from an approximately normal distribution, or the sample size is large
enough (at least 30), or, for a small sample size (less than 30), the data are not
skewed and have no outliers. If any of these conditions is satisfied, then we can
calculate the following test statistic:
t* = (x̄ − μ0) / (s / √n)
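The same calculation in code, using the tire example from above; the lifetimes listed are hypothetical data, testing H0: μ = 42,000 against Ha: μ < 42,000.

```python
import math
import statistics

def one_mean_t(data, mu0):
    """t* = (x_bar - mu0) / (s / sqrt(n)), using the sample
    standard deviation s."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)
    return (x_bar - mu0) / (s / math.sqrt(n))

# Hypothetical sample of tire lifetimes (miles) for the example above.
lifetimes = [41000, 39500, 42500, 40000, 41500, 38500, 43000, 40500]
t = one_mean_t(lifetimes, mu0=42000)
print(round(t, 2))  # negative, consistent with a left-tailed alternative
```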
NOTE - do not get too hung up on symbols. We just want to use a notation that helps
to remind us that these values are a test statistic.
The Logic of Hypothesis Testing
If the sample data are consistent with the null hypothesis, then we do not reject it.
If the sample data are inconsistent with the null hypothesis, but consistent with the
alternative, then we reject the null hypothesis and conclude that the alternative
hypothesis is true.
Levels of Significance
In practice, you might get a p-value such as 0.03 (i.e., p = .03). This means that there is a 3%
chance of finding a difference as large as (or larger than) the one in your study given
that the null hypothesis is true. However, you want to know whether this is
"statistically significant". Typically, if there were a 5% or less chance (5 times in 100
or less) that the difference in mean exam performance between the two teaching
methods (or whatever statistic you are using) would be as large as that observed given
the null hypothesis is true, you would reject the null hypothesis and accept the alternative
hypothesis. Alternately, if the chance was greater than 5% (5 times in 100 or more),
you would fail to reject the null hypothesis and would not accept the alternative
hypothesis. As such, in this example where p = .03, we would reject the null
hypothesis and accept the alternative hypothesis. We reject it because, at p = .03
(i.e., less than a 5% chance), the result we obtained is too unlikely to have
occurred by chance alone, so we can be confident that it was the two teaching
methods that had an effect on exam performance.
Whilst there is relatively little justification for why a significance level of 0.05 is used
rather than 0.01 or 0.10, for example, it is widely used in academic research. However,
if you want to be particularly confident in your results, you can set a more stringent
level of 0.01 (a 1% chance or less; 1 in 100 chance or less).
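The decision rule described above reduces to a simple comparison of the p-value with alpha; a minimal sketch:

```python
def decide(p_value, alpha=0.05):
    """Compare a p-value with the significance level alpha."""
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

# The example from the text: p = .03 against the conventional alpha = .05,
# and against the more stringent alpha = .01.
print(decide(0.03))              # reject the null hypothesis
print(decide(0.03, alpha=0.01))  # fail to reject the null hypothesis
```

Note how the same p-value leads to different decisions under different significance levels, which is why alpha must be fixed before the data are examined.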
References
2. Wolfgang Karl Härdle, Sigbert Klinke and Bernd Rönz (2015). Introduction to
Statistics: Using Interactive MM*Stat Elements. New York: Springer.