P-Values Notes

https://blog.minitab.com/en/adventures-in-statistics-2/understanding-hypothesis-tests-significance-levels-alpha-and-p-values-in-statistics

Understanding Hypothesis Tests: Significance Levels (Alpha) and P values in Statistics
Minitab Blog Editor | 19 March, 2015

Topics: Hypothesis Testing, Statistics

What do significance levels and P values mean in hypothesis tests? What is statistical
significance anyway? In this post, I’ll continue to focus on concepts and graphs to help you
gain a more intuitive understanding of how hypothesis tests work in statistics.

To bring it to life, I’ll add the significance level and P value to the graph in my previous post
in order to perform a graphical version of the 1 sample t-test. It’s easier to understand when
you can see what statistical significance truly means!

Here’s where we left off in my last post. We want to determine whether our sample mean
(330.6) indicates that this year's average energy cost is significantly different from last year’s
average energy cost of $260.

The probability distribution plot above shows the distribution of sample means we’d obtain
under the assumption that the null hypothesis is true (population mean = 260) and we
repeatedly drew a large number of random samples.

I left you with a question: where do we draw the line for statistical significance on the graph?
Now we'll add in the significance level and the P value, which are the decision-making
tools we'll need.

We'll use these tools to test the following hypotheses:

 Null hypothesis: The population mean equals the hypothesized mean (260).
 Alternative hypothesis: The population mean differs from the hypothesized mean
(260).

What Is the Significance Level (Alpha)?


The significance level, also denoted as alpha or α, is the probability of rejecting the null
hypothesis when it is true. For example, a significance level of 0.05 indicates a 5% risk of
concluding that a difference exists (rejecting the null hypothesis) when there is no actual
difference (i.e., when the null hypothesis is true).

These types of definitions can be hard to understand because of their technical nature. A
picture makes the concepts much easier to comprehend!

The significance level determines how far out from the null hypothesis value we'll draw that
line on the graph. To graph a significance level of 0.05, we need to shade the 5% of the
distribution that is furthest away from the null hypothesis.

In the graph above, the two shaded areas are equidistant from the null hypothesis value and
each area has a probability of 0.025, for a total of 0.05. In statistics, we call these shaded
areas the critical region for a two-tailed test. If the population mean is 260, we’d expect to
obtain a sample mean that falls in the critical region 5% of the time. The critical region
defines how far away our sample statistic must be from the null hypothesis value before we
can say it is unusual enough to reject the null hypothesis.
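
If you want to see where those critical-region boundaries land numerically, here is a minimal Python/scipy sketch. The sample size and standard deviation below are assumptions (they are not stated in this excerpt), chosen so the numbers roughly reproduce the figures quoted later in the post:

```python
from scipy import stats

# Assumed example values (not stated in this excerpt), chosen so the numbers
# roughly match the figures quoted in the post.
n = 25                 # hypothetical sample size
sample_sd = 154.0      # hypothetical sample standard deviation
null_mean = 260.0      # null hypothesis value

sem = sample_sd / n ** 0.5   # standard error of the mean (about 30.8)
df = n - 1

# Two-tailed critical region at alpha = 0.05: shade 2.5% in each tail.
alpha = 0.05
t_crit = stats.t.ppf(1 - alpha / 2, df)
lower = null_mean - t_crit * sem
upper = null_mean + t_crit * sem
print(f"Critical region: below {lower:.1f} or above {upper:.1f}")
# A sample mean of 330.6 falls above the upper boundary, so it is
# statistically significant at the 0.05 level.
```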

Our sample mean (330.6) falls within the critical region, which indicates it is statistically
significant (H0 rejected) at the 0.05 level.

We can also see if it is statistically significant using the other common significance level of
0.01.

The two shaded areas each have a probability of 0.005, which adds up to a total probability of
0.01. This time our sample mean does not fall within the critical region and we fail to reject
the null hypothesis. This comparison shows why you need to choose your significance level
before you begin your study. It protects you from choosing a significance level because it
conveniently gives you significant results!

Thanks to the graph, we were able to determine that our results are statistically significant (H0
rejected) at the 0.05 level without using a P value. However, when you use the numeric
output produced by statistical software, you’ll need to compare the P value to your
significance level to make this determination.

What Are P values?
P values are the probability of obtaining an effect at least as extreme as the one in your
sample data, assuming the truth of the null hypothesis.

This definition of P values, while technically correct, is a bit convoluted. It’s easier to
understand with a graph!

To graph the P value for our example data set, we need to determine the distance between the
sample mean and the null hypothesis value (330.6 - 260 = 70.6). Next, we can graph the
probability of obtaining a sample mean that is at least as extreme in both tails of the
distribution (260 +/- 70.6).

In the graph above, the two shaded areas each have a probability of 0.01556, for a total
probability of 0.03112. This probability represents the likelihood of obtaining a sample mean
that is at least as extreme as our sample mean in both tails of the distribution if the population
mean is 260. That’s our P value!
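
For readers who want to reproduce this number, here is a small sketch of the two-tailed P value calculation, again using the hypothetical n = 25 and sample standard deviation of about 154 assumed above (these values are not given in the post):

```python
from scipy import stats

# Same assumed example values as above (n and the standard deviation are
# hypothetical; they are not given in the post).
n, sample_sd = 25, 154.0
null_mean, sample_mean = 260.0, 330.6

sem = sample_sd / n ** 0.5
t_stat = (sample_mean - null_mean) / sem   # distance in standard errors

# Two-tailed P value: probability of a sample mean at least this far from
# 260 in either direction, if the null hypothesis is true.
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 1)
print(f"t = {t_stat:.2f}, P value = {p_value:.4f}")   # roughly 0.03
```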

When a P value is less than or equal to the significance level, you reject the null hypothesis.
If we take the P value for our example and compare it to the common significance levels, it
matches the previous graphical results. The P value of 0.03112 is statistically significant at an
alpha level of 0.05, but not at the 0.01 level.

If we stick to a significance level of 0.05, we can conclude that the average energy cost for
the population is greater than 260.

A common mistake is to interpret the P-value as the probability that the null hypothesis is
true. To understand why this interpretation is incorrect, please read my blog post How to
Correctly Interpret P Values.

Discussion about Statistically Significant Results


A hypothesis test evaluates two mutually exclusive statements about a population to
determine which statement is best supported by the sample data. A test result is statistically
significant when the sample statistic is unusual enough relative to the null hypothesis that we
can reject the null hypothesis for the entire population. “Unusual enough” in a hypothesis test
is defined by:

 The assumption that the null hypothesis is true—the graphs are centered on the null
hypothesis value.
 The significance level—how far out do we draw the line for the critical region?
 Our sample statistic—does it fall in the critical region?

Keep in mind that there is no magic significance level that distinguishes between the
studies that have a true effect and those that don’t with 100% accuracy. The common
alpha values of 0.05 and 0.01 are simply based on tradition. For a significance level of
0.05, expect to obtain sample means in the critical region 5% of the time when the null
hypothesis is true. In these cases, you won’t know that the null hypothesis is true but
you’ll reject it because the sample mean falls in the critical region. That’s why the
significance level is also referred to as an error rate!
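
A quick simulation makes this error-rate interpretation concrete. The population values below are hypothetical (a true mean of 260 with a standard deviation of 154) and only illustrate that a 0.05-level test rejects a true null hypothesis about 5% of the time:

```python
import numpy as np
from scipy import stats

# Hypothetical population (mean 260, sd 154) where the null hypothesis is
# true; the simulation shows a 0.05-level test rejects about 5% of the time.
rng = np.random.default_rng(1)
true_mean, sd, n, reps, alpha = 260.0, 154.0, 25, 10_000, 0.05

rejections = 0
for _ in range(reps):
    sample = rng.normal(true_mean, sd, n)
    result = stats.ttest_1samp(sample, popmean=true_mean)
    if result.pvalue <= alpha:
        rejections += 1

print(f"False rejection rate: {rejections / reps:.3f}")   # close to 0.05
```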

This type of error doesn’t imply that the experimenter did anything wrong, nor does it
require any unusual explanation. The graphs show that when the null hypothesis is true, it is
possible to obtain these unusual sample means for no reason other than random
sampling error. It’s just luck of the draw.

Significance levels and P values are important tools that help you quantify and control this
type of error in a hypothesis test. Using these tools to decide when to reject the null
hypothesis increases your chance of making the correct decision.

If you like this post, you might want to read the other posts in this series that use the same
graphical framework:

 Previous: Why We Need to Use Hypothesis Tests


 Next: Confidence Intervals and Confidence Levels

If you'd like to see how I made these graphs, please read: How to Create a Graphical Version
of the 1-sample t-Test.

https://blog.minitab.com/en/adventures-in-statistics-2/understanding-hypothesis-tests-confidence-intervals-and-confidence-levels

Understanding Hypothesis Tests: Confidence Intervals and Confidence Levels
Minitab Blog Editor | 02 April, 2015

Topics: Hypothesis Testing, Data Analysis, Statistics

In this series of posts, I show how hypothesis tests and confidence intervals work by focusing
on concepts and graphs rather than equations and numbers.  

Previously, I used graphs to show what statistical significance really means. In this post, I’ll
explain both confidence intervals and confidence levels, and how they’re closely related to P
values and significance levels.

How to Correctly Interpret Confidence Intervals and Confidence Levels
A confidence interval is a range of values that is likely to contain an unknown population
parameter.

If you draw a random sample many times, a certain percentage of the confidence intervals
will contain the population mean (parameter). This percentage is the confidence level.
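
A short simulation can make this coverage idea concrete. The population parameters below are hypothetical and simply illustrate that roughly 95% of 95% confidence intervals capture the true mean:

```python
import numpy as np
from scipy import stats

# Hypothetical population (mean 260, sd 154): roughly 95% of the 95%
# confidence intervals computed from repeated samples contain the true mean.
rng = np.random.default_rng(0)
true_mean, sd, n, reps = 260.0, 154.0, 25, 10_000

hits = 0
for _ in range(reps):
    sample = rng.normal(true_mean, sd, n)
    sem = sample.std(ddof=1) / np.sqrt(n)
    margin = stats.t.ppf(0.975, n - 1) * sem
    if sample.mean() - margin <= true_mean <= sample.mean() + margin:
        hits += 1

print(f"Coverage: {hits / reps:.3f}")   # close to 0.95
```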

Most frequently, you’ll use confidence intervals to bound the mean or standard deviation, but
you can also obtain them for regression coefficients, proportions, rates of occurrence
(Poisson), and for the differences between populations.

Just as there is a common misconception of how to interpret P values, there’s a common
misconception of how to interpret confidence intervals. In this case, the confidence level is
not the probability that a specific confidence interval contains the population parameter.

The confidence level represents the theoretical ability of the analysis to produce accurate
intervals if you are able to assess many intervals and you know the value of the population
parameter. For a specific confidence interval from one study, the interval either contains the
population value or it does not—there’s no room for probabilities other than 0 or 1. And you
can't choose between these two possibilities because you don’t know the value of the
population parameter.

"The parameter is an unknown constant and no probability statement concerning its


value may be made." 
—Jerzy Neyman, original developer of confidence intervals.

This will be easier to understand after we discuss the graph below . . .

With this in mind, how do you interpret confidence intervals?

Confidence intervals serve as good estimates of the population parameter because the
procedure tends to produce intervals that contain the parameter. Confidence intervals
are comprised of the point estimate (the most likely value) and a margin of error around
that point estimate. The margin of error indicates the amount of uncertainty that surrounds
the sample estimate of the population parameter.

In this vein, you can use confidence intervals to assess the precision of the sample estimate.
For a specific variable, a narrower confidence interval [90 110] suggests a more precise
estimate of the population parameter than a wider confidence interval [50 150].

Confidence Intervals and the Margin of Error


Let’s move on to see how confidence intervals account for that margin of error. To do this,
we’ll use the same tools that we’ve been using to understand hypothesis tests. I’ll create a
sampling distribution using probability distribution plots, the t-distribution, and the variability
in our data. We'll base our confidence interval on the energy cost data set that we've been
using.

When we looked at significance levels, the graphs displayed a sampling distribution centered
on the null hypothesis value, and the outer 5% of the distribution was shaded. For confidence
intervals, we need to shift the sampling distribution so that it is centered on the sample mean
and shade the middle 95%.

The shaded area shows the range of sample means that you’d obtain 95% of the time using
our sample mean as the point estimate of the population mean. This range [267 394] is our
95% confidence interval.
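
As a sketch of the calculation behind this interval, assuming the same hypothetical n = 25 and sample standard deviation of about 154 as before (values not stated in this excerpt):

```python
from scipy import stats

# Assumed example values (hypothetical n = 25 and sd of about 154, as above);
# the post quotes a 95% confidence interval of roughly [267, 394].
n, sample_sd, sample_mean = 25, 154.0, 330.6

sem = sample_sd / n ** 0.5
margin = stats.t.ppf(0.975, n - 1) * sem   # about 63.6
print(f"95% CI: [{sample_mean - margin:.0f}, {sample_mean + margin:.0f}]")
```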

Using the graph, it’s easier to understand how a specific confidence interval represents the
margin of error, or the amount of uncertainty, around the point estimate. The sample
mean is the most likely value for the population mean given the information that we have.
However, the graph shows it would not be unusual at all for other random samples drawn
from the same population to obtain different sample means within the shaded area. These
other likely sample means all suggest different values for the population mean. Hence, the
interval represents the inherent uncertainty that comes with using sample data.

You can use these graphs to calculate probabilities for specific values. However, notice that
you can’t place the population mean on the graph because that value is unknown.
Consequently, you can’t calculate probabilities for the population mean, just as Neyman said!

Why P Values and Confidence Intervals Always Agree About Statistical Significance
You can use either P values or confidence intervals to determine whether your results
are statistically significant. If a hypothesis test produces both, these results will agree.

The confidence level is equivalent to 1 – the alpha level. So, if your significance level is 0.05,
the corresponding confidence level is 95%.

 If the P value is less than your significance (alpha) level, the hypothesis test is
statistically significant.
 If the confidence interval does not contain the null hypothesis value, the results are
statistically significant.
 If the P value is less than alpha, the confidence interval will not contain the null
hypothesis value.

For our example, the P value (0.031) is less than the significance level (0.05), which indicates
that our results are statistically significant. Similarly, our 95% confidence interval [267 394]
does not include the null hypothesis mean of 260 and we draw the same conclusion.

To understand why the results always agree, let’s recall how both the significance level and
confidence level work.

 The significance level defines the distance the sample mean must be from the null
hypothesis to be considered statistically significant.
 The confidence level defines the distance between the confidence limits and the
sample mean.

Both the significance level and the confidence level define a distance from a limit to a mean.
Guess what? The distances in both cases are exactly the same!

The distance equals the critical t-value * standard error of the mean. For our energy cost
example data, the distance works out to be $63.57.
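
Here is a small sketch showing that the hypothesis test and the confidence interval use that very same distance (same assumed example values as earlier; the sample size and standard deviation are hypothetical):

```python
from scipy import stats

# Same assumed example values (hypothetical n = 25, sd of about 154).
n, sample_sd = 25, 154.0
null_mean, sample_mean = 260.0, 330.6

sem = sample_sd / n ** 0.5
distance = stats.t.ppf(0.975, n - 1) * sem
print(f"critical t * SEM = {distance:.2f}")   # about 63.57

# Hypothesis test view: is the sample mean more than `distance` from 260?
print(abs(sample_mean - null_mean) > distance)   # True -> significant at 0.05

# Confidence interval view: does the interval around 330.6 exclude 260?
ci = (sample_mean - distance, sample_mean + distance)
print(not (ci[0] <= null_mean <= ci[1]))         # True -> same conclusion
```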

Imagine this discussion between the null hypothesis mean and the sample mean:

Null hypothesis mean, hypothesis test representative: Hey buddy! I’ve found that you’re
statistically significant because you’re more than $63.57 away from me!

Sample mean, confidence interval representative: Actually, I’m significant because you’re
more than $63.57 away from me!

Very agreeable, aren’t they? And they always will agree as long as you compare the correct
pairs of P values and confidence intervals. If you compare the incorrect pair, you can get
conflicting results, as shown by common mistake #1 in this post.

Closing Thoughts
In statistical analyses, there tends to be a greater focus on P values and simply detecting a
significant effect or difference. However, a statistically significant effect is not necessarily
meaningful in the real world. For instance, the effect might be too small to be of any practical
value.

It’s important to pay attention to both the magnitude and the precision of the estimated
effect. That’s why I'm rather fond of confidence intervals. They allow you to assess these
important characteristics along with the statistical significance. You'd like to see a narrow
confidence interval where the entire range represents an effect that is meaningful in the real
world.

If you like this post, you might want to read the previous posts in this series that use the same
graphical framework:

 Part One: Why We Need to Use Hypothesis Tests


 Part Two: Significance Levels (alpha) and P values

For more about confidence intervals, read my post where I compare them to tolerance
intervals and prediction intervals.

If you'd like to see how I made the probability distribution plot, please read: How to Create a
Graphical Version of the 1-sample t-Test.

https://blog.minitab.com/en/adventures-in-statistics-2/how-to-correctly-interpret-p-values

How to Correctly Interpret P Values


Minitab Blog Editor | 17 April, 2014

Topics: Hypothesis Testing

The P value is used all over statistics, from t-tests to regression analysis. Everyone knows
that you use P values to determine statistical significance in a hypothesis test. In fact, P
values often determine what studies get published and what projects get funding.

Despite being so important, the P value is a slippery concept that people often interpret
incorrectly. How do you interpret P values?

In this post, I'll help you to understand P values in a more intuitive way and to avoid a very
common misinterpretation that can cost you money and credibility.

What Is the Null Hypothesis in Hypothesis Testing?


In order to understand P values, you must first understand the null hypothesis.

In every experiment, there is an effect or difference between groups that the researchers are
testing. It could be the effectiveness of a new drug, building material, or other intervention
that has benefits. Unfortunately for the researchers, there is always the possibility that there is
no effect, that is, that there is no difference between the groups. This lack of a difference is
called the null hypothesis, which is essentially the position a devil’s advocate would take
when evaluating the results of an experiment.

To see why, let’s imagine an experiment for a drug that we know is totally ineffective. The
null hypothesis is true: there is no difference between the experimental groups at the
population level.

Despite the null being true, it’s entirely possible that there will be an effect in the sample data
due to random sampling error. In fact, it is extremely unlikely that the sample groups will
ever exactly equal the null hypothesis value. Consequently, the devil’s advocate position is
that the observed difference in the sample does not reflect a true difference between
populations.

What Are P Values?


P values evaluate how well the sample data support the devil’s advocate argument that the
null hypothesis is true. They measure how compatible your data are with the null hypothesis.
How likely is the effect observed in your sample data if the null hypothesis is true?

 High P values: your data are likely with a true null.


 Low P values: your data are unlikely with a true null.

A low P value suggests that your sample provides enough evidence that you can reject the
null hypothesis for the entire population.

How Do You Interpret P Values?

In technical terms, a P value is the probability of obtaining an effect at least as extreme as the
one in your sample data, assuming the truth of the null hypothesis.

For example, suppose that a vaccine study produced a P value of 0.04. This P value indicates
that if the vaccine had no effect, you’d obtain the observed difference or more in 4% of
studies due to random sampling error.

P values address only one question: how likely are your data, assuming a true null
hypothesis? They do not measure support for the alternative hypothesis. This limitation leads
us into the next section to cover a very common misinterpretation of P values.

P Values Are NOT the Probability of Making a Mistake


Incorrect interpretations of P values are very common. The most common mistake is to
interpret a P value as the probability of making a mistake by rejecting a true null hypothesis
(a Type I error).

There are several reasons why P values can’t be the error rate.

First, P values are calculated based on the assumptions that the null is true for the population
and that the difference in the sample is caused entirely by random chance. Consequently, P
values can’t tell you the probability that the null is true or false because it is 100% true from
the perspective of the calculations.

Second, while a low P value indicates that your data are unlikely assuming a true null, it can’t
evaluate which of two competing cases is more likely:

 The null is true but your sample was unusual.


 The null is false.

Determining which case is more likely requires subject area knowledge and replicate studies.

Let’s go back to the vaccine study and compare the correct and incorrect way to interpret the
P value of 0.04:

 Correct: Assuming that the vaccine had no effect, you’d obtain the observed
difference or more in 4% of studies due to random sampling error.
 
 Incorrect: If you reject the null hypothesis, there’s a 4% chance that you’re making a
mistake.

To see a graphical representation of how hypothesis tests work, see my post: Understanding
Hypothesis Tests: Significance Levels and P Values.

What Is the True Error Rate?


Think that this interpretation difference is simply a matter of semantics, and only important to
picky statisticians? Think again. It’s important to you.

If a P value is not the error rate, what the heck is the error rate? (Can you guess which way
this is heading now?)

Sellke et al.* have estimated the error rates associated with different P values. While the
precise error rate depends on various assumptions (which I discuss here), the table
summarizes them for middle-of-the-road assumptions.

P value    Probability of incorrectly rejecting a true null hypothesis
0.05       At least 23% (and typically close to 50%)
0.01       At least 7% (and typically close to 15%)

Do the higher error rates in this table surprise you? Unfortunately, the common
misinterpretation of P values as the error rate creates the illusion of substantially more
evidence against the null hypothesis than is justified. As you can see, if you base a decision
on a single study with a P value near 0.05, the difference observed in the sample may not
exist at the population level. That can be costly!

Now that you know how to interpret P values, read my five guidelines for how to use P
values and avoid mistakes.

You can also read my rebuttal to an academic journal that actually banned P values!

An exciting study about the reproducibility of experimental results was published in August
2015. This study highlights the importance of understanding the true error rate. For more
information, read my blog post: P Values and the Replication of Experiments.

The American Statistical Association speaks out on how to use p-values!

*Thomas Sellke, M. J. Bayarri, and James O. Berger, “Calibration of p Values for Testing Precise Null Hypotheses,” The American Statistician, February 2001, Vol. 55, No. 1.
