Vol. 10, 2025-03

Hypothesis Test Resolution and Distribution of P-values

Hugo Hernandez
ForsChem Research, 050030 Medellin, Colombia
[email protected]

doi: 10.13140/RG.2.2.25155.72487

Abstract
The resolution of a statistical test can be interpreted as “the ability of the test to detect small
deviations from the null hypothesis”, in a form analogous to the resolution of a measurement
system. Statistical tests of hypotheses are commonly evaluated considering a fixed significance
level of 5% and test resolution of zero. As a result, the null hypothesis will eventually be rejected
if the sample size is large enough, even when the null hypothesis is true. By considering a non-
zero test resolution and the corresponding distribution of P-values at the limit of test
resolution, an alternative rule of thumb for the determination of adequate significance levels
for the test is obtained. This rule of thumb suggests that the significance level should be set to 1.5
times the standard deviation in the P-value at the limit of the test resolution. Considering the
two-sided Z-test, this approach is found consistent with the optimal significance level obtained
by minimizing the total test error.

Keywords
Change of Variable Theorem, Confidence, Dance of P-values, Hypothesis Testing, Optimal
Significance Level, P-values, Statistical Significance, Test Resolution, Uncertainty, Z-test

1. Introduction
Hypothesis testing is “a statistical analysis that uses sample data to assess two mutually exclusive
theories about the properties of a population.” [1] Those theories, denoted as hypotheses, are
associated with a research question about the behavior of a subject population observed under
certain specific conditions [2]. The two mutually exclusive theories are denoted as the null
hypothesis ($H_0$) and the alternative hypothesis ($H_1$). The null hypothesis is the default theory,
usually associated with the lack of an effect of the particular conditions considered on the
observed behavior of the population. The alternative hypothesis, on the other hand, is the
challenging theory and represents a significant effect of the observation conditions on the
properties of the population (a significant deviation from the null hypothesis).

Cite as: Hernandez, H. (2025). Hypothesis Test Resolution and Distribution of P-values. ForsChem
Research Reports, 10, 2025-03, 1 - 17. doi: 10.13140/RG.2.2.25155.72487. Publication Date: 14/02/2025.

Statistical tools are required for the validation of hypotheses since the sampling of data from a
population always introduces uncertainty. Thus, it is necessary to draw a conclusion using
approximate values of the population properties which are inferred with a limited confidence
and accuracy [3]. This means that statistical testing of hypotheses always involves error, and
absolute conclusions will never be obtained.
Two types of errors are possible during hypothesis testing:
• Type I errors (false positives): Resulting when the null hypothesis is erroneously
rejected. The probability of occurrence of false positives is the significance level ($\alpha$),
which is the complement of the confidence level of the test:

$$\text{confidence level} = 1 - \alpha \tag{1.1}$$

• Type II errors (false negatives): Resulting when the null hypothesis is erroneously not
rejected (i.e. the alternative hypothesis is erroneously rejected). The probability of
occurrence of false negatives is denoted by $\beta$, whereas the power of the test, defined
as the probability of correctly rejecting the null hypothesis, is:

$$\text{power} = 1 - \beta \tag{1.2}$$
$\beta$ depends on the type of test employed, on the resolution of the test, on the size of the
sample, and on the confidence level of the test. In general, as the confidence level increases,
$\beta$ also increases. If zero false positive errors were to be obtained in a test ($100\%$ confidence),
then the probability of false negatives becomes $100\%$ ($0\%$ power). Similarly, if zero false
negative errors were to be obtained ($100\%$ power), the confidence of the test drops to zero
($0\%$ confidence). Clearly, a balance between test confidence and test power is required.
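The trade-off between confidence and power can be illustrated numerically. The following is a minimal sketch for a two-sided Z-test, assuming the usual textbook expression for the false-negative probability; the function name `type_ii_error` and the value `shift=2.0` are hypothetical choices made only for this illustration and do not come from the report.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def type_ii_error(alpha: float, shift: float) -> float:
    """False-negative probability (beta) of a two-sided Z-test.

    alpha: significance level (1 - confidence level), with 0 < alpha < 1.
    shift: assumed mean of the test statistic under the alternative,
           i.e. sqrt(n) * (true mean - reference value) / sigma.
    """
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # The null hypothesis is NOT rejected when |statistic| < z_crit.
    return STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)


if __name__ == "__main__":
    # shift=2.0 is an arbitrary illustrative value
    for alpha in (0.20, 0.05, 0.01):
        beta = type_ii_error(alpha, shift=2.0)
        print(f"confidence={1 - alpha:.2f}  beta={beta:.3f}  power={1 - beta:.3f}")
```

As the confidence level rises from 80% to 99%, the printed value of beta grows and the power shrinks, which is the balance described above.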
Typically, the statistical test for evaluating the hypotheses consists in determining the $P$-value,
and comparing it with the significance level previously defined for the test. The $P$-value can be
interpreted as the risk of erroneously rejecting the null hypothesis (type I error) considering
the available data, whereas the significance level can be interpreted as the maximum
acceptable risk for erroneously rejecting the null hypothesis (type I error) in the test. For this
reason, if the risk observed from the data is less than the maximum acceptable risk, then it is
safe to reject the null hypothesis. However, if the risk is greater than the maximum acceptable
risk, then it is unsafe to reject the null hypothesis.
The most common practice observed in the scientific literature is assuming a fixed significance
level of $5\%$ for any statistical test. This value was suggested by Fisher, when discussing what he
considered a convenient maximum acceptable risk for his own purposes, corresponding to
$1$ failure in $20$ trials [4].
The previous analysis is based on the assumption that the null hypothesis is true, since for the
alternative hypothesis, the exact value of the parameter should be known. Unfortunately, the
analysis based strictly on the null hypothesis does not take into account the resolution of the
test, and this may result in misleading conclusions. The concept of test resolution is explained
in Section 2. An illustrative example of the effect of test resolution (and sample size) on the
distribution of $P$-values is presented in Section 3. In this case, the $Z$-test is used because
its mathematical simplicity makes it a convenient illustrative example. The distribution of
$P$-values obtained is compared in Section 4 with the optimal significance level values obtained
for the two-sided $Z$-test in a previous report [3].

2. Test Resolution
To understand the concept of test resolution and its importance in statistical testing of
hypotheses, let us consider the following set of hypotheses for a test of means:

$$H_0: \mu = \mu_0 \qquad H_1: \mu \neq \mu_0 \tag{2.1}$$

where $\mu$ is the mean value of a certain property in a population under observation, and $\mu_0$ is a
reference value being tested.
For example, let us assume that we want to test the mean oral temperature in humans. Since
we need a reference value, let us consider some values obtained from the scientific literature:
 In , Carl Wunderlich suggested a mean value of °C ( °F) [5].
 In 1992, Mackowiak and coworkers [6] proposed that the mean oral temperature was
°C ( °F).
 In 2023, Ley and coworkers [7] averaged the oral temperature of more than half a
million people, concluding that the mean temperature is °C ( °F), with a
standard deviation of °C.
So we have three different reference values here. If the true mean value for the oral
temperature in humans is °C, then by considering °C or °C,
depending on the sample size, we will eventually reject the null hypothesis. But if we consider
°C, the null hypothesis will never be rejected.
However, if the true mean oral temperature in humans is °C, the null hypothesis
( °C) will eventually be rejected. Furthermore, if the true mean temperature is
°C, if the sample size is large enough, the null hypothesis
( °C) will also eventually be rejected. Theoretically speaking, in both cases the null
hypothesis should be rejected, but this does not seem the correct decision. Clearly, we need to
consider a minimum detectable difference when performing the analysis. This minimum
detectable difference between the true mean and the reference value is the resolution of the
test ($\delta$).

The test resolution can be defined as: “the ability of the statistical test to detect small deviations
from the null hypothesis.” [8] Below this test resolution threshold, deviations from the null
hypothesis should not be detected. This concept is analogous to that of measurement
resolution [9]. In fact, the test resolution can be related to the resolution of the measurement
system employed in the analysis. For example, if a digital thermometer with a measurement
resolution of °C is used to determine oral temperatures, it is quite reasonable to also
employ this value for the resolution of the test. Alternatively, we can define the test resolution
proportional to the standard deviation in the observations (which includes the measurement
uncertainty along with other sources of variation). Or we can simply define the test resolution
in terms of the minimum acceptable difference required to take action, which is usually related
to a cost function. For example, updating the mean human temperature value in textbooks,
databases, websites, and university courses worldwide has a high cost. So, updating the mean
temperature value for a difference of less than °C is probably not worth the effort.
The test resolution ($\delta$) can be interpreted as a different formulation of the null and alternative
hypotheses as follows:

$$H_0: |\mu - \mu_0| \le \delta \qquad H_1: |\mu - \mu_0| > \delta \tag{2.2}$$

Then, the original set of hypotheses will be obtained in the limit when $\delta \to 0$ (resolution zero).
Considering the nature of continuous numbers, if we consider $\delta = 0$, given enough
observations all null hypotheses will eventually be rejected, since the true mean value (with all
possible decimals) will never be exactly described by the reference value.
Let us now consider the mathematical implications of the incorporation of test resolution in
the analysis (i.e. in the $P$-values of the test). For illustrative purposes, we will consider only the
two-sided $Z$-test, where the population values are assumed to be normally distributed, and the
standard deviation (or the variance) of the population is assumed to be exactly known [10].
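The effect of a zero test resolution can be sketched numerically for the two-sided Z-test. The snippet below evaluates the P-value of a "typical" sample whose mean coincides with the true mean, for a small fixed difference and increasing sample sizes. The difference of 0.01 and the standard deviation of 0.5 are hypothetical values chosen only for illustration, and the function name `typical_p_value` is likewise an assumption of this sketch.

```python
import math


def typical_p_value(n: int, diff: float, sigma: float) -> float:
    """Two-sided Z-test P-value when the sample mean equals the true mean.

    diff:  hypothetical true difference between the mean and the reference value.
    sigma: known population standard deviation.
    Random sampling noise is deliberately ignored in this sketch.
    """
    z0 = math.sqrt(n) * diff / sigma
    # 2 * (1 - Phi(|z0|)) written with the complementary error function
    return math.erfc(abs(z0) / math.sqrt(2.0))


if __name__ == "__main__":
    for n in (100, 1_000, 10_000, 100_000, 1_000_000):
        p = typical_p_value(n, diff=0.01, sigma=0.5)
        print(f"n={n:>9}  P={p:.3e}")
```

With these assumed values the P-value falls below 5% once the sample reaches about ten thousand observations, even though the difference is only 2% of the standard deviation.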

3. $P$-value Distribution for the Two-Sided $Z$-test

Let us consider a normal population variable $X$ described by the following randomistic [11]
expression:

$$X = \mu + \sigma Z \tag{3.1}$$

where $\mu$ represents the true mean of the population, $\sigma$ is the true standard deviation of the
population (assumed known), and $Z$ is a type I standard normal random variable [12].
Let us now assume that a sample of $n$ random, independent observations of variable $X$ is
obtained. Then, the average of such sample will be (considering Eq. 3.1):

$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{n}\sum_{i=1}^{n}\left(\mu + \sigma Z_i\right) = \mu + \frac{\sigma}{n}\sum_{i=1}^{n} Z_i = \mu + \frac{\sigma}{\sqrt{n}}\,Z \tag{3.2}$$

where $X_i$ is a random variable representing the $i$-th observation of variable $X$, $Z_i$ represents
the standard normal random variable associated with the $i$-th observation of $X$, and
$Z = \frac{1}{\sqrt{n}}\sum_{i=1}^{n} Z_i$ is the standard normal random variable resulting from the sum of $n$
standard normal random variables [13].
Now, the $Z$-test involves the determination of the test statistic $Z_0$ as follows [14]:

$$Z_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \tag{3.3}$$
Replacing Eq. (3.2) in Eq. (3.3), the following expression is obtained:

$$Z_0 = \frac{\sqrt{n}\,(\mu - \mu_0)}{\sigma} + Z \tag{3.4}$$

indicating that $Z_0$ is a normal variable with a mean value of $\sqrt{n}\,(\mu - \mu_0)/\sigma$ and a standard deviation of $1$.
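The statement about the distribution of the test statistic can be checked by simulation. The sketch below draws repeated samples from a normal population and computes the test statistic of Eq. (3.3) for each one; the population values (mean 10.2, reference value 10.0, standard deviation 1.0, sample size 25) are hypothetical and were chosen so that the expected mean of the statistic is exactly 1.

```python
import math
import random
from statistics import fmean, stdev


def simulate_z_statistics(mu, mu0, sigma, n, replicates, seed=0):
    """Test statistics (Eq. 3.3) for repeated samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    stats = []
    for _ in range(replicates):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        stats.append((fmean(sample) - mu0) / (sigma / math.sqrt(n)))
    return stats


if __name__ == "__main__":
    z0 = simulate_z_statistics(mu=10.2, mu0=10.0, sigma=1.0, n=25, replicates=4000)
    expected_mean = math.sqrt(25) * (10.2 - 10.0) / 1.0
    print(f"expected mean of Z0: {expected_mean:.3f}")
    print(f"simulated mean:      {fmean(z0):.3f}")
    print(f"simulated std. dev.: {stdev(z0):.3f}")
```

The simulated mean should land close to the expected mean, and the simulated standard deviation close to 1, up to sampling noise.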

Using this test statistic, we can determine the $P$-value (probability value) of the two-sided
$Z$-test as follows [14]:

$$P = 2\left(1 - \Phi\left(|Z_0|\right)\right) \tag{3.5}$$

where $\Phi$ represents the cumulative probability function of the standard normal random
variable, given by:

$$\Phi(x) = \frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right) \tag{3.6}$$

where $\mathrm{erf}$ represents the error function.
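Now, if we replace Eq. (3.4) and (3.6) in Eq. (3.5) we obtain:

$$P = 2\left(1 - \Phi\left(\left|\frac{\sqrt{n}\,(\mu - \mu_0)}{\sigma} + Z\right|\right)\right) = 1 - \mathrm{erf}\left(\frac{1}{\sqrt{2}}\left|\frac{\sqrt{n}\,(\mu - \mu_0)}{\sigma} + Z\right|\right) \tag{3.7}$$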
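The two equivalent forms of the P-value can be compared numerically. The following is a minimal sketch, assuming hypothetical sample values (sample mean 10.4, reference value 10.0, known standard deviation 1.0, sample size 25); the function names are illustrative only.

```python
import math
from statistics import NormalDist


def z_statistic(sample_mean, mu0, sigma, n):
    """Test statistic of Eq. (3.3)."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))


def p_value_phi(z0):
    """Eq. (3.5): P = 2 * (1 - Phi(|Z0|))."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(z0)))


def p_value_erf(z0):
    """Error-function form used in Eq. (3.7): P = 1 - erf(|Z0| / sqrt(2))."""
    return 1.0 - math.erf(abs(z0) / math.sqrt(2.0))


if __name__ == "__main__":
    z0 = z_statistic(sample_mean=10.4, mu0=10.0, sigma=1.0, n=25)
    print(f"Z0 = {z0:.3f}")
    print(f"P (Phi form) = {p_value_phi(z0):.6f}")
    print(f"P (erf form) = {p_value_erf(z0):.6f}")
```

Both forms agree to floating-point precision. For very small P-values, `math.erfc` is a numerically safer alternative to `1 - math.erf(...)`.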
Since the distribution of $Z$ is known (standard normal), the distribution of $P$ can be
obtained from the change of variable theorem [15]. First, we need to solve Eq. (3.7) for $Z$:

$$Z = -\frac{\sqrt{n}\,(\mu - \mu_0)}{\sigma} \pm \sqrt{2}\,\mathrm{erf}^{-1}(1 - P) \tag{3.8}$$

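where $\mathrm{erf}^{-1}$ is the inverse of the error function.

We also need to determine the derivative of Eq. (3.8):

$$\frac{\mathrm{d}Z}{\mathrm{d}P} = \pm\sqrt{2}\,\frac{\mathrm{d}}{\mathrm{d}P}\,\mathrm{erf}^{-1}(1 - P) = \mp\sqrt{\frac{\pi}{2}}\,\exp\left(\left(\mathrm{erf}^{-1}(1 - P)\right)^2\right) \tag{3.9}$$

And then, applying the change of variable theorem we can obtain the probability density
function of $P$, as follows:

$$f_P(P) = \left|\frac{\mathrm{d}Z}{\mathrm{d}P}\right|\left[f_Z\left(-\frac{\sqrt{n}\,(\mu - \mu_0)}{\sigma} + \sqrt{2}\,\mathrm{erf}^{-1}(1 - P)\right) + f_Z\left(-\frac{\sqrt{n}\,(\mu - \mu_0)}{\sigma} - \sqrt{2}\,\mathrm{erf}^{-1}(1 - P)\right)\right] \tag{3.10}$$

where $f_Z(z) = e^{-z^2/2}/\sqrt{2\pi}$ is the probability density function of the standard normal random variable $Z$.
This expression can be simplified into:

$$f_P(P) = \frac{1}{2}\exp\left(-\frac{n(\mu - \mu_0)^2}{2\sigma^2}\right)\left[\exp\left(\sqrt{2n}\,\frac{\mu - \mu_0}{\sigma}\,\mathrm{erf}^{-1}(1 - P)\right) + \exp\left(-\sqrt{2n}\,\frac{\mu - \mu_0}{\sigma}\,\mathrm{erf}^{-1}(1 - P)\right)\right] \tag{3.11}$$

And the corresponding cumulative probability function becomes:

$$F_P(P) = \int_0^P f_P(p)\,\mathrm{d}p = 1 - \frac{1}{2}\left[\mathrm{erf}\left(\mathrm{erf}^{-1}(1 - P) + \sqrt{\frac{n}{2}}\,\frac{\mu - \mu_0}{\sigma}\right) + \mathrm{erf}\left(\mathrm{erf}^{-1}(1 - P) - \sqrt{\frac{n}{2}}\,\frac{\mu - \mu_0}{\sigma}\right)\right] \tag{3.12}$$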
Eq. (3.11) and (3.12) describe the probability distribution of $P$-values in a two-sided $Z$-test.
Eq. (3.12) is consistent with the cumulative distribution function of $P$-values previously reported
by Cumming (Eq. B4 in [16]), which was also consistent with the results by Hung et al. [17].
Notice that the probability density of $P$ is also a function of the sample size $n$ and the standardized
difference between the true mean and the reference value, $(\mu - \mu_0)/\sigma$. Together they form the term $\sqrt{n}\,(\mu - \mu_0)/\sigma$, which is the mean value of $Z_0$.
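The density and cumulative probability of P-values can be evaluated with the Python standard library. Because `math` provides no inverse error function, the sketch below uses the identity $\sqrt{2}\,\mathrm{erf}^{-1}(1-P) = \Phi^{-1}(1 - P/2)$ and the equivalent hyperbolic-cosine form of the density; both rewritings, the function names, and the shorthand `a` for $\sqrt{n}\,(\mu-\mu_0)/\sigma$ are choices of this sketch rather than notation from the report.

```python
import math
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def _z_of_p(p):
    """sqrt(2) * erfinv(1 - p), computed through the standard normal quantile."""
    return N01.inv_cdf(1.0 - p / 2.0)


def p_value_pdf(p, a):
    """Density of P-values for 0 < p <= 1, with a = sqrt(n) * (mu - mu0) / sigma.

    exp(-a^2/2) * cosh(a*z) equals half the sum of the two exponentials.
    """
    return math.exp(-0.5 * a * a) * math.cosh(a * _z_of_p(p))


def p_value_cdf(p, a):
    """Cumulative probability of P-values for 0 < p <= 1 (Phi form)."""
    z = _z_of_p(p)
    return 2.0 - N01.cdf(z - a) - N01.cdf(z + a)


if __name__ == "__main__":
    for a in (0.0, 1.0, 2.0, 3.0):
        print(f"a={a:.1f}  f(0.05)={p_value_pdf(0.05, a):8.4f}  "
              f"F(0.05)={p_value_cdf(0.05, a):.4f}")
```

For `a = 0` the density is 1 and the cumulative probability equals `p`. For larger `a`, `F(0.05)` is the probability of rejecting at the 5% level.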
Under the assumption of the null hypothesis with zero test resolution, that is when $\mu = \mu_0$,
the probability density function simply becomes:

$$f_P(P) = 1 \tag{3.13}$$

with cumulative probability:

$$F_P(P) = P \tag{3.14}$$

Eq. (3.13) and (3.14) represent a uniform distribution of $P$-values. That is, if the true mean of the
population is exactly equal to the reference value, then any $P$-value will be obtained with the
same probability due to random sampling.
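The uniform distribution under the null hypothesis can be illustrated by simulation. The sketch below draws the test statistic directly as a standard normal variable (the case where the true mean equals the reference value) and checks that the fraction of P-values below any level is close to that level; the number of simulated tests and the seed are arbitrary choices.

```python
import math
import random


def simulate_null_p_values(num_tests, seed=0):
    """Two-sided Z-test P-values when the true mean equals the reference value."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    return [math.erfc(abs(rng.gauss(0.0, 1.0)) / math.sqrt(2.0))
            for _ in range(num_tests)]


def fraction_below(p_values, level):
    """Empirical cumulative probability of the P-values at a given level."""
    return sum(p <= level for p in p_values) / len(p_values)


if __name__ == "__main__":
    p_values = simulate_null_p_values(50_000)
    for level in (0.05, 0.25, 0.50, 0.75):
        print(f"level={level:.2f}  fraction of P-values below = "
              f"{fraction_below(p_values, level):.4f}")
```

In particular, the fraction below 0.05 is the simulated type I error rate of a test run at the 5% significance level.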

Figure 1. Evolution of the probability density function of $P$-values for different values of $\sqrt{n}\,|\mu - \mu_0|/\sigma$.

On the other hand, if the term $\mu - \mu_0$ is not exactly zero, a non-uniform distribution of $P$-values
is obtained. Figure 1 illustrates the behavior of the probability density function of $P$-values for
the two-sided $Z$-test, for different values of $\sqrt{n}\,|\mu - \mu_0|/\sigma$. Similarly, Figure 2 shows the
corresponding behavior of the cumulative probability function of $P$-values.

Figure 2. Evolution of the cumulative probability of $P$-values for different values of $\sqrt{n}\,|\mu - \mu_0|/\sigma$.

As the value of the term $\sqrt{n}\,|\mu - \mu_0|/\sigma$ increases, the skewness of the distribution of $P$-values
increases, and the probability of yielding small values increases. At intermediate values of this term, the
distribution of $P$-values closely resembles an exponential distribution. And for $\sqrt{n}\,|\mu - \mu_0|/\sigma \to \infty$,
Dirac’s delta function is finally obtained (the deterministic number $0$ [11]).
Notice that the effect of the difference between the true mean and the reference value is
amplified by the square root of the sample size. That is, given enough elements in the sample, any
null hypothesis tested with zero test resolution will be rejected, even for an infinitesimally small
difference (which will always exist due to numerical truncation of the reference values).

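Since the true mean value is unknown, we will never be able to determine the exact value of
$\sqrt{n}\,|\mu - \mu_0|/\sigma$, and therefore, we will never know the true distribution of $P$-values in a test.
However, we may consider the scenario where the true mean is at the limit of the resolution of
the test.
Considering the set of hypotheses proposed in Eq. (2.2) with test resolution ($\delta$) greater than
zero, the following probability distribution for the limit-case scenario ($|\mu - \mu_0| = \delta$) is obtained:

$$f_P(P) = \frac{1}{2}\exp\left(-\frac{n\delta^2}{2\sigma^2}\right)\left[\exp\left(\sqrt{2n}\,\frac{\delta}{\sigma}\,\mathrm{erf}^{-1}(1 - P)\right) + \exp\left(-\sqrt{2n}\,\frac{\delta}{\sigma}\,\mathrm{erf}^{-1}(1 - P)\right)\right] \tag{3.15}$$

$$F_P(P) = 1 - \frac{1}{2}\left[\mathrm{erf}\left(\mathrm{erf}^{-1}(1 - P) + \sqrt{\frac{n}{2}}\,\frac{\delta}{\sigma}\right) + \mathrm{erf}\left(\mathrm{erf}^{-1}(1 - P) - \sqrt{\frac{n}{2}}\,\frac{\delta}{\sigma}\right)\right] \tag{3.16}$$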
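Now, we can define the standardized test resolution (or Cohen’s test resolution) as follows:

$$d = \frac{\delta}{\sigma} \tag{3.17}$$

and then the probability distribution of $P$-values for the two-sided $Z$-test becomes:

$$f_P(P) = \frac{1}{2}\exp\left(-\frac{n d^2}{2}\right)\left[\exp\left(\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)\right) + \exp\left(-\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)\right)\right] \tag{3.18}$$

$$F_P(P) = 1 - \frac{1}{2}\left[\mathrm{erf}\left(\mathrm{erf}^{-1}(1 - P) + d\sqrt{\frac{n}{2}}\right) + \mathrm{erf}\left(\mathrm{erf}^{-1}(1 - P) - d\sqrt{\frac{n}{2}}\right)\right] \tag{3.19}$$

Eq. (3.18) can be used to determine the expected $P$-value of the test ($E(P)$) and its standard
deviation ($\sigma_P$) [17,18] (a measure of uncertainty), whereas Eq. (3.19) can be used to determine
the median $P$-value of the test ($\mathrm{Med}(P)$) [19], as follows:

$$E(P) = \int_0^1 P\,f_P(P)\,\mathrm{d}P = \frac{1}{2}\exp\left(-\frac{n d^2}{2}\right)\left(\int_0^1 P\,e^{\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,\mathrm{d}P + \int_0^1 P\,e^{-\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,\mathrm{d}P\right) \tag{3.20}$$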

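$$\sigma_P = \left(\frac{1}{2}\exp\left(-\frac{n d^2}{2}\right)\left(\int_0^1 P^2\,e^{\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,\mathrm{d}P + \int_0^1 P^2\,e^{-\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,\mathrm{d}P\right) - E(P)^2\right)^{1/2} \tag{3.21}$$

$$\mathrm{erf}\left(\mathrm{erf}^{-1}\left(1 - \mathrm{Med}(P)\right) + d\sqrt{\frac{n}{2}}\right) + \mathrm{erf}\left(\mathrm{erf}^{-1}\left(1 - \mathrm{Med}(P)\right) - d\sqrt{\frac{n}{2}}\right) = 1 \tag{3.22}$$

Here, Eq. (3.22) is an implicit function of the median. Unfortunately, Eq. (3.20) to (3.22) cannot
be easily expressed in terms of simple, known functions, except for $d = 0$. However, they can
be empirically approximated by exponential functions as follows §: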

( ( ) )

( )

(3.23)
√

( )
√
(3.24)
√

√
( )
√
{ √
(3.25)

Exact and approximate expressions are graphically compared for different values of $\sqrt{n}\,d$ in
Figure 3 to Figure 5.
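The exact mean, standard deviation and median of the $P$-value distribution can also be evaluated numerically. The sketch below does not integrate over $P$ directly; as a simplification of this sketch, it averages over the test statistic (normal with mean `a` and unit standard deviation, where `a` stands for $\sqrt{n}\,d$) using a midpoint rule, and finds the median by bisection on the statistic. Function names, grid size and integration limits are assumptions made only for illustration.

```python
import math
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def p_of_z(z0):
    """Two-sided P-value for a given value of the test statistic."""
    return math.erfc(abs(z0) / math.sqrt(2.0))


def p_value_mean_sd(a, steps=4000, half_width=8.0):
    """Mean and standard deviation of P when the statistic is Normal(a, 1)."""
    h = 2.0 * half_width / steps
    m1 = m2 = 0.0
    for k in range(steps):
        z0 = a - half_width + (k + 0.5) * h   # midpoint of the k-th cell
        weight = N01.pdf(z0 - a) * h          # probability mass of the cell
        p = p_of_z(z0)
        m1 += weight * p
        m2 += weight * p * p
    return m1, math.sqrt(max(m2 - m1 * m1, 0.0))


def p_value_median(a, tol=1e-12):
    """Median of P: bisection for z such that Prob(|statistic| < z) = 1/2."""
    lo, hi = 0.0, abs(a) + 10.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        inside = N01.cdf(mid - a) - N01.cdf(-mid - a)
        if inside < 0.5:
            lo = mid
        else:
            hi = mid
    return p_of_z(0.5 * (lo + hi))


if __name__ == "__main__":
    for a in (0.0, 1.0, 2.0, 3.0, 4.0):
        mean, sd = p_value_mean_sd(a)
        print(f"sqrt(n)*d={a:.1f}  mean={mean:.4f}  sd={sd:.4f}  "
              f"median={p_value_median(a):.4f}")
```

For `a = 0` the output reproduces the uniform distribution (mean 0.5, standard deviation $1/\sqrt{12}$, median 0.5).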

Figure 3. Expected $P$-values for different values of $\sqrt{n}\,d$. Left plot: Original scale. Right plot: Logarithmic
scale of expected $P$-values. Solid blue line: Exact values (Eq. 3.20). Dashed red line: Approximated values
(Eq. 3.23)

§
represents the coefficient evaluated for the logarithm transformation of the variables.

Figure 4. Standard deviation (uncertainty) in $P$-values for different values of $\sqrt{n}\,d$. Left plot: Original
scale. Right plot: Logarithmic scale of standard deviation of $P$-values. Solid blue line: Exact values (Eq.
3.21). Dashed red line: Approximated values (Eq. 3.24)

Figure 5. Median of $P$-values for different values of $\sqrt{n}\,d$. Left plot: Original scale. Right plot: Logarithmic
scale of the median of $P$-values. Solid blue line: Exact values (Eq. 3.22). Dashed red line: Approximated
values (Eq. 3.25)

The median of the $P$-value distribution is observed to decay faster than the mean with
increasing values of $\sqrt{n}\,d$ (see Figure 6), due to the skewness of the distribution obtained. The
standard deviation initially increases with respect to the uniform distribution, and then decays
with increasing $\sqrt{n}\,d$. Also due to increased skewness, standard deviation values become greater
than those of the mean or median $P$-values once $\sqrt{n}\,d$ exceeds a certain value (see Figure 6).

Figure 6. Properties of the $P$-values distribution for the two-sided $Z$-test, for different values of $\sqrt{n}\,d$. Left
plot: Original scale. Right plot: Logarithmic scale of $P$-values. Solid purple line: Expected values (Eq. 3.20).
Dashed orange line: Standard deviation values (Eq. 3.21). Dotted green line: Median values (Eq. 3.22).

Notice that for small values of $\sqrt{n}\,d$ the standard deviation in $P$-values is relatively large (greater than $5\%$).
That is, the uncertainty in the determination of the $P$-value is greater than the typical
significance level employed for the test. This situation leads to the “dance of $P$-values”
described by Cumming [20,21], where the test conclusion fluctuates between acceptance and
rejection of the null hypothesis for different random samples obtained from the same
population under identical conditions. The dance of $P$-values for low $\sqrt{n}\,d$ values can be easily
observed by computer simulation [22].
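A minimal simulation sketch of the dance of $P$-values is given below. It repeatedly samples the same hypothetical population (true mean 0.2, reference value 0, standard deviation 1, sample size 25, so that $\sqrt{n}$ times the standardized difference equals 1) and applies the two-sided $Z$-test to each sample. These values, the seed and the function name are assumptions of this illustration and are not taken from [22].

```python
import math
import random
from statistics import fmean


def dance_of_p_values(mu, mu0, sigma, n, replicates, seed=1):
    """P-values of repeated two-sided Z-tests on fresh samples of size n."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    p_values = []
    for _ in range(replicates):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        z0 = (fmean(sample) - mu0) / (sigma / math.sqrt(n))
        p_values.append(math.erfc(abs(z0) / math.sqrt(2.0)))
    return p_values


if __name__ == "__main__":
    p_values = dance_of_p_values(mu=0.2, mu0=0.0, sigma=1.0, n=25, replicates=2000)
    print("first ten P-values:", [round(p, 3) for p in p_values[:10]])
    rejected = sum(p < 0.05 for p in p_values) / len(p_values)
    print(f"fraction of samples rejecting at the 5% level: {rejected:.3f}")
```

Identical sampling conditions produce P-values scattered across almost the whole unit interval, so the conclusion at the 5% level changes from one sample to the next.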

On the other hand, for large values of $\sqrt{n}\,d$ the standard deviation in $P$-values is less than $5\%$, but then the
expected $P$-value is even smaller. So, when the typical $5\%$ significance level is used, the most
likely conclusion will be the rejection of the null hypothesis. For a constant sample size, as the
standardized test resolution ($d$) increases, the null hypothesis will be more easily rejected, in
clear contradiction with the hypothesis test formulated in Eq. (2.2), where the null hypothesis
should be more easily accepted (as the test resolution increases). This contradiction reflects
the fact that a constant significance level is inconsistent with tests having non-zero resolution.
Thus, the adequate significance level to be employed in a test must somehow resemble the
behavior of the distribution of $P$-values. For the two-sided $Z$-test, as an example, the
significance level of the test should decay almost exponentially with respect to $\sqrt{n}\,d$, just like
the main properties of the $P$-value distribution.

4. Comparison with Optimal Significance Levels

In a previous report [3], the concept of optimal significance level ($\alpha_{opt}$) was introduced as the
significance level value that minimizes the total test error (including both type I and type II
errors). In that report, the optimal significance level for the two-sided $Z$-test was found to be:

( )

√ √
√

( )
(4.1)

For sufficiently small values of $\sqrt{n}\,d$ we simply obtain $\alpha_{opt} = 100\%$. That is, the uncertainty in the test is so large that
the probability of erroneously rejecting the null hypothesis becomes $100\%$. Thus, performing a
test of hypotheses with zero resolution is similar to performing a test with zero confidence. In
fact, the null hypothesis will eventually be rejected given enough elements in the sample, even
when the null hypothesis is true.

For larger values of $\sqrt{n}\,d$, Eq. (4.1) can be empirically approximated by [3]:

√ ( √ )
√ √
(4.2)
Notice that Eq. (4.2) represents an exponential decay with respect to $\sqrt{n}\,d$, as suggested in the
previous Section.
These expressions are graphically compared to the properties of the distribution of $P$-values in
Figure 7 (for √ ).

Figure 7. Comparison between optimal significance levels ($\alpha_{opt}$) and properties of the $P$-values distribution for
the two-sided $Z$-test, for different values of $\sqrt{n}\,d$. Left plot: Original scale. Right plot: Logarithmic scale of
$P$-values. Solid purple line: Expected values (Eq. 3.20). Dashed orange line: Standard deviation values (Eq.
3.21). Dotted green line: Median values (Eq. 3.22). Solid red line: Exact optimal significance levels (Eq. 4.1).
Dashed blue line: Approximated optimal significance levels (Eq. 4.2).
Interestingly, the optimal significance levels determined by Eq. (4.2) are approximately
proportional to the standard deviation in the distribution of $P$-values, and the proportionality
constant is on average about $1.5$. This is reasonable, considering that the uncertainty in the
$P$-values at the limit of test resolution should be less than the maximum type I error tolerance
(significance level). It also suggests that $1.5$ times the standard deviation in the $P$-value
distribution might be used as a rule-of-thumb criterion for determining an adequate
significance level for a test, as illustrated in Figure 8.
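The rule of thumb can be sketched numerically without the empirical approximation. The code below evaluates the exact standard deviation of the $P$-value at the limit of test resolution by numerical integration over the test statistic (a simplification adopted here), and multiplies it by 1.5. The function names, the integration settings and the example values of $n$ and $d$ are assumptions made only for illustration.

```python
import math
from statistics import NormalDist

N01 = NormalDist()  # standard normal distribution


def p_value_sd(a, steps=4000, half_width=8.0):
    """Standard deviation of P when the statistic is Normal(a, 1), a = sqrt(n)*d."""
    h = 2.0 * half_width / steps
    m1 = m2 = 0.0
    for k in range(steps):
        z0 = a - half_width + (k + 0.5) * h       # midpoint of the k-th cell
        weight = N01.pdf(z0 - a) * h              # probability mass of the cell
        p = math.erfc(abs(z0) / math.sqrt(2.0))   # two-sided P-value
        m1 += weight * p
        m2 += weight * p * p
    return math.sqrt(max(m2 - m1 * m1, 0.0))


def rule_of_thumb_alpha(n, d, factor=1.5):
    """Significance level as `factor` times the P-value standard deviation."""
    return factor * p_value_sd(math.sqrt(n) * d)


if __name__ == "__main__":
    # n = 100 and the listed d values are hypothetical examples
    for d in (0.1, 0.2, 0.3, 0.4):
        print(f"n=100  d={d:.1f}  rule-of-thumb alpha = "
              f"{rule_of_thumb_alpha(100, d):.4f}")
```

For a fixed sample size, a coarser standardized resolution yields a smaller suggested significance level, in line with the decay discussed in Section 3.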

Figure 8. Optimal significance level vs. standard deviation in $P$-values. Solid blue line: Empirical
approximation (Eq. 4.2). Dashed red line: Rule of thumb ($\alpha = 1.5\,\sigma_P$).

Using the approximated standard deviation given in Eq. (3.24), the expression for the rule of
thumb for the two-sided $Z$-test becomes:

√ √
√

(4.3)
On the other hand, using the expected $P$-value or the median $P$-value as significance levels for
the test implies that the type I error is lower than for the optimal significance level, but at the
expense of an even greater increase in type II error.

5. Summary
Statistical tests of hypotheses should always be evaluated considering non-zero test
resolutions, and adequate significance levels depending on both the test resolution and sample
size employed by the test.
An adequate significance level is, for example, the optimal significance level obtained by
minimizing the total error of the test [23].
In addition, in this report a new rule of thumb is proposed for the adequate significance level,
as $1.5$ times the uncertainty (standard deviation) in $P$-values at the limit of the corresponding
test resolution. The uncertainty in the $P$-value can be obtained from the probability density
function of $P$-values. Such function can be obtained using the Change of Variable Theorem [15]
considering the definition of the $P$-value and the probability distribution of the data assumed
by the test.
For example, for the case of the two-sided $Z$-test, a normal probability density function (mean
$\mu$ and standard deviation $\sigma$) of the data is assumed, and the $P$-value is determined as follows:

$$P = 2\left(1 - \Phi\left(|Z_0|\right)\right) \tag{3.5}$$

where

$$Z_0 = \frac{\sqrt{n}\,(\mu - \mu_0)}{\sigma} + Z \tag{3.4}$$

where $\mu_0$ is the reference value of the null hypothesis, $n$ is the sample size and $Z$ represents a
standard normal random variable, and

$$\Phi(x) = \frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right) \tag{3.6}$$

is the standard normal cumulative probability function, where $\mathrm{erf}$ represents the error function.

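Then, by using the change of variable theorem, the following expression is obtained for the
probability density function of $P$-values at the limit of standardized test resolution ($d$):

$$f_P(P) = \frac{1}{2}\exp\left(-\frac{n d^2}{2}\right)\left[\exp\left(\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)\right) + \exp\left(-\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)\right)\right] \tag{3.18}$$

And from this function, the uncertainty in $P$-values is obtained as follows:

$$\sigma_P = \left(\frac{1}{2}\exp\left(-\frac{n d^2}{2}\right)\left(\int_0^1 P^2\,e^{\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,\mathrm{d}P + \int_0^1 P^2\,e^{-\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,\mathrm{d}P\right) - E(P)^2\right)^{1/2} \tag{3.21}$$

where

$$E(P) = \int_0^1 P\,f_P(P)\,\mathrm{d}P = \frac{1}{2}\exp\left(-\frac{n d^2}{2}\right)\left(\int_0^1 P\,e^{\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,\mathrm{d}P + \int_0^1 P\,e^{-\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,\mathrm{d}P\right) \tag{3.20}$$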
Since no simple analytical expression is obtained for the standard deviation, the following
approximation can be used (see Figure 4):
√

( )
√
(3.24)
Thus, the rule of thumb for an adequate significance level ($\alpha$) becomes:

√ √
√

(4.3)
which closely resembles the optimal significance levels of this test (see Figure 8):

( )

√ √
√

( )
(4.1)
and approximated as:
√ ( √ )
√ √
(4.2)

It is possible to avoid defining arbitrary significance levels and/or test resolutions by using the
concept of relevance instead of significance [8]. However, when the resolution of the test is
deemed important, minimization of total error [23] or determination of the $P$-value uncertainty
(Section 3) can be used to define an adequate significance level for the test.

Acknowledgment and Disclaimer

The author gratefully acknowledges Prof. Jaime Aguirre (Universidad Nacional de Colombia)
for reading the manuscript and suggesting improvements.

This report provides data, information and conclusions obtained by the author(s) from original scientific
research, based on the best knowledge available to the author(s). The main purpose of this publication is
to openly share scientific knowledge. Any mistake, omission, error or inaccuracy published, if any, is
completely unintentional.

This research did not receive any specific grant from funding agencies in the public, commercial, or non-
profit sectors.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC
4.0) License. Anyone is free to share (copy and redistribute the material in any medium or format) or adapt
(remix, transform, and build upon the material) this work under the following terms:
• Attribution: Appropriate credit must be given, providing a link to the license, and indicating if
changes are made. This can be done in any reasonable manner, but not in any way that suggests
endorsement by the licensor.
• Non-Commercial: This material may not be used for commercial purposes.

References

[1] Frost, J. (2020). Hypothesis Testing: An Intuitive Guide for Making Data Driven Decisions. Statistics
by Jim Publishing, State College (PA, USA).
https://statisticsbyjim.com/hypothesis-testing/hypothesis-testing-intuitive-guide/
[2] Hernandez, H. (2020). Formulation and Testing of Scientific Hypotheses in the presence of
Uncertainty. ForsChem Research Reports, 5, 2020-01. doi: 10.13140/RG.2.2.36317.97767.
[3] Hernandez, H. (2021). Optimal Significance Level and Sample Size in Hypothesis Testing. 1. Tests of
Means. ForsChem Research Reports, 6, 2021-06, 1-45. doi: 10.13140/RG.2.2.18643.09762.
[4] Fisher, R. A. (1934). Statistical Methods for Research Workers. 5th Ed. Revised and Enlarged. Oliver
and Boyd, Edinburgh. §12. The Normal Distribution. pp. 43-46. ark:/13960/t2v46472h.
[5] Mackowiak, P. A., & Worden, G. (1994). Carl Reinhold August Wunderlich and the Evolution of
Clinical Thermometry. Clinical Infectious Diseases, 18 (3), 458-467. JSTOR: 4457716.
[6] Mackowiak, P. A., Wasserman, S. S., & Levine, M. M. (1992). A critical appraisal of 98.6 F, the upper
limit of the normal body temperature, and other legacies of Carl Reinhold August Wunderlich.
JAMA, 268 (12), 1578-1580. doi: 10.1001/jama.1992.03490120092034.

[7] Ley, C., Heath, F., Hastie, T., Gao, Z., Protsiv, M., & Parsonnet, J. (2023). Defining usual oral
temperature ranges in outpatients using an unsupervised learning algorithm. JAMA Internal
Medicine, 183 (10), 1128-1135. doi: 10.1001/jamainternmed.2023.4291.
[8] Hernandez, H. (2025). Relevant vs. Significant Differences in Hypothesis Testing. ForsChem
Research Reports, 10, 2025-01, 1 - 17. doi: 10.13140/RG.2.2.30012.35200.
[9] Croarkin, C. & Tobias, P. (2012). NIST/SEMATECH e-Handbook of Statistical Methods. Chapter 2:
Measurement Process Characterization. Section 2.4: Gage R&R Studies. Part 2.4.5: Analysis of Bias.
2.4.5.1. Resolution. https://www.itl.nist.gov/div898/handbook/mpc/section4/mpc451.htm. Latest
access: February 6, 2025.
[10] Ross, S. M. (2004). Introduction to Probability and Statistics for Engineers and Scientists. 3rd Ed.
Elsevier Academic Press, San Diego CA. 8.3. Tests concerning the Mean of a Normal Population.
pp. 293-311. ISBN: 0125980574.
[11] Hernandez, H. (2022). Standard Deterministic, Standard Random, and Randomistic Variables.
ForsChem Research Reports, 7, 2022-06, 1 - 18. doi: 10.13140/RG.2.2.36316.87688.
[12] Hernandez, H. (2018). Expected Value, Variance and Covariance of Natural Powers of
Representative Standard Random Variables. ForsChem Research Reports, 3, 2018-08, 1-19. doi:
10.13140/RG.2.2.15187.07205.
[13] Hernandez, H. (2023). Representative Functions of the Standard Normal Distribution. ForsChem
Research Reports, 8, 2023-01, 1 - 29. doi: 10.13140/RG.2.2.29607.83362.
[14] Montgomery, D. C., & Runger, G. C. (2003). Applied Statistics and Probability for Engineers. 3rd
Edition. John Wiley & Sons, Inc., New York. 9.2. Tests on the Mean of a Normal Distribution,
Variance Known. pp. 289-300. ISBN: 9780471204541.
[15] Hernandez, H. (2017). Multivariate Probability Theory: Determination of Probability Density
Functions. ForsChem Research Reports, 2, 2017-13, 1-13. doi: 10.13140/RG.2.2.28214.60481.
[16] Cumming, G. (2008). Replication and p intervals: p values predict the future only vaguely, but
confidence intervals do much better. Perspectives on Psychological Science, 3 (4), 286-300. doi:
10.1111/j.1745-6924.2008.00079.x.
[17] Hung, H. J., O'Neill, R. T., Bauer, P., & Köhne, K. (1997). The behavior of the p-value when the
alternative hypothesis is true. Biometrics, 53 (1), 11-22. doi: 10.2307/2533093.
[18] Sackrowitz, H., & Samuel-Cahn, E. (1999). P values as random variables - Expected P values. The
American Statistician, 53 (4), 326-331. doi: 10.1080/00031305.1999.10474484.
[19] Bhattacharya, B., & Habtzghi, D. (2002). Median of the p-Value under the alternative hypothesis.
The American Statistician, 56 (3), 202-206. doi: 10.1198/000313002146.
[20] Cumming, G. (2012). Understanding the New Statistics: Effect Sizes, Confidence Intervals, and
Meta-Analysis. Routledge (Taylor & Francis), New York. doi: 10.4324/9780203807002.
[21] Cumming, G. (2014). The New Statistics: Why and How. Psychological Science, 25 (1), 7-29. doi:
10.1177/0956797613504966.
[22] Hernandez, H. (2021). Optimal Significance Level and Sample Size in Hypothesis Testing. 3. Large
Samples. ForsChem Research Reports, 6, 2021-08, 1-22. doi: 10.13140/RG.2.2.31487.33449.
[23] Hernandez, H. (2021). Optimal Significance Level and Sample Size in Hypothesis Testing. 7.
Implementation Remarks. ForsChem Research Reports, 6, 2021-12, 1-27. doi:
10.13140/RG.2.2.23632.64000.

