ForsChem Research Reports, Vol. 10, 2025-03

Hypothesis Test Resolution and Distribution of P-values

Hugo Hernandez
ForsChem Research, 050030 Medellin, Colombia
[email protected]

doi: 10.13140/RG.2.2.25155.72487

Abstract
The resolution of a statistical test can be interpreted as “the ability of the test to detect small
deviations from the null hypothesis”, in a form analogous to the resolution of a measurement
system. Statistical tests of hypotheses are commonly evaluated considering a fixed significance
level of 5% and test resolution of zero. As a result, the null hypothesis will eventually be rejected
if the sample size is large enough, even when the null hypothesis is true. By considering a non-
zero test resolution and the corresponding distribution of P-values at the limit of test
resolution, an alternative rule of thumb for the determination of adequate significance levels
for the test is obtained. This rule of thumb suggests that the significance level should be set to 1.5
times the standard deviation of the P-value at the limit of the test resolution. For the
two-sided Z-test, this approach is found to be consistent with the optimal significance level obtained
by minimizing the total test error.

Keywords
Change of Variable Theorem, Confidence, Dance of P-values, Hypothesis Testing, Optimal
Significance Level, P-values, Statistical Significance, Test Resolution, Uncertainty, Z-test

1. Introduction
Hypothesis testing is “a statistical analysis that uses sample data to assess two mutually exclusive
theories about the properties of a population.” [1] Those theories, denoted as hypotheses, are
associated with a research question about the behavior of a subject population observed under
certain specific conditions [2]. The two mutually exclusive theories are denoted as the null
hypothesis ($H_0$) and the alternative hypothesis ($H_1$). The null hypothesis is the default theory,
usually associated with the lack of an effect of the particular conditions considered on the
observed behavior of the population. The alternative hypothesis, on the other hand, is the
challenging theory and represents a significant effect of the observation conditions on the
properties of the population (a significant deviation from the null hypothesis).

Cite as: Hernandez, H. (2025). Hypothesis Test Resolution and Distribution of P-values. ForsChem
Research Reports, 10, 2025-03, 1 - 17. doi: 10.13140/RG.2.2.25155.72487. Publication Date: 14/02/2025.

Statistical tools are required for the validation of hypotheses since the sampling of data from a
population always introduces uncertainty. Thus, it is necessary to draw a conclusion using
approximate values of the population properties, which are inferred with limited confidence
and accuracy [3]. This means that statistical testing of hypotheses always involves error, and
absolute conclusions will never be obtained.
Two types of errors are possible during hypothesis testing:
• Type I errors (false positives): Resulting when the null hypothesis is erroneously
rejected. The probability of occurrence of false positives is the significance level ($\alpha$),
which is related to the confidence level ($C$) according to the following expression:

$$\alpha = 1 - C \tag{1.1}$$
• Type II errors (false negatives): Resulting when the null hypothesis is erroneously not
rejected (i.e. the alternative hypothesis is erroneously rejected). The probability of
occurrence of false negatives is denoted by $\beta$, whereas the power of the test, defined
as the probability of correctly rejecting the null hypothesis, is:

$$\text{Power} = 1 - \beta \tag{1.2}$$
$\beta$ depends on the type of test employed, on the resolution of the test, on the size of the
sample, and on the confidence level of the test. In general, as the confidence level increases,
$\beta$ also increases. If zero false positive errors were to be obtained in a test (100% confidence),
then the probability of false negatives becomes 100% (0% power). Similarly, if zero false
negative errors were to be obtained (100% power), the confidence of the test drops to zero
(0% confidence). Clearly, a balance between test confidence and test power is required.
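The balance between confidence and power can be illustrated numerically. The following is a minimal sketch, assuming the two-sided Z-test discussed later in this report; the function name `type_ii_error`, the sample size of 25 and the true standardized difference of 0.4 are hypothetical choices made only for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def type_ii_error(alpha: float, n: int, d: float) -> float:
    """Probability of a false negative (beta) in a two-sided Z-test.

    alpha: significance level (0 < alpha < 1)
    n:     sample size
    d:     assumed true standardized difference |mu - mu0| / sigma
    """
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)  # rejection threshold for |Z|
    shift = (n ** 0.5) * d                           # mean of the test statistic
    # H0 is not rejected when -z_crit < Z < z_crit, with Z ~ Normal(shift, 1)
    return STD_NORMAL.cdf(z_crit - shift) - STD_NORMAL.cdf(-z_crit - shift)


# Raising the confidence (lowering alpha) raises beta and lowers the power.
for alpha in (0.20, 0.05, 0.01, 0.001):
    beta = type_ii_error(alpha, n=25, d=0.4)
    print(f"alpha={alpha:<6} confidence={1 - alpha:.3f} "
          f"beta={beta:.3f} power={1 - beta:.3f}")
```

With these assumed values, each increase in confidence is paid for with a larger probability of false negatives, which is the trade-off described above.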
Typically, the statistical test for evaluating the hypotheses consists in determining the P-value
and comparing it with the significance level previously defined for the test. The P-value can be
interpreted as the risk of erroneously rejecting the null hypothesis (type I error) considering
the available data, whereas the significance level can be interpreted as the maximum
acceptable risk of erroneously rejecting the null hypothesis (type I error) in the test. For this
reason, if the risk observed from the data is less than the maximum acceptable risk, then it is
safe to reject the null hypothesis. However, if the risk is greater than the maximum acceptable
risk, then it is unsafe to reject the null hypothesis.
The most common practice observed in the scientific literature is assuming a fixed significance
level of 5% for any statistical test. This value was suggested by Fisher, when discussing what he
considered a convenient maximum acceptable risk for his own purposes, corresponding to
1 failure in 20 trials [4].
The previous analysis is based on the assumption that the null hypothesis is true, since evaluating
the alternative hypothesis would require knowing the exact value of the parameter. Unfortunately, the
analysis based strictly on the null hypothesis does not take into account the resolution of the
test, and this may result in misleading conclusions. The concept of test resolution is explained
in Section 2. An illustrative example of the effect of test resolution (and sample size) on the
distribution of P-values is presented in Section 3. In this case, the Z-test is used because of
its mathematical simplicity. The distribution of P-values
obtained is compared in Section 4 with the optimal significance level values obtained for the
two-sided Z-test in a previous report [3].

2. Test Resolution
To understand the concept of test resolution and its importance in statistical testing of
hypotheses, let us consider the following set of hypotheses for a test of means:

$$H_0: \mu = \mu_0 \qquad H_1: \mu \neq \mu_0 \tag{2.1}$$

where $\mu$ is the mean value of a certain property in a population under observation, and $\mu_0$ is a
reference value being tested.
For example, let us assume that we want to test the mean oral temperature in humans. Since
we need a reference value, let us consider some values obtained from the scientific literature:
 In , Carl Wunderlich suggested a mean value of °C ( °F) [5].
 In 1992, Mackowiak and coworkers [6] proposed that the mean oral temperature was
°C ( °F).
 In 2023, Ley and coworkers [7] averaged the oral temperature of more than half a
million people, concluding that the mean temperature is °C ( °F), with a
standard deviation of °C.
So we have three different reference values here. If the true mean value for the oral
temperature in humans is °C, then by considering °C or °C,
depending on the sample size, we will eventually reject the null hypothesis. But if we consider
°C, the null hypothesis will never be rejected.
However, if the true mean oral temperature in humans is °C, the null hypothesis
( °C) will eventually be rejected. Furthermore, if the true mean temperature is
°C, if the sample size is large enough, the null hypothesis
( °C) will also eventually be rejected. Theoretically speaking, in both cases the null
hypothesis should be rejected, but this does not seem to be the correct decision. Clearly, we need to
consider a minimum detectable difference when performing the analysis. This minimum
detectable difference between the true mean and the reference value is the resolution of the
test ($\delta$).


The test resolution can be defined as: “the ability of the statistical test to detect small deviations
from the null hypothesis.” [8] Below this test resolution threshold, deviations from the null
hypothesis should not be detected. This concept is analogous to that of measurement
resolution [9]. In fact, the test resolution can be related to the resolution of the measurement
system employed in the analysis. For example, if a digital thermometer with a measurement
resolution of °C is used to determine oral temperatures, it is quite reasonable to also
employ this value for the resolution of the test. Alternatively, we can define the test resolution
proportional to the standard deviation in the observations (which includes the measurement
uncertainty along with other sources of variation). Or we can simply define the test resolution
in terms of the minimum acceptable difference required to take action, which is usually related
to a cost function. For example, updating the mean human temperature value in textbooks,
databases, websites, and university courses worldwide has a high cost. So, updating the mean
temperature value for a difference of less than °C is probably not worth the effort.
The test resolution ($\delta$) can be interpreted as a different formulation of the null and alternative
hypotheses as follows:

$$H_0: |\mu - \mu_0| \le \delta \qquad H_1: |\mu - \mu_0| > \delta \tag{2.2}$$
Then, the original set of hypotheses will be obtained in the limit when $\delta \to 0$ (resolution zero).
Considering the nature of continuous numbers, if we consider $\delta = 0$, given enough
observations all null hypotheses will eventually be rejected, since the true mean value (with all
possible decimals) will never be exactly described by the reference value.
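The effect of a zero test resolution can be sketched numerically. The following minimal example assumes a two-sided Z-test evaluated at the conventional 5% significance level and a hypothetical, very small standardized difference of 0.01 between the true mean and the reference value; `rejection_probability` is an illustrative helper, not part of the original analysis.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def rejection_probability(n: int, d: float, alpha: float = 0.05) -> float:
    """Probability that a two-sided Z-test rejects H0: mu = mu0.

    n:     sample size
    d:     true standardized difference |mu - mu0| / sigma (hypothetical input)
    alpha: significance level
    """
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = (n ** 0.5) * d  # the difference is amplified by sqrt(n)
    # Reject when the test statistic Z ~ Normal(shift, 1) falls outside (-z_crit, z_crit)
    return STD_NORMAL.cdf(-z_crit - shift) + 1.0 - STD_NORMAL.cdf(z_crit - shift)


# A practically negligible difference is rejected almost surely for large samples.
for n in (100, 10_000, 100_000, 1_000_000):
    print(f"n={n:>9}  P(reject H0)={rejection_probability(n, d=0.01):.4f}")
```

Under these assumptions the rejection probability starts close to the significance level for small samples and approaches one as the sample size grows, even though the difference itself does not change.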
Let us now consider the mathematical implications of the incorporation of test resolution in
the analysis (i.e. in the P-values of the test). For illustrative purposes, we will consider only the
two-sided Z-test, where the population values are assumed to be normally distributed, and the
standard deviation (or the variance) of the population is assumed to be exactly known [10].

3. P-value Distribution for the Two-Sided Z-test


Let us consider a normal population variable $X$ described by the following randomistic [11]
expression:

$$X = \mu + \sigma\,\epsilon_X \tag{3.1}$$
where $\mu$ represents the true mean of the population, $\sigma$ is the true standard deviation of the
population (assumed known), and $\epsilon_X$ is a type I standard normal random variable [12].
Let us now assume that a sample of $n$ random, independent observations of variable $X$ is
obtained. Then, the average of such a sample will be (considering Eq. 3.1):

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{1}{n}\sum_{i=1}^{n}\left(\mu + \sigma\,\epsilon_i\right) = \mu + \frac{\sigma}{n}\sum_{i=1}^{n}\epsilon_i = \mu + \frac{\sigma}{\sqrt{n}}\,\epsilon \tag{3.2}$$
where $X_i$ is a random variable representing the $i$-th observation of variable $X$, $\epsilon_i$ represents
the standard normal random variable associated with the $i$-th observation of $X$, and $\epsilon$ is the
standard normal random variable obtained from the sum of $n$ standard normal random variables [13].
Now, the Z-test involves the determination of the test statistic $Z$ as follows [14]:

$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \tag{3.3}$$
Replacing Eq. (3.2) in Eq. (3.3), the following expression is obtained:

$$Z = \frac{\sqrt{n}\,(\mu - \mu_0)}{\sigma} + \epsilon \tag{3.4}$$
indicating that $Z$ is a normal variable with a mean value of $\sqrt{n}\,(\mu - \mu_0)/\sigma$ and a standard deviation of 1.

Using this test statistic, we can determine the P-value (probability value) of the two-sided
Z-test as follows [14]:

$$P = 2\left(1 - \Phi(|Z|)\right) \tag{3.5}$$
where $\Phi$ represents the cumulative probability function of the standard normal random
variable, given by:

$$\Phi(z) = \frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{z}{\sqrt{2}}\right)\right) \tag{3.6}$$
where $\mathrm{erf}$ represents the error function.
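The calculation in Eq. (3.3), (3.5) and (3.6) can be sketched in a few lines. This is a minimal illustration using only the Python standard library; the sample values, the reference value and the known standard deviation are hypothetical numbers chosen for the example, and the function names are not taken from the report.

```python
import math


def phi(z: float) -> float:
    """Standard normal cumulative probability, Eq. (3.6)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_statistic(sample, mu0: float, sigma: float) -> float:
    """Test statistic of the Z-test, Eq. (3.3); sigma is assumed exactly known."""
    n = len(sample)
    mean = sum(sample) / n
    return (mean - mu0) / (sigma / math.sqrt(n))


def two_sided_p_value(z: float) -> float:
    """P-value of the two-sided Z-test, Eq. (3.5)."""
    return 2.0 * (1.0 - phi(abs(z)))


# Hypothetical data: eight observations, reference value 36.8, known sigma 0.2
sample = [36.5, 36.7, 36.6, 36.8, 36.4, 36.9, 36.6, 36.7]
z = z_statistic(sample, mu0=36.8, sigma=0.2)
p = two_sided_p_value(z)
print(f"Z = {z:.4f}, P-value = {p:.4f}")
```

The P-value obtained this way is then compared with the significance level chosen for the test.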
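Now, if we replace Eq. (3.4) and (3.6) in Eq. (3.5) we obtain:

$$P = 2\left(1 - \Phi\left(\left|\frac{\sqrt{n}\,(\mu - \mu_0)}{\sigma} + \epsilon\right|\right)\right) = 1 - \mathrm{erf}\left(\frac{1}{\sqrt{2}}\left|\frac{\sqrt{n}\,(\mu - \mu_0)}{\sigma} + \epsilon\right|\right) \tag{3.7}$$
Since the distribution of $\epsilon$ is known (standard normal), the distribution of $P$ can be
obtained from the change of variable theorem [15]. First, we need to solve Eq. (3.7) for $\epsilon$,
which yields two branches:

$$\epsilon_{\pm} = -\frac{\sqrt{n}\,(\mu - \mu_0)}{\sigma} \pm \sqrt{2}\,\mathrm{erf}^{-1}(1 - P) \tag{3.8}$$
where $\mathrm{erf}^{-1}$ is the inverse of the error function.
We also need to determine the derivative of Eq. (3.8):

$$\frac{d\epsilon_{\pm}}{dP} = \pm\sqrt{2}\,\frac{d\,\mathrm{erf}^{-1}(1 - P)}{dP} = \mp\sqrt{\frac{\pi}{2}}\,\exp\left(\left(\mathrm{erf}^{-1}(1 - P)\right)^2\right) \tag{3.9}$$
And then, applying the change of variable theorem we can obtain the probability density
function of $P$, as follows:

$$f_P(P) = \left(f_\epsilon(\epsilon_+) + f_\epsilon(\epsilon_-)\right)\left|\frac{d\epsilon_{\pm}}{dP}\right| = \frac{1}{2}\left[\exp\left(-\frac{\epsilon_+^2}{2}\right) + \exp\left(-\frac{\epsilon_-^2}{2}\right)\right]\exp\left(\left(\mathrm{erf}^{-1}(1 - P)\right)^2\right) \tag{3.10}$$
where $f_\epsilon(\epsilon) = \exp(-\epsilon^2/2)/\sqrt{2\pi}$ is the standard normal probability density and
$\epsilon_{\pm}$ are given by Eq. (3.8).
This expression can be simplified into:

$$f_P(P) = \frac{1}{2}\exp\left(-\frac{n(\mu - \mu_0)^2}{2\sigma^2}\right)\left[\exp\left(\sqrt{2n}\,\frac{\mu - \mu_0}{\sigma}\,\mathrm{erf}^{-1}(1 - P)\right) + \exp\left(-\sqrt{2n}\,\frac{\mu - \mu_0}{\sigma}\,\mathrm{erf}^{-1}(1 - P)\right)\right] \tag{3.11}$$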
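And the corresponding cumulative probability function becomes:

$$F_P(P) = \int_0^P f_P(p)\,dp = 1 - \frac{1}{2}\mathrm{erf}\left(\sqrt{\frac{n}{2}}\,\frac{\mu - \mu_0}{\sigma} + \mathrm{erf}^{-1}(1 - P)\right) + \frac{1}{2}\mathrm{erf}\left(\sqrt{\frac{n}{2}}\,\frac{\mu - \mu_0}{\sigma} - \mathrm{erf}^{-1}(1 - P)\right) \tag{3.12}$$
Eq. (3.11) and (3.12) describe the probability distribution of P-values in a two-sided Z-test.
Eq. (3.12) is consistent with the cumulative distribution function of P-values previously reported
by Cumming (Eq. B4 in [16]), which is also consistent with the results by Hung et al. [17].
Notice that the probability density of $P$ is also a function of the sample size ($n$) and the standardized
difference between the true mean and the reference value, $(\mu - \mu_0)/\sigma$, which together determine the mean value of $Z$.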
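Eq. (3.11) and (3.12) can be evaluated with the Python standard library. The sketch below is one possible implementation, not the author's: it relies on the identity $\sqrt{2}\,\mathrm{erf}^{-1}(1 - P) = \Phi^{-1}(1 - P/2)$ to avoid needing an inverse error function, writes the sum of exponentials in Eq. (3.11) as a hyperbolic cosine, and uses hypothetical function and argument names.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def abs_z_from_p(p: float) -> float:
    """|Z| that yields the two-sided P-value p; equals sqrt(2) * erfinv(1 - p)."""
    return STD_NORMAL.inv_cdf(1.0 - p / 2.0)


def p_value_pdf(p: float, n: int, std_diff: float) -> float:
    """Probability density of P-values, Eq. (3.11), for 0 < p < 1.

    std_diff is the standardized difference (mu - mu0) / sigma.
    """
    shift = math.sqrt(n) * std_diff
    # (1/2) * [exp(+a) + exp(-a)] = cosh(a), with a = shift * |Z|(p)
    return math.exp(-0.5 * shift ** 2) * math.cosh(shift * abs_z_from_p(p))


def p_value_cdf(p: float, n: int, std_diff: float) -> float:
    """Cumulative probability of P-values, Eq. (3.12), for 0 < p < 1."""
    shift = math.sqrt(n) * std_diff
    z = abs_z_from_p(p)
    # P <= p is equivalent to |Z| >= z, with Z ~ Normal(shift, 1)
    return 2.0 - STD_NORMAL.cdf(z - shift) - STD_NORMAL.cdf(z + shift)


# Hypothetical case: n = 16 and a standardized difference of 0.25
for p in (0.01, 0.05, 0.20, 0.50):
    print(f"P={p:.2f}  density={p_value_pdf(p, 16, 0.25):.4f}  "
          f"cumulative={p_value_cdf(p, 16, 0.25):.4f}")
```

For a standardized difference of zero both functions reduce to the uniform distribution, and the cumulative probability evaluated at the significance level gives the probability of rejecting the null hypothesis.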
Under the assumption of the null hypothesis with zero test resolution, that is when $\mu - \mu_0 = 0$,
the probability density function simply becomes:

$$f_P(P) = 1 \tag{3.13}$$
with cumulative probability:

$$F_P(P) = P \tag{3.14}$$
Eq. (3.13) and (3.14) represent a uniform distribution of P-values. That is, if the true mean of the
population is exactly equal to the reference value, then any P-value will be obtained with the
same probability due to random sampling.
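The uniform distribution of P-values under an exactly true null hypothesis can be checked by simulation. The following is a minimal Monte Carlo sketch under assumed, hypothetical settings (reference value 10, known standard deviation 2, samples of 30 observations, 20,000 repetitions, fixed random seed); it is an illustration and not the simulation procedure used by the author.

```python
import math
import random


def two_sided_p_value(sample, mu0: float, sigma: float) -> float:
    """P-value of the two-sided Z-test for one sample (Eq. 3.3 and 3.5)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return 1.0 - math.erf(abs(z) / math.sqrt(2.0))  # same as 2 * (1 - Phi(|z|))


rng = random.Random(2025)                 # fixed seed so the run is repeatable
mu0, sigma, n, runs = 10.0, 2.0, 30, 20_000

# The true mean equals the reference value, so the null hypothesis is exactly true
p_values = [
    two_sided_p_value([rng.gauss(mu0, sigma) for _ in range(n)], mu0, sigma)
    for _ in range(runs)
]

# Fraction of P-values falling in each of ten equal-width bins
counts = [0] * 10
for p in p_values:
    counts[min(int(p * 10), 9)] += 1
fractions = [c / runs for c in counts]
print([round(f, 3) for f in fractions])
```

Each bin is expected to hold roughly one tenth of the simulated P-values, up to sampling noise.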

Figure 1. Evolution of the probability density function of P-values for different values of $\sqrt{n}\,|\mu - \mu_0|/\sigma$.

On the other hand, if the term $\mu - \mu_0$ is not exactly zero, a non-uniform distribution of P-values
is obtained. Figure 1 illustrates the behavior of the probability density function of P-values for
the two-sided Z-test, for different values of $\sqrt{n}\,|\mu - \mu_0|/\sigma$. Similarly, Figure 2 shows the
corresponding behavior of the cumulative probability function of P-values.


Figure 2. Evolution of the cumulative probability of P-values for different values of $\sqrt{n}\,|\mu - \mu_0|/\sigma$.

As the value of the term $\sqrt{n}\,|\mu - \mu_0|/\sigma$ increases, the skewness of the distribution of P-values
increases, and the probability of yielding small P-values increases. At moderately large values of this term, the
distribution of P-values closely resembles an exponential distribution. And for $\sqrt{n}\,|\mu - \mu_0|/\sigma \to \infty$,
Dirac’s delta function at $P = 0$ is finally obtained (a deterministic number [11]).
Notice that the effect of the difference between the true mean and the reference value is
amplified by the square root of the sample size. That is, given enough elements in the sample, the null
hypothesis of any test with zero test resolution will be rejected, even for an infinitesimally small
difference (which will always exist due to numerical truncation of the reference values).


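Since the true mean value is unknown, we will never be able to determine the exact value of
$\sqrt{n}\,|\mu - \mu_0|/\sigma$, and therefore, we will never know the true distribution of P-values in a test.
However, we may consider the scenario where the true mean is at the limit of the resolution of
the test.
Considering the set of hypotheses proposed in Eq. (2.2) with test resolution ($\delta$) greater than
zero, the following probability distribution for the limit-case scenario ($|\mu - \mu_0| = \delta$) is obtained:

$$f_P(P) = \frac{1}{2}\exp\left(-\frac{n\delta^2}{2\sigma^2}\right)\left[\exp\left(\sqrt{2n}\,\frac{\delta}{\sigma}\,\mathrm{erf}^{-1}(1 - P)\right) + \exp\left(-\sqrt{2n}\,\frac{\delta}{\sigma}\,\mathrm{erf}^{-1}(1 - P)\right)\right] \tag{3.15}$$

$$F_P(P) = 1 - \frac{1}{2}\mathrm{erf}\left(\sqrt{\frac{n}{2}}\,\frac{\delta}{\sigma} + \mathrm{erf}^{-1}(1 - P)\right) + \frac{1}{2}\mathrm{erf}\left(\sqrt{\frac{n}{2}}\,\frac{\delta}{\sigma} - \mathrm{erf}^{-1}(1 - P)\right) \tag{3.16}$$
Now, we can define the standardized test resolution (or Cohen’s test resolution) as follows:

$$d = \frac{\delta}{\sigma} \tag{3.17}$$
and then the probability distribution of P-values for the two-sided Z-test becomes:

$$f_P(P) = \frac{1}{2}\exp\left(-\frac{n d^2}{2}\right)\left[\exp\left(\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)\right) + \exp\left(-\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)\right)\right] \tag{3.18}$$

$$F_P(P) = 1 - \frac{1}{2}\mathrm{erf}\left(\sqrt{\frac{n}{2}}\,d + \mathrm{erf}^{-1}(1 - P)\right) + \frac{1}{2}\mathrm{erf}\left(\sqrt{\frac{n}{2}}\,d - \mathrm{erf}^{-1}(1 - P)\right) \tag{3.19}$$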

Eq. (3.18) can be used to determine the expected P-value of the test ($E(P)$) and its standard
deviation ($\sigma_P$) [17,18] (a measure of uncertainty), whereas Eq. (3.19) can be used to determine
the median P-value of the test ($\mathrm{Md}(P)$) [19], as follows:

$$E(P) = \int_0^1 P\,f_P(P)\,dP = \frac{1}{2}e^{-n d^2/2}\left(\int_0^1 P\,e^{\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,dP + \int_0^1 P\,e^{-\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,dP\right) \tag{3.20}$$

$$\sigma_P = \sqrt{\frac{1}{2}e^{-n d^2/2}\left(\int_0^1 P^2\,e^{\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,dP + \int_0^1 P^2\,e^{-\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,dP\right) - E(P)^2} \tag{3.21}$$

$$\mathrm{erf}\left(\mathrm{erf}^{-1}\left(1 - \mathrm{Md}(P)\right) + \sqrt{\frac{n}{2}}\,d\right) + \mathrm{erf}\left(\mathrm{erf}^{-1}\left(1 - \mathrm{Md}(P)\right) - \sqrt{\frac{n}{2}}\,d\right) = 1 \tag{3.22}$$
Here, Eq. (3.22) is an implicit function of the median. Unfortunately, Eq. (3.20) to (3.22) cannot
be easily expressed in terms of simple, known functions, except for $\sqrt{n}\,d = 0$. However, they can
be empirically approximated by exponential functions as follows§:

( ( ) )

( )

(3.23)

( )

(3.24)


( )

{ √
(3.25)

Exact and approximate expressions are graphically compared for different values of $\sqrt{n}\,d$ in
Figure 3 to Figure 5.

Figure 3. Expected P-values for different values of $\sqrt{n}\,d$. Left plot: Original scale. Right plot: Logarithm
scale of expected P-values. Solid blue line: Exact values (Eq. 3.20). Dashed red line: Approximated values
(Eq. 3.23)

§
represents the coefficient evaluated for the logarithm transformation of the variables.


Figure 4. Standard deviation (uncertainty) in P-values for different values of $\sqrt{n}\,d$. Left plot: Original
scale. Right plot: Logarithm scale of standard deviation of P-values. Solid blue line: Exact values (Eq.
3.21). Dashed red line: Approximated values (Eq. 3.24)

Figure 5. Median of P-values for different values of $\sqrt{n}\,d$. Left plot: Original scale. Right plot: Logarithm
scale of the median of P-values. Solid blue line: Exact values (Eq. 3.22). Dashed red line: Approximated
values (Eq. 3.25)

The median of the P-value distribution is observed to decay faster than the mean with
increasing values of $\sqrt{n}\,d$ (see Figure 6), due to the skewness of the distribution obtained. The
standard deviation initially increases with respect to the uniform distribution, and then decays
with increasing $\sqrt{n}\,d$. Also due to the increased skewness, standard deviation values are greater
than those of the mean or median P-values for sufficiently large values of $\sqrt{n}\,d$.
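The mean, standard deviation and median of the P-value distribution at the limit of test resolution can also be obtained numerically. The sketch below is an assumption-laden illustration rather than the author's procedure: instead of integrating Eq. (3.20) and (3.21) over $P$, it integrates over the test statistic $Z$ (normal with mean $\sqrt{n}\,d$ and unit standard deviation), which gives the same moments, and it solves the median condition by bisection on the cumulative probability of Eq. (3.19). Function names, step counts and tolerances are hypothetical choices.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def p_value_mean_and_std(n: int, d: float, steps: int = 20_000,
                         half_width: float = 10.0):
    """Numerical E(P) and standard deviation of P for standardized resolution d.

    Midpoint-rule integration over Z ~ Normal(sqrt(n) * d, 1).
    """
    shift = math.sqrt(n) * d
    h = 2.0 * half_width / steps
    first = second = 0.0
    for i in range(steps):
        z = shift - half_width + (i + 0.5) * h
        weight = STD_NORMAL.pdf(z - shift) * h   # probability mass of this slice of Z
        p = 2.0 * STD_NORMAL.cdf(-abs(z))        # two-sided P-value, Eq. (3.5)
        first += p * weight
        second += p * p * weight
    return first, math.sqrt(max(second - first * first, 0.0))


def p_value_median(n: int, d: float, tol: float = 1e-12) -> float:
    """Median P-value by bisection; medians smaller than tol are not resolved."""
    shift = math.sqrt(n) * d

    def cdf(p: float) -> float:
        z = STD_NORMAL.inv_cdf(1.0 - p / 2.0)    # |Z| corresponding to P-value p
        return 2.0 - STD_NORMAL.cdf(z - shift) - STD_NORMAL.cdf(z + shift)

    low, high = 0.0, 1.0
    while high - low > tol:
        mid = 0.5 * (low + high)
        if cdf(mid) < 0.5:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


# Hypothetical case: n = 16, so that sqrt(n) * d takes the values 0, 1, 2, 3, 4
for d in (0.0, 0.25, 0.5, 0.75, 1.0):
    mean, std = p_value_mean_and_std(16, d)
    print(f"sqrt(n)*d={4 * d:.0f}  mean={mean:.5f}  std={std:.5f}  "
          f"median={p_value_median(16, d):.5f}")
```

For $\sqrt{n}\,d = 0$ the uniform distribution is recovered, with mean and median equal to 1/2 and standard deviation equal to $1/\sqrt{12}$.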

Figure 6. Properties of the P-value distribution for the two-sided Z-test, for different values of $\sqrt{n}\,d$. Left
plot: Original scale. Right plot: Logarithm scale of P-values. Solid purple line: Expected values (Eq. 3.20).
Dashed orange line: Standard deviation values (Eq. 3.21). Dotted green line: Median values (Eq. 3.22).


Notice that for small values of $\sqrt{n}\,d$ the standard deviation in P-values is relatively large (greater than 5%).
That is, the uncertainty in the determination of the P-value is greater than the typical
significance level employed for the test. This situation leads to the “dance of P-values”
described by Cumming [20,21], where the test conclusion fluctuates between acceptance and
rejection of the null hypothesis for different random samples obtained from the same
population under identical conditions. The dance of P-values for low $\sqrt{n}\,d$ values can be easily
observed by computer simulation [22].
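A small simulation makes the dance of P-values visible. The sketch below is a hypothetical illustration (it is not the simulation of reference [22]): it draws the test statistic directly from Eq. (3.4), assumes a sample size of 20 and a true standardized difference of 0.5, and applies the conventional 5% significance level to 25 replicated experiments generated with a fixed random seed.

```python
import math
import random


def simulate_p_values(n: int, d: float, replicates: int, seed: int):
    """Two-sided Z-test P-values for repeated experiments on the same population.

    n: sample size, d: true standardized difference (mu - mu0) / sigma.
    """
    rng = random.Random(seed)
    shift = math.sqrt(n) * d
    p_values = []
    for _ in range(replicates):
        z = shift + rng.gauss(0.0, 1.0)                            # Eq. (3.4)
        p_values.append(1.0 - math.erf(abs(z) / math.sqrt(2.0)))   # Eq. (3.7)
    return p_values


# Identical conditions in every replicate; only the random sample changes
p_values = simulate_p_values(n=20, d=0.5, replicates=25, seed=7)
for k, p in enumerate(p_values, start=1):
    decision = "reject H0" if p < 0.05 else "do not reject H0"
    print(f"replicate {k:>2}: P-value = {p:.4f} -> {decision}")
```

Because the spread of the P-values is large compared with the significance level, the decision printed for each replicate can change from one sample to the next.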

On the other hand, for larger values of $\sqrt{n}\,d$ the standard deviation in P-values is less than 5%, but then the
expected P-value is even smaller. So, when the typical 5% significance level is used, the most
likely conclusion will be the rejection of the null hypothesis. For a constant sample size, as the
standardized test resolution ($d$) increases, the null hypothesis will be more easily rejected, in
clear contradiction with the hypothesis test formulated in Eq. (2.2), where the null hypothesis
should be more easily accepted (as the test resolution increases). This contradiction reflects
the fact that a constant significance level is inconsistent with tests having non-zero resolution.
Thus, the adequate significance level to be employed in a test must somehow resemble the
behavior of the distribution of P-values. For the two-sided Z-test, as an example, the
significance level of the test should decay almost exponentially with respect to $\sqrt{n}\,d$, just like
the main properties of the P-value distribution.

4. Comparison with Optimal Significance Levels


In a previous report [3], the concept of optimal significance level ($\alpha_{opt}$) was introduced as the
significance level value that minimizes the total test error (including both type I and type II
errors). In that report, the optimal significance level for the two-sided Z-test was found to be:

( )

√ √

( )
(4.1)

For sufficiently small values of $\sqrt{n}\,d$ we simply obtain $\alpha_{opt} = 1$. That is, the uncertainty in the test is so large that
the probability of erroneously rejecting the null hypothesis becomes 100%. Thus, performing a
test of hypotheses with zero resolution is similar to performing a test with zero confidence. In
fact, the null hypothesis will eventually be rejected given enough elements in the sample, even
when the null hypothesis is true.

For larger values of $\sqrt{n}\,d$, Eq. (4.1) can be empirically approximated by [3]:


√ ( √ )
√ √
(4.2)
Notice that Eq. (4.2) represents an exponential decay with respect to $\sqrt{n}\,d$, as suggested in the
previous Section.
These expressions are graphically compared to the properties of the distribution of P-values in
Figure 7 (for large enough $\sqrt{n}\,d$).

Figure 7. Comparison between optimal significance levels ($\alpha_{opt}$) and properties of the P-value distribution for
the two-sided Z-test, for different values of $\sqrt{n}\,d$. Left plot: Original scale. Right plot: Logarithm scale of
P-values. Solid purple line: Expected values (Eq. 3.20). Dashed orange line: Standard deviation values (Eq.
3.21). Dotted green line: Median values (Eq. 3.22). Solid red line: Exact optimal significance levels (Eq. 4.1).
Dashed blue line: Approximated optimal significance levels (Eq. 4.2).
Interestingly, the optimal significance levels determined by Eq. (4.2) are approximately
proportional to the standard deviation in the distribution of P-values, and the proportionality
constant is on average about 1.5. This is reasonable, considering that the uncertainty in the
P-values at the limit of test resolution should be less than the maximum type I error tolerance
(significance level). It also suggests that 1.5 times the standard deviation in the P-value
distribution might be used as a rule-of-thumb criterion for determining an adequate
significance level for a test, as illustrated in Figure 8.

Figure 8. Optimal significance level vs. standard deviation in P-values. Solid blue line: Empirical
approximation (Eq. 4.2). Dashed red line: Rule of thumb ($\alpha = 1.5\,\sigma_P$).


Using the approximated standard deviation given in Eq. (3.24), the rule of
thumb for the two-sided Z-test becomes:

√ √

(4.3)
On the other hand, using the expected P-value or the median P-value as significance levels for
the test implies that the type I error is lower than for the optimal significance level, but at the
expense of an even greater increase in type II error.
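The rule of thumb can also be evaluated without the empirical approximation. The following minimal sketch, written under stated assumptions, computes the standard deviation of the P-value at the limit of test resolution by direct numerical integration over the test statistic (a choice made here for convenience, not the approximation of Eq. 3.24) and multiplies it by 1.5; the function names, the standardized resolution of 0.5 and the sample sizes are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def p_value_std(n: int, d: float, steps: int = 20_000,
                half_width: float = 10.0) -> float:
    """Standard deviation of the two-sided Z-test P-value when |mu - mu0| / sigma = d.

    Midpoint-rule integration over Z ~ Normal(sqrt(n) * d, 1).
    """
    shift = math.sqrt(n) * d
    h = 2.0 * half_width / steps
    first = second = 0.0
    for i in range(steps):
        z = shift - half_width + (i + 0.5) * h
        weight = STD_NORMAL.pdf(z - shift) * h   # probability mass of this slice of Z
        p = 2.0 * STD_NORMAL.cdf(-abs(z))        # two-sided P-value
        first += p * weight
        second += p * p * weight
    return math.sqrt(max(second - first * first, 0.0))


def rule_of_thumb_alpha(n: int, d: float, factor: float = 1.5) -> float:
    """Significance level set to `factor` times the P-value standard deviation."""
    return factor * p_value_std(n, d)


# Hypothetical standardized test resolution d = 0.5 and several sample sizes
for n in (4, 9, 16, 25, 49, 100):
    print(f"n={n:>3}  sqrt(n)*d={math.sqrt(n) * 0.5:.1f}  "
          f"alpha={rule_of_thumb_alpha(n, 0.5):.5f}")
```

With a fixed resolution, the significance level suggested by the rule of thumb shrinks as the sample size grows, instead of staying at a constant value.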

5. Summary
Statistical tests of hypotheses should always be evaluated considering non-zero test
resolutions, and adequate significance levels depending on both the test resolution and the sample
size employed by the test.
An adequate significance level is, for example, the optimal significance level obtained by
minimizing the total error of the test [23].
In addition, in this report a new rule of thumb is proposed for the adequate significance level,
as 1.5 times the uncertainty (standard deviation) in P-values at the limit of the corresponding
test resolution. The uncertainty in the P-value can be obtained from the probability density
function of P-values. Such a function can be obtained using the Change of Variable Theorem [15]
considering the definition of the P-value and the probability distribution of the data assumed
by the test.
For example, for the case of the two-sided Z-test, a normal probability density function (mean
$\mu$ and standard deviation $\sigma$) of the data is assumed, and the P-value is determined as follows:

$$P = 2\left(1 - \Phi(|Z|)\right) \tag{3.5}$$
where

$$Z = \frac{\sqrt{n}\,(\mu - \mu_0)}{\sigma} + \epsilon \tag{3.4}$$
where $\mu_0$ is the reference value of the null hypothesis, $n$ is the sample size and $\epsilon$ represents a
standard normal random variable, and

$$\Phi(z) = \frac{1}{2}\left(1 + \mathrm{erf}\left(\frac{z}{\sqrt{2}}\right)\right) \tag{3.6}$$
is the standard normal cumulative probability function, where $\mathrm{erf}$ represents the error function.


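Then, by using the change of variable theorem, the following expression is obtained for the
probability density function of P-values at the limit of standardized test resolution ($d$):

$$f_P(P) = \frac{1}{2}\exp\left(-\frac{n d^2}{2}\right)\left[\exp\left(\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)\right) + \exp\left(-\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)\right)\right] \tag{3.18}$$
And from this function, the uncertainty in P-values is obtained as follows:

$$\sigma_P = \sqrt{\frac{1}{2}e^{-n d^2/2}\left(\int_0^1 P^2\,e^{\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,dP + \int_0^1 P^2\,e^{-\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,dP\right) - E(P)^2} \tag{3.21}$$
where

$$E(P) = \int_0^1 P\,f_P(P)\,dP = \frac{1}{2}e^{-n d^2/2}\left(\int_0^1 P\,e^{\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,dP + \int_0^1 P\,e^{-\sqrt{2n}\,d\,\mathrm{erf}^{-1}(1 - P)}\,dP\right) \tag{3.20}$$
Since no simple analytical expression is obtained for the standard deviation, the following
approximation can be used (see Figure 4):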

( )

(3.24)
Thus, the rule of thumb for an adequate significance level ($\alpha$) becomes:

√ √

(4.3)
which closely resembles the optimal significance levels of this test (see Figure 8):

( )

√ √

( )
(4.1)
and approximated as:
√ ( √ )
√ √
(4.2)


It is possible to avoid defining arbitrary significance levels and/or test resolutions by using the
concept of relevance instead of significance [8]. However, when the resolution of the test is
deemed important, minimization of total error [23] or determination of the P-value uncertainty
(Section 3) can be used to define an adequate significance level for the test.

Acknowledgment and Disclaimer

The author gratefully acknowledges Prof. Jaime Aguirre (Universidad Nacional de Colombia)
for reading the manuscript and suggesting improvements.

This report provides data, information and conclusions obtained by the author(s) from original scientific
research, based on the best knowledge available to the author(s). The main purpose of this publication is
to openly share scientific knowledge. Any mistake, omission, error or inaccuracy published, if any, is
completely unintentional.

This research did not receive any specific grant from funding agencies in the public, commercial, or non-
profit sectors.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC
4.0). Anyone is free to share (copy and redistribute the material in any medium or format) or adapt
(remix, transform, and build upon the material) this work under the following terms:
• Attribution: Appropriate credit must be given, providing a link to the license, and indicating if
changes are made. This can be done in any reasonable manner, but not in any way that suggests
endorsement by the licensor.
• Non-Commercial: This material may not be used for commercial purposes.

References

[1] Frost, J. (2020). Hypothesis Testing: An Intuitive Guide for Making Data Driven Decisions. Statistics
by Jim Publishing, State College (PA, USA).
https://statisticsbyjim.com/hypothesis-testing/hypothesis-testing-intuitive-guide/
[2] Hernandez, H. (2020). Formulation and Testing of Scientific Hypotheses in the presence of
Uncertainty. ForsChem Research Reports, 5, 2020-01. doi: 10.13140/RG.2.2.36317.97767.
[3] Hernandez, H. (2021). Optimal Significance Level and Sample Size in Hypothesis Testing. 1. Tests of
Means. ForsChem Research Reports, 6, 2021-06, 1-45. doi: 10.13140/RG.2.2.18643.09762.
[4] Fisher, R. A. (1934). Statistical Methods for Research Workers. 5th Ed. Revised and Enlarged. Oliver
and Boyd, Edinburgh. §12. The Normal Distribution. pp. 43-46. ark:/13960/t2v46472h.
[5] Mackowiak, P. A., & Worden, G. (1994). Carl Reinhold August Wunderlich and the Evolution of
Clinical Thermometry. Clinical Infectious Diseases, 18 (3), 458-467. JSTOR: 4457716.
[6] Mackowiak, P. A., Wasserman, S. S., & Levine, M. M. (1992). A critical appraisal of 98.6 F, the upper
limit of the normal body temperature, and other legacies of Carl Reinhold August Wunderlich.
JAMA, 268 (12), 1578-1580. doi: 10.1001/jama.1992.03490120092034.


[7] Ley, C., Heath, F., Hastie, T., Gao, Z., Protsiv, M., & Parsonnet, J. (2023). Defining usual oral
temperature ranges in outpatients using an unsupervised learning algorithm. JAMA Internal
Medicine, 183 (10), 1128-1135. doi: 10.1001/jamainternmed.2023.4291.
[8] Hernandez, H. (2025). Relevant vs. Significant Differences in Hypothesis Testing. ForsChem
Research Reports, 10, 2025-01, 1 - 17. doi: 10.13140/RG.2.2.30012.35200.
[9] Croarkin, C. & Tobias, P. (2012). NIST/SEMATECH e-Handbook of Statistical Methods. Chapter 2:
Measurement Process Characterization. Section 2.4: Gage R&R Studies. Part 2.4.5: Analysis of Bias.
2.4.5.1. Resolution. https://www.itl.nist.gov/div898/handbook/mpc/section4/mpc451.htm. Latest
access: February 6, 2025.
[10] Ross, S. M. (2004). Introduction to Probability and Statistics for Engineers and Scientists. 3rd Ed.
Elsevier Academic Press, San Diego CA. 8.3. Tests concerning the Mean of a Normal Population.
pp. 293-311. ISBN: 0125980574.
[11] Hernandez, H. (2022). Standard Deterministic, Standard Random, and Randomistic Variables.
ForsChem Research Reports, 7, 2022-06, 1 - 18. doi: 10.13140/RG.2.2.36316.87688.
[12] Hernandez, H. (2018). Expected Value, Variance and Covariance of Natural Powers of
Representative Standard Random Variables. ForsChem Research Reports, 3, 2018-08, 1-19. doi:
10.13140/RG.2.2.15187.07205.
[13] Hernandez, H. (2023). Representative Functions of the Standard Normal Distribution. ForsChem
Research Reports, 8, 2023-01, 1 - 29. doi: 10.13140/RG.2.2.29607.83362.
[14] Montgomery, D. C., & Runger, G. C. (2003). Applied Statistics and Probability for Engineers. 3rd
Edition. John Wiley & Sons, Inc., New York. 9.2. Tests on the Mean of a Normal Distribution,
Variance Known. pp. 289-300. ISBN: 9780471204541.
[15] Hernandez, H. (2017). Multivariate Probability Theory: Determination of Probability Density
Functions. ForsChem Research Reports, 2, 2017-13, 1-13. doi: 10.13140/RG.2.2.28214.60481.
[16] Cumming, G. (2008). Replication and p intervals: p values predict the future only vaguely, but
confidence intervals do much better. Perspectives on Psychological Science, 3 (4), 286-300. doi:
10.1111/j.1745-6924.2008.00079.x.
[17] Hung, H. J., O'Neill, R. T., Bauer, P., & Köhne, K. (1997). The behavior of the p-value when the
alternative hypothesis is true. Biometrics, 11-22. doi: 10.2307/2533093.
[18] Sackrowitz, H., & Samuel-Cahn, E. (1999). P values as random variables - Expected P values. The
American Statistician, 53 (4), 326-331. doi: 10.1080/00031305.1999.10474484.
[19] Bhattacharya, B., & Habtzghi, D. (2002). Median of the p-Value under the alternative hypothesis.
The American Statistician, 56 (3), 202-206. doi: 10.1198/000313002146.
[20] Cumming, G. (2012). Understanding the New Statistics: Effect Sizes, Confidence Intervals, and
Meta-Analysis. Routledge (Taylor & Francis), New York. doi: 10.4324/9780203807002.
[21] Cumming, G. (2014). The New Statistics: Why and How. Psychological Science, 25 (1), 7-29. doi:
10.1177/0956797613504966.
[22] Hernandez, H. (2021). Optimal Significance Level and Sample Size in Hypothesis Testing. 3. Large
Samples. ForsChem Research Reports, 6, 2021-08, 1-22. doi: 10.13140/RG.2.2.31487.33449.
[23] Hernandez, H. (2021). Optimal Significance Level and Sample Size in Hypothesis Testing. 7.
Implementation Remarks. ForsChem Research Reports, 6, 2021-12, 1-27. doi:
10.13140/RG.2.2.23632.64000.
