Statistical Significance
and the PHC Curve
Hideki Toyoda
Department of Psychology
Waseda University
Shinjuku City, Tokyo, Japan
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
In March 2019, two influential papers on statistics were published. One of them [1],
“Moving to a World Beyond ‘p < 0.05’”, appeared in The American Statistician, a
scholarly journal published by the American Statistical Association. The title of its
first section is “Don’t Say ‘Statistically Significant’”, and it uses the imperative
mood to forbid the use of significance testing outright. Continuing to use p-values is
like driving a car that the leading manufacturer has recalled and banned from use.
Another paper [2], “Retire statistical significance”, was published in the prestigious
scientific journal Nature. This commentary was signed by more than 800 scientists,
who endorsed its statement: “We agree, and call for the entire concept of statistical
significance to be abandoned.” Continuing to teach p-values is like driving a car
that the leading consumer advocacy group has advised against using.
The p-value is a probability, yet it is difficult to interpret intuitively. Numbers
reported as analysis results are more useful when even statistical novices can
understand them intuitively. Fortunately, such indicators are already in practical use
and are widely embedded in society’s infrastructure. For example, the spam filters
that screen our daily email rely on the “posterior probability of being spam.” Unlike
a p-value, this probability directly indicates the chance that an incoming email is
spam. It is expressed in terms everyone can understand and relate to, which makes it
a tangible probability. Why not adopt the same approach in research? Consider a
study comparing the duration of hospital stays under treatments A and B. Tradition-
ally, research conclusions have been stated as: “There was a statistically significant
difference at the 5% level in the average duration of hospital stays.” This phrasing
is quite abstract. Instead, we suggest tailoring research questions flexibly to the
specific objectives of each study and answering those questions directly.
For example,
• The average duration of hospital stays for Group A is at least half a day shorter
than for Group B.
• 71% of patients in Group A have shorter hospital stays than the average for
Group B.
• The average hospital stay for Group A is no more than 94% of that for
Group B.
Wouldn’t it be better to express conclusions in terms that both physicians and
patients can understand? And, like a spam filter, each such statement would come
with the probability that it is correct, so analysts and readers alike can directly
grasp how convincing the result is. This book explains the importance of using the
Probability that Hypothesis is Correct (PHC), an intuitive measure that anyone
can understand, as an alternative to the p-value.
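The spam-filter analogy rests on Bayes’ theorem, which turns a prior probability and the likelihood of the observed evidence into the probability that a statement (here, “this email is spam”) is correct. The sketch below is a minimal illustration of that calculation for a single word; the rates it uses are hypothetical numbers chosen for illustration, not figures from the book or from any real filter, and real filters combine many features rather than one.

```python
def posterior_spam(prior_spam: float, p_word_if_spam: float, p_word_if_ham: float) -> float:
    """Posterior probability that an email is spam, given that it contains a word.

    Bayes' theorem: P(spam | word) = P(word | spam) P(spam) / P(word),
    where P(word) = P(word | spam) P(spam) + P(word | ham) P(ham).
    """
    joint_spam = p_word_if_spam * prior_spam          # P(word and spam)
    joint_ham = p_word_if_ham * (1.0 - prior_spam)    # P(word and legitimate)
    return joint_spam / (joint_spam + joint_ham)


# Hypothetical rates for illustration only:
# 40% of incoming mail is spam; the word appears in 20% of spam and 1% of legitimate mail.
prob = posterior_spam(prior_spam=0.40, p_word_if_spam=0.20, p_word_if_ham=0.01)
print(f"P(spam | word) = {prob:.3f}")  # prints P(spam | word) = 0.930
```

Unlike a p-value, the printed number answers the question a reader actually asks: how likely is it that this particular email is spam?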
All the source code and data for each chapter are freely available in this book’s
GitHub repository: https://github.com/TOYODA-Hideki/Statistical-Significance-and-the-PHC-Curve.
This book was written in the following computing environment: Windows 11
(64-bit), R 4.2.2, Stan 2.31.0, and CmdStanR 0.5.3.
The script that executes the contents of Chaps. 1–4 is “post p01.R”, and the script
that executes the contents of Chaps. 5–7 is “post p02.R”.
The R working directory is “SSandPHC”.
Part I
The Brink of Statistical Science
Chapter 1
“Statistical Significance” Is Merely
a Necessary Condition
This intriguing story begins with a peculiar paper. Dr. Daryl Bem, an emeritus
professor at Cornell University and a renowned social psychologist, is well known for
his EBE (“Exotic Becomes Erotic”) theory of how sexual orientation develops, which
is featured in many textbooks.
In 2011, Bem unexpectedly published a paper titled “Feeling the Future” [5] (see
Fig. 1.1) in JPSP (Journal of Personality and Social Psychology), the most prestigious
journal in social psychology. The paper, with its somewhat strange and romantic title,
reported numerous experiments and their analyses, but the first experiment was
particularly noteworthy. That experiment, which involved nude photographs, went
roughly as follows:
On a computer screen, two curtains are displayed side by side. Participants are
instructed: “Behind one of the curtains is a nude photo of the opposite sex, and
behind the other, there is none. Please click on the curtain you think has the
photo.” After the participant clicks on the curtain they feel hides the nude photo,
a random number generator places the photo behind one of the two curtains. The
chosen curtain is then opened to check whether the photo is there. This counts as
one trial. Out of 1560 trials, 829 correctly identified the location of the nude photo.
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2024
H. Toyoda, Statistical Significance and the PHC Curve,
https://doi.org/10.1007/978-981-97-7748-8_1
Bem analyzed these data using a statistical method called significance testing,
and submitted “Feeling the Future” to JPSP on the strength of its statistically
significant results. Papers submitted to academic journals are usually subjected to
a review process, known as peer review, which assesses their scholarly value. Many
academic papers based on experiments and surveys have been accepted for publi-
cation because their significance tests yielded statistically significant results.
In significance testing, a probability known as the p-value is typically used, and
a result is considered “statistically significant” when this value falls below 5%. In
many fields of research, regardless of the purpose of the analysis or its subject matter,
the criterion of “whether the p-value is less than 0.05” has served as the standard for
judging results.
The p-value of Bem’s precognition experiment was 1.3%, satisfying the condition
p < 0.05. The JPSP reviewers faced a dilemma. Publishing a paper claiming the
existence of precognition could tarnish the reputation of a prestigious academic
journal. However, rejecting Bem’s paper solely because its conclusion seemed
implausible, even though it met the p < 0.05 criterion, would be neither fair nor
impartial. After much deliberation, the reviewers accepted Bem’s argument, and
“Feeling the Future” was eventually published in JPSP.
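To make the p-value above concrete: under the null hypothesis of no precognition, each trial is a hit with probability 0.5, so the number of hits in 1560 trials follows a binomial distribution. The sketch below computes a two-sided p-value for 829 hits in two common ways, an exact binomial test (summing the probabilities of all outcomes no more likely than the observed one, the convention R’s `binom.test` uses) and a normal approximation without continuity correction. This section does not say which variant produced the 1.3% quoted above, and the two results differ slightly in the third decimal place, so treat this as an illustration of the idea rather than a reproduction of Bem’s analysis.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial P(X = k), computed in log space so large n does not overflow (requires 0 < p < 1)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def binom_test_two_sided(x: int, n: int, p0: float = 0.5) -> float:
    """Exact two-sided test: total probability of outcomes no more likely than x under p0."""
    pmf = [binom_pmf(k, n, p0) for k in range(n + 1)]
    threshold = pmf[x] * (1 + 1e-7)  # small relative tolerance so exact ties are counted
    return min(1.0, sum(q for q in pmf if q <= threshold))


def normal_approx_two_sided(x: int, n: int, p0: float = 0.5) -> float:
    """Two-sided p-value from the normal approximation, without continuity correction."""
    z = (x - n * p0) / math.sqrt(n * p0 * (1 - p0))
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


x, n = 829, 1560  # hits and trials reported for Bem's first experiment
print(f"exact binomial test:  p = {binom_test_two_sided(x, n):.4f}")
print(f"normal approximation: p = {normal_approx_two_sided(x, n):.4f}")
```

Both values fall below 0.05, which is all the significance test asks; neither says anything about how probable it is that precognition exists.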
As expected, Bem’s paper caused a major stir and drew intense criticism from
many quarters. Psychologists immediately began replication experiments, but no one
could reproduce the results. Although its results were statistically significant, the
paper was psychologically meaningless.
1.2 The Purpose of Testing
The reason “Feeling the Future” was published in a prestigious journal of social
psychology was that it met the condition of p < 0.05 required by the significance
test. But what exactly is a significance test? There are many types of significance
tests. Here, we will take the binomial test used by Bem as an example and concretely
explain the general idea behind significance testing.
In statistics, an attempt whose outcome is determined probabilistically is called a
trial. Suppose we conduct ten trials in which the location of a nude photo is predicted.
The possible outcomes that can be observed are called events; hits and misses are
events. Suppose, for example, that the location was predicted correctly seven times.
The number of trials is denoted n, and the number of hits, called the number of
successes, is denoted x. The ratio of the number of successes to the number of trials
is called the sample proportion and is calculated as
$$\text{Sample Proportion} = \frac{\text{Number of Successes}}{\text{Number of Trials}} = \frac{7}{10} = 0.7. \tag{1.1}$$
The sample proportion is a ratio calculated from the data.
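Equation (1.1) translates directly into code. The minimal sketch below (the function name `sample_proportion` is our own, hypothetical choice) applies it both to the ten-trial example and to the 829 hits in 1560 trials from Bem’s first experiment.

```python
def sample_proportion(successes: int, trials: int) -> float:
    """Sample proportion = number of successes / number of trials, as in Eq. (1.1)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    return successes / trials


print(sample_proportion(7, 10))               # 0.7, the ten-trial example
print(f"{sample_proportion(829, 1560):.4f}")  # 0.5314, Bem's first experiment
```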
Some people might regard a 70% success rate in predicting the location of a nude
photo as evidence of precognitive ability. But does this sample proportion demonstrate
that precognitive ability exists? Not necessarily.
This is because, even with a fair, unbiased coin, getting heads 7 times out of 10
is not unusual. In trials such as coin tosses, which have only two possible outcomes
(heads or tails), the probabilities of the possible numbers of heads are given by a
probability distribution called the binomial distribution (Fig. 1.2). A probability
distribution shows how likely different values are to be observed, and is sometimes
simply called a distribution. The binomial distribution is one type of probability
distribution.
According to Fig. 1.2, the probability of getting exactly 7 heads in 10 tosses of a
fair coin is 11.7%. In other words, if the set of ten tosses were repeated many times,
this outcome would occur, on average, more than once in every ten repetitions. In
our example, the location of the nude photo was predicted correctly 7 times, but, as
with a coin toss, the predictions might not be as accurate next time. Therefore, looking
only at the sample proportion leads to endless debates such as “It’s high enough, so
there must be precognitive ability!” versus “No, it’s just a coincidence, like a coin toss.”
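The 11.7% figure comes from the binomial probability formula P(X = k) = C(n, k) p^k (1 − p)^(n − k) with n = 10 and p = 0.5. The sketch below tabulates the whole distribution for ten fair-coin tosses, the values a figure like Fig. 1.2 would plot, and then prints the probability of exactly 7 heads, which is 120/1024. Computing the formula directly is fine for small n; for n in the thousands a log-space version avoids floating-point overflow.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.5  # ten tosses of a fair coin
for k in range(n + 1):
    print(f"{k:2d} heads: {binomial_pmf(k, n, p):.4f}")

print(f"P(7 heads) = {binomial_pmf(7, n, p):.3f}")  # 120/1024, prints P(7 heads) = 0.117
```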
What matters most academically is the reproducibility of the phenomenon. The
key question is whether the average proportion of hits over an infinite number of
replication experiments under the same conditions is sufficiently high. If that is
demonstrated, it becomes clear to anyone that precognitive ability exists.
The average proportion mentioned here differs from the sample proportion calculated
from the current data, so to distinguish the two, we call it the population proportion
π (pi). Academically, it is the value of the population proportion, not the sample
proportion, that is of interest.
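The distinction between the sample proportion and the population proportion π can be seen by simulating replication experiments. In the sketch below, π is a hypothetical value that we fix ourselves (0.5, i.e. no precognition, equivalent to a fair coin); in a real study π is unknown. Each replication runs ten trials, and the “infinite number” of replications is approximated by 20,000. Individual sample proportions scatter widely from replication to replication, while their average settles close to π.

```python
import random


def simulate_sample_proportions(pi: float, n_trials: int, n_replications: int,
                                seed: int = 1) -> list[float]:
    """Sample proportion of hits in each of n_replications experiments of n_trials trials."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    proportions = []
    for _ in range(n_replications):
        hits = sum(rng.random() < pi for _ in range(n_trials))  # each trial is a hit with prob. pi
        proportions.append(hits / n_trials)
    return proportions


pi = 0.5  # hypothetical population proportion: no precognition
props = simulate_sample_proportions(pi, n_trials=10, n_replications=20_000)
print("first ten sample proportions:", props[:10])
print(f"average over replications: {sum(props) / len(props):.4f}")
```

A single replication can easily show 0.7 or 0.3; only the long-run average reveals π.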