What Is Hypothesis Testing?
In hypothesis testing, an analyst tests a statistical sample with the goal of providing evidence
on the plausibility of the null hypothesis. Statistical analysts test a hypothesis by measuring
and examining a random sample of the population being analysed. All analysts use a random
population sample to test two different hypotheses: the null hypothesis and the alternative
hypothesis. The null hypothesis is usually a hypothesis of equality between population
parameters; e.g., a null hypothesis may state that the population mean return is equal to zero.
The alternative hypothesis is effectively the opposite of the null hypothesis (e.g., the population
mean return is not equal to zero). Thus, the two hypotheses are mutually exclusive, and only one
can be true; however, one of them will always be true.
4 Steps of Hypothesis Testing
All hypotheses are tested using a four-step process:
The first step is for the analyst to state the two hypotheses so that only one can be right.
The next step is to formulate an analysis plan, which outlines how the data will be
evaluated.
The third step is to carry out the plan and physically analyse the sample data.
The fourth and final step is to analyse the results and either reject the null hypothesis, or
state that the null hypothesis is plausible, given the data.
Example of Hypothesis Testing
For example, a person wants to test that a penny has exactly a 50% chance of landing on
heads; the null hypothesis would be that 50% is correct, and the alternative hypothesis would
be that 50% is not correct.
Mathematically, the null hypothesis would be represented as H0: P = 0.5. The alternative
hypothesis would be denoted as Ha: P ≠ 0.5, meaning that the probability of landing on heads
does not equal 50%. A random sample of 100 coin
flips is taken, and the null hypothesis is then tested. If it is found that the 100 coin flips were
distributed as 40 heads and 60 tails, the analyst would conclude that a penny does not have a
50% chance of landing on heads and would reject the null hypothesis and accept the
alternative hypothesis.
If, on the other hand, there were 48 heads and 52 tails, then it is plausible that the coin could
be fair and still produce such a result. In cases such as this where the null hypothesis is
"accepted," the analyst states that the difference between the expected results (50 heads and
50 tails) and the observed results (48 heads and 52 tails) is "explainable by chance alone."
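These two decisions can be reproduced numerically. Below is a minimal sketch, assuming SciPy is available, that applies a normal-approximation z-test for a proportion to both outcomes; the 5% significance level is an assumption, since the example does not state one.

from math import sqrt
from scipy.stats import norm

# z-test of H0: P = 0.5 for the two coin-flip outcomes discussed above.
n, p0 = 100, 0.5
for heads in (40, 48):
    p_hat = heads / n
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error under H0
    p_value = 2 * norm.sf(abs(z))               # two-sided p-value
    decision = "reject H0" if p_value < 0.05 else "fail to reject H0"  # 5% level assumed
    print(f"{heads}/{n} heads: z = {z:.2f}, p = {p_value:.4f} -> {decision}")

With 40 heads the p-value falls below the 5% level, matching the rejection described above; with 48 heads it does not, so the difference is attributed to chance alone.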
Related Terms
Alpha risk: the risk, in a statistical test, of rejecting a null hypothesis when it is
actually true.
Test: in technical analysis, when a stock’s price approaches an established support or
resistance level set by the market.
Two-tailed test: a statistical test in which the critical region of a distribution is two-sided,
so that it detects whether a sample statistic is significantly greater than or significantly
less than a hypothesized value.
Goodness-of-fit test: determines how well sample data fit a hypothesized distribution;
the chi-square goodness-of-fit test is a popular example.
P-value: the level of marginal significance within a statistical hypothesis test,
representing the probability of obtaining a result at least as extreme as the one observed,
assuming the null hypothesis is true.
Z-test: a statistical test used to determine whether two population means are
different when the variances are known and the sample size is large.
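To illustrate the last term, here is a minimal sketch of a two-sample z-test; the sample means, known variances, and sample sizes are invented for the example.

from math import sqrt
from scipy.stats import norm

# Two-sample z-test: are two population means different, given known variances?
mean1, var1, n1 = 5.2, 4.0, 200   # hypothetical sample 1
mean2, var2, n2 = 4.8, 3.5, 180   # hypothetical sample 2

z = (mean1 - mean2) / sqrt(var1 / n1 + var2 / n2)
p_value = 2 * norm.sf(abs(z))     # two-sided: H0 says the means are equal
print(f"z = {z:.3f}, p = {p_value:.4f}")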
2. What is the general procedure of hypothesis testing?
To test a hypothesis means to tell (on the basis of the data the researcher has collected)
whether or not the hypothesis seems to be valid. In hypothesis testing the main question is
whether to accept the null hypothesis or to reject it. The procedure for
hypothesis testing refers to all those steps that we undertake for making a choice between the
two actions, i.e., rejection and acceptance of the null hypothesis. The various steps involved in
hypothesis testing are stated below:
(i) Making a formal statement: This step consists in making a formal statement of the null
hypothesis (H0) and also of the alternative hypothesis (Ha). This means that the hypotheses
should be clearly stated, considering the nature of the research problem. For instance, Mr
Mohan of the Civil Engineering Department wants to test whether the load-bearing capacity of
an old bridge exceeds 10 tons; in that case he can state his hypotheses as under:
Null hypothesis H0: μ = 10 tons
Alternative hypothesis Ha: μ > 10 tons
Take another example. The average score in an aptitude test administered at the national level
is 80. To evaluate a state’s education system, the average score of 100 of the state’s students
selected on a random basis was 75. The state wants to know if there is a significant difference
between the local scores and the national scores. In such a situation the hypotheses may be
stated as under:
Null hypothesis H0: μ = 80
Alternative hypothesis Ha: μ ≠ 80
The formulation of hypotheses is an important step which must be accomplished with due
care in accordance with the object and nature of the problem under consideration. It also
indicates whether we should use a one-tailed test or a two-tailed test. If Ha is of the type
“greater than” (or of the type “less than”), we use a one-tailed test, but when Ha is of the type
“whether greater or smaller”, we use a two-tailed test.
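The choice of tail changes only how the p-value is computed from the test statistic. A minimal sketch for the aptitude-test example above, where the population standard deviation of 15 is an assumed value (the example does not give one):

from math import sqrt
from scipy.stats import norm

# Aptitude-test example: H0: mu = 80 against a one- or two-tailed alternative.
mu0, x_bar, n = 80, 75, 100
sigma = 15                            # assumed known population std. deviation
z = (x_bar - mu0) / (sigma / sqrt(n))

p_two_tailed = 2 * norm.sf(abs(z))    # Ha: mu != 80
p_one_tailed = norm.cdf(z)            # Ha: mu < 80 (directional)
print(f"z = {z:.2f}, two-tailed p = {p_two_tailed:.5f}, one-tailed p = {p_one_tailed:.5f}")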
(ii) Selecting a significance level: The hypotheses are tested on a pre-determined level of
significance and as such the same should be specified. Generally, in practice, either 5% level
or 1% level is adopted for the purpose. The factors that affect the level of significance are: (a)
the magnitude of the difference between sample means; (b) the size of the samples; (c) the
variability of measurements within samples; and (d) whether the hypothesis is directional or
non-directional (A directional hypothesis is one which predicts the direction of the difference
between, say, means). In brief, the level of significance must be adequate in the context of the
purpose and nature of the enquiry.
(iii) Deciding the distribution to use: After deciding the level of significance, the next step
in hypothesis testing is to determine the appropriate sampling distribution. The choice
generally remains between normal distribution and the t-distribution. The rules for selecting
the correct distribution are similar to those which we have stated earlier in the context of
estimation.
(iv) Selecting a random sample and computing an appropriate value: Another step is to
select a random sample(s) and compute an appropriate value from the sample data concerning
the test statistic utilizing the relevant distribution. In other words, draw a sample to furnish
empirical data.
(v) Calculation of the probability: One has then to calculate the probability that the sample
result would diverge as widely as it has from expectations, if the null hypothesis were in fact
true.
(vi) Comparing the probability: Yet another step consists in comparing the probability thus
calculated with the specified value for α, the significance level. If the calculated probability is
equal to or smaller than α value in case of one-tailed test (and α /2 in case of two-tailed test),
then reject the null hypothesis (i.e., accept the alternative hypothesis), but if the calculated
probability is greater, then accept the null hypothesis. In case we reject H0, we run a risk
(at most the level of significance) of committing an error of Type I, but if we accept H0, then
we run some risk (the size of which cannot be specified as long as H0 happens to be
vague rather than specific) of committing an error of Type II.
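The six steps can be walked through end to end in code. The sketch below uses Mr Mohan's bridge example with invented load measurements (none are given in the text) and a one-sample t-test, since the population standard deviation is unknown:

from scipy.stats import ttest_1samp

# (i) Formal statement: H0: mu = 10 tons, Ha: mu > 10 tons (one-tailed).
# (ii) Significance level, chosen in advance.
alpha = 0.05
# (iii) Distribution: the t-distribution, as sigma is unknown and n is small.
# (iv) Random sample (hypothetical load-bearing measurements, in tons).
loads = [10.8, 11.2, 10.5, 11.9, 10.1, 11.4, 10.9, 11.6]
# (v) Probability of a result diverging this widely if H0 were true.
result = ttest_1samp(loads, popmean=10, alternative='greater')
# (vi) Compare the probability with alpha and decide.
if result.pvalue <= alpha:
    print(f"p = {result.pvalue:.4f} <= {alpha}: reject H0 (capacity exceeds 10 tons)")
else:
    print(f"p = {result.pvalue:.4f} > {alpha}: accept H0")

Per step (iii), a z-test with the normal distribution would be used instead if the population standard deviation were known.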
FLOW DIAGRAM FOR HYPOTHESIS TESTING
3. What is meant by Type I and Type II errors? Type I and Type II errors relate to the
decision taken about the null hypothesis. In a Type I (or type-1) error, the null hypothesis is
rejected even though it is true, whereas in a Type II (or type-2) error, the null hypothesis is not
rejected even though the alternative hypothesis is true. A Type I error is a false positive conclusion,
while a Type II error is a false negative conclusion. Making a statistical decision always
involves uncertainties, so the risks of making these errors are unavoidable in hypothesis
testing.
The probability of making a Type I error is the significance level, or alpha (α), while the
probability of making a Type II error is beta (β). These risks can be minimized through
careful planning in your study design. A Type I error occurs when the null hypothesis (H0) of
an experiment is true but is nonetheless rejected: the test asserts something that is not present, a
false hit. A Type I error is therefore often called a false positive (an event that indicates a given
condition is present when it is absent). In the language of folk tales, a person may see the
bear when there is none (raising a false alarm), where the null hypothesis (H0) contains the
statement "there is no bear". Type I error (false positive): the test result says you have coronavirus,
but you actually don’t.
Type II error (false negative): the test result says you don’t have coronavirus, but you
actually do.
Type 1 error, in statistical hypothesis testing, is the error caused by rejecting a null
hypothesis when it is true.
Type 1 error is caused when the hypothesis that should have been accepted is rejected.
The probability of a Type I error is denoted by α (alpha), which is also called the level of
significance of the test.
This type of error is a false positive error, in which the null hypothesis is rejected because of
some error during the testing.
The null hypothesis is set up to state that there is no relationship between two variables and
that any cause-effect relationship between them, if present, is caused by chance.
Type 1 error occurs when the null hypothesis is rejected even when there is no
relationship between the variables.
As a result of this error, the researcher might end up believing that the hypothesis works
even when it doesn’t.
Probability of type 1 error
The probability of Type I error is usually determined in advance and is understood as
the level of significance of testing the hypothesis.
If Type I error is fixed at 5 percent, it means that there are about 5 chances in 100 that
the null hypothesis, H0, will be rejected when it is true.
The rate or probability of a type 1 error is symbolized by α and is also termed the level
of significance of a test.
It is possible to reduce type 1 error at a fixed size of the sample; however, while doing
so, the probability of type II error increases.
There is a trade-off between the two errors, where decreasing the probability of one
error increases the probability of the other. It is not possible to reduce both errors
simultaneously.
Thus, depending on the type and nature of the test, the researchers need to decide the
appropriate level of type 1 error after evaluating the consequences of the errors.
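This reading of α can be checked by simulation, as in the sketch below: when H0 is true by construction, the long-run rejection rate should not exceed the chosen significance level. All the simulation settings are arbitrary.

import numpy as np
from scipy.stats import binomtest

# Simulate the Type I error rate: repeatedly test a genuinely fair coin.
rng = np.random.default_rng(0)
alpha, n_flips, n_experiments = 0.05, 100, 2000   # illustrative settings

false_positives = 0
for _ in range(n_experiments):
    heads = rng.binomial(n_flips, 0.5)            # H0 is true: the coin is fair
    if binomtest(heads, n=n_flips, p=0.5).pvalue < alpha:
        false_positives += 1

# For a discrete test the observed rate is close to, and at most, alpha.
print(f"observed Type I error rate: {false_positives / n_experiments:.3f}")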
Type 1 error causes
Type 1 error is caused when something other than the variable affects the other
variable, which results in an outcome that supports the rejection of the null
hypothesis.
Under such conditions, the outcome appears to have happened due to some cause other
than chance, when in fact it is caused by chance.
Before a hypothesis is tested, a probability is set as a level of significance which
means that the hypothesis is being tested while taking a chance where the null
hypothesis is rejected even when it is true.
Thus, a type 1 error might be due to chance, or to the level of significance set before the test
without considering the test duration and sample size.
Type II error:
A Type II error means not rejecting the null hypothesis when it’s actually false. This is not
quite the same as “accepting” the null hypothesis, because hypothesis testing can only tell
you whether to reject the null hypothesis. Instead, a Type II error means failing to conclude
there was an effect when there actually was one. In reality, your study may not have had
enough statistical power to detect an effect of a certain size.
Power is the extent to which a test can correctly detect a real effect when there is one. A
power level of 80% or higher is usually considered acceptable. The risk of a Type II error is
inversely related to the statistical power of a study. The higher the statistical power the lower
the probability of making a Type II error.
Example: Statistical power and Type II error. When preparing your clinical study, you
complete a power analysis and determine that, with your sample size, you have an 80%
chance of detecting an effect size of 20% or greater. An effect size of 20% means that the
drug intervention reduces symptoms by 20% more than the control treatment. However, a
Type II error may occur if the true effect is smaller than this size, since a smaller effect is
unlikely to be detected in your study due to inadequate statistical power.
Statistical power is determined by:
Size of the effect: Larger effects are more easily detected.
Measurement error: Systematic and random errors in recorded data reduce power.
Sample size: Larger samples reduce sampling error and increase power.
Significance level: Increasing the significance level increases power.
To (indirectly) reduce the risk of a Type II error, you can increase the sample size or the
significance level.
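The effect of sample size on power can be seen by simulation. In the sketch below, the true effect is assumed to be a shift of 0.3 standard deviations in the mean; all settings are illustrative.

import numpy as np
from scipy.stats import ttest_1samp

# Estimate power (1 - beta) of a one-sample t-test at several sample sizes.
rng = np.random.default_rng(0)
alpha, effect, n_experiments = 0.05, 0.3, 2000  # effect: assumed true mean shift in SD units

for n in (20, 50, 100, 200):
    rejections = 0
    for _ in range(n_experiments):
        sample = rng.normal(loc=effect, scale=1.0, size=n)  # Ha is true here
        if ttest_1samp(sample, popmean=0.0).pvalue < alpha:
            rejections += 1
    print(f"n = {n:3d}: estimated power = {rejections / n_experiments:.2f}")

As the sample size grows, the estimated power climbs toward 1, so the Type II error rate β = 1 − power falls correspondingly.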
Type II error rate
The distribution of results under the alternative hypothesis shows the probabilities of obtaining all
possible results if the study were repeated with new samples and the alternative hypothesis
were true in the population. The Type II error rate is beta (β), the portion of that distribution
falling on the non-rejection side of the decision threshold. The remaining area under the curve
represents statistical power, which is 1 − β. Increasing the statistical power of your test directly
decreases the risk of making a Type II error.
4. How to find the level of significance? To measure the level of statistical significance of the
result, the investigator first needs to calculate the p-value. This is the probability of observing an effect
at least as extreme as the one found, given that the null hypothesis is true. When the p-value is less than
the level of significance (α), the null hypothesis is rejected. If the observed p-value is not less than the
significance level α, then theoretically the null hypothesis is accepted. In practice, however, we often
increase the sample size and check whether we then reach the significance level. The general
interpretation of the p-value, based upon a level of significance of 10%:
If p > 0.1, there is no real evidence against the null hypothesis.
If 0.05 < p ≤ 0.1, there is low evidence against the null hypothesis.
If 0.01 < p ≤ 0.05, there is strong evidence against the null hypothesis.
If p ≤ 0.01, there is very strong evidence against the null hypothesis.
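These bands translate directly into a small helper function; the cut-offs are the ones listed above.

# Map a p-value to the interpretation bands listed above.
def interpret_p_value(p: float) -> str:
    if p > 0.1:
        return "no real evidence against the null hypothesis"
    if p > 0.05:
        return "low evidence against the null hypothesis"
    if p > 0.01:
        return "strong evidence against the null hypothesis"
    return "very strong evidence against the null hypothesis"

for p in (0.20, 0.07, 0.03, 0.004):
    print(f"p = {p}: {interpret_p_value(p)}")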
Confidence level: The probability that if a poll/test/survey were repeated over and over
again, the results obtained would be the same. The confidence level equals 1 − α. The level
of significance (α) indicates the probability of error that the researcher accepts in making the
decision to reject or support the null hypothesis; it can be interpreted as the error rate
tolerated by the researcher, arising from the possibility of error in drawing the sample
(sampling error). The level of significance is expressed as a percentage and denoted by α.
For example, α may be assigned a significance level of 5% or 10%. That is, the researcher’s
decision to reject or support the null hypothesis has a probability of error of 5% or 10%. With
a confidence level of 95%, the level of certainty that the sample statistics correctly estimate
the population parameters is 95%, or the confidence in rejecting or supporting the null
hypothesis correctly is 95%. In some computer-based statistical programs, the level of
significance is always included and is written as Sig. (= significance); in other programs it is
written as P-value. The Sig. or P-value, as described above, is a calculated error probability,
i.e., it indicates the true probability level of error. This error rate is used as the basis for
making decisions in hypothesis testing.
The level of confidence, meanwhile, shows the degree of trustworthiness with which the
sample statistics can correctly estimate the population parameters and/or the extent to which
decisions about the result of the null hypothesis test can be relied upon. In statistics, the
confidence level ranges from 0 to 100% and is denoted by 100(1 − α)%. Conventionally,
researchers in the social sciences often set confidence levels in the range of 95%–99%.
The confidence level, also known as the risk level, is based on an idea that comes from
the Central Limit Theorem.
The level of confidence is denoted by 100(1 – α)%. The main idea that comes from the
theorem is that if samples are repeatedly drawn from a population, then the average attribute
value obtained from these samples clusters around the actual population value. Furthermore,
the values obtained from the samples that have been drawn are normally distributed around
the true value: some sample values are higher and some lower than the population value.
In a normal distribution, about 95% of sample values lie within two standard deviations of
the true population value. In other words, if a 95% confidence level is chosen, then about
95 out of 100 samples will have the true population value within the range of precision
specified previously.
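This coverage interpretation can be demonstrated by simulation. A minimal sketch, assuming a normal population with a known standard deviation so that the textbook z-interval applies; the population parameters are invented:

import numpy as np
from scipy.stats import norm

# Check that about 95% of confidence intervals cover the true population mean.
rng = np.random.default_rng(0)
mu, sigma, n, n_samples = 50.0, 10.0, 40, 1000  # hypothetical population and design
z_crit = norm.ppf(0.975)                        # two-sided 95% critical value (~1.96)

covered = 0
for _ in range(n_samples):
    sample = rng.normal(mu, sigma, size=n)
    half_width = z_crit * sigma / np.sqrt(n)    # known-sigma z-interval
    if abs(sample.mean() - mu) <= half_width:
        covered += 1

print(f"coverage: {covered / n_samples:.3f} (expected about 0.95)")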
The relationship between α and the confidence level is that the stated confidence level is a
percentage equivalent to the decimal value of 1- α, and vice versa. In addition to the level of
confidence, in research, the confidence coefficient is often used. The confidence coefficient is
the level of confidence expressed as a proportion, not as a percentage. For example, if you
have a 99% confidence level, the confidence coefficient will be 0.99. In general, the higher
the coefficient, the more confident we are that our test results are accurate. For example, the
coefficient 0.99 is more accurate than the coefficient 0.89. It is rare to see a coefficient of 1
(meaning we are positive beyond a doubt that our results are completely 100% accurate), and
a coefficient of zero means that we are not at all sure that our results are accurate. The table
below lists the confidence coefficient and the corresponding confidence level.
Confidence coefficient (1 – α)    Confidence level 100(1 – α)%
0.90                              90%
0.95                              95%
5. (a) How can we test the hypothesis that a particular coin is balanced? (b) What is the
meaning of Type I and Type II errors in this case?
In hypothesis testing a decision between two alternatives, one of which is called the null
hypothesis and the other the alternative hypothesis, must be made. As an example, suppose
you are asked to decide whether a coin is fair or biased in favor of heads. In this situation the
statement that the coin is fair is the null hypothesis while the statement that the coin is biased
in favor of heads is the alternative hypothesis. To make the decision an experiment is
performed. For example, the experiment might consist of tossing the coin 10 times, and on
the basis of the 10 outcomes you would make a decision either to accept the null
hypothesis or to reject it (and therefore accept the alternative hypothesis). So,
in hypothesis testing, acceptance or rejection of the null hypothesis is based on a decision
rule. As an example of a decision rule, you might decide to reject the null hypothesis and
accept the alternative hypothesis if 8 or more heads occur in 10 tosses of the coin.
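For part (b), the two error probabilities of this decision rule follow directly from the binomial distribution. A Type I error is rejecting a fair coin; a Type II error is failing to reject a biased one. In the sketch below, the bias p = 0.8 is an assumed alternative chosen for illustration:

from scipy.stats import binom

# Decision rule: reject H0 (fair coin) if 8 or more heads occur in 10 tosses.
n, cutoff = 10, 8

# Type I error: rejecting H0 when the coin really is fair (p = 0.5).
alpha = binom.sf(cutoff - 1, n, 0.5)   # P(X >= 8 | p = 0.5), about 0.0547

# Type II error: failing to reject H0 when the coin is biased, say p = 0.8 (assumed).
beta = binom.cdf(cutoff - 1, n, 0.8)   # P(X <= 7 | p = 0.8), about 0.3222

print(f"alpha = {alpha:.4f}, beta (at p = 0.8) = {beta:.4f}")

Tightening the rule (say, requiring 9 or more heads) lowers α but raises β, illustrating the trade-off between the two errors noted earlier.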