PSNM - Ch. 3
Statistical Hypotheses:
For example, suppose we wanted to determine whether a coin was fair and
balanced. A null hypothesis might be that half the flips would result in Heads and
half, in Tails. The alternative hypothesis might be that the number of Heads and Tails
would be very different. Symbolically, these hypotheses would be expressed as
H0: P = 0.5
Ha: P ≠ 0.5
Suppose we flipped the coin 50 times, resulting in 40 Heads and 10 Tails. Given this
result, we would be inclined to reject the null hypothesis. We would conclude, based
on the evidence, that the coin was probably not fair and balanced.
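The coin example can be checked numerically. The sketch below assumes a normal approximation to the sampling distribution of the proportion of Heads (the text does not name a test method); the variable names are illustrative only.

```python
import math
from statistics import NormalDist

# Coin example: 50 flips, 40 Heads, H0: P = 0.5
n, heads, p0 = 50, 40, 0.5

p_hat = heads / n                      # observed proportion of Heads
se = math.sqrt(p0 * (1 - p0) / n)      # standard error of the proportion under H0
z = (p_hat - p0) / se                  # test statistic (normal approximation, an assumption)

# Two-sided P-value, because Ha is P != 0.5
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, two-sided P-value = {p_value:.6f}")
```

A z-score this far from zero corresponds to a very small P-value, which is why the result leads us to reject the null hypothesis.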
State the logical alternative to this hypothesis. This is called the alternate
hypothesis and is designated Ha.
(Note that the alternate hypothesis can take other forms, since "not
equal" covers both 𝜇 > 10 and 𝜇 < 10.)
Hypothesis Tests
State the hypotheses. This involves stating the null and alternative
hypotheses. The hypotheses are stated in such a way that they are mutually
exclusive. That is, if one is true, the other must be false.
Formulate an analysis plan. The analysis plan describes how to use sample
data to evaluate the null hypothesis. The evaluation often focuses around a
single test statistic.
Analyze sample data. Find the value of the test statistic (mean score,
proportion, t statistic, z-score, etc.) described in the analysis plan.
Interpret results. Apply the decision rule described in the analysis plan. If the
value of the test statistic is unlikely, based on the null hypothesis, reject the
null hypothesis.
Decision Errors
Type I error. A Type I error occurs when the researcher rejects a null
hypothesis when it is true. The probability of committing a Type I error is
called the level of significance. This probability is also called alpha, and is
often denoted by α.
Type II error. A Type II error occurs when the researcher fails to reject a null
hypothesis that is false. The probability of committing a Type II error is called
Beta, and is often denoted by β. The probability of not committing a Type II
error is called the Power of the test.
the t-test
Assumptions: z-test
the underlying distribution is normal or the Central Limit Theorem can
be assumed to hold
the sample has been randomly selected
the population standard deviation is known or the sample size is at least
25.
Assumptions: t-test
the underlying distribution is normal or the Central Limit Theorem can
be assumed to hold
the sample has been randomly selected
One-Tailed and Two-Tailed Tests
A test of a statistical hypothesis, where the region of rejection is on only one side of
the sampling distribution, is called a one-tailed test. For example, suppose the null
hypothesis states that the mean is less than or equal to 10. The alternative
hypothesis would be that the mean is greater than 10. The region of rejection would
consist of a range of numbers located on the right side of sampling distribution; that
is, a set of numbers greater than 10.
Decision Rules
The analysis plan includes decision rules for rejecting the null hypothesis. In
practice, statisticians describe these decision rules in two ways - with reference to a
P-value or with reference to a region of acceptance.
2. Compute M = ΣX/N.
3. Compute σM = σ/√N.
6. If the calculated value Zc is greater than the table value Zt,
we reject the null hypothesis. Otherwise we do not reject the null hypothesis.
This section explains how to compute a significance test for the mean of a normally-
distributed variable for which the population standard deviation (σ) is known. In
practice, the standard deviation is rarely known. However, learning how to compute
a significance test when the standard deviation is known is an excellent introduction
to how to compute a significance test in the more realistic situation in which the
standard deviation has to be estimated.
1. The first step in hypothesis testing is to specify the null hypothesis and the
alternate hypothesis. In testing hypotheses about µ, the null hypothesis is a
hypothesized value of µ. Suppose the mean score of all 10-year old children
on an anxiety scale were 7. If a researcher were interested in whether 10-year
old children with alcoholic parents had a different mean score on the anxiety
scale, then the null and alternative hypotheses would be:
H0: µ alcoholic = 7
H1: µ alcoholic ≠ 7
2. The second step is to choose a significance level. Assume the 0.05 level is
chosen.
The fourth step is to compute p, the probability (or probability value) of obtaining a
difference between M and the hypothesized value of µ (7.0) as large or larger than
the difference obtained in the experiment. Applying the general formula to this
problem,
The sample size (N) and the population standard deviation (σ) are needed to
calculate σM. Assume that N = 16 and σ = 2.0. Then σM = σ/√N = 2.0/√16 = 0.50 and, with M = 8.1,
Z = (8.1 − 7) / 0.50 = 2.2
5. Since Zc = 2.2 is greater than Zt = 1.96 (the two-tailed table value at the 0.05 level),
we reject the null hypothesis. It is concluded that the mean anxiety score of
10-year-old children with alcoholic parents is higher than the population mean.
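The steps of this example can be reproduced in a few lines. This is a minimal sketch using the numbers given in the text (µ = 7, M = 8.1, σ = 2.0, N = 16); the two-sided probability value is computed with the standard library's normal distribution, and the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu0 = 7.0      # hypothesized population mean
M = 8.1        # sample mean
sigma = 2.0    # known population standard deviation
N = 16         # sample size
alpha = 0.05   # significance level

sigma_M = sigma / math.sqrt(N)          # standard error of the mean
z = (M - mu0) / sigma_M                 # test statistic

# Probability of a difference at least this large in either direction
p = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"sigma_M = {sigma_M:.2f}, Z = {z:.2f}, p = {p:.4f}")
print("reject H0" if p < alpha else "do not reject H0")
```

Comparing p with the significance level gives the same decision as comparing Zc with Zt.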
The probability of not committing a Type II error is called the power of a hypothesis
test.
Effect Size
To compute the power of the test, one offers an alternative view about the "true"
value of the population parameter, assuming that the null hypothesis is false. The
effect size is the difference between the true value and the value specified in the
null hypothesis.
For example, suppose the null hypothesis states that a population mean is equal to
100. A researcher might ask: What is the probability of rejecting the null hypothesis if
the true population mean is equal to 90? In this example, the effect size would be 90
- 100, which equals -10.
Sample size (n). Other things being equal, the greater the sample size, the
greater the power of the test.
Significance level (α). The higher the significance
level, the higher the power of the test. If you increase the significance level,
you reduce the region of acceptance. As a result, you are more likely to reject
the null hypothesis. This means you are less likely to accept the null
hypothesis when it is false; i.e., less likely to make a Type II error. Hence, the
power of the test is increased.
The "true" value of the parameter being tested. The greater the difference
between the "true" value of a parameter and the value specified in the null
hypothesis, the greater the power of the test. That is, the greater the effect
size, the greater the power of the test.
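The effect of these three factors on power can be illustrated numerically. The sketch below assumes a two-sided one-sample z-test with known σ; the helper name power_two_sided_z and the values σ = 20 and n = 16 are hypothetical, chosen only to go with the example of a null mean of 100 and a true mean of 90.

```python
import math
from statistics import NormalDist

def power_two_sided_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a two-sided one-sample z-test when the true mean is mu_true."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)        # boundary of the region of acceptance
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))  # effect size in standard-error units
    # Probability that the z statistic falls outside [-z_crit, z_crit]
    return std_normal.cdf(-z_crit - shift) + (1 - std_normal.cdf(z_crit - shift))

# Hypothetical baseline: H0 mean 100, true mean 90, sigma 20, n 16
base = power_two_sided_z(100, 90, sigma=20, n=16)
bigger_n = power_two_sided_z(100, 90, sigma=20, n=64)
bigger_alpha = power_two_sided_z(100, 90, sigma=20, n=16, alpha=0.10)
bigger_effect = power_two_sided_z(100, 80, sigma=20, n=16)

print(f"baseline power      = {base:.3f}")
print(f"larger sample       = {bigger_n:.3f}")
print(f"higher alpha        = {bigger_alpha:.3f}")
print(f"larger effect size  = {bigger_effect:.3f}")
```

Each change (larger n, higher α, larger effect size) should raise the power relative to the baseline, in line with the three statements above.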
Problem 1
Other things being equal, which of the following actions will reduce the power of a
hypothesis test?
(A) I only
(B) II only
(C) III only
(D) All of the above
(E) None of the above
Solution
The correct answer is (C). Increasing sample size makes the hypothesis test more
sensitive - more likely to reject the null hypothesis when it is, in fact, false. Increasing
the significance level reduces the region of acceptance, which makes the hypothesis
test more likely to reject the null hypothesis, thus increasing the power of the test.
Since, by definition, power is equal to one minus beta, the power of a test will get
smaller as beta gets bigger.
Problem 2
(A) I only
(B) II only
(C) III only
(D) All of the above
(E) None of the above
Solution
The correct answer is (A). Increasing sample size makes the hypothesis test more
sensitive - more likely to reject the null hypothesis when it is, in fact, false. Thus, it
increases the power of the test. The effect size is not affected by sample size. And
the probability of making a Type II error gets smaller, not bigger, as sample size
increases.
Problem 1
(A) I only
(B) II only
(C) III only
(D) IV only
(E) V only
Solution The correct answer is (E). The P-value is the probability of observing a
sample statistic as extreme as the test statistic. It can be greater than the
significance level, but it can also be smaller than the significance level. It is not
computed from the significance level, it is not the parameter in the null hypothesis,
and it is not a test statistic.
Example for Hypothesis Test for a Proportion
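H0 : P = 1/2
H1 : P ≠ 1/2 (two-tailed test)
Difference = p − P = 280/500 − 1/2 = 0.06
S.E. of p = √(PQ/n) = √((1/2 × 1/2)/500) = 0.02236
Z = Difference / S.E. = 0.06 / 0.02236 = 2.68 > 2.58
Since Z exceeds 2.58, the null hypothesis is rejected,
i.e. the proportion of births of boys and girls may not be regarded equal.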
Interpret Results
If the sample findings are unlikely, given the null hypothesis, the researcher rejects
the null hypothesis. Typically, this involves comparing the P-value to the significance
level, and rejecting the null hypothesis when the P-value is less than the significance
level.
Problem 1
One-Tailed Test
Suppose the previous example is stated a little bit differently. Suppose the
CEO claims that at least 80 percent of the company's 1,000,000 customers are
very satisfied. Again, 100 customers are surveyed using simple random
sampling. The result: 73 percent are very satisfied. Based on these results,
should we accept or reject the CEO's hypothesis? Assume a significance level
of 0.05. (5%)
Solution: The solution to this problem takes four steps: (1) state the hypotheses, (2)
formulate an analysis plan, (3) analyze sample data, and (4) interpret results. We
work through those steps below:
State the hypotheses. The first step is to state the null hypothesis and an
alternative hypothesis.
Null hypothesis: P ≥ 0.80
Alternative hypothesis: P < 0.80
Note that these hypotheses constitute a one-tailed test. The null hypothesis
will be rejected only if the sample proportion is too small.
Formulate an analysis plan. For this analysis, the significance level is 0.05.
The test method, shown in the next section, is a one-sample z-test.
Analyze sample data. Using sample data, we calculate the standard
deviation (σ) and compute the z-score test statistic (z).
σ = √(P(1 − P)/n) = √(0.80 × 0.20/100) = 0.04
z = (p − P)/σ = (0.73 − 0.80)/0.04 = -1.75
Since we have a one-tailed test, the P-value is the probability that the z-score
is less than -1.75. We use the Normal Distribution Calculator to find
P(z < -1.75) = 0.04. Thus, the P-value = 0.04.
Interpret results. Since the P-value (0.04) is less than the significance level
(0.05), we reject the null hypothesis.
Note: If you use this approach on an exam, you may also want to mention why this
approach is appropriate. Specifically, the approach is appropriate because the
sampling method was simple random sampling, the sample included at least 10
successes and 10 failures, and the population size was at least 10 times the sample
size.
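The calculation in this problem can be sketched as follows, using the figures from the problem statement (claimed proportion 0.80, sample proportion 0.73, n = 100). The standard library's normal distribution stands in here for the Normal Distribution Calculator mentioned in the solution; that substitution is an assumption of this sketch.

```python
import math
from statistics import NormalDist

P0 = 0.80      # proportion claimed under the null hypothesis
p_hat = 0.73   # sample proportion
n = 100        # sample size
alpha = 0.05   # significance level

sigma = math.sqrt(P0 * (1 - P0) / n)   # standard deviation of the sampling distribution
z = (p_hat - P0) / sigma               # z-score test statistic

# One-tailed (left-tail) P-value: probability of a z-score this small or smaller
p_value = NormalDist().cdf(z)

print(f"sigma = {sigma:.3f}, z = {z:.2f}, P-value = {p_value:.3f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```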
Example: One Sample Hypothesis Test
H0: µ= 500
Ha: µ ≠ 500
Zc > Zt
we reject the null hypothesis.
Thus, we conclude that the population mean is not 500; that is we reject the
null hypothesis and accept the alternate, concluding that the mean is not 500.
Thus, we conclude that the actual mean score for the population from which
this sample was drawn falls between 507 and 585.
Using the t-table with 15 degrees of freedom, we find the closest t-value to
1.53 is 1.753
4. A sample of 400 students has a mean height of 171.38 cm. Can it be
reasonably regarded as a random sample from a large population with mean
height 171.17 cm and standard deviation 3.3 cm?
Ans:
H0 : μ = 171.17
H1 : μ ≠ 171.17
Difference = x̄ − μ = 171.38 − 171.17 = 0.21
S.E. of x̄ = σ/√n = 3.3/√400 = 0.165
Z = Diff. / S.E. = 0.21 / 0.165 = 1.27 < 1.96
Therefore, the sample may be regarded as a random sample from a population with
mean 171.17.
Ans:
H0 : μ = 65.8
H1 : μ ≠ 65.8
Difference = x̄ − μ = 66 − 65.8 = 0.2
As the S.D. of the population is not known and the sampling fraction n/N = 100/1200 = 0.08 is more
than 0.05, we use the formula for S.E. with the finite population correction, which gives S.E. = 0.12.
Z = Diff. / S.E. = 0.2 / 0.12 = 1.67 < 1.96
Suppose we take a random sample from the distribution of population 1 and another
from population 2. If we then subtract the two sample means, we get x̄1 − x̄2. This
difference will be positive if x̄1 is larger than x̄2 and negative if x̄2 is larger than x̄1.
The mean of the sampling distribution of the difference between sample means is
symbolized as μ(x̄1 − x̄2), or simply μ1 − μ2.
If μ1 = μ2, then μ(x̄1 − x̄2) = 0.
The standard deviation of the distribution of the difference between the sample
means is called the standard error of the difference between two means and is
calculated by using this formula:
σd = √(σ1²/n1 + σ2²/n2)
where σ1² = variance of population 1,
σ2² = variance of population 2,
and d = x̄1 − x̄2.
If two population standard deviations are not known, we can estimate the standard
error of the difference between two means by using the formula
σ̂d = √(σ̂1²/n1 + σ̂2²/n2)
where σ̂1² = estimated variance of population 1 and
σ̂2² = estimated variance of population 2.
When both sample sizes are greater than 30 we have to do two-tailed test of a
hypothesis about the difference between two means.
Steps:
(Often, researchers choose significance levels equal to 0.01, 0.05, or 0.10; but any
value between 0 and 1 can be used)
Example 1 The mean height of 50 male students who showed above average
participation in college athletics was 68.2 inches with a standard deviation of 2.5
inches; while 50 male students who showed no interest in such participation had a
mean height of 67.5 inches with a standard deviation of 2.8 inches.
(i)Test the hypothesis that male students who participate in college athletics are
taller than other male students.
(ii)By how much should the sample size of each of the two groups be increased in
order that the observed difference of 0.7 inches in mean heights be significant at the
5% level of significance.
Solution. Let 𝑋1 and 𝑋2 denote the height (in inches) of athletic participants and non-
athletic participants respectively. In the usual notations, we are given:
s1 = 2.5, n1 = 50, x̄1 = 68.2, s2 = 2.8, n2 = 50, x̄2 = 67.5
σ̂d = √(s1²/n1 + s2²/n2) = √(6.25/50 + 7.84/50) = 0.53
H0 : μ1 = μ2
H1 : μ1 > μ2
z = (x̄1 − x̄2)/σ̂d = (68.2 − 67.5)/0.53 = 1.32
For a right-tailed test, the critical value of z at the 5% level of significance is 1.645.
(i) Since the calculated value of z (= 1.32) is less than the critical value (= 1.645), it is
not significant at the 5% level of significance. Hence, the null hypothesis is not rejected:
the data do not show that the college athletes are taller than other male students.
(ii) The difference between the mean heights of two groups, each of size n, will be
significant at the 5% level of significance if z ≥ 1.645
=> (68.2 − 67.5) / √(6.25/n + 7.84/n) ≥ 1.645
or 0.7 / √(14.09/n) ≥ 1.645
=> n ≥ (1.645 × 3.754 / 0.7)² ≈ 78, where 3.754 = √14.09.
Hence the sample size of each of the two groups should be increased by at least
78 − 50 = 28, in order that the difference between the mean heights of the two groups is
significant.
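Both parts of this example can be verified with a short script. This is a sketch using the sample figures given above and the right-tailed critical value 1.645; the variable names are illustrative.

```python
import math

s1, n1, mean1 = 2.5, 50, 68.2   # athletic participants
s2, n2, mean2 = 2.8, 50, 67.5   # non-participants
z_crit = 1.645                  # right-tailed critical value at the 5% level

# Part (i): estimated standard error of the difference and the z statistic
se_d = math.sqrt(s1**2 / n1 + s2**2 / n2)
z = (mean1 - mean2) / se_d
print(f"standard error = {se_d:.2f}, z = {z:.2f}, significant: {z >= z_crit}")

# Part (ii): smallest common group size n with 0.7 / sqrt((s1^2 + s2^2)/n) >= 1.645
diff = mean1 - mean2
n_min = math.ceil(z_crit**2 * (s1**2 + s2**2) / diff**2)
print(f"required n per group = {n_min}, increase = {n_min - n1}")
```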
Example 2 Two independent samples of observations were collected. For the first
sample of 60 elements, the mean was 86 and the standard deviation 6.The second
sample of 75 elements had a mean of 82 and a standard deviation of 9.
(a)Compute the estimation standard error of the difference between the two means.
(b)Using 𝛼 = 0.01, test whether the two samples can reasonably be considered to
have come from populations with the same mean.
Solution. s1 = 6, n1 = 60, x̄1 = 86, s2 = 9, n2 = 75, x̄2 = 82
(a) σ̂d = √(s1²/n1 + s2²/n2) = √(36/60 + 81/75) = 1.296
(b) H0 : μ1 = μ2
H1 : μ1 ≠ μ2
The limits of the acceptance region are z = ±2.58, or x̄1 − x̄2 = 0 ± z σ̂d = ±2.58(1.296) = ±3.344.
Because the observed z value = ((x̄1 − x̄2) − (μ1 − μ2)H0) / σ̂d = ((86 − 82) − 0) / 1.296 = 3.09 > 2.58
(or x̄1 − x̄2 = 4 > 3.344), we reject H0.
It is reasonable to conclude that the two samples come from different populations.
Suppose two independent small samples of size n1 and n2 are drawn from two
normal populations and the means of the samples are x̄1 and x̄2 respectively. If we
want to test the hypothesis that the population means are equal, we can apply the t-test in
the following way.
Steps:
The table below shows three sets of null and alternative hypotheses. Each makes a
statement about the difference d between the mean of one population μ 1 and the
mean of another population μ 2. (In the table, the symbol ≠ means " not equal to ".)
Set   Null hypothesis   Alternative hypothesis   Number of tails
1     μ1 - μ2 = d       μ1 - μ2 ≠ d              2
2     μ1 - μ2 > d       μ1 - μ2 < d              1
3     μ1 - μ2 < d       μ1 - μ2 > d              1
When the null hypothesis states that there is no difference between the two
population means (i.e., d = 0), the null and alternative hypothesis are often stated in
the following form.
H0: μ 1 = μ2
Ha: μ 1 ≠ μ2
2. Choose the appropriate distribution and the critical value.
Under the assumption that both populations have the same variance,
t = |x̄1 − x̄2| / (S √(1/n1 + 1/n2)) = (|x̄1 − x̄2| / S) × √(n1 n2 / (n1 + n2))
where S² = (1/(n1 + n2 − 2)) {Σ(x1 − x̄1)² + Σ(x2 − x̄2)²}
= (1/(n1 + n2 − 2)) {n1 S1² + n2 S2²}
and S1² = (1/n1) Σ(x1 − x̄1)², S2² = (1/n2) Σ(x2 − x̄2)².
To test H0 : μ1 = μ2 against H1 : μ1 ≠ μ2, the value of t is computed from the given data and compared
with the table value of t on n1 + n2 − 2 degrees of freedom and at the required level of
significance. The decision regarding acceptance or rejection of the hypothesis is
then taken.
4. Sketch the distribution and mark the sample value and critical values.
Example 1. Samples of two types of electric bulbs were tested for length of life and
following data were obtained.
Type I Type II
Number of Units 8 7
Mean (in hours) 1134 1024
S.D.(in hours) 35 40
Test at 5% level whether the difference in the sample means is significant.
Solution. Here, n1 = 8, x̄1 = 1134, S1 = 35
n2 = 7, x̄2 = 1024, S2 = 40
S² = (1/(n1 + n2 − 2)) {n1 S1² + n2 S2²}
= (1/(8 + 7 − 2)) {8(35)² + 7(40)²}
= 1615.38
Therefore, S = √1615.38 = 40.192
t = (|x̄1 − x̄2| / S) × √(n1 n2 / (n1 + n2))
= (|1134 − 1024| / 40.192) × √((8 × 7)/(8 + 7))
= 5.288
D.f. = n1 + n2 − 2 = 13
The table value of t at the 5% level for 13 d.f. is 2.160, which the calculated value 5.288 exceeds.
Therefore, H0 is rejected.
Hence,the two types of bulbs differ significantly so far as their mean lives are
concerned.
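The pooled t computation for the bulb data can be written as a small function. This is a sketch under the document's convention that S1 and S2 are computed with divisor n, so that n1 S1² + n2 S2² is the pooled sum of squares. The helper name pooled_t is hypothetical, and the critical value 2.160 is the two-tailed table value of t for 13 d.f. at the 5% level.

```python
import math

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Pooled-variance t statistic from summary data (sd computed with divisor n)."""
    df = n1 + n2 - 2
    s_squared = (n1 * sd1**2 + n2 * sd2**2) / df     # pooled variance estimate S^2
    s = math.sqrt(s_squared)
    t = abs(mean1 - mean2) / s * math.sqrt(n1 * n2 / (n1 + n2))
    return t, df, s_squared

# Bulb example: Type I (8 units) versus Type II (7 units)
t, df, s_squared = pooled_t(8, 1134, 35, 7, 1024, 40)
t_table = 2.160   # table value, 13 d.f., 5% level, two-tailed

print(f"S^2 = {s_squared:.2f}, t = {t:.3f}, d.f. = {df}")
print("reject H0" if t > t_table else "do not reject H0")
```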
Example 2. Below are given the gain in weights (in lbs) of cows fed on two diets X
and Y.
Diet X   25  32  30  32  24  14  32
Diet Y   24  34  22  30  42  31  40  30  32  35
Test at 5% level whether the two diets differ as regard their effects on mean
increase in weight.
Solution. H0 : μ1 = μ2
H1 : μ1 ≠ μ2
x1      x2      x1 − x̄1   (x1 − x̄1)²   x2 − x̄2   (x2 − x̄2)²
25      24      -2         4             -8         64
32      34      5          25            2          4
30      22      3          9             -10        100
32      30      5          25            -2         4
24      42      -3         9             10         100
14      31      -13        169           -1         1
32      40      5          25            8          64
        30                               -2         4
        32                               0          0
        35                               3          9
189     320     0          266           0          350   (totals)
x̄1 = Σx1/n1 = 189/7 = 27, x̄2 = Σx2/n2 = 320/10 = 32
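S² = (1/(n1 + n2 − 2)) {n1 S1² + n2 S2²}
= (1/(7 + 10 − 2)) {266 + 350}
= 41.067
Therefore, S = √41.067 = 6.41
t = (|x̄1 − x̄2| / S) × √(n1 n2 / (n1 + n2))
= (|27 − 32| / 6.41) × √((7 × 10)/(7 + 10))
= 1.58
D.f. = n1 + n2 − 2 = 15
The table value of t at the 5% level for 15 d.f. is 2.131, which the calculated value 1.58 does not exceed.
Therefore, H0 is accepted.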
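The diet example can be recomputed directly from the raw weight gains. This is a minimal sketch using the pooled-variance formula above; the sample means come out as 27 for diet X and 32 for diet Y, and 2.131 is the two-tailed table value of t for 15 d.f. at the 5% level.

```python
import math

diet_x = [25, 32, 30, 32, 24, 14, 32]
diet_y = [24, 34, 22, 30, 42, 31, 40, 30, 32, 35]

n1, n2 = len(diet_x), len(diet_y)
mean_x = sum(diet_x) / n1
mean_y = sum(diet_y) / n2

# Sums of squared deviations about each sample mean
ss_x = sum((v - mean_x) ** 2 for v in diet_x)
ss_y = sum((v - mean_y) ** 2 for v in diet_y)

df = n1 + n2 - 2
s = math.sqrt((ss_x + ss_y) / df)                        # pooled standard deviation S
t = abs(mean_x - mean_y) / s * math.sqrt(n1 * n2 / (n1 + n2))
t_table = 2.131                                          # table value, 15 d.f., 5% level

print(f"means: {mean_x}, {mean_y}; sums of squares: {ss_x}, {ss_y}")
print(f"S = {s:.2f}, t = {t:.2f}, d.f. = {df}")
print("reject H0" if t > t_table else "do not reject H0")
```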
Sometimes, however, it makes sense to take samples that are not independent of
each other. Often, the use of such dependent (or paired) samples enables us to
perform a more precise analysis, because they allow us to control for extraneous
factors. With dependent samples, we still follow the same basic procedure of
hypothesis testing. The only differences are that we use a different formula for the
estimated standard error of the sample differences and that we require both
samples to be of the same size.
3) Considering the hypothesis and the given level of significance, find the value of
t that bounds the acceptance region.
4) Compute the observed t value by the formula t = (x̄ − μH0)/σ̂x̄.
Example 1 Sherri Welch is a quality control engineer with the windshield wiper
manufacturing division of Emsco, Inc. Emsco is currently considering two new
synthetic rubbers for its wiper blades, and Sherri was charged with seeing whether
blades made with the two compounds wear equally well. She equipped 12 cars
belonging to other Emsco employees with one blade made of each of the two
compounds. On cars 1 to 6, the right blade was made of compound A and the left
blade was made of compound B; on cars 7 to 12, compound A was used for the left
blade. The cars were driven under normal operating conditions until the blades no
longer did a satisfactory job of clearing the windshield of rain. The data below give
the usable life (in days) of the blades. At 𝛼 = 0.05, do the two compounds wear
equally well?
Car          1    2    3    4    5    6    7    8    9    10   11   12
Left blade   162  323  220  274  165  271  233  156  238  211  241  154
Right blade  183  347  247  269  189  257  224  178  263  199  263  148
Solution.
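Car                 1    2    3    4    5    6    7    8    9    10   11   12
Left blade          162  323  220  274  165  271  233  156  238  211  241  154
Right blade         183  347  247  269  189  257  224  178  263  199  263  148
Difference (A − B)  21   24   27   -5   24   -14  9    -22  -25  12   -22  6
(The difference is right − left for cars 1 to 6 and left − right for cars 7 to 12, since compound A was on the right blade of cars 1 to 6 and on the left blade of cars 7 to 12.)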
x̄ = Σx/n = 35/12 = 2.9167 days
s² = (1/(n − 1)) (Σx² − n x̄²) = (1/11) (4397 − 12(2.9167)²) = 390.45, s = √s² = 19.76 days
σ̂x̄ = s/√n = 19.76/√12 = 5.7042 days
H0 : μA = μB
H1 : μA ≠ μB
α = 0.05
Because the observed t value = (x̄ − μH0)/σ̂x̄ = (2.9167 − 0)/5.7042 = 0.511 < 2.201
(or x̄ = 2.9167 < 12.55), we do not reject H0. The two compounds are not significantly
different with respect to usable life.
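The paired computation for the wiper-blade data can be sketched as follows. Each difference is taken as compound A minus compound B, which means right minus left for cars 1 to 6 and left minus right for cars 7 to 12. The critical value 2.201 is the two-tailed table value of t for 11 d.f. at α = 0.05; the variable names are illustrative.

```python
import math
import statistics

left = [162, 323, 220, 274, 165, 271, 233, 156, 238, 211, 241, 154]
right = [183, 347, 247, 269, 189, 257, 224, 178, 263, 199, 263, 148]

# Compound A is on the right blade for cars 1-6 and on the left blade for cars 7-12
diffs = []
for car, (l, r) in enumerate(zip(left, right), start=1):
    diffs.append(r - l if car <= 6 else l - r)

n = len(diffs)
mean_d = statistics.mean(diffs)      # mean of the paired differences
sd_d = statistics.stdev(diffs)       # sample standard deviation (divisor n - 1)
se_d = sd_d / math.sqrt(n)           # estimated standard error of the mean difference
t = (mean_d - 0) / se_d              # hypothesized mean difference is 0 under H0
t_crit = 2.201                       # table value, 11 d.f., alpha = 0.05, two-tailed

print(f"mean = {mean_d:.4f}, s = {sd_d:.2f}, standard error = {se_d:.4f}, t = {t:.3f}")
print("reject H0" if abs(t) > t_crit else "do not reject H0")
```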
Dealer                       1    2    3    4    5    6    7    8    9
Apson price (in dollars)     250  319  285  260  305  295  289  309  275
Okaydata price (in dollars)  270  325  269  275  289  285  295  325  300
Solution.
Dealer                          1    2    3    4    5    6    7    8    9
Apson price (in dollars)        250  319  285  260  305  295  289  309  275
Okaydata price (in dollars)     270  325  269  275  289  285  295  325  300
Difference (Okaydata − Apson)   20   6    -16  15   -16  -10  6    16   25
x̄ = Σx/n = 46/9 = 5.1111 dollars
s² = (1/(n − 1)) (Σx² − n x̄²) = (1/8) (2190 − 9(5.1111)²) = 244.36, s = √s² = 15.63 dollars
σ̂x̄ = s/√n = 15.63/√9 = 5.21 dollars
H0 : μO = μA
H1 : μO > μA
α = 0.05
Because the observed t value = (x̄ − μH0)/σ̂x̄ = (5.1111 − 0)/5.21 = 0.981 < 1.860
(or x̄ = 5.1111 < 9.69), we do not reject H0. On average, the Apson inkjet printer is
not significantly less expensive than the Okaydata inkjet printer.
Every hypothesis test requires the analyst to state a null hypothesis and an
alternative hypothesis. The table below shows three sets of hypotheses. Each makes
a statement about the difference d between two population proportions, P1 and P2.
Set   Null hypothesis   Alternative hypothesis   Number of tails
1     P1 - P2 = 0       P1 - P2 ≠ 0              2
2     P1 - P2 > 0       P1 - P2 < 0              1
3     P1 - P2 < 0       P1 - P2 > 0              1
When the null hypothesis states that there is no difference between the two
population proportions (i.e., d = 0), the null and alternative hypothesis for a two-tailed
test are often stated in the following form.
H0: P1 = P2
H1: P 1 ≠ P2
The analysis plan describes how to use sample data to accept or reject the null
hypothesis. It should specify the following elements.
Using sample data, complete the following computations to find the test statistic and
its associated P-Value.
Pooled sample proportion. Since the null hypothesis states that P1=P2, we
use a pooled sample proportion (P) to compute the standard error of the
sampling distribution.
P = (n1 P1 + n2 P2) / (n1 + n2)
Standard error. Compute the standard error (SE) of the sampling distribution of the
difference between two proportions.
SE = √(PQ (1/n1 + 1/n2))
where P is the pooled sample proportion and Q = 1 - P.
Test statistic. The test statistic is a z-score (z) defined by the following
equation.
z = (P1 - P2) / SE
Example 1 In a year there are 956 births in a town A, of which 52.5% were males,
while in towns A and B combined, this proportion in a total of 1,406 births was
0.496.Is there any significant difference in the proportion of male births in the two
towns? Take 5% level of significance.
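Solution. Here n1 = 956, P1 = 0.525, n1 + n2 = 1,406 and the pooled proportion P = 0.496, so n2 = 450.
P = (n1 P1 + n2 P2) / (n1 + n2)
=> 0.496 = (956 × 0.525 + 450 P2) / 1406
=> P2 = 0.434
Let H0: P1 = P2
H1: P1 ≠ P2
Q = 1 − P = 0.504
SE = √(PQ (1/n1 + 1/n2))
= √(0.496 × 0.504 (1/956 + 1/450))
= 0.0286
z = (P1 - P2) / SE
= 0.091 / 0.0286
= 3.18
Since z > 1.96, the null hypothesis is rejected at the 5% level of significance, i.e. the data
are inconsistent with the hypothesis P1 = P2.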
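The pooled-proportion formulas can be evaluated directly for this example. This is a sketch using only the figures stated in the problem (956 births in town A with 52.5% males, 1,406 births in both towns with pooled proportion 0.496). Evaluating the square root without intermediate rounding gives a standard error close to 0.0286 and a z value a little above 3, so z remains well beyond 1.96.

```python
import math

n1, p1 = 956, 0.525              # town A: births and proportion of males
n_total, p_pooled = 1406, 0.496  # towns A and B combined

n2 = n_total - n1                                # births in town B
p2 = (n_total * p_pooled - n1 * p1) / n2         # proportion of males in town B

q = 1 - p_pooled
se = math.sqrt(p_pooled * q * (1 / n1 + 1 / n2)) # standard error of the difference
z = (p1 - p2) / se                               # test statistic

print(f"n2 = {n2}, P2 = {p2:.3f}, SE = {se:.4f}, z = {z:.2f}")
print("reject H0" if abs(z) > 1.96 else "do not reject H0")
```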
Let P1 be the proportion of blue-eyed people in the first population = 0.30
Combined proportion
P = (n1 P1 + n2 P2) / (n1 + n2)
= 0.279
Let H0: P1 = P2
H1: P1 ≠ P2
Q = 1 − P = 0.721
SE = √(PQ (1/n1 + 1/n2))
= √(0.279 × 0.721 (1/1200 + 1/900))
= 0.0197
z = (P1 - P2) / SE
= 0.05 / 0.0197
= 2.538
Since z > 1.96, the null hypothesis is rejected at the 5% level of significance, i.e. the data are inconsistent with
the hypothesis
Steps
1. If σ is known, and if we are doing a one-tailed test, we will compute the probability
value
σ̂x̄ = σ̂/√n
3. Calculate the z-score by using the formula z = (x̄ − μ)/σx̄
Interpret Results
If the sample findings are unlikely, given the null hypothesis, the researcher rejects
the null hypothesis. Typically, this involves comparing the P-value to the significance
level, and rejecting the null hypothesis when the P-value is less than the significance
level.
Example 1. The Coffee Institute has claimed that more than 40% of adults
regularly have a cup of coffee with breakfast. A random sample of 450 individuals
revealed that 200 of them were regular coffee drinkers at breakfast. What is the
probability value for a test of hypothesis seeking to show that the Coffee Institute's
claim was correct?
Sol. n = 450, P̄ = 200/450 = 0.4444
H0 : P = 0.4
H1 : P > 0.4
P(z ≥ (0.4444 − 0.4)/√(0.4(0.6)/450)) = P(z ≥ 1.92) = 0.5 − 0.4726 = 0.0274.
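The probability value in this example can be computed without a normal table. In the sketch below the z-score is not rounded to two decimals before the tail area is taken, so the result is about 0.027 and differs slightly in the fourth decimal place from the table-based figure.

```python
import math
from statistics import NormalDist

n, drinkers, P0 = 450, 200, 0.40

p_bar = drinkers / n                         # sample proportion
se = math.sqrt(P0 * (1 - P0) / n)            # standard error under H0: P = 0.4
z = (p_bar - P0) / se                        # test statistic

# Right-tail probability value, because H1 is P > 0.4
p_value = 1 - NormalDist().cdf(z)

print(f"sample proportion = {p_bar:.4f}, z = {z:.2f}, probability value = {p_value:.4f}")
```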
Under the null hypothesis that the two attributes A and B are independent, we
shall find the expected frequency of the (i, j)th cell. Let Ai be the total of the ith row,
Bj the total of the jth column and N the total number of observations.
The probability that any observation will fall in the ith row = Ai/N
Similarly, the probability that any observation will fall in the jth column = Bj/N
And the probability that any observation will fall in the ith row and jth column = (Ai/N) × (Bj/N)
∴ Expected frequency of the (i, j)th cell = eij = (Ai/N) × (Bj/N) × N = Ai Bj / N
Thus we can find the expected frequencies of all the cells. From the observed
frequencies Oij and the expected frequencies eij, the value of χ² can be obtained by the
following formula:
χ² = Σi Σj (Oij − eij)² / eij
The number of independent cells in a 𝑟 × 𝑐 contingency table is (𝑟 − 1)(𝑐 − 1).
Hence the degrees of freedom in a 𝑟 × 𝑐 table is (𝑟 − 1)(𝑐 − 1).
For testing the hypothesis of independence of two attributes A and B, the value
of 𝜒2 is found out and is compared with the table value of 𝜒2 on (𝑟 − 1)(𝑐 − 1)
d.f. and at a required level of significance. If the calculated value of χ² is less than the
table value of χ², the hypothesis that the attributes are independent may be accepted.
Expected frequency of cell (1,1) = (150)(120)/200 = 90
The expected frequencies of the different cells are indicated in brackets in the above
table.
χ² = Σ (oi − ei)² / ei
= (100 − 90)²/90 + (50 − 60)²/60 + (20 − 30)²/30 + (30 − 20)²/20
= 1.11 + 1.67 + 3.33 + 5 = 11.11
d.f. = (r − 1)(c − 1) = (2 − 1)(2 − 1) = 1
On 1 d.f. and at 5% significance level, table value of χ² = 3.84
i.e. χ²cal > χ²tab
Hence H0 is rejected.
Thus performance depends upon training.
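The expected frequencies and the χ² value for a contingency table can be computed from the observed counts alone. In this sketch the observed 2 × 2 counts (100, 50, 20, 30) are read off the χ² terms of the worked example; which attribute labels the rows and which the columns is not assumed.

```python
# Observed frequencies of the 2 x 2 table, read off the terms of the worked example
observed = [[100, 50],
            [20, 30]]

row_totals = [sum(row) for row in observed]           # A_i
col_totals = [sum(col) for col in zip(*observed)]     # B_j
N = sum(row_totals)

# Expected frequency of cell (i, j) under independence: A_i * B_j / N
expected = [[a * b / N for b in col_totals] for a in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(observed) - 1) * (len(observed[0]) - 1)
table_value = 3.84   # chi-square table value, 1 d.f., 5% level

print(f"expected = {expected}")
print(f"chi-square = {chi_sq:.2f}, d.f. = {df}")
print("reject H0" if chi_sq > table_value else "H0 may be accepted")
```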
Example: The result in the last exam of a sample of 100 students is given
below:
1st class 2nd class 3rd class Total
Boys 10 28 12 50
Girls 20 22 2 50
Total 30 50 20 100
Can it be said that the performance in the exam depends upon gender?
Example: A die is thrown 300 times and the following distribution is
obtained. Can the die be regarded as unbiased?
Number on the die 1 2 3 4 5 6
Frequency 41 44 49 53 57 56
Solution: H0 : Die is unbiased, i.e. the probability of getting any number on the die
is 1/6.
Number   Observed (o)   Expected (e)   (o − e)²/e
1        41             50             1.62
2        44             50             0.72
3        49             50             0.02
4        53             50             0.18
5        57             50             0.98
6        56             50             0.72
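The goodness-of-fit statistic for the die data can be totalled with a few lines of code. This is a sketch that takes the expected frequency of each face as 300/6 = 50 under H0. The degrees of freedom (6 − 1 = 5) and the 5% table value 11.07 used for the comparison are standard values supplied here, not taken from the text above.

```python
observed = [41, 44, 49, 53, 57, 56]          # frequencies of faces 1 to 6
total = sum(observed)                        # number of throws
expected = [total / len(observed)] * len(observed)   # equal frequencies under H0

# Contribution of each face and the chi-square total
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(terms)

df = len(observed) - 1
table_value = 11.07   # chi-square table value, 5 d.f., 5% level

print("terms:", [round(t, 2) for t in terms])
print(f"chi-square = {chi_sq:.2f}, d.f. = {df}")
print("reject H0" if chi_sq > table_value else "H0 may be accepted")
```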
Solution: H0 : the proportion of accidents is the same for all the days, i.e. the
probability of an accident on any day is 1/7.
Day                  Mon  Tue  Wed  Thurs  Fri  Sat  Sun  Total
Observed frequency   14   16   8    12     11   9    14   84
Expected frequency   12   12   12   12     12   12   12   84
χ² = Σ (oi − ei)² / ei
= (14 − 12)²/12 + (16 − 12)²/12 + (8 − 12)²/12 + (12 − 12)²/12 + (11 − 12)²/12 + (9 − 12)²/12 + (14 − 12)²/12
= (4 + 16 + 16 + 0 + 1 + 9 + 4)/12 = 50/12
= 4.17
d.f. = n − 1 = 7 − 1 = 6
Table value of χ² on 6 d.f. and at 5% significance level: χ²tab = 12.59
χ²cal < χ²tab
Hence H0 may be accepted. Thus the proportion of accidents is the same for all days.
Exercise: The units produced by a plant are classified into four grades. The past
performance of the plant shows that the respective proportions are 8:4:2:1. To
check the run of the plant 600 parts are examined and classified as follows. Is
there any evidence of a change in production standards?
Grades 1st 2nd 3rd 4th Total
Units 340 130 100 30 600