Normal Distribution - Hypothesis Testing
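Mean:
$$E(X) = \sum_{x=0}^{n} x\,p(x) = \sum_{x=0}^{n} x\,\frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}}$$
The sum can start from x = 1, since for x = 0 the term becomes 0:
$$E(X) = \sum_{x=1}^{n} x\,\frac{k!}{x!\,(k-x)!}\,\frac{\binom{N-k}{n-x}}{\binom{N}{n}} = \sum_{x=1}^{n} \frac{x\,k\,(k-1)!}{x\,(x-1)!\,(k-x)!}\,\frac{\binom{N-k}{n-x}}{\binom{N}{n}}$$
$$= k\sum_{x=1}^{n} \frac{\binom{k-1}{x-1}\binom{N-k}{n-x}}{\binom{N}{n}} = k\,\frac{\binom{N-1}{n-1}}{\binom{N}{n}} = \frac{nk}{N}$$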
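Variance:
$$\sigma^2 = E(X^2) - [E(X)]^2$$
$$E(X^2) = \sum_{x=0}^{n} x^2\,p(x) = \sum_{x=0}^{n} [x(x-1) + x]\,p(x) = \sum_{x=0}^{n} x(x-1)\,p(x) + \sum_{x=0}^{n} x\,p(x)$$
For the first sum, write $1/\binom{N}{n} = n!\,(N-n)!/N!$ and expand the factorials:
$$\sum_{x=0}^{n} x(x-1)\,\frac{\binom{k}{x}\binom{N-k}{n-x}}{\binom{N}{n}} = \sum_{x=2}^{n} \frac{x(x-1)\,k(k-1)(k-2)!}{x(x-1)(x-2)!\,(k-x)!}\,\binom{N-k}{n-x}\,\frac{n(n-1)(n-2)!\,(N-n)!}{N(N-1)(N-2)!}$$
$$= \frac{k(k-1)\,n(n-1)}{N(N-1)} \sum_{x=2}^{n} \frac{\binom{k-2}{x-2}\binom{N-k}{n-x}}{\binom{N-2}{n-2}} = \frac{k(k-1)\,n(n-1)}{N(N-1)}$$
Hence,
$$E(X^2) = \frac{k(k-1)\,n(n-1)}{N(N-1)} + \frac{nk}{N}$$
$$\sigma^2 = \frac{k(k-1)\,n(n-1)}{N(N-1)} + \frac{nk}{N} - \frac{n^2k^2}{N^2} = \frac{nk\,(N-k)(N-n)}{N^2\,(N-1)} = \frac{N-n}{N-1}\cdot n\cdot\frac{k}{N}\left(1 - \frac{k}{N}\right)$$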
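The closed forms for the mean and variance can be checked numerically. The following is a minimal sketch, assuming the distribution being derived is the hypergeometric distribution with population size N, sample size n and k successes in the population; the function names and the values N = 40, n = 5, k = 3 are illustrative and not taken from the notes.

```python
from math import comb

def hypergeom_pmf(x, N, n, k):
    """P(X = x): x successes in a sample of n drawn from N items, k of which are successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)

def hypergeom_moments(N, n, k):
    """Mean and variance obtained by direct summation over the support."""
    xs = range(max(0, n - (N - k)), min(n, k) + 1)
    mean = sum(x * hypergeom_pmf(x, N, n, k) for x in xs)
    second_moment = sum(x * x * hypergeom_pmf(x, N, n, k) for x in xs)
    return mean, second_moment - mean ** 2

# Illustrative values (an assumption, not from the notes)
N, n, k = 40, 5, 3
mean, var = hypergeom_moments(N, n, k)

# Closed forms from the derivation
closed_mean = n * k / N
closed_var = (N - n) / (N - 1) * n * (k / N) * (1 - k / N)

print(mean, closed_mean)
print(var, closed_var)
```

Both printed pairs should agree up to floating-point rounding.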
Example 5.13: A group of 10 individuals is used for a biological case study. The group contains 3
people with blood type O, 4 with blood type A, and 3 with blood type B. What is the probability that a
random sample of 5 will contain 1 person with blood type O, 2 people with blood type A, and 2
people with blood type B?
Solution:
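A minimal sketch of how Example 5.13 can be evaluated, assuming the sample is drawn without replacement so that the multivariate hypergeometric formula applies: the number of ways to pick the required people from each blood group, divided by the number of ways to pick any 5 of the 10. The variable names are illustrative.

```python
from math import comb

# Group sizes (O, A, B) and the required counts in the sample, from the example
group_sizes = [3, 4, 3]
wanted = [1, 2, 2]

ways_favourable = 1
for size, count in zip(group_sizes, wanted):
    ways_favourable *= comb(size, count)

ways_total = comb(sum(group_sizes), sum(wanted))
probability = ways_favourable / ways_total

print(ways_favourable, ways_total, probability)
```

Under this assumption the probability works out to 54/252 = 3/14, about 0.2143.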
(At first glance the problem can be identified as a Poisson distribution problem, since the average number of
outcomes per unit time is given.)
Given, x = 6; λt = 4
$$p(x;\lambda t) = p(6;4) = \frac{e^{-\lambda t}(\lambda t)^x}{x!} = \frac{e^{-4}\,4^6}{6!} = 0.1042 \quad \text{(Ans.)}$$
$$P(X > 15) = 1 - P(X \le 15) = 1 - \sum_{x=0}^{15} p(x;10) = 1 - 0.9513 = 0.0487 \quad \text{(Ans.)}$$
Example
Asteroids with a diameter of at least 1 km collide with the earth at a rate of approximately 2
per million years.
What is the probability that in a randomly selected million year period there is exactly one
collision?
Solution:
Given, x = 1; λ = 2
$$P(x;\lambda) = \frac{e^{-\lambda}\lambda^x}{x!} = \frac{e^{-2}\,2^1}{1!} = 0.2707 \quad \text{(Ans.)}$$
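The Poisson calculations in the examples above can be reproduced without tables. This is a small sketch using only the standard library; the helper names `poisson_pmf` and `poisson_cdf` are illustrative.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson variable with mean mu."""
    return exp(-mu) * mu ** x / factorial(x)

def poisson_cdf(x, mu):
    """P(X <= x), summing the pmf from 0 to x."""
    return sum(poisson_pmf(i, mu) for i in range(x + 1))

p_six = poisson_pmf(6, 4)                 # p(6; 4), about 0.1042
p_more_than_15 = 1 - poisson_cdf(15, 10)  # 1 - P(X <= 15) with mean 10, about 0.0487
p_one_collision = poisson_pmf(1, 2)       # asteroid example, about 0.2707

print(p_six, p_more_than_15, p_one_collision)
```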
Example 5.22: (p:188)
In a certain industrial facility accidents occur infrequently. It is known that the probability of an
accident on any given day is 0.005 and accidents are independent of each other.
(a) What is the probability that in any given period of 400 days there will be an accident on
one day?
(b) What is the probability that there are at most 3 days with an accident?
Solution:
(When p is known, the problem is binomial; but for large n and small p the binomial distribution
can be approximated by the Poisson distribution with mean np.)
Given, n = 400, p = 0.005; np = 2
(b) $$P(X \le 3) = \sum_{x=0}^{3} \frac{e^{-2}\,2^x}{x!} = 0.857$$
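To see how close the Poisson approximation is in Example 5.22, the sketch below evaluates part (b) both ways: with the Poisson pmf at mean np = 2 and with the exact binomial pmf. The variable names are illustrative.

```python
from math import comb, exp, factorial

n, p = 400, 0.005
mu = n * p  # Poisson mean used in the approximation

# P(X <= 3) under the Poisson approximation
poisson_at_most_3 = sum(exp(-mu) * mu ** x / factorial(x) for x in range(4))

# P(X <= 3) under the exact binomial model
binomial_at_most_3 = sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(4))

print(poisson_at_most_3, binomial_at_most_3)
```

The two values should differ only slightly, which is why the approximation is acceptable for large n and small p.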
Normal Curve:
Bell Shaped
Symmetrical
Total Area Under the curve =1
To avoid complexity, the normal distribution is transformed to the standard normal distribution by the
following transformation:
$$Z = \frac{X - \mu}{\sigma}$$
where Z is the value corresponding to X.
(b) Between z = -1.97 and z = 0.86
(a) Area to the right of (z=1.84) = 1- Area to the left of (z=1.84) = 1-0.9671 = 0.0329
(b) Area between (z = -1.97 and z = 0.86) = (Area left of z = 0.86) – (Area left of z = -1.97)
= 0.8051- 0.0244
= 0.7807 (Ans)
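The table lookups above can be cross-checked with the standard normal CDF. A minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later):

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# (a) Area to the right of z = 1.84
area_right = 1 - Z.cdf(1.84)

# (b) Area between z = -1.97 and z = 0.86
area_between = Z.cdf(0.86) - Z.cdf(-1.97)

print(round(area_right, 4), round(area_between, 4))
```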
Example 6.3: Given, a standard normal distribution, find the value of k such that,
(a) A k value leaving an area of 0.3015 to the right must leave an area of 0.6985 to the left. From
Table A.3 it follows that k = 0.52.
(b) P(k < Z < -0.18) = 0.4197
P(Z < -0.18) – P(Z < k) = 0.4197
P(Z < k) = P(Z < -0.18) – 0.4197
P(Z < k) = 0.4286 – 0.4197 = 0.0089
Hence, from Table A.3, k = -2.37.
Solution:
$$z = \frac{362 - 300}{50} = 1.24$$
(a) the z value for which there is an area 0.45 to the left is -0.13.
So, x = σz + µ = (6)(-0.13)+40 = 39.22
(b) The z value for which there is an area of (1-0.14) = 0.86 to the left is 1.08
So, x= (6)(1.08) + 40 = 46.48
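The reverse lookup (from an area to an x value) can be sketched with `NormalDist.inv_cdf`, using μ = 40 and σ = 6 from the solution above. Results differ slightly from the table-based answers because the table rounds z to two decimals.

```python
from statistics import NormalDist

mu, sigma = 40, 6
Z = NormalDist()

# (a) x with 45% of the area to its left
z_a = Z.inv_cdf(0.45)
x_a = sigma * z_a + mu

# (b) x with 14% of the area to its right, i.e. 86% to its left
z_b = Z.inv_cdf(1 - 0.14)
x_b = sigma * z_b + mu

print(round(z_a, 2), round(x_a, 2))
print(round(z_b, 2), round(x_b, 2))
```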
$$z = \frac{2.3 - 3}{0.5} = -1.4$$
Using Table A.3, P(X < 2.3) = P(Z < -1.4) = 0.0808 (Ans.)
Question:
Let, n = 10 and p = ½, so that Y is binomial(10, ½). What is the probability that exactly five
people approve of the job the President is doing?
Solution.
$$P(Y = 5) = \binom{10}{5}(0.5)^5(0.5)^5 = 0.2460$$
Or, P(Y=5)=P(Y≤5)−P(Y≤4)=0.6230−0.3770=0.2460
That is, there is a 24.6% chance that exactly five of the ten people selected approve of the job
the President is doing.
By comparison, the normal approximation with a continuity correction gives P(4.5 < Y < 5.5) ≈ 0.251.
(Reference: PennState Eberly College of Science)
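The exact binomial probability and its normal approximation can be compared directly. This is a sketch under the assumption that the approximation uses mean np, standard deviation √(npq) and a continuity correction of ±0.5; with an unrounded z the approximation comes out near 0.248, while rounding z to 0.32 for a table lookup gives about 0.251.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 10, 0.5

# Exact binomial probability P(Y = 5)
exact = comb(n, 5) * p ** 5 * (1 - p) ** (n - 5)

# Normal approximation with continuity correction: P(4.5 < Y < 5.5)
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(5.5) - approx_dist.cdf(4.5)

print(round(exact, 4), round(approx, 4))
```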
Example 6.15 (p:215):
The probability that a patient recovers from a rare blood disease is 0.4. If 100 people are known
to have contracted this disease, what is the probability that less than 30 survive?
Solution:
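Here, µ = np = 100(0.4) = 40,
σ = √(npq) = √(100 × 0.4 × 0.6) = 4.899
With the continuity correction, "less than 30" corresponds to the area to the left of x = 29.5:
$$z = \frac{29.5 - 40}{4.899} = -2.14$$
$$P(X < 30) \approx P(Z < -2.14) = 0.0162 \quad \text{(Ans.)}$$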
Example 6.18
Suppose that telephone calls arriving at a particular switchboard follow a Poisson process with an
average 5 calls coming per minute. What is the probability that up to a minute will elapse until 2 calls
have come in to the switchboard?
Solution:
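The waiting time X until 2 calls arrive follows a gamma distribution with α = 2 and β = 1/5:
$$P(X \le 1) = \int_0^1 \frac{1}{\beta^2}\,x\,e^{-x/\beta}\,dx = 25\int_0^1 x\,e^{-5x}\,dx = 1 - e^{-5}(1 + 5) = 0.96 \quad \text{(Ans)}$$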
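The integral in Example 6.18 has a closed form when the shape parameter is an integer. The sketch below assumes a gamma distribution with integer shape α and scale β (so the rate is 1/β); the helper name `erlang_cdf` is illustrative.

```python
from math import exp, factorial

def erlang_cdf(x, alpha, beta):
    """P(X <= x) for a gamma distribution with integer shape alpha and scale beta."""
    rate_x = x / beta
    # P(X > x) equals the probability of fewer than alpha Poisson events in time x
    tail = sum(exp(-rate_x) * rate_x ** j / factorial(j) for j in range(alpha))
    return 1 - tail

# Two calls, average 5 calls per minute, within 1 minute
p_within_minute = erlang_cdf(1, alpha=2, beta=1 / 5)
print(round(p_within_minute, 4))
```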
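The exponential density function is:
$$f(x;\beta) = \begin{cases} \frac{1}{\beta}\,e^{-x/\beta}, & x > 0 \\ 0, & \text{elsewhere} \end{cases}$$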
Ref: https://stats.stackexchange.com/questions/2092/relationship-between-poisson-and-exponential-distribution
A typical profile of failure rate over time is shown in Figure 4S-2. Because of its shape, it is referred
to as the bathtub curve.
#Explain the Bathtub Curve of Failure Rate for Exponential Distribution:
Burn-In: Usually, a number of products or parts fail shortly after they are put into service, because
they are defective to begin with. Examples include electronics components such as capacitors. The
rate of failure decreases rapidly as the defective items are weeded out.
Steady-State: During the second phase, random failures occur. In many cases, this phase covers a
relatively long period of time (several years).
Wear-Out: In the third phase, failures occur because the items are worn out, and the failure rate
increases.
The time to failure of a non-repairable item during the steady state phase can often be modelled
by the Exponential distribution with an average equal to the MTTF (see Figure 4S-4). Similar
results hold for repairable items.
The probability that the item put into service at time 0 will fail before some specified time, T, is equal
to the area under the curve between 0 and T.
Reliability = P(no failure before T) = $e^{-\lambda T}$
The probability that failure will occur before time T is 1 minus reliability:
P(failure before T) = $1 - e^{-\lambda T}$
Example 6.17:
Suppose that a system contains a certain type of component whose time to failure, in years, is given
by T. The random variable T is modeled nicely by the exponential distribution with mean time to failure
β = 5. If 5 of these components are installed in different systems, what is the probability that at least 2
are still functioning at the end of 8 years?
Solution:
According to the exponential distribution, the probability that a given component is still functioning after
8 years is
$$P(T > 8) = \frac{1}{5}\int_8^{\infty} e^{-t/5}\,dt = e^{-8/5} \approx 0.2$$
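The remaining step of Example 6.17 can be sketched as a binomial calculation, assuming the 5 components fail independently, each surviving 8 years with probability p = e^(-8/5). With p rounded to 0.2 the result is about 0.2627; with the unrounded p it is slightly higher.

```python
from math import comb, exp

beta = 5                  # mean time to failure in years (from the example)
n = 5                     # number of components installed
p = exp(-8 / beta)        # P(T > 8) for a single component

# P(at least 2 of the 5 still work) = 1 - P(0 work) - P(1 works)
p_at_least_2 = 1 - sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(2))

# Same calculation with p rounded to 0.2, as in the hand solution
p_rounded = 1 - sum(comb(n, x) * 0.2 ** x * 0.8 ** (n - x) for x in range(2))

print(round(p, 4), round(p_at_least_2, 4), round(p_rounded, 4))
```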
So, if the component makes it to t0 hours, the probability of lasting an additional t hours is the same as the
probability of lasting t hours.
So there is no “punishment” through wear that may have ensued for lasting the first t0 hours.
Solution:
Given, Mean Time Between Failures, MTBF: θ = 5.75 years
Hence, the mean number of events per unit time is
$$\lambda = \frac{1}{\theta} = \frac{1}{5.75} = 0.174 \ \text{years}^{-1}$$
Exponential pdf: $f(t) = 0.174\,e^{-0.174t}$; t > 0
(a) Probability of failure during the first 3 months (3/12 = 0.25 years) (not instantaneous, so use
the cdf equation):
$$F(0.25) = 1 - e^{-0.174(0.25)} = 0.043$$
(b) Probability of failure prior to (below) 5.75 years (not instantaneous, so use the cdf equation):
$$F(5.75) = 1 - e^{-0.174(5.75)} = 0.632$$
[Figure: R(t), f(t) and cdf plotted against t (years), for t from 0 to 30]
Every point on R(t) gives the probability of operating for at least t years
Every point on f(t) gives the instantaneous probability of failure at t years
Example S-3 (Stevenson):
By means of extensive testing and data collection, a manufacturer has determined that a particular
model of its vacuum cleaners has an expected life that is Exponential with a mean of four years
and insignificant burn-in phase.
Find the probability that one of these vacuum cleaners will have a life that ends:
a. After the initial four years of service.
b. Before four years of service are completed.
c. Not before six years of service.
Solution:
Given, λ = 1/4 = 0.25
(a) Life will end after 4 years, so this is the reliability for 4 years:
R(t) = $e^{-\lambda t}$ = $e^{-0.25(4)}$ = 0.3679 (Ans)
(b) Probability of failure before 4 years = 1 - (reliability for four years) = 1 - 0.3679 = 0.6321
(Ans)
(c) Reliability for 6 years = $e^{-0.25(6)}$ = 0.2231 (Ans)
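The three parts of Example S-3 follow from one reliability function. A minimal sketch, assuming an exponential life with the stated mean of four years; the function name `reliability` is illustrative.

```python
from math import exp

def reliability(t, mean_life):
    """P(no failure before time t) for an exponential life with the given mean."""
    return exp(-t / mean_life)

mean_life = 4  # years, so lambda = 1/4 = 0.25

r_4 = reliability(4, mean_life)   # (a) life ends after 4 years
f_4 = 1 - r_4                     # (b) failure before 4 years
r_6 = reliability(6, mean_life)   # (c) no failure before 6 years

print(round(r_4, 4), round(f_4, 4), round(r_6, 4))
```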
Example 6.21 (Walpole) p.224:
A certain washing machine is characterized by the following density function:
$$f(y) = \begin{cases} \frac{1}{4}\,e^{-y/4}, & y \ge 0 \\ 0, & \text{elsewhere} \end{cases}$$
This is an exponential with μ=4 years. The machine is considered a bargain if it is unlikely to
require a major repair before the 6th year.
(a) What is the probability P(Y>6)?
(b) What is the probability that major repair occurs in the first year?
Solution:
(a) P(Y > 6) = probability of repair/failure after 6 years = reliability for 6 years = $e^{-0.25(6)}$ = 0.2231 (Ans)
(b) Probability that a major repair occurs in the first year = $1 - e^{-0.25(1)}$ = 0.2212 (Ans)
[Probability & Statistics; “Rukmangadachari E.”]
“STATISTICAL DECISION THEORY/HYPOTHESIS TEST”
Statistical Decision:
Decisions about a population are made on the basis of sample information. Such decisions are
called statistical decisions.
Example: we may wish to decide, on the basis of sample data, whether a new serum is really effective in
curing a disease, or whether one educational procedure is better than another.
Statistical Hypothesis:
In attempting to reach decision it is useful to make assumptions about the populations
involved. Such assumptions, which may or may not be true are called statistical hypotheses.
Types of Hypothesis:
There are 2 types of hypothesis for any statistical test.
1. Null Hypothesis
2. Alternative Hypothesis
Hypotheses can be formulated about means, variances, differences of means, or the form of the pdf.
Null Hypothesis:
It is generally denoted by H0
H0 is any hypothesis which is to be tested for possible rejection under the assumption that it is
true.
Alternative Hypothesis:
It is denoted by H1 or HA
The statement that must be true if the null hypothesis is false/rejected.
Opposite to Null Hypothesis.
Ref: Hypothesis Testing, Mohammad Adil Khan
Types of errors and their possibilities:
Since either H0 or H1 must be correct in reality, we can make 2 types of errors in our decision.

Reality      Decision: Reject H0     Decision: Accept H0
H0 True      Type I error (α)        Correct
H0 False     Correct                 Type II error (β)
Test Statistics:
• It provides a basis for testing a null hypothesis.
• A value computed from the sample data that is used in making the decision about whether to
reject or accept the null hypothesis.
• Test statistic is denoted by Z and is given by
$$Z = \frac{\text{Value of random variable} - \text{Mean of random variable}}{\text{Standard deviation of random variable}} = \frac{X - \mu}{\sigma}$$
Example:
Suppose you are a quality inspector of your company and make the decisions of accepting and
rejecting a lot. You hypothesized a lot with null hypothesis µ=25. Now, accidentally you select a
sample from the lot whose value falls in the critical region and due to this reason you reject the lot.
But the lot truly is acceptable! This type of error when “rejecting a true lot!” is type І error or α
error
Example:
Suppose you are a quality inspector of your company and make the decisions of accepting and
rejecting a lot. You hypothesized a lot with null hypothesis µ=25. Now, accidentally you select a
sample from the lot whose value falls in the acceptable region and due to this reason you accept the
lot. But the lot truly is not acceptable! This type of error when “accepting a false lot!” is type ІІ
error or β error.
Significance Level (What does α=5% mean?):
• denoted by α
• the probability that the test statistic will fall in the critical region when the null hypothesis
is actually true.
• common choices are 0.05, 0.01, and 0.10.
• α = 5% means there are about 5 chances in 100 of rejecting a true null hypothesis.
• In other words we say that we are 95% confident in making the correct decision.
Ref: Hypothesis Testing, Mohammad Adil Khan
Two-tailed, Right-tailed, Left-tailed Tests:
What is meant by Two tailed, Right tailed & Left tailed test?
• The tails in a distribution are the extreme regions bounded by critical values.
• A test for which the entire rejection region is located in only one of the two tails, either the
left tail or the right tail of the probability distribution of the test statistic, is called a one-tailed test or
one-sided test.
• If the rejection region is divided equally between the two tails of the probability distribution of
the test statistic, this is referred to as a two-tailed test or two-sided test.
• It is important to note that one-tailed and two-tailed tests differ only in the location of the
critical region, not in its size.
Ref: Hypothesis Testing, Mohammad Adil Khan
Controlling Type I and Type II Errors:
• To decrease both α and β, increase the sample size n.
Solution:
(a) According to the CLT (Central Limit Theorem), the standard deviation of the sample mean is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{2.5}{\sqrt{25}} = 0.5$$
[the given data are real-life data; by the CLT the sample mean is treated as normally distributed]
To allow a 0.05 probability of a type I error requires α/2 = 0.025 of the area in each tail. The
two corresponding values of z bound the critical region, where z is the standard normal
variable.
$$P(Z < z_1) = P\left(\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} < z_1\right) = 0.025$$
[find the value of z from Table A.3 for which the left-sided area is 0.025; z1 = -1.96]
$$\frac{x_1 - 3.5}{0.5} = -1.96; \quad x_1 = 2.52 \ \text{atm}$$
$$P(Z > z_2) = P\left(\frac{\bar{x} - \mu}{\sigma_{\bar{x}}} > z_2\right) = 0.025$$
[find the value of z from Table A.3 for which the right-sided area is 0.025; z2 = 1.96]
$$\frac{x_2 - 3.5}{0.5} = 1.96; \quad x_2 = 4.48 \ \text{atm}$$
So, the critical region is defined by x1 = 2.52 atm and x2 = 4.48 atm. Any pressure value between these falls in the
acceptance region.
(b) µ = 4.8, z2 = 1.96
We have to determine the probability of a type II error (the probability of accepting a false lot),
which is the area to the left of x2 = 4.48 (blue shaded) under the curve of the alternative hypothesis, since
values below x2 lead to acceptance even though the true mean is 4.8.
Convert x2 to a standard normal variable:
$$z = \frac{x_2 - \mu}{\sigma_{\bar{x}}} = \frac{4.48 - 4.80}{0.50} = -0.64$$
From Table A.3, the area to the left of z = -0.64 is P(type II error) = 0.2611 (Ans)
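Both parts of this solution can be reproduced numerically. The sketch below assumes the values that appear in the working: hypothesized mean 3.5 atm, σ = 2.5, n = 25, α = 0.05 and a true mean of 4.8 atm for part (b). The variable names are illustrative.

```python
from statistics import NormalDist

mu0, sigma, n, alpha = 3.5, 2.5, 25, 0.05
se = sigma / n ** 0.5                      # standard deviation of the sample mean

# (a) Two-tailed critical values for the sample mean
z = NormalDist().inv_cdf(1 - alpha / 2)    # about 1.96
x1 = mu0 - z * se
x2 = mu0 + z * se

# (b) Type II error: sample mean lands between x1 and x2 although the true mean is 4.8
mu_true = 4.8
alternative = NormalDist(mu=mu_true, sigma=se)
beta = alternative.cdf(x2) - alternative.cdf(x1)

print(round(x1, 2), round(x2, 2), round(beta, 4))
```

The area below x1 under the alternative curve is negligible here, so beta is essentially the area to the left of x2, as in the hand calculation.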
“GOODNESS OF FIT TEST”
The chi-square goodness of fit test is a non-parametric test that is used to find out whether the observed
values of a given phenomenon differ significantly from the expected values.
In Chi-Square goodness of fit test, the term goodness of fit is used to compare the observed
sample distribution with the expected probability distribution.
Chi-Square goodness of fit test determines how well theoretical distribution (such as normal,
binomial, or Poisson) fits the empirical distribution.
In Chi-Square goodness of fit test, sample data is divided into intervals. Then the numbers of points
that fall into the interval are compared, with the expected numbers of points in each interval.
The Chi Square Goodness of Fit Test:
$$\chi_c^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
Oi = Observed Value; Ei = Expected Value
The chi square can be used for discrete distributions like the binomial distribution and the Poisson
distribution, while the Kolmogorov-Smirnov and Anderson-Darling goodness of fit tests can only be
used for continuous distributions.
Two potential disadvantages of chi square are:
1. The chi square test can only be used for data put into classes (bins). If you have non-binned
data you’ll need to make a frequency table or histogram before performing the test.
2. Another disadvantage of the chi-square test is that it requires a sufficient sample size in order
for the chi-square approximation to be valid. [Ref: Statistics How To]
Solution:
The data are discrete. So, choose chi-square test.
Day i    Observed Oi    Expected Ei    (Oi-Ei)²    χ² = (Oi-Ei)²/Ei
1        304            296.7          53.29       0.18
2        176            148.3          767.29      5.17
3        139            148.3          86.49       0.58
4        141            148.3          53.29       0.36
5        130            148.3          334.89      2.25
Total    890            889.9                      8.54
Following the hypothesis testing procedure:
1. The hypothesis states that absenteeism is twice as high on Monday. So absenteeism occurs in
the ratios 2:1:1:1:1, with a sum of 6. There are 5 working days in the given
data, and i = 1, 2, 3, 4, 5 indicates the day of the week, i.e., i = 1 is Monday, i = 2 is Tuesday, and so on.
The probability distribution of absenteeism:
$$f(a) = \begin{cases} \frac{2}{6}, & i = 1 \\ \frac{1}{6}, & i = 2, 3, 4, 5 \end{cases}$$
So the hypotheses are:
H0: A has the distribution f(a)
H1: A does not have the distribution f(a)
2. Let α = 0.05 (i.e., a 5% probability of rejecting a true H0)
3. No parameters need to be estimated
4. Determine the expected frequencies, Ei = npi
5. Degrees of freedom ν = k - r - 1, where k is the number of cells and r is the number of parameters
of the hypothesized distribution that are estimated from the sample data.
For this problem, ν = 5 - 0 - 1 = 4, because there are 5 days and no parameters are estimated
from the sample.
The acceptance region for χ²(4) and α = 0.05 is χ² ≤ 9.49
6. Since 8.54 < 9.49, H0 can't be rejected. It appears that the absenteeism rate is twice as high on Mondays.
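The absenteeism test can be sketched in a few lines. The critical value 9.49 is taken from the chi-square table as quoted in the solution, not computed; the variable names are illustrative. Because the code does not round the expected counts, the statistic comes out near 8.56 rather than the hand-rounded 8.54, and the conclusion is unchanged.

```python
observed = [304, 176, 139, 141, 130]   # Monday to Friday
ratios = [2, 1, 1, 1, 1]               # hypothesized ratio 2:1:1:1:1

n = sum(observed)
expected = [n * r / sum(ratios) for r in ratios]

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

k, r = len(observed), 0                # 5 cells, no estimated parameters
nu = k - r - 1                         # degrees of freedom
critical = 9.49                        # table value for nu = 4, alpha = 0.05 (from the notes)

reject_h0 = chi2 > critical
print(round(chi2, 2), nu, reject_h0)
```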
Cell i   Oi   Ei = npi   Ei'     Oi'   (Oi'-Ei')²   χ²
1        5    5.88       5.88    5     0.77         0.13
2        21   21.10      21.10   21    0.01         0.0004
3        35   31.08      31.08   35    15.37        0.49
4        15   17.74
5        3    3.89       21.94   19    8.64         0.39
6        1    0.31
Total    N = 80                                     1.01
(Cells 4, 5 and 6 are combined; the primed columns hold the values after combining.)
1. H0 & H1 are given
2. Let α = 0.05
3. From the given data,
$$\mu = \frac{5(7.65) + 21(7.95) + 35(8.25) + 15(8.55) + 3(8.85) + 1(9.15)}{80} = 8.224$$
and σ = 0.293 L/s from the data. As a rough check, since the lowest value is approximately μ-3σ and
the highest value approximately μ+3σ: μ-3σ = 7.65; 8.224-3σ = 7.65; σ = 0.19 OR, μ+3σ = 9.15; σ = 0.31
4. χ² = 1.01. Keep a minimum frequency of 5 by combining rows if less than 5.
5. ν = k - r - 1 = 4 - 2 - 1 = 1. After combining rows to meet the minimum frequency we have only 4
cells (k) with 2 (r) estimated parameters.
The acceptance region for α = 0.05 and ν = 1 is χ² ≤ 3.84
[Go to the α = 0.05 column and the ν = 1 row; they intersect at 3.84, which is the required chi-square
value]
6. Since 1.01 < 3.84, accept H0
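The cell-combining rule and the test itself can be sketched as follows. Several things here are assumptions: the helper `pool_tail` only merges cells from the end of the table (which is what this data set needs), the values 7.65 to 9.15 are taken to be the class values paired with each frequency, and the critical value 3.84 is the table value quoted above rather than a computed one.

```python
def pool_tail(observed, expected, minimum=5):
    """Merge trailing cells into their left neighbour until the last expected count >= minimum."""
    obs, exp_counts = list(observed), list(expected)
    while len(exp_counts) > 1 and exp_counts[-1] < minimum:
        last_o, last_e = obs.pop(), exp_counts.pop()
        obs[-1] += last_o
        exp_counts[-1] += last_e
    return obs, exp_counts

observed = [5, 21, 35, 15, 3, 1]
expected = [5.88, 21.10, 31.08, 17.74, 3.89, 0.31]
class_values = [7.65, 7.95, 8.25, 8.55, 8.85, 9.15]  # assumed pairing with the frequencies

n = sum(observed)
mean = sum(f * v for f, v in zip(observed, class_values)) / n

obs_pooled, exp_pooled = pool_tail(observed, expected)
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))

r = 2                              # estimated parameters (mean and standard deviation)
nu = len(obs_pooled) - r - 1       # degrees of freedom after pooling
critical = 3.84                    # table value for nu = 1, alpha = 0.05 (from the notes)
accept_h0 = chi2 <= critical

print(round(mean, 3), obs_pooled, round(chi2, 2), nu, accept_h0)
```

Without intermediate rounding the statistic is about 1.02 instead of 1.01; the decision is the same.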