Normal Distribution - Hypothesis Testing
Determine the Mean & Variance of the Hypergeometric Distribution:

Mean: We know the expected value is

E[X] = Σ_{x=0}^{n} x·p(x)
     = Σ_{x=0}^{n} x · C(k, x)·C(N-k, n-x) / C(N, n)
     = Σ_{x=1}^{n} x · C(k, x)·C(N-k, n-x) / C(N, n)        [the limit starts from x = 1 since the x = 0 term is 0]
     = Σ_{x=1}^{n} x · [k!/(x!(k-x)!)] · C(N-k, n-x) / C(N, n)
     = Σ_{x=1}^{n} [k·(k-1)!/((x-1)!(k-x)!)] · C(N-k, n-x) / C(N, n)
     = k · Σ_{x=1}^{n} C(k-1, x-1)·C(N-k, n-x) / C(N, n)

Let y = x - 1, so that x = 1 gives y = 0 and x = n gives y = n - 1. Also note that

C(N, n) = N! / (n!(N-n)!) = N·(N-1)! / (n·(n-1)!·(N-n)!) = (N/n)·C(N-1, n-1),
and C(N-k, n-x) = C((N-1)-(k-1), (n-1)-y).

Therefore,

E[X] = k · Σ_{y=0}^{n-1} C(k-1, y)·C((N-1)-(k-1), (n-1)-y) / [(N/n)·C(N-1, n-1)]
     = (nk/N) · Σ_{y=0}^{n-1} C(k-1, y)·C((N-1)-(k-1), (n-1)-y) / C(N-1, n-1)
     = nk/N,

since the remaining sum adds up the probabilities of a hypergeometric distribution with parameters N-1, k-1 and n-1, and therefore equals 1.
Variance:

σ² = E[X²] - (E[X])²

E[X²] = Σ_{x=0}^{n} x²·p(x) = Σ_{x=0}^{n} x(x-1)·p(x) + Σ_{x=0}^{n} x·p(x)

For the first sum,

Σ_{x=0}^{n} x(x-1)·p(x) = Σ_{x=2}^{n} x(x-1) · C(k, x)·C(N-k, n-x) / C(N, n)

Writing k!/(x!(k-x)!) = k(k-1)·(k-2)! / (x(x-1)·(x-2)!·(k-x)!) and
C(N, n) = N!/(n!(N-n)!) = N(N-1)·(N-2)! / (n(n-1)·(n-2)!·(N-n)!) = [N(N-1)/(n(n-1))]·C(N-2, n-2), we get

Σ_{x=2}^{n} x(x-1)·p(x) = [k(k-1)·n(n-1) / (N(N-1))] · Σ_{x=2}^{n} C(k-2, x-2)·C(N-k, n-x) / C(N-2, n-2)
                        = k(k-1)·n(n-1) / (N(N-1)),

since the remaining sum is again the total probability of a hypergeometric distribution (parameters N-2, k-2, n-2) and equals 1. Hence,

E[X²] = k(k-1)·n(n-1) / (N(N-1)) + nk/N

Var(X) = E[X²] - (E[X])²
       = k(k-1)·n(n-1) / (N(N-1)) + nk/N - n²k²/N²
       = (nk/N)·[ (k-1)(n-1)/(N-1) + 1 - nk/N ]
       = (nk/N)·[ ( N(k-1)(n-1) + N(N-1) - nk(N-1) ) / (N(N-1)) ]
       = (nk/N)·[ ( N² - Nn - Nk + nk ) / (N(N-1)) ]
       = (nk/N)·[ (N-k)(N-n) / (N(N-1)) ]
       = (nk/N)·(1 - k/N)·(N-n)/(N-1)
       = [(N-n)/(N-1)] · n · (k/N)·(1 - k/N)
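The closed-form results above can be checked numerically. Below is a minimal sketch, assuming scipy is available; the population size N = 50, number of successes k = 20 and sample size n = 10 are hypothetical values chosen only for the check.

```python
# Numeric check of E[X] = nk/N and Var(X) = (N-n)/(N-1) * n * (k/N)(1 - k/N)
# for a hypothetical hypergeometric population (not from the notes).
from scipy.stats import hypergeom

N, k, n = 50, 20, 10
rv = hypergeom(M=N, n=k, N=n)   # scipy naming: M = population size, n = successes, N = draws
print(rv.mean(), n * k / N)                                           # both 4.0
print(rv.var(), (N - n) / (N - 1) * n * (k / N) * (1 - k / N))        # both ≈ 1.959
```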
Example 5.13: A group of 10 individuals is used for a biological case study. The group contains 3
people with blood type O, 4 with blood type A, and 3 with blood type B. What is the probability that a
random sample of 5 will contain 1 person with blood type O, 2 people with blood type A, and 2
people with blood type B?
Solution:

The people may be selected in 3C1 · 4C2 · 3C2 ways.

Total ways = 10C5

Probability of selecting the required sample of 5 people = (3C1 · 4C2 · 3C2) / 10C5 = 54/252 = 3/14 (Ans)
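The same counting rule can be evaluated directly with the standard library, as a quick check of Example 5.13:

```python
# Example 5.13: multivariate hypergeometric probability by direct counting.
from math import comb

p = comb(3, 1) * comb(4, 2) * comb(3, 2) / comb(10, 5)
print(p)   # 0.2142857... = 3/14
```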
“POISSON DISTRIBUTION”
Experiments that yield the number of outcomes occurring during a given time interval or in a specified region are called Poisson experiments. The given time interval may be of any length, such as a minute, a day, a week, a month, or even a year.
It can generate observations for the random variable X as,

‘Telephone calls per hour received by an office’


‘No. of days school is closed due to snow during the winter’
Properties:
1. The number of outcomes occurring in one time interval or specified region is independent of the number that occurs in any other disjoint time interval or region in space. In this sense we say that the Poisson process has no memory.
2. The probability that a single outcome will occur during a very short time interval or in a small
region is proportional to the length of the time interval or the size of the region and does not
depend on the number of outcomes occurring outside this time interval or region.
3. The probability that more than one outcome will occur in such a short time interval or fall in
such a small region is negligible.
Formula:
When the probability of success p is very small and the no. of trials n is very large, the binomial distribution
P(x) = nCx p^x q^(n-x)
can be reduced to the Poisson distribution
P(x) = e^{-m} m^x / x! = e^{-λt} (λt)^x / x!
Here, m = np = λt (the mean no. of successes)
e = 2.71828
λ = average no. of outcomes per unit time, distance, area or volume
Example 5.20: (p:186)
During a laboratory experiment the average no. of radioactive particles passing through a counter in
1 millisecond is 4. What is the probability that 6 particles enter the counter in a given millisecond?
Solution:

(At first glance we can identify the problem as a Poisson distribution, since the average no. of outcomes per unit time is given.)
Given, x = 6; λt = 4

p(x; λt) = e^{-λt} (λt)^x / x!, so p(6; 4) = e^{-4} · 4^6 / 6! = 0.1042 (Ans.)

Example 5.21: (p:186)


Ten is the average no. of oil tankers arriving each day at a certain port city. The facilities at the port
can handle at most 15 tankers per day. What is the probability that on a given day tankers have to be
turned away?
Solution:
(We have to determine the probability that more than 15 tankers arrive on a given day, i.e. that some have to be turned away.)
Given, λt=10;

P(X > 15) = 1 - P(X ≤ 15) = 1 - Σ_{x=0}^{15} p(x; 10) = 1 - 0.9513 = 0.0487 (Ans.)

Example
Asteroids with a diameter of at least 1 km collide with the earth at a rate of approximately 2
per million years.
What is the probability that in a randomly selected million year period there is exactly one
collision?
Solution:
Given, x = 1; λ = 2
P(x; λ) = e^{-λ} λ^x / x! = e^{-2} · 2^1 / 1! = 0.2707 (Ans.)
Example 5.22: (p:188)
In a certain industrial facility accidents occur infrequently. It is known that the probability of an
accident on any given day is 0.005 and accidents are independent of each other.
(a) What is the probability that in any given period of 400 days there will be an accident on one day?
(b) What is the probability that there are at most 3 days with an accident?
Solution:
(With p known this is a binomial problem, but for large n and small p the binomial distribution can be approximated by the Poisson distribution.)
Given, n=400, p=0.005; np=2

(a) PX  1  e2  21  0.271

4
3
 e 2  2 x
(b) PX  3  x! 
0.857 x
0
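Since Example 5.22 approximates a binomial by a Poisson, a short comparison of the exact binomial values with the Poisson approximation (scipy assumed) shows how close the two are for n = 400, p = 0.005:

```python
# Exact binomial vs Poisson approximation for Example 5.22 (λ = np = 2).
from scipy.stats import binom, poisson

print(binom.pmf(1, 400, 0.005), poisson.pmf(1, 2))   # ≈ 0.2707 vs 0.2707
print(binom.cdf(3, 400, 0.005), poisson.cdf(3, 2))   # ≈ 0.858  vs 0.857
```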

“AREAS UNDER NORMAL CURVE”


The curve of any continuous probability distribution or density function is constructed so that the area under the curve bounded by the two ordinates x = x1 and x = x2 equals the probability that the random variable X assumes a value between x1 and x2.

Normal Curve:
 Bell Shaped
 Symmetrical
 Total Area Under the curve =1

Standard Normal Curve:


 Mean = 0
 Standard Deviation = 1

To avoid complication, a normal distribution is transformed to the standard normal distribution by the transformation
Z = (x - μ)/σ, where Z is the standard normal value corresponding to X.
Example 6.2 (p.-202):


Given, a standard normal distribution, find the area under the curve that lies
(a) To the right of z = 1.84 and

(b) Between z = -1.97 and z = 0.86

(a) Area to the right of (z=1.84) = 1- Area to the left of (z=1.84) = 1-0.9671 = 0.0329
(b) Area between (z = -1.97 and z = 0.86) = (Area left of z = 0.86) – (Area left of z = -1.97)
= 0.8051- 0.0244
= 0.7807 (Ans)
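The table lookups in Example 6.2 can be reproduced with the standard normal cdf; a minimal check, assuming scipy is available:

```python
# Example 6.2 areas under the standard normal curve.
from scipy.stats import norm

print(norm.sf(1.84))                      # (a) area to the right of z = 1.84 ≈ 0.0329
print(norm.cdf(0.86) - norm.cdf(-1.97))   # (b) area between -1.97 and 0.86 ≈ 0.7807
```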

Example 6.3: Given, a standard normal distribution, find the value of k such that,

(a) P (Z>k) = 0.3015, and


(b) P (k<Z<-0.18) = 0.4197
Solution:

(a) The k value leaving an area of 0.3015 to the right must leave an area of 0.6985 to the left. From Table A.3 it follows that k = 0.52.
(b) P(k < Z < -0.18) = 0.4197
    P(Z < -0.18) - P(Z < k) = 0.4197
    P(Z < k) = P(Z < -0.18) - 0.4197 = 0.4286 - 0.4197 = 0.0089

From Table A3, we have, k = -2.37
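The reverse lookup in Example 6.3 corresponds to the inverse normal cdf (the percent-point function); a sketch, scipy assumed:

```python
# Example 6.3: find k from given tail areas (Table A.3 in code form).
from scipy.stats import norm

k_a = norm.ppf(1 - 0.3015)                   # (a) P(Z > k) = 0.3015  ->  k ≈ 0.52
k_b = norm.ppf(norm.cdf(-0.18) - 0.4197)     # (b) P(k < Z < -0.18) = 0.4197  ->  k ≈ -2.37
print(k_a, k_b)
```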

Example 6.4 (p.-203):


Given, a random variable X having a normal distribution with, µ = 50 and σ =10
Find the probability that X assumes a value between 45 and 62.

Solution:

The z values corresponding to x1 = 45 and x2 = 62 are
z1 = (45 - 50)/10 = -0.5 and z2 = (62 - 50)/10 = 1.2
P(45 < X < 62) = P(-0.5 < Z < 1.2) = P(Z < 1.2) - P(Z < -0.5) = 0.8849 - 0.3085 = 0.5764 (Ans)
Example 6.5 (p.-203):
Given that X has a normal distribution with µ = 300 and σ = 50,
Find the: Probability that X assumes a value greater than 362
Solution:

Z = (362 - 300)/50 = 1.24
P(X>362) = P(Z>1.24) = 1-P(Z<1.24) = 1-0.8925 = 0.1075 (Ans)
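Examples 6.4 and 6.5 can be checked without standardizing by hand, by giving scipy the mean and standard deviation directly (a verification sketch, not part of the original solution):

```python
# Examples 6.4 and 6.5 with normal distribution objects.
from scipy.stats import norm

X1 = norm(loc=50, scale=10)
print(X1.cdf(62) - X1.cdf(45))   # P(45 < X < 62) ≈ 0.5764

X2 = norm(loc=300, scale=50)
print(X2.sf(362))                # P(X > 362) ≈ 0.1075
```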


Using the Normal Curve in Reverse:
Z = (x - μ)/σ,  or  x = σz + μ
Example 6.6 (p.-205):
Given a normal distribution with µ = 40 and σ = 6, find the value of x that has
(a) 45% area to the left
(b) 14% of the area to the right
Solution:

(a) the z value for which there is an area 0.45 to the left is -0.13.
So, x = σz + µ = (6)(-0.13)+40 = 39.22

(b) The z value for which there is an area of (1-0.14) = 0.86 to the left is 1.08
So, x= (6)(1.08) + 40 = 46.48
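The reverse use of the normal curve in Example 6.6 amounts to x = μ + σ·z with z taken from the inverse cdf; the small differences from the text come from the two-decimal table rounding used in the notes (z = -0.13 and 1.08):

```python
# Example 6.6: values of x with given areas to the left/right.
from scipy.stats import norm

mu, sigma = 40, 6
print(mu + sigma * norm.ppf(0.45))       # (a) ≈ 39.2 (table rounding in the notes gives 39.22)
print(mu + sigma * norm.ppf(1 - 0.14))   # (b) ≈ 46.5 (table rounding in the notes gives 46.48)
```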

Application of Normal Distribution:


Example 6.7
A certain type of storage battery lasts, on average, 3.0 years with a standard deviation of 0.5 year.
Assuming that the battery lives are normally distributed, find the probability that a given battery will
last less than 2.3 years.
Solution:

z = (2.3 - 3)/0.5 = -1.4; using Table A.3, P(X < 2.3) = P(Z < -1.4) = 0.0808 (Ans.)

Problems of the same type: 6.8, 6.9, 6.10, 6.11, 6.12, 6.13

“NORMAL APPROXIMATION TO THE BINOMIAL CURVE”


Normal Distribution Continuous Distribution
Binomial Distribution Discrete Distribution
Poisson distribution Discrete Distribution
Now we want to use continuous probability distribution to approximate probabilities over a
discrete sample space.
For the binomial distribution,
P(X ≤ x) = Σ_{k=0}^{x} b(k; n, p)
Mean, μ = np; variance, σ² = np(1-p) = npq


To illustrate normal approximation to the binomial distribution,
 Draw the histogram of b(x;n,p)
 Superimpose normal curve

Question:
Let, n = 10 and p = ½, so that Y is binomial(10, ½). What is the probability that exactly five
people approve of the job the President is doing?

Solution.

P(Y = 5) = 10C5 (0.5)^5 (0.5)^5 = 0.2460
Or, P(Y=5)=P(Y≤5)−P(Y≤4)=0.6230−0.3770=0.2460
That is, there is a 24.6% chance that exactly five of the ten people selected approve of the job
the President is doing.

(Reference: PennState Eberly College of Science)

Normal Approximation with Continuity Correction

With μ = np = 5 and σ = √(npq) ≈ 1.58, the continuity-corrected approximation is
P(Y = 5) ≈ P(4.5 < Y < 5.5) = P(-0.32 < Z < 0.32) = 0.6255 - 0.3745 = 0.251

(Reference: PennState Eberly College of Science)
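A sketch of the continuity-corrected approximation for P(Y = 5), Y ~ binomial(10, ½); the table values above use two-decimal z rounding, so the code gives about 0.248 rather than 0.251:

```python
# Continuity-corrected normal approximation vs the exact binomial pmf.
from math import sqrt
from scipy.stats import norm, binom

mu, sigma = 10 * 0.5, sqrt(10 * 0.5 * 0.5)
approx = norm.cdf(5.5, mu, sigma) - norm.cdf(4.5, mu, sigma)
print(approx, binom.pmf(5, 10, 0.5))   # ≈ 0.248 vs exact ≈ 0.246
```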
Example 6.15 (p:215):
The probability that a patient recovers from a rare blood disease is 0.4. If 100 people are known
to have contracted this disease, what is the probability that less than 30 survive?
Solution:
Here, µ = np = 100*0.4 = 40,
σ=√𝑛𝑝𝑞 = 4.899

z = (29.5 - 40)/4.899 = -2.14
P(X < 30) ≈ P(Z < -2.14) = 0.0162 (Ans)
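For Example 6.15 the approximation can be compared with the exact binomial probability (scipy assumed; the small difference is the approximation error):

```python
# Example 6.15: exact binomial vs continuity-corrected normal approximation.
from scipy.stats import binom, norm

print(binom.cdf(29, 100, 0.4))     # exact P(X < 30), ≈ 0.015
print(norm.cdf(29.5, 40, 4.899))   # normal approximation, ≈ 0.016
```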

“GAMMA & EXPONENTIAL DISTRIBUTION”


The gamma distribution is applied when the probability density function can exhibit a variety of skewed shapes. Its density (used in the example below) is

f(x; α, β) = x^(α-1) e^{-x/β} / (β^α Γ(α)),  x > 0;  0 elsewhere,

where α and β are the parameters of the distribution and Γ(α) = (α-1)·Γ(α-1).

Example 6.18
Suppose that telephone calls arriving at a particular switchboard follow a Poisson process with an
average 5 calls coming per minute. What is the probability that up to a minute will elapse until 2 calls
have come in to the switchboard?
Solution:

Given, no. of events α = 2, β = 1/λ = 1/5, and Γ(2) = (2-1)·Γ(1) = 1

P(X ≤ 1) = ∫_0^1 (1/β²) x e^{-x/β} dx = 25 ∫_0^1 x e^{-5x} dx = 0.96 (Ans)
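The integral in Example 6.18 is just the gamma cdf with shape α = 2 and scale β = 1/5; a one-line check, scipy assumed:

```python
# Example 6.18 via the gamma cdf.
from scipy.stats import gamma

print(gamma.cdf(1, a=2, scale=1/5))   # P(X <= 1) ≈ 0.9596
```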

The exponential distribution is a special form of the gamma distribution: putting α = 1 into the gamma distribution gives the exponential distribution,

f(x; λ) = λ e^{-λx}, x > 0
        = 0, elsewhere

Here, λ=mean number of events per unit time


Relationship to the Poisson Process of Exponential Distribution:
P(X>x) = e-λx
How does this arise? Work through the reference below if you want the derivation.

Ref: https://stats.stackexchange.com/questions/2092/relationship-between-poisson-and-exponential-distribution

A typical profile of failure rate over time is shown in Figure 4S-2. Because of its shape, it is referred
to as the bathtub curve.
#Explain the Bathtub curve of Failure Rate for Exponential Distribution:
Burn-In: Usually, a number of products or parts fail shortly after they are put into service, because
they are defective to begin with. Examples include electronics components such as capacitors. The
rate of failure decreases rapidly as the defective items are weeded out.
Steady-State: During the second phase, random failures occur. In many cases, this phase covers a
relatively long period of time (several years).
Wear-Out: In the third phase, failures occur because the items are worn out, and the failure rate
increases

The time to failure of a non-repairable item during the steady state phase can often be modelled
by the Exponential distribution with an average equal to the MTTF (see Figure 4S-4). Similar
results hold for repairable items.
The probability that the item put into service at time 0 will fail before some specified time, T, is equal
to the area under the curve between 0 and T.
Reliability = P(no failure before T) = e^{-λT}

The probability that failure will occur before time T is 1 minus the reliability:
P(failure before T) = 1 - e^{-λT}

Example 6.17:
Suppose, that a system contains a certain type of component whose time, in years, to failure is given
by
T. The random variable T is modeled nicely by the exponential distribution with mean time to failure
β=5. If 5 of these components are installed in different systems, what is the probability that at least 2
are still functioning at the end of 8 years.
Solution:
According to the exponential distribution, the probability that a given component is still functioning after 8 years is

P(T > 8) = e^{-8/5} ≈ 0.2

The probability that at least 2 of the 5 components are still functioning after 8 years is

P(X ≥ 2) = Σ_{x=2}^{5} b(x; 5, 0.2) = 0.2627 (Ans)

Explain Memoryless Property of Exponential Distribution:


In probability and statistics, memorylessness is a property of certain probability distributions. It
usually refers to the cases when the distribution of a "waiting time" until a certain event does not
depend on how much time has elapsed already. Only two kinds of distributions are memoryless:
exponential distributions of non-negative real numbers and the geometric distributions of non-
negative integers.
For example, in the case of an electrical component whose lifetime has an exponential distribution, the probability that the component lasts, say, t hours, that is P(X ≥ t), is the same as the conditional probability

P(X ≥ t) = P(X ≥ t0 + t | X ≥ t0)   [probability of surviving to t0 + t given survival to t0]

So, if the component makes it to t0 hours, the probability of lasting an additional t hours is the same as the
probability of lasting t hours.

So there is no “punishment” through wear that may have ensued for lasting the first t0 hours.
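A small numeric illustration of the memoryless property, using a hypothetical rate λ = 0.25 and arbitrary times t0 = 3, t = 2 (these numbers are not from the notes):

```python
# P(X > t0 + t | X > t0) equals P(X > t) for an exponential lifetime.
from math import exp

lam, t0, t = 0.25, 3.0, 2.0
cond = exp(-lam * (t0 + t)) / exp(-lam * t0)   # conditional survival probability
print(cond, exp(-lam * t))                     # both ≈ 0.6065
```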

Fig.: Memorylessness property of exponential distribution


In the graph, the f(x) curve shows the exponential distribution of the lifetime of a bulb; the total area under the curve is 1. The shifted curve shows the lifetime distribution of the bulb after it has already survived an initial x days/hours/months/years, and the area under this curve is also 1. If this curve is superimposed on the previous curve, it coincides with it. This represents the memorylessness property of the exponential distribution.
Example 16.3 (Leland Blank) p.291:
Engineers have collected data from 100 compressors on natural gas pipelines and found that the
average life is 5.75 years and that failures follow the exponential distribution.
(a)Compute the probability of failure during the first year after installation. During the first three
months.
(b) Compute the probability of failure prior to the average life.
(c) Compute the probability of operating at least 10 years.
(d) Plot the reliability curve and compare it with the pdf curve.

Solution:
Given, Mean Time Between Failures, MTBF: θ = 5.75 years
Hence, the mean no. of events per unit time, λ = 1/θ = 1/5.75 = 0.174 (years)^-1
Exponential pdf: f(t) = 0.174 e^{-0.174t}; t > 0

(Chance of no failure, or reliability): R(t) = e^{-λt}

(Cumulative distribution function) cdf: F(t) = 1 - e^{-λt}  [gives the probability of failure prior to t]
(a) Probability of failure during the first year after installation (not instantaneous; so use cdf equation)

F (t  1)  1  e 0.1741  1  0.84  0.16

Probability of failure during the first 3 months or (3/12=0.25 years) (not instantaneous; so use
cdf equation)

F (t  0.25)  1  e 0.1740.25  1  0.957  0.042

(b) Probability of failure prior to (below) 5.75 years (not instantaneous; so use cdf equation)

F(t ≤ 5.75) = 1 - e^{-0.174(5.75)} = 0.632

(c) Probability of operating at least 10 years, i.e. the chance of no failure for at least 10 years:
R(t) = e^{-λt} = e^{-0.174(10)} = 0.176

(d) Plot the functions R(t) = e^{-λt} and f(t) = λe^{-λt} = 0.174 e^{-0.174t}.
Determine the values for t = 5, 10, 15, 20 & plot them.

[Fig.: R(t), f(t) and the cdf plotted against t, from 0 to 30 years]

Every point on R(t) gives the probability of operating for at least t years
Every point on f(t) gives the instantaneous probability of failure at t years
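A sketch that reproduces the values requested in part (d) and draws R(t), f(t) and the cdf, assuming numpy and matplotlib are available:

```python
# Example 16.3, part (d): reliability, pdf and cdf for λ = 0.174 per year.
import numpy as np
import matplotlib.pyplot as plt

lam = 0.174
t = np.linspace(0, 30, 300)
R = np.exp(-lam * t)           # reliability, P(no failure before t)
f = lam * np.exp(-lam * t)     # exponential pdf
F = 1 - R                      # cdf, P(failure before t)

for ti in (5, 10, 15, 20):     # tabulated points asked for in part (d)
    print(ti, np.exp(-lam * ti), lam * np.exp(-lam * ti))

plt.plot(t, R, label="R(t)")
plt.plot(t, f, label="f(t)")
plt.plot(t, F, label="cdf")
plt.xlabel("t (years)")
plt.legend()
plt.show()
```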
Example S-3 (Stevenson):
By means of extensive testing and data collection, a manufacturer has determined that a particular
model of its vacuum cleaners has an expected life that is Exponential with a mean of four years
and insignificant burn-in phase.
Find the probability that one of these vacuum cleaners will have a life that ends:
a. After the initial four years of service.
b. Before four years of service are completed.
c. Not before six years of service.
Solution:
Given, λ=1/4=0.25
(a) The life ends after the initial four years of service, i.e. the cleaner survives at least four years (reliability at 4 years):
R(t) = e-λt= e-0.25(4)= 0.3679 (Ans)
(b) Probability of failure before 4 years= 1- p(reliability of four years) = 1-0.3679=0.6321
(Ans)
(c) Reliability of 6 years = e-0.25(6)= 0.2231 (Ans)

Example 6.21 (Walpole) p.224:
A certain washing machine is characterized by the following density function:

f(y) = (1/4) e^{-y/4}, y ≥ 0
     = 0, elsewhere
This is an exponential with μ=4 years. The machine is considered a bargain if it is unlikely to
require a major repair before the 6th year.
(a) What is the probability P(Y>6)?
(b) What is the probability that major repair occurs in the first year?
Solution:
(a) Probability of no major repair before the 6th year = reliability over 6 years = e^{-0.25(6)} = 0.2231 (Ans)
(b) Probability that a major repair occurs in the first year = 1 - e^{-0.25(1)} ≈ 0.221 (Ans)
[Probability & Statistics; “Rukmangadachari E.”]

“STATISTICAL DECISION THEORY/HYPOTHESIS TEST”
Statistical Decision:
Decision about population is made on the basis of sample information. Such decisions are
called statistical decisions.
Example: we may wish to decide on the basis of sample data whether a new serum is really effective in curing a disease, or whether one educational procedure is better than another.
Statistical Hypothesis:
In attempting to reach a decision it is useful to make assumptions about the populations involved. Such assumptions, which may or may not be true, are called statistical hypotheses.

Types of Hypothesis:
There are 2 types of hypothesis for any statistical test.
1. Null Hypothesis
2. Alternative Hypothesis
Hypotheses can be formulated about means, variances, differences of means, or pdf forms, as summarized below.

Subject | Hypothesis Description | Statistical Statement
Mean | Population mean is 25 | μ = 25
Two means | Two populations have the same mean | μ1 = μ2
Proportion | Fraction of the population that is defective is 0.05 | π = 0.05
pdf | Population follows a Poisson pdf with λ = 0.5 | p(x; 0.5) is correct
Two variances | Two populations have equal variances | σ1² = σ2²

Null Hypothesis:

 It is generally denoted by H0.
 H0 is any hypothesis which is to be tested for possible rejection under the assumption that it is true.
Alternative Hypothesis:

 It is denoted by H1 or HA
 The statement that must be true if the null hypothesis is false/rejected.
 Opposite to Null Hypothesis.
Ref: Hypothesis Testing, Mohammad Adil Khan
Types of errors and their possibilities:
Since either H0 or H1 must be correct in reality, we can make 2 types of errors in our decision.

Reality \ Decision | Reject H0 | Accept H0
H0 True | Type I error (α) | Correct
H0 False | Correct | Type II error (β)
Test Statistics:
• It provides a basis for testing a null hypothesis.
• A value computed from the sample data that is used in making the decision about whether to
reject or accept the null hypothesis.
• Test statistic is denoted by Z and is given by

Z = (value of random variable - mean of random variable) / (standard deviation of random variable) = (X - μ)/σ

Ref: Hypothesis Testing, Mohammad Adil Khan
Critical Region:
• Set of values of the test statistic that would cause a rejection of the null hypothesis.
• Values consistent with H0 form the acceptance region.
• Values not consistent with H0 form the rejection region.

Ref: Hypothesis Testing, Mohammad Adil Khan


Critical Value:
Value or values that separate the critical region (where we reject the null hypothesis) from the values
of the test statistics that do not lead to a rejection of the null hypothesis.

Type І Error or α error:

• The mistake of rejecting the null hypothesis when it is true.


• α (alpha) represents the probability of a type I error, i.e. α is the probability of rejecting H0 when it is true.

Example:
Suppose you are a quality inspector of your company and make the decisions of accepting and
rejecting a lot. You hypothesized a lot with null hypothesis µ=25. Now, accidentally you select a
sample from the lot whose value falls in the critical region and due to this reason you reject the lot.
But the lot truly is acceptable! This type of error when “rejecting a true lot!” is type І error or α
error

Type ІІ Error or β error:


• Failing to reject Null Hypothesis when it is false.
• β represents the probability of a type II error, i.e. the probability of accepting H0 when H0 is false.

Example:
Suppose you are a quality inspector of your company and make the decisions of accepting and
rejecting a lot. You hypothesized a lot with null hypothesis µ=25. Now, accidentally you select a
sample from the lot whose value falls in the acceptable region and due to this reason you accept the
lot. But the lot truly is not acceptable! This type of error when “accepting a false lot!” is type ІІ
error or β error.

Significance Level (What does α=5% mean?):
• Denoted by α.
• The probability that the test statistic will fall in the critical region when the null hypothesis is actually true.
• Common choices are 0.05, 0.01, and 0.10.
• α = 5% means there are about 5 chances in 100 of rejecting a true null hypothesis.
• In other words, we say that we are 95% confident in making the correct decision.
Ref: Hypothesis Testing, Mohammad Adil Khan
Two-tailed, Right-tailed, Left-tailed Tests:
What is meant by Two tailed, Right tailed & Left tailed test?
• The tails in a distribution are the extreme regions bounded by critical values.
• A hypothesis test for which the entire rejection region is located in only one of the two tails, either the left tail or the right tail of the probability distribution of the test statistic, is called a one-tailed or one-sided test.
• If the rejection region is divided equally between the two tails of the probability distribution of the test statistic, it is referred to as a two-tailed or two-sided test.
• It is important to note that one-tailed and two-tailed tests differ only in the location of the critical region, not in its size.
Ref: Hypothesis Testing, Mohammad Adil Khan

Controlling Type I and Type II Errors:
• To decrease both α and β, increase the sample size n.

General Rules for Testing Hypothesis:


1. State your problem and formulate a null hypothesis with alternative hypothesis
2. Decide upon a significance level,  of the test which is the probability of rejecting the null
hypothesis if it is true.
3. Choose an appropriate test statistic, then determine and sketch the probability distribution of the test statistic, assuming H0 is true.
4. Determine the critical region in such a way that the probability of rejecting the null
hypothesis, if it is true, is equal to the significant level, .
5. Compute the value of the test statistic in order to decide whether to accept or reject the
null hypothesis H0.
6. Formulate the decision rule as below:
a) Reject the null hypothesis H0 if the computed value of the test statistic falls in the rejection region, and conclude that H1 is true.
b) Accept the null hypothesis H0 otherwise.
Example 19.1 (Blank):
A petroleum engineer has taken a sample of 25 readings and hypothesized that the pressure in a
vessel is 3.5 atmosphere (atm). The test is therefore between
H0: μ = 3.5
H1: μ ≠ 3.5
The pressure is normally distributed with a standard deviation of 2.5 atm.
(a) Determine the critical region limits if a type І error is allowed 5% of the time.
(b) Compute the probability for a type ІІ error if in reality µ=4.8 atm.

Solution:
(a) According to the CLT (Central Limit Theorem), the standard deviation of the sample mean is
σ_x̄ = σ/√n = 2.5/√25 = 0.5
[since the given data are real-life data, the sample mean is treated as normally distributed via the CLT]

To allow α = 0.05 probability for a type I error requires α/2 = 0.025 of the area in each tail. The two corresponding values of z define the critical region, where z is the standard normal variable.

P(z ≤ z1) = P((x̄1 - μ)/σ_x̄ ≤ z1) = 0.025
[find the value of z from Table A.3 for which the left-side area is 0.025; z1 = -1.96]
(x̄1 - 3.5)/0.5 = -1.96;  x̄1 = 2.52 atm

P(z ≥ z2) = P((x̄2 - μ)/σ_x̄ ≥ z2) = 0.025
[find the value of z from Table A.3 for which the right-side area is 0.025; z2 = 1.96]
(x̄2 - 3.5)/0.5 = 1.96;  x̄2 = 4.48 atm

So the critical region is defined by x̄1 = 2.52 atm and x̄2 = 4.48 atm. Any sample-mean pressure between these values falls in the acceptance region and results in accepting the null hypothesis H0.

(b)

We have to determine the probability of a type II error (the probability of accepting a false lot), which is the area to the left of x̄2 = 4.48 (the critical limit corresponding to z2 = 1.96) under the curve of the alternative distribution with true mean μ = 4.8 atm.

Converting x̄2 to a standard normal variable, z = (x̄2 - μ)/σ_x̄ = (4.48 - 4.80)/0.50 = -0.64

From Table A.3, the area to the left of -0.64 gives P(type II error) = β = 0.2611 (Ans)
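The critical limits and β for Example 19.1 can be computed directly; a sketch assuming scipy (the lower-tail contribution to β is negligible and is ignored, as in the notes):

```python
# Example 19.1: critical region limits for α = 0.05 and the type II error
# probability when the true mean is 4.8 atm.
from math import sqrt
from scipy.stats import norm

mu0, sigma, n, alpha = 3.5, 2.5, 25, 0.05
se = sigma / sqrt(n)                          # 0.5
x1 = mu0 + norm.ppf(alpha / 2) * se           # ≈ 2.52 atm
x2 = mu0 + norm.ppf(1 - alpha / 2) * se       # ≈ 4.48 atm
beta = norm.cdf(x2, loc=4.8, scale=se)        # P(accept H0 | μ = 4.8) ≈ 0.261
print(x1, x2, beta)
```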
“GOODNESS OF FIT TEST”
Chi-Square goodness of fit test is a non-parametric test that is used to find out how the observed
value of a given phenomena is significantly different from the expected value.
In Chi-Square goodness of fit test, the term goodness of fit is used to compare the observed
sample distribution with the expected probability distribution.
Chi-Square goodness of fit test determines how well theoretical distribution (such as normal,
binomial, or Poisson) fits the empirical distribution.
In Chi-Square goodness of fit test, sample data is divided into intervals. Then the numbers of points
that fall into the interval are compared, with the expected numbers of points in each interval.

The Chi-Square Goodness of Fit Test statistic: χ²_c = Σ (Oi - Ei)² / Ei
Oi = Observed Value; Ei = Expected Value

The chi square can be used for discrete distributions like the binomial distribution and the Poisson
distribution, while the Kolmogorov-Smirnov and Anderson-Darling goodness of fit tests can only be
used for continuous distributions.
Two potential disadvantages of chi square are:
1. The chi square test can only be used for data put into classes (bins). If you have non-binned
data you’ll need to make a frequency table or histogram before performing the test.
2. Another disadvantage of the chi-square test is that it requires a sufficient sample size in order
for
the chi-square approximation to be valid. [Ref: Statistics How To]

Example 23.1 (Blank)


The personnel director at your company has asked you to verify the statement that absenteeism is
twice as bad on Mondays as the rest of the week. You are given personnel records for 3 months
covering 890 days of lost work.
Day Monday Tuesday Wednesday Thursday Friday
Days Lost 304 176 139 141 130

Solution:
The data are discrete. So, choose chi-square test.
Day, i | Observed Oi | Expected Ei | (Oi - Ei)² | χ² contribution
1 | 304 | 296.7 | 53.29 | 0.18
2 | 176 | 148.3 | 767.29 | 5.17
3 | 139 | 148.3 | 86.49 | 0.58
4 | 141 | 148.3 | 53.29 | 0.36
5 | 130 | 148.3 | 334.89 | 2.25
Total | 890 | 889.9 | | 8.54

Following Hypothesis testing procedure:
1. The hypothesis states that absenteeism is twice as high on Mondays, i.e. that absenteeism occurs in the ratios 2:1:1:1:1, with a sum of 6. There are 5 working days in the given data, and i = 1, 2, 3, 4, 5 indicates the day of the week (i = 1 is Monday, i = 2 is Tuesday, and so on).
The probability distribution of absenteeism:

f(a) = 2/6, i = 1
     = 1/6, i = 2, 3, 4, 5
So the hypothesis:
H0: A has the distribution f(a)
H1: A does not have the distribution f(a)
2. Let α = 0.05 (meaning a 5% chance of making a type I error).
3. No parameters need to be estimated.
4. Determine Ei = n·pi: E1 = 890(2/6) = 296.7 and Ei = 890(1/6) = 148.3 for the other days, giving the χ² value of 8.54 in the table above.
5. Degrees of freedom ν = k - r - 1, where r is the no. of parameters of the hypothesized distribution estimated from the sample data. For this problem, ν = 5 - 0 - 1 = 4, because there are 5 days and no parameters are estimated from the sample.
The acceptance region for χ²(4) and α = 0.05 is χ² ≤ 9.49.
6. Since 8.54 < 9.49, H0 cannot be rejected. It appears that the absenteeism rate is twice as high on Mondays.
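The whole test can also be run with scipy's chisquare function; the statistic matches the hand computation up to rounding of the expected values:

```python
# Example 23.1 with scipy's chi-square goodness-of-fit test.
from scipy.stats import chisquare

observed = [304, 176, 139, 141, 130]
expected = [890 * 2 / 6] + [890 * 1 / 6] * 4     # ratios 2:1:1:1:1
stat, pval = chisquare(observed, f_exp=expected)
print(stat, pval)   # stat ≈ 8.56 (8.54 with rounded Ei), p ≈ 0.073 > 0.05, so H0 is not rejected
```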

Example 23.3 (Blank)


A normal distribution was fit to a flow data in Liters/second.
Flow (L/s) 7.65 7.95 8.25 8.55 8.85 9.15
Frequency 5 21 35 15 3 1
Test the hypothesis
H0: flow has a normal distribution
H1: flow does not have a normal distribution
Solution:
Flow (L/s):               7.65    7.95    8.25    8.55    8.85    9.15
Upper cell boundary, x:   7.80    8.10    8.40    8.70    9.00    9.30
z = (x - μ)/σ:           -1.45   -0.42    0.60    1.62    2.65    3.67
Area left of z, p:        0.0735  0.3372  0.7257  0.9474  0.9960  0.9999

Cell, i | Oi | Ei = n·pi | Ei' (pooled) | Oi' (pooled) | (Oi' - Ei')² | χ² contribution
1 | 5  | 5.88  | 5.88  | 5  | 0.77  | 0.13
2 | 21 | 21.10 | 21.10 | 21 | 0.01  | 0.0004
3 | 35 | 31.08 | 31.08 | 35 | 15.37 | 0.49
4 | 15 | 17.74 | 21.94 | 19 | 8.64  | 0.39
5 | 3  | 3.89  |       |    |       |
6 | 1  | 0.31  |       |    |       |
Total (N = 80) | | | | | | 1.01

1. H0 & H1 are given.
2. Let α = 0.05.
3. From the given data,
μ = (5·7.65 + 21·7.95 + 35·8.25 + 15·8.55 + 3·8.85 + 1·9.15)/80 = 8.224 L/s
and σ = 0.293 L/s is estimated from the data. (As a rough check, the lowest and highest flows are approximately μ - 3σ and μ + 3σ: 8.224 - 3σ = 7.65 gives σ ≈ 0.19, while 8.224 + 3σ = 9.15 gives σ ≈ 0.30.)
4. χ² = 1.01. (Keep a minimum expected frequency of 5 by combining rows whose expected frequency is less than 5, as done in the table above.)
5. ν = k - r - 1 = 4 - 2 - 1 = 1: after combining rows to satisfy the minimum-frequency rule there are only 4 cells (k), with 2 (r) estimated parameters (μ and σ).
The acceptance region for α = 0.05 and ν = 1 is χ² ≤ 3.84.
[Go to the α = 0.05 column and the ν = 1 row of the chi-square table; they intersect at 3.84, which is the required chi-square value.]
6. Since 1.01 < 3.84, accept H0.
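A sketch of Example 23.3 that rebuilds the expected counts from the fitted normal distribution (μ = 8.224, σ = 0.293) and pools cells 4-6 so that every expected count is at least 5 (numpy/scipy assumed):

```python
# Example 23.3: chi-square goodness of fit for the fitted normal distribution.
import numpy as np
from scipy.stats import norm

obs = np.array([5, 21, 35, 15, 3, 1])
upper = np.array([7.80, 8.10, 8.40, 8.70, 9.00, 9.30])
mu, sigma, n = 8.224, 0.293, obs.sum()

cdf = norm.cdf(upper, mu, sigma)
p = np.diff(np.concatenate(([0.0], cdf)))        # cell probabilities
exp_counts = n * p

obs_pooled = np.array([obs[0], obs[1], obs[2], obs[3:].sum()])
exp_pooled = np.array([exp_counts[0], exp_counts[1], exp_counts[2], exp_counts[3:].sum()])
chi2 = ((obs_pooled - exp_pooled) ** 2 / exp_pooled).sum()
print(chi2)   # ≈ 1.0, compared with the critical value 3.84 for ν = 1
```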
