Stat 154-Statistical Methods II (Updated)
METHODS II
STAT 154
VINCENT K. DEDU
X ~ B(n, p) or X ~ Bin(n, p)
Note:
The number of trials n, and the
probability of success p are both
needed to describe the distribution
completely. They are known as the
parameters of the binomial
distribution.
Let P(failure) = q; then q = 1 - p.
If X ~ B(n, p), the probability of obtaining
r successes in n trials is P(X = r), where
$P(X = r) = {}^{n}C_r\, p^r q^{n-r}$ for r = 0, 1, 2, ..., n
and ${}^{n}C_r = \binom{n}{r} = \frac{n!}{(n-r)!\, r!}$
EXAMPLE
At A-life Supermarket, 60% of
customers pay by credit card. Find
the probability that in a randomly
selected sample of ten customers
a) exactly two pay by credit card
b) more than seven pay by credit card
X ~ B(10, 0.6)
$P(X = r) = {}^{n}C_r\, p^r q^{n-r}$
a) $P(X = 2) = {}^{10}C_2\, p^2 q^8 = 45 \times 0.6^2 \times 0.4^8 = 0.011$
b) $P(X > 7) = P(X = 8) + P(X = 9) + P(X = 10) = 0.17$
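The two answers above can be checked numerically. The following is a minimal sketch, assuming X ~ B(10, 0.6) as in the example; the helper name `binom_pmf` is illustrative and not part of the course material.

```python
from math import comb


def binom_pmf(r: int, n: int, p: float) -> float:
    """Return P(X = r) for X ~ B(n, p), using nCr * p^r * q^(n-r)."""
    q = 1 - p
    return comb(n, r) * p**r * q**(n - r)


n, p = 10, 0.6

# a) exactly two customers pay by credit card
p_two = binom_pmf(2, n, p)

# b) more than seven: add the probabilities for r = 8, 9, 10
p_more_than_seven = sum(binom_pmf(r, n, p) for r in (8, 9, 10))

print(round(p_two, 3), round(p_more_than_seven, 2))
```

Summing `binom_pmf(r, n, p)` over r = 0, ..., n gives 1, which is a quick sanity check on the formula.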
Example:
A random variable X is B(7,0.2).
Find to 3dp
a) P(X = 3)
b) P (1 X 4)
c) P( X 1)
a) $P(X = 3) = {}^{7}C_3 \times 0.2^3 \times 0.8^4 = 0.115$
EXPECTATION AND VARIANCE OF
BINOMIAL DISTRIBUTION
If the random variable X is such that X ~ B(n, p),
then E(X) = np and Var(X) = np(1 - p) = npq.
Example:
SOLUTION
iii. E(X) = np = 4(0.8) = 3.2
iv. Var(X) = np(1-p) = npq = 4(0.8)(0.2) =
0.64
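As a check on E(X) = np and Var(X) = npq, the mean and variance can also be computed directly from the probability distribution. This is a small sketch that assumes X ~ B(4, 0.8), the values used in the solution above; the variable names are illustrative.

```python
from math import comb

n, p = 4, 0.8

# Probability of each outcome r = 0, 1, ..., n
pmf = [comb(n, r) * p**r * (1 - p)**(n - r) for r in range(n + 1)]

# Mean and variance from the definition of expectation
mean = sum(r * pr for r, pr in enumerate(pmf))
variance = sum((r - mean) ** 2 * pr for r, pr in enumerate(pmf))

print(round(mean, 2), round(variance, 2))  # compare with np and npq
```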
Five independent trials of an
experiment are carried out. The
probability of a successful outcome is p
and the probability of failure is q = 1 - p.
Write out the probability distribution of
X, where X is the number of successful
outcomes in five trials. Comment on
your answer.
X ~ B(5, p), X = 0, 1, 2, 3, 4, 5
$P(X = 0) = {}^{5}C_0\, q^5 p^0 = q^5$
$P(X = 1) = {}^{5}C_1\, q^4 p^1 = 5q^4 p$
$P(X = 2) = {}^{5}C_2\, q^3 p^2 = 10q^3 p^2$
$P(X = 3) = {}^{5}C_3\, q^2 p^3 = 10q^2 p^3$
$P(X = 4) = {}^{5}C_4\, q^1 p^4 = 5q p^4$
$P(X = 5) = {}^{5}C_5\, q^0 p^5 = p^5$
The probabilities sum to
$q^5 + 5q^4 p + 10q^3 p^2 + 10q^2 p^3 + 5q p^4 + p^5 = (q + p)^5 = 1$,
since $q + p = 1$.
X ~ Po($\lambda$), where
$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}$ for x = 0, 1, 2, ...
EXAMPLE
A student finds that the average number of
amoebas in 10ml of pond water from a
particular pond is four. Assuming that the
number of amoebas follow a Poisson
distribution, find the probability that in a
10ml sample
a)There are exactly five amoebas
b)There are no amoebas
c)There are fewer than three amoebas
SOLUTION
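A minimal sketch for working out the three parts numerically, assuming X ~ Po(4) (the stated average of four amoebas per 10 ml) and the Poisson formula P(X = x) = e^(-λ) λ^x / x!. The helper name `poisson_pmf` is illustrative.

```python
from math import exp, factorial


def poisson_pmf(x: int, lam: float) -> float:
    """Return P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)


lam = 4  # mean number of amoebas in 10 ml

p_exactly_five = poisson_pmf(5, lam)                              # part a)
p_none = poisson_pmf(0, lam)                                      # part b)
p_fewer_than_three = sum(poisson_pmf(x, lam) for x in range(3))   # part c): x = 0, 1, 2

print(round(p_exactly_five, 3), round(p_none, 3), round(p_fewer_than_three, 3))
```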
If X ~ Po($\lambda$), then
$E(X) = Var(X) = \lambda$
Examples
If X follows a Poisson distribution with
standard deviation 1.5, find $P(X \ge 3)$.
Solution
If X ~ Po($\lambda$), then $Var(X) = \lambda$.
$Var(X) = SD^2 = 1.5^2 = 2.25$,
so $\lambda = 2.25$ and X ~ Po(2.25).
$P(X \ge 3) = 1 - P(X < 3)$
$= 1 - [P(X = 0) + P(X = 1) + P(X = 2)]$
$= 1 - e^{-2.25}\left(1 + 2.25 + \frac{2.25^2}{2!}\right)$
$= 1 - 0.6093$
$= 0.391$
USING THE POISSON DISTRIBUTION AS AN
APPROXIMATION TO THE BINOMIAL
DISTRIBUTION
When n is large (n > 50) and p is small
(p < 0.1), the binomial distribution
X ~ B(n, p)
can be approximated using a Poisson
distribution with the same mean, i.e.
X ~ Po(np).
The approximation gets better as
n gets larger and p gets smaller,
i.e. as $n \to \infty$ and $p \to 0$.
EXAMPLE
Eggs are packed into boxes of 500. On
average 0.7% of the eggs are found to be
broken when the eggs are unpacked.
Find correct to 2 significant figures, the
probability that in a box of 500 eggs;
a) Exactly three are broken
b) At least two are broken
SOLUTION
Let X be the number of broken
eggs in a box of 500.
X ~ B(500, 0.007),
so
E(X) = np = 500 x 0.007 = 3.5
Since n > 50 and p < 0.1, we can use a Poisson
approximation:
X ~ Po(3.5)
a) $P(X = 3) = \frac{e^{-3.5}\, 3.5^3}{3!} = 0.22$
b) $P(X \ge 2) = 1 - P(X < 2)$
$= 1 - [P(X = 0) + P(X = 1)]$
$= 1 - e^{-3.5}(1 + 3.5)$
$= 0.86$
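To see how close the approximation is, the exact binomial probabilities can be compared with the Poisson ones. This sketch assumes X ~ B(500, 0.007) approximated by Po(3.5), as in the egg example; the function names are illustrative.

```python
from math import comb, exp, factorial


def binom_pmf(r: int, n: int, p: float) -> float:
    """Exact P(X = r) for X ~ B(n, p)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)


def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ Po(lam)."""
    return exp(-lam) * lam**x / factorial(x)


n, p = 500, 0.007
lam = n * p  # Poisson mean matches the binomial mean np

# a) exactly three broken eggs
exact_three = binom_pmf(3, n, p)
approx_three = poisson_pmf(3, lam)

# b) at least two broken eggs = 1 - P(0) - P(1)
exact_at_least_two = 1 - binom_pmf(0, n, p) - binom_pmf(1, n, p)
approx_at_least_two = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)

print(round(exact_three, 4), round(approx_three, 4))
print(round(exact_at_least_two, 4), round(approx_at_least_two, 4))
```

To two significant figures the exact and approximate answers agree for both parts.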
ASSIGNMENT 2
The random variable X is
B(100,0.03). Find the following
probabilities
i) P(X = 0)  ii) P(X = 2)  iii) P(X = 4),
using
a) The Binomial distribution
b) A Poisson approximation
c) Comment on your results in a and b
NORMAL DISTRIBUTION
Carl Friedrich Gauss (1777-1855)
The normal distribution is one of the
most important distributions in
statistics. Many quantities in life follow
a normal distribution.
E.g.
• Ages of students in a class
• Marks of students obtained in an exam.
• Heights of Social Sc. I students
A normally distributed random variable X is
continuous. Its pdf f(x)
depends on two parameters:
the mean ($\mu$) and the standard
deviation ($\sigma$).
To describe the distribution we
write:
$X \sim N(\mu, \sigma^2)$
The normal distribution curve has the following features:
• It is bell-shaped
• It is symmetrical about the mean
• It extends from $-\infty$ to $+\infty$
• The total area under the curve is 1
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$
[Figures: normal curves for X ~ N(0, 1) with $\mu = 0$, $\sigma = 1$; X ~ N(4, 1/4) with $\mu = 4$, $\sigma = 1/2$; and X ~ N(50, 4) with $\mu = 50$, $\sigma = 2$.]
FINDING THE PROBABILITIES
Consider $X \sim N(\mu, \sigma^2)$.
The probability that X lies between a and b is
$P(a < X < b) = \int_a^b f(x)\, dx$,
which is the area under the curve between a and b.
THE STANDARD NORMAL VARIABLE Z
In order to use the same set of tables for all
possible values of $\mu$ and $\sigma^2$, the variable X is
standardized so that the mean is 0 and the standard
deviation is 1. The standardized normal variable
is called Z, and Z ~ N(0, 1).
Use:
$Z = \frac{X - \mu}{\sigma}$, where $Z \sim N(0, 1)$
E.g. Find P(X < 56) if X ~ N(50, 4).
We standardize as follows:
$P(X < 56) = P\left(\frac{X - 50}{2} < \frac{56 - 50}{2}\right) = P(Z < 3)$
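The standardization step can be sketched in code. This example assumes X ~ N(50, 4), so the standard deviation is 2, and evaluates the standard normal cdf with the error function instead of a printed table; the names `phi` and `normal_cdf` are illustrative.

```python
from math import erf, sqrt


def phi(z: float) -> float:
    """Standard normal cdf: P(Z < z) for Z ~ N(0, 1)."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    """P(X < x) for X ~ N(mu, sigma^2), obtained by standardizing first."""
    z = (x - mu) / sigma
    return phi(z)


mu, sigma = 50, 2          # N(50, 4): the second parameter is the variance
z = (56 - mu) / sigma      # standardized value
prob = normal_cdf(56, mu, sigma)

print(z, round(prob, 4))
```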
USING STANDARD NORMAL TABLES
$P(Z < z) = \Phi(z)$: the area under the curve up to z
[Figure: standard normal curve with the area $\Phi(z)$ shaded to the left of z.]
E.g.
Find a) P(Z < 0.16)
$P(Z < 0.16) = \Phi(0.16) = 0.5636$
b) P(Z < 0.34) = 0.6331
c) P(Z < 0.43) = 0.6664
Question:
Find a) P(Z < 0.85)  b) P(Z > 0.85)
Solution:
a) $P(Z < 0.85) = \Phi(0.85) = 0.8023$
b) $P(Z > 0.85) = 1 - \Phi(0.85) = 1 - 0.8023 = 0.1977$
NEGATIVE VALUES
$P(Z < -a) = \Phi(-a) = 1 - \Phi(a)$
$P(Z > -a) = P(Z < a) = \Phi(a)$
[Figure: standard normal curve with -a, 0 and a marked on the axis.]
Example
Z ~ N(0, 1)
Use the standard Normal Table to
find:
a) P(Z < 1.37)
b) P(Z > -1.37)
c) P(Z > 1.37)
d) P(Z < -1.37)
a) $P(Z < 1.37) = \Phi(1.37) = 0.9147 = 0.91$ (2 s.f.)
b) $P(Z > -1.37) = P(Z < 1.37) = \Phi(1.37) = 0.9147$
c) $P(Z > 1.37) = 1 - P(Z < 1.37) = 1 - \Phi(1.37) = 1 - 0.9147 = 0.0853$
d) $P(Z < -1.37) = P(Z > 1.37) = 1 - \Phi(1.37) = 1 - 0.9147 = 0.0853$
For P(a < Z < b):
$P(a < Z < b) = \Phi(b) - \Phi(a)$
Find P(0.345 < Z < 1.751)
$= P(Z < 1.751) - P(Z < 0.345)$
$= \Phi(1.751) - \Phi(0.345)$
$= 0.9599 - 0.6368$ (reading the table at z = 1.75 and z = 0.35)
$= 0.3231$
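The table rules above (negative values and interval probabilities) can be sketched with Python's standard library, which provides the standard normal cdf through `statistics.NormalDist`. This is only an illustration; the variable names are not from the course notes.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Negative values use the symmetry of the curve
p_below = Z.cdf(1.37)             # P(Z < 1.37)
p_above = 1 - Z.cdf(1.37)         # P(Z > 1.37)
p_below_negative = Z.cdf(-1.37)   # P(Z < -1.37), equal to P(Z > 1.37)

# Interval probability: P(a < Z < b) = Phi(b) - Phi(a)
a, b = 0.345, 1.751
p_between = Z.cdf(b) - Z.cdf(a)

print(round(p_below, 4), round(p_above, 4), round(p_between, 4))
```

Because the cdf is evaluated at the exact z-values, the interval probability differs slightly in the third decimal place from a result read off a two-decimal table.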
THE CENTRAL LIMIT THEOREM
If a sample of any size is taken from a population with a
normal distribution with mean $\mu$ and standard deviation $\sigma$,
the distribution of the means of samples of size n will be normal
with mean $\mu_{\bar{x}} = \mu$ and
standard deviation $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$.
The $(1 - \alpha)100\%$ confidence interval for a proportion (p) is given by:
$\hat{p} \pm Z_{\alpha/2}\, Se(\hat{p}) = \hat{p} \pm Z_{\alpha/2} \sqrt{\frac{p(1-p)}{n}} \approx \hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
where $\hat{p} = \frac{x}{n}$ is the point estimator for p, which should
not be close to 0 or 1, $n \ge 30$, and both $n\hat{p}$ and
$n\hat{p}(1-\hat{p})$ are $\ge 5$. If these conditions are not met, the
interval estimate becomes unreliable and is not
recommended. The sample size needed to
estimate p with a specified maximum error of estimation
E and confidence coefficient $(1 - \alpha)$ is obtained as follows:
$E = Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$, from which we have $n = \frac{Z_{\alpha/2}^2\, \hat{p}(1-\hat{p})}{E^2}$
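A minimal sketch of the interval and sample-size formulas for a proportion. The data (60 successes in 100 trials) and the planning values (p̂ = 0.5, E = 0.05) are hypothetical and chosen only for illustration; the function names are not from the notes. The critical value comes from `statistics.NormalDist` rather than a printed table.

```python
from math import ceil, sqrt
from statistics import NormalDist


def z_critical(confidence: float) -> float:
    """Z_{alpha/2} for a two-sided interval with the given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)


def proportion_ci(x: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """p_hat +/- Z_{alpha/2} * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    margin = z_critical(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def sample_size_for_proportion(p_hat: float, max_error: float,
                               confidence: float = 0.95) -> int:
    """n = Z^2 * p_hat * (1 - p_hat) / E^2, rounded up to a whole number."""
    z = z_critical(confidence)
    return ceil(z**2 * p_hat * (1 - p_hat) / max_error**2)


low, high = proportion_ci(x=60, n=100)           # hypothetical sample
n_needed = sample_size_for_proportion(0.5, 0.05)  # hypothetical planning values

print(round(low, 3), round(high, 3), n_needed)
```

The sample size is rounded up because a fractional observation is not possible and rounding down would exceed the allowed error.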
$\bar{X} \sim N\left(\mu, \frac{\sigma^2}{n}\right)$
Consider the population 2, 4, 6. The
mean $\mu = 4$.
$\bar{X} = \frac{\sum X}{n}$, $\quad \sigma^2 = \frac{\sum (X - \bar{X})^2}{N}$
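INTERVAL ESTIMATION
(CONFIDENCE INTERVAL)
$P(? < \bar{X} < ?) = 0.95$
$\alpha = 0.05 = 5\%$
$1 - \alpha = 0.95 = 95\%$
We wish to find $L_1$ and $L_2$ such that
$P(L_1 < \bar{X} < L_2) = 1 - \alpha$
$\alpha$ = probability that the interval will not contain $\bar{X}$
$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$, where $Z \sim N(0, 1)$
P(a < Z < b) = 95%
[Figure: standard normal curve with central area 95% and 2.5% in each tail.]
Note that $Z_{0.05} = 1.64$ leaves 5% in each tail (a 90% interval); for 2.5% in each tail (a 95% interval) the critical value is $Z_{0.025} = 1.96$.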
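$\bar{X} - Z_{\alpha/2} \frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
$28.5 - 1.64 \times \frac{2}{\sqrt{32}} < \mu < 28.5 + 1.64 \times \frac{2}{\sqrt{32}}$
In general, the interval is
$\bar{X} \pm Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
where $\sigma$ is estimated by the sample standard
deviation, s, if it is unknown. The sample size, n,
required for estimating $\mu$ may be obtained as
follows. The maximum error of estimation is
$E = Z_{\alpha/2} \frac{\sigma}{\sqrt{n}}$
from which we obtain the sample size, n:
$n = \frac{Z_{\alpha/2}^2\, \sigma^2}{E^2}$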
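The interval and sample-size formulas for a mean can be sketched as follows. The numbers 28.5, 2, 32 and 1.64 are the ones that appear in the substitution above; the target error E = 0.5 is a hypothetical value added for illustration, and the function names are not from the notes.

```python
from math import ceil, sqrt


def z_interval(mean: float, sigma: float, n: int, z: float) -> tuple[float, float]:
    """mean +/- z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin


def sample_size_for_mean(sigma: float, max_error: float, z: float) -> int:
    """n = z^2 * sigma^2 / E^2, rounded up to a whole number."""
    return ceil((z * sigma / max_error) ** 2)


low, high = z_interval(mean=28.5, sigma=2, n=32, z=1.64)
n_needed = sample_size_for_mean(sigma=2, max_error=0.5, z=1.64)  # E = 0.5 is hypothetical

print(round(low, 2), round(high, 2), n_needed)
```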
QUESTION 1
Where E =
= 1%
=
Hence
E=
But = 2.57
E = 2.57 .56
25.4
( 26.96)
is the 99% confidence interval for the
distribution.
QUESTION 2
( 29.193)
QUESTION 3
[15.412,16.588]
WHEN “n” IS SMALL (n < 30)
$\bar{X} \pm t_{\alpha/2,\,(n-1)} \frac{s}{\sqrt{n}}$
where $t_{\alpha/2,\,(n-1)}$
is the critical value obtained
from the t-distribution with
(n - 1) degrees of freedom and s is the
sample standard deviation.
QUESTION 4
Given n=10
Standard deviation(s)=11.1
Sample mean = 121.2
Construct a 95% confidence interval for
the distribution .
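SOLUTION
$\bar{X} \pm t_{\alpha/2,\,(n-1)} \frac{s}{\sqrt{n}}$, with $E = t_{\alpha/2,\,(n-1)} \frac{s}{\sqrt{n}}$
From the t-distribution table, $t_{0.025,\,9} = 2.262$
$E = 2.262 \times \frac{11.1}{\sqrt{10}} = 7.94$
$121.2 - 7.94 < \mu < 121.2 + 7.94$
$113.26 < \mu < 129.14$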
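A short sketch of the small-sample calculation, using the figures of Question 4 (n = 10, s = 11.1, sample mean 121.2). The critical value 2.262 is taken from the t-table, as on the slide, because the Python standard library has no t-distribution; the function name `t_interval` is illustrative.

```python
from math import sqrt


def t_interval(mean: float, s: float, n: int, t_crit: float) -> tuple[float, float, float]:
    """Return (margin, lower, upper) for mean +/- t_crit * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return margin, mean - margin, mean + margin


# t_{0.025, 9} = 2.262 is read from a t-distribution table (9 degrees of freedom)
margin, low, high = t_interval(mean=121.2, s=11.1, n=10, t_crit=2.262)

print(round(margin, 2), round(low, 2), round(high, 2))
```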
σ = 81, n = 29, $\bar{X}$ = 375
Is “n” large or small?
n is small (n < 30)
Answer:
333.44
Question 6.
An industrial designer wants to
determine the average amount of
time it takes an adult to assemble an
“easy to assemble” toy. A sample of
16 times yielded an average time of
19.92 minutes, with a sample
standard deviation of 5.73 minutes.
Assuming normality of assembly
times, provide a 95% confidence
interval for the mean assembly time.
Answer
Is the confidence
interval.
FOR PROPORTION (P)
$\hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
CONFIDENCE INTERVALS FOR DIFFERENCE
BETWEEN TWO PARAMETERS
Determining an interval estimate for the
difference between two population means $(\mu_1 - \mu_2)$
or two population proportions $(p_1 - p_2)$ is a
means of comparing two population
parameters. The
$(1 - \alpha)100\%$ confidence interval for $(\mu_1 - \mu_2)$,
like that for the single parameter $\mu$, has two cases of sample sizes.
1-parameter    2-parameters
$\mu$          $\mu_1 - \mu_2$
p              $p_1 - p_2$
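• If the sample sizes are large, we have
$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\, Se(\bar{x}_1 - \bar{x}_2) = (\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
where $n_1 \ge 30$, $n_2 \ge 30$
• If the sample sizes are small, we have
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2,\,(n_1 + n_2 - 2)}\, S_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$,
and $S_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
The $(1 - \alpha)100\%$ confidence interval for
$(p_1 - p_2)$ is given by
$(p_1 - p_2) \pm Z_{\alpha/2} \sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}$
where $n_1$ and $n_2$ are large,
and $p_1 = \frac{x_1}{n_1}$ and $p_2 = \frac{x_2}{n_2}$.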
EXAMPLE
a. A mid-semester examination in Statistics
was given to 55 students randomly selected
from class A and also to another 40 students
randomly selected from class B. The mean
scores obtained from both samples and the
standard deviations are as shown below.
Class A: $n_A = 55$, $\bar{x}_A = 66$, $s_A = 10$
Class B: $n_B = 40$, $\bar{x}_B = 62$, $s_B = 8$
$(\bar{x}_A - \bar{x}_B) \pm Z_{\alpha/2} \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$
$= (66 - 62) \pm (1.96) \sqrt{\frac{10^2}{55} + \frac{8^2}{40}}$
$= 4 \pm 1.96(1.85)$
$= 4 \pm 3.626$
or $0.374 < (\mu_A - \mu_B) < 7.626$
$(\mu_A - \mu_B) \in (0.374, 7.626)$
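The large-sample interval for a difference of means can be sketched as follows, using the class A and class B figures from the example and the critical value 1.96. The function name is illustrative. Because the code does not round the standard error to 1.85 before multiplying, its limits differ from the worked answer in the third decimal place.

```python
from math import sqrt


def two_mean_z_interval(mean_a: float, s_a: float, n_a: int,
                        mean_b: float, s_b: float, n_b: int,
                        z: float = 1.96) -> tuple[float, float]:
    """(mean_a - mean_b) +/- z * sqrt(s_a^2 / n_a + s_b^2 / n_b)."""
    standard_error = sqrt(s_a**2 / n_a + s_b**2 / n_b)
    difference = mean_a - mean_b
    margin = z * standard_error
    return difference - margin, difference + margin


low, high = two_mean_z_interval(mean_a=66, s_a=10, n_a=55,
                                mean_b=62, s_b=8, n_b=40)

print(round(low, 3), round(high, 3))
```

Since the interval does not contain 0, the data suggest a difference between the two class means at this confidence level.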
CONFIDENCE INTERVAL FOR
POPULATION VARIANCE
CHI-SQUARE DISTRIBUTION
$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$
n = Sample Size, $s^2$ = sample variance, $\sigma^2$ = population variance
PROPERTIES OF THE CHI-SQUARE
DISTRIBUTION
$\chi^2$ - Distribution
1. Not symmetrical
2. Values are non-negative
3. As the degrees of freedom increase, the
distribution becomes more symmetrical but
never becomes exactly symmetrical
FINDING THE CONFIDENCE INTERVAL
n = 12, $\alpha = 0.05$, $\alpha/2 = 0.025$, $1 - 0.025 = 0.975$
$\chi^2_L = 3.816$, $\chi^2_R = 21.92$ (n - 1 = 11 degrees of freedom)
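CONFIDENCE INTERVAL FOR VARIANCE
$\frac{(n-1)s^2}{\chi^2_R} < \sigma^2 < \frac{(n-1)s^2}{\chi^2_L}$
CONFIDENCE INTERVAL FOR STANDARD
DEVIATION
$\sqrt{\frac{(n-1)s^2}{\chi^2_R}} < \sigma < \sqrt{\frac{(n-1)s^2}{\chi^2_L}}$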
EXAMPLE 1
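n = 7, $s^2 = 315.6$
Find a 95% confidence interval for $\sigma^2$.
SOLUTION
C.I. = $\left[\frac{(n-1)s^2}{\chi^2_R} ; \frac{(n-1)s^2}{\chi^2_L}\right]$, with n - 1 = 6 degrees of freedom
C.I. = [131.04 ; 1530.66]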
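A sketch of the variance interval for Example 1, assuming the 315.6 is the sample variance s² and n = 7. The chi-square critical values for 6 degrees of freedom (14.449 and 1.237) are standard table values hard-coded here, since the Python standard library has no chi-square distribution; the function name is illustrative. Small differences from the slide's limits come from rounding of the table values.

```python
from math import sqrt


def variance_interval(s_squared: float, n: int,
                      chi2_left: float, chi2_right: float) -> tuple[float, float]:
    """[(n-1)s^2 / chi2_right ; (n-1)s^2 / chi2_left]."""
    numerator = (n - 1) * s_squared
    return numerator / chi2_right, numerator / chi2_left


# Table values for 6 degrees of freedom at the 95% level (assumed, from a chi-square table)
CHI2_LEFT = 1.237    # area 0.975 to the right
CHI2_RIGHT = 14.449  # area 0.025 to the right

var_low, var_high = variance_interval(315.6, 7, CHI2_LEFT, CHI2_RIGHT)

# Interval for the standard deviation: square roots of the variance limits
sd_low, sd_high = sqrt(var_low), sqrt(var_high)

print(round(var_low, 2), round(var_high, 2))
print(round(sd_low, 2), round(sd_high, 2))
```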
EXAMPLE 2
A sample of 7 boxes of a certain type of cereal
with a nominal weight of 750grams had the
following weights;
C.I = []
Solution
C.I. =[]
C.I. = [2.32; 21.95]
Hypothesis Testing
Introduction:
In some practical problems of statistical inference we
may be required to make decisions concerning the
parameters of the population instead of finding
estimates for them. The following are some
situations requiring such decisions:
i) Null Hypothesis, denoted H0, which is the
tentative statement assumed to be true.
ii) Alternative Hypothesis, denoted H1, which
contradicts the null hypothesis. It is accepted
only when sufficient evidence exists to establish
its truth.
Formulation of H0 and H1
Testing the validity of a claim
• The claim made is chosen as
the null hypothesis while the
challenge to the claim is
taken as the alternative
hypothesis.
(c) Formulation of H0 and H1
When we wish to establish the validity of a
statement about a population using the
evidence obtained from random sample data,
the negative of the statement is what we take
as the null hypothesis. The statement itself
constitutes the alternative hypothesis. In some
applications, it is not obvious how H0 and H1
should be formulated. The following
guidelines for developing hypotheses for three
types of situations are suggested.
(i) Testing Research Hypothesis: This is formulated as
alternative hypothesis.
H1 : 0
(ii) One-Tailed test to the left, formulated as:
$H_0: \mu \ge \mu_0$
$H_1: \mu < \mu_0$
(iii) Two-Tailed Test formulated as:
$H_0: \mu = \mu_0$
$H_1: \mu \ne \mu_0$
Test Statistic
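• It is a formula whose computed value leads to the
rejection or the acceptance of
the null hypothesis.
The test statistic is
$Z = \frac{\bar{X} - \mu_0}{\sigma / \sqrt{n}}$ for $n \ge 30$
$t = \frac{\bar{X} - \mu_0}{s / \sqrt{n}}$ for $n < 30$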
DECISION RULE
• Z is compared with the critical value $Z_{\alpha/2}$ or $Z_{\alpha}$
Reject $H_0$ if $|Z| > Z_{\alpha/2}$ or $|t| > t_{\alpha/2,\,(n-1)}$
AN OUTLINE FOR HYPOTHESIS TESTING
• State H 0 and H1
• Choose the level of significance
• Select a test statistic
• Find the critical region
• Compute the value of the statistic
• Draw conclusions (Reject or Accept)
Example
A car manufacturer claims that the
average weekly income of owners of his
car is $180. An investigator takes a
random sample of 200 such car owners
and finds that they have an average
weekly income of $184.26 with a
standard deviation of $24.12. On the
basis of the sample, do you agree with
the manufacturer’s claim? (Test at the 5%
significance level.)
Solution
$H_0: \mu = 180$
$H_1: \mu \ne 180$
$\alpha = 0.05$
$Z = \frac{184.26 - 180}{24.12 / \sqrt{200}} = 2.50$, and the critical value is $Z_{0.025} = 1.96$.
Since 2.50 is outside the acceptance region,
we reject H0 and conclude that the average weekly
income of the car owners is not $180.
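The steps of the outline can be sketched for this example as a two-tailed Z test. The figures (claimed mean 180, sample mean 184.26, standard deviation 24.12, n = 200, 5% level) are from the example; the function name `z_test_statistic` is illustrative, and the critical value is obtained from `statistics.NormalDist` instead of a table.

```python
from math import sqrt
from statistics import NormalDist


def z_test_statistic(sample_mean: float, mu_0: float, s: float, n: int) -> float:
    """Z = (sample_mean - mu_0) / (s / sqrt(n)), for a large sample."""
    return (sample_mean - mu_0) / (s / sqrt(n))


alpha = 0.05
z = z_test_statistic(sample_mean=184.26, mu_0=180, s=24.12, n=200)

# Two-tailed critical value Z_{alpha/2}
z_critical = NormalDist().inv_cdf(1 - alpha / 2)

# Decision rule: reject H0 when |Z| exceeds the critical value
reject_h0 = abs(z) > z_critical

print(round(z, 2), round(z_critical, 2), reject_h0)
```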
Errors in Hypothesis
Testing:
When H0 is tested against H1 using a randomly
selected sample data, two possible errors may be
committed.
H0 is true H 0 is false
(H1 is true)
0.01 0.05
(g) P-Value:
The p-value is the smallest level of significance for
which the observed data would call for rejection of H0
in favour of H1. The p-value gives additional
insight into the strength of the decision taken. A very
small p-value, such as 0.0001, indicates very strong
evidence against H0. On the other
hand, a high p-value such as 0.2 means that H0 is
not rejected and there is little evidence that it is
false. The p-value is often referred to as the observed
level of significance. For a given level of significance, $\alpha$,
the null hypothesis, H0,
i. is rejected if p-value $\le \alpha$
ii. fails to be rejected if p-value $> \alpha$
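A minimal sketch of the p-value rule for a two-tailed Z test. It reuses the car-income figures from the earlier example (claimed mean 180, sample mean 184.26, standard deviation 24.12, n = 200) as an assumed illustration; the variable names are not from the notes.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05

# Test statistic for H0: mu = 180 against H1: mu != 180
z = (184.26 - 180) / (24.12 / sqrt(200))

# Two-tailed p-value: probability of a value at least as extreme as |z| in either tail
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Decision: reject H0 when the p-value does not exceed alpha
reject_h0 = p_value <= alpha

print(round(p_value, 4), reject_h0)
```

The same data would not lead to rejection at the 1% level, because the p-value is larger than 0.01.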
END