Statistic 1
The set of all possible outcomes of a random experiment is called the sample
space. It is represented as S.
Each outcome in the sample space is called an element or member of the sample space. Thus, the
sample space S of possible outcomes when a coin is tossed is written as
S = {H, T}, where H denotes a head and T denotes a tail.
Random experiment – an experiment that can result in different outcomes, even though it is
repeated in the same manner.
Example 1
Consider a random experiment of tossing a die. If we are interested in the number that shows on
the top face, the sample space is
S1 = {1, 2, 3, 4, 5, 6}
If we are interested only in whether the number is odd or even, the sample space is
S2 = {odd, even}
*More than one sample space can be used to describe the outcomes of an experiment.
Example 2
Suppose that three items are selected at random from a manufacturing process. Each item is
inspected and classified as defective, D, or nondefective, N.
The tree diagram of the three inspections gives the following outcomes:
1st item   2nd item   3rd item   Outcome
D          D          D          DDD
D          D          N          DDN
D          N          D          DND
D          N          N          DNN
N          D          D          NDD
N          D          N          NDN
N          N          D          NND
N          N          N          NNN
1.2 Events
An event is a subset of a sample space.
From Example 2, we may be interested in the event B that the number of defectives is greater than
1. The event B is written as
B = {DDD, DDN, DND, NDD}
This subset represents all the elements for which the event is true.
Example
Let S = {pizza, burger, lasagna} and A = {pizza}.
The complement of A, the set of all elements of S that are not in A, is A′ = {burger, lasagna}.
The intersection of two events A and B , stated by the symbol A ∩ B , is the event containing
all elements that are common to A and B .
[Venn diagram: sample space S containing the events A and B]
Two events A and B are mutually exclusive, or disjoint, if A ∩ B = ∅, that is, if A and
B have no elements in common.
Then M ∩ N = ∅ = { }
The union of the two events A and B , stated by the symbol A ∪ B , is the event
containing all the elements that belong to A or B or both.
Then A ∪ B = {a, b, c, d, e}
$$P(A) = \frac{n(A)}{n(S)}$$
where n(A) is the number of elements of event A and n(S) is the number of elements of the sample
space S.
Example 4
Solution
Denote by A, B and C the events that a student majors in mecha, electrical and electronic, respectively.
a) $P(B) = \frac{n(B)}{n(S)} = \frac{8}{30}$
b) $P(B \cup C) = \frac{n(B \cup C)}{n(S)} = \frac{20}{30} = \frac{2}{3}$
Additive rule
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
[Venn diagram: events A and B with the overlap A ∩ B]
Corollary
The probability of an event B occurring when it is known that some event A has
occurred is called a conditional probability and is denoted by P(B | A).
P(B | A) is usually read "the probability that B occurs given that A occurs" or "the
probability of B given A".
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad \text{provided } P(A) > 0$$
Example 5
The probability that a regularly scheduled flight departs on time is P(D) = 0.83; the
probability that it arrives on time is P(A) = 0.82; and the probability that it departs and
arrives on time is P(D ∩ A) = 0.78. Find the probability that a plane
a) arrives on time given that it departed on time
b) departed on time given that it has arrived on time
Solution
a) $P(A \mid D) = \frac{P(D \cap A)}{P(D)} = \frac{0.78}{0.83} = 0.94$
b) $P(D \mid A) = \frac{P(D \cap A)}{P(A)} = \frac{0.78}{0.82} = 0.95$
Example 6
Sample space S is the population of adults in a small town and categorized according to
gender and employment status. (table 1)
Find,
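the probability that a man is chosen, given that the person chosen is employed.
Using the counts directly,
$$P(M \mid E) = \frac{460}{600} = \frac{23}{30}$$
Or, from the definition of conditional probability,
$$P(M \cap E) = \frac{n(M \cap E)}{n(S)} = \frac{460}{900}, \qquad P(E) = \frac{n(E)}{n(S)} = \frac{600}{900}$$
hence
$$P(M \mid E) = \frac{P(M \cap E)}{P(E)} = \frac{460/900}{600/900} = \frac{23}{30}$$
as before.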
If in an experiment the events A and B can both occur, then
$$P(A \cap B) = P(A)\,P(B \mid A), \quad \text{provided } P(A) > 0$$
If A and B are independent, then P(B | A) = P(B), and so
$$P(A \cap B) = P(A)\,P(B)$$
Example 7
In the card drawing experiment, 2 cards are drawn in succession, without replacement, from
5 yellow and 10 blue cards. Let y1 and b1 denote a yellow or blue first card, and y2 and b2 a
yellow or blue second card.
Tree diagram (without replacement):
First draw:             P(y1) = 5/15,       P(b1) = 10/15
Second draw, given y1:  P(y2 | y1) = 4/14,  P(b2 | y1) = 10/14
Second draw, given b1:  P(y2 | b1) = 5/14,  P(b2 | b1) = 9/14
The four branches give
y1y2: $P(y_1 \cap y_2) = P(y_1) \times P(y_2 \mid y_1) = \frac{5}{15} \times \frac{4}{14} = \frac{2}{21}$
y1b2: $P(y_1 \cap b_2) = P(y_1) \times P(b_2 \mid y_1) = \frac{5}{15} \times \frac{10}{14} = \frac{5}{21}$
b1y2: $P(b_1 \cap y_2) = P(b_1) \times P(y_2 \mid b_1) = \frac{10}{15} \times \frac{5}{14} = \frac{5}{21}$
b1b2: $P(b_1 \cap b_2) = P(b_1) \times P(b_2 \mid b_1) = \frac{10}{15} \times \frac{9}{14} = \frac{3}{7}$
With replacement
Tree diagram (with replacement):
First draw:             P(y1) = 5/15,       P(b1) = 10/15
Second draw, given y1:  P(y2 | y1) = 5/15,  P(b2 | y1) = 10/15
Second draw, given b1:  P(y2 | b1) = 5/15,  P(b2 | b1) = 10/15
The four branches give
y1y2: $P(y_1 \cap y_2) = P(y_1) \times P(y_2) = \frac{5}{15} \times \frac{5}{15} = \frac{1}{9}$
y1b2: $P(y_1 \cap b_2) = P(y_1) \times P(b_2) = \frac{5}{15} \times \frac{10}{15} = \frac{2}{9}$
b1y2: $P(b_1 \cap y_2) = P(b_1) \times P(y_2) = \frac{10}{15} \times \frac{5}{15} = \frac{2}{9}$
b1b2: $P(b_1 \cap b_2) = P(b_1) \times P(b_2) = \frac{10}{15} \times \frac{10}{15} = \frac{4}{9}$
Here $P(y_2 \mid y_1) = \frac{5}{15}$ and $P(y_2) = \frac{5}{15}$, so the two draws are independent.
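The two tree diagrams of Example 7 can be reproduced with exact fractions. This is a minimal sketch of the multiplicative rule; the function `two_draws` and its arguments are illustrative names, not part of the notes.

```python
from fractions import Fraction

def two_draws(yellow, blue, replace):
    """Return P(first colour and second colour) for two successive draws.

    Keys are strings such as "yb" (yellow first, blue second).
    """
    counts = {"y": yellow, "b": blue}
    total = yellow + blue
    result = {}
    for first in "yb":
        p_first = Fraction(counts[first], total)
        remaining = dict(counts)
        remaining_total = total
        if not replace:
            # Without replacement the first card leaves the pack.
            remaining[first] -= 1
            remaining_total -= 1
        for second in "yb":
            p_second_given_first = Fraction(remaining[second], remaining_total)
            # Multiplicative rule: P(A and B) = P(A) * P(B | A)
            result[first + second] = p_first * p_second_given_first
    return result

without_replacement = two_draws(5, 10, replace=False)
with_replacement = two_draws(5, 10, replace=True)

for name, table in (("without", without_replacement), ("with", with_replacement)):
    print(name, {k: str(v) for k, v in table.items()})
```

In both cases the four branch probabilities add up to 1, which is a quick check that the tree is complete.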
Suppose that we are now given the additional information that 36 of those employed and
12 of those unemployed are members of the swimming club. Let A be the event that the person
chosen is a member of the swimming club. Then
$$P(A) = P(E)\,P(A \mid E) + P(E')\,P(A \mid E') = \frac{2}{3} \times \frac{3}{50} + \frac{1}{3} \times \frac{1}{25} = \frac{4}{75}$$
Tree diagram:
E,  with probability P(E):   A,  giving P(E ∩ A) = P(E) P(A | E)
                             A′, giving P(E ∩ A′)
E′, with probability P(E′):  A,  giving P(E′ ∩ A) = P(E′) P(A | E′)
                             A′, giving P(E′ ∩ A′)
P(A′) = 1 − P(A)
The generalization of this illustration, where the sample space is partitioned into k
subsets (2 subsets in this case), is covered by
$$P(A) = \sum_{i=1}^{k} P(B_i \cap A) = \sum_{i=1}^{k} P(B_i)\,P(A \mid B_i)$$
Example 8
In a certain assembly plant, three machines, B1, B2 and B3, make 30%, 45% and 25%,
respectively, of the products. It is known that 2%, 3% and 2% of the products made by
each machine, respectively, are defective. What is the probability that a randomly selected
product is defective?
Solution
Let D be the event that the selected product is defective. The machines B1, B2 and B3 partition
the sample space S, so D is made up of B1 ∩ D, B2 ∩ D and B3 ∩ D.
Tree diagram:
P(B1) = 0.3,   P(D | B1) = 0.02:  D gives P(B1 ∩ D),  D′ gives P(B1 ∩ D′)
P(B2) = 0.45,  P(D | B2) = 0.03:  D gives P(B2 ∩ D),  D′ gives P(B2 ∩ D′)
P(B3) = 0.25,  P(D | B3) = 0.02:  D gives P(B3 ∩ D),  D′ gives P(B3 ∩ D′)
$$P(D) = P(B_1)P(D \mid B_1) + P(B_2)P(D \mid B_2) + P(B_3)P(D \mid B_3)$$
$$= (0.3)(0.02) + (0.45)(0.03) + (0.25)(0.02) = 0.0245$$
P(D′) = 1 − P(D)
Now, we wish to find the conditional probability $P(B_i \mid A)$.
Bayes' rule: If the events B1, B2, ..., Bk constitute a partition of the sample space S such
that P(Bi) ≠ 0 for i = 1, 2, ..., k, then for any event A in S such that P(A) ≠ 0,
$$P(B_r \mid A) = \frac{P(B_r \cap A)}{\sum_{i=1}^{k} P(B_i \cap A)} = \frac{P(B_r)\,P(A \mid B_r)}{\sum_{i=1}^{k} P(B_i)\,P(A \mid B_i)}, \quad r = 1, 2, \ldots, k$$
Example 8 (continued)
If the product was chosen randomly and found to be defective, what is the probability that
it was made by machine B3?
Solution
$$P(B_3 \mid D) = \frac{P(B_3)\,P(D \mid B_3)}{P(B_1 \cap D) + P(B_2 \cap D) + P(B_3 \cap D)}$$
$$= \frac{P(B_3)\,P(D \mid B_3)}{P(B_1)P(D \mid B_1) + P(B_2)P(D \mid B_2) + P(B_3)P(D \mid B_3)}$$
$$= \frac{P(B_3)\,P(D \mid B_3)}{P(D)} = \frac{(0.25)(0.02)}{0.0245} \approx 0.2041$$
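The total probability rule and Bayes' rule from Example 8 can be combined in a few lines. The sketch below assumes the machine shares and defect rates stated in the example; the dictionary layout and function names are illustrative choices.

```python
def total_probability(priors, likelihoods):
    """P(A) = sum over i of P(B_i) * P(A | B_i) for a partition B_1..B_k."""
    return sum(priors[name] * likelihoods[name] for name in priors)

def bayes_posterior(priors, likelihoods):
    """Return P(B_r | A) for every event B_r of the partition."""
    evidence = total_probability(priors, likelihoods)
    if evidence == 0:
        raise ValueError("P(A) must be non-zero")
    return {name: priors[name] * likelihoods[name] / evidence for name in priors}

# Example 8: share of production and defect rate for each machine.
priors = {"B1": 0.30, "B2": 0.45, "B3": 0.25}
defect_rates = {"B1": 0.02, "B2": 0.03, "B3": 0.02}

p_defective = total_probability(priors, defect_rates)
posterior = bayes_posterior(priors, defect_rates)

print(round(p_defective, 4))
print({name: round(p, 4) for name, p in posterior.items()})
```

The posterior probabilities add up to 1, since a defective product must have come from exactly one of the three machines.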
It is important to assign a numerical description to each outcome. For example, when two
electronic components are tested, each is classified as defective (D) or non-defective (N):
Sample space   DD      DN      ND      NN
x              x = 2   x = 1   x = 1   x = 0
Here, x is the value of a random variable that counts the number of defective items.
A capital letter, X, denotes a random variable.
A small letter, x, denotes one of its values.
A random variable is a function that assigns a real number to each outcome in the
sample space of a random experiment.
A discrete random variable is a random variable with a finite or countably infinite range
(count data).
A discrete random variable assumes each of its values with a certain probability.
S = {DD, DN, ND, NN}
X: discrete random variable that represents the number of defectives
x           2     1           0
P(X = x)    1/4   2/4 = 1/2   1/4
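Mean:
$$\mu = E(X) = \sum_i x_i f(x_i), \quad \text{if } X \text{ is discrete}$$
$$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\,dx, \quad \text{if } X \text{ is continuous}$$
Variance, if X is discrete:
$$\sigma^2 = E[(X - \mu)^2] = \sum_i (x_i - \mu)^2 f(x_i) = E(X^2) - \mu^2, \qquad E(X^2) = \sum_i x_i^2 f(x_i)$$
Variance, if X is continuous:
$$\sigma^2 = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = E(X^2) - \mu^2, \qquad E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$$
Standard deviation:
$$\sigma = +\sqrt{\sigma^2}$$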
Question 1
The distribution of the number of imperfections per 10 meters of synthetic fabric is given
by
y       0     1     2
f(y)    1/5   3/5   1/5
i) Find the expected number of imperfections, E(y) = μ, the variance, σ², and the standard
deviation, σ.
ii) Find P(y ≤ 1), P(y > 1), P(y = 0) and P(0 ≤ y ≤ 2).
Question 2
The proportion of impurities in a batch is a random variable whose density function is given by
$$f(x) = \begin{cases} x, & 0 < x < 1 \\ 2 - x, & 1 \le x < 2 \\ 0, & \text{elsewhere} \end{cases}$$
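1) i)
$$\mu = E(y) = \sum_{i=0}^{2} y_i P(y_i) = y_0 P(y_0) + y_1 P(y_1) + y_2 P(y_2) = 0\left(\tfrac{1}{5}\right) + 1\left(\tfrac{3}{5}\right) + 2\left(\tfrac{1}{5}\right) = 1$$
$$\sigma^2 = E(y^2) - E^2(y)$$
$$E(y^2) = y_0^2 P(y_0) + y_1^2 P(y_1) + y_2^2 P(y_2) = 0\left(\tfrac{1}{5}\right) + 1\left(\tfrac{3}{5}\right) + 2^2\left(\tfrac{1}{5}\right) = \tfrac{7}{5}$$
$$E^2(y) = \mu^2 = 1$$
$$\therefore \sigma^2 = \tfrac{7}{5} - 1 = \tfrac{2}{5}, \qquad \sigma = \sqrt{\tfrac{2}{5}} \approx 0.632$$
ii)
$$P(y \le 1) = P(y = 0) + P(y = 1) = \tfrac{1}{5} + \tfrac{3}{5} = \tfrac{4}{5}$$
$$P(y > 1) = P(y = 2) = \tfrac{1}{5}$$
$$P(0 \le y \le 2) = P(y = 0) + P(y = 1) + P(y = 2) = \tfrac{1}{5} + \tfrac{3}{5} + \tfrac{1}{5} = 1$$
$$P(y = 0) = \tfrac{1}{5}$$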
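The mean and variance formulas for a discrete random variable can be checked against the solution of Question 1. The sketch below stores the distribution as a dictionary of exact fractions; the helper names `mean` and `variance` are illustrative.

```python
from fractions import Fraction
from math import sqrt

def mean(pmf):
    """E(Y) = sum of y * f(y) over all values y."""
    return sum(y * p for y, p in pmf.items())

def variance(pmf):
    """Var(Y) = E(Y^2) - [E(Y)]^2."""
    second_moment = sum(y * y * p for y, p in pmf.items())
    return second_moment - mean(pmf) ** 2

# Distribution of the number of imperfections from Question 1.
pmf = {0: Fraction(1, 5), 1: Fraction(3, 5), 2: Fraction(1, 5)}
assert sum(pmf.values()) == 1  # a valid distribution sums to 1

mu = mean(pmf)
var = variance(pmf)
sd = sqrt(var)
p_at_most_one = sum(p for y, p in pmf.items() if y <= 1)

print(mu, var, round(sd, 3), p_at_most_one)
```

Using `Fraction` keeps the results exact, so they can be compared directly with the hand calculation.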
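2) i)
$$\mu = E(x) = \int_{-\infty}^{\infty} x f(x)\,dx = \int_0^1 x \cdot x\,dx + \int_1^2 x(2 - x)\,dx = 1$$
$$\sigma^2 = E(x^2) - E^2(x)$$
$$E(x^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx = \int_0^1 x^2 \cdot x\,dx + \int_1^2 x^2 (2 - x)\,dx = \tfrac{14}{12}$$
$$\therefore \sigma^2 = \tfrac{14}{12} - (1)^2 = \tfrac{1}{6}, \qquad \sigma = 0.408$$
ii)
$$P\left(\tfrac{1}{4} < x < \tfrac{3}{2}\right) = \int_{1/4}^{1} x\,dx + \int_{1}^{3/2} (2 - x)\,dx = \tfrac{27}{32}$$
$$P\left(x < \tfrac{3}{2}\right) = \int_{0}^{1} x\,dx + \int_{1}^{3/2} (2 - x)\,dx = \tfrac{7}{8}$$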
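The integrals in the solution of Question 2 can be verified numerically. The sketch below uses a simple midpoint rule instead of exact integration, so the results are approximations; the function names and the number of steps are illustrative assumptions.

```python
def density(x):
    """Density from Question 2: x on (0, 1), 2 - x on [1, 2), 0 elsewhere."""
    if 0 < x < 1:
        return x
    if 1 <= x < 2:
        return 2 - x
    return 0.0

def integrate(g, a, b, steps=20000):
    """Approximate the integral of g over [a, b] with the midpoint rule."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

mean = integrate(lambda x: x * density(x), 0, 2)
second_moment = integrate(lambda x: x * x * density(x), 0, 2)
variance = second_moment - mean ** 2
p_quarter_to_three_halves = integrate(density, 0.25, 1.5)
p_below_three_halves = integrate(density, 0, 1.5)

print(round(mean, 4), round(variance, 4))
print(round(p_quarter_to_three_halves, 4), round(p_below_three_halves, 4))
```

The numerical values should be close to 1, 1/6, 27/32 and 7/8, the results obtained by hand.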
The simplest of all discrete probability distributions is one where the random variable
assumes each of its values with an equal probability.
If the random variable X assumes the values x1, x2, ..., xk with equal probabilities, then
the discrete uniform distribution is given by
$$f(x; k) = \frac{1}{k}, \quad x = x_1, x_2, \ldots, x_k$$
We use the notation f(x; k) instead of f(x) to indicate that the uniform distribution
depends on the parameter k.
Example
When a fair die is tossed, each element of the sample space S = {1, 2, 3, 4, 5, 6} occurs with
probability 1/6.
Example 1
Solution
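The 95% confidence interval is
$$2.6 - (1.96)\frac{0.3}{\sqrt{36}} < \mu < 2.6 + (1.96)\frac{0.3}{\sqrt{36}}$$
$$2.50 < \mu < 2.70$$
The 99% confidence interval is
$$2.6 - (2.575)\frac{0.3}{\sqrt{36}} < \mu < 2.6 + (2.575)\frac{0.3}{\sqrt{36}}$$
$$2.47 < \mu < 2.73$$
* We are 95% confident that the sample mean x̄ = 2.6 differs from the true mean μ
by an amount less than 0.1, and 99% confident that the difference is less than 0.13.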
Lower one-sided bound: $\bar{X} - Z_{\alpha}\frac{\sigma}{\sqrt{n}}$
Example 2
Solution
The upper 95% bound is given by α = 0.05 (leaving an area of 0.05 to the
right and 0.95 to the left), so that $Z_{\alpha} = 1.645$ and
$$\bar{x} + Z_{\alpha}\frac{\sigma}{\sqrt{n}} = 6.2 + (1.645)\sqrt{\frac{4}{25}} = 6.858 \text{ seconds}$$
Hence, we are 95% confident that the mean reaction time is less than 6.858 seconds.
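The interval and bound calculations above can be sketched in code. The example below takes x̄ = 2.6, σ = 0.3 and n = 36 from the first worked solution, and x̄ = 6.2, σ² = 4 and n = 25 from the second; these readings of the worked numbers are assumptions, since the problem statements are not shown. Critical values come from `statistics.NormalDist` instead of a table, so results can differ from the table values in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Two-sided confidence interval for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width

def upper_bound(xbar, sigma, n, confidence):
    """One-sided upper confidence bound for the mean when sigma is known."""
    z = NormalDist().inv_cdf(confidence)  # Z_alpha
    return xbar + z * sigma / sqrt(n)

ci95 = z_interval(2.6, 0.3, 36, 0.95)
ci99 = z_interval(2.6, 0.3, 36, 0.99)
bound95 = upper_bound(6.2, 2.0, 25, 0.95)

print([round(v, 2) for v in ci95], [round(v, 2) for v in ci99], round(bound95, 3))
```

The 99% interval is wider than the 95% interval, as expected when more confidence is demanded from the same sample.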
The probability distribution of this discrete random variable is called the binomial
distribution, and its values will be represented by b(x; n, p).
A Bernoulli trial can result in a success with probability p and a failure
with probability q = 1 − p. Then the probability distribution of the binomial random
variable X, the number of successes in n independent trials, is
$$b(x; n, p) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \ldots, n$$
with mean and variance
$$\mu = np, \qquad \sigma^2 = npq, \qquad q = 1 - p$$
Example 1
The probability that a certain kind of component will survive a shock test is 3/4. Find
the probability that exactly 2 of the next 4 components tested survive.
Solution
x = 2, n = 4, p = 3/4
$$b(x; n, p) = b\left(2; 4, \tfrac{3}{4}\right) = \binom{4}{2}\left(\tfrac{3}{4}\right)^2\left(\tfrac{1}{4}\right)^2 = \frac{4!}{2!\,2!} \cdot \frac{3^2}{4^4} = \frac{27}{128}$$
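The binomial formula can be evaluated exactly with the standard library. This is a minimal sketch for the shock test example; `binomial_pmf` is an illustrative name.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """b(x; n, p) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Exactly 2 survivors among 4 components, each surviving with probability 3/4.
prob = binomial_pmf(2, 4, Fraction(3, 4))
print(prob)
```

Passing a `Fraction` for p keeps the answer as an exact fraction; passing a float such as 0.75 gives the decimal value instead.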
Binomial sum:
$$B(r; n, p) = \sum_{x=0}^{r} b(x; n, p)$$
1. $P(X < r) = \sum_{x=0}^{r-1} b(x; n, p)$
2. $P(X = r) = \sum_{x=0}^{r} b(x; n, p) - \sum_{x=0}^{r-1} b(x; n, p)$
3. $P(a \le X \le b) = \sum_{x=a}^{b} b(x; n, p) = \sum_{x=0}^{b} b(x; n, p) - \sum_{x=0}^{a-1} b(x; n, p)$
4. $P(X \ge r) = 1 - P(X < r) = 1 - \sum_{x=0}^{r-1} b(x; n, p)$
Example 2
The probability that a patient recovers from a rare blood disease is 0.4. If 15 people are
known to have contracted this disease, what is the probability that
a) at least 10 survive
b) from 3 to 8 survive
c) exactly 5 survive
Then, find the mean and variance of the binomial random variable.
Solution
a) $P(X \ge 10) = 1 - P(X < 10) = 1 - \sum_{x=0}^{9} b(x; 15, 0.4) = 1 - 0.9662 = 0.0338$
b) $P(3 \le X \le 8) = \sum_{x=3}^{8} b(x; 15, 0.4) = \sum_{x=0}^{8} b(x; 15, 0.4) - \sum_{x=0}^{2} b(x; 15, 0.4) = 0.9050 - 0.0271 = 0.8779$
c) $P(X = 5) = b(5; 15, 0.4) = \sum_{x=0}^{5} b(x; 15, 0.4) - \sum_{x=0}^{4} b(x; 15, 0.4) = 0.4032 - 0.2173 = 0.1859$
Mean: μ = (15)(0.4) = 6; variance: σ² = (15)(0.4)(0.6) = 3.6
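The binomial sums used in Example 2 are normally read from a table, but they can also be computed directly. The sketch below is one way to do that; the function names are illustrative, and small differences from the table values in the fourth decimal place are due to rounding.

```python
from math import comb

def binomial_pmf(x, n, p):
    """b(x; n, p) for a single value x."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(r, n, p):
    """Binomial sum B(r; n, p) = P(X <= r)."""
    return sum(binomial_pmf(x, n, p) for x in range(r + 1))

n, p = 15, 0.4
p_at_least_10 = 1 - binomial_cdf(9, n, p)                   # P(X >= 10)
p_3_to_8 = binomial_cdf(8, n, p) - binomial_cdf(2, n, p)    # P(3 <= X <= 8)
p_exactly_5 = binomial_cdf(5, n, p) - binomial_cdf(4, n, p)  # P(X = 5)
mean, variance = n * p, n * p * (1 - p)

print(round(p_at_least_10, 4), round(p_3_to_8, 4), round(p_exactly_5, 4))
print(mean, round(variance, 1))
```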
Example 3
Solution
(a) $b(3; 10, 0.3) = P(X = 3) = \sum_{x=0}^{3} b(x; 10, 0.3) - \sum_{x=0}^{2} b(x; 10, 0.3) = 0.6496 - 0.3828 = 0.2668$
(b) $P(X > 3) = 1 - P(X \le 3) = 1 - 0.6496 = 0.3504$
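The mean and variance of the Poisson distribution p(x; λ) are both equal to λ:
$$\mu = \sigma^2 = \lambda$$
Example 1
Solution
With x = 6 and λ = 4,
$$P(X = 6) = p(6; 4) = \frac{e^{-4} 4^6}{6!} = \sum_{x=0}^{6} p(x; 4) - \sum_{x=0}^{5} p(x; 4) = 0.8893 - 0.7851 = 0.1042$$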
Example 2
Ten is the average number of oil tankers arriving each day at a certain port city. The
facilities at the port can handle at most 15 tankers per day. What is the probability that on
a given day tankers have to be turned away?
Solution
Let X be the number of tankers arriving each day.
$$P(X > 15) = 1 - P(X \le 15) = 1 - \sum_{x=0}^{15} p(x; 10) = 1 - 0.9513 = 0.0487$$
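The Poisson probabilities in the two examples above can be computed directly instead of being read from a table. This is a minimal sketch; `poisson_pmf` and `poisson_cdf` are illustrative names.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """p(x; lambda) = e^(-lambda) * lambda^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)

def poisson_cdf(r, lam):
    """Poisson sum P(X <= r)."""
    return sum(poisson_pmf(x, lam) for x in range(r + 1))

# Example 1: x = 6 with lambda = 4.
p_six = poisson_pmf(6, 4)

# Example 2: on average 10 tankers arrive, at most 15 can be handled.
p_turned_away = 1 - poisson_cdf(15, 10)

print(round(p_six, 4), round(p_turned_away, 4))
```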
This distribution is characterized by a density function that is 'flat'.
The density function of the continuous uniform random variable X on the interval
[A, B] is
$$f(x; A, B) = \begin{cases} \dfrac{1}{B - A}, & A \le x \le B \\ 0, & \text{elsewhere} \end{cases}$$
Example
The density function for a uniform random variable on the interval [1, 3] is
$$f(x; 1, 3) = \begin{cases} \dfrac{1}{B - A} = \dfrac{1}{2}, & 1 \le x \le 3 \\ 0, & \text{elsewhere} \end{cases}$$
The density of the normal random variable X, with mean μ and variance σ², is
$$n(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$$
where π = 3.14159... and e = 2.71828...
[Fig 2: (a)]
[Fig 3: (b)]
[Fig 4: (c)]
The probability that the random variable X assumes a value between x = x1 and
x = x2, i.e. P(x1 < X < x2), is represented by the area of the shaded region.
[Fig 5: P(x1 < X < x2) = area of the shaded region]
Because it would be a hopeless task to set up separate tables for every conceivable value of
μ and σ, we use the transformed normal distribution. The transformation
$$Z = \frac{X - \mu}{\sigma}$$
converts a normal random variable X into a standard normal random variable Z with μ = 0 and
σ² = 1.
Under this transformation,
P(x1 < X < x2) = P(Z1 < Z < Z2)
Example 1
Given a standard normal distribution, find the area under the curve that lies
(a) to the right of Z = 1.84
(b) between Z = −1.97 and Z = 0.86.
Solution
(a) P(Z > 1.84) = 1 − Φ(1.84) = 1 − 0.9671 = 0.0329
(b) P(−1.97 < Z < 0.86) = Φ(0.86) − Φ(−1.97) = 0.8051 − 0.0244 = 0.7807
Example 2
Given a standard normal distribution, find the value of K such that
(a) P(Z > K) = 0.3015
(b) P(K < Z < −0.18) = 0.4197
Solution
(a) A value K leaving an area of 0.3015 to the right must leave an area of
0.6985 to the left.
∴ K = 0.52
(b) Φ(−0.18) − Φ(K) = 0.4197
Φ(K) = 0.4286 − 0.4197 = 0.0089
∴ K = −2.37
Example 3
Given a random variable X having a normal distribution with u = 50 and σ =10 , find
the probability that X assumes a value between 45 and 62 .
Solution
$$Z_1 = \frac{45 - 50}{10} = -0.5, \qquad Z_2 = \frac{62 - 50}{10} = 1.2$$
Therefore,
P(45 < X < 62) = P(−0.5 < Z < 1.2)
= Φ(1.2) − Φ(−0.5)
= 0.8849 − 0.3085
= 0.5764
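The standardization step in Example 3 can be checked with `statistics.NormalDist` from the Python standard library, which evaluates the normal distribution function directly instead of using a table. The function name `normal_between` is an illustrative choice.

```python
from statistics import NormalDist

def normal_between(mu, sigma, low, high):
    """P(low < X < high) for X normal with mean mu and standard deviation sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(high) - dist.cdf(low)

# Example 3: mean 50, standard deviation 10, values between 45 and 62.
p = normal_between(50, 10, 45, 62)

# The same probability through the standard normal variable Z = (X - mu) / sigma.
z1, z2 = (45 - 50) / 10, (62 - 50) / 10
p_standard = NormalDist().cdf(z2) - NormalDist().cdf(z1)

print(round(p, 4), round(p_standard, 4))
```

Both routes give the same probability, which is the point of the transformation to Z.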
Example 4
Given a normal distribution with μ = 40 and σ = 6, find the value of x that has
Solution
Question 1
A certain type of storage battery lasts on average 3.0 years with a standard deviation of
0.5 year. Assuming that the battery lives are normally distributed, find the probability
that a given battery will last less than 2.3 years.
Question 2
An electrical firm manufactures light bulbs that have a life before burn-out, that is
normally distributed with mean equal to 800 hours and a standard deviation of 40 hours.
Find the probability that a bulb burns between 778 and 834 hours.
Question 3
The average grade for an exam is 74, and the standard deviation is 7. If 12% of the
class is given A's and the grades are curved to follow a normal distribution, what is the
lowest possible A and the highest possible B?
$$P(X \le x) = \sum_{k=0}^{x} b(k; n, p)$$
$$\approx \text{area under the normal curve to the left of } x + 0.5$$
$$= P\left(Z \le \frac{x + 0.5 - np}{\sqrt{npq}}\right)$$
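For the binomial distribution b(x; 15, 0.4), the exact probability is
$$P(X = 4) = \sum_{x=0}^{4} b(x; 15, 0.4) - \sum_{x=0}^{3} b(x; 15, 0.4) = 0.1268$$
μ = np = 15(0.4) = 6
σ² = npq = 15(0.4)(0.6) = 3.6, so σ = 1.897
The normal approximation is the area between x1 = 3.5 and x2 = 4.5.
Converting to Z values,
$$Z_1 = \frac{3.5 - 6}{1.897} = -1.32, \qquad Z_2 = \frac{4.5 - 6}{1.897} = -0.79$$
Hence P(X = 4) ≈ P(−1.32 < Z < −0.79) = 0.2148 − 0.0934 = 0.1214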
Example 2
The exact probability is
P(7 ≤ X ≤ 9) = 0.9662 − 0.6098 = 0.3564
Converting to Z values,
$$Z_1 = \frac{6.5 - 6}{1.897} = 0.26, \qquad Z_2 = \frac{9.5 - 6}{1.897} = 1.85$$
Hence,
P(7 ≤ X ≤ 9) ≈ P(0.26 < Z < 1.85)
= P(Z < 1.85) − P(Z < 0.26)
= 0.9678 − 0.6026
= 0.3652
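The comparison between the exact binomial probability and its normal approximation can be reproduced as follows. This is a sketch using n = 15 and p = 0.4 from the example; the function names are illustrative. Because the Z values are not rounded to two decimals here, the approximation differs slightly from the table-based value 0.3652.

```python
from math import comb, sqrt
from statistics import NormalDist

def binomial_range(a, b, n, p):
    """Exact P(a <= X <= b) for a binomial random variable."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(a, b + 1))

def normal_approx_range(a, b, n, p):
    """Normal approximation with continuity correction (a - 0.5 to b + 0.5)."""
    mu = n * p
    sigma = sqrt(n * p * (1 - p))
    standard = NormalDist()
    return standard.cdf((b + 0.5 - mu) / sigma) - standard.cdf((a - 0.5 - mu) / sigma)

exact = binomial_range(7, 9, 15, 0.4)
approx = normal_approx_range(7, 9, 15, 0.4)

print(round(exact, 4), round(approx, 4))
```

The two values agree to about two decimal places, which illustrates how well the continuity correction works even for a fairly small n.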
0, elsewhere
where B > 0, and
μ = B, σ² = B²
P(Z < 0) = Φ(0) = 0.5
P(Z < 3.30) = Φ(3.3) = 0.9995
P(Z < 0.7) = Φ(0.7) = 0.7580
P(Z < −0.1) = 1 − Φ(0.1) = 1 − 0.5398 = 0.4602
P(Z > −1.96) = P(Z < 1.96) = Φ(1.96) = 0.9750
P(Z > 0.66) = 1 − Φ(0.66) = 1 − 0.7454 = 0.2546
P(0 < Z < 2) = Φ(2) − Φ(0) = 0.9772 − 0.5 = 0.4772
- The size of the population, N, is the number of objects or observations in the population.
- A population is said to be finite or infinite depending on whether N is finite
or infinite.
Notation
                     Population   Sample
Mean                 μ            x̄
Standard deviation   σ            s
Proportion           p            P
The sample mean and the sample variance are two important statistics that are measures of a
random sample X1, X2, ..., Xn of size n.
Sample mean (measure of central tendency):
$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$
Sample variance (measure of variability of the data about the mean):
$$s^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{n \sum X_i^2 - \left(\sum X_i\right)^2}{n(n - 1)}$$
Suppose that a random sample of n observations is taken from a normal population with
mean μ and variance σ².
Each observation Xi, i = 1, 2, ..., n, of the random sample will have the same normal
distribution. Probabilities for the sample mean X̄ can be calculated using
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}}$$
where n is the sample size, μ and σ are the mean and standard deviation of the population,
and X̄ is the mean of the random sample.
Example 1
An electrical firm manufactures light bulbs that have a length of life that is approximately
normally distributed, with mean equal to 800 hours and a standard deviation of 40 hours.
Find the probability that a random sample of 16 bulbs will have an average life of less
than 775 hours.
Solution
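One way to work this example is to standardize the sample mean with Z = (X̄ − μ)/(σ/√n) and evaluate the standard normal distribution function. The sketch below does this with `statistics.NormalDist` rather than a table; it is an illustration of the formula above, not a solution taken from the notes, and the function name is an assumption.

```python
from math import sqrt
from statistics import NormalDist

def prob_sample_mean_below(threshold, mu, sigma, n):
    """Return (z, P(sample mean < threshold)) for a normal population."""
    standard_error = sigma / sqrt(n)  # standard deviation of the sample mean
    z = (threshold - mu) / standard_error
    return z, NormalDist().cdf(z)

# Light bulbs: mean 800 hours, standard deviation 40 hours, sample of 16 bulbs.
z, p = prob_sample_mean_below(775, 800, 40, 16)

print(z, round(p, 4))
```

With these inputs the standard error is 40/√16 = 10 hours, so the threshold of 775 hours lies 2.5 standard errors below the population mean.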
What is the t – Distribution used for?
- to deal with inference about the population mean. ( S as estimator of σ)
-will be explained in the next chapter.
Example 3
A chemical engineer claims that the population mean yield of a certain batch process is
500 grams per milliliter of raw material. To check this claim, he samples 25 batches each
month. If the computed t-value falls between $-t_{0.05}$ and $t_{0.05}$, he is satisfied with
his claim.
What conclusion should he draw from a sample that has a mean x̄ = 518 grams per milliliter
and a sample standard deviation S = 40 grams? Assume the distribution of yields to be
approximately normal.
Earlier, in problems of inference on a population mean, it was assumed that the population
standard deviation, σ, is known.
Often, in fact, an estimate of σ must be supplied by the same sample information that
produced the sample average x̄.
• If the sample size is large enough, say n ≥ 30, the distribution of T does not
differ considerably from the standard normal.
• However, for n < 30, it is useful to deal with the exact distribution of T.
- In developing the sampling distribution of T, we shall assume that our
random sample was selected from a normal population.
Further explanation of the sampling distribution
1. Draw all possible samples of size n from a given finite population of size N.
The total number of possible samples of size n from the population of size N is given by
$${}^{N}C_{n} = \frac{N!}{n!\,(N - n)!} = K$$
2. Compute a statistic S (such as the mean, s.d., median, mode, etc.) for each of these samples.
S = S(x1, x2, x3, ..., xn)
Sample number   1    2    3    ...   K
Statistic S     S1   S2   S3   ...   SK
- Thus, the sampling distribution describes how the statistic S varies from one sample to
another of the same size.
- The difference in the value of the statistic from sample to sample is known as sampling
fluctuation.
- If the statistic S is the mean, the sampling distribution is known as the sampling
distribution of means.
S Sampling distribution
Mean Sampling dist of mean
Variance Sampling dist of variance
Proportion Sampling dist of proportion
Median Sampling dist of median
4.1 Statistical inference
Statistical Inference
Bayesian method
Interval estimate
Point estimate
Case study-Estimation
A candidate for the post of president may wish to estimate the true proportion of voters favoring
him by obtaining the opinions of a random sample of 100 eligible voters. The fraction
of voters in the sample favoring him could be used as an estimate of the true proportion
in the population of voters.
To sum up,
If X̄ is the mean of a random sample of size n from a population with known
variance σ², a 100(1 − α)% confidence interval for μ is given by
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
- When α = 0.05, we have a 95% confidence interval. (We can be 95% confident that the
given interval contains μ.)
- When α = 0.01, we obtain a wider 99% confidence interval. The wider the
confidence interval, the more confident we can be that the given interval contains μ.
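Error
[Figure: the interval from $\bar{X} - Z_{\alpha/2}\sigma/\sqrt{n}$ to $\bar{X} + Z_{\alpha/2}\sigma/\sqrt{n}$, with X̄ and μ marked]
The size of the error is the absolute value of the difference between μ and X̄.
We can be 100(1 − α)% confident that this error will not exceed $Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.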
4.3. Estimation
4.3.1 Point estimation
~ as taught in chapter 3 (3.2 3.5) ~
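A confidence interval for μ is obtained by considering the sampling distribution of X̄, where
1 − α is the degree of confidence:
$$P\left(-Z_{\alpha/2} < Z < Z_{\alpha/2}\right) = 1 - \alpha, \qquad Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$$
$$P\left(-Z_{\alpha/2} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < Z_{\alpha/2}\right) = 1 - \alpha$$
$$P\left(\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) = 1 - \alpha$$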
This procedure is the same as in section 4.2.2, except that σ is replaced by S and the
standard normal distribution (Z distribution) is replaced by the t-distribution.
(* For a large sample (n ≥ 30) with unknown variance, use the Z distribution. The unknown σ is
replaced by s.)
Referring to figure 2,
If X̄ and s are the mean and standard deviation of a random sample from a normal
population with unknown variance σ², a 100(1 − α)% confidence interval for μ is
$$\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$
Example
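The contents of 7 similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2
and 9.6 liters. Find a 95% confidence interval for the mean of all such containers,
assuming an approximately normal distribution.
Solution
The sample mean and standard deviation for the given data are x̄ = 10.0 and s = 0.283.
With $t_{0.025} = 2.447$ for v = 6 degrees of freedom, the 95% confidence interval is
$$10.0 - (2.447)\frac{0.283}{\sqrt{7}} < \mu < 10.0 + (2.447)\frac{0.283}{\sqrt{7}}$$
$$9.74 < \mu < 10.26$$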
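The t interval for the container data can be sketched as follows. The Python standard library has no t distribution, so the critical value 2.447 (t with area 0.025 to the right, 6 degrees of freedom) is entered by hand from a t table; that constant and the function name are assumptions of this sketch.

```python
from math import sqrt
from statistics import mean, stdev

def t_interval(data, t_critical):
    """Confidence interval for the mean using a supplied t critical value."""
    n = len(data)
    xbar = mean(data)
    s = stdev(data)  # sample standard deviation with divisor n - 1
    half_width = t_critical * s / sqrt(n)
    return xbar, s, (xbar - half_width, xbar + half_width)

contents = [9.8, 10.2, 10.4, 9.8, 10.0, 10.2, 9.6]  # liters
T_0025_6DF = 2.447  # from a t table: alpha/2 = 0.025, v = n - 1 = 6

xbar, s, (low, high) = t_interval(contents, T_0025_6DF)
print(round(xbar, 1), round(s, 3), round(low, 2), round(high, 2))
```

For a different sample size or confidence level, only the tabulated critical value needs to change.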
The value x̄ of the statistic X̄, computed from a particular sample, is a
point estimate of the population parameter μ.
Suppose one is interested in finding out whether brand A floor wax is more scuff resistant than
brand B floor wax. One might hypothesize that brand A is better than brand B and, after
proper testing (using a particular sample and sampling distribution), accept or reject
this hypothesis.
Here, we do not attempt to estimate a parameter; instead, we try to arrive at a correct
decision about a prestated hypothesis. As in estimation, sampling theory is also applied
here.
Null Hypothesis
- The null hypothesis is tested for possible rejection by assuming (before testing) that it is
true.
- It is stated as H0, and the statement is expressed as an equality (=):
H0: μ = μ0
i.e. we assume that the population mean equals μ0.
Alternative hypothesis
H1: μ ≠ μ0
Test of hypothesis
A test of hypothesis is a procedure to decide whether to accept or reject the null hypothesis
based on a statistic and its sampling distribution.
Decision
                Accept H0          Reject H0
H0 is true      Correct decision   Type I error
H0 is false     Type II error      Correct decision
Critical region
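Consider any statistic S* (the statistic Z or T, for example). The area under its
probability curve can be divided into a non-critical region and a critical region.
Example
Test statistic: $Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ (or t)
[Figures: critical and non-critical regions under the Z or t curve for each pair of hypotheses]
H0: μ = μ0, H1: μ ≠ μ0
H0: μ = μ0, H1: μ < μ0
H0: μ = μ0, H1: μ > μ0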
a. To test whether the population mean μ equals a specified constant μ0 or not, i.e.
when H1: μ ≠ μ0
1. H0: μ = μ0
2. H1: μ ≠ μ0
3. Choose α
4. Since H1: μ ≠ μ0 (two-tailed test), calculate $-Z_{\alpha/2}$ and $Z_{\alpha/2}$
5. Compute the test statistic
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
6. Decision: reject H0 if $Z < -Z_{\alpha/2}$ or $Z > Z_{\alpha/2}$
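b. when H1: μ < μ0
1. H0: μ = μ0
2. H1: μ < μ0
3. Choose α
4. Since H1: μ < μ0 (left one-tailed test), calculate $-Z_{\alpha}$
5. Compute the test statistic
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
6. Decision: reject H0 if $Z < -Z_{\alpha}$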
c. when H1: μ > μ0
1. H0: μ = μ0
2. H1: μ > μ0
3. Choose α
4. Since H1: μ > μ0 (right one-tailed test), calculate $Z_{\alpha}$
5. Compute the test statistic
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
6. Decision: reject H0 if $Z > Z_{\alpha}$
Example 1
A manufacturer of sports equipment has developed a new synthetic fishing line that he
claims has a mean breaking strength of 8 kilograms with a standard deviation of 0.5
kilograms. Test the hypothesis that μ = 8 kilograms against the alternative that μ ≠ 8
kilograms if a random sample of 50 lines is tested and found to have a mean breaking
strength of 7.8 kilograms. Use a 0.01 level of significance.
Solution
1. H0: μ = 8 kg
2. H1: μ ≠ 8 kg
3. α = 0.01
4. Critical region: Z < −2.575 or Z > 2.575
5. Compute the test statistic, with x̄ = 7.8 kg, n = 50, μ0 = 8 kg and σ = 0.5 kg:
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{7.8 - 8}{0.5/\sqrt{50}} = -2.83$$
6. Decision:
Since −2.83 < −2.575, reject H0 and conclude that the average breaking strength is not equal
to 8 kilograms but is, in fact, less than 8 kilograms.
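The six-step Z test procedure can be sketched as a small function. Critical values are computed with `statistics.NormalDist` rather than read from a table, so they may differ from table values in the third decimal place; the function name and the `alternative` argument are illustrative choices, not part of the notes.

```python
from math import sqrt
from statistics import NormalDist

def z_test(xbar, mu0, sigma, n, alpha, alternative="two-sided"):
    """Return (z, critical value, reject H0?) for a test on the mean, sigma known."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    standard = NormalDist()
    if alternative == "two-sided":      # H1: mu != mu0
        critical = standard.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    elif alternative == "less":         # H1: mu < mu0
        critical = standard.inv_cdf(1 - alpha)
        reject = z < -critical
    elif alternative == "greater":      # H1: mu > mu0
        critical = standard.inv_cdf(1 - alpha)
        reject = z > critical
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    return z, critical, reject

# Fishing line example: H0: mu = 8 against H1: mu != 8 at alpha = 0.01.
z, critical, reject = z_test(7.8, 8.0, 0.5, 50, 0.01)
print(round(z, 2), round(critical, 3), reject)
```

The same function covers the one-tailed cases by passing `alternative="less"` or `alternative="greater"`.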
Example 2
A random sample of 100 recorded deaths in the United States during the past year
showed an average life span of 71.8 years. Assuming a population standard deviation of
8.9 years, does this seem to indicate that the mean life span today is greater than 70 years?
Use a 0.05 level of significance.
Solution
1. H0: μ = 70 years
2. H1: μ > 70 years
3. α = 0.05
4. Critical region: $Z > Z_{\alpha} = 1.645$
5. Compute the test statistic:
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{71.8 - 70}{8.9/\sqrt{100}} = 2.02$$
6. Decision: since 2.02 > 1.645, reject H0 and conclude that the mean life span today is
greater than 70 years.
Example 1
Solution
1. H0: μ = 46 kilowatt hours
2. H1: μ < 46 kilowatt hours
3. α = 0.05
4. Critical region: $t < -t_{\alpha} = -1.796$ with 11 degrees of freedom
5. Compute the test statistic T:
$$t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}} = \frac{42 - 46}{11.9/\sqrt{12}} = -1.16$$
6. Decision: since −1.16 > −1.796, do not reject H0.
For σ unknown and for a small sample size, the decision criterion is based on the t
distribution with v = n − 1 degrees of freedom:
$$t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}}$$
The test is similar to that in section 4.3.1, except that t values are used in place of Z and σ is
replaced by S.
5.1.1 Introduction
Curve fitting is a method of determining the mathematical equation relating the variables
X and Y by finding the best-fit curve, C: Y = f(X), that approximates a given set of
data.
Relationship
- A relationship may exist between two (or more) variables, X and Y. For
example: blood pressure and age, nutrition and IQ, etc.
Scatter diagram
- A scatter diagram is the plot of a given set of N paired observations of X
and Y, i.e. (x1, y1), (x2, y2), ..., (xN, yN), in the XY plane.
Approximating curve, C
- A smooth curve that approximates the given set of N data points plotted in the scatter
diagram.
- Let (x1, y1), (x2, y2), ..., (xN, yN) be a given set of N data points.
- $d_i = Y_i - \hat{Y}_i$ denotes the difference between $Y_i$ and the corresponding
estimated value $\hat{Y}_i = f(X_i)$.
- The curve having the minimum $\sum_{i=1}^{N} d_i^2$ is the best-fitting curve, and
$\sum_{i=1}^{N} d_i^2$ is called the residual or error sum of squares.
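1. $Y = a_0 + a_1 X$ (straight line)
2. $Y = a_0 + a_1 X + a_2 X^2$ (quadratic curve)
3. $Y = A B^X$ (exponential curve), etc.
For the straight line $Y = a_0 + a_1 X = f(X)$, the constants $a_0$ and $a_1$ satisfy the
normal equations
$$\sum Y_i = N a_0 + a_1 \sum X_i$$
$$\sum X_i Y_i = a_0 \sum X_i + a_1 \sum X_i^2$$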
Example 1
X: 1 2 3 4 5 6
Y: 6 4 3 5 4 2
Solution
∑X = 21, ∑Y = 24, ∑X² = 91, ∑Y² = 106, ∑XY = 75, N = 6
The normal equations
∑Y = N a0 + a1 ∑X
∑XY = a0 ∑X + a1 ∑X²
become
24 = 6 a0 + 21 a1
75 = 21 a0 + 91 a1
Solving, a1 = −0.514 and a0 = 5.8, so the fitted line is
Y = 5.8 − 0.514 X
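The normal equations for the straight line can be solved in closed form. The sketch below applies them to the six data points of Example 1; `fit_line` is an illustrative name.

```python
def fit_line(xs, ys):
    """Least squares line Y = a0 + a1 * X from the two normal equations."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    # Eliminating a0 from the normal equations gives the slope a1.
    a1 = (n * sum_xy - sum_x * sum_y) / (n * sum_xx - sum_x ** 2)
    # Substituting back into sum(Y) = N*a0 + a1*sum(X) gives the intercept.
    a0 = (sum_y - a1 * sum_x) / n
    return a0, a1

xs = [1, 2, 3, 4, 5, 6]
ys = [6, 4, 3, 5, 4, 2]
a0, a1 = fit_line(xs, ys)

# Residual (error) sum of squares of the fitted line.
sse = sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))
print(round(a0, 3), round(a1, 3), round(sse, 3))
```

The fitted coefficients satisfy both normal equations, which is a convenient check on hand calculations.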
- The study of the relationship (if it exists) between two (or more) variables.
- Mainly used for
- Linear regression: the relationship between the variables is linear and is represented by a
straight line.
If Y is dependent on X, then $Y = a_0 + a_1 X$ is known as the regression line of Y on X.
Similarly, if X depends on Y, then $X = b_0 + b_1 Y$ is known as the regression line of X
on Y.
The linear regression model is
Y = α + βx + ε
α: unknown intercept
β: unknown slope
ε: random error (assumed to be normally distributed with mean E(ε) = 0 and variance
σ², the residual variance)
For the regression line of X on Y, $X = b_0 + b_1 Y$, the normal equations are
$$\sum X = N b_0 + b_1 \sum Y$$
$$\sum XY = b_0 \sum Y + b_1 \sum Y^2$$
or, for the data of Example 1,
21 = 6 b0 + 24 b1
75 = 24 b0 + 106 b1
- In regression analysis, the relationship is obtained in the form of a mathematical equation.
- In correlation analysis, we wish to find whether a relationship exists and to measure the
strength of that relationship.
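Correlation coefficient: measures the closeness of the relationship
Type of correlation
$$r = \frac{\sum_{i=1}^{n} xy - \dfrac{\sum_{i=1}^{n} x \sum_{i=1}^{n} y}{n}}{\sqrt{\left[\sum_{i=1}^{n} x^2 - \dfrac{\sum_{i=1}^{n} x \sum_{i=1}^{n} x}{n}\right]\left[\sum_{i=1}^{n} y^2 - \dfrac{\sum_{i=1}^{n} y \sum_{i=1}^{n} y}{n}\right]}}$$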
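The correlation coefficient formula can be applied directly to paired data. As an illustration, the sketch below reuses the six (X, Y) pairs from the curve fitting example; that choice of data and the function name are assumptions of this sketch.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient r computed from the sums in the formula above."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_xx - sum_x ** 2 / n) * (sum_yy - sum_y ** 2 / n))
    return numerator / denominator

xs = [1, 2, 3, 4, 5, 6]
ys = [6, 4, 3, 5, 4, 2]
r = correlation(xs, ys)
print(round(r, 4))
```

A negative r here matches the negative slope of the fitted line: Y tends to decrease as X increases.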
Example 1
Find
a) $t_{0.025}$ when v = 14
b) $-t_{0.01}$ when v = 10
c) $t_{0.995}$ when v = 7
d) the t-value with v = 14 that leaves an area of 0.025 to the left
Solution:
a) $t_{0.025} = 2.145$
b) $-t_{0.01} = -2.764$
c) $t_{0.995} = t_{1-0.005} = -t_{0.005} = -3.499$
d) $t_{0.975} = -t_{0.025} = -2.145$
Example 2
Find k for a random sample of size 24 from a normal distribution such that
b) P(k < t < 2.807) = 0.095
Solution
b) $t_{\alpha} = 2.807$ with v = 23 corresponds to α = 0.005.
The area to the right of k is 0.005 + 0.095 = 0.1, so the area to the left of k is
1 − 0.005 − 0.095 = 0.9.
∴ $k = t_{0.1} = 1.319$ with v = 23
Example 2
A process for making certain ball bearings is under control if the diameters of the
bearings have a mean of 0.5 cm. If a random sample of 10 of these bearings has a mean
diameter of 0.5060 cm and a s.d. of 0.004 cm, is the process under control? (Use α = 0.005.)
Solution
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{0.5060 - 0.5}{0.004/\sqrt{10}} = 4.7434$$
Since $4.7434 > t_{\alpha} = t_{0.005} = 3.250$ (with v = 9 degrees of freedom),
the process is not under control.
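The t statistic for the ball bearing example can be computed as below. The critical value 3.250 is the one quoted in the solution and is entered by hand, since the Python standard library does not provide the t distribution; the names are illustrative.

```python
from math import sqrt

def t_statistic(xbar, mu0, s, n):
    """t = (xbar - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    return (xbar - mu0) / (s / sqrt(n))

T_CRITICAL = 3.250  # t_{0.005} with v = 9, as quoted in the worked solution

t = t_statistic(0.5060, 0.5, 0.004, 10)
out_of_control = abs(t) > T_CRITICAL
print(round(t, 4), out_of_control)
```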
[Figure: the z and t density curves]
            z dist   t dist
Mean        0        0
Variance    1        More than 1 (depends on v = n − 1);
                     as n → ∞, the variance approaches 1
Critical values of the t dist
Critical value, $t_{\alpha}$: the area under the curve to the right of $t_{\alpha}$ is equal to α
Symmetry property
$$t_{1-\alpha} = -t_{\alpha}$$
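Confidence interval
σ known:
$$\bar{X} - Z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
σ unknown:
$$\bar{X} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$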
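Test of hypothesis: to conclude
1. State H0
2. State H1
3. Choose α
4. Calculate the critical value(s):
   two-sided: $-Z_{\alpha/2}$ and $Z_{\alpha/2}$ (or $-t_{\alpha/2}$ and $t_{\alpha/2}$)
   one-sided left: $-Z_{\alpha}$ (or $-t_{\alpha}$)
   one-sided right: $Z_{\alpha}$ (or $t_{\alpha}$)
5. Compute the test statistic:
$$Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \quad \text{or} \quad t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$
6. Conclusion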