Probability Notes Level1
Faculty of Engineering,
University of Moratuwa
by
Dr. T S G Peiris
Department of Mathematics
Note: These handouts give only the important points. Students should attend all the
classes to obtain further details and to gain experience in tackling different
statistical problems related to engineering applications.
Course Content:
Introduction to probability using set theory, Conditional probability and independence,
Applications of Bayes theorem, Discrete and continuous random variables, Properties of the
probability distributions (Binomial, Normal, Standard Normal, Student's t, Poisson and
Exponential) and their applications, Descriptive statistics and Introduction to Minitab for
data analysis
Learning Outcomes:
Upon successful completion of this course, students should be able to
Interpret the results of data analysis
Recommended Readings:
Mathematics for Engineers – J M J A Cooray
Business Statistics Concept and Application (Many books on, “Business Statistics”
are available in the library. All of those have very good practical applications)
1. BASIC DEFINITIONS IN SET THEORY
1.2 Element: The objects comprising the set are called its elements.
1.3 Universal Set: A universal set is the set that contains all objects under consideration and
is denoted by U.
1.4 Null Set: The set that contains no elements is called the null set and is denoted by $\emptyset$.
1.5 Laws of Set Theory
Associative law: $(A \cup B) \cup C = A \cup (B \cup C)$
Distributive law: $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$
2. FUNDAMENTAL PRINCIPLE OF COUNTING
Note: If some procedure can be performed in $n_1$ different ways, a second procedure can
be performed in $n_2$ ways, a third procedure can be performed in $n_3$ ways, and so forth,
then the number of ways the procedures can be performed in the order indicated is
$n_1 \times n_2 \times n_3 \times \cdots$
2.2 Permutation
An arrangement of a set of n objects in a given order is called a permutation. An arrangement
of any $r$ ($r \le n$) of the n objects in a given order is a permutation of n objects taken r at a time.
The number of such permutations is denoted by
\[ {}^{n}P_{r} = p(n, r) = \frac{n!}{(n-r)!} \]
Eg. The number of different signals, each consisting of 8 flags in a vertical line, formed from a
set of 4 indistinguishable red flags, 3 indistinguishable white flags and one blue flag is
$8!/(4!\,3!)$.
2.4 Combinations
A combination of n objects taken r at a time is denoted by c(n, r), where
\[ {}^{n}C_{r} = c(n, r) = \frac{p(n, r)}{r!} = \frac{n!}{(n-r)!\,r!} \]
Comparison between combinations and permutations of the four letters a, b, c and d taken 3
at a time
Table 1
Combinations Permutations
abc abc, acb, bca, bac, cab, cba
abd abd, adb, bad, bda, dab, dba
acd acd, adc, cad, cda, dac, dca
bcd bcd, bdc, cbd, cdb, dbc, dcb
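The counts in Table 1 and in the flag example can be checked numerically. The following is a minimal sketch, assuming only the Python standard library (`itertools` and `math`); the variable names are illustrative and not part of the notes.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

letters = "abcd"

# Enumerate the selections of 3 letters listed in Table 1.
combos = ["".join(c) for c in combinations(letters, 3)]
perms = ["".join(p) for p in permutations(letters, 3)]

print(len(combos), comb(4, 3))   # number of combinations, c(4, 3)
print(len(perms), perm(4, 3))    # number of permutations, p(4, 3)

# Each combination gives rise to r! = 3! permutations.
assert len(perms) == len(combos) * factorial(3)

# Flag example: 8 flags, of which 4 red and 3 white are indistinguishable.
signals = factorial(8) // (factorial(4) * factorial(3))
print(signals)
```

Enumerating the arrangements and comparing against the closed-form counts is a quick way to confirm that c(n, r) = p(n, r) / r!.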
3. PROBABILITY THEORY
3.1 Probability
Probability is the likelihood or chance that a particular event will occur.
The theory of probability provides a mathematical foundation (a numerical
measure) for uncertainty. It enables us to make decisions under conditions of uncertainty. The
theory of probability is useful in day-to-day life and has many applications in all fields of
engineering. There are three approaches to the subject of probability. They are:
(i) a priori classical probability,
(ii) empirical classical probability,
(iii) subjective probability.
Probability theory is based on the paradigm of a random experiment.
3.4 Trial
The performance of a random experiment is called a trial. Many random variables can be
associated in a trial
Eg. Throwing a die and throwing a coin three times are trials.
3.5 Event
The outcome of a trial is an event.
Let A be the event that two or more heads appear consecutively in an experiment of
throwing a coin three times. Then A = {HHH, HHT, THH}.
3.11 Mutually Exclusive Event
Events are mutually exclusive if they cannot happen at the same time.
If we toss a coin, either heads or tails might turn up, but not heads and tails at the same
time.
In a single throw of a die, we can only have one number shown at the top face. The
numbers on the face are mutually exclusive events
If A and B are mutually exclusive events then the probability of A happening OR
B happening is $P(A) + P(B)$. That is, $P(A \cup B) = P(A) + P(B)$.
Choosing a marble from a jar AND landing on heads after tossing a coin
Attending a Maths class and playing a tennis game
A spinner has 4 equal sectors colored yellow, blue, green and red. After spinning the
spinner, what is the probability of landing on each color?
The possible outcomes of this experiment are yellow, blue, green, and red.
\[ P(\text{yellow}) = \frac{\text{number of ways to land on yellow}}{\text{total number of colors}} = \frac{1}{4} \]
3.15 Axioms of Probability
Let S be a random sample space and A be an event within S. Then
(1) $0 \le P(A) \le 1$
(2) $P(S) = 1$
(3) The sum of the probabilities of all simple events must be 1.
(4) If A and B are mutually exclusive events then $P(A \cup B) = P(A) + P(B)$.
(5) If $A_i$ ($i = 1, 2, \ldots, n$) are mutually exclusive events then $P\left(\bigcup_i A_i\right) = \sum_i P(A_i)$.
Theorem 3: If $A \subseteq B$ then $P(A) \le P(B)$.
Addition Theorem
If A and B are any two events then the probability that at least one of them occurs (that is, A or
B occurs) is denoted by $P(A \cup B)$ and given by
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B). \]
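The addition theorem can be checked on a small sample space. The sketch below assumes a single throw of a fair die and two illustrative events A and B that are not taken from the notes; it uses exact fractions so that both sides can be compared without rounding.

```python
from fractions import Fraction

# Sample space for one throw of a fair die (all outcomes equally likely).
S = set(range(1, 7))
A = {2, 4, 6}      # hypothetical event: an even number
B = {4, 5, 6}      # hypothetical event: a number greater than 3

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & S), len(S))

lhs = prob(A | B)                          # P(A or B)
rhs = prob(A) + prob(B) - prob(A & B)      # P(A) + P(B) - P(A and B)
print(lhs, rhs)
```

Without the subtracted term the outcomes 4 and 6, which belong to both events, would be counted twice.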
Note: Simple probability is also called marginal probability as the total number of success
(those who planned to purchase) can be obtained from the appropriate margin of
contingency table.
Example 2. Suppose that in the follow-up study the following additional information was
obtained from the 300 households who actually purchased a big screen TV.
Table 3
Purchased HDTV    Purchased DVD
                  Yes     No
Yes               38      42
No                70      150
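3.20 Partitions and Bayes' Theorem
Suppose the events $A_1, A_2, \ldots, A_n$ form a partition of the sample space S, i.e. they are
mutually exclusive and $S = \bigcup_{i=1}^{n} A_i$. Then any event B can be written as
\[ B = (A_1 \cap B) \cup (A_2 \cap B) \cup \cdots \cup (A_n \cap B) \]
so that
\[ P(B) = \sum_{i=1}^{n} P(A_i \cap B) = \sum_{i=1}^{n} P(A_i)\,P(B/A_i) \]
Bayes' Theorem
\[ P(A_i/B) = \frac{P(A_i \cap B)}{P(B)} = \frac{P(A_i)\,P(B/A_i)}{\sum_{j=1}^{n} P(A_j)\,P(B/A_j)} \]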
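A short numerical sketch of the total probability rule and Bayes' theorem follows. The function name `bayes_posteriors` and the machine/defect figures are hypothetical and chosen only for illustration; they do not come from the notes.

```python
def bayes_posteriors(priors, likelihoods):
    """Return ([P(A_i / B) for each i], P(B)) given P(A_i) and P(B / A_i)."""
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(A_i and B)
    p_b = sum(joint)                                       # total probability P(B)
    return [j / p_b for j in joint], p_b

# Hypothetical numbers: three machines produce 50%, 30% and 20% of the output
# with defect rates of 1%, 2% and 3%; B = "a randomly chosen item is defective".
priors = [0.5, 0.3, 0.2]
likelihoods = [0.01, 0.02, 0.03]

posteriors, p_b = bayes_posteriors(priors, likelihoods)
print(p_b, posteriors)
```

The posteriors always sum to 1 because the denominator is exactly the sum of the numerators over the partition.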
3.21 Independence
An event B is said to be an independent event of an event A if the probability that B occurs
is not influenced by whether event A has or has not occurred.
That is, $P(B) = P(B/A)$.
We know from eq. (1) that
\[ P(B/A) = \frac{P(B \cap A)}{P(A)} \]
If A and B are independent, $P(B/A) = P(B)$.
Thus it is clear that if A and B are two independent events then $P(A \cap B) = P(A)\,P(B)$.
4. Properties of Random Variables
In mathematics, random variables are used in the study of probability. They were developed
to assist in the analysis of games of chance, stochastic events, and the results of scientific
experiments by capturing only the mathematical properties necessary to answer probabilistic
questions. There are two types of random variables; discrete and continuous depending on the
type of measurement of the random variable.
Eg. Consider the tossing of a pair of fair dice. Then the set of possible outcomes is
S = {(1,1), (1,2), ..., (1,6), ..., (6,1), (6,2), ..., (6,6)}
Thus n(S) = 36. Let X be a RV such that X = max(a, b), where (a, b) is the outcome of the
pair of dice. Then the possible values that X can take are {1,2,3,4,5,6} = S(x) (say) and
n(S(x)) = 6. Then,
P(X = 1) = P{(1,1)} = 1/36 = f(1) (say)
P(X = 2) = P{(1,2), (2,2), (2,1)} = 3/36. Thus f(2) = 3/36.
Similarly f(3) = 5/36, f(4) = 7/36, f(5) = 9/36, f(6) = 11/36.
Thus we can form the table given below, which is called the probability distribution of X.
Table 4
xi 1 2 3 4 5 6
f(xi) 1/36 3/36 5/36 7/36 9/36 11/36
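The distribution in Table 4 can be reproduced by enumerating the 36 outcomes. This is a minimal sketch using the Python standard library; the names `outcomes`, `counts` and `f` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (a, b) of a pair of fair dice.
outcomes = list(product(range(1, 7), repeat=2))

# Count how many outcomes give each value of X = max(a, b).
counts = Counter(max(a, b) for a, b in outcomes)
f = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}

for x in f:
    print(x, f"{counts[x]}/36")
```

The counts 1, 3, 5, 7, 9, 11 follow the pattern 2x - 1, since X = x occurs for the pairs (x, x), (a, x) and (x, a) with a < x.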
[Figure: bar chart of P(X = s) against the value of RV X, for X = 1 to 6]
4.5 Distribution of Y
[Figure: bar chart of P(Y = s) against the values of RV Y, for Y = 2 to 12]
[Figure: cumulative probability of Y against the values of RV Y, for Y = 2 to 12]
Note: The probability of a continuous random variable is not defined at specific values. Instead, it is defined
over an interval of values, and is represented by the area under a curve.
Thus if X is a continuous RV with pdf $f(x)$ then $P(a \le X \le b) = \int_a^b f(x)\,dx$.
As in the discrete case, $f(x) \ge 0$ and $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
5. PARAMETERS OF A DISTRIBUTION
In order to compare different distributions, various parameters (statistical indicators) have been
defined. The physical meaning of each indicator is explained in the class.
In probability theory the expected value (or expectation, or mean) of a discrete random
variable is the sum of the probability of each possible outcome of the experiment multiplied
by the outcome value. Thus,
\[ E(X) = \mu = \sum_i x_i f(x_i) \quad \text{if X is a discrete RV} \]
\[ E(X) = \mu = \int_{-\infty}^{\infty} x f(x)\,dx \quad \text{if X is a continuous RV} \]
Eg. (a) Consider the rv X = max(a, b), where (a, b) is the outcome of tossing two fair dice. Then
the pdf of X is given (as shown above) in Table 7.
Table 7 – Pdf of X
xi 1 2 3 4 5 6
f(xi) 1/36 3/36 5/36 7/36 9/36 11/36
Eg. If X is a continuous rv with pdf f(x), where
\[ f(x) = \begin{cases} kx^2(1-x), & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases} \]
then it can be shown that k = 12 and E(X) = 3/5 using the property
$\int_{-\infty}^{\infty} f(x)\,dx = 1$.
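The two results quoted for this example can be obtained as follows. This is a worked sketch of the integration steps only, using the pdf stated in the example.

```latex
\[
\int_0^1 kx^2(1-x)\,dx = k\left(\frac{1}{3}-\frac{1}{4}\right) = \frac{k}{12} = 1
\quad\Rightarrow\quad k = 12,
\]
\[
E(X) = \int_0^1 x \cdot 12x^2(1-x)\,dx = 12\left(\frac{1}{4}-\frac{1}{5}\right)
     = \frac{12}{20} = \frac{3}{5}.
\]
```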
Properties of E(X)
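5.2. Variance of a distribution ($\sigma^2$) – Var(X)
\[ V(X) = \sum_i x_i^2 f(x_i) - \mu^2 = E(X^2) - [E(X)]^2 \quad \text{when X is discrete} \]
\[ V(X) = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 \quad \text{when X is continuous} \]
For X = max(a, b) with the pdf in Table 7:
$E(X^2) = 1^2(1/36) + 2^2(3/36) + 3^2(5/36) + 4^2(7/36) + 5^2(9/36) + 6^2(11/36)$
$= 791/36 = 21.97$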
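The mean and variance of X = max(a, b) can be computed exactly from the pdf in Table 7. The sketch below is illustrative and uses exact fractions; the pmf is written as (2x - 1)/36, which reproduces the tabulated values 1/36, 3/36, ..., 11/36.

```python
from fractions import Fraction

# pmf of X = max(a, b), as tabulated in Table 7.
f = {x: Fraction(2 * x - 1, 36) for x in range(1, 7)}

mean = sum(x * p for x, p in f.items())          # E(X)
mean_sq = sum(x**2 * p for x, p in f.items())    # E(X^2)
var = mean_sq - mean**2                          # V(X) = E(X^2) - [E(X)]^2

print(mean, mean_sq, var, float(var))
```

Working with `Fraction` avoids rounding, so the intermediate value of E(X^2) can be compared directly with a hand calculation.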
Properties of V(X)
V(X+k) = V(X),
Var(aX) = $a^2$V(X)
The standard deviation is defined as the square root of the variance. This indicator has more benefits than the variance
in interpreting results.
Remark: Let Y be a random variable with mean µ and standard deviation σ. Then the
standardized random variable Z is defined as $Z = \frac{Y - \mu}{\sigma}$, so that V(Z) = 1 and E(Z) = 0.
This is a very common transformation in statistics and is very useful in all applications.
More details are discussed later in the class.
5.4. Covariance between X and Y – $\sigma_{xy}$
If X and Y are two RVs then the extent to which the two random variables vary together (co-vary)
is measured by an indicator known as covariance. It is denoted by Cov(X,Y) and given
by
\[ \mathrm{Cov}(X,Y) = E(XY) - \mu_X \mu_Y = \sum_i \sum_j x_i y_j\, h(x_i, y_j) - \mu_X \mu_Y \quad \text{if X and Y are discrete} \]
Note:
Positive covariance: It indicates that higher than mean values of one variable tend to be
paired with higher than mean values of other variable.
Negative covariance: It indicates that higher than mean values of one variable tend to be
paired with lower than mean values of other variable.
Zero covariance: If the two random variables are independent then the covariance will
be zero.
However, zero covariance does not imply that the two variables are independent.
Note :
V(X + Y ) = V(X) + V(Y) + 2Cov(X,Y).
Example:
A pair of fair dice is tossed. Let X = max(a, b) and Y = a + b, where (a, b) is any ordered pair
belonging to S.
Table 8
X \ Y   2     3     4     5     6     7     8     9     10    11    12    Sum
1       1/36  0     0     0     0     0     0     0     0     0     0     1/36
2       0     2/36  1/36  0     0     0     0     0     0     0     0     3/36
3       0     0     2/36  2/36  1/36  0     0     0     0     0     0     5/36
4       0     0     0     2/36  2/36  2/36  1/36  0     0     0     0     7/36
5       0     0     0     0     2/36  2/36  2/36  2/36  1/36  0     0     9/36
6       0     0     0     0     0     2/36  2/36  2/36  2/36  2/36  1/36  11/36
Sum     1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
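The covariance of X = max(a, b) and Y = a + b can be computed directly from the 36 equally likely outcomes, which is equivalent to summing over the joint table. The sketch below is illustrative; the helper `expect` is a hypothetical name.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
p = Fraction(1, len(outcomes))       # each ordered pair has probability 1/36

def expect(g):
    """Expected value of g(a, b) over the 36 equally likely outcomes."""
    return sum(g(a, b) * p for a, b in outcomes)

e_x = expect(lambda a, b: max(a, b))               # E(X)
e_y = expect(lambda a, b: a + b)                   # E(Y)
e_xy = expect(lambda a, b: max(a, b) * (a + b))    # E(XY)
cov = e_xy - e_x * e_y                             # Cov(X, Y) = E(XY) - E(X)E(Y)

print(e_x, e_y, e_xy, cov)
```

The covariance is positive, which agrees with intuition: a larger maximum tends to go with a larger sum.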
Properties of the Covariance
If X, Y, W, and V are real-valued random variables and a, b, c, d are constant ("constant" in
this context means non-random), then the following can be proved easily.
Notes:
(a) Cov (X, a) = 0,
(b) Cov (X, Y) = Cov (Y, X)
(c) Cov (aX, bY) = abCov (X, Y)
(d) Cov (X + a, Y+b) = Cov (X, Y)
(e) Cov (aX+bY, cW+dV) = acCov(X,W) + adCov(X,V) + bcCov(Y,W) + bdCov(Y,V)
Note: Proofs of these properties and their applications are discussed in the class
Limitation
Because the number represented by Cov(X,Y) depends on the units of the data, it is difficult to
compare covariances among different data sets having different scales. As a solution,
another very useful indicator, known as the correlation coefficient, has been introduced.
\[ \rho_{xy} = \mathrm{Corr}(X,Y) = \frac{\sigma_{xy}}{\sigma_x \sigma_y} = \frac{E[(X-\mu_X)(Y-\mu_Y)]}{\sigma_x \sigma_y} \]
\[ \rho_{xy} = \frac{E(XY) - E(X)E(Y)}{\sqrt{E(X^2) - [E(X)]^2}\,\sqrt{E(Y^2) - [E(Y)]^2}} \]
Skewness can come in the form of "negative skewness" or "positive skewness", depending on
whether data points are skewed to the left (negative skew) or to the right (positive skew) of
the data average.
1. negative skew: The left tail is longer; the mass of the distribution is concentrated on
the right of the figure. It has relatively few low values. The distribution is said to be
left-skewed.
2. positive skew: The right tail is longer; the mass of the distribution is concentrated on
the left of the figure. It has relatively few high values. The distribution is said to be
right-skewed.
Positively (right) skewed: mode < median < mean
Negatively (left) skewed: mean < median < mode
Figure 4 Negative skewness and positive skewness
5.7 Kurtosis – K.
It is a measure of the tallness or flatness ("peakedness") of the pdf. It is a measure of
whether the data are peaked or flat relative to a normal distribution.
\[ K = E[(X - \mu_X)^4] / \sigma_X^4 \]
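Variance: $\hat{\sigma}^2 = s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$
Standard Deviation: $s$
Covariance: $\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n-1}$
Correlation coefficient: $r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\, s_X s_Y}$
Skewness: $Sk = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{(n-1)\, s^3}$
Kurtosis: $K = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{(n-1)\, s^4}$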
Note: More details with applications are discussed in the class.
6. DESCRIPTIVE STATISTICS (Statistical Indicators)
One important use of descriptive statistics is to summarize a collection of data in a clear and
understandable way. Collected data may be in either ungrouped form or grouped form. In statistics there
are various types of descriptive statistics. Such statistics are known as “statistical indicators”. The
statistical indicators play an important role in statistical data analysis.
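Table 10
Parameter          Estimator
Arithmetic mean    Ungrouped: $\bar{x} = \sum_{i=1}^{n} x_i / n$
                   Grouped: $\bar{x} = \sum_{i=1}^{n} f_i x_i / \sum_{i=1}^{n} f_i$
Weighted mean      Ungrouped: $\bar{x} = \sum_{i=1}^{n} w_i x_i / \sum_{i=1}^{n} w_i = \sum_{i=1}^{n} \alpha_i x_i$ such that $\sum_{i=1}^{n} \alpha_i = 1$
                   Grouped: $\bar{x} = \sum_{i=1}^{n} f_i w_i x_i / \sum_{i=1}^{n} f_i w_i$
Median             Ungrouped: the $\frac{n+1}{2}$th ranked observation
                   Grouped: $L_1 + \frac{N/2 - (\sum f)_1}{f_{median}} \times c$
                   $L_1$ = lower class boundary of the median class, $(\sum f)_1$ = sum of the frequencies of
                   the classes below the median class, $f_{median}$ = frequency of the median class,
                   c = width of the median class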
6.2 Indicators to Measure Dispersion
The indicators of dispersion are important for describing the spread of the data. Some of such indicators
are as follows:
(c) Mean Absolute Deviation: $MAD = \frac{\sum_{i=1}^{n} |x_i - \bar{x}|}{n}$
The pth percentile is a value such that roughly p percent of the data are smaller and (100-p) percent of
the data are larger. A percentile is a measure of relative standing against all other observations in the
sample.
Location of the desired percentile: $L_p = (n+1)\frac{p}{100}$
6.4 Indicator to measure relative dispersion
Coefficient of Variation (CV) is a relative measure that indicates the magnitude of variation relative to the
magnitude of the mean.
\[ \text{Coefficient of variation (CV)} = \frac{sd}{\bar{x}} \times 100\% \]
Eg. A manufacturer of television tubes has two types of tubes, namely A and B. The mean lifetimes
of tubes A and B are 1495 hrs and 1875 hrs, and the SDs are 280 hrs and 310 hrs
respectively.
The CV of A = $\frac{280}{1495} \times 100$ = 18.7% and the CV of B = $\frac{310}{1875} \times 100$ = 16.5%
Correlation Coefficient is one of the most common and most useful statistical indicators to
describe the association (degree of linear relationship) between two variables.
\[ r = \frac{\sum x_i y_i - n\,\bar{x}\,\bar{y}}{\sqrt{\left(\sum x_i^2 - n\bar{x}^2\right)\left(\sum y_i^2 - n\bar{y}^2\right)}} \]
Figure 5
Table 11
number x y
1 20 35
2 25 45
3 30 50
4 35 65
5 40 60
6 45 65
7 50 70
8 60 65
[Scatterplot of y vs x for the data in Table 11]
Figure 6
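The correlation coefficient for the data in Table 11 can be computed with the sum-of-products form of the formula. This is a minimal sketch; the intermediate names `s_xy`, `s_xx` and `s_yy` are illustrative.

```python
from math import sqrt

# Data from Table 11.
x = [20, 25, 30, 35, 40, 45, 50, 60]
y = [35, 45, 50, 65, 60, 65, 70, 65]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xy = sum(a * b for a, b in zip(x, y)) - n * x_bar * y_bar   # sum(xy) - n*xbar*ybar
s_xx = sum(a * a for a in x) - n * x_bar**2                   # sum(x^2) - n*xbar^2
s_yy = sum(b * b for b in y) - n * y_bar**2                   # sum(y^2) - n*ybar^2

r = s_xy / sqrt(s_xx * s_yy)
print(round(r, 3))
```

The value lies between 0.8 and 1, consistent with the upward trend seen in the scatterplot.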
Note 1:
Positive correlation: If x and y have a strong positive linear correlation, r is close to +1. An r
value of exactly +1 indicates a perfect positive fit. That is, the relationship between the x and y
variables is such that y increases as x increases.
Negative correlation: If x and y have a strong negative linear correlation, r is close to -1. An r
value of exactly -1 indicates a perfect negative fit. Negative values
indicate a relationship between x and y such that as values for x increase, values
for y decrease.
Note 2: A correlation greater than 0.8 is generally described as strong, whereas a correlation
less than 0.5 is generally described as weak. These values can vary based upon the "type" of
data being examined and the size of the sample. These criteria are very subjective.
7. Chebyshev's Theorem and Empirical Rule
Ex.
a) According to Chebyshev's theorem, at least what % of any set of observations will be
within 1.8 standard deviations of the mean?
b) The mean income of a group of sample observations is Rs. 500 and the sd = Rs. 40.
According to Chebyshev's theorem, at least what % of the incomes will lie between Rs.
400 and Rs. 600?
c) If a group of data has a mean of 54 and a standard deviation of 78.5, what is the
interval that should contain at least 93.8% of the data?
d) Given a data set comprised of 4117 measurements that is bell-shaped with a mean of 862.
If 99.7% of the data lies between 580 and 1144 then what is the standard deviation?
e) Given a group of data with mean 40 and standard deviation 15, at least what percent of
the data will fall between 10 and 70?
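Several of these exercises reduce to evaluating the Chebyshev bound. The statement of the theorem is not reproduced in this handout, so the sketch below assumes its standard form: at least 1 - 1/k² of any set of observations lies within k standard deviations of the mean, for k > 1. The function name is hypothetical.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of observations within k standard deviations (k > 1)."""
    if k <= 1:
        raise ValueError("the bound is informative only for k > 1")
    return 1 - 1 / k**2

print(chebyshev_lower_bound(1.8))                 # part (a): k = 1.8
print(chebyshev_lower_bound((600 - 500) / 40))    # part (b): k = 2.5
print(chebyshev_lower_bound((70 - 40) / 15))      # part (e): k = 2
```

In parts (b) and (e) the value of k is obtained by dividing the half-width of the interval by the standard deviation.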
8. Discrete (Binomial - B, Poisson - P) and
Continuous (Normal - N, Exponential - E) Distributions
Examples.
An engineer is interested in the number of broken-down buses in a sample lot of 100.
A doctor studies the number of survivors vs deaths after treatment in a sample of 200
patients.
A teacher may be interested in how many heads occur when 60 coins are thrown.
Ex. 1. A family has 6 children. Find the probability that there are (a) 3 boys and 3 girls and
(b) fewer boys than girls. The probability of any child being a boy is ½.
(b) P(fewer boys than girls) = P(no boys) + P(1 boy) + P(2 boys)
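Both parts of Ex. 1 can be evaluated with the binomial probability function. The sketch below assumes the standard form P(Y = k) = C(n, k) p^k (1 - p)^(n - k), which is the form used in the examples of this section; `binom_pmf` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """P(Y = k) for Y ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 6, Fraction(1, 2)     # Y = number of boys among 6 children

p_three_boys = binom_pmf(3, n, p)                          # part (a)
p_fewer_boys = sum(binom_pmf(k, n, p) for k in range(3))   # part (b): 0, 1 or 2 boys

print(p_three_boys, p_fewer_boys)
```

Passing `p` as a `Fraction` keeps the answers exact; a float such as 0.5 works equally well.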
Ex. 2. A multiple choice test has four possible answers to each of 16 questions. A student
guesses the answer to each question, i.e., the probability of getting a correct answer on any
given question is 0.25. What is the probability that at least 14 answers are correct?
\[ P(\text{at least 14 correct}) = {}^{16}C_{14}(1/4)^{14}(3/4)^{2} + {}^{16}C_{15}(1/4)^{15}(3/4)^{1} + {}^{16}C_{16}(1/4)^{16}(3/4)^{0} \]
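Expectation – E(Y)
\[
\begin{aligned}
E(Y) &= \sum_{k=0}^{n} k\,P(Y=k) \\
&= \sum_{k=0}^{n} k\,\frac{n!}{k!\,(n-k)!}\,p^{k}(1-p)^{n-k} \\
&= \sum_{k=1}^{n} k\,\frac{n\,(n-1)!}{k\,(k-1)!\,(n-k)!}\,p^{k}(1-p)^{n-k} \qquad (\text{the } k = 0 \text{ term is zero}) \\
&= \sum_{k=1}^{n} k\,\frac{n\,(n-1)!}{k\,(k-1)!\,(n-k)!}\,(p \cdot p^{k-1})(1-p)^{n-k} \\
&= np \sum_{k=1}^{n} \frac{(n-1)!}{(k-1)!\,(n-k)!}\,p^{k-1}(1-p)^{n-k} \\
&= np \sum_{s=0}^{m} \frac{m!}{s!\,(m-s)!}\,p^{s}(1-p)^{m-s} \qquad (m = n-1 \text{ and } s = k-1) \\
&= np \cdot 1 = np
\end{aligned}
\]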
Variance – V(Y)
\[
\begin{aligned}
E(Y^2) &= \sum_{k=0}^{n} k^2\,\frac{n!}{k!\,(n-k)!}\,p^{k}(1-p)^{n-k} \\
&= np \sum_{k=1}^{n} k^2\,\frac{(n-1)!}{k\,(k-1)!\,(n-k)!}\,p^{k-1}(1-p)^{n-k} \\
&= np \sum_{s=0}^{n-1} (s+1)\,\frac{(n-1)!}{s!\,(n-s-1)!}\,p^{s}(1-p)^{n-s-1} \qquad (\text{let } s = k-1) \\
&= np \sum_{s=0}^{n-1} s\,\frac{(n-1)!}{s!\,(n-s-1)!}\,p^{s}(1-p)^{n-s-1} + np \sum_{s=0}^{n-1} \frac{(n-1)!}{s!\,(n-s-1)!}\,p^{s}(1-p)^{n-s-1} \\
&= np\,[(n-1)p + 1] = np(np + q)
\end{aligned}
\]
[The first sum is the mean of B(n-1, p) and the second sum is the sum of the probabilities of B(n-1, p).]
\[ V(Y) = E(Y^2) - [E(Y)]^2 = np(np + q) - n^2p^2 = npq \]
Note:
If X ~ B(n, p) and Y ~ B(m, p) and X and Y are independent then X + Y also has a binomial
distribution with parameters (n+m, p). (Proof is not required)
8.2. Normal Distribution
The normal distribution is the most important family of continuous probability distributions
in statistics which is widely applicable in all fields. The distribution is defined by two
parameters, namely mean ("average", μ) and variance ("variability", σ²). The normal
distribution is also known as the Gaussian distribution and is denoted by Y ~ N(µ, σ²).
The PDF of the normal distribution is given by
\[ f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right] \qquad [1] \]
In general all normal random variables are converted to the standard normal. If X ~ N(μ, σ²),
then $Z = \frac{X-\mu}{\sigma}$ has a standard normal distribution.
It can be shown that E(Z) = 0 and V(Z) = 1. Thus it is written as Z ~ N(0,1).
One of the useful properties of the std. normal distribution is shown below.
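Proof:
\[ E(Y) = \int_{-\infty}^{\infty} y \cdot \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right] dy \]
Let $t = \frac{y-\mu}{\sigma}$; then $dy = \sigma\,dt$.
\[
\begin{aligned}
E(Y) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (\sigma t + \mu)\, e^{-t^2/2}\, dt \\
&= \frac{\sigma}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t\, e^{-t^2/2}\, dt + \frac{\mu}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-t^2/2}\, dt \\
&= 0 + \mu = \mu
\end{aligned}
\]
[It can be easily shown that $\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-t^2/2}\, dt = 1$ and $\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t\, e^{-t^2/2}\, dt = 0$.]
\[
\begin{aligned}
E(Y^2) &= \int_{-\infty}^{\infty} y^2 \cdot \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[-\frac{(y-\mu)^2}{2\sigma^2}\right] dy \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} (\sigma t + \mu)^2\, e^{-t^2/2}\, dt \\
&= \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t^2 e^{-t^2/2}\, dt + \frac{2\mu\sigma}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t\, e^{-t^2/2}\, dt + \frac{\mu^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-t^2/2}\, dt \\
&= \frac{\sigma^2}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t^2 e^{-t^2/2}\, dt + \mu^2
\end{aligned}
\]
Since $\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} t^2 e^{-t^2/2}\, dt = 1$, it follows that $E(Y^2) = \sigma^2 + \mu^2$ and hence $V(Y) = E(Y^2) - [E(Y)]^2 = \sigma^2$.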
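(1) (a)
\[
\begin{aligned}
P(Z \le 1.2) &= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{1.2} e^{-x^2/2}\, dx \\
&= \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{0} e^{-x^2/2}\, dx + \frac{1}{\sqrt{2\pi}} \int_{0}^{1.2} e^{-x^2/2}\, dx \\
&= 0.5000 + 0.3849 = 0.8849
\end{aligned}
\]
\[ P(Z > 1.13) = 1 - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{1.13} e^{-x^2/2}\, dx = 1 - \left[ 0.5000 + \frac{1}{\sqrt{2\pi}} \int_{0}^{1.13} e^{-x^2/2}\, dx \right] \]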
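The standard normal areas used in these examples are normally read from tables. As a cross-check, the sketch below evaluates the standard normal CDF through the error function in Python's `math` module; `phi` is a hypothetical helper name.

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, P(Z <= z), via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(phi(1.2), 4))            # P(Z <= 1.2)
print(round(phi(1.2) - 0.5, 4))      # area between 0 and 1.2
```

The same helper gives any interval probability as a difference, P(a ≤ Z ≤ b) = phi(b) - phi(a).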
(2) Let T be the temperature (°F) in May in a given year, distributed normally with mean
68 and SD 6. Find the probability that the temperature is between 70 and 80.
T ~ N(68, 6²)
\[ P(70 \le T \le 80) = P\left(\frac{70-68}{6} \le \frac{T-68}{6} \le \frac{80-68}{6}\right) = P(0.33 \le Z \le 2.0) \]
(3) The radius of the nails in a sample of 800 is normally distributed with mean 66 mm and
variance 25. Find the number of nails with radius between 65 and 70 mm.
\[ P(65 \le R \le 70) = P\left(\frac{65-66}{5} \le Z \le \frac{70-66}{5}\right) = P(-0.20 \le Z \le 0.80) \]
If n is large enough (n > 30) the skewness of the binomial distribution is not too great. In such
a situation, if Y ~ B(n, p) then Y is approximately N(np, npq).
Note: To use the normal approximation to calculate a binomial probability, we apply the
continuity correction in converting the discrete variable to a continuous one.
Ex.
1. A fair die is tossed 180 times. Find the probability that the face 6 will appear between 29
and 32 inclusive.
Let Y = number of times a six appears
Thus Y ~ B(180, 1/6)
Using the normal approximation to the binomial distribution,
Y ~ N(np, npq) where n = 180, p = 1/6 and q = 1 - p = 5/6
E(Y) = np = 180 × 1/6 = 30, V(Y) = npq = 180 × 1/6 × 5/6 = 25
SD(Y) = 5
\[ P(29 \le Y \le 32) = P\left(\frac{29-30}{5} \le \frac{Y-30}{5} \le \frac{32-30}{5}\right) = P(-0.2 \le Z \le 0.4) = 0.1554 + 0.0793 = 0.2347 \]
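The effect of the continuity correction on this example can be explored numerically. The sketch below is an illustration, not part of the original solution: it compares the exact binomial probability with the normal approximation using the limits 29 and 32 directly, and with the continuity-corrected limits 28.5 and 32.5. The helper `phi` is a hypothetical name for the standard normal CDF.

```python
from math import comb, erf, sqrt

def phi(z):
    """Standard normal CDF."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n, p = 180, 1 / 6
mu, sd = n * p, sqrt(n * p * (1 - p))      # approximately 30 and 5

# Exact binomial probability P(29 <= Y <= 32).
exact = sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(29, 33))

# Normal approximation without and with the continuity correction.
plain = phi((32 - mu) / sd) - phi((29 - mu) / sd)
corrected = phi((32.5 - mu) / sd) - phi((28.5 - mu) / sd)

print(round(exact, 4), round(plain, 4), round(corrected, 4))
```

Widening the interval by 0.5 on each side accounts for the probability mass that the discrete values 29 and 32 carry in the binomial distribution.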
3. The grades on a short quiz in statistics were 0, 1, 2, ..., 10 points, depending on the
number answered correctly out of 10 questions. The mean grade was 6.7 and the sd was 1.2. Assuming the grades to
be normally distributed, determine (a) the % of students scoring 6 points, (b) the
maximum grade of the lowest 10% and (c) the minimum grade of the highest 10% of the
class.
8.3. Poisson Distribution
The Poisson distribution is also a discrete distribution which is used to model the number of
events occurring within a given time interval.
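\[
\begin{aligned}
E(Y^2) &= \sum_{k=0}^{\infty} k^2\,\frac{\lambda^{k} e^{-\lambda}}{k!}
= \lambda \sum_{k=1}^{\infty} k\,\frac{\lambda^{k-1} e^{-\lambda}}{(k-1)!}
= \lambda \sum_{s=0}^{\infty} (s+1)\,\frac{\lambda^{s} e^{-\lambda}}{s!} \qquad (k-1 = s) \\
&= \lambda \sum_{s=0}^{\infty} s\,\frac{\lambda^{s} e^{-\lambda}}{s!} + \lambda \sum_{s=0}^{\infty} \frac{\lambda^{s} e^{-\lambda}}{s!}
= \lambda\,E(Y) + \lambda \cdot 1 = \lambda^2 + \lambda
\end{aligned}
\]
Note:
Since there is a relationship between the Binomial and the Normal, there is a relationship between the Poisson
and the Normal. It is given by
\[ P(\lambda) \approx N(\mu, \sigma^2) = N(\lambda, \lambda) \]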
Eg. 1. If the probability that an individual suffers a bad reaction from an injection of a given type is
0.001, determine the probability that out of 2000 individuals (a) exactly 3 and (b) more than 2
will suffer a bad reaction.
Let Y = number of individuals who suffer a bad reaction.
As n is large and p is small it can be assumed that Y ~ P(λ), where λ = np = 2000 × 0.001 = 2.
\[ P(Y=3) = \frac{2^3 e^{-2}}{3!} = 0.1804 \]
Assuming Y ~ B(2000, 0.001), $P(Y=3) = {}^{2000}C_{3}\,(0.001)^3 (1-0.001)^{1997} = 0.1805$
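The closeness of the Poisson and binomial answers can be verified directly. This is a minimal sketch using the standard library; `poisson_pmf` is a hypothetical helper name, and part (b) is evaluated with the Poisson model only.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ P(lam)."""
    return lam**k * exp(-lam) / factorial(k)

n, p = 2000, 0.001
lam = n * p                                   # lambda = np = 2

p3_poisson = poisson_pmf(3, lam)                                 # part (a), Poisson
p3_binomial = comb(n, 3) * p**3 * (1 - p)**(n - 3)               # part (a), binomial
p_more_than_2 = 1 - sum(poisson_pmf(k, lam) for k in range(3))   # part (b)

print(round(p3_poisson, 4), round(p3_binomial, 4), round(p_more_than_2, 4))
```

Part (b) uses the complement: P(Y > 2) = 1 - [P(Y = 0) + P(Y = 1) + P(Y = 2)].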
H/E: Prove that $E(X) = \frac{1}{\lambda}$ and $V(X) = \frac{1}{\lambda^2}$.
Ex. Suppose that customers arrive at a bank's ATM at the rate of 20 per hour. If a customer has
just arrived, what is the probability that the next customer arrives within 6 minutes?
X ~ E(λ) = E(20)
$P(X < 0.1) = 1 - e^{-20(0.1)} = 0.8647$
9. STATISTICAL INFERENCES
It is not possible to find parameters (mean, variance etc) of a population due to obvious
reasons. Thus we have to compute a value (or range) that represents a ``good'' guess for the
true values of the parameter to make conclusions (inferences) on the population based on
sample. There are two types of estimators namely
(a) Point estimator and
(b) interval estimator
For example:
We want to know the average salary of chemical engineering graduates, so we select 25
people at random. Their mean annual income is 60,000/=. This is a point estimate. Using an
interval estimate we say that the mean annual income is between 40,000/= and 85,000/= with
95% confidence.
\[ \hat{\mu} = \text{sample mean } \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \quad \text{and} \quad \hat{\sigma}^2 = \text{sample variance } s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} \]
We want $\mu - E(\bar{x}) \to 0$.
Note: The difference is known as the 'bias' of the parameter estimated based on the sample.
Thus to obtain a more precise estimator for the population parameter, the bias should be zero.
That is, the estimator should be unbiased.
Proof:
\[ E(\bar{x}) = E\left(\frac{\sum_{i=1}^{n} x_i}{n}\right) = \frac{1}{n} \sum_{i=1}^{n} E(x_i) = \frac{1}{n} \sum_{i=1}^{n} \mu = \mu \]
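Note 2: The sample variance $s^2$ (with divisor n) is not an unbiased estimator for the population variance $\sigma^2$.
Proof:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n} = \frac{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2}{n} \]
\[
\begin{aligned}
E(s^2) &= \frac{\sum_{i=1}^{n} \left\{ V(x_i) + [E(x_i)]^2 \right\}}{n} - \frac{n \left\{ V(\bar{x}) + [E(\bar{x})]^2 \right\}}{n} \\
&= \frac{n(\sigma^2 + \mu^2)}{n} - \frac{n(\sigma^2/n + \mu^2)}{n} = \sigma^2 - \frac{\sigma^2}{n} = \frac{(n-1)\sigma^2}{n}
\end{aligned}
\]
[since $V(\bar{x}) = V\left(\frac{\sum x_i}{n}\right) = \frac{1}{n^2} \sum V(x_i) = \frac{\sigma^2}{n}$]. Hence $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n}$ is not an unbiased
estimator for $\sigma^2$.
Thus $E\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = (n-1)\sigma^2$.
Thus $\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$ is an unbiased estimator for the population variance $\sigma^2$.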
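The result can be illustrated by exact enumeration on a tiny population. The sketch below assumes, purely for illustration, that the population is the outcome of one fair die and that samples of size n = 2 are drawn with replacement; it averages both versions of the estimator over all 36 possible samples.

```python
from fractions import Fraction
from itertools import product

population = range(1, 7)          # one fair die, assumed for illustration
n = 2                             # sample size (sampling with replacement)

mu = Fraction(sum(population), 6)
sigma2 = sum((x - mu) ** 2 for x in population) / 6    # population variance

samples = list(product(population, repeat=n))          # all equally likely samples

def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    x_bar = Fraction(sum(sample), len(sample))
    return sum((x - x_bar) ** 2 for x in sample)

# Expected value of each estimator, averaging over every possible sample.
mean_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)
mean_div_n1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)

print(sigma2, mean_div_n, mean_div_n1)
```

The divisor-n estimator averages to (n-1)/n times the population variance, while the divisor-(n-1) estimator averages to the population variance itself.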
9.4 Central Limit Theorem (without proof)
For large n (n > 30) the distribution of the sample mean is approximately normal with mean μ and
variance $\sigma^2/n$ (irrespective of the population distribution, provided its mean and variance are finite). Thus, the Central
Limit Theorem is the foundation for many statistical procedures. The distribution of an
average will tend to be normal as the sample size increases, regardless of the distribution
from which the average is taken.
Case 1: σ is known
To explain how confidence intervals are constructed, we are going to work backwards and
begin by assuming characteristics of the population.
Let X ~ N(μ, σ²); then $\bar{X} \sim N(\mu, \sigma^2/n)$ and
\[ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0,1) \]
Based on the standard normal assumption we know that an actual sample statistic lies in the
intervals μ ± σ, μ ± 2σ and μ ± 3σ (with μ = 0 and σ² = 1) with about 68%, 95% and 99.7%
confidence respectively. We use this concept to compute a CI for the mean instead of a point estimate, as
P(-1.96 ≤ Z ≤ +1.96) = 95.0% (as shown below)
Figure 7
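\[ P\left(-1.96 \le \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le +1.96\right) = 95\% \;\Rightarrow\; P\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 95\% \]
Thus the 95% CI for the mean is given by $\bar{X} \pm 1.96\,\frac{\sigma}{\sqrt{n}}$.
So the 99% CI for the mean is given by $\bar{X} \pm 2.58\,\frac{\sigma}{\sqrt{n}}$ and the 90% CI for the mean is given by $\bar{X} \pm 1.65\,\frac{\sigma}{\sqrt{n}}$.
Thus the 100(1-α)% CI for the mean (when σ is known) is given by $\bar{X} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$.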
Example 1: Suppose that we found that the mean mark (out of 20) of 50 students in the mid-term
test is 12, with a standard deviation of 6. What can we conclude about the average
marks of students with a 95% confidence level?
$\bar{x} \pm 1.96\,\frac{\sigma}{\sqrt{n}} = 12 \pm 1.96 \times \frac{6}{\sqrt{50}} = 12 \pm 1.66$ = [10.34, 13.66] ≈ [10, 14]
Thus we are 95% confident that the mean mark lies between 10.34 and 13.66.
We can also say that 1.66 is the margin of error.
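The interval in this example can be reproduced with a few lines of code. This is a minimal sketch of the known-σ formula; `z_interval` is a hypothetical helper name and the critical value 1.96 is passed in rather than computed.

```python
from math import sqrt

def z_interval(x_bar, sigma, n, z=1.96):
    """CI for the mean with known sigma: x_bar ± z * sigma / sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin, margin

lower, upper, margin = z_interval(12, 6, 50)
print(round(lower, 2), round(upper, 2), round(margin, 2))
```

Passing z = 2.58 or z = 1.65 gives the 99% and 90% intervals in the same way.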
Example 1. The blood cholesterol levels of a population of teachers have mean 202 and SD
14. If a sample of 30 teachers is selected approximate the probability that the sample mean of
their blood cholesterol level will lie between 198 and 206. Repeat it for sample size of 64
(Class Exercise)
Case 2: Confidence Interval when population variance is not known
When σ is unknown (i.e. estimated by the sample standard deviation s), the 100(1-α)% CI for the
mean is given by $\bar{X} \pm Z_{\alpha/2}\,\frac{s}{\sqrt{n}}$.
Case 3: Confidence Interval when population variance is not known and n < 30
When σ is unknown (i.e. estimated from the sample) and the sample size is small (< 30), the
100(1-α)% CI for the mean is given by $\bar{X} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}}$.
Note: In this case it is assumed that $\frac{\bar{x} - \mu}{s/\sqrt{n}} \sim t_{n-1}$.
Eg. Given the following GPAs for 6 students: 2.80, 3.20, 3.75, 3.10, 2.95, 3.40. Calculate a
95% confidence interval for the population mean GPA.
Ans: $3.2 \pm 2.57 \times \frac{0.339}{\sqrt{6}}$ = [2.84, 3.56] (based on Case 3)
If we assume normality the CI = $3.2 \pm 1.96 \times \frac{0.339}{\sqrt{6}}$ = [2.93, 3.47]
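The GPA example can be checked with Python's `statistics` module. In this sketch the critical value t(0.025, 5) = 2.571 is hard-coded from a t table, since the standard library does not provide t quantiles; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

gpa = [2.80, 3.20, 3.75, 3.10, 2.95, 3.40]
n = len(gpa)
x_bar = mean(gpa)
s = stdev(gpa)                    # sample SD with divisor n - 1

t_crit = 2.571                    # t(0.025, 5), read from a t table
z_crit = 1.96                     # standard normal critical value

t_ci = (x_bar - t_crit * s / sqrt(n), x_bar + t_crit * s / sqrt(n))
z_ci = (x_bar - z_crit * s / sqrt(n), x_bar + z_crit * s / sqrt(n))

print([round(v, 2) for v in t_ci], [round(v, 2) for v in z_ci])
```

The t-based interval is wider than the z-based one, reflecting the extra uncertainty from estimating σ with only 6 observations.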
10. POINT ESTIMATOR AND CONFIDENCE INTERVAL FOR PROPORTION
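\[ E\left(\frac{x}{n}\right) = \frac{1}{n} E(x) = \frac{np}{n} = p \]
Thus $\hat{p}$ is an unbiased estimator for p.
\[ V\left(\frac{x}{n}\right) = \frac{1}{n^2} V(x) = \frac{npq}{n^2} = \frac{pq}{n} \]
CI for p:
\[ \hat{p} \pm Z_{\alpha/2}\, SD(\hat{p}) = \hat{p} \pm Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]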
Example: Out of a random sample of 100 boxes coming from a particular machine, 82 were
non-defective. Construct the 99% CI estimate of the proportion of non-defectives.
Estimator for the proportion of non-defectives = $\hat{p} = \frac{82}{100}$
Required CI = $0.82 \pm 2.576 \sqrt{\frac{0.82(1-0.82)}{100}}$ = [0.721, 0.919], i.e. 72.1% to 91.9%
Example. The Ceylon Daily News reported that in a poll in Jaffna 46% of the population was in
favour of the present paddy prices, with a margin of error of 3%. How many people were
questioned?
Margin of error = $Z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$
Thus, at the 95% confidence level, $1.96^2 \times \frac{0.46(1-0.46)}{n} = 0.03^2 \Rightarrow n \approx 1060$
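The two proportion examples above can be reproduced numerically. This is a minimal sketch; the function names `proportion_interval` and `sample_size` are hypothetical, and the critical values 2.576 and 1.96 are passed in as given in the examples.

```python
from math import sqrt

def proportion_interval(p_hat, n, z):
    """CI for a proportion: p_hat ± z * sqrt(p_hat * (1 - p_hat) / n)."""
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

def sample_size(p_hat, margin, z):
    """Sample size n that gives the stated margin of error at critical value z."""
    return z**2 * p_hat * (1 - p_hat) / margin**2

print(proportion_interval(0.82, 100, 2.576))   # 99% CI for the non-defective boxes
print(sample_size(0.46, 0.03, 1.96))           # poll with a 3% margin of error
```

The sample size formula is simply the margin of error expression solved for n.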
Example: A sample poll of 100 voters chosen at random from all voters in a given district
indicated that 55% of them were in favor of a particular candidate. Find the (a) 95% and (b)
99% confidence limits for the proportion of all the voters in favor of this candidate. How
large a sample of voters should we take in order to be 99% confident that the candidate will be
elected?
Home Exercises (Tutorial) – Applying Concepts
1. Using the company records of last 500 working days the Manager of the company has
summarized the number of cars sold per day as follow.
Number of cars sold/day Frequency
0 40
1 100
2 142
3 66
4 36
5 30
6 26
7 20
8 16
9 14
10 8
11 2
(a) Construct the pdf and cdf of the number of cars sold (say, Y).
(b) Compute the expected value of Y and SD
(c) Find the number of cars sold per day that is not exceeded on at least 50% of the days (the median).
3. The following data are the estimated market value (in Rs. 100,000) of 50 companies.
26.8 8.6 6.5 30.6 15.4 18.0 7.6 21.5
11.0 10.2 28.3 15.5 31.4 23.4 4.3 20.2
33.5 7.9 11.2 1.0 11.7 18.5 6.8 22.3
12.9 29.8 1.3 14.1 29.7 18.7 6.7 31.4
30.4 20.6 5.2 37.8 13.4 18.3 27.1 32.7
6.1 0.9 9.6 35.0 17.1 1.9 1.2 16.6
31.1 16.1
(a) Determine the mean, standard deviation and the median of the market values and
interpret.
(b) Using the empirical rule about 95% of the values would occur between what values.
(c) Determine the coefficient of variation and interpret.
(d) Estimate Q1 and Q3 values and interpret.
(e) Draw a Box plot and write brief report of the variability of data.
4. Consider the following joint distribution of X and Y.
X \ Y    -2     -1     4      3      Total
1        0.1    0.2    0.0    0.3    0.6
2        0.2    0.1    0.1    0.0    0.4
Total    0.3    0.3    0.1    0.3
3. The portfolio expected return and portfolio risk of two asset investments X and Y is given