Stat 4-6 Chapter
CHAPTER 4
4. SIMPLE LINEAR REGRESSION AND CORRELATION
Linear regression and correlation study and measure the linear relationship between two or more variables. When only two variables are involved, the analysis is referred to as simple correlation and simple linear regression analysis; when there are more than two variables, the terms multiple regression and partial correlation are used.
Correlation Analysis: deals with measuring the closeness of the relationship that is described by the regression equation.
We say there is correlation if the two series of items vary together directly or inversely.
The presence of correlation between two variables may be due to three reasons:
Lecture notes on Introduction to Statistics IX: simple linear regression and
correlation
1. One variable being the cause of the other. The cause is called the “subject” or “independent” variable, while the effect is called the “dependent” variable.
2. Both variables being the result of a common cause. That is, the correlation that exists between two variables is due to their being related to some third force.
Example:
Let X1 = ESLCE result
Y1 = rate of surviving in the University
Y2 = rate of getting a scholarship.
Both X1 & Y1 and X1 & Y2 have high positive correlation. Likewise, Y1 & Y2 have positive correlation, but they are not directly related; they are related to each other via X1.
3. Chance. The correlation arises by coincidence, with no real relationship between the variables.
Examples:
Price of teff in Addis Ababa and grade of students in USA.
Weight of individuals in Ethiopia and income of individuals in Kenya.
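The simple correlation coefficient is
$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \, \sum (Y_i - \bar{Y})^2}}$$
and the short cut formula is
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2]\,[n\sum Y^2 - (\sum Y)^2]}}$$
or, equivalently,
$$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{[\sum X^2 - n\bar{X}^2]\,[\sum Y^2 - n\bar{Y}^2]}}$$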
Remark: r always lies between -1 and 1 inclusive, and it is symmetric in X and Y.
Interpretation of r
1. Perfect positive linear relationship (if r = 1)
2. Some positive linear relationship (if r is between 0 and 1)
3. No linear relationship (if r = 0)
4. Some negative linear relationship (if r is between -1 and 0)
5. Perfect negative linear relationship (if r = -1)
1. Calculate the simple correlation between mid semester and final exam scores of 10 students (both out of 50).
$$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{[\sum X^2 - n\bar{X}^2]\,[\sum Y^2 - n\bar{Y}^2]}} = \frac{10331 - 10(31.2)(32.9)}{\sqrt{(9920 - 10(973.4))\,(11003 - 10(1082.4))}} = \frac{66.2}{182.5} = 0.363$$
This means mid semester exam and final exam scores have a weak positive correlation.
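The arithmetic above can be checked with a short Python sketch. The raw scores are not reproduced in these notes, so the function below works from the summary values quoted in the solution (n = 10, ΣXY = 10331, ΣX² = 9920, ΣY² = 11003, means 31.2 and 32.9). The function name is an illustrative choice, not something defined in the notes.

```python
import math

def pearson_r_from_sums(n, sum_xy, sum_x2, sum_y2, mean_x, mean_y):
    """Short cut formula: r = (SXY - n*Xbar*Ybar) / sqrt([SX2 - n*Xbar^2][SY2 - n*Ybar^2])."""
    numerator = sum_xy - n * mean_x * mean_y
    denominator = math.sqrt(
        (sum_x2 - n * mean_x ** 2) * (sum_y2 - n * mean_y ** 2)
    )
    return numerator / denominator

# Summary values taken from the worked example above.
r = pearson_r_from_sums(
    n=10, sum_xy=10331, sum_x2=9920, sum_y2=11003, mean_x=31.2, mean_y=32.9
)
print(round(r, 3))
```

Small differences in the last decimal place can arise because the notes round the squared means before subtracting.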
Exercise The following data were collected from a certain household on the monthly
income (X) and consumption (Y) for the past 10 months. Compute the simple correlation
coefficient.
X: 650 654 720 456 536 853 735 650 536 666
Y: 450 523 235 398 500 632 500 635 450 360
The above formula and procedure are applicable only to quantitative data. When we have qualitative data such as efficiency, honesty, or intelligence, we calculate what is called Spearman’s rank correlation coefficient as follows:
Steps
i. Rank the different items in X and Y.
ii. Find the difference of the ranks in each pair; denote it by $D_i$.
iii. Use the following formula:
$$r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)}$$
Where $r_s$ = coefficient of rank correlation
$D$ = the difference between paired ranks
$n$ = the number of pairs
Example:
Aster and Almaz were asked to rank 7 different types of lipsticks. See if there is correlation between the tastes of the ladies.
Lipstick types A B C D E F G
Aster 2 1 4 3 5 7 6
Almaz 1 3 2 4 5 6 7
Solution:
X (R1)   Y (R2)   D = R1 - R2   D²
2        1         1            1
1        3        -2            4
4        2         2            4
3        4        -1            1
5        5         0            0
7        6         1            1
6        7        -1            1
Total                           12
$$r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{7(48)} = 0.786$$
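The three steps can be sketched in Python as follows. The function name `spearman_rs` is illustrative, and the sketch assumes the inputs are already ranks with no ties, as in the lipstick example.

```python
def spearman_rs(ranks_x, ranks_y):
    """Spearman's rank correlation: r_s = 1 - 6*sum(D^2) / (n*(n^2 - 1)).

    Assumes both inputs are rank lists of equal length with no tied ranks.
    """
    n = len(ranks_x)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))

aster = [2, 1, 4, 3, 5, 7, 6]
almaz = [1, 3, 2, 4, 5, 6, 7]
rs = spearman_rs(aster, almaz)
print(round(rs, 3))
```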
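$$b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$$
$$a = \bar{Y} - b\bar{X}$$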
Example 1: The following data shows the score of 12 students for Accounting and Statistics
examinations.
No.  Accounting (X)  Statistics (Y)  X²  Y²  XY
1 74.00 81.00 5476.00 6561.00 5994.00
2 93.00 86.00 8649.00 7396.00 7998.00
3 55.00 67.00 3025.00 4489.00 3685.00
4 41.00 35.00 1681.00 1225.00 1435.00
5 23.00 30.00 529.00 900.00 690.00
6 92.00 100.00 8464.00 10000.00 9200.00
7 64.00 55.00 4096.00 3025.00 3520.00
8 40.00 52.00 1600.00 2704.00 2080.00
9 71.00 76.00 5041.00 5776.00 5396.00
10 33.00 24.00 1089.00 576.00 792.00
11 30.00 48.00 900.00 2304.00 1440.00
12 71.00 87.00 5041.00 7569.00 6177.00
Total 687.00 741.00 45591.00 52525.00 48407.00
Mean 57.25 61.75
a)
The Coefficient of Correlation (r) has a value of 0.92. This indicates that the two
variables are positively correlated (Y increases as X increases).
b)
The fitted regression line is
$$\hat{Y} = 7.0194 + 0.9560X$$
For X = 85, the predicted value is $\hat{Y} = 7.0194 + 0.9560(85) = 88.28$.
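The values of r, a and b quoted for this example can be reproduced from the table with a few lines of Python. This is a minimal sketch using the formulas for b, a and r given earlier in the chapter; the variable names are illustrative.

```python
import math

# Scores of the 12 students, copied from the table above.
accounting = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]   # X
statistics_scores = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]  # Y

n = len(accounting)
mean_x = sum(accounting) / n
mean_y = sum(statistics_scores) / n

# Corrected sums: SXY - n*Xbar*Ybar, SX2 - n*Xbar^2, SY2 - n*Ybar^2
sxy = sum(x * y for x, y in zip(accounting, statistics_scores)) - n * mean_x * mean_y
sxx = sum(x * x for x in accounting) - n * mean_x ** 2
syy = sum(y * y for y in statistics_scores) - n * mean_y ** 2

b = sxy / sxx                     # slope
a = mean_y - b * mean_x           # intercept
r = sxy / math.sqrt(sxx * syy)    # correlation coefficient
y_hat_85 = a + b * 85             # predicted Statistics score for X = 85

print(round(r, 2), round(a, 4), round(b, 4), round(y_hat_85, 2))
```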
Exercise: A car rental agency is interested in studying the relationship between the distance driven in kilometers (Y) and the maintenance cost for their cars (X, in birr). The following summarized information is given based on a sample of size 5.
$\sum_{i=1}^{5} X_i^2 = 147{,}000{,}000$, $\sum_{i=1}^{5} Y_i^2 = 314$
- To know how far the regression equation has been able to explain the variation in Y, we use a measure called the coefficient of determination ($r^2$), i.e.
$$r^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$
where $r$ is the simple correlation coefficient.
- $r^2$ gives the proportion of the variation in Y explained by the regression of Y on X.
- $1 - r^2$ gives the unexplained proportion and is called the coefficient of indetermination.
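The covariance of X and Y is
$$S_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} = \frac{\sum XY - n\bar{X}\bar{Y}}{n - 1}$$
The regression of X on Y is
$$\hat{X} = a_1 + b_1 Y$$
$$b_1 = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2}$$
$$a_1 = \bar{X} - b_1\bar{Y}, \qquad r = \frac{b_1 S_Y}{S_X}$$
Here X is dependent and Y is independent.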
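$$4Y - 15X + 530 = 0 \quad \text{and} \quad 20X - 3Y - 975 = 0$$
Determine which is the regression of Y on X and which is the regression of X on Y.
Solution
Assume the first equation is the regression of Y on X and the second is the regression of X on Y, and calculate $r^2$:
$$r^2 = b_{YX} \cdot b_{XY} = \frac{15}{4} \cdot \frac{3}{20} = \frac{9}{16} \in (0, 1)$$
Since $r^2$ lies between 0 and 1, the assumption is valid. The opposite assignment would give $r^2 = \frac{4}{15} \cdot \frac{20}{3} = \frac{16}{9} > 1$, which is impossible.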
CHAPTER 5
5. ELEMENTARY PROBABILITY
Introduction
Probability theory is the foundation upon which the logic of inference is built.
It helps us to cope with uncertainty.
In general, probability is the chance of an outcome of an experiment. It is the
measure of how likely an outcome is to occur.
Definitions of some probability terms
1. Experiment: Any process of observation or measurement, or any process which generates well defined outcomes.
2. Probability Experiment: An experiment that can be repeated any number of times under similar conditions and for which it is possible to enumerate all the outcomes without being able to predict an individual outcome. It is also called a random experiment.
Example: If a fair die is rolled once, it is possible to list all the possible outcomes, i.e. 1, 2, 3, 4, 5, 6, but it is not possible to predict which outcome will occur.
3. Outcome: The result of a single trial of a random experiment
4. Sample Space: Set of all possible outcomes of a probability experiment
5. Event: It is a subset of sample space. It is a statement about one or more outcomes of a
random experiment. They are denoted by capital letters.
Example: Considering the above experiment, let A be the event of odd numbers, B be the event of even numbers, and C be the event of number 8.
A = {1, 3, 5}
B = {2, 4, 6}
C = { } or ∅ (the empty set, an impossible event)
Remark: If S (sample space) has n members then there are exactly $2^n$ subsets or events.
6. Equally Likely Events: Events which have the same chance of occurring.
7. Complement of an Event: the complement of an event A means non-occurrence of A and is denoted by $A'$, $A^c$, or $\bar{A}$. It contains those points of the sample space which don’t belong to A.
8. Elementary Event: an event having only a single element or sample point.
9. Mutually Exclusive Events: Two events which cannot happen at the same time.
10. Independent Events: Two events are independent if the occurrence of one does not affect
the probability of the other occurring.
11. Dependent Events: Two events are dependent if the first event affects the outcome or
occurrence of the second event in a way the probability is changed.
Solution
a) S={1,2,3,4,5,6}
b) S={(HH),(HT),(TH),(TT)}
c) S={t /t≥0}
Sample space can be
Countable ( finite or infinite)
Uncountable.
Counting Rules
In order to calculate probabilities, we have to know
The number of elements of an event
The number of elements of the sample space.
That is in order to judge what is probable, we have to know what is possible.
In order to determine the number of outcomes, one can use several rules of counting.
- The addition rule
- The multiplication rule
- Permutation rule
- Combination rule
To list the outcomes of the sequence of events, a useful device called tree diagram is used.
Example: A student goes to the nearest snack bar to have breakfast. He can take tea, coffee, or milk with bread, cake, or sandwich. How many possibilities does he have?
Solutions:
Tea: Bread, Cake, Sandwich
Coffee: Bread, Cake, Sandwich
Milk: Bread, Cake, Sandwich
The tree diagram has 3 × 3 = 9 branches, so he has 9 possibilities.
If a choice consists of k steps of which the first can be made in $n_1$ ways, the second can be made in $n_2$ ways, …, the kth can be made in $n_k$ ways, then the whole choice can be made in $n_1 \times n_2 \times \cdots \times n_k$ ways.
Example: The digits 0, 1, 2, 3, and 4 are to be used in a 4-digit identification card. How many different cards are possible if
a) Repetitions are permitted?
b) Repetitions are not permitted?
Solutions
a)
1st digit   2nd digit   3rd digit   4th digit
5           5           5           5
There are four steps:
1. Selecting the 1st digit, which can be done in 5 ways.
2. Selecting the 2nd digit, which can be done in 5 ways.
3. Selecting the 3rd digit, which can be done in 5 ways.
4. Selecting the 4th digit, which can be done in 5 ways.
By the multiplication rule, 5 × 5 × 5 × 5 = 625 different cards are possible.
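The multiplication rule can be checked by brute-force enumeration. The sketch below lists every card with Python's `itertools`. It assumes, as the solution to part a) does, that 0 is allowed in the first position. The count for part b) is not stated in these notes; it follows from 5 × 4 × 3 × 2.

```python
from itertools import permutations, product

digits = "01234"

# a) Repetitions permitted: each of the 4 positions can hold any of the 5 digits.
with_repetition = sum(1 for _ in product(digits, repeat=4))

# b) Repetitions not permitted: ordered selections of 4 distinct digits.
without_repetition = sum(1 for _ in permutations(digits, 4))

print(with_repetition, without_repetition)
```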
Permutation
An arrangement of n objects in a specified order is called permutation of the objects.
Permutation Rules:
1. The number of permutations of n distinct objects taken all together is n!,
where $n! = n(n-1)(n-2)\cdots 3 \cdot 2 \cdot 1$.
2. The arrangement of n objects in a specified order using r objects at a time is called the permutation of n objects taken r objects at a time. It is written as $_nP_r$ and the formula is
$$_nP_r = \frac{n!}{(n-r)!}$$
3. The number of permutations of n objects in which $k_1$ are alike, $k_2$ are alike, etc. is
$$\frac{n!}{k_1!\,k_2!\cdots k_n!}$$
Example:
1. Suppose we have the letters A, B, C, D.
a) How many permutations are there taking all the four?
b) How many permutations are there if two letters are used at a time?
2. How many different permutations can be made from the letters in the word
“CORRECTION”?
Solutions: 1. a)
Here n = 4; there are four distinct objects.
There are 4! = 24 permutations.
b)
Here n = 4, r = 2.
There are $_4P_2 = \frac{4!}{(4-2)!} = \frac{24}{2} = 12$ permutations.
2.
Here n = 10, of which 2 are C, 2 are O, 2 are R, 1 E, 1 T, 1 I, 1 N, so
$k_1 = 2,\ k_2 = 2,\ k_3 = 2,\ k_4 = k_5 = k_6 = k_7 = 1$.
Using the 3rd rule of permutation, there are
$$\frac{10!}{2!\,2!\,2!\,1!\,1!\,1!\,1!} = 453600$$
permutations.
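The three permutation rules map directly onto Python's standard library, as this sketch shows. `math.perm` requires Python 3.8 or later, and the helper name `arrangements_with_repeats` is illustrative.

```python
import math
from collections import Counter

# Rule 1: n distinct objects taken all together -> n!
all_four = math.factorial(4)

# Rule 2: n objects taken r at a time -> n! / (n - r)!
two_at_a_time = math.perm(4, 2)

def arrangements_with_repeats(word):
    """Rule 3: n! divided by k! for every group of k alike letters."""
    total = math.factorial(len(word))
    for k in Counter(word).values():
        total //= math.factorial(k)
    return total

correction = arrangements_with_repeats("CORRECTION")
print(all_four, two_at_a_time, correction)
```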
Exercises:
1. Six different statistics books, seven different physics books, and 3 different
Economics books are arranged on a shelf. How many different arrangements are
possible if;
i. The books in each particular subject must all stand together
ii. Only the statistics books must stand together
2. If the permutation of the word WHITE is selected at random, how many of the
permutations
i. Begins with a consonant?
ii. Ends with a vowel?
iii. Has a consonant and vowels alternating?
Combination
a) If there is no restriction, select three clocks from the 15 clocks. With n = 15 and r = 3, this can be done in
$$\binom{n}{r} = \frac{n!}{(n-r)!\,r!} = \frac{15!}{12!\,3!} = 455 \text{ ways.}$$
b) None of the defective clocks is included.
This is equivalent to zero defective and three non defective, which can be done in
$$\binom{2}{0}\binom{13}{3} = 286 \text{ ways.}$$
c) Only one of the defective clocks is included.
This is equivalent to one defective and two non defective, which can be done in
$$\binom{2}{1}\binom{13}{2} = 156 \text{ ways.}$$
d) Two of the defective clocks are included.
This is equivalent to two defective and one non defective, which can be done in
$$\binom{2}{2}\binom{13}{1} = 13 \text{ ways.}$$
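These counts can be verified with `math.comb` (Python 3.8+). The problem statement is not reproduced in these notes, so the sketch assumes what the solution's numbers imply: 15 clocks, of which 2 are defective and 13 are not, with 3 selected. The function and parameter names are illustrative.

```python
import math

def ways(defective_chosen, total_defective=2, total_good=13, sample_size=3):
    """Ways to pick `defective_chosen` defective clocks and the rest non defective."""
    good_chosen = sample_size - defective_chosen
    return math.comb(total_defective, defective_chosen) * math.comb(total_good, good_chosen)

no_restriction = math.comb(15, 3)
by_case = [ways(k) for k in (0, 1, 2)]   # cases b), c), d)

print(no_restriction, by_case)
```

The three cases are mutually exclusive and cover every selection, so they add up to the unrestricted count, which is a quick consistency check.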
Exercises:
1. Out of 5 mathematicians and 7 statisticians, a committee consisting of 2 mathematicians and 3 statisticians is to be formed. In how many ways can this be done if
a) There is no restriction?
b) One particular statistician should be included?
c) Two particular mathematicians cannot be included on the committee?
2. If 3 books are picked at random from a shelf containing 5 novels, 3 books of poems, and a dictionary, in how many ways can this be done if
a) There is no restriction.
b) The dictionary is selected?
c) 2 novels and 1 book of poems are selected?
Definition: If a random experiment with N equally likely outcomes is conducted and out of these $N_A$ outcomes are favorable to the event A, then the probability that event A occurs, denoted P(A), is defined as:
$$P(A) = \frac{N_A}{N} = \frac{\text{No. of outcomes favourable to } A}{\text{Total number of outcomes}} = \frac{n(A)}{n(S)}$$
Examples:
Solutions:
Total selection: $N = n(S) = \binom{80}{10}$
a) Let A be the event that all will be defective.
Total ways in which A occurs: $N_A = n(A) = \binom{30}{10}\binom{50}{0}$
$$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{10}\binom{50}{0}}{\binom{80}{10}} = 0.00001825$$
b) Let A be the event that 6 will be non defective.
Total ways in which A occurs: $N_A = n(A) = \binom{30}{4}\binom{50}{6}$
$$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{4}\binom{50}{6}}{\binom{80}{10}} = 0.265$$
c) Let A be the event that all will be non defective.
Total ways in which A occurs: $N_A = n(A) = \binom{30}{0}\binom{50}{10}$
$$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{0}\binom{50}{10}}{\binom{80}{10}} = 0.00624$$
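The three probabilities can be recomputed with the classical definition P(A) = n(A)/n(S). The problem statement is missing from these notes, so this sketch assumes the setup implied by the solution: 80 items, 30 defective and 50 non defective, with 10 selected at random. The function name is illustrative.

```python
import math

def prob_defective(k, defective=30, good=50, sample_size=10):
    """P(exactly k defective items in the sample) = n(A) / n(S)."""
    favourable = math.comb(defective, k) * math.comb(good, sample_size - k)
    total = math.comb(defective + good, sample_size)
    return favourable / total

p_all_defective = prob_defective(10)   # a) all 10 defective
p_six_good = prob_defective(4)         # b) 6 non defective, hence 4 defective
p_all_good = prob_defective(0)         # c) all 10 non defective

print(p_all_defective, p_six_good, p_all_good)
```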
Exercises:
1. What is the probability that a waitress will refuse to serve alcoholic beverages to
only three minors if she randomly checks the I.D’s of five students from among
ten students of which four are not of legal age?
Solution: Let A be the event that the newly produced bulb is defective.
$$P(A) = \lim_{N \to \infty} \frac{N_A}{N} \approx \frac{60}{100{,}000} = 0.0006$$
Axiomatic Approach:
Let E be a random experiment and S be a sample space associated with E. With each event A we associate a real number, called the probability of A, which satisfies the following properties, called axioms of probability or postulates of probability.
1. $P(A) \geq 0$
2. $P(S) = 1$, S is the sure event.
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals the sum of the two probabilities, i.e. $P(A \cup B) = P(A) + P(B)$
4. If A and B are independent events, the probability that both will occur is the product of the two probabilities, i.e. $P(A \cap B) = P(A)\,P(B)$
5. $P(A') = 1 - P(A)$
6. $0 \leq P(A) \leq 1$
7. $P(\emptyset) = 0$, ∅ is the impossible event.
Remark: Venn-diagrams can be used to solve probability problems.
[Venn diagram of events A and B showing A ∪ B and A ∩ B]
In general, $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Conditional Events: If the occurrence of one event has an effect on the occurrence of the other event, then the two events are conditional or dependent events.
Example: Suppose we have two red and three white balls in a bag.
1. Draw a ball with replacement
Since the first drawn ball is replaced for the second draw, it doesn’t affect the second draw. For this reason A and B are independent. Then if we let
A = the event that the first draw is red, $P(A) = \frac{2}{5}$
B = the event that the second draw is red, $P(B) = \frac{2}{5}$
2. Draw a ball without replacement
This is conditional because the first drawn ball is not replaced for the second draw, so it does affect the second draw. If we let
A = the event that the first draw is red, $P(A) = \frac{2}{5}$
B = the event that the second draw is red, then P(B) depends on the outcome of the first draw.
Given that the first draw is red, the probability that the second draw is red is $P(B \mid A) = \frac{1}{4}$.
The conditional probability of an event A given that B has already occurred, denoted by $P(A \mid B)$, is
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) \neq 0$$
Remark: (1) $P(A' \mid B) = 1 - P(A \mid B)$
(2) $P(B' \mid A) = 1 - P(B \mid A)$
Examples
1. For a student enrolling as a freshman at a certain university, the probability is 0.25 that he/she will get a scholarship and 0.75 that he/she will graduate. If the probability is 0.2 that he/she will get a scholarship and will also graduate, what is the probability that a student who gets a scholarship will graduate?
Note: for any two events A and B the following relation holds.
$$P(B) = P(B \mid A)\,P(A) + P(B \mid A')\,P(A')$$
Probability of Independent Events
Two events A and B are independent if and only if $P(A \cap B) = P(A)\,P(B)$.
Here $P(A \mid B) = P(A)$ and $P(B \mid A) = P(B)$.
Example: A box contains four black and six white balls. What is the probability of getting two black balls in drawing one after the other under the following conditions?
a. The first ball drawn is not replaced
b. The first ball drawn is replaced
Solution: Let A = first drawn ball is black
B = second drawn ball is black
Required: $P(A \cap B)$
a. $P(A \cap B) = P(B \mid A)\,P(A) = \frac{3}{9} \cdot \frac{4}{10} = \frac{2}{15}$
b. $P(A \cap B) = P(A)\,P(B) = \frac{4}{10} \cdot \frac{4}{10} = \frac{4}{25}$
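Exact fractions make the conditional and independent cases easy to compare. The sketch below redoes the box example with Python's `fractions` module. It also applies the conditional probability formula to the scholarship example earlier in this section, whose answer is not written out in these notes; the value 4/5 is simply what the formula gives. Variable names are illustrative.

```python
from fractions import Fraction

# Box with 4 black and 6 white balls.
p_first_black = Fraction(4, 10)
p_second_black_given_first = Fraction(3, 9)   # one black ball already removed

# a) Without replacement: P(A and B) = P(B | A) * P(A)
without_replacement = p_second_black_given_first * p_first_black

# b) With replacement the draws are independent: P(A and B) = P(A) * P(B)
with_replacement = p_first_black * p_first_black

# Scholarship example: P(graduate | scholarship) = P(both) / P(scholarship)
p_scholarship = Fraction(25, 100)
p_both = Fraction(20, 100)
p_graduate_given_scholarship = p_both / p_scholarship

print(without_replacement, with_replacement, p_graduate_given_scholarship)
```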
CHAPTER 6
RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
Definition: A random variable is a numerical description of the outcomes of the experiment or
a numerical valued function defined on sample space, usually denoted by capital letters.
Example: If X is a random variable, then it is a function from the elements of the sample space to the set of real numbers, i.e. X is a function $X: S \to \mathbb{R}$.
A random variable takes a possible outcome and assigns a number to it.
Example: Flip a coin three times, let X be the number of heads in three tosses.
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}
X(HHH) = 3,
X(HHT) = X(HTH) = X(THH) = 2,
X(HTT) = X(THT) = X(TTH) = 1,
X(TTT) = 0
X = {0, 1, 2, 3}
X assumes a specific number of values with some probabilities.
Random variables are of two types:
1. Discrete random variables: variables which can assume only a specific number of values. They have values that can be counted.
Examples:
Toss coin n times and count the number of heads.
Number of children in a family.
Number of car accidents per week.
Number of defective items in a given company.
Number of bacteria per two cubic centimeter of water.
2. Continuous random variables: variables that can assume all values between any two given values.
Examples:
Height of students at certain college.
Mark of a student.
Life time of light bulbs.
Probability Distribution
Definition: A probability distribution consists of the values that a random variable can assume and the corresponding probabilities of the values.
Example: Consider the experiment of tossing a coin three times. Let X be the number of heads. Construct the probability distribution of X.
Solution:
First identify the possible values that X can assume.
Calculate the probability of each possible distinct value of X and express X in the form of a frequency distribution.
X = x       0     1     2     3
P(X = x)    1/8   3/8   3/8   1/8
A probability distribution satisfies
$\sum_x P(X = x) = 1$, if X is discrete.
$\int_x f(x)\,dx = 1$, if X is continuous.
Note:
1. If X is a continuous random variable then
$$P(a < X < b) = \int_a^b f(x)\,dx$$
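The distribution of the number of heads in three tosses can be built by listing the sample space and counting, exactly as the solution describes. The following is a minimal Python sketch; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space of three tosses: 8 equally likely outcomes such as ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# X maps each outcome to its number of heads; count how often each value occurs.
counts = Counter(outcome.count("H") for outcome in outcomes)

# P(X = x) = (number of outcomes with x heads) / (total number of outcomes)
distribution = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in distribution.items():
    print(x, p)
```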
Introduction to expectation
Definition:
1. Let a discrete random variable X assume the values $X_1, X_2, \ldots, X_n$ with the probabilities $P(X_1), P(X_2), \ldots, P(X_n)$ respectively. Then the expected value of X, denoted as E(X), is defined as:
$$E(X) = X_1P(X_1) + X_2P(X_2) + \cdots + X_nP(X_n) = \sum_{i=1}^{n} X_i P(X_i)$$
2. Let X be a continuous random variable assuming the values in the interval (a, b) such that $\int_a^b f(x)\,dx = 1$; then
$$E(X) = \int_a^b x f(x)\,dx$$
Examples:
1. What is the expected value of a random variable X obtained by tossing a coin three
times where X is the number of heads?
Solution:
First construct the probability distribution of X
X = x       0     1     2     3
P(X = x)    1/8   3/8   3/8   1/8
$$E(X) = X_1P(X_1) + X_2P(X_2) + \cdots + X_nP(X_n) = 0\cdot\tfrac{1}{8} + 1\cdot\tfrac{3}{8} + 2\cdot\tfrac{3}{8} + 3\cdot\tfrac{1}{8} = 1.5$$
2. Suppose a charity organization is mailing printed return-address stickers to over
one million homes in Ethiopia. Each recipient is asked to donate either $1, $2, $5,
$10, $15, or $20. Based on past experience, the amount a person donates is believed
to follow the following probability distribution:
$$E(X) = \sum_{i=1}^{6} x_i P(X = x_i) = \$7.25$$
The variance of X is $Var(X) = E(X^2) - [E(X)]^2$,
where:
$E(X^2) = \sum_{i=1}^{n} x_i^2 P(X = x_i)$, if X is discrete
$E(X^2) = \int_x x^2 f(x)\,dx$, if X is continuous.
Examples:
1. Find the mean and the variance of a random variable X in example 2 above.
Solution:
$E(X) = 7.25$
$Var(X) = E(X^2) - [E(X)]^2 = 82.15 - 7.25^2 = 29.59$
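The donation table is not reproduced in these notes, so the sketch below illustrates the same two formulas on the coin-tossing distribution from the earlier example instead. The function names are illustrative, and the results (3/2 and 3/4) are computed from that distribution, not quoted from the notes.

```python
from fractions import Fraction

# Number of heads in three tosses of a fair coin (from the earlier example).
distribution = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}

def expected_value(dist):
    """E(X) = sum of x * P(X = x)."""
    return sum(x * p for x, p in dist.items())

def variance(dist):
    """Var(X) = E(X^2) - [E(X)]^2."""
    e_x2 = sum(x ** 2 * p for x, p in dist.items())
    return e_x2 - expected_value(dist) ** 2

mean = expected_value(distribution)
var = variance(distribution)
print(mean, var)
```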
Exercise: Two dice are rolled. Let X be a random variable denoting the sum of the numbers on the two dice.
i) Give the probability distribution of X
ii) Compute the expected value of X and its variance
There are some general rules for mathematical expectation.
Let X and Y be random variables and k be a constant.
RULE 1: $E(k) = k$
RULE 2: $Var(k) = 0$
RULE 3: $E(kX) = kE(X)$
RULE 4: $Var(kX) = k^2\,Var(X)$
RULE 5: $E(X + Y) = E(X) + E(Y)$
$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$
And this is sometimes written as: $X \sim Bin(n, p)$
When using the binomial formula to solve problems, we have to identify three things:
The number of trials ( n )
The probability of a success on any one trial ( p ) and
The number of successes desired ( X ).
Examples:
1. What is the probability of getting three heads by tossing a fair coin four times?
Solution: Let X be the number of heads in tossing a fair coin four times.
$X \sim Bin(n = 4, p = 0.50)$
$$P(X = x) = \binom{n}{x} p^x q^{n-x} = \binom{4}{x} 0.5^x\, 0.5^{4-x} = \binom{4}{x} 0.5^4, \quad x = 0, 1, 2, 3, 4$$
$$P(X = 3) = \binom{4}{3} 0.5^4 = 0.25$$
2. Suppose that an examination consists of six true and false questions, and assume that a student
has no knowledge of the subject matter. The probability that the student will guess the correct
answer to the first question is 30%. Likewise, the probability of guessing each of the remaining
questions correctly is also 30%.
a) What is the probability of getting more than three correct answers?
b) What is the probability of getting at least two correct answers?
c) What is the probability of getting at most three correct answers?
d) What is the probability of getting less than five correct answers?
Solution: Let X = the number of correct answers that the student gets.
$X \sim Bin(n = 6, p = 0.30)$
a) $P(X > 3) = ?$
$$P(X = x) = \binom{n}{x} p^x q^{n-x} = \binom{6}{x} 0.3^x\, 0.7^{6-x}, \quad x = 0, 1, 2, \ldots, 6$$
$$P(X > 3) = P(X = 4) + P(X = 5) + P(X = 6) = 0.060 + 0.010 + 0.001 = 0.071$$
Thus, we may conclude that if the student answers every question by guessing, with a 30% chance of being correct on each, the probability is 0.071 (or 7.1%) that more than three of the questions are answered correctly.
b) $P(X \geq 2) = ?$
$$P(X \geq 2) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6) = 0.324 + 0.185 + 0.060 + 0.010 + 0.001 = 0.58$$
c) $P(X \leq 3) = ?$
$$P(X \leq 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) = 0.118 + 0.303 + 0.324 + 0.185 = 0.93$$
d) $P(X < 5) = ?$
$$P(X < 5) = 1 - P(X \geq 5) = 1 - \{P(X = 5) + P(X = 6)\} = 1 - (0.010 + 0.001) = 0.989$$
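The binomial probabilities in both examples can be recomputed from the formula with a few lines of Python. This is a minimal sketch; `binom_pmf` is an illustrative helper name, and `math.comb` needs Python 3.8 or later. Results may differ from the notes in the third decimal place because the notes round each term before adding.

```python
import math

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): C(n, x) * p^x * q^(n - x) with q = 1 - p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Example 1: three heads in four tosses of a fair coin.
three_heads = binom_pmf(3, 4, 0.5)

# Example 2: six questions, each guessed correctly with probability 0.30.
n, p = 6, 0.30
more_than_three = sum(binom_pmf(x, n, p) for x in range(4, 7))     # a) P(X > 3)
at_least_two = sum(binom_pmf(x, n, p) for x in range(2, 7))        # b) P(X >= 2)
at_most_three = sum(binom_pmf(x, n, p) for x in range(0, 4))       # c) P(X <= 3)
less_than_five = 1 - sum(binom_pmf(x, n, p) for x in range(5, 7))  # d) P(X < 5)

print(three_heads, more_than_three, at_least_two, at_most_three, less_than_five)
```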
Exercises:
a. Suppose that 4% of all TVs made by A&B Company in 2000 are defective. If eight of
these TVs are randomly selected from across the country and tested, what is the probability
that exactly three of them are defective? Assume that each TV is made independently of
the others.
b. An allergist claims that 45% of the patients she tests are allergic to some type of weed.
What is the probability that
I. Exactly 3 of her next 4 patients are allergic to weeds?
II. None of her next 4 patients are allergic to weeds?
c. Explain why the following experiments are not Binomial
I. Rolling a die until a 6 appears.
II. Asking 20 people how old they are.
III. Drawing 5 cards from a deck for a poker hand.
Remark: If X is a binomial random variable with parameters n and p then
$E(X) = np$, $Var(X) = npq$
2. Poisson Distribution
A random variable X is said to have a Poisson distribution if its probability distribution is given by:
$$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$$
where $\lambda$ = the average number.
The Poisson distribution depends only on the average number of occurrences per unit time or space.
The Poisson distribution is used as a distribution of rare events, such as: arrivals, accidents, number of misprints, hereditary, natural disasters like earthquakes, etc.
The process that gives rise to such events is called a Poisson process.
Example: If 1.6 accidents can be expected at an intersection on any given day, what is the probability that there will be 3 accidents on any given day?
Solution: Let X = the number of accidents, $\lambda = 1.6$
$$X \sim Poisson(1.6) \;\Rightarrow\; P(X = x) = \frac{1.6^x e^{-1.6}}{x!}$$
$$P(X = 3) = \frac{1.6^3 e^{-1.6}}{3!} = 0.1378$$
Exercise: On the average, five smokers pass a certain street corner every ten minutes. What is the probability that during a given 10 minutes the number of smokers passing will be
a. 6 or fewer
b. 7 or more
c. Exactly 8…….
If X is a Poisson random variable with parameter $\lambda$ then
$E(X) = \lambda$, $Var(X) = \lambda$
Note: The Poisson probability distribution provides a close approximation to the binomial probability distribution when n is large and p is quite small or quite large, with $\lambda = np$:
$$P(X = x) = \frac{(np)^x e^{-(np)}}{x!}, \quad x = 0, 1, 2, \ldots$$
where $\lambda = np$ = the average number.
Usually we use this approximation if $np \leq 5$. In other words, if $n > 20$ and $np \leq 5$ [or $n(1 - p) \leq 5$], then we may use the Poisson distribution as an approximation to the binomial distribution.
Example: Find the binomial probability P(X = 3) by using the Poisson distribution if p = 0.01 and n = 200.
Solution:
Using Poisson, $\lambda = np = 0.01 \times 200 = 2$
$$P(X = 3) = \frac{2^3 e^{-2}}{3!} = 0.1804$$
Using Binomial, n = 200, p = 0.01
$$P(X = 3) = \binom{200}{3} (0.01)^3 (0.99)^{197} = 0.1814$$
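The closeness of the approximation can be seen by computing both probabilities side by side. The sketch below uses only the two formulas from this chapter; the helper names are illustrative.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson variable with mean lam."""
    return lam ** x * math.exp(-lam) / math.factorial(x)

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

# Accident example: lambda = 1.6, three accidents in a day.
accidents = poisson_pmf(3, 1.6)

# Approximation example: n = 200, p = 0.01, so lambda = np = 2.
n, p = 200, 0.01
approx = poisson_pmf(3, n * p)
exact = binom_pmf(3, n, p)

print(round(accidents, 4), round(approx, 4), round(exact, 4))
```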
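A random variable X is said to have a normal distribution if its probability density function is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty,\; -\infty < \mu < \infty,\; \sigma > 0$$
where $\mu = E(X)$ and $\sigma^2 = Variance(X)$.
$\mu$ and $\sigma^2$ are the parameters of the normal distribution.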
Properties of Normal Distribution:
1. It is bell shaped, symmetrical about its mean, and mesokurtic. The maximum ordinate is at $x = \mu$ and is given by $f(x) = \frac{1}{\sigma\sqrt{2\pi}}$.
2. It is asymptotic to the axis, i.e., it extends indefinitely in either direction from the mean.
3. It is a continuous distribution.
4. It is a family of curves, i.e., every unique pair of mean and standard deviation defines a different normal distribution. Thus, the normal distribution is completely described by two parameters: mean and standard deviation.
5. Total area under the curve sums to 1, i.e., $\int_{-\infty}^{\infty} f(x)\,dx = 1$, and the area of the distribution on each side of the mean is 0.5.
6. It is unimodal, i.e., values mound up only in the center of the curve.
7. Mean = Median = Mode = $\mu$
8. The probability that a random variable will have a value between any two points is equal to
the area under the curve between those points.
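Note: To facilitate the use of the normal distribution, the following distribution, known as the standard normal distribution, was derived by using the transformation
$$Z = \frac{X - \mu}{\sigma} \quad\Rightarrow\quad f(z) = \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}z^2}$$
- Areas under the standard normal distribution curve have been tabulated in various ways. The most common ones are the areas between Z = 0 and a positive value of Z.
- Given a normally distributed random variable X with mean $\mu$ and standard deviation $\sigma$,
$$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right)$$
$$\Rightarrow P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right)$$
Note:
$$P(a < X < b) = P(a \leq X < b) = P(a < X \leq b) = P(a \leq X \leq b)$$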
Examples:
1. Find the area under the standard normal distribution which lies
a) Between Z = 0 and Z = 0.96
Solution:
Area = P(0 < Z < 0.96) = 0.3315
c) To the right of Z = -0.35
Solution:
Area = P(Z > -0.35)
= P(-0.35 < Z < 0) + P(Z > 0)
= P(0 < Z < 0.35) + P(Z > 0)
= 0.1368 + 0.50 = 0.6368
d) To the left of Z = -0.35
Solution:
Area = P(Z < -0.35)
= 1 - P(Z > -0.35)
= 1 - 0.6368 = 0.3632
e) Between Z = -0.67 and Z = 0.75
Solution:
Solution
P(0 < Z < z) = 0.4726 and from the table
P(0 < Z < 1.92) = 0.4726
⇒ z = 1.92 (by uniqueness of area).
b) The area to the left of z is 0.9868
Solution
P(Z < z) = 0.9868
= P(Z < 0) + P(0 < Z < z)
= 0.50 + P(0 < Z < z)
⇒ P(0 < Z < z) = 0.9868 - 0.50 = 0.4868
and from the table
P(0 < Z < 2.22) = 0.4868
⇒ z = 2.22
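Table lookups like these can be cross-checked with `statistics.NormalDist` from the Python standard library (Python 3.8+). Its `cdf` method gives the area to the left of a point and `inv_cdf` goes from an area back to a z value. This is a sketch for checking the worked answers and is not part of the table method taught in these notes.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Area between Z = 0 and Z = 0.96
area_0_to_096 = Z.cdf(0.96) - Z.cdf(0)

# Areas to the right and to the left of Z = -0.35
right_of_minus_035 = 1 - Z.cdf(-0.35)
left_of_minus_035 = Z.cdf(-0.35)

# z such that P(0 < Z < z) = 0.4726, i.e. area to the left is 0.5 + 0.4726
z_centre = Z.inv_cdf(0.5 + 0.4726)

# z such that the area to the left of z is 0.9868
z_left = Z.inv_cdf(0.9868)

print(round(area_0_to_096, 4), round(right_of_minus_035, 4),
      round(left_of_minus_035, 4), round(z_centre, 2), round(z_left, 2))
```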
3. A random variable X has a normal distribution with mean 80 and standard deviation
4.8. What is the probability that it will take a value
a) Less than 87.2
b) Greater than 76.4
c) Between 81.2 and 86.0
Solution
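X is normal with mean $\mu = 80$ and standard deviation $\sigma = 4.8$.
a)
$$P(X < 87.2) = P\left(\frac{X - \mu}{\sigma} < \frac{87.2 - \mu}{\sigma}\right) = P\left(Z < \frac{87.2 - 80}{4.8}\right) = P(Z < 1.5)$$
$$= P(Z < 0) + P(0 < Z < 1.5) = 0.50 + 0.4332 = 0.9332$$
b)
$$P(X > 76.4) = P\left(\frac{X - \mu}{\sigma} > \frac{76.4 - \mu}{\sigma}\right) = P\left(Z > \frac{76.4 - 80}{4.8}\right) = P(Z > -0.75)$$
$$= P(Z > 0) + P(0 < Z < 0.75) = 0.50 + 0.2734 = 0.7734$$
c)
$$P(81.2 < X < 86.0) = P\left(\frac{81.2 - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{86.0 - \mu}{\sigma}\right) = P\left(\frac{81.2 - 80}{4.8} < Z < \frac{86.0 - 80}{4.8}\right) = P(0.25 < Z < 1.25)$$
$$= P(0 < Z < 1.25) - P(0 < Z < 0.25) = 0.3944 - 0.0987 = 0.2957$$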
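The same three probabilities can be computed without standardising by hand, since `statistics.NormalDist` (Python 3.8+) accepts the mean and standard deviation directly. This is a minimal sketch for checking the answers above.

```python
from statistics import NormalDist

X = NormalDist(mu=80, sigma=4.8)

p_less_than_87_2 = X.cdf(87.2)               # a) P(X < 87.2)
p_greater_than_76_4 = 1 - X.cdf(76.4)        # b) P(X > 76.4)
p_between = X.cdf(86.0) - X.cdf(81.2)        # c) P(81.2 < X < 86.0)

print(round(p_less_than_87_2, 4), round(p_greater_than_76_4, 4), round(p_between, 4))
```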
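4. A normal distribution has mean 62.4. Find its standard deviation if 20.05% of the area under the normal curve lies to the right of 72.9.
Solution
$$P(X > 72.9) = 0.2005 \;\Rightarrow\; P\left(\frac{X - \mu}{\sigma} > \frac{72.9 - \mu}{\sigma}\right) = 0.2005$$
$$\Rightarrow P\left(Z > \frac{72.9 - 62.4}{\sigma}\right) = 0.2005$$
$$\Rightarrow P\left(Z > \frac{10.5}{\sigma}\right) = 0.2005$$
$$\Rightarrow P\left(0 < Z < \frac{10.5}{\sigma}\right) = 0.50 - 0.2005 = 0.2995$$
And from the table P(0 < Z < 0.84) = 0.2995
$$\Rightarrow \frac{10.5}{\sigma} = 0.84$$
$$\Rightarrow \sigma = 12.5$$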
5. A random variable has a normal distribution with $\sigma = 5$. Find its mean if the probability that the random variable will assume a value less than 52.5 is 0.6915.
Solution
$$P(Z < z) = P\left(Z < \frac{52.5 - \mu}{5}\right) = 0.6915$$
$$\Rightarrow P(0 < Z < z) = 0.6915 - 0.50 = 0.1915$$
But from the table
$$P(0 < Z < 0.5) = 0.1915$$
$$\Rightarrow z = \frac{52.5 - \mu}{5} = 0.5$$
$$\Rightarrow \mu = 50$$
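Problems 4 and 5 both run the table lookup in reverse: an area gives a z value, and the z value is then solved for the unknown parameter. A short sketch with `NormalDist.inv_cdf` (Python 3.8+) reproduces both answers. The results differ slightly from the notes because the table rounds z to two decimals.

```python
from statistics import NormalDist

Z = NormalDist()

# Problem 4: mean 62.4, area 0.2005 to the right of 72.9.
# The area to the left is 1 - 0.2005, so z = inv_cdf(0.7995) and sigma = (72.9 - 62.4) / z.
z4 = Z.inv_cdf(1 - 0.2005)
sigma = (72.9 - 62.4) / z4

# Problem 5: sigma = 5, P(X < 52.5) = 0.6915.
# z = inv_cdf(0.6915) and mu = 52.5 - 5 * z.
z5 = Z.inv_cdf(0.6915)
mu = 52.5 - 5 * z5

print(round(sigma, 1), round(mu, 1))
```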
Exercise: Of a large group of men, 5% are less than 60 inches in height and 40% are
between 60 & 65 inches. Assuming a normal distribution, find the mean and standard
deviation of heights.