
Stat 4-6 Chapter

This document discusses simple linear regression and correlation. It defines linear regression as developing a mathematical equation showing how variables are related, while correlation measures the closeness of that relationship. Simple correlation examines the relationship between two variables and can be positive, negative, or no correlation. Simple linear regression fits a line to plotted data points between one independent and one dependent variable. Calculating the correlation coefficient involves summing products of deviations from the means to determine the strength and direction of the linear relationship between two variables.

Uploaded by

Gizachew Nadew
Copyright
© All Rights Reserved

Lecture notes on Introduction to Statistics IX: simple linear regression and correlation

CHAPTER 4
4. SIMPLE LINEAR REGRESSION AND CORRELATION
Linear regression and correlation analysis study and measure the linear relationship among two or more variables. When only two variables are involved, the analysis is referred to as simple correlation and simple linear regression analysis; when there are more than two variables, the terms multiple regression and partial correlation are used.

Regression Analysis: a statistical technique that can be used to develop a mathematical equation showing how variables are related.

Correlation Analysis: deals with measuring the closeness of the relationship described by the regression equation.
We say there is correlation if the two series of items vary together, either directly or inversely.

Simple Correlation: Suppose we have two variables $X = (X_1, X_2, \ldots, X_n)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$.
• When higher values of X are associated with higher values of Y and lower values of X are associated with lower values of Y, the correlation is said to be positive or direct.
Examples:
- Income and expenditure
- Number of hours spent studying and the score obtained
- Height and weight
- Distance covered and fuel consumed by a car
• When higher values of X are associated with lower values of Y and lower values of X are associated with higher values of Y, the correlation is said to be negative or inverse.
Examples:
- Demand and supply
- Income and the proportion of income spent on food
The correlation between X and Y may be one of the following (the values refer to the correlation coefficient r defined below, not to the slope of a fitted line):
1. Perfect positive (r = 1)
2. Positive (r between 0 and 1)
3. No correlation (r = 0)
4. Negative (r between -1 and 0)
5. Perfect negative (r = -1)

The presence of correlation between two variables may be due to three reasons:

1. One variable being the cause of the other. The cause is called the “subject” or “independent” variable, while the effect is called the “dependent” variable.
2. Both variables being the result of a common cause. That is, the correlation between the two variables is due to their both being related to some third force.
Example:
Let X1 = ESLCE result
Y1 = rate of surviving in the university
Y2 = rate of getting a scholarship.

Both X1 & Y1 and X1 & Y2 have high positive correlation. Likewise, Y1 & Y2 have positive correlation, but they are not directly related; they are related to each other through X1.

3. Chance: correlation that arises by chance is called spurious correlation.

Examples:
• Price of teff in Addis Ababa and grades of students in the USA.
• Weight of individuals in Ethiopia and income of individuals in Kenya.

Therefore, when interpreting a correlation coefficient, it is necessary to check whether any relationship between the variables under study is plausible.
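The correlation coefficient between X and Y, denoted by r, is given by

$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$$

and the short-cut formulas are

$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$

$$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left[\sum X^2 - n\bar{X}^2\right]\left[\sum Y^2 - n\bar{Y}^2\right]}}$$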
Remark: r always lies between -1 and 1 inclusive ($-1 \le r \le 1$), and it is symmetric: the correlation of X with Y equals the correlation of Y with X.
Interpretation of r
1. Perfect positive linear relationship (if r = 1)
2. Some positive linear relationship (if r is between 0 and 1)
3. No linear relationship (if r = 0)
4. Some negative linear relationship (if r is between -1 and 0)
5. Perfect negative linear relationship (if r = -1)


Examples:

1. Calculate the simple correlation coefficient between the mid-semester and final exam scores of 10 students (both out of 50).

Student  Mid Sem. Exam (X)  Final Sem. Exam (Y)
1        31                 31
2        23                 29
3        41                 34
4        32                 35
5        29                 25
6        33                 35
7        28                 33
8        31                 42
9        31                 31
10       33                 34
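Solution:
$n = 10$, $\bar{X} = 31.2$, $\bar{Y} = 32.9$, $\bar{X}^2 = 973.44$, $\bar{Y}^2 = 1082.41$
$\sum XY = 10331$, $\sum X^2 = 9920$, $\sum Y^2 = 11003$

$$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{\left[\sum X^2 - n\bar{X}^2\right]\left[\sum Y^2 - n\bar{Y}^2\right]}} = \frac{10331 - 10(31.2)(32.9)}{\sqrt{\left(9920 - 10(973.44)\right)\left(11003 - 10(1082.41)\right)}} = \frac{66.2}{\sqrt{185.6 \times 178.9}} \approx \frac{66.2}{182.2} \approx 0.363$$

This means that mid-semester and final exam scores have a weak positive correlation.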
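The computation above can be reproduced with a short Python sketch (Python 3, standard library only). It implements both the deviation formula and the short-cut formula from the text; the function names `pearson_r` and `pearson_r_shortcut` are illustrative, not from any library.

```python
from math import sqrt

# Mid-semester (X) and final (Y) exam scores of the 10 students in the example
x = [31, 23, 41, 32, 29, 33, 28, 31, 31, 33]
y = [31, 29, 34, 35, 25, 35, 33, 42, 31, 34]


def pearson_r(x, y):
    """Pearson correlation using deviations from the means."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)


def pearson_r_shortcut(x, y):
    """The same coefficient computed from raw sums (the short-cut formula)."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    sum_y2 = sum(yi * yi for yi in y)
    num = n * sum_xy - sum_x * sum_y
    den = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return num / den


r = pearson_r(x, y)
print(round(r, 3))                          # 0.363
print(round(pearson_r_shortcut(x, y), 3))   # 0.363
```

Both functions return the same value; the short-cut form is convenient for hand calculation, while the deviation form is numerically safer for large values.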

Exercise: The following data were collected from a certain household on the monthly income (X) and consumption (Y) for the past 10 months. Compute the simple correlation coefficient.

X: 650 654 720 456 536 853 735 650 536 666
Y: 450 523 235 398 500 632 500 635 450 360

• The above formula and procedure apply only to quantitative data. When we have qualitative data such as efficiency, honesty, or intelligence, we calculate Spearman’s rank correlation coefficient as follows:

Steps
i. Rank the different items in X and Y.
ii. Find the difference of the ranks in each pair; denote them by $D_i$.
iii. Use the following formula:

$$r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)}$$

Where $r_s$ = coefficient of rank correlation
$D_i$ = the difference between paired ranks
n = the number of pairs
Example:
Aster and Almaz were asked to rank 7 different types of lipstick. See if there is correlation between the tastes of the two ladies.

Lipstick type  A  B  C  D  E  F  G
Aster          2  1  4  3  5  7  6
Almaz          1  3  2  4  5  6  7

Solution:
X (R1)  Y (R2)  D = R1 - R2  D²
2       1        1           1
1       3       -2           4
4       2        2           4
3       4       -1           1
5       5        0           0
7       6        1           1
6       7       -1           1
Total                        12

$$r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{7(48)} \approx 0.786$$

Yes, there is a positive correlation.
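A minimal Python sketch of the rank-correlation steps, assuming the ranks are already assigned and there are no ties (with tied ranks the $6\sum D_i^2$ formula is only an approximation). `spearman_rs` is a hypothetical helper name.

```python
# Ranks given by Aster and Almaz to lipsticks A..G (from the table above)
aster = [2, 1, 4, 3, 5, 7, 6]
almaz = [1, 3, 2, 4, 5, 6, 7]


def spearman_rs(r1, r2):
    """Spearman's rank correlation for two rankings without ties."""
    n = len(r1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))  # sum of D_i^2
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


rs = spearman_rs(aster, almaz)
print(round(rs, 3))   # 0.786
```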


Simple Linear Regression


- Simple linear regression refers to the linear relationship between two variables.
- We usually denote the dependent variable by Y and the independent variable by X.
- A simple regression line is the line fitted to the points plotted in the scatter diagram, which describes the average relationship between the two variables. Therefore, to see the type of relationship, it is advisable to prepare a scatter plot before fitting the model.
- The linear model is:

$$Y = \alpha + \beta X + \varepsilon$$

Where: Y = dependent variable
X = independent variable
α = regression constant
β = regression slope
ε = random disturbance term

$$\varepsilon \sim N(0, \sigma^2), \qquad Y \sim N(\alpha + \beta X, \sigma^2)$$

- To estimate the parameters (α and β) we have several methods:
• The free-hand method
• The semi-average method
• The least squares method
• The maximum likelihood method
• The method of moments
• Bayesian estimation techniques

- The above model is estimated by: $\hat{Y} = a + bX$

Where a is a constant that gives the value of $\hat{Y}$ when X = 0. It is called the Y-intercept.
b is a constant indicating the slope of the regression line; it measures the change in Y for a unit change in X. It is also called the regression coefficient of Y on X.
- a and b are found by minimizing

$$SSE = \sum (Y_i - \hat{Y}_i)^2$$

Where: $Y_i$ = observed value
$\hat{Y}_i = a + bX_i$ = estimated value
This method is known as OLS (ordinary least squares).
- Minimizing SSE gives

$$b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$$

$$a = \bar{Y} - b\bar{X}$$

Example 1: The following data show the scores of 12 students in Accounting and Statistics examinations.

a) Calculate the simple correlation coefficient.
b) Fit a regression equation of Statistics on Accounting using least squares estimates.
c) Predict the Statistics score if the Accounting score is 85.

Student  Accounting (X)  Statistics (Y)  X²        Y²        XY
1        74.00           81.00           5476.00   6561.00   5994.00
2        93.00           86.00           8649.00   7396.00   7998.00
3        55.00           67.00           3025.00   4489.00   3685.00
4        41.00           35.00           1681.00   1225.00   1435.00
5        23.00           30.00           529.00    900.00    690.00
6        92.00           100.00          8464.00   10000.00  9200.00
7        64.00           55.00           4096.00   3025.00   3520.00
8        40.00           52.00           1600.00   2704.00   2080.00
9        71.00           76.00           5041.00   5776.00   5396.00
10       33.00           24.00           1089.00   576.00    792.00
11       30.00           48.00           900.00    2304.00   1440.00
12       71.00           87.00           5041.00   7569.00   6177.00
Total    687.00          741.00          45591.00  52525.00  48407.00
Mean     57.25           61.75
a)
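Using the short-cut formula with the column totals above:

$$r = \frac{48407 - 12(57.25)(61.75)}{\sqrt{\left[45591 - 12(57.25)^2\right]\left[52525 - 12(61.75)^2\right]}} = \frac{5984.75}{\sqrt{6260.25 \times 6768.25}} \approx 0.92$$

The coefficient of correlation (r) has a value of 0.92. This indicates that the two variables are strongly positively correlated (Y increases as X increases).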

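b)

$$b = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2} = \frac{5984.75}{6260.25} \approx 0.9560, \qquad a = \bar{Y} - b\bar{X} = 61.75 - \frac{5984.75}{6260.25}(57.25) \approx 7.0194$$

Thus $\hat{Y} = 7.0194 + 0.9560X$ is the estimated regression line.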


c) Insert X = 85 in the estimated regression line:

$$\hat{Y} = 7.0194 + 0.9560(85) \approx 88.28$$
Exercise: A car rental agency is interested in studying the relationship between the distance driven in kilometers (Y) and the maintenance cost of their cars (X, in birr). The following summarized information is based on a sample of size 5:

$$\sum_{i=1}^{5} X_i^2 = 147{,}000{,}000, \qquad \sum_{i=1}^{5} Y_i^2 = 314$$
$$\sum_{i=1}^{5} X_i = 23{,}000, \qquad \sum_{i=1}^{5} Y_i = 36, \qquad \sum_{i=1}^{5} X_iY_i = 212{,}000$$

a) Find the least squares regression equation of Y on X


b) Compute the correlation coefficient and interpret it.
c) Estimate the maintenance cost of a car which has been driven for 6 km
- To measure how far the regression equation has been able to explain the variation in Y, we use a measure called the coefficient of determination ($r^2$):

$$r^2 = \frac{\sum (\hat{Y} - \bar{Y})^2}{\sum (Y - \bar{Y})^2}$$

Where r = the simple correlation coefficient.
- $r^2$ gives the proportion of the variation in Y explained by the regression of Y on X.
- $1 - r^2$ gives the unexplained proportion and is called the coefficient of indetermination.

Example: For the above problem (Example 1): r = 0.9194

⇒ $r^2 = 0.8453$, i.e. 84.53% of the variation in Y is explained, and only 15.47% remains unexplained; this is accounted for by the random term.
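As a sketch, the ratio definition of $r^2$ can be checked numerically on the Example 1 data: fit the OLS line, then compare the explained sum of squares with the total. The split total = explained + unexplained holds for any least-squares line with an intercept.

```python
# Example 1 data: Accounting (X) and Statistics (Y) scores
x = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]
y = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

# OLS fit of Y on X (same formulas as in the text)
b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
     / sum((xi - mean_x) ** 2 for xi in x))
a = mean_y - b * mean_x
y_hat = [a + b * xi for xi in x]

ss_total = sum((yi - mean_y) ** 2 for yi in y)          # total variation in Y
ss_explained = sum((yh - mean_y) ** 2 for yh in y_hat)  # variation explained by the line
r_squared = ss_explained / ss_total

print(f"r^2 = {r_squared:.4f}, unexplained = {1 - r_squared:.4f}")
# r^2 = 0.8453, unexplained = 0.1547
```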

o The covariance of X and Y measures how X and Y vary together. It is denoted by $S_{XY}$ and given by

$$S_{XY} = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} = \frac{\sum XY - n\bar{X}\bar{Y}}{n - 1}$$

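o Next we will see the relationship between the coefficients, where $S_X$ and $S_Y$ are the sample standard deviations of X and Y.

i. $r = \dfrac{S_{XY}}{S_X S_Y} \;\Rightarrow\; r^2 = \dfrac{S_{XY}^2}{S_X^2 S_Y^2}$

ii. $r = \dfrac{b S_X}{S_Y} \;\Rightarrow\; b = \dfrac{r S_Y}{S_X}$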
o When we fit the regression of X on Y, we interchange X and Y in all formulas, i.e. we fit
$$\hat{X} = a_1 + b_1 Y$$

$$b_1 = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum Y^2 - n\bar{Y}^2}, \qquad a_1 = \bar{X} - b_1\bar{Y}, \qquad r = \frac{b_1 S_Y}{S_X}$$

Here X is the dependent variable and Y is the independent variable.
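A small Python sketch that fits the regression of X on Y for the Example 1 data and confirms $r = b_1 S_Y / S_X$ and $r^2 = b_{YX} b_{XY}$ numerically. Variable names such as `b_xy` and `a1` are illustrative.

```python
from math import isclose, sqrt

# Example 1 data: Accounting (X) and Statistics (Y) scores
x = [74, 93, 55, 41, 23, 92, 64, 40, 71, 33, 30, 71]
y = [81, 86, 67, 35, 30, 100, 55, 52, 76, 24, 48, 87]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

b_yx = sxy / sxx              # slope of the regression of Y on X
b_xy = sxy / syy              # b1: slope of the regression of X on Y
a1 = mean_x - b_xy * mean_y   # intercept of X-hat = a1 + b1 * Y

r = sxy / sqrt(sxx * syy)
s_x, s_y = sqrt(sxx / (n - 1)), sqrt(syy / (n - 1))

print(f"X-hat = {a1:.4f} + {b_xy:.4f} Y")
print(isclose(b_xy * s_y / s_x, r))   # r = b1 * S_Y / S_X -> True
print(isclose(b_yx * b_xy, r ** 2))   # r^2 = b_YX * b_XY -> True
```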

Choice of Dependent and Independent variable

- In correlation analysis there is no need to identify the dependent and independent variables, because r is symmetric. But in regression analysis, if
$b_{YX}$ is the regression coefficient of Y on X and
$b_{XY}$ is the regression coefficient of X on Y, then

$$r = \frac{b_{YX} S_X}{S_Y} = \frac{b_{XY} S_Y}{S_X} \;\Rightarrow\; r^2 = b_{YX} \cdot b_{XY}$$

- Moreover, $b_{YX}$ and $b_{XY}$ are in general different, both numerically and conceptually.

- Let us consider three cases concerning these coefficients.

1. If the correlation is perfect, i.e. r = 1 (or r = -1), then the b values are reciprocals of each other.
2. If $S_X = S_Y$, then irrespective of the value of r the b values are equal, i.e. $r = b_{YX} = b_{XY}$ (but this is an unlikely case).
3. The most important case is when $S_X \ne S_Y$ and $r \ne \pm 1$: here the b values are neither equal nor reciprocals of each other; the two regression lines differ, intersecting at the common point $(\bar{X}, \bar{Y})$.
⇒ Thus, to determine whether a regression equation is X on Y or Y on X, we use the formula $r^2 = b_{YX} \cdot b_{XY}$:
• If $r \in [-1, 1]$ (i.e. $0 \le r^2 \le 1$), then our assumption is correct.
• If $r \notin [-1, 1]$, then our assumption is wrong.
Example: The regression lines between height (X, in inches) and weight (Y, in lbs) of male students are:
$$4Y - 15X + 530 = 0 \quad \text{and} \quad 20X - 3Y - 975 = 0$$

Determine which is the regression of Y on X and which is the regression of X on Y.

Solution
We assume one of the equations is the regression of X on Y and the other is the regression of Y on X, and calculate r.

Assume $4Y - 15X + 530 = 0$ is the regression of X on Y and
$20X - 3Y - 975 = 0$ is the regression of Y on X.
Then write these in standard form:

$$4Y - 15X + 530 = 0 \;\Rightarrow\; X = \frac{530}{15} + \frac{4}{15}Y \;\Rightarrow\; b_{XY} = \frac{4}{15}$$
$$20X - 3Y - 975 = 0 \;\Rightarrow\; Y = -\frac{975}{3} + \frac{20}{3}X \;\Rightarrow\; b_{YX} = \frac{20}{3}$$
$$r^2 = b_{XY} \cdot b_{YX} = \frac{4}{15} \cdot \frac{20}{3} \approx 1.78 > 1$$

This is impossible (a contradiction). Hence our assumption is not correct. Thus
$4Y - 15X + 530 = 0$ is the regression of Y on X, and
$20X - 3Y - 975 = 0$ is the regression of X on Y.
To verify:

$$4Y - 15X + 530 = 0 \;\Rightarrow\; Y = -\frac{530}{4} + \frac{15}{4}X \;\Rightarrow\; b_{YX} = \frac{15}{4}$$
$$20X - 3Y - 975 = 0 \;\Rightarrow\; X = \frac{975}{20} + \frac{3}{20}Y \;\Rightarrow\; b_{XY} = \frac{3}{20}$$
$$r^2 = b_{YX} \cdot b_{XY} = \frac{15}{4} \cdot \frac{3}{20} = \frac{9}{16} \in [0, 1]$$
CHAPTER 5
5. ELEMENTARY PROBABILITY
Introduction
• Probability theory is the foundation upon which the logic of inference is built.
• It helps us to cope with uncertainty.
• In general, probability is the chance of an outcome of an experiment. It is a measure of how likely an outcome is to occur.
Definitions of some probability terms
1. Experiment: Any process of observation or measurement, or any process that generates well-defined outcomes.
2. Probability Experiment: An experiment that can be repeated any number of times under similar conditions, where it is possible to enumerate the total number of outcomes without predicting an individual outcome. It is also called a random experiment.
Example: If a fair die is rolled once, it is possible to list all the possible outcomes, i.e. 1, 2, 3, 4, 5, 6, but it is not possible to predict which outcome will occur.
3. Outcome: The result of a single trial of a random experiment
4. Sample Space: Set of all possible outcomes of a probability experiment
5. Event: A subset of the sample space. It is a statement about one or more outcomes of a random experiment. Events are denoted by capital letters.
Example: Considering the above experiment, let A be the event of odd numbers, B the event of even numbers, and C the event of the number 8.
A = {1, 3, 5}
B = {2, 4, 6}
C = { } = ∅, the empty set or impossible event
Remark: If S (the sample space) has n members, then there are exactly $2^n$ subsets or events.
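To see the $2^n$ remark concretely, the Python sketch below enumerates every subset (event) of the die sample space with `itertools`; `all_events` is a hypothetical helper name.

```python
from itertools import chain, combinations

S = {1, 2, 3, 4, 5, 6}   # sample space for one roll of a die


def all_events(sample_space):
    """Every subset of the sample space, from the empty set up to S itself."""
    items = sorted(sample_space)
    subsets = chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1)
    )
    return [set(s) for s in subsets]


events = all_events(S)
print(len(events), 2 ** len(S))   # 64 64
```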
6. Equally Likely Events: Events which have the same chance of occurring.
7. Complement of an Event: The complement of an event A means the non-occurrence of A and is denoted by A', $A^c$, or $\bar{A}$. It contains those points of the sample space which don’t belong to A.
8. Elementary Event: An event having only a single element or sample point.
9. Mutually Exclusive Events: Two events which cannot happen at the same time.
10. Independent Events: Two events are independent if the occurrence of one does not affect the probability of the other occurring.
11. Dependent Events: Two events are dependent if the occurrence of the first event changes the probability of the second event.

Example: What is the sample space for each of the following experiments?

a) Toss a die one time.
b) Toss a coin two times.
c) A light bulb is manufactured. It is tested for its life length by time.

Solution
a) S = {1, 2, 3, 4, 5, 6}
b) S = {(HH), (HT), (TH), (TT)}
c) S = {t : t ≥ 0}
• A sample space can be
- countable (finite or infinite), or
- uncountable.
Counting Rules
In order to calculate probabilities, we have to know
• the number of elements of an event, and
• the number of elements of the sample space.
That is in order to judge what is probable, we have to know what is possible.
In order to determine the number of outcomes, one can use several rules of counting.
- The addition rule
- The multiplication rule
- Permutation rule
- Combination rule
To list the outcomes of a sequence of events, a useful device called a tree diagram is used.

Example: A student goes to the nearest snack bar to have breakfast. He can take tea, coffee, or milk with bread, cake, or a sandwich. How many possibilities does he have?

Solution:

Tea    → Bread, Cake, Sandwich
Coffee → Bread, Cake, Sandwich
Milk   → Bread, Cake, Sandwich

⇒ There are 3 × 3 = 9 possibilities.
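The tree diagram corresponds to a Cartesian product of the two choices; a Python sketch with `itertools.product` lists every branch.

```python
from itertools import product

drinks = ["Tea", "Coffee", "Milk"]
foods = ["Bread", "Cake", "Sandwich"]

# Each branch of the tree diagram is one (drink, food) pair
outcomes = list(product(drinks, foods))
for drink, food in outcomes:
    print(drink, "with", food)
print(len(outcomes))   # 9
```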

The Multiplication Rule:


If a choice consists of k steps, of which the first can be made in $n_1$ ways, the second in $n_2$ ways, …, and the kth in $n_k$ ways, then the whole choice can be made in $n_1 \times n_2 \times \cdots \times n_k$ ways.
Example: The digits 0, 1, 2, 3, and 4 are to be used in a 4-digit identification card. How many different cards are possible if
a) repetitions are permitted?
b) repetitions are not permitted?
Solutions
a)
1st digit  2nd digit  3rd digit  4th digit
5          5          5          5
There are four steps:
1. Selecting the 1st digit: this can be made in 5 ways.
2. Selecting the 2nd digit: this can be made in 5 ways.
3. Selecting the 3rd digit: this can be made in 5 ways.
4. Selecting the 4th digit: this can be made in 5 ways.

⇒ 5 × 5 × 5 × 5 = 625 different cards are possible.

b)
1st digit  2nd digit  3rd digit  4th digit
5          4          3          2

There are four steps:
1. Selecting the 1st digit: this can be made in 5 ways.
2. Selecting the 2nd digit: this can be made in 4 ways.
3. Selecting the 3rd digit: this can be made in 3 ways.
4. Selecting the 4th digit: this can be made in 2 ways.

⇒ 5 × 4 × 3 × 2 = 120 different cards are possible.
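Both counts can be confirmed by brute-force enumeration in Python, assuming (as the solution does) that 0 is allowed as the first digit.

```python
from itertools import permutations, product

digits = [0, 1, 2, 3, 4]

with_rep = list(product(digits, repeat=4))   # a) repetitions permitted
without_rep = list(permutations(digits, 4))  # b) no digit used twice

print(len(with_rep), len(without_rep))   # 625 120
```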

Permutation
An arrangement of n objects in a specified order is called a permutation of the objects.
Permutation Rules:
1. The number of permutations of n distinct objects taken all together is n!,
where $n! = n \times (n-1) \times (n-2) \times \cdots \times 3 \times 2 \times 1$.
2. The arrangement of n objects in a specified order using r objects at a time is called the permutation of n objects taken r at a time. It is written as ${}_nP_r$ and the formula is
$${}_nP_r = \frac{n!}{(n - r)!}$$
3. The number of permutations of n objects in which $k_1$ are alike, $k_2$ are alike, etc. is

$$\frac{n!}{k_1! \, k_2! \cdots k_m!}, \qquad k_1 + k_2 + \cdots + k_m = n$$

Example:
1. Suppose we have the letters A, B, C, D.
a) How many permutations are there taking all four?
b) How many permutations are there if two letters are used at a time?
2. How many different permutations can be made from the letters in the word “CORRECTION”?
Solutions:
1. a) Here n = 4; there are four distinct objects.
⇒ There are 4! = 24 permutations.
b) Here n = 4, r = 2.
⇒ There are ${}_4P_2 = \frac{4!}{(4-2)!} = \frac{24}{2} = 12$ permutations.
2. Here n = 10,
of which 2 are C, 2 are O, 2 are R, 1 E, 1 T, 1 I, 1 N
⇒ $k_1 = 2, k_2 = 2, k_3 = 2, k_4 = k_5 = k_6 = k_7 = 1$.
Using the 3rd rule of permutation, there are

$$\frac{10!}{2! \cdot 2! \cdot 2! \cdot 1! \cdot 1! \cdot 1! \cdot 1!} = 453600 \text{ permutations.}$$
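A Python sketch checking the three permutation rules (Python 3.8+ for `math.perm`); `distinct_arrangements` is an illustrative helper that applies rule 3 to the letter counts of a word.

```python
from collections import Counter
from math import factorial, perm


def distinct_arrangements(word):
    """Rule 3: n! / (k1! k2! ... km!) using the multiplicity of each letter."""
    result = factorial(len(word))
    for count in Counter(word).values():
        result //= factorial(count)   # each division is exact
    return result


print(factorial(4))                          # 24
print(perm(4, 2))                            # 12
print(distinct_arrangements("CORRECTION"))   # 453600
```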
Exercises:
1. Six different statistics books, seven different physics books, and three different economics books are arranged on a shelf. How many different arrangements are possible if
i. the books in each particular subject must all stand together?
ii. only the statistics books must stand together?
2. If a permutation of the word WHITE is selected at random, how many of the permutations
i. begin with a consonant?
ii. end with a vowel?
iii. have consonants and vowels alternating?
Combination
A selection of objects without regard to order is called a combination.

Example: Given the letters A, B, C, and D, list the permutations and combinations for selecting two letters.
Solutions:
Permutation         Combination
AB  BA  CA  DA      AB  BC
AC  BC  CB  DB      AC  BD
AD  BD  CD  DC      AD  CD
Note that in a permutation AB is different from BA, but in a combination AB is the same as BA.
Combination Rule
The number of combinations of r objects selected from n objects is denoted by ${}_nC_r$ or $\binom{n}{r}$ and is given by the formula:

$$\binom{n}{r} = \frac{n!}{(n - r)! \, r!}$$
Examples:
1. In how many ways can a committee of 5 people be chosen out of 9 people?
Solution:
n = 9, r = 5

$$\binom{9}{5} = \frac{9!}{4! \, 5!} = 126 \text{ ways}$$
2. Among 15 clocks there are two defectives. In how many ways can an inspector choose three of the clocks for inspection so that:
a) there is no restriction?
b) none of the defective clocks is included?
c) only one of the defective clocks is included?
d) both of the defective clocks are included?
Solutions: n = 15, of which 2 are defective and 13 are non-defective; and r = 3.

a) If there is no restriction, we select three clocks from 15 clocks, which can be done in:
n = 15, r = 3

$$\binom{15}{3} = \frac{15!}{12! \, 3!} = 455 \text{ ways}$$

b) None of the defective clocks is included.
This is equivalent to zero defective and three non-defective, which can be done in:

$$\binom{2}{0} \binom{13}{3} = 286 \text{ ways}$$

c) Only one of the defective clocks is included.
This is equivalent to one defective and two non-defective, which can be done in:

$$\binom{2}{1} \binom{13}{2} = 156 \text{ ways}$$

d) Both of the defective clocks are included.
This is equivalent to two defective and one non-defective, which can be done in:

$$\binom{2}{2} \binom{13}{1} = 13 \text{ ways}$$
Exercises:
1. Out of 5 mathematicians and 7 statisticians, a committee consisting of 2 mathematicians and 3 statisticians is to be formed. In how many ways can this be done if
a) there is no restriction?
b) one particular statistician should be included?
c) two particular mathematicians cannot be included on the committee?
2. If 3 books are picked at random from a shelf containing 5 novels, 3 books of poems, and a dictionary, in how many ways can this be done if
a) there is no restriction?
b) the dictionary is selected?
c) 2 novels and 1 book of poems are selected?

Approaches to measuring Probability

There are four different conceptual approaches to the study of probability theory. These are:
• The classical approach.
• The frequentist approach.
• The axiomatic approach.
• The subjective approach.

The classical approach

This approach is used when:
- all outcomes are equally likely, and
- the total number of outcomes is finite, say N.
Definition: If a random experiment with N equally likely outcomes is conducted and, of these, $N_A$ outcomes are favorable to the event A, then the probability that event A occurs, denoted P(A), is defined as:

$$P(A) = \frac{N_A}{N} = \frac{\text{No. of outcomes favourable to } A}{\text{Total number of outcomes}} = \frac{n(A)}{n(S)}$$
Examples:

1. A fair die is tossed once. What is the probability of getting
a) the number 4?
b) an odd number?
c) an even number?
d) the number 8?
Solutions:
First identify the sample space, say S:
S = {1, 2, 3, 4, 5, 6} ⇒ N = n(S) = 6
a) Let A be the event of the number 4: A = {4} ⇒ $N_A = n(A) = 1$, so $P(A) = \frac{n(A)}{n(S)} = \frac{1}{6}$
b) Let A be the event of odd numbers: A = {1, 3, 5} ⇒ $N_A = n(A) = 3$, so $P(A) = \frac{3}{6} = 0.5$
c) Let A be the event of even numbers: A = {2, 4, 6} ⇒ $N_A = n(A) = 3$, so $P(A) = \frac{3}{6} = 0.5$
d) Let A be the event of the number 8: A = { } ⇒ $N_A = n(A) = 0$, so $P(A) = \frac{0}{6} = 0$
2. A box of 80 candles consists of 30 defective and 50 non-defective candles. If 10 of these candles are selected at random, what is the probability that
a) all will be defective?
b) 6 will be non-defective?
c) all will be non-defective?

Solutions:
Total selections: $N = n(S) = \binom{80}{10}$
a) Let A be the event that all will be defective.
Total ways in which A occurs: $N_A = n(A) = \binom{30}{10}\binom{50}{0}$

$$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{10}\binom{50}{0}}{\binom{80}{10}} \approx 0.00001825$$

b) Let A be the event that 6 will be non-defective (and hence 4 defective).
Total ways in which A occurs: $N_A = n(A) = \binom{30}{4}\binom{50}{6}$

$$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{4}\binom{50}{6}}{\binom{80}{10}} \approx 0.2645$$

c) Let A be the event that all will be non-defective.
Total ways in which A occurs: $N_A = n(A) = \binom{30}{0}\binom{50}{10}$

$$P(A) = \frac{n(A)}{n(S)} = \frac{\binom{30}{0}\binom{50}{10}}{\binom{80}{10}} \approx 0.00624$$
Exercises:
1. What is the probability that a waitress will refuse to serve alcoholic beverages to only three minors if she randomly checks the IDs of five students from among ten students, of which four are not of legal age?

2. If 3 books are picked at random from a shelf containing 5 novels, 3 books of poems, and a dictionary, what is the probability that
a) the dictionary is selected?
b) 2 novels and 1 book of poems are selected?
• Shortcoming of the classical approach:
This approach is not applicable when:
- the total number of outcomes is infinite, or
- outcomes are not equally likely.
The Frequentist Approach

This is based on the relative frequencies of outcomes belonging to an event.
Definition: The probability of an event A is the proportion of outcomes favorable to A in the long run, when the experiment is repeated under the same conditions:

$$P(A) = \lim_{N \to \infty} \frac{N_A}{N}$$

Example: Records show that 60 out of 100,000 bulbs produced are defective. What is the probability that a newly produced bulb is defective?

Solution: Let A be the event that the newly produced bulb is defective. Since N = 100,000 is large, the observed relative frequency is used as the estimate of the limit:

$$P(A) \approx \frac{N_A}{N} = \frac{60}{100{,}000} = 0.0006$$
Axiomatic Approach:
Let E be a random experiment and S be a sample space associated with E. With each event A we associate a real number, called the probability of A and denoted P(A), which satisfies the following properties, called the axioms or postulates of probability.
1. $P(A) \ge 0$
2. $P(S) = 1$, where S is the sure event.
3. If A and B are mutually exclusive events, the probability that one or the other occurs equals the sum of the two probabilities, i.e. $P(A \cup B) = P(A) + P(B)$.
4. If A and B are independent events, the probability that both will occur is the product of the two probabilities, i.e. $P(A \cap B) = P(A) \cdot P(B)$.
5. $P(A') = 1 - P(A)$
6. $0 \le P(A) \le 1$
7. $P(\emptyset) = 0$, where ∅ is the impossible event.
(Strictly, properties 1 to 3 are the axioms; properties 5 to 7 follow from them, and property 4 is the definition of independence.)
Remark: Venn diagrams can be used to solve probability problems.

[Figure: Venn diagrams showing A, A ∪ B, and A ∩ B]

In general, $p(A \cup B) = p(A) + p(B) - p(A \cap B)$.

Conditional probability and Independency


Conditional Events: If the occurrence of one event affects the probability of the other event, then the two events are conditional or dependent events.

Example: Suppose we have two red and three white balls in a bag.
1. Draw a ball with replacement.

Since the first drawn ball is replaced before the second draw, it doesn’t affect the second draw. For this reason A and B are independent. If we let
A = the event that the first draw is red ⇒ $p(A) = \frac{2}{5}$
B = the event that the second draw is red ⇒ $p(B) = \frac{2}{5}$
2. Draw a ball without replacement.

This is conditional because the first drawn ball is not replaced before the second draw, so it does affect the second draw. If we let
A = the event that the first draw is red ⇒ $p(A) = \frac{2}{5}$
B = the event that the second draw is red ⇒ $p(B) = ?$
The probability that the second draw is red given that the first draw is red is $p(B \mid A) = \frac{1}{4}$, since one red ball remains among the four balls left.

Conditional probability of an event

The conditional probability of an event A given that B has already occurred, denoted by $p(A \mid B)$, is

$$p(A \mid B) = \frac{p(A \cap B)}{p(B)}, \qquad p(B) \ne 0$$

Remark: (1) $p(A' \mid B) = 1 - p(A \mid B)$
(2) $p(B' \mid A) = 1 - p(B \mid A)$
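As a sketch, the definition can be verified on the red/white ball example by listing all ordered draws without replacement. Labelling the balls R1, R2, W1, W2, W3 is an assumption made only so that the balls are distinguishable.

```python
from fractions import Fraction
from itertools import permutations

balls = ["R1", "R2", "W1", "W2", "W3"]   # two red, three white
draws = list(permutations(balls, 2))     # 20 equally likely ordered draws


def P(event):
    """Classical probability over the ordered draws."""
    return Fraction(sum(1 for d in draws if event(d)), len(draws))


def first_red(d):
    return d[0].startswith("R")


def second_red(d):
    return d[1].startswith("R")


p_a = P(first_red)
p_ab = P(lambda d: first_red(d) and second_red(d))
p_b_given_a = p_ab / p_a   # definition: p(B | A) = p(A and B) / p(A)

print(p_a, p_b_given_a, P(second_red))   # 2/5 1/4 2/5
```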

Examples
1. For a student enrolling as a freshman at a certain university, the probability is 0.25 that he/she will get a scholarship and 0.75 that he/she will graduate. The probability is 0.2 that he/she will get a scholarship and will also graduate. What is the probability that a student who gets a scholarship will graduate?

Solution: Let A = the event that a student will get a scholarship
B = the event that a student will graduate
given p ( A)  0.25, p ( B)  0.75, p A  B   0.20


Re quired pB A
p A  B  0.20
p  B A    0.80
p  A 0.25
2. If the probability that a research project will be well planned is 0.60 and the probability that it will be well planned and well executed is 0.54, what is the probability that it will be well executed given that it is well planned?
Solution: Let A = the event that a research project will be well planned
B = the event that a research project will be well executed
Given $p(A) = 0.60$, $p(A \cap B) = 0.54$
Required: $p(B \mid A)$

$$p(B \mid A) = \frac{p(A \cap B)}{p(A)} = \frac{0.54}{0.60} = 0.90$$
Exercise: A lot consists of 20 defectives and 80 non-defective items from which two
items are chosen without replacement. Events A & B are defined as A = the first item
chosen is defective, B = the second item chosen is defective
a) What is the probability that both items are defective?
b) What is the probability that the second item is defective?

Note: for any two events A and B the following relation holds:

$$p(B) = p(B \mid A)\,p(A) + p(B \mid A')\,p(A')$$

Probability of Independent Events
Two events A and B are independent if and only if $p(A \cap B) = p(A)\,p(B)$.

Here $p(A \mid B) = p(A)$ and $p(B \mid A) = p(B)$.
Example: A box contains four black and six white balls. What is the probability of getting two black balls when drawing one after the other under the following conditions?
a. The first ball drawn is not replaced.
b. The first ball drawn is replaced.
Solution: Let A = the first drawn ball is black
B = the second drawn ball is black
Required: $p(A \cap B)$
a. $p(A \cap B) = p(B \mid A)\,p(A) = \frac{3}{9} \cdot \frac{4}{10} = \frac{2}{15}$
b. $p(A \cap B) = p(A)\,p(B) = \frac{4}{10} \cdot \frac{4}{10} = \frac{4}{25}$
CHAPTER 6
RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
Definition: A random variable is a numerical description of the outcomes of the experiment or
a numerical valued function defined on sample space, usually denoted by capital letters.
Example: If X is a random variable, then it is a function from the elements of the sample space
to the set of real numbers. i.e. X is a function X: S  R
A random variable takes a possible outcome and assigns a number to it.
Example: Flip a coin three times and let X be the number of heads in the three tosses.
$S = \{HHH, HHT, HTH, HTT, THH, THT, TTH, TTT\}$
$X(HHH) = 3$
$X(HHT) = X(HTH) = X(THH) = 2$
$X(HTT) = X(THT) = X(TTH) = 1$
$X(TTT) = 0$
Hence X takes the values {0, 1, 2, 3}, each with a certain probability.
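The mapping above can be sketched in Python by treating X literally as a function on the sample space. The names `S` and `X` mirror the notation of the example; the code is only an illustration.

```python
from itertools import product

# Sample space S for three coin tosses; each outcome is a tuple such as ("H", "H", "T").
S = list(product("HT", repeat=3))

def X(outcome):
    """The random variable X: S -> R, the number of heads in the outcome."""
    return outcome.count("H")

for s in S:
    print("".join(s), X(s))

print(sorted({X(s) for s in S}))  # [0, 1, 2, 3]
```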
Random variables are of two types:
1. Discrete random variables can assume only a finite (or countably infinite) number of
values; their values can be counted.
Examples:
- Toss a coin n times and count the number of heads.
- Number of children in a family.
- Number of car accidents per week.
- Number of defective items produced by a given company.
- Number of bacteria per two cubic centimeters of water.
2. Continuous random variables can assume any value in an interval, i.e., all values between
any two given values.
Examples:
- Height of students at a certain college.
- Mark of a student.
- Lifetime of light bulbs.
- Length of time required to complete a given training.

Probability Distribution
Definition: A probability distribution consists of the values that a random variable can assume and
the corresponding probabilities of those values.
Example: Consider the experiment of tossing a coin three times. Let X be the number of heads.
Construct the probability distribution of X.
Solution:
- First, identify the possible values that X can assume.
- Then calculate the probability of each distinct value of X and present the results in the
form of a frequency distribution (table).

x:         0     1     2     3
P(X = x):  1/8   3/8   3/8   1/8

Note: The probability distribution is denoted by P for a discrete random variable and by f for a
continuous random variable.
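The coin-tossing table can be reproduced by counting outcomes, assuming (as the example implicitly does) a fair coin, so that all 8 outcomes are equally likely. This is a minimal Python sketch using only the standard library.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Equally likely outcomes of three tosses of a fair coin.
outcomes = list(product("HT", repeat=3))

# Count how many outcomes give each value of X = number of heads,
# then divide by |S| = 8 to get P(X = x).
counts = Counter(o.count("H") for o in outcomes)
dist = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in dist.items():
    print(x, p)
# 0 1/8
# 1 3/8
# 2 3/8
# 3 1/8
```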
Properties of Probability Distribution:
1. $P(x) \ge 0$ if X is discrete; $f(x) \ge 0$ if X is continuous.
2. $\sum_{x} P(X = x) = 1$ if X is discrete; $\int_{-\infty}^{\infty} f(x)\,dx = 1$ if X is continuous.
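A small helper that checks the two properties for a discrete distribution stored as a Python dictionary `{value: probability}`. The function name `is_valid_pmf` and the tolerance `tol` (needed because decimal probabilities such as 0.1 are not exact in floating point) are assumptions of this sketch. The donation distribution is the one used in the expectation examples of this chapter.

```python
from fractions import Fraction

def is_valid_pmf(pmf, tol=1e-9):
    """Property 1: every P(x) >= 0. Property 2: the probabilities sum to 1 (within tol)."""
    probs = list(pmf.values())
    return all(p >= 0 for p in probs) and abs(sum(probs) - 1) <= tol

coin = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
donation = {1: 0.1, 2: 0.2, 5: 0.3, 10: 0.2, 15: 0.15, 20: 0.05}
bad = {0: 0.5, 1: 0.6}  # sums to 1.1, so it is not a probability distribution

print(is_valid_pmf(coin), is_valid_pmf(donation), is_valid_pmf(bad))  # True True False
```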

Note:
1. If X is a continuous random variable, then
$P(a \le X \le b) = \int_a^b f(x)\,dx$
2. The probability that a continuous random variable takes any single fixed value is zero.


Consequently, $P(a < X < b) = P(a \le X < b) = P(a < X \le b) = P(a \le X \le b)$.


3. If X is a discrete random variable taking integer values (and a, b are integers), then
$P(a < X < b) = \sum_{x=a+1}^{b-1} P(x)$
$P(a \le X < b) = \sum_{x=a}^{b-1} P(x)$
$P(a < X \le b) = \sum_{x=a+1}^{b} P(x)$
$P(a \le X \le b) = \sum_{x=a}^{b} P(x)$
4. For a continuous random variable, probability corresponds to area under the density curve.
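The four summation rules in Note 3 differ only in whether each endpoint is included. A sketch, assuming an integer-valued X stored as `{value: probability}`; `prob_between` and its flags are hypothetical names chosen for illustration. It uses the distribution of the number of heads in three tosses of a fair coin.

```python
from fractions import Fraction

# Distribution of X = number of heads in three tosses of a fair coin.
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def prob_between(pmf, a, b, left_closed=False, right_closed=False):
    """P(a < X < b) by default; set a flag to include that endpoint.
    Assumes X takes integer values, matching the summation limits in Note 3."""
    lo = a if left_closed else a + 1
    hi = b if right_closed else b - 1
    return sum((pmf.get(x, 0) for x in range(lo, hi + 1)), Fraction(0))

print(prob_between(pmf, 0, 3))                                       # P(0 < X < 3): 3/4
print(prob_between(pmf, 0, 3, left_closed=True))                     # P(0 <= X < 3): 7/8
print(prob_between(pmf, 0, 3, right_closed=True))                    # P(0 < X <= 3): 7/8
print(prob_between(pmf, 0, 3, left_closed=True, right_closed=True))  # P(0 <= X <= 3): 1
```

Unlike the continuous case, including or excluding an endpoint changes the answer here, because each integer value carries positive probability.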

Introduction to expectation
Definition:
1. Let a discrete random variable X assume the values $X_1, X_2, \ldots, X_n$ with probabilities
$P(X_1), P(X_2), \ldots, P(X_n)$, respectively. Then the expected value of X, denoted E(X), is
defined as:
$E(X) = X_1 P(X_1) + X_2 P(X_2) + \cdots + X_n P(X_n) = \sum_{i=1}^{n} X_i P(X_i)$
2. Let X be a continuous random variable assuming values in the interval (a, b) such that
$\int_a^b f(x)\,dx = 1$. Then $E(X) = \int_a^b x\, f(x)\,dx$.

Examples:
1. A coin is tossed three times and X is the number of heads. What is the expected value
of X?
Solution:
First, construct the probability distribution of X:

x:         0     1     2     3
P(X = x):  1/8   3/8   3/8   1/8

$E(X) = X_1 P(X_1) + X_2 P(X_2) + \cdots + X_n P(X_n)$
$= 0 \cdot \frac{1}{8} + 1 \cdot \frac{3}{8} + 2 \cdot \frac{3}{8} + 3 \cdot \frac{1}{8} = \frac{12}{8}$
$= 1.5$
2. Suppose a charity organization is mailing printed return-address stickers to over
one million homes in Ethiopia. Each recipient is asked to donate either $1, $2, $5,
$10, $15, or $20. Based on past experience, the amount a person donates is believed
to follow the following probability distribution:

x:         $1    $2    $5    $10   $15   $20
P(X = x):  0.1   0.2   0.3   0.2   0.15  0.05

How much is an average donor expected to contribute?


Solution:

x:           $1    $2    $5    $10   $15   $20   Total
P(X = x):    0.1   0.2   0.3   0.2   0.15  0.05  1
x P(X = x):  0.1   0.4   1.5   2     2.25  1     7.25

$E(X) = \sum_{i=1}^{6} x_i P(X = x_i) = \$7.25$
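Both expected values above are sums of x P(X = x), which the following sketch computes for any discrete distribution given as `{value: probability}`. `expected_value` is an illustrative name; `Fraction` keeps the arithmetic exact so the results match the hand calculations.

```python
from fractions import Fraction

def expected_value(pmf):
    """E(X) = sum over all values x of x * P(X = x)."""
    return sum(x * p for x, p in pmf.items())

# Example 1: number of heads in three tosses of a fair coin.
coin = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
# Example 2: donation amounts in dollars, with probabilities as exact decimals.
donation = {1: Fraction("0.1"), 2: Fraction("0.2"), 5: Fraction("0.3"),
            10: Fraction("0.2"), 15: Fraction("0.15"), 20: Fraction("0.05")}

print(float(expected_value(coin)))      # 1.5
print(float(expected_value(donation)))  # 7.25
```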

Mean and Variance of a random variable


Let X be a given random variable.
1. The expected value of X is its mean:
Mean of X = E(X)
2. The variance of X is given by:
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$


where
$E(X^2) = \sum_{i=1}^{n} x_i^2 P(X = x_i)$ if X is discrete, and
$E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$ if X is continuous.
Examples:
1. Find the mean and the variance of the random variable X in Example 2 above.
Solution:

x:              $1    $2    $5    $10   $15    $20   Total
P(X = x):       0.1   0.2   0.3   0.2   0.15   0.05  1
x P(X = x):     0.1   0.4   1.5   2     2.25   1     7.25
x² P(X = x):    0.1   0.8   7.5   20    33.75  20    82.15

$E(X) = 7.25$
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 82.15 - 7.25^2 = 82.15 - 52.5625 \approx 29.59$
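The table computation can be sketched in a few lines. The helper name `mean_and_variance` is illustrative; the probabilities are built with `Fraction("0.1")` and so on to avoid floating-point rounding noise, so the variance comes out as exactly 29.5875 (about 29.59).

```python
from fractions import Fraction

def mean_and_variance(pmf):
    """Return E(X) and Var(X) = E(X^2) - [E(X)]^2 for a discrete distribution."""
    ex = sum(x * p for x, p in pmf.items())         # E(X)
    ex2 = sum(x ** 2 * p for x, p in pmf.items())   # E(X^2)
    return ex, ex2 - ex ** 2

# Donation distribution from Example 2 (amounts in dollars).
donation = {x: Fraction(p) for x, p in
            {1: "0.1", 2: "0.2", 5: "0.3", 10: "0.2", 15: "0.15", 20: "0.05"}.items()}
mean, var = mean_and_variance(donation)
print(float(mean), float(var))  # 7.25 29.5875
```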
Exercise: Two dice are rolled. Let X be a random variable denoting the sum of the numbers
on the two dice.
i) Give the probability distribution of X.
ii) Compute the expected value of X and its variance.
There are some general rules for mathematical expectation.
Let X and Y be random variables and k a constant.
RULE 1: $E(k) = k$
RULE 2: $\mathrm{Var}(k) = 0$
RULE 3: $E(kX) = k\,E(X)$
RULE 4: $\mathrm{Var}(kX) = k^2\,\mathrm{Var}(X)$
RULE 5: $E(X + Y) = E(X) + E(Y)$
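The rules can be checked numerically on a small distribution. This sketch represents a discrete random variable as `{value: probability}`; `E`, `Var`, and `scale` are hypothetical helper names. Rule 5 is checked on the joint distribution of two independent coin counts only for simplicity; the rule itself does not require independence.

```python
from fractions import Fraction

def E(pmf):
    """Expected value of a discrete distribution {value: probability}."""
    return sum(x * p for x, p in pmf.items())

def Var(pmf):
    """Var(X) = E(X^2) - [E(X)]^2."""
    return sum(x ** 2 * p for x, p in pmf.items()) - E(pmf) ** 2

def scale(pmf, k):
    """Distribution of kX; assumes k != 0 so that distinct values stay distinct."""
    return {k * x: p for x, p in pmf.items()}

coin = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
k = 4
constant = {k: Fraction(1)}  # the constant k viewed as a degenerate random variable

# Each of the following lines prints True.
print(E(constant) == k, Var(constant) == 0)       # Rules 1 and 2
print(E(scale(coin, k)) == k * E(coin))           # Rule 3
print(Var(scale(coin, k)) == k ** 2 * Var(coin))  # Rule 4

# Rule 5: joint distribution of two independent coin counts X and Y.
joint = {(x, y): px * py for x, px in coin.items() for y, py in coin.items()}
e_sum = sum((x + y) * p for (x, y), p in joint.items())
print(e_sum == E(coin) + E(coin))                 # Rule 5
```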
Common Discrete Probability Distributions


1. Binomial Distribution
A binomial experiment is a probability experiment that satisfies the following four
requirements, called the assumptions of a binomial distribution:
1. The experiment consists of n identical trials.
2. Each trial has exactly two possible, mutually exclusive outcomes: success or failure.
3. The probability of each outcome does not change from trial to trial.
4. The trials are independent; thus, when sampling from a finite population, we must sample
with replacement.

Examples of binomial experiments:
- Tossing a coin 20 times to see how many tails occur.
- Asking 200 people if they watch BBC news.
- Classifying each newly produced item as defective or non-defective.
- Asking 100 people if they favor the ruling party.
- Rolling a die to see if a 5 appears.
Definition: The outcomes of a binomial experiment and the corresponding probabilities of these
outcomes are called the binomial distribution.
Let p = the probability of success and
q = 1 - p = the probability of failure on any given trial.
Then the probability of getting x successes in n trials is:
$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n$
This is sometimes written as $X \sim \mathrm{Bin}(n, p)$.
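The formula translates directly into code. A minimal sketch using `math.comb` (Python 3.8 or later) for the binomial coefficient; `binom_pmf` is an illustrative name. The usage lines evaluate P(X = 3) for n = 4 and p = 0.5, the same case as Example 1 below, and confirm that the probabilities sum to 1.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p): C(n, x) * p**x * q**(n - x), with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

print(binom_pmf(3, 4, 0.5))                         # 0.25
print(sum(binom_pmf(x, 4, 0.5) for x in range(5)))  # 1.0
```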
When using the binomial formula to solve problems, we have to identify three things:
- The number of trials (n),
- The probability of a success on any one trial (p), and
- The number of successes desired (x).
Examples:
1. What is the probability of getting three heads when tossing a fair coin four times?
Solution: Let X be the number of heads in tossing a fair coin four times, so
$X \sim \mathrm{Bin}(n = 4, p = 0.50)$.
$P(X = x) = \binom{n}{x} p^x q^{n-x} = \binom{4}{x} 0.5^x\, 0.5^{4-x} = \binom{4}{x} 0.5^4, \quad x = 0, 1, 2, 3, 4$
$P(X = 3) = \binom{4}{3} 0.5^4 = 4 \times 0.0625 = 0.25$
2. Suppose that an examination consists of six true-false questions, and assume that a student
has no knowledge of the subject matter. The probability that the student will guess the correct
answer to the first question is 30%. Likewise, the probability of guessing each of the remaining
questions correctly is also 30%.
a) What is the probability of getting more than three correct answers?
b) What is the probability of getting at least two correct answers?
c) What is the probability of getting at most three correct answers?
d) What is the probability of getting less than five correct answers?
Solution: Let X = the number of correct answers that the student gets, so
$X \sim \mathrm{Bin}(n = 6, p = 0.30)$.
a) $P(X > 3) = ?$
$P(X = x) = \binom{6}{x} 0.3^x\, 0.7^{6-x}, \quad x = 0, 1, 2, \ldots, 6$
$P(X > 3) = P(X = 4) + P(X = 5) + P(X = 6)$
$= 0.060 + 0.010 + 0.001$
$= 0.071$
Thus, if the student guesses each answer with a 30% chance of being correct, the probability
that more than three of the questions are answered correctly is 0.071 (or 7.1%).
b) $P(X \ge 2) = ?$
$P(X \ge 2) = P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5) + P(X = 6)$
$= 0.324 + 0.185 + 0.060 + 0.010 + 0.001$
$= 0.58$
c) $P(X \le 3) = ?$
$P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$
$= 0.118 + 0.303 + 0.324 + 0.185$
$= 0.93$
d) $P(X < 5) = ?$
$P(X < 5) = 1 - P(X \ge 5)$
$= 1 - [P(X = 5) + P(X = 6)]$
$= 1 - (0.010 + 0.001)$
$= 0.989$
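The four answers can be reproduced from the binomial formula. This is a sketch (Python 3.8+ for `math.comb`; names are illustrative). Because it does not round the individual terms, it gives slightly more precise values than the hand calculation: about 0.0705, 0.5798, 0.9295, and 0.9891.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 6, 0.30
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]  # pmf[x] = P(X = x)

more_than_3 = sum(pmf[4:])      # a) P(X > 3)
at_least_2 = sum(pmf[2:])       # b) P(X >= 2)
at_most_3 = sum(pmf[:4])        # c) P(X <= 3)
less_than_5 = 1 - sum(pmf[5:])  # d) P(X < 5), via the complement

for label, value in [("P(X > 3)", more_than_3), ("P(X >= 2)", at_least_2),
                     ("P(X <= 3)", at_most_3), ("P(X < 5)", less_than_5)]:
    print(label, round(value, 4))
```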
Exercises:
a. Suppose that 4% of all TVs made by A&B Company in 2000 are defective. If eight of
these TVs are randomly selected from across the country and tested, what is the probability
that exactly three of them are defective? Assume that each TV is made independently of
the others.
b. An allergist claims that 45% of the patients she tests are allergic to some type of weed.
What is the probability that
I. Exactly 3 of her next 4 patients are allergic to weeds?
II. None of her next 4 patients are allergic to weeds?
c. Explain why the following experiments are not Binomial
I. Rolling a die until a 6 appears.
II. Asking 20 people how old they are.
III. Drawing 5 cards from a deck for a poker hand.
Remark: If X is a binomial random variable with parameters n and p, then
$E(X) = np, \quad \mathrm{Var}(X) = npq$
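As a check of the remark, the mean and variance can be computed directly from the distribution and compared with np and npq. This sketch uses the exam example (n = 6, p = 0.30); the helper names are illustrative, and the results are rounded because floating-point products such as 6 * 0.3 are not exact.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binom_mean_var(n, p):
    """Mean and variance computed from the distribution itself."""
    xs = range(n + 1)
    mean = sum(x * binom_pmf(x, n, p) for x in xs)
    var = sum(x ** 2 * binom_pmf(x, n, p) for x in xs) - mean ** 2
    return mean, var

n, p = 6, 0.30
q = 1 - p
mean, var = binom_mean_var(n, p)
print(round(mean, 6), round(n * p, 6))     # 1.8 1.8
print(round(var, 6), round(n * p * q, 6))  # 1.26 1.26
```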
2. Poisson Distribution
A random variable X is said to have a Poisson distribution if its probability distribution is
given by:
$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots$
where $\lambda$ = the average number of occurrences.
The Poisson distribution depends only on the average number of occurrences per unit of time
or space.
The Poisson distribution is used as a distribution of rare events, such as arrivals,
accidents, numbers of misprints, hereditary events, and natural disasters like earthquakes.
The process that gives rise to such events is called a Poisson process.

Example: If 1.6 accidents can be expected at an intersection on any given day, what is the
probability that there will be 3 accidents on any given day?
Solution: Let X = the number of accidents, with $\lambda = 1.6$.
$X \sim \mathrm{Poisson}(1.6) \Rightarrow P(X = x) = \frac{1.6^x e^{-1.6}}{x!}$
$P(X = 3) = \frac{1.6^3 e^{-1.6}}{3!} \approx 0.1378$
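A direct translation of the Poisson formula, as a minimal sketch; `poisson_pmf` is an illustrative name. It gives about 0.1378 for the accident example.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = lam**x * e**(-lam) / x!  for X ~ Poisson(lam)."""
    return lam ** x * exp(-lam) / factorial(x)

print(round(poisson_pmf(3, 1.6), 4))  # 0.1378
```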
Exercise: On average, five smokers pass a certain street corner every ten minutes.
What is the probability that during a given 10 minutes the number of smokers passing will
be
a. 6 or fewer?
b. 7 or more?
c. Exactly 8?
If X is a Poisson random variable with parameter $\lambda$, then
$E(X) = \lambda, \quad \mathrm{Var}(X) = \lambda$
Note: The Poisson probability distribution provides a close approximation to the binomial
probability distribution when n is large and p is quite small or quite large, with $\lambda = np$:
$P(X = x) = \frac{(np)^x e^{-np}}{x!}, \quad x = 0, 1, 2, \ldots$
where $\lambda = np$ = the average number.
Usually we use this approximation if $np \le 5$. More precisely, if $n \ge 20$ and $np \le 5$
[or $n(1 - p) \le 5$], then we may use the Poisson distribution as an approximation to the
binomial distribution.
Example: Find the binomial probability P(X = 3) by using the Poisson distribution if p = 0.01
and n = 200.
Solution:
Using Poisson, $\lambda = np = 0.01 \times 200 = 2$:
$P(X = 3) = \frac{2^3 e^{-2}}{3!} \approx 0.1804$
Using Binomial, n = 200, p = 0.01:
$P(X = 3) = \binom{200}{3} (0.01)^3 (0.99)^{197} \approx 0.1814$
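To see how close the approximation is beyond the single value x = 3, this sketch tabulates both distributions side by side for n = 200 and p = 0.01. The function names are illustrative; the x = 3 row should agree with the values above (about 0.1814 for the binomial and 0.1804 for the Poisson).

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    """Exact binomial probability P(X = x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(x, lam):
    """Poisson probability P(X = x) with mean lam."""
    return lam ** x * exp(-lam) / factorial(x)

n, p = 200, 0.01
lam = n * p  # lambda = np = 2
print("x  binomial  Poisson")
for x in range(7):
    print(f"{x}  {binom_pmf(x, n, p):.4f}    {poisson_pmf(x, lam):.4f}")
```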

Common Continuous Probability Distributions


1. Normal Distribution
A random variable X is said to have a normal distribution if its probability density function is
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad -\infty < x < \infty,\ -\infty < \mu < \infty,\ \sigma > 0$
where $\mu = E(X)$ and $\sigma^2 = \mathrm{Var}(X)$.
$\mu$ and $\sigma^2$ are the parameters of the normal distribution.
Properties of Normal Distribution:
1. It is bell-shaped, symmetrical about its mean, and mesokurtic. The maximum ordinate is at
$x = \mu$ and is given by $f(\mu) = \frac{1}{\sigma\sqrt{2\pi}}$.
2. It is asymptotic to the horizontal axis, i.e., it extends indefinitely in either direction from the mean.
3. It is a continuous distribution.
4. It is a family of curves, i.e., every unique pair of mean and standard deviation defines a
different normal distribution. Thus, the normal distribution is completely described by two
parameters: mean and standard deviation.
5. The total area under the curve is 1, i.e., $\int_{-\infty}^{\infty} f(x)\,dx = 1$; by symmetry,
the area on each side of the mean is 0.5.
6. It is unimodal, i.e., values mound up only in the center of the curve.
7. Mean = Median = Mode = $\mu$.
8. The probability that a random variable will have a value between any two points is equal to
the area under the curve between those points.
Note: To facilitate the use of the normal distribution, the following distribution, known as the
standard normal distribution, was derived by using the transformation
$Z = \frac{X - \mu}{\sigma}, \qquad f(z) = \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} z^2}, \quad -\infty < z < \infty$

Properties of the Standard Normal Distribution:


- It has the same properties as a normal distribution, with mean 0, variance 1, and standard
deviation 1.
- Areas under the standard normal distribution curve have been tabulated in various ways. The
most common tables give the areas between Z = 0 and a positive value of Z.
- Given a normally distributed random variable X with mean $\mu$ and standard deviation $\sigma$,
$P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{b - \mu}{\sigma}\right)$
$\Rightarrow P(a < X < b) = P\left(\frac{a - \mu}{\sigma} < Z < \frac{b - \mu}{\sigma}\right)$
Note:
$P(a < X < b) = P(a \le X < b) = P(a < X \le b) = P(a \le X \le b)$
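Instead of a printed Z table, areas can be computed with the standard normal CDF. This sketch uses Python's `statistics.NormalDist` (Python 3.8+); `area_0_to` and `normal_prob` are hypothetical helper names. Results can differ from table-based answers in the fourth decimal because table entries are rounded: for example, the exact value of P(81.2 < X < 86.0) for mean 80 and standard deviation 4.8 is about 0.2956.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_0_to(z):
    """Table-style area P(0 < Z < z) for z >= 0."""
    return Z.cdf(z) - 0.5

def normal_prob(a, b, mu, sigma):
    """P(a < X < b) for X ~ N(mu, sigma^2), by standardizing to Z = (X - mu) / sigma."""
    return Z.cdf((b - mu) / sigma) - Z.cdf((a - mu) / sigma)

print(round(area_0_to(0.96), 4))                            # 0.3315
print(round(normal_prob(float("-inf"), 87.2, 80, 4.8), 4))  # 0.9332
print(round(normal_prob(81.2, 86.0, 80, 4.8), 4))           # 0.2956
```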
Examples:
1. Find the area under the standard normal distribution which lies
a) Between Z = 0 and Z = 0.96
Solution:
Area = $P(0 < Z < 0.96) = 0.3315$
b) Between Z = -1.45 and Z = 0
Solution:
Area = $P(-1.45 < Z < 0)$
$= P(0 < Z < 1.45)$
$= 0.4265$
c) To the right of Z = -0.35
Solution:
Area = $P(Z > -0.35)$
$= P(-0.35 < Z < 0) + P(Z > 0)$
$= P(0 < Z < 0.35) + P(Z > 0)$
$= 0.1368 + 0.50 = 0.6368$
d) To the left of Z = -0.35
Solution:
Area = $P(Z < -0.35)$
$= 1 - P(Z > -0.35)$
$= 1 - 0.6368 = 0.3632$
e) Between Z = -0.67 and Z = 0.75
Solution:
Area = $P(-0.67 < Z < 0.75)$
$= P(-0.67 < Z < 0) + P(0 < Z < 0.75)$
$= P(0 < Z < 0.67) + P(0 < Z < 0.75)$
$= 0.2486 + 0.2734 = 0.5220$
f) Between Z = 0.25 and Z = 1.25
Solution:
Area = $P(0.25 < Z < 1.25)$
$= P(0 < Z < 1.25) - P(0 < Z < 0.25)$
$= 0.3944 - 0.0987 = 0.2957$
2. Find the value of z if
a) The normal curve area between 0 and z (positive) is 0.4726
Solution
$P(0 < Z < z) = 0.4726$, and from the table
$P(0 < Z < 1.92) = 0.4726$
$\Rightarrow z = 1.92$ (by the uniqueness of the area).
b) The area to the left of z is 0.9868
Solution
$P(Z < z) = 0.9868$
$= P(Z < 0) + P(0 < Z < z)$
$= 0.50 + P(0 < Z < z)$
$\Rightarrow P(0 < Z < z) = 0.9868 - 0.50 = 0.4868$
and from the table
$P(0 < Z < 2.22) = 0.4868$
$\Rightarrow z = 2.22$
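The reverse lookups in Example 2 amount to applying the inverse of the standard normal CDF. A sketch using `statistics.NormalDist.inv_cdf` (Python 3.8+); since `inv_cdf` expects the area to the left of z, part (a) first adds the 0.5 lying to the left of 0.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# (a) Area between 0 and z is 0.4726, so the area to the left of z is 0.5 + 0.4726.
z_a = Z.inv_cdf(0.5 + 0.4726)
# (b) Area to the left of z is 0.9868.
z_b = Z.inv_cdf(0.9868)

print(round(z_a, 2), round(z_b, 2))  # 1.92 2.22
```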

3. A random variable X has a normal distribution with mean 80 and standard deviation
4.8. What is the probability that it will take a value
a) Less than 87.2
b) Greater than 76.4
c) Between 81.2 and 86.0
Solution
X is normal with mean $\mu = 80$ and standard deviation $\sigma = 4.8$.
a)
$P(X < 87.2) = P\left(\frac{X - \mu}{\sigma} < \frac{87.2 - \mu}{\sigma}\right)$
$= P\left(Z < \frac{87.2 - 80}{4.8}\right)$
$= P(Z < 1.5)$
$= P(Z < 0) + P(0 < Z < 1.5)$
$= 0.50 + 0.4332 = 0.9332$

b)
$P(X > 76.4) = P\left(\frac{X - \mu}{\sigma} > \frac{76.4 - \mu}{\sigma}\right)$
$= P\left(Z > \frac{76.4 - 80}{4.8}\right)$
$= P(Z > -0.75)$
$= P(Z > 0) + P(0 < Z < 0.75)$
$= 0.50 + 0.2734 = 0.7734$
c)
$P(81.2 < X < 86.0) = P\left(\frac{81.2 - \mu}{\sigma} < \frac{X - \mu}{\sigma} < \frac{86.0 - \mu}{\sigma}\right)$
$= P\left(\frac{81.2 - 80}{4.8} < Z < \frac{86.0 - 80}{4.8}\right)$
$= P(0.25 < Z < 1.25)$
$= P(0 < Z < 1.25) - P(0 < Z < 0.25)$
$= 0.3944 - 0.0987 = 0.2957$
4. A normal distribution has mean 62.4. Find its standard deviation if 20.0% of the area
under the normal curve lies to the right of 72.9.
Solution
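$P(X > 72.9) = 0.20 \approx 0.2005$ (the nearest value available from the table), so
$P\left(\frac{X - \mu}{\sigma} > \frac{72.9 - \mu}{\sigma}\right) = 0.2005$
$\Rightarrow P\left(Z > \frac{72.9 - 62.4}{\sigma}\right) = 0.2005$
$\Rightarrow P\left(Z > \frac{10.5}{\sigma}\right) = 0.2005$
$\Rightarrow P\left(0 < Z < \frac{10.5}{\sigma}\right) = 0.50 - 0.2005 = 0.2995$
And from the table, $P(0 < Z < 0.84) = 0.2995$, so
$\frac{10.5}{\sigma} = 0.84$
$\Rightarrow \sigma = 12.5$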
5. A random variable has a normal distribution with $\sigma = 5$. Find its mean if the
probability that the random variable will assume a value less than 52.5 is 0.6915.
Solution
$P(Z < z) = P\left(Z < \frac{52.5 - \mu}{5}\right) = 0.6915$
$\Rightarrow P(0 < Z < z) = 0.6915 - 0.50 = 0.1915$.
But from the table,
$P(0 < Z < 0.5) = 0.1915$, so
$z = \frac{52.5 - \mu}{5} = 0.5$
$\Rightarrow \mu = 50$
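Examples 4 and 5 both solve z = (x - mu) / sigma for the unknown parameter. This sketch takes z from the inverse CDF in `statistics.NormalDist` (Python 3.8+) rather than from the table. Because it uses the exact 20% instead of the tabulated 0.2005, the standard deviation comes out near 12.48 rather than 12.5.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal

# Example 4: mu = 62.4 and 20% of the area lies to the right of 72.9.
z4 = Z.inv_cdf(1 - 0.20)     # z with P(Z > z) = 0.20
sigma = (72.9 - 62.4) / z4   # from (72.9 - mu) / sigma = z4

# Example 5: sigma = 5 and P(X < 52.5) = 0.6915.
z5 = Z.inv_cdf(0.6915)       # z with P(Z < z) = 0.6915
mu = 52.5 - 5 * z5           # from (52.5 - mu) / 5 = z5

print(round(z4, 4), round(sigma, 2))  # 0.8416 12.48
print(round(z5, 4), round(mu, 2))     # 0.5001 50.0
```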
Exercise: Of a large group of men, 5% are less than 60 inches in height and 40% are
between 60 and 65 inches. Assuming a normal distribution, find the mean and standard
deviation of heights.
