The sample covariance between X and Y is denoted by $S_{XY}$ and given by
$$S_{XY} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})(Y_i - \bar{Y})}{n-1} = \frac{\sum X_i Y_i - n\bar{X}\bar{Y}}{n-1}$$
Correlation Analysis: deals with the measurement of the closeness of the relationships that are described in the regression equation.
We say there is correlation between two series of items $X = (X_1, X_2, \ldots, X_n)$ and $Y = (Y_1, Y_2, \ldots, Y_n)$ if the two series vary together, directly or inversely.
When higher values of X are associated with higher values of Y and lower values of X are
associated with lower values of Y, then the correlation is said to be positive or direct.
Examples:
- Income and expenditure
- Number of hours spent in studying and the score obtained
- Height and weight
- Distance covered and fuel consumed by car.
When higher values of X are associated with lower values of Y and lower values of X are
associated with higher values of Y, then the correlation is said to be negative or inverse.
Examples:
- Demand and supply
- Income and the proportion of income spent on food.
The correlation between X and Y may be one of the following:
1. Perfect positive (r = 1)
2. Positive (r between 0 and 1)
3. No correlation (r = 0)
4. Negative (r between -1 and 0)
5. Perfect negative (r = -1)
The presence of correlation between two variables may be due to three reasons:
1. One variable being the cause of the other. The cause is called “subject” or “independent”
variable, while the effect is called “dependent” variable.
2. Both variables being the result of a common cause. That is, the correlation that exists between the two variables is due to their both being related to some third factor.
Example:
Let X1 = ESLCE result
Y1 = rate of surviving in the University
Y2 = rate of getting a scholarship.
Both X1 & Y1 and X1 & Y2 have high positive correlation; likewise Y1 & Y2 have positive correlation, but they are not directly related: they are related to each other via X1.
3. Chance: The correlation that arises by chance is called spurious correlation.
Examples:
Price of teff in Addis Ababa and grades of students in the USA.
Weight of individuals in Ethiopia and income of individuals in Kenya.
Therefore, while interpreting a correlation coefficient, it is necessary to consider whether there is any plausible relationship between the variables under study.
The correlation coefficient between X and Y, denoted by r, is given by
$$r = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum (X_i - \bar{X})^2 \sum (Y_i - \bar{Y})^2}}$$
and the shortcut formulas are
$$r = \frac{n\sum XY - (\sum X)(\sum Y)}{\sqrt{[n\sum X^2 - (\sum X)^2]\,[n\sum Y^2 - (\sum Y)^2]}}$$
$$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{[\sum X^2 - n\bar{X}^2]\,[\sum Y^2 - n\bar{Y}^2]}}$$
Remark: r always lies between -1 and 1 inclusive, and it is symmetric: the correlation of X with Y equals the correlation of Y with X.
Interpretation of r
1. Perfect positive linear relationship (if r = 1)
2. Some positive linear relationship (if r is between 0 and 1)
3. No linear relationship (if r = 0)
4. Some negative linear relationship (if r is between -1 and 0)
5. Perfect negative linear relationship (if r = -1)
Example: The mid semester exam scores (X) and final exam scores (Y) of 10 students are given below. Calculate the simple correlation coefficient.

Student   (X)   (Y)
1         31    31
2         23    29
3         41    34
4         32    35
5         29    25
6         33    35
7         28    33
8         31    42
9         31    31
10        33    34
Solution:
$$r = \frac{\sum XY - n\bar{X}\bar{Y}}{\sqrt{[\sum X^2 - n\bar{X}^2]\,[\sum Y^2 - n\bar{Y}^2]}}
= \frac{10331 - 10(31.2)(32.9)}{\sqrt{(9920 - 10(973.4))\,(11003 - 10(1082.4))}}
= \frac{66.2}{182.5} = 0.363$$
This means the mid semester exam and final exam scores have a weak positive correlation.
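The arithmetic above can be reproduced directly from the data. The following is a minimal Python sketch (assuming only NumPy is available) that applies the shortcut formula to the ten pairs of scores.

```python
import numpy as np

# Mid semester (X) and final exam (Y) scores of the 10 students above
x = np.array([31, 23, 41, 32, 29, 33, 28, 31, 31, 33], dtype=float)
y = np.array([31, 29, 34, 35, 25, 35, 33, 42, 31, 34], dtype=float)

n = len(x)
# Shortcut formula: r = (sum XY - n*Xbar*Ybar) / sqrt((sum X^2 - n*Xbar^2)(sum Y^2 - n*Ybar^2))
num = np.sum(x * y) - n * x.mean() * y.mean()
den = np.sqrt((np.sum(x**2) - n * x.mean()**2) * (np.sum(y**2) - n * y.mean()**2))
print(round(num / den, 3))   # approximately 0.363, matching the hand calculation
```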
Exercise The following data were collected from a certain household on the monthly income (X)
and consumption (Y) for the past 10 months. Compute the simple correlation coefficient.
X: 650 654 720 456 536 853 735 650 536 666
Y: 450 523 235 398 500 632 500 635 450 360
Spearman's coefficient of rank correlation is computed as follows:
i. Rank the values of X and the values of Y separately.
ii. Find the difference of the ranks in each pair; denote them by Di.
iii. Use the following formula
$$r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)}$$
where r_s = coefficient of rank correlation, D = the difference between paired ranks, and n = the number of pairs.
Example:
Aster and Almaz were asked to rank 7 different types of lipsticks. See whether there is correlation between the tastes of the two ladies.
Lipstick types A B C D E F G
Aster 2 1 4 3 5 7 6
Almaz 1 3 2 4 5 6 7
Solution:

Lipstick   X (R1)   Y (R2)   D = R1 - R2   D^2
A          2        1         1            1
B          1        3        -2            4
C          4        2         2            4
D          3        4        -1            1
E          5        5         0            0
F          7        6         1            1
G          6        7        -1            1
Total                                      12

$$r_s = 1 - \frac{6\sum D_i^2}{n(n^2 - 1)} = 1 - \frac{6(12)}{7(48)} = 0.786$$

Yes, there is a positive correlation between the tastes of the two ladies.
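The same result can be checked numerically. Here is a minimal Python sketch (NumPy assumed) that evaluates the rank correlation formula on the two sets of ranks.

```python
import numpy as np

# Ranks given by Aster (R1) and Almaz (R2) to lipsticks A..G
r1 = np.array([2, 1, 4, 3, 5, 7, 6])
r2 = np.array([1, 3, 2, 4, 5, 6, 7])

d = r1 - r2                                   # differences of paired ranks
n = len(r1)
rs = 1 - 6 * np.sum(d**2) / (n * (n**2 - 1))  # Spearman's formula
print(round(rs, 3))                           # approximately 0.786
```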
10.3 Simple Linear Regression
- It is advisable to prepare a scatter plot before fitting the model.
- The linear model is:
$$Y = \alpha + \beta X + \varepsilon$$
Where: Y = dependent variable
X = independent variable
$\alpha$ = regression constant
$\beta$ = regression slope
$\varepsilon$ = random disturbance term
$$Y \sim N(\alpha + \beta X, \sigma^2), \qquad \varepsilon \sim N(0, \sigma^2)$$
- Minimizing the sum of squared errors (SSE) gives the least squares estimates
$$b = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2} = \frac{\sum XY - n\bar{X}\bar{Y}}{\sum X^2 - n\bar{X}^2}$$
$$a = \bar{Y} - b\bar{X}$$
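To make the estimates concrete, here is a minimal Python sketch (NumPy assumed; the mid semester/final exam scores from the correlation example above are reused purely as illustrative data) that computes b and a by the formulas just given and predicts Y for a new value of X.

```python
import numpy as np

# Illustrative data: mid semester (X) and final exam (Y) scores used earlier
x = np.array([31, 23, 41, 32, 29, 33, 28, 31, 31, 33], dtype=float)
y = np.array([31, 29, 34, 35, 25, 35, 33, 42, 31, 34], dtype=float)

n = len(x)
# Least squares slope and intercept
b = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
a = y.mean() - b * x.mean()
print("fitted line: Y_hat = %.4f + %.4f X" % (a, b))

# Predicted final exam score for a mid semester score of 30
print("prediction at X = 30:", round(a + b * 30, 2))
```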
Example 1: The following data show the scores of 12 students in Accounting (X) and Statistics (Y) examinations.

(Table of the 12 pairs of scores X and Y together with the columns X^2, Y^2 and XY; Mean of X = 57.25, Mean of Y = 61.75.)

a) The coefficient of correlation (r) has a value of 0.92. This indicates that the two variables are positively correlated (Y increases as X increases).

b) The fitted regression equation is
$$\hat{Y} = 7.0194 + 0.9560X$$
For a student who scored X = 85 in Accounting, the predicted Statistics score is
$$\hat{Y} = 7.0194 + 0.9560(85) = 88.28$$
Exercise: A car rental agency is interested in studying the relationship between the distance driven in kilometers (Y) and the maintenance cost (X, in Birr) for their cars. The following summarized information is given, based on a sample of size 5.
$$\sum_{i=1}^{5} X_i^2 = 147{,}000{,}000, \qquad \sum_{i=1}^{5} Y_i^2 = 314$$
$$\sum_{i=1}^{5} X_i = 23{,}000, \qquad \sum_{i=1}^{5} Y_i = 36, \qquad \sum_{i=1}^{5} X_i Y_i = 212{,}000$$
a) Find the least squares regression equation of Y on X
b) Compute the correlation coefficient and interpret it.
c) Estimate the maintenance cost of a car which has been driven for 6 km
- To know how far the regression equation has been able to explain the variation in Y, we use a measure called the coefficient of determination ($r^2$), i.e.
$$r^2 = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2}$$
where r is the simple correlation coefficient.
R-Square
- The $r^2$-value measures the percentage of variation in the values of the dependent variable that can be explained by the variation in the independent variable.
- The $r^2$-value varies from 0 to 1.
- A value of 0.7654 means that 76.54% of the variance in Y can be explained by the changes in X; the remaining 23.46% of the variation in Y is presumed to be due to random variability.
- $r^2$ gives the proportion of the variation in Y explained by the regression of Y on X.
- $1 - r^2$ gives the unexplained proportion and is called the coefficient of indetermination.
Example: For the above problem (Example 1), r = 0.9194, so $r^2 = (0.9194)^2 \approx 0.845$; about 84.5% of the variation in the Statistics scores is explained by the regression on the Accounting scores.
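For a fitted simple linear regression, $\sum(\hat{Y}_i - \bar{Y})^2 / \sum(Y_i - \bar{Y})^2$ equals the square of the simple correlation coefficient, and this can be verified numerically. The sketch below (NumPy assumed, again reusing the exam-score data from the earlier correlation example purely as illustration) computes $r^2$ both ways.

```python
import numpy as np

x = np.array([31, 23, 41, 32, 29, 33, 28, 31, 31, 33], dtype=float)
y = np.array([31, 29, 34, 35, 25, 35, 33, 42, 31, 34], dtype=float)

# Fit Y on X by least squares, then form the fitted values Y_hat
n = len(x)
b = (np.sum(x * y) - n * x.mean() * y.mean()) / (np.sum(x**2) - n * x.mean()**2)
a = y.mean() - b * x.mean()
y_hat = a + b * x

# Coefficient of determination from the sums of squares, and r squared directly
r2_from_ss = np.sum((y_hat - y.mean())**2) / np.sum((y - y.mean())**2)
r = np.corrcoef(x, y)[0, 1]
print(round(r2_from_ss, 4), round(r**2, 4))   # the two values agree
```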
Multiple regression analysis examines the simultaneous combination of multiple factors to assess how, and to what extent, they affect a certain outcome.
The value being predicted is termed the dependent variable because its outcome or value depends on the behavior of the other variables. The values of the independent variables are usually ascertained from the population or sample.
The Model
The primary objective of regression is to develop a regression model that explains the relationship between two or more variables in a given population.
The multiple linear regression model with k predictor variables and a response Y can be written as:
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_k X_{ik} + \varepsilon_i$$
Where
Y = the dependent variable
$X_1, X_2, \ldots, X_k$ = the independent variables
$\beta_0$ = the regression constant (intercept)
$\beta_1, \beta_2, \ldots, \beta_k$ = the slope coefficients of the independent variables
$\varepsilon_i$ = the random error term
The above equation has one key feature. It assumes that all individuals are drawn from a single population with common population parameters. The term $\varepsilon_i$ is the residual or random error for individual i and represents the deviation of the observed value of the response for this individual from that expected by the model. These error terms are assumed to have a normal distribution with mean zero and variance $\sigma^2$.
The relationship between the response and each predictor should be linear; this can be assessed with scatter plots of Y against each $X_j$, j = 1, 2, …, k. If any plot suggests non-linearity, one may use a suitable transformation to attain linearity.
Another important assumption is the non-existence of multicollinearity: the independent variables should not be related among themselves. At a very basic level, this can be tested by computing the correlation coefficient between each pair of independent variables, as in the sketch after this list.
The error terms follow a normal distribution with constant variance (homoscedasticity), i.e. $\varepsilon_i \sim N(0, \sigma^2)$.
The values of the explanatory variables are fixed (non-stochastic).
There is no autocorrelation among the error terms, i.e. $Cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$.
The error terms and the independent variables are independent, i.e. $Cov(X_j, \varepsilon) = 0$.
The matrix of explanatory variables has full rank k, where k is the number of parameters (columns), and k should be less than the number of observations (n).
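As a rough, illustrative check of the multicollinearity assumption, the pairwise correlations between the independent variables can be computed. The sketch below uses entirely made-up predictor values (NumPy assumed); off-diagonal values close to +1 or -1 would signal a multicollinearity problem.

```python
import numpy as np

# Hypothetical values of three independent variables for 6 observations
x1 = np.array([3.0, 4.0, 2.0, 5.0, 4.0, 3.0])       # e.g. number of bedrooms
x2 = np.array([2.0, 3.0, 1.0, 3.0, 2.0, 2.0])       # e.g. number of bathrooms
x3 = np.array([120., 180., 90., 200., 150., 130.])  # e.g. floor area

X = np.column_stack([x1, x2, x3])
# Correlation matrix of the predictors (one row/column per independent variable)
print(np.round(np.corrcoef(X, rowvar=False), 2))
```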
Examples:
• The selling price of a house can depend on the desirability of the location, the number of
bedrooms, the number of bathrooms, the year the house was built, the square footage of the lot
and a number of other factors.
• The height of a child can depend on the height of the mother, the height of the father, nutrition,
and environmental factors.
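The following minimal sketch (entirely hypothetical numbers, NumPy assumed) shows how such a model could be fitted by least squares for the house-price example, regressing selling price on the number of bedrooms, the number of bathrooms and the floor area.

```python
import numpy as np

# Hypothetical data: selling price (in thousands) with three predictors
bedrooms  = np.array([3, 4, 2, 5, 4, 3, 3, 4], dtype=float)
bathrooms = np.array([2, 3, 1, 3, 2, 2, 1, 3], dtype=float)
area      = np.array([120, 180, 90, 200, 150, 130, 110, 170], dtype=float)
price     = np.array([250, 360, 180, 410, 300, 265, 220, 345], dtype=float)

# Design matrix with a leading column of ones for the intercept beta_0
X = np.column_stack([np.ones(len(price)), bedrooms, bathrooms, area])

# Least squares estimates of beta_0, beta_1, beta_2, beta_3
beta, *_ = np.linalg.lstsq(X, price, rcond=None)
print(np.round(beta, 3))

# Predicted price for a hypothetical 3-bedroom, 2-bathroom house with floor area 140
x_new = np.array([1.0, 3.0, 2.0, 140.0])
print(round(float(x_new @ beta), 1))
```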