
Chapter 15

Copyright © Dorling Kindersley India Pvt. Ltd


Correlation and Simple
Linear Regression Analysis



Learning Objectives

Upon completion of this chapter, you will be able to:



• Use the simple linear regression equation
• Compute the coefficient of correlation and interpret it
• Understand the concept of measures of variation, the coefficient of determination, and the standard error of the estimate
• Understand and use residual analysis for testing the assumptions of regression
• Measure autocorrelation by using the Durbin–Watson statistic
• Understand statistical inference about the slope and the correlation coefficient of the regression model, and test the overall model



Measures of Association
• Measures of association are statistics that measure the strength of the relationship between two variables.



• Correlation measures the degree of association between two variables.
• Karl Pearson's coefficient of correlation is a quantitative measure of the degree of relationship between two variables. If these variables are x and y, Karl Pearson's coefficient of correlation is defined as

  r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² Σ(y − ȳ)²]

• The coefficient of correlation lies between +1 and −1.
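The definition above translates directly into code. Below is a minimal pure-Python sketch of Pearson's r; the sample data are hypothetical, not the values from Table 15.2.

```python
from math import sqrt

def pearson_r(x, y):
    """Karl Pearson's coefficient of correlation: the sum of products of
    deviations, divided by the square root of the product of the two
    sums of squared deviations."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

# A perfect positive linear relationship gives r = +1;
# a perfect negative one gives r = -1.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # -1.0
```

Intermediate values of r between these extremes indicate progressively weaker linear association, as Figure 15.1 illustrates.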



Figure 15.1: Interpretation of correlation coefficient



Example 15.1

Table 15.2 shows the sales revenue and advertisement expenses of a company for the past 10 months. Find the coefficient of correlation between sales and advertisement.



Table 15.3: Calculation of correlation coefficient between sales and advertisement



Figure 15.9: Five examples of correlation coefficient



Using MS Excel, Minitab and SPSS for
Computing Correlation Coefficient
• Ch 15 Solved Examples\Excel\Ex 15.1.xls
• Ch 15 Solved Examples\Minitab\Ex 15.1.MPJ
• Ch 15 Solved Examples\SPSS\Ex 15.1.sav
• Ch 15 Solved Examples\SPSS\Output Ex 15.1.spv



Introduction to Simple Linear Regression
• Regression analysis is the process of developing a statistical model that is used to predict the value of a dependent variable from at least one independent variable.
• In simple linear regression analysis there are two types of variables. The variable whose value is influenced or to be predicted is called the dependent variable, and the variable which influences the value or is used for prediction is called the independent variable.
• In regression analysis, the independent variable is also known as the regressor, predictor, or explanatory variable, while the dependent variable is also known as the regressed or explained variable. In simple linear regression analysis, only a straight-line relationship between two variables is examined.



A Deterministic and Probabilistic Model

Deterministic model: y = β0 + β1x
Probabilistic model: y = β0 + β1x + ε

ε is the error of the regression line in fitting the points of the regression equation. If a point lies on the regression line, the corresponding value of ε is zero; if it does not, the value of ε measures the error. Note that in the deterministic model all the points are assumed to lie on the regression line, so in every case the random error ε equals zero. The probabilistic model includes an error term, which allows the value of y to vary for any given value of x.
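The least-squares estimates of the intercept and slope (b1 = Sxy/Sxx and b0 = ȳ − b1x̄) can be sketched in pure Python as follows. The sample points are hypothetical and are chosen to lie exactly on a line, illustrating the deterministic case where every ε is zero.

```python
def fit_line(x, y):
    """Least-squares estimates (b0, b1) for the simple linear model
    y = b0 + b1*x + e, using b1 = Sxy/Sxx and b0 = ybar - b1*xbar."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = my - b1 * mx
    return b0, b1

# Points generated from y = 1 + 2x with no error: every residual is zero,
# so the fitted line recovers the coefficients exactly.
b0, b1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # 1.0 2.0
```

With real data the points scatter around the line, and the same formulas give the line that minimizes the sum of squared errors.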



Figure 15.10: Error in simple regression



Figure 15.11: Summary of the estimation process for simple linear regression.



Example 15.2
A cable wire company has spent heavily on advertisements. The sales and advertisement expenses (in thousand rupees) for 12 randomly selected months are given in Table 14.2. Develop a regression model to predict the impact of advertisement on sales.



Using MS Excel, Minitab, and SPSS for Simple
Linear Regression

• Ch 15 Solved Examples\Excel\Ex 15.2.xls
• Ch 15 Solved Examples\Minitab\EX 15.2.MPJ
• Ch 15 Solved Examples\SPSS\Ex 15.2.sav
• Ch 15 Solved Examples\SPSS\Output Ex 15.2.spv



Measures of Variation



Figure 15.28: Measures of variation in simple linear regression



Measures of Variation (Contd.)
• While developing a regression model to predict the dependent variable with the help of the independent variable, we need to focus on a few measures of variation. Total variation (SST) can be partitioned into two parts: variation that can be attributed to the relationship between x and y, and unexplained variation.
• The first part, which can be attributed to the relationship between x and y, is referred to as explained variation or the regression sum of squares (SSR). The second part, which is unexplained and can be attributed to factors other than the relationship between x and y, is referred to as the error sum of squares (SSE).



Measures of Variation (Contd.)

Total sum of squares (SST) = Regression sum of squares (SSR) + Error sum of squares (SSE)



Coefficient of Determination
• The ratio of the regression sum of squares (SSR) to the total sum of squares (SST) leads to a very important result, referred to as the coefficient of determination:

  r² = SSR / SST

• The value of the coefficient of determination ranges from 0 to 1.
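The partition SST = SSR + SSE and the ratio r² = SSR/SST can be checked numerically. The sketch below uses hypothetical data, not the figures from Example 15.2.

```python
def variation_measures(x, y):
    """Return (SST, SSR, SSE) for the least-squares line through (x, y)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    yhat = [b0 + b1 * xi for xi in x]
    sst = sum((yi - my) ** 2 for yi in y)            # total variation
    ssr = sum((yh - my) ** 2 for yh in yhat)         # explained variation
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained
    return sst, ssr, sse

sst, ssr, sse = variation_measures([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
# SST partitions exactly into SSR + SSE, and r^2 = SSR/SST.
print(round(sst, 4), round(ssr, 4), round(sse, 4), round(ssr / sst, 4))
# 6.0 3.6 2.4 0.6
```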



Standard Error of the Estimate

The standard deviation measures the dispersion of data around the arithmetic mean; similarly, the standard error of the estimate can be understood as the standard deviation around the regression line.

Standard error of the estimate:

  s_e = √(SSE / (n − 2))

A large standard error indicates a large amount of variation or scatter around the regression line, and a small standard error indicates a small amount of variation or scatter around the regression line. A standard error equal to zero indicates that all the observed data points fall exactly on the regression line.
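A short pure-Python sketch of this quantity, using hypothetical data; note that points lying exactly on a line give a standard error of zero, as described above.

```python
from math import sqrt

def standard_error(x, y):
    """Standard error of the estimate: s_e = sqrt(SSE / (n - 2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return sqrt(sse / (n - 2))

# Points exactly on a line: zero scatter around the regression line.
print(standard_error([1, 2, 3], [2, 4, 6]))               # 0.0
# Scattered points: positive standard error.
print(round(standard_error([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 0.8944
```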



Table 15.5: Predicted (regressed) values and residuals for Example 15.2



Using Residual Analysis to Test the
Assumptions of Regression
• Linearity of the Regression Model
  Linearity of the regression model can be checked by plotting the residuals on the vertical axis against the corresponding xi values of the independent variable on the horizontal axis. For a well-fitting regression model there should not be any apparent pattern in the plot.
• Constant Error Variance (Homoscedasticity)
  The assumption of homoscedasticity is also referred to as constant error variance. As the name suggests, it requires that the variance around the line of regression be constant for all values of xi.



Using Residual Analysis to Test the
Assumptions of Regression (Contd.)



Figure 15.41: Violation of the homoscedasticity assumption of regression



Using Residual Analysis to Test the
Assumptions of Regression (Contd.)

• Independence of Error
  The assumption of independence of error indicates that the value of the error ε for any particular value of the independent variable x should not be related to the value of ε for any other value of x. This means that the errors around the line of regression should be independent for each value of the independent variable x.



Using Residual Analysis to Test the
Assumptions of Regression (Contd.)



Figure 15.44: Graph of non-independence of error (Case 1)
Figure 15.45: Graph of non-independence of error (Case 2)



Using Residual Analysis to Test the
Assumptions of Regression (Contd.)



• Normality of Error
  The assumption of normality of errors around the line of regression can be checked by plotting a histogram of the residuals against their frequency distribution. In a normal probability plot, the residuals should roughly follow a straight line if the normality assumption is met; residuals falling along a straight line indicate that they are normally distributed.



Measuring Autocorrelation:
The Durbin–Watson Statistic



• When a researcher collects data over a period of time, the errors for a specific time period may be correlated with the errors of another time period, because residuals at any given time period tend to be similar to residuals at adjacent time periods.
• This is called autocorrelation, and the presence of autocorrelation in a regression model raises questions about the validity of the model.



Measuring Autocorrelation:
The Durbin–Watson Statistic (Contd.)
The Durbin–Watson statistic measures the degree of correlation between each residual and the residual of the immediately preceding time period.

Durbin–Watson statistic:

  D = Σ(e_t − e_(t−1))² / Σ(e_t)², with the numerator summed over t = 2, …, n and the denominator over t = 1, …, n

If there is no correlation between residuals, the value of D will be close to 2. In the case of negative correlation, D will be greater than 2 and can reach its maximum value of 4.
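The statistic is simple to compute from a list of residuals. The residual sequences below are hypothetical, constructed to show the extremes of the D range.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: the sum of squared differences between
    successive residuals, divided by the sum of squared residuals."""
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Residuals that alternate in sign (negative autocorrelation) push D
# above 2; residuals that repeat (positive autocorrelation) push D
# toward 0. No autocorrelation gives D near 2.
print(durbin_watson([1, -1, 1, -1]))  # 3.0
print(durbin_watson([1, 1, 1, 1]))    # 0.0
```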



Figure 15.50: Using Durbin–Watson statistic for detecting autocorrelation



Example 15.3
A retail outlet of a footwear company is facing a slump in sales. The company has adopted a policy of giving incentives to its salesmen for additional sales in order to boost the sales volume. The total incentives offered by the company and the sales volumes for 15 randomly selected weeks (in thousand rupees) are given in Table 15.6. Fit a line of regression and also determine whether autocorrelation is present.
Using MS Excel, Minitab and SPSS
for Example 15.3



• Ch 15 Solved Examples\Excel\Durbin Watson.xls
• Ch 15 Solved Examples\Minitab\DURBIN WATSON.MPJ
• Ch 15 Solved Examples\SPSS\Ex Durbin-Watson.sav
• Ch 15 Solved Examples\SPSS\Output Durbin-Watson.spv

Figure 15.55: Durbin–Watson statistic range for Example 15.3



Statistical Inference About Slope, Correlation
Coefficient of the Regression Model, and
Testing the Overall Model



t test for the slope of the regression line:

  t = b1 / s_b1, where s_b1 = s_e / √Σ(xi − x̄)²

Figure 15.56(A): Computation of the t statistic for Example 15.2 using MS Excel
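As a sketch of the computation behind the figure (not the Excel output itself): the t statistic for testing H0: β1 = 0 divides the estimated slope by its standard error. The data below are hypothetical, not the values from Example 15.2.

```python
from math import sqrt

def t_for_slope(x, y):
    """t statistic for H0: beta1 = 0: t = b1 / s_b1, where
    s_b1 = s_e / sqrt(Sxx) and s_e = sqrt(SSE / (n - 2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se = sqrt(sse / (n - 2))        # standard error of the estimate
    return b1 / (se / sqrt(sxx))    # slope over its standard error

print(round(t_for_slope([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 2.1213
```

The resulting t value is compared with the t distribution with n − 2 degrees of freedom.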



Testing the Overall Model

F statistic for testing the slope:

  F = MSR / MSE = SSR / (SSE / (n − 2))



Figure 15.57(A): Computation of the F statistic from MS Excel for Example 15.2
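A sketch of the F computation with hypothetical data (not the Example 15.2 figures): the explained mean square is divided by the unexplained mean square. In simple regression, F equals the square of the slope's t statistic.

```python
def f_for_model(x, y):
    """F statistic for the overall simple regression model:
    F = MSR / MSE = SSR / (SSE / (n - 2))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = my - b1 * mx
    yhat = [b0 + b1 * xi for xi in x]
    ssr = sum((yh - my) ** 2 for yh in yhat)              # explained
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, yhat))  # unexplained
    return ssr / (sse / (n - 2))

print(round(f_for_model([1, 2, 3, 4, 5], [2, 4, 5, 4, 5]), 4))  # 4.5
```

For this dataset the slope's t statistic is about 2.1213, and 2.1213² ≈ 4.5, consistent with the F = t² identity for simple regression.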



Estimate of Confidence Interval
for the Population Slope (β1)



• The upper limit is 23.5263 (19.0704 + 4.4559) and the lower limit is 14.6145 (19.0704 − 4.4559).
• So the population slope β1 is estimated with 95% confidence to lie in the interval from 14.6145 to 23.5263.
• Hence, 14.6145 ≤ β1 ≤ 23.5263
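The interval arithmetic can be checked in a couple of lines; the slope estimate and the margin of error (the t value times the slope's standard error) are the figures quoted above.

```python
# Values quoted in the text for Example 15.2:
b1 = 19.0704      # estimated slope
margin = 4.4559   # margin of error, t * s_b1

lower, upper = b1 - margin, b1 + margin
print(round(lower, 4), round(upper, 4))  # 14.6145 23.5263
```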
Statistical Inference About Correlation
Coefficient of the Regression Model
• The correlation coefficient (r) measures the strength of the relationship between two variables.
• The population correlation coefficient (ρ) can be hypothesized to be equal to zero. In this case, the null and alternative hypotheses can be stated as follows:

  H0: ρ = 0
  H1: ρ ≠ 0

Figure 15.59: Calculation of Pearson correlation coefficient using SPSS


• Ch 15 Solved Examples\Minitab\CORRELATION.MPJ
• Ch 15 Solved Examples\SPSS\Output Correlation.spv
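A minimal sketch of the test statistic for these hypotheses, assuming the usual t statistic with n − 2 degrees of freedom; the values of r and n below are hypothetical, not taken from the SPSS output.

```python
from math import sqrt

def t_for_correlation(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom:
    t = r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# For a sample correlation r = 0.7746 from n = 5 observations:
print(round(t_for_correlation(0.7746, 5), 3))  # 2.121
```

The computed t is compared with the critical t value at the chosen significance level to decide whether ρ differs significantly from zero.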
