This document covers correlation and simple linear regression analysis, including the use of regression equations, computation of correlation coefficients, and measures of variation. It explains the importance of residual analysis, autocorrelation, and statistical inference related to regression models. Additionally, it provides examples and references to software tools for practical application of these concepts.
Chapter 15: Correlation and Simple Linear Regression Analysis
- Use the simple linear regression equation
- Compute the coefficient of correlation and understand its interpretation
- Understand the concept of measures of variation, coefficient of determination, and standard error of the estimate
- Understand and use residual analysis for testing the assumptions of regression
- Measure autocorrelation by using the Durbin–Watson statistic
- Understand statistical inference about the slope and correlation coefficient of the regression model, and testing of the overall model
Measures of Association

Measures of association are statistics for measuring the strength of relationship between two variables.
Correlation measures the degree of association between two variables. Karl Pearson's coefficient of correlation is a quantitative measure of the degree of relationship between two variables. If these variables are x and y, Karl Pearson's coefficient of correlation is defined as

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \, \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$

The coefficient of correlation lies between –1 and +1.
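As a concrete illustration, the following Python sketch computes r directly from this definition. The data values are hypothetical, chosen only for demonstration:

```python
# A minimal sketch of Karl Pearson's coefficient of correlation,
# computed from its definition on small hypothetical data.
import math

x = [2, 4, 6, 8, 10]   # hypothetical values of variable x
y = [3, 7, 8, 12, 15]  # hypothetical values of variable y

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Numerator: sum of cross-products of deviations from the means
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
# Denominator: square root of the product of the deviation sums of squares
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

r = s_xy / math.sqrt(s_xx * s_yy)
print(f"Pearson r = {r:.4f}")  # always lies between -1 and +1
```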
Figure 15.1: Interpretation of correlation coefficient
Using MS Excel, Minitab and SPSS for Computing Correlation Coefficient

Ch 15 Solved Examples\Excel\Ex 15.1.xls
Introduction to Simple Linear Regression

Regression analysis is the process of developing a statistical model that is used to predict the value of a dependent variable from at least one independent variable. In simple linear regression analysis, there are two types of variables. The variable whose value is influenced or is to be predicted is called the dependent variable, and the variable that influences the value or is used for prediction is called the independent variable. In regression analysis, the independent variable is also known as the regressor, predictor, or explanatory variable, while the dependent variable is also known as the regressed or explained variable. In simple linear regression analysis, only a straight-line relationship between two variables is examined.
ε is the error of the regression line in fitting the points of the regression equation. If a point lies on the regression line, the corresponding value of ε is zero; if the point is not on the regression line, ε measures the error. In the deterministic model, all the points are assumed to lie on the regression line, so the random error ε is zero in every case. The probabilistic model, y = β0 + β1x + ε, includes an error term that allows the value of y to vary for any given value of x.
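As an illustration, here is a minimal Python sketch that fits the least-squares regression line and computes the residuals, which play the role of the observed errors. The data are the same hypothetical values used above:

```python
# A minimal sketch of fitting the simple linear regression line
# y_hat = b0 + b1 * x by ordinary least squares (hypothetical data).
x = [2, 4, 6, 8, 10]
y = [3, 7, 8, 12, 15]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Slope: b1 = sum((x - x_bar)(y - y_bar)) / sum((x - x_bar)^2)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b1 = s_xy / s_xx
# Intercept: b0 = y_bar - b1 * x_bar
b0 = y_bar - b1 * x_bar

# Residuals e_i = y_i - y_hat_i; they are zero only for points on the line
y_hat = [b0 + b1 * xi for xi in x]
residuals = [yi - yh for yi, yh in zip(y, y_hat)]
print(f"y_hat = {b0:.4f} + {b1:.4f} x")
print("residuals:", [round(e, 4) for e in residuals])
```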
Example 15.2

A cable wire company has spent heavily on advertisements. The sales and advertisement expenses (in thousand rupees) for 12 randomly selected months are given in Table 15.2. Develop a regression model to predict sales from advertisement expenses.
Figure 15.28: Measures of variation in simple linear regression
Measures of Variation (Contd.)

While developing a regression model to predict the dependent variable with the help of the independent variable, we need to focus on a few measures of variation. Total variation (SST) can be partitioned into two parts: variation that can be attributed to the relationship between x and y, and unexplained variation. The first part, which can be attributed to the relationship between x and y, is referred to as the explained variation or regression sum of squares (SSR). The second part, which is unexplained and can be attributed to factors other than the relationship between x and y, is referred to as the error sum of squares (SSE).
Measures of Variation (Contd.)
Total sum of squares (SST) = Regression sum of squares (SSR) + Error sum of squares (SSE)
The standard deviation measures variation around the arithmetic mean; similarly, the standard error of the estimate can be understood as the standard deviation around the regression line. The standard error of the estimate is given by

$$s_{yx} = \sqrt{\frac{SSE}{n-2}} = \sqrt{\frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{n-2}}$$

A large standard error indicates a large amount of variation or scatter around the regression line, and a small standard error indicates a small amount of variation or scatter around the regression line. A standard error equal to zero indicates that all the observed data points fall exactly on the regression line.
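The following Python sketch computes these measures of variation for the hypothetical data and the coefficients fitted in the earlier sketch, and verifies the partition SST = SSR + SSE:

```python
# A minimal sketch of the measures of variation for a fitted line,
# continuing the hypothetical data and coefficients from above.
import math

x = [2, 4, 6, 8, 10]
y = [3, 7, 8, 12, 15]
b0, b1 = 0.3, 1.45  # coefficients fitted in the earlier sketch
y_bar = sum(y) / len(y)
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)               # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # unexplained variation

r_squared = ssr / sst                  # coefficient of determination
s_yx = math.sqrt(sse / (len(y) - 2))   # standard error of the estimate
print(f"SST={sst:.4f}  SSR={ssr:.4f}  SSE={sse:.4f}")
print(f"SST = SSR + SSE holds: {math.isclose(sst, ssr + sse)}")
print(f"r^2 = {r_squared:.4f},  s_yx = {s_yx:.4f}")
```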
Table 15.5: Predicted (regressed) values and residuals for Example 15.2
Linearity

The linearity of the regression model can be checked by plotting the residuals on the vertical axis against the corresponding values xi of the independent variable on the horizontal axis. There should not be any apparent pattern in the plot for a well-fitting regression model.

Constant Error Variance (Homoscedasticity)

The assumption of homoscedasticity is also referred to as constant error variance. As the name suggests, it requires that the variance around the line of regression be constant for all values of xi; a residual plot with roughly constant spread, as in the sketch below, supports this assumption.
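Here is a minimal Python sketch of such a residual plot, using matplotlib and the hypothetical data and coefficients from the earlier sketches:

```python
# A minimal sketch of a residual plot for checking linearity and
# homoscedasticity; requires matplotlib, data are hypothetical.
import matplotlib.pyplot as plt

x = [2, 4, 6, 8, 10]
y = [3, 7, 8, 12, 15]
b0, b1 = 0.3, 1.45  # coefficients fitted in the earlier sketch
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Residuals on the vertical axis against x on the horizontal axis;
# a patternless band of roughly constant width supports both assumptions.
plt.scatter(x, residuals)
plt.axhline(0, linestyle="--")
plt.xlabel("x (independent variable)")
plt.ylabel("residual")
plt.title("Residual plot")
plt.show()
```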
Using Residual Analysis to Test the Assumptions of Regression (Contd.)
Independence of Errors

The assumption of independence of errors requires that the value of the error ε for any particular value of the independent variable x is not related to the value of ε for any other value of x. In other words, the errors around the line of regression should be independent for each value of the independent variable x.
Using Residual Analysis to Test the Assumptions of Regression (Contd.)
Normality of Error

The assumption of normality of the errors around the line of regression can be checked by plotting a histogram of the residuals. The normal probability plot of the residuals should roughly follow a straight line if the normality assumption is met; points falling close to a straight line indicate that the residuals are approximately normally distributed.
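The following Python sketch produces both checks, using scipy and matplotlib; the residuals are the hypothetical values obtained from the earlier fitted line:

```python
# A minimal sketch of checking normality of residuals with a histogram
# and a normal probability plot; requires scipy and matplotlib.
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical residuals from the earlier fitted line
residuals = [-0.2, 0.9, -1.0, 0.1, 0.2]

fig, (ax1, ax2) = plt.subplots(1, 2)
ax1.hist(residuals, bins=5)        # histogram of the residuals
ax1.set_title("Histogram of residuals")
# Normal probability plot: points near the line suggest normality
stats.probplot(residuals, dist="norm", plot=ax2)
plt.show()
```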
Measuring Autocorrelation: The Durbin–Watson Statistic
When a researcher collects data over a period of time, there is a possibility that the errors for a specific time period may be correlated with the errors of another time period, because residuals at any given time period may tend to be similar to residuals at another time period. This is called autocorrelation, and the presence of autocorrelation in a regression model raises questions about the validity of the model.
Measuring Autocorrelation: The Durbin–Watson Statistic (Contd.)

The Durbin–Watson statistic measures the degree of correlation between each residual and the residual for the preceding time period. It is defined as

$$D = \frac{\sum_{t=2}^{n}(e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$

where et is the residual for time period t. Values of D close to 2 indicate no autocorrelation, values close to 0 indicate positive autocorrelation, and values close to 4 indicate negative autocorrelation.
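The statistic is straightforward to compute directly from its definition, as in this Python sketch with hypothetical time-ordered residuals:

```python
# A minimal sketch of the Durbin-Watson statistic computed from its
# definition; the residuals are hypothetical time-ordered values.
def durbin_watson(residuals):
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

e = [-0.2, 0.9, -1.0, 0.1, 0.2]  # residuals in time order (hypothetical)
d = durbin_watson(e)
# D near 2: no autocorrelation; near 0: positive; near 4: negative.
print(f"Durbin-Watson D = {d:.4f}")
```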
Example 15.3

A retail outlet of a footwear company is facing a slump in sales. The company has adopted a policy of giving incentives to its salesmen for additional sales in order to boost the sales volume. The total incentives offered by the company and the sales volumes for 15 randomly selected weeks (in thousand rupees) are given in Table 15.6.
Fit a line of regression and also determine whether autocorrelation is present.

Using MS Excel, Minitab and SPSS for Example 15.3
So, the upper limit is 23.5263 (19.0704 + 4.4559) and the lower limit is 14.6145 (19.0704 – 4.4559). Thus, the population slope β1 is estimated with 95% confidence to lie between 14.6145 and 23.5263; that is, 14.6145 ≤ β1 ≤ 23.5263.

Statistical Inference About the Correlation Coefficient of the Regression Model

The correlation coefficient (r) measures the strength of the relationship between two variables.
The population correlation coefficient (ρ) can be hypothesized to be equal to zero. In this case, the null and alternative hypotheses can be stated as follows:

H0: ρ = 0 (there is no correlation between the two variables)
H1: ρ ≠ 0 (the correlation is significantly different from zero)

The test statistic follows a t distribution with n – 2 degrees of freedom:

$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
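This Python sketch carries out the test using scipy; the values of r and n are hypothetical, taken from the small illustrative data set used earlier:

```python
# A minimal sketch of testing H0: rho = 0 against H1: rho != 0
# with the t statistic t = r * sqrt(n - 2) / sqrt(1 - r^2); needs scipy.
import math
from scipy import stats

r = 0.989  # sample correlation coefficient (hypothetical)
n = 5      # sample size (hypothetical)

t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)  # two-tailed p-value
print(f"t = {t_stat:.4f}, p = {p_value:.4f}")
# Reject H0 at the 5% level if p < 0.05, i.e. the correlation is significant.
```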
Figure 15.59: Calculation of Pearson correlation coefficient using SPSS