
Linear Regression

Linear regression estimates the coefficients of a linear equation that best predicts a dependent variable from independent variables. It assumes a linear relationship between the variables and that the error terms are normally distributed with constant variance. The example analyzes polishing time data and finds that polishing time can be predicted from product diameter. About half the variation in time is explained by the model, which is statistically significant. Diagnostic plots show that the error term is approximately normally distributed.

Uploaded by Rabiqa Rani
Copyright © Attribution Non-Commercial (BY-NC)

Linear Regression estimates the coefficients of the linear equation, involving one or more independent variables, that best predicts the value of the dependent variable. For example, you might predict a salesperson's total yearly sales (the dependent variable) from independent variables such as age, education, and years of experience.

Example. Is the number of games won by a basketball team in a season related to the average number of points the team scores per game? A scatterplot indicates that these variables are linearly related. The number of games won and the average number of points scored by the opponent are also linearly related, but this relationship is negative: as the number of games won increases, the average number of points scored by the opponent decreases. With linear regression, you can model the relationship of these variables, and a good model can be used to predict how many games teams will win.

The linear regression model assumes that there is a linear, or "straight line," relationship between the dependent variable and each predictor. This relationship is described by the following formula:

$$y_i = b_0 + b_1 x_{i1} + \cdots + b_p x_{ip} + e_i$$

where:
$y_i$ is the value of the $i$th case of the dependent scale variable;
$p$ is the number of predictors;
$b_j$ is the value of the $j$th coefficient, $j = 0, \ldots, p$;
$x_{ij}$ is the value of the $i$th case of the $j$th predictor;
$e_i$ is the error in the observed value for the $i$th case.

The model is linear because increasing the value of the $j$th predictor by 1 unit, with the other predictors held fixed, changes the predicted value of the dependent variable by $b_j$ units. Note that $b_0$ is the intercept, the model-predicted value of the dependent variable when every predictor equals 0.
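As an illustration only, the formula can be evaluated directly in Python. The function name `predict` and the coefficient values below are hypothetical; they are not fitted values from this example. The last lines check the "one unit of $x_j$ moves the prediction by $b_j$" property.

```python
def predict(b, x):
    """Return b0 + b1*x1 + ... + bp*xp for a single case.

    b: coefficients [b0, b1, ..., bp]; x: predictor values [x1, ..., xp].
    """
    if len(b) != len(x) + 1:
        raise ValueError("need one more coefficient than predictors (the intercept)")
    return b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))


# Hypothetical coefficients: b0 = 2.0, b1 = 0.5, b2 = -1.5.
b = [2.0, 0.5, -1.5]
base = predict(b, [10.0, 4.0])
bumped = predict(b, [10.0, 5.0])  # x2 increased by one unit, x1 held fixed
print(bumped - base)  # -1.5, i.e. b2
```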

For the purpose of testing hypotheses about the values of model parameters, the linear regression model also assumes the following:
- The error term has a normal distribution with a mean of 0.
- The variance of the error term is constant across cases and independent of the variables in the model. An error term with non-constant variance is said to be heteroscedastic.
- The value of the error term for a given case is independent of the values of the variables in the model and of the values of the error term for other cases.
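To make these assumptions concrete, the sketch below generates data from a one-predictor model whose errors are independent, normal with mean 0, and share a single standard deviation, together with a variant whose error spread grows with x (a heteroscedastic error term). It is an illustration only; the function names and coefficient values are hypothetical, and it uses only Python's standard `random` module.

```python
import random


def simulate(b0, b1, xs, sigma, seed=0):
    """Draw y_i = b0 + b1*x_i + e_i with independent errors e_i ~ N(0, sigma**2).

    Every case shares the same sigma, so the error variance is constant.
    """
    rng = random.Random(seed)
    return [b0 + b1 * x + rng.gauss(0.0, sigma) for x in xs]


def simulate_heteroscedastic(b0, b1, xs, scale, seed=0):
    """Same straight line, but the error SD is scale * |x|, so it grows with x."""
    rng = random.Random(seed)
    return [b0 + b1 * x + rng.gauss(0.0, scale * abs(x)) for x in xs]


# Hypothetical coefficients and predictor values, for illustration only.
xs = [float(x) for x in range(1, 11)]
print(simulate(-2.0, 3.5, xs, sigma=5.0))
print(simulate_heteroscedastic(-2.0, 3.5, xs, scale=1.0))
```

Plotting each simulated series against `xs` shows the difference: the first scatters evenly around the line, while the second fans out as x grows.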

Example. The Nambe Mills company has a line of metal tableware products that require a polishing step in the manufacturing process. To help plan the production schedule, the polishing times for 59 products were recorded, along with the product type and the relative sizes of these products, measured in terms of their diameters. We can use linear regression to determine whether the polishing time can be predicted by product size. Before running the regression, we should examine a scatterplot of polishing time by product size to determine whether a linear model is reasonable for these variables.

Creating a Scatterplot of the Dependent by the Independent

To produce a scatterplot of time by diam, from the menus choose: Graphs > Scatter/Dot...

Click Define.

Select time as the y variable and diam as the x variable. Click OK. These selections produce the scatterplot.

To see a best-fit line overlaid on the points in the scatterplot, activate the graph by double-clicking on it. Select a point in the Chart Editor. Click the Add fit line tool, then close the Chart Editor.

The resulting scatterplot appears to be suitable for linear regression, with two possible causes for concern.

Running the Analysis

To run a linear regression analysis, from the menus choose: Analyze > Regression > Linear...

Select time as the dependent variable. Select diam as the independent variable. Select type as the case labeling variable. Click Plots.

Select *SDRESID as the y variable and *ZPRED as the x variable. Select Histogram and Normal probability plot. Click Continue. Click Save in the Linear Regression dialog box.

Select Standardized in the Predicted Values group. Select Standardized in the Residuals group. Click Continue. Click OK in the Linear Regression dialog box.

These selections produce a linear regression model for polishing time based on diameter. Diagnostic plots of the studentized deleted residuals (*SDRESID) by the standardized predicted values (*ZPRED) are requested, and various values are saved for further diagnostic testing.
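The two plotted quantities can also be computed outside SPSS. The following is a minimal pure-Python sketch, not SPSS's own code, under these assumptions: there is a single predictor (as with diam here), *ZPRED is taken as the predicted values standardized with the n - 1 standard deviation, and *SDRESID is each residual divided by an error standard deviation estimated with that case left out. The function names `fit_simple` and `diagnostics` and the toy data are hypothetical, not the Nambe Mills measurements.

```python
import math
import statistics


def fit_simple(x, y):
    """Ordinary least squares for y = b0 + b1*x; returns (b0, b1)."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def diagnostics(x, y):
    """Return (standardized predicted values, studentized deleted residuals)."""
    n, p = len(x), 2  # p counts the intercept and the slope
    b0, b1 = fit_simple(x, y)
    pred = [b0 + b1 * xi for xi in x]
    resid = [yi - yh for yi, yh in zip(y, pred)]
    sse = sum(e * e for e in resid)
    xbar = statistics.fmean(x)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    # Leverage h_i of each case in a one-predictor regression.
    lev = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    # t_i = e_i * sqrt((n - p - 1) / (SSE * (1 - h_i) - e_i**2)), the
    # closed form of the residual scaled by the leave-one-out error SD.
    sdresid = [e * math.sqrt((n - p - 1) / (sse * (1 - h) - e * e))
               for e, h in zip(resid, lev)]
    mp, sp = statistics.fmean(pred), statistics.stdev(pred)
    zpred = [(yh - mp) / sp for yh in pred]
    return zpred, sdresid


x = [1, 2, 3, 4, 5]  # toy data, for illustration only
y = [2, 4, 5, 4, 5]
zpred, sdresid = diagnostics(x, y)
print(zpred)
print(sdresid)
```

The closed form avoids refitting the model n times; refitting without case i and dividing its prediction error by the appropriately scaled leave-one-out standard deviation gives the same value.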

Coefficients
This table shows the coefficients of the regression line.

It shows that the expected polishing time is equal to 3.457 * DIAM - 1.955. If Nambe Mills plans to manufacture a 15-inch casserole, the predicted polishing time would be 3.457 * 15 - 1.955 = 49.9, or about 50 minutes.
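The same arithmetic, written as a small Python function using the coefficients quoted above. The function name is hypothetical; the units (diameter in inches, time in minutes) follow the casserole example.

```python
def predicted_polishing_time(diam):
    """Predicted polishing time in minutes for a product of the given diameter (inches)."""
    return 3.457 * diam - 1.955


print(round(predicted_polishing_time(15), 1))  # 49.9
```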

Checking the Model Fit


The ANOVA table tests the acceptability of the model from a statistical perspective. The Regression row displays information about the variation accounted for by your model. The Residual row displays information about the variation that is not accounted for by your model.

The regression and residual sums of squares are approximately equal, which indicates that about half of the variation in polishing time is explained by the model. The significance value of the F statistic is less than 0.05, which means that the variation explained by the model is unlikely to be due to chance alone. While the ANOVA table is a useful test of the model's ability to explain any variation in the dependent variable, it does not directly address the strength of that relationship.
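A minimal sketch of how the Regression and Residual rows are built from observed values `y` and model-predicted values `pred`, assuming a least-squares fit with an intercept and `p` predictors. The function name and the toy numbers are hypothetical. The significance value is the upper-tail probability of F with (p, n - p - 1) degrees of freedom; if SciPy is available, `scipy.stats.f.sf(f, df_reg, df_res)` computes it.

```python
def anova_sums(y, pred, p=1):
    """Return (regression SS, residual SS, F) for a least-squares fit with p predictors."""
    n = len(y)
    ybar = sum(y) / n
    ss_reg = sum((yh - ybar) ** 2 for yh in pred)               # explained variation
    ss_res = sum((yi - yh) ** 2 for yi, yh in zip(y, pred))     # unexplained variation
    df_reg, df_res = p, n - p - 1
    f = (ss_reg / df_reg) / (ss_res / df_res)                   # ratio of mean squares
    return ss_reg, ss_res, f


# Toy observed and fitted values (from the line 2.2 + 0.6x), for illustration only.
y = [2, 4, 5, 4, 5]
pred = [2.8, 3.4, 4.0, 4.6, 5.2]
print(anova_sums(y, pred))
```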

The model summary table reports the strength of the relationship between the model and the dependent variable. R, the multiple correlation coefficient, is the linear correlation between the observed and model-predicted values of the dependent variable. Its large value indicates a strong relationship.

R Square, the coefficient of determination, is the squared value of the multiple correlation coefficient. It shows that about half the variation in time is explained by the model.
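Following the definitions above, R can be computed as the Pearson correlation between observed and predicted values, and R Square as its square. This is a sketch with a hypothetical function name and toy data; for a least-squares fit with an intercept, R Square also equals the regression sum of squares divided by the total sum of squares.

```python
import math


def r_and_r_squared(y, pred):
    """Multiple R (correlation of observed with predicted values) and R Square."""
    n = len(y)
    my, mp = sum(y) / n, sum(pred) / n
    syp = sum((a - my) * (b - mp) for a, b in zip(y, pred))
    syy = sum((a - my) ** 2 for a in y)
    spp = sum((b - mp) ** 2 for b in pred)
    r = syp / math.sqrt(syy * spp)
    return r, r * r


# Toy observed and fitted values, for illustration only.
print(r_and_r_squared([2, 4, 5, 4, 5], [2.8, 3.4, 4.0, 4.6, 5.2]))
```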

As a further measure of the strength of the model fit, compare the standard error of the estimate in the model summary table to the standard deviation of time reported in the descriptive statistics table. Without prior knowledge of the diameter of a new product, our best guess for the polishing time would be about 35.8 minutes, with a standard deviation of 19.0 minutes.

With the linear regression model, the error of your estimate is considerably lower, about 13.7 minutes.
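The comparison can be sketched in Python: the standard error of the estimate is the square root of the residual sum of squares divided by its degrees of freedom, n - p - 1, while the baseline is the ordinary sample standard deviation of the dependent variable. The function name and toy data are hypothetical.

```python
import math
import statistics


def std_error_of_estimate(y, pred, p=1):
    """sqrt(SSE / (n - p - 1)): typical size of a prediction error from the model."""
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, pred))
    return math.sqrt(sse / (len(y) - p - 1))


# Toy observed and fitted values, for illustration only.
y = [2, 4, 5, 4, 5]
pred = [2.8, 3.4, 4.0, 4.6, 5.2]
print(statistics.stdev(y), std_error_of_estimate(y, pred))
```

The two quantities are linked through adjusted R Square, which equals 1 - (SE / SD)². With the rounded values above, 1 - (13.7 / 19.0)² ≈ 0.48, consistent with the statement that about half of the variation is explained.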

Checking the Normality of the Error Term

A residual is the difference between the observed and model-predicted values of the dependent variable. The residual for a given product is the observed value of the error term for that product. A histogram or P-P plot of the residuals will help you to check the assumption of normality of the error term. The shape of the histogram should approximately follow the shape of the normal curve. This histogram is acceptably close to the normal curve.
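A normal P-P plot pairs the observed cumulative proportion of each sorted, standardized residual with the cumulative probability a normal distribution would assign to it; points close to the diagonal support the normality assumption. The sketch below uses the simple plotting position (i - 0.5) / n, which is an assumption here (SPSS may use a different formula), along with Python's `statistics.NormalDist`. The function name and residuals are hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def normal_pp_points(residuals):
    """Return (observed cumulative prob, expected cumulative prob) pairs for a normal P-P plot."""
    n = len(residuals)
    m, s = fmean(residuals), stdev(residuals)
    z = sorted((r - m) / s for r in residuals)  # standardized residuals, ascending
    std_normal = NormalDist()  # mean 0, SD 1
    # Plotting position (i - 0.5) / n for the i-th smallest residual.
    return [((i - 0.5) / n, std_normal.cdf(zi)) for i, zi in enumerate(z, start=1)]


# Toy residuals, for illustration only.
for observed, expected in normal_pp_points([-0.8, 0.6, 1.0, -0.6, -0.2]):
    print(f"{observed:.2f}  {expected:.3f}")
```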
