Simple Linear Regression in SPSS
1.
STAT 314
Ten Corvettes between 1 and 6 years old were randomly selected from the classified ads of The Arizona Republic. The following data were obtained, where x denotes age, in years, and y denotes price, in hundreds of dollars.
Age, x (years):      6    6    6    2    2    5    4    5    1    4
Price, y ($100s):  125  115  130  260  219  150  190  163  260  160
a. Graph the data to determine whether there is a possible linear relationship.
b. Compute and interpret the linear correlation coefficient, r.
c. Determine the regression equation for the data.
d. Graph the regression equation and the data points.
e. Identify outliers and potential influential observations.
f. Compute and interpret the coefficient of determination, r².
g. Obtain the residuals and create a residual plot. Decide whether it is reasonable to consider that the assumptions for regression analysis are met by the variables in question.
h. At the 5% significance level, do the data provide sufficient evidence to conclude that the slope of the population regression line is not 0 and, hence, that age is useful as a predictor of price for Corvettes?
i. Obtain and interpret a 95% confidence interval for the slope, β₁, of the population regression line that relates age to price for Corvettes.
j. Obtain a point estimate for the mean price of all 4-year-old Corvettes.
k. Determine a 95% confidence interval for the mean price of all 4-year-old Corvettes.
l. Find the predicted price of a randomly selected 4-year-old Corvette.
m. Determine a 95% prediction interval for the price of a randomly selected 4-year-old Corvette.
1.
Enter the age values into one variable and the corresponding price values into another variable (see figure, below).
2.
Select Graphs > Scatter > Simple, and enter Price as the Y Axis variable and Age as the X Axis variable (see figures, below). Click Titles to enter a descriptive title for your graph, and click Continue. Click OK.
a.
Graph the data to determine whether there is a possible linear relationship. The points seem to follow a roughly linear pattern with a negative slope.
3.
Select Analyze > Correlate > Bivariate.
4.
Select Age and Price as the variables, select Pearson as the correlation coefficient, deselect Flag significant correlations, and click OK. (See the left figure, below.)
b.
Compute and interpret the linear correlation coefficient, r. The correlation coefficient is r = -0.968 (see the right figure, above). Because r is negative and close to -1, it indicates a strong negative linear correlation between age and price. The data points should therefore cluster closely about a negatively sloping regression line, which is consistent with the scatterplot obtained above. Since there is a strong negative linear relationship between Age and Price, the regression analysis can proceed.
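As an optional cross-check outside SPSS, r can be computed directly from its definition, r = Sxy / sqrt(Sxx · Syy). Here Sxx and Syy are the sums of squared deviations of x and y about their means, and Sxy is the sum of cross-deviations. The following is a minimal Python sketch using only the standard library. The function name pearson_r is our own choice and is not an SPSS name.

```python
import math

# Corvette data: age in years (x) and price in hundreds of dollars (y)
age = [6, 6, 6, 2, 2, 5, 4, 5, 1, 4]
price = [125, 115, 130, 260, 219, 150, 190, 163, 260, 160]


def pearson_r(x, y):
    """Return the linear correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return s_xy / math.sqrt(s_xx * s_yy)


r = pearson_r(age, price)
print(round(r, 3))  # -0.968
```

The negative sign comes entirely from Sxy, because Sxx and Syy are always positive.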
5.
Select Analyze > Regression > Linear.
6.
Since we want to predict the price of 4-year-old Corvettes, enter the number 4 in the Age column of the data window after the last row. Enter a period (.) for the corresponding Price value. SPSS treats the period as a missing value, so this case is excluded from the regression computations, but SPSS still computes its predicted value and interval endpoints. (See figure, below.)
7.
Select Price as the dependent variable and Age as the independent variable. Click Statistics, select Estimates and Confidence Intervals for the regression coefficients, select Model fit, and click Continue. Click Plots, select Normal Probability Plot, and click Continue. Click Save, select Unstandardized predicted values, select Unstandardized and Studentized residuals, select Mean and Individual prediction intervals at the 95% level (or whatever level the problem requires), and click Continue. Click OK. (See the four figures, following.)
The output from this procedure is extensive and will be shown in parts in the following answers.
c.
Determine the regression equation for the data. From the above output, the regression equation is PRICE = 291.602 - 27.903 × AGE.
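The coefficients SPSS reports, 291.602 and -27.903, are the least-squares estimates b1 = Sxy / Sxx and b0 = ȳ - b1·x̄. The following is a minimal Python sketch that reproduces them using only the standard library. The function name least_squares_line is hypothetical.

```python
# Corvette data: age in years (x) and price in hundreds of dollars (y)
age = [6, 6, 6, 2, 2, 5, 4, 5, 1, 4]
price = [125, 115, 130, 260, 219, 150, 190, 163, 260, 160]


def least_squares_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx          # slope
    b0 = y_bar - b1 * x_bar   # intercept: the line passes through (x-bar, y-bar)
    return b0, b1


b0, b1 = least_squares_line(age, price)
print(round(b0, 3), round(b1, 3))  # 291.602 -27.903
```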
8.
From within the output window, double-click on the scatterplot to enter edit mode. From the Chart menu, select Options. Check Total for Fit Line and click Fit Options. Select Linear Regression and Include constant in equation and click Continue. Click OK. Now your scatterplot displays the linear regression line computed above.
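d.
Graph the regression equation and the data points. The scatterplot edited in step 8 shows the data points together with the fitted regression line.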
e.
Identify outliers and potential influential observations. There do not appear to be any points that lie far from the cluster of data points or far from the regression line; thus there are no possible outliers or influential observations.
f.
Compute and interpret the coefficient of determination, r². The coefficient of determination is r² = 0.937 (equivalently, (-0.968)² ≈ 0.937). Therefore, about 93.7% of the variation in the price data is explained by age. Because r² is close to 1, the regression equation appears to be very useful for making predictions.
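r² is the proportion of the total variation in price that the regression line explains: r² = 1 - SSE/SST. Here SSE is the error sum of squares and SST is the total sum of squares of the prices about their mean. The following is a minimal standard-library Python sketch of that computation. It is not part of the SPSS procedure and is meant only as a check on the output.

```python
# Corvette data: age in years (x) and price in hundreds of dollars (y)
age = [6, 6, 6, 2, 2, 5, 4, 5, 1, 4]
price = [125, 115, 130, 260, 219, 150, 190, 163, 260, 160]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in age)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, price))

# Least-squares slope and intercept
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# SSE: variation left unexplained by the line; SST: total variation in price
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(age, price))
sst = sum((y - y_bar) ** 2 for y in price)

r_squared = 1 - sse / sst
print(round(r_squared, 3))  # 0.937
```

For simple linear regression this value equals the square of the correlation coefficient r.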
9.
The residuals and studentized residuals are saved as new variables in the data window. The predicted values and the endpoints of the confidence and prediction intervals are saved there as well.
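Each residual is e = y - ŷ. For the studentized residuals, the sketch below assumes the usual internally studentized definition, e / (s_e·sqrt(1 - h)). Here h = 1/n + (x - x̄)²/Sxx is the leverage of the observation and s_e = sqrt(SSE/(n - 2)). This definition should agree with the studentized residuals SPSS saves, but treat it as an illustration rather than SPSS's own code. The variable names are our own.

```python
import math

# Corvette data: age in years (x) and price in hundreds of dollars (y)
age = [6, 6, 6, 2, 2, 5, 4, 5, 1, 4]
price = [125, 115, 130, 260, 219, 150, 190, 163, 260, 160]

n = len(age)
x_bar = sum(age) / n
y_bar = sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in age)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, price))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Unstandardized residuals: observed price minus fitted price
fitted = [b0 + b1 * x for x in age]
residuals = [y - y_hat for y, y_hat in zip(price, fitted)]

# Standard error of the estimate, s_e = sqrt(SSE / (n - 2))
s_e = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

# Leverage of each observation in simple linear regression (assumed formula)
leverage = [1 / n + (x - x_bar) ** 2 / s_xx for x in age]

# Internally studentized residuals: each residual divided by its own estimated SD
studentized = [e / (s_e * math.sqrt(1 - h)) for e, h in zip(residuals, leverage)]

for x, y, e, sr in zip(age, price, residuals, studentized):
    print(f"{x:>2} {y:>4} {e:>8.3f} {sr:>7.3f}")
```

For these data, every studentized residual falls between -2 and 2, in line with the handout's conclusion that there are no outliers.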
10.
To create a residual plot, select Graphs > Scatter > Simple with the residuals (res_1) as the Y Axis variable and Age as the X Axis variable. Click Titles to enter Residual Plot as the title for your graph, and click Continue. Click OK. Double-click the graph, select Chart > Reference Line, select Y Scale, enter 0 as the position of the line, and click OK.
11.
To create a studentized residual plot (what the textbook calls a standardized residual plot), select Graphs > Scatter > Simple with the studentized residuals (sre_1) as the Y Axis variable and Age as the X Axis variable. Click Titles to enter Studentized Residual Plot as the title for your graph, and click Continue. Click OK. Double-click the graph, select Chart > Reference Line, select Y Scale, enter 0 as the position of the line, and click OK.
12.
To assess the normality of the residuals, consult the P-P Plot from the regression output.
g.
Obtain the residuals and create a residual plot. Decide whether it is reasonable to consider that the assumptions for regression analysis are met by the variables in question.
Both the residual plot and the studentized residual plot show:
- a random scatter of the points (independence);
- a roughly constant spread (constant variance);
- no values unusually far from the reference line (no outliers).
The normal probability plot of the residuals shows the points lying close to a diagonal line, so the residuals appear to be approximately normally distributed. Thus, the assumptions for regression analysis appear to be met.
h.
At the 5% significance level, do the data provide sufficient evidence to conclude that the slope of the population regression line is not 0 and, hence, that age is useful as a predictor of price for Corvettes?
Step 1: Hypotheses
H0: β₁ = 0
Ha: β₁ ≠ 0
Step 2: Significance Level
α = 0.05
Step 3: Critical Value(s) and Rejection Region(s)
Reject the null hypothesis if p-value ≤ 0.05.
Step 4: Test Statistic
T = -10.887, and p-value = 0.000. SPSS rounds to three decimal places, so the p-value is less than 0.001.
Step 5: Conclusion
Since p-value < 0.001 ≤ 0.05, we reject the null hypothesis.
Step 6: State conclusion in words
At the α = 0.05 level of significance, there is sufficient evidence to conclude that the slope of the population regression line is not zero and, hence, that age is useful as a predictor of price for Corvettes.
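The test statistic SPSS reports is t = b1 / SE(b1), where SE(b1) = s_e / sqrt(Sxx), with n - 2 = 8 degrees of freedom. The Python standard library has no t-distribution function, so the minimal sketch below uses the critical-value version of the same test instead of a p-value. The critical value t₀.₀₂₅ = 2.306 for df = 8 is taken from a standard t-table and is hard-coded rather than computed. Both versions lead to the same decision.

```python
import math

# Corvette data: age in years (x) and price in hundreds of dollars (y)
age = [6, 6, 6, 2, 2, 5, 4, 5, 1, 4]
price = [125, 115, 130, 260, 219, 150, 190, 163, 260, 160]

# Two-tailed critical value t_0.025 for df = 8, taken from a t-table
T_CRIT = 2.306

n = len(age)
x_bar = sum(age) / n
y_bar = sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in age)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, price))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar

# Standard error of the estimate and of the slope
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(age, price))
s_e = math.sqrt(sse / (n - 2))
se_b1 = s_e / math.sqrt(s_xx)

# Test H0: beta1 = 0 against Ha: beta1 != 0
t_stat = b1 / se_b1
reject_h0 = abs(t_stat) > T_CRIT

print(round(t_stat, 3), reject_h0)  # -10.887 True
```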
i.
Obtain and interpret a 95% confidence interval for the slope, β₁, of the population regression line that relates age to price for Corvettes.
We are 95% confident that the slope of the population regression line is somewhere between -33.813 and -21.993. In other words, we are 95% confident that for each additional year of age, the mean price of Corvettes decreases by somewhere between $2,199.30 and $3,381.30.
j.
Obtain a point estimate for the mean price of all 4-year-old Corvettes.
The point estimate (pre_1) is 179.9903 hundred dollars ($17,999.03).
k.
Determine a 95% confidence interval for the mean price of all 4-year-old Corvettes.
We are 95% confident that the mean price of all 4-year-old Corvettes is somewhere between $16,958.46 (lmci_1) and $19,039.60 (umci_1).
l.
Find the predicted price of a randomly selected 4-year-old Corvette.
The predicted price is 179.9903 hundred dollars ($17,999.03).
m.
Determine a 95% prediction interval for the price of a randomly selected 4-year-old Corvette.
We are 95% confident that the price of a randomly selected 4-year-old Corvette is somewhere between $14,552.92 (lici_1) and $21,445.14 (uici_1).
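The interval estimates in parts i through m follow from the standard simple-regression formulas. All three use t₀.₀₂₅ = 2.306 (df = 8, from a t-table):
- slope: b1 ± t·s_e/sqrt(Sxx)
- mean price at x0: ŷ ± t·s_e·sqrt(1/n + (x0 - x̄)²/Sxx)
- single new Corvette at x0: ŷ ± t·s_e·sqrt(1 + 1/n + (x0 - x̄)²/Sxx)

The minimal standard-library Python sketch below can be used to check the SPSS values. The helper name interval is our own, and all prices are in hundreds of dollars.

```python
import math

# Corvette data: age in years (x) and price in hundreds of dollars (y)
age = [6, 6, 6, 2, 2, 5, 4, 5, 1, 4]
price = [125, 115, 130, 260, 219, 150, 190, 163, 260, 160]

T_CRIT = 2.306  # t_0.025 with df = n - 2 = 8, from a t-table
x0 = 4          # age of interest, in years

n = len(age)
x_bar = sum(age) / n
y_bar = sum(price) / n
s_xx = sum((x - x_bar) ** 2 for x in age)
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(age, price))
b1 = s_xy / s_xx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(age, price))
s_e = math.sqrt(sse / (n - 2))


def interval(center, std_err):
    """Return (lower, upper) = center -/+ T_CRIT * std_err."""
    return center - T_CRIT * std_err, center + T_CRIT * std_err


# Part i: 95% confidence interval for the slope
slope_ci = interval(b1, s_e / math.sqrt(s_xx))

# Parts j and l: point estimate / predicted price at x0
y_hat = b0 + b1 * x0

# Part k: 95% confidence interval for the mean price at x0
mean_ci = interval(y_hat, s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / s_xx))

# Part m: 95% prediction interval for one Corvette at x0 (extra "1 +" term)
pred_int = interval(y_hat, s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / s_xx))

print("slope CI:", [round(v, 3) for v in slope_ci])
print("y-hat at x0:", round(y_hat, 4))
print("mean CI:", [round(v, 2) for v in mean_ci])
print("prediction interval:", [round(v, 2) for v in pred_int])
```

The slope interval should come out at about -33.813 to -21.993, matching the SPSS output. Multiplying the other values by 100 converts them to dollars. The prediction interval is wider than the confidence interval for the mean because it also accounts for the variation of an individual Corvette's price about the mean.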