Practice 01 Linear Regression
N.B. For each exercise, the working SPSS data set is indicated. The download link for
the data sets can be found in the section called “Download Your Resources Here”.
1
Data set: cpuperform.sav
This data set contains technical information about 209 computers. Create regression
models that predict the relative CPU performance (prp) based on the following
variables: myct, mmin, mmax, cach, chmin, chmax. (Each variable is explained in its
label.) Use all three stepwise regression methods available in SPSS – stepwise,
backward and forward – and retain the model with the highest prediction accuracy.
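The exercise can also be run from a syntax window instead of the menus. The following is a minimal sketch, not the course's official solution. It assumes that cpuperform.sav sits in the current working directory (the file location is an assumption) and that the default entry and removal thresholds (PIN = .05, POUT = .10) are acceptable.

```
* Assumption: cpuperform.sav is in the current working directory.
GET FILE='cpuperform.sav'.

* Stepwise selection of predictors for prp.
* R prints R, R Square and Adjusted R Square for every step.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF R ANOVA CHANGE
  /CRITERIA=PIN(.05) POUT(.10)
  /DEPENDENT prp
  /METHOD=STEPWISE myct mmin mmax cach chmin chmax.
```

To try the other two methods, rerun the command with FORWARD or BACKWARD in place of STEPWISE. Each method needs its own REGRESSION command, because several METHOD subcommands inside one command are treated as successive blocks of the same model rather than as separate analyses. One reasonable way to compare the final models is the Adjusted R Square and the Std. Error of the Estimate reported in the Model Summary table.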
2
Data set: winequality.sav
This data set contains information about various types of wines. Your task is to find the
best predictors of wine quality (quality) from the following 11 variables: fixed
acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total
sulfur dioxide, density, pH, sulphates and alcohol. (Each variable is explained in its
label.) Use all three stepwise regression methods available in SPSS – stepwise,
backward and forward. Which method gives the model with the greatest prediction
accuracy? Which method gives the model with the smallest number of predictors?
3
Data set: housedata.sav
This data set contains information about over 21,000 houses. You are supposed to find
the best predictors for a house price (price) out of the following variables: bedrooms,
bathrooms, sqft_living, sqft_lot, floors, grade, sqft_basement and old. (Each variable is
explained in its label.) Use all three stepwise regression methods available in SPSS –
stepwise, backward and forward – and retain the model that provides good prediction
accuracy with the smallest number of predictors.
4
Data set: bostonhousing.sav
This data set contains information about 506 houses in Boston, USA. You have to
predict the median house value (medv) using the following variables: crim, zn, indus,
nox, rm, age, dis, rad, tax, ptratio and lstat. (Each variable is explained in its label.)
Identify the model with the highest prediction accuracy using these regression methods:
stepwise, backward and forward.
5
Data set: cellular.sav
This data set contains information about 250 customers of a cell phone company. You
have to find the best predictors of the customers’ propensity-to-leave score (score). To
that effect, build the following nested regression models:
1. The first model uses the following variables as predictors: minutes, bill
2. The second model uses the following variables as predictors: minutes, bill,
business, los
3. The third model uses the following variables as predictors: minutes, bill,
business, los, income
Each variable is explained in its label.
Which is the model with the greatest prediction accuracy? Which are the best predictors
of the propensity to leave?
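The three nested models can be requested in a single run by entering the predictors in blocks. The following is a minimal sketch under the assumption that cellular.sav is in the current working directory (the file location is an assumption); it is one possible setup, not the only valid one.

```
* Assumption: cellular.sav is in the current working directory.
GET FILE='cellular.sav'.

* Hierarchical (nested) regression: each METHOD=ENTER adds a block.
* Model 1: minutes, bill.
* Model 2: adds business, los.
* Model 3: adds income.
REGRESSION
  /MISSING LISTWISE
  /STATISTICS COEFF R ANOVA CHANGE
  /DEPENDENT score
  /METHOD=ENTER minutes bill
  /METHOD=ENTER business los
  /METHOD=ENTER income.
```

The CHANGE keyword adds the R Square Change and Sig. F Change columns to the Model Summary table, which show whether each added block improves the prediction significantly. The Coefficients table for the retained model then indicates which individual predictors contribute most.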
6
Data set: mallcost.sav
This data set contains information about the cost of construction for 40 malls. You have
to find the best predictors of this cost (cost) by creating three nested
regression models:
1. The first model uses the following variable as predictor: sqft
2. The second model uses the following variables as predictors: sqft, inorout
3. The third model uses the following variables as predictors: sqft, inorout, yrext
Each variable is explained in its label.
Which is the model with the greatest prediction accuracy? What are the best predictors
of the construction cost?