ISOM2500 Regression Practice Questions
1. Let (𝑥𝑖, 𝑦𝑖), 𝑖 = 1, 2, …, 𝑛 be a sample of 𝑛 paired observations, and let 𝑦𝑖 = 𝛽0 + 𝛽1𝑥𝑖 be the simple
regression line of the data. The least squares estimate of 𝛽1 represents
2. A regression model was fitted and the residuals were checked to be approximately normally
distributed. The scatter plot of residuals vs predicted values on the right consists of 104 observations.
The RMSE:
(a) is around 0.
(b) is around 40.
(c) is around 25.
(d) cannot be estimated from the plot.
[3-5] Suppose that in the population the annual salary (𝑦𝑖 ) of the CEO measured in million dollars is related
to the annual sales of the company (𝑥𝑖 ) measured in million dollars according to the following regression
model:
𝑦𝑖 = 5 + 0.1𝑥𝑖 + 𝜀𝑖 ,
3. What is the standard deviation of CEO salaries in million dollars for CEOs of firms with annual
sales of five million dollars?
4. What is the expected difference in million dollars between the salary of CEO of a firm with five
million dollars in annual sales and the CEO of a firm with annual sales of eight million dollars?
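As a quick check of the arithmetic behind question 4: the intercept cancels in the difference of two expected salaries, so only the slope and the gap in sales matter, giving 0.1 × 3 = 0.3 million dollars in absolute value. The sketch below is a minimal illustration that assumes the error term has mean zero, as in the SRM; the function name is made up for this example.

```python
# Coefficients of the stated population model: y = 5 + 0.1x + error
BETA0 = 5.0
BETA1 = 0.1

def expected_salary(sales_millions):
    """Mean CEO salary (million dollars) at a given sales level (million dollars).

    Assumes the error term has mean zero, so it drops out of the expectation.
    """
    return BETA0 + BETA1 * sales_millions

# Expected salary gap between firms with 8 and 5 million dollars in annual sales
difference = expected_salary(8) - expected_salary(5)
print(round(difference, 2))
```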
(a) A linear model is okay because the association between the two variables is fairly strong.
(b) The linear model is no good because the correlation is near 0.
(c) The linear model is no good because some residuals are large.
(d) The linear model is no good because of the curve in the residuals.
[7-8] Let (𝑋, 𝑌) be a pair of random variables. Suppose you run a linear least squares regression of
𝑌 on 𝑋. The estimated regression line is 𝑌̂ = 3 + 2𝑋. The t-statistic for testing the null hypothesis that
𝛽1 = 1 is 3.
7. You get an additional data point with X = 2 and Y = 7 and run the regression again including the
new data point. What happens to the estimated slope coefficient?
(a) It increases.
(b) It decreases.
(c) It remains the same.
(d) Cannot tell based on the information given.
8. What happens to the standard error of the residuals in the new regression, run with the new data
point included, relative to the standard error of the residuals in the original regression?
(a) It increases.
(b) It decreases.
(c) It remains the same.
(d) Cannot tell based on the information given.
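For questions 7 and 8, note that the new point (X = 2, Y = 7) lies exactly on the fitted line, since 3 + 2 × 2 = 7. The sketch below illustrates what happens in that situation. It uses a small made-up data set (not the data behind the question) and a hand-written least squares helper, so it is an illustration under those assumptions rather than a reproduction of the original regression.

```python
import math

def fit_line(xs, ys):
    """Return the least squares intercept, slope, and residual standard error."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))  # residual standard error with n - 2 degrees of freedom
    return b0, b1, s_e

# Hypothetical data, made up for illustration only
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [4.8, 7.5, 8.6, 11.4, 12.7, 15.3]

b0, b1, s_old = fit_line(xs, ys)

# Add one point that sits exactly on the fitted line (its residual is zero)
x_new = 2.0
xs_new = xs + [x_new]
ys_new = ys + [b0 + b1 * x_new]
b0_new, b1_new, s_new = fit_line(xs_new, ys_new)

print(b1, b1_new)    # slopes agree up to rounding error
print(s_old, s_new)  # the residual standard error shrinks
```

The sum of squared residuals is unchanged by a zero-residual point while the degrees of freedom grow by one, which is why the residual standard error falls.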
9. The p-value for testing whether the slope equals 0 in a simple regression is 0.45. Then
10. The following results were obtained from a simple regression analysis:
𝑦̂ = 27.2895 − 1.2024𝑥
For each unit change in the independent variable x, the estimated change in the mean value of the dependent
variable y is equal to
11. Which assumption of the SRM is violated in the residual plot at the right?
Summary of Fit
RSquare     0.755
Parameter Estimates
Term        Estimate   Std Error   t Ratio   Prob>|t|
Intercept   4.8        0.148       32.50     <.0001*
Assume the transformed x and y satisfy the SRM. Which of the following statements is correct?
(a) As the price increases by 1%, the sales decrease by 1.75% on average.
(b) As the price increases by $1, the sales decrease by 1.75 units on average.
(c) As the price increases by 1%, the sales decrease by 1.75 units.
(d) As the price increases by $1, the sales decrease by 175%.
13. The heights (y) of 50 men and their shoe sizes (x) were obtained. The variable height is measured
in centimetres and the shoe sizes of these 50 men ranged from 8 to 11. From these 50 pairs of
observations, the least squares regression line predicting height from shoe size was computed to
be 𝑦̂ = 130.455 + 4.7498𝑥. What height would you predict for a man with a shoe size of 13?
(a) 130.46cm
(b) 192.20cm
(c) 182.70cm
(d) I would not use this regression line to predict the height of a man with a shoe size of 13.
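Question 13 turns on extrapolation: plugging in a shoe size of 13 gives a number, but 13 lies outside the sampled range of 8 to 11. The sketch below computes the fitted value and flags whether the input is inside that range. It assumes the intercept is 130.455, the reading consistent with options (a) and (b); the function and constant names are illustrative.

```python
# Assumed coefficients: intercept read as 130.455 (see the note above), slope 4.7498
INTERCEPT = 130.455
SLOPE = 4.7498
SIZE_MIN, SIZE_MAX = 8, 11  # range of shoe sizes observed in the sample

def predict_height(shoe_size):
    """Return (fitted height in cm, True if shoe_size is within the observed range)."""
    in_range = SIZE_MIN <= shoe_size <= SIZE_MAX
    return INTERCEPT + SLOPE * shoe_size, in_range

height, in_range = predict_height(13)
print(round(height, 2), in_range)
```

A fitted value flagged as out of range is an extrapolation, and the data give no evidence that the linear pattern continues there.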
[14-16] A large national bank charges local companies for using their services. A bank officer reported the
results of a regression analysis designed to predict the bank's charges (𝑌), measured in dollars per month,
for services rendered to local companies. One explanatory variable used to predict service charge to a
company is the company's sales revenue (𝑋), measured in millions of dollars. Data for 21 companies that
use the bank's services were used to fit the model. The results of the simple linear regression are provided
below.
14. Interpret the estimate of the standard deviation of the error terms.
(a) About 95% of the observed service charges fall within $65 of the least squares line.
(b) About 95% of the observed service charges equal their corresponding predicted values.
(c) About 95% of the observed service charges fall within $130 of the least squares line.
(d) For every $1 million increase in sales revenue, we expect a service charge to increase $65.
(a) There is sufficient evidence (at α = 0.05) to conclude that sales revenue (X) is a useful
linear predictor of service charge (Y).
(b) There is insufficient evidence (at α = 0.05) to conclude that sales revenue (X) is a useful
linear predictor of service charge (Y).
(c) Sales revenue (X) is a poor predictor of service charge (Y).
(d) For every $1 million increase in sales revenue, we expect a service charge to increase
$0.034.
16. A 95% confidence interval for 𝛽1 is [15, 30]. Interpret the interval.
(a) We are 95% confident that the mean service charge will fall between $15 and $30 per month.
(b) We are 95% confident that the sales revenue (X) will increase between $15 and $30 million for
every $1 increase in service charge (Y).
(c) We are 95% confident that average service charge (Y) will increase between $15 and $30 for every
$1 million increase in sales revenue (X).
(d) At the α = 0.05 level, there is no evidence of a linear relationship between service charge (Y) and
sales revenue (X).
[17-18] A medium-sized business has a policy that keeps its weekly advertising budget within the range
from $2000 to $6000. The marketing manager has collected data from a sample of weeks, recording the
amount spent on advertising (ADV) and the revenue (REV) for each week. The amounts spent on
advertising are recorded in thousands of dollars. Revenue amounts are in actual dollars. After examining
the data, the manager decides to use a (natural) log transformation on both variables in order to derive a
regression line. The log-log equation is determined to be:
Estimated log 𝑅𝐸𝑉 = 6.8 + 2.2 log 𝐴𝐷𝑉
17. In the context of this application, elasticity refers to which of the following?
(a) how small percentage changes in x are associated with small percentage changes in y
(b) how small percentage changes in y are associated with small percentage changes in x
(c) how changes in y affect changes in x
(d) how changes in x affect changes in y
18. What percentage increase in revenue would be predicted for a 0.5% increase in dollars spent on
advertising?
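For question 18, the slope of a log-log equation acts as an elasticity, so a small percentage change in advertising is multiplied by roughly 2.2 to give the predicted percentage change in revenue. The sketch below compares that rule of thumb with the exact change implied by the fitted equation; the variable names are illustrative.

```python
ELASTICITY = 2.2      # slope of the log-log equation
pct_change_adv = 0.5  # percentage increase in advertising spending

# Rule of thumb for small changes: % change in y is about slope times % change in x
approx_pct = ELASTICITY * pct_change_adv

# Exact change implied by the fitted equation, where REV is proportional to ADV ** 2.2
exact_pct = ((1 + pct_change_adv / 100) ** ELASTICITY - 1) * 100

print(round(approx_pct, 3), round(exact_pct, 3))
```

The two values differ only in the third decimal place, which is why the approximation is standard for small percentage changes.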
[20-21] It is believed that the average number of hours spent studying per day (HOURS) during
undergraduate education should have a positive linear relationship with the starting salary (SALARY,
measured in thousands of dollars per month) after graduation. Given below is the output from regressing
starting salary on number of hours spent studying per day for a sample of 51 students.
R Square 0.7845
Observations 51
20. What is the value of the test statistic for testing whether average SALARY is linearly related to
HOURS?
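The full regression output for question 20 is not reproduced here, but in a simple regression the magnitude of the slope's t-statistic can be recovered from R Square and the number of observations alone, because t² = F = R²(n − 2)/(1 − R²). The sketch below applies that identity; the sign of the statistic follows the sign of the estimated slope, which is not shown above.

```python
import math

r_square = 0.7845  # R Square from the output
n = 51             # number of observations

# In simple regression, the squared t-statistic for the slope equals the F-statistic
t_magnitude = math.sqrt(r_square * (n - 2) / (1 - r_square))
print(round(t_magnitude, 2))
```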
21. The 90% confidence interval for the average change in SALARY (in thousands of dollars) as a
result of spending an extra hour per day studying is
22. An employee, John, is 30 years old. According to the regression equation, what is his expected
number of absent days in the coming fiscal year?
23. Test whether the regression coefficient 𝛽1 of age is larger than 0 at the 5% significance level.
24. Find a 95% confidence interval for the regression coefficient of age.
(a) [0.1056, 0.4286] (b) [0.2006, 0.3104] (c) [0.1961, 0.3114] (d) [0.0056, 0.4286]
25. The sample mean and sample standard deviation for age are 37.87 and 10.39, respectively. Find a
95% prediction interval for the absent days of a 30-year-old employee.
(a) [2.907, 3.773] (b) [2.907, 4.025] (c) [1.998, 4.025] (d) [1.998, 3.773] (e) [1.054, 5.625]
[26-27] An insurance agent has selected a sample of drivers that she insures whose ages are in the range
from 16 to 42 years. For each driver, she records the age of the driver (𝑥) and the dollar amount of claims
(𝑦) that the driver filed in the previous 12 months. A scatterplot showing the dollar amount of claims as
the response and the age as the predictor shows a linear trend. The least squares regression line is
determined to be: 𝑦̂ = 3715 − 75.4𝑥. A plot of the residuals versus age of the drivers showed no pattern,
and the following were reported: 𝑟² = 0.822 and the standard deviation of the residuals is 312.1.
(a) If the age of a driver increases from 20 to 21, the dollar amount of claims is expected to
decrease by $75.4 on average.
(b) If the age of a driver increases by one year, the dollar amount of claims is predicted to
increase by $3715.
(c) One can use the least squares regression line to obtain a reliable prediction of the dollar
amount of claims for a driver whose age is 55 years.
(d) The dollar amount of claims for a 10-year-old driver is expected to be $2961.
(a) 82.2% of the variation in the dollar amounts of claims is explained by the age of the driver.
(b) The correlation coefficient, 𝑟, between the response and the predictor is 0.907.
(c) If there are 38 drivers included in this model, then 𝑆𝑆𝐸 = 3506631 (SSE: sum of squared
residuals).
(d) If the unit of 𝑦 changes from dollars to thousands of dollars, 𝑟² remains unchanged.
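Several of the options for questions 26 and 27 rest on short calculations from the reported summary values. The sketch below reproduces them, assuming the standard deviation of the residuals is computed with n − 2 degrees of freedom, so that SSE = s² × (n − 2); the variable names are illustrative.

```python
import math

r_squared = 0.822
slope = -75.4
intercept = 3715
s_e = 312.1  # standard deviation of the residuals
n = 38       # sample size supposed in the option about SSE

# The correlation coefficient carries the sign of the slope
r = math.copysign(math.sqrt(r_squared), slope)

# Assuming s_e ** 2 = SSE / (n - 2), solve for SSE
sse = s_e ** 2 * (n - 2)

# Fitted value at age 10, which lies outside the sampled range of 16 to 42
claims_at_10 = intercept + slope * 10

print(round(r, 3), round(sse), round(claims_at_10))
```

Because the slope is negative, the correlation is negative as well, and the fitted value at age 10 is an extrapolation beyond the sampled ages.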
28. A regression analysis between sales (in $1,000) and advertising (in $1,000) resulted in the following
least squares line: Sales = 80 + 5 × Advertising. This implies that
29. If a test of hypothesis has a Type I error probability (α) of 0.01, it means that
(a) if the null hypothesis is true, you don't reject it 1% of the time.
(b) if the null hypothesis is true, you reject it 1% of the time.
(c) if the null hypothesis is false, you don't reject it 1% of the time.
(d) if the null hypothesis is false, you reject it 1% of the time.
30. In a study of the association between the car mileage (miles per gallon, mpg) and the car weight, it
is found that the association is curved. To make the association linear, one decides to change
the response to 100 times the reciprocal of the mileage. The scatterplot of the new response
vs the car weight (in thousands of pounds) is shown below.
A least-squares linear regression is fitted to the transformed variables, and yields the following
equation:
Estimated new response = 0.95 + 1.25 × Weight (000 lbs).
Based on the equation, what's the predicted mileage (measured in mpg) for a car of weight 5,000
pounds?
(a) 6251
(b) 0.016
(c) 7.2
(d) 13.89
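Question 30 needs a back-transformation: the fitted equation predicts 100/mpg, so the prediction has to be inverted to return to miles per gallon. A minimal sketch of the two steps, with an illustrative function name:

```python
# Fitted equation for the transformed response: 100 / mpg = 0.95 + 1.25 * weight
INTERCEPT = 0.95
SLOPE = 1.25

def predicted_mpg(weight_thousand_lbs):
    """Back-transform the fitted response (100 / mpg) to miles per gallon."""
    new_response = INTERCEPT + SLOPE * weight_thousand_lbs
    return 100 / new_response

# Weight is entered in thousands of pounds, so 5,000 lbs is 5
print(round(predicted_mpg(5), 2))
```

Entering the weight as 5000 instead of 5, or forgetting to invert the fitted response, leads to the other numerical options.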
31. For a given sample size n, if the level of significance (𝛼) is decreased, the probability of a Type II
error of the test (𝛽)
33. If the roles of the explanatory variable and the response variable are switched in a regression and
correlation situation, which of the following would stay the same?
34. A statistics professor used 𝑋 = “number of class days attended” (out of 30) as an explanatory
variable to predict 𝑌 = “score received on final exam” for a class of his students. The resulting
regression equation was 𝑌̂ = 39.4 + 1.4𝑋. Which of the following statements is true?
(a) If attendance increases by 1.4 days, the expected exam score will increase by 1 point
(b) If attendance increases by 1 day, the expected exam score will increase by 39.4 points
(c) If attendance increases by 1 day, the expected exam score will increase by 1.4 points
(d) If the student does not attend at all, the expected exam score is 1.4.
35. Given the regression equation 𝑌̂ = −4.3 + 5.9𝑋, which of the following statements is incorrect?
(a) The difference between the actual 𝑌 values and the mean of 𝑌.
(b) The difference between the actual 𝑌 values and the predicted values of 𝑌.
(c) The square root of the slope.
(d) The predicted value of 𝑌 for the average 𝑋 value.
38. In simple linear regression, the least squares estimate of the y-intercept (𝑏0) represents the
40. Which of the following assumptions concerning the probability distribution of the random error
term is stated incorrectly?
[43-44] Each worker at an assembly plant that produces clock radios is responsible for the entire assembly
of each unit they work on. The plant manager has collected data from a sample of workers: the number of
years (YRS) of experience at the plant, and the number of hours per unit (TIME) required for assembly.
The scatterplot of TIME versus YRS is shown below.
43. Which of the following is an appropriate reason why a regression line should not be used to make
predictions based on the data?
(a) 2.61
(b) 2.66
(c) 2.80
(d) 3.12
(a) If there is a positive correlation r between x and y, then the slope b1 must also be positive.
(b) The units on the intercept b0 and the slope b1 will be the same as the units on the variable y.
(c) If r² = 0.85, then it is appropriate to conclude that a change in x will cause a change in y.
(d) None of the above is true.
[49-50] A least-squares linear regression is fitted to a data set, and the residual plot is shown below.
(a) A linear model is okay because the association between the two variables is fairly strong.
(b) The linear model is not good because the correlation between the response and the predictor is
near 0.
(c) The linear model is not good because some residuals are large.
(d) The linear model is not good because of the curve in the residuals.
50. If one uses the least-squares linear regression to make predictions, which of the following statements
is true?