Practice Midterm Questions 1 and 2
Questions 1 and 2
Consider the histogram and the associated box plot of the dataset called “Crime Data” below,
and answer questions 1 and 2.
1. The data was normalized (by subtracting the mean and dividing by the standard deviation of
the data). Which of the following graphs best represents the normalized data?
(A)$ (B)
(C) (D)
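Note: the normalization described in question 1 can be sketched in a few lines of Python. This is a minimal illustration only; the sample values are hypothetical (they are not the Crime Data values), and the helper names `standardize` and `skewness` are invented for this sketch. It shows that standardizing shifts the mean to 0 and the standard deviation to 1 while leaving the shape of the distribution, measured here by moment-based skewness, unchanged.

```python
import statistics

def standardize(values):
    """Return z-scores: (x - mean) / standard deviation."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD; using the sample SD only rescales
    return [(x - mean) / sd for x in values]

def skewness(values):
    """Moment-based skewness: the mean of the cubed z-scores."""
    return statistics.fmean(v ** 3 for v in standardize(values))

# Hypothetical right-skewed sample (NOT the Crime Data values).
data = [1, 2, 2, 3, 3, 3, 4, 5, 9, 18]
z = standardize(data)

print("mean of z-scores:", round(statistics.fmean(z), 10))
print("SD of z-scores:  ", round(statistics.pstdev(z), 10))
print("skewness before and after:", skewness(data), skewness(z))
```

Because the transformation is linear with a positive scale factor, a histogram of `z` has the same shape as a histogram of `data`; only the axis values change.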
2. Based on the histogram of the original data, which of the following statements is most
likely?
A. If we use this variable as a predictor (independent) variable, it will definitely be an
influential variable
B. If we use this variable as a predictor (independent) variable, it will definitely be a
leverage point
C. The distribution is left skewed
D. The distribution is right skewed$
Questions 3 and 4
Consider the boxplot below, which gives the distribution of work experience (in months) of
employees in the company. The employees are categorized into two groups based on
performance. The first category is “Low” performers (with an annual rating of 3.15 or less
out of 4.00), denoted by “0”, and the second is “High” performers (with an annual rating
above 3.15 out of 4.00), denoted by “1”.
4. We would like to build a prediction model for predicting the category of employees (Low
or High performers). Then, ________
A. Work experience is not a good predictor because there is a large amount of overlap
B. Work experience is a good predictor even though there is a large amount of overlap$
C. Work experience cannot be used as a predictor because there is a large difference
between the two medians
D. Work experience cannot be used because the two distributions are not similar
Questions 5 to 10
Yajvin is trying to estimate the relationship between compensation and other factors. He
collected data on 150 employees of the company from the HR department. The data cover
Compensation (Comp) (Rs. lakhs), Months of work experience (WorkEx), Age in
months (Age), Number of years of post-high school education (EDU) and Number of
Promotions (Promo) in the company. The correlation matrix across all the variables is given
below:
5. He has estimated a simple linear regression with EDU as the dependent variable and
Comp as the independent variable. What percentage of the variation of the dependent
variable is explained by the regression equation (i.e., what is the R²)?
A. 57.65% B. 33.24%$ C. 75.93% D. 82.96%
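Note: in a simple linear regression (one predictor), R² equals the square of the Pearson correlation between the two variables, so it can be read off a correlation matrix. The sketch below checks this numerically. The data are hypothetical (they are not Yajvin's 150 employees), and the function names `pearson_r` and `r_squared` are invented for this illustration.

```python
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def r_squared(x, y):
    """R^2 of the least-squares fit of y on x, computed as 1 - SSE/SST."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    sst = sum((b - my) ** 2 for b in y)
    return 1 - sse / sst

# Hypothetical values (NOT the exam data).
comp = [4.0, 5.5, 6.0, 7.2, 8.1, 9.0, 10.4, 12.0]
edu = [3, 4, 3, 5, 4, 6, 5, 7]

r = pearson_r(comp, edu)
print("r   =", r)
print("r^2 =", r ** 2)
print("R^2 =", r_squared(comp, edu))
```

The same R² results whichever of the two variables is treated as dependent, since the correlation is symmetric.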
7. What is the value of the t statistic with respect to the regression coefficient of the variable
“WorkEx”?
A. 0.0070 B. 0.0309 C. 0.0000 D. 4.4143$
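Note: the t statistic of a regression coefficient is the estimated coefficient divided by its standard error. The sketch below illustrates this for a one-predictor regression, where the standard error has a simple closed form; in a multiple regression output the same ratio applies to each row of the coefficient table. The data are hypothetical (not the exam data), and `slope_t_statistic` is a name invented for this illustration.

```python
import math
import statistics

def slope_t_statistic(x, y):
    """Return (slope, standard error, t) for the least-squares fit of y on x."""
    n = len(x)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    # Residual variance (n - 2 degrees of freedom) divided by Sxx.
    se = math.sqrt(sse / (n - 2) / sxx)
    return slope, se, slope / se

# Hypothetical values (NOT the exam data).
work_ex = [12, 24, 30, 36, 48, 60, 72, 84]
comp = [4.1, 5.0, 4.8, 6.2, 6.0, 7.5, 7.1, 8.4]

slope, se, t = slope_t_statistic(work_ex, comp)
print("coefficient:", slope)
print("std. error: ", se)
print("t statistic:", t)
```

A large absolute t value (relative to the t distribution with the residual degrees of freedom) corresponds to a small p-value for the null hypothesis that the coefficient is zero.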
8. If we reject the null hypothesis that the variable Promo has no impact on
compensation, what is the probability of a Type I error?
A. 0.09%$ B. 99.91% C. 14.72% D. cannot be calculated
9. Based on the regression equation estimated above and appropriate hypothesis tests, which
of the variables need to be dropped from the regression equation?
A. WorkEx
B. EDU$
C. Age
D. Promo