MEM Group Problem Set 2022
Q1. Assume that you use the following equation to model the effect of hours spent on online
remedial sessions by an EdTech firm (contracted by the institute) on the performance of
students. This service is provided free for students.
$$\text{Student Performance}_i = b_0 + b_1 \log(RemS_i) + b_2 (\log(RemS_i))^2 + b_3 RemS_i + b_4 (RemS_i)^2 + e_i$$
i represents the ith student. The dependent variable ‘Student Performance’ can be either Marks
or log(Marks) in the end-term exam. Assume that there is no upper limit on marks. Marks are
measured in units. RemS is measured in hours and varies across students. There are 90
observations in this cross-sectional dataset. Note: Do not consider issues of potential
multicollinearity or endogeneity when answering the following questions.
(a) Discuss which variables you would keep in the above equation for each of the sub-parts
below, and write down the regression equation:
(i) Suppose you wish to estimate the effect of a 1-hour change in RemS on the percent
change in Marks.
(ii) Suppose you wish to estimate the effect of a 1 percent change in RemS on the percent
change in Marks.
(iii) Suppose you wish to estimate the effect of a 1 percent change in RemS on Marks
in tens of marks (instead of marks). Use the original Marks variable in the equation, but
interpret the coefficient so that the effect is described in terms of change per ten marks.
(iv) Suppose you wish to estimate the effect of a 1 percent change in RemS on Marks
if your hypothesis is that Marks increases at an increasing rate with higher RemS
growth.
(b) Instead of Marks and RemS in their original units, suppose RemS is measured as a percent
of the students’ total available hours in the term, and Marks is measured as a percent of the
maximum marks. Suppose the new RemS variable is the only explanatory variable in the regression.
What would be the interpretation of the estimated coefficient?
(c) Suppose after the end of the first term, the institute decides to change the EdTech firm
providing the remedial sessions for the second term. You wish to estimate how the elasticity
of Marks with respect to RemS has changed after the change in the EdTech firm. What variables
would you drop or add to the above equation? Please write down the new equation and explain
the interpretation of the coefficient(s).
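As a reminder of how the slope is read under the different log transformations that Q1 asks about, the following is a minimal sketch on made-up, noise-free data. The data-generating rule (Marks = 20 * RemS^0.5), the helper `ols_slope`, and all variable names are assumptions introduced for illustration; they are not part of the problem set.

```python
import math

def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx

# HYPOTHETICAL data: 90 students, Marks = 20 * RemS^0.5 with no error term.
rems = [float(h) for h in range(1, 91)]
marks = [20.0 * h ** 0.5 for h in rems]

log_rems = [math.log(h) for h in rems]
log_marks = [math.log(m) for m in marks]

# log-log: the slope is an elasticity
# (percent change in Marks for a 1 percent change in RemS).
log_log_slope = ols_slope(log_rems, log_marks)

# log-level: 100 * slope is the approximate percent change in Marks
# for one additional hour of RemS.
log_level_slope = ols_slope(rems, log_marks)

# level-log: slope / 100 is the approximate change in Marks (in marks)
# for a 1 percent change in RemS.
level_log_slope = ols_slope(log_rems, marks)

print(log_log_slope, 100 * log_level_slope, level_log_slope / 100)
```

Because the made-up data follow an exact power law, the log-log slope recovers the exponent 0.5; with real data the estimate would also carry sampling error.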
Q2. Suppose you have estimated a Logit regression with the dependent variable being a
dummy for whether a person is a smoker or not (=1 if smoker, 0 if non-smoker) for 500
individuals. The results of this estimation are given below:
where C is a constant, AGE is the age of the individual, EDUC is the number of years of
education, INCOME is monthly income, and PCIGS79 is the price of cigarettes. The
z-statistic is the coefficient divided by its standard error.
Using the above model, please estimate the probability of smoking for two individuals facing
different prices of cigarettes:
Individual   AGE   EDUC   INCOME   PCIGS79
Person A      28     15   12,500      60.0
Person B      63     10   20,000      60.8
Now, suppose you have to determine how a unit increase in education affects the probability
of smoking, holding other variables constant. Would this depend only on the coefficient of
EDUC above? Why/why not? Please explain using words and equations as relevant.
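The mechanics behind Q2 can be sketched as follows. In a Logit model the predicted probability is P = 1 / (1 + exp(-x'b)), and the effect of a small change in one regressor on P is b_k * P * (1 - P), so it varies with the values of all regressors. The estimated coefficients are not reproduced in this text, so the values in `placeholder_coefs` are hypothetical stand-ins that must be replaced with the reported estimates. The function names are also assumptions made for illustration. Only the values for Person A and Person B come from the question.

```python
import math

def logit_probability(coefs, person):
    """P(smoker = 1 | x) = 1 / (1 + exp(-x'b)) for a Logit model."""
    index = coefs["C"] + sum(coefs[name] * value for name, value in person.items())
    return 1.0 / (1.0 + math.exp(-index))

def marginal_effect(coefs, person, variable):
    """Derivative of P with respect to one regressor: b_k * P * (1 - P)."""
    p = logit_probability(coefs, person)
    return coefs[variable] * p * (1.0 - p)

# HYPOTHETICAL coefficients: placeholders only, NOT the estimates from the problem set.
placeholder_coefs = {
    "C": 2.0,
    "AGE": -0.02,
    "EDUC": -0.09,
    "INCOME": 0.00001,
    "PCIGS79": -0.02,
}

# Individual characteristics taken from the table in the question.
person_a = {"AGE": 28, "EDUC": 15, "INCOME": 12500, "PCIGS79": 60.0}
person_b = {"AGE": 63, "EDUC": 10, "INCOME": 20000, "PCIGS79": 60.8}

for label, person in (("Person A", person_a), ("Person B", person_b)):
    p = logit_probability(placeholder_coefs, person)
    me = marginal_effect(placeholder_coefs, person, "EDUC")
    print(label, "probability:", round(p, 4), "EDUC effect:", round(me, 4))
```

With the same EDUC coefficient, the two individuals get different marginal effects because their predicted probabilities differ.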
Q3. The following regressions are based on the CPI data with a total of 40 annual observations.
The t-statistics are provided below the coefficient estimates.
(i) $\widehat{\Delta CPI}_t = 0.0372\, CPI_{t-1}$
t-statistic: (9.6427)
(ii) $\widehat{\Delta CPI}_t = 1.8052 + 0.0208\, CPI_{t-1}$
t-statistic: (2.5000) (2.7583)
a) Examining the above three regressions, what can you say about stationarity of the CPI time
series?
b) How would you choose among the three models?
c) Equation (i) is Eq. (iii) minus the intercept and trend. Which tests would you use to decide
if the implied restrictions of model (i) are valid? Please calculate the relevant test statistic and
compare it against the critical values in the relevant statistical tables (available in the Appendix
of the Wooldridge textbook).
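For reference, one common way to test joint exclusion restrictions, such as dropping an intercept and a trend, is an F-type statistic that compares the restricted and unrestricted sums of squared residuals. The symbols below are assumptions introduced here, since the sums of squared residuals are not reported in this text: $SSR_r$ and $SSR_{ur}$ are the restricted and unrestricted sums of squared residuals, $q$ is the number of restrictions, $n$ is the number of usable observations, and $k$ is the number of parameters in the unrestricted model. When the regressor is a lagged level of a possibly non-stationary series, the statistic may not follow the standard F distribution, so the choice of critical values should follow the course's treatment.

```latex
F = \frac{(SSR_r - SSR_{ur}) / q}{SSR_{ur} / (n - k)}
```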