Assignment9 - Copy
Sahar Parsa
Fall 2024
The ninth assignment is due on Friday, November 22nd, 2024. It covers the material related to logit and
probit methods as well as panel data. For the Data questions, present the output of your analysis in a “report
style” that is pleasant to read, and include the code you used to generate your results.
Question 1
Four hundred driver’s license applicants were randomly selected and asked whether they passed their driving
test (Pass_i = 1) or failed their test (Pass_i = 0); data were also collected on their gender (Male_i = 1 if
male and = 0 if female) and their years of driving experience (Experience_i, in years). The following table
summarizes several estimated models.
              (1) Probit   (2) Logit   (3) Linear Probability
Experience      0.031        0.040        0.006
               (0.009)      (0.016)      (0.002)
Constant        0.712        1.059        0.774
               (0.126)      (0.221)      (0.034)
a. Using the results in column (1), does the probability of passing the test depend on Experience?
Assume that Matthew has 10 years of driving experience. What is the probability that he will pass the
test? Christopher is a new driver (zero years of experience). What is the probability that he will pass
the test? The sample included values of Experience between 0 and 40 years, and only four people in
the sample had more than 30 years of driving experience. Jed is 95 years old and has been driving
since he was 15. What is the model’s prediction for the probability that Jed will pass the test? Do you
think that this prediction is reliable? Why or why not?
b. Answer (a) using the results in column (2). Sketch the predicted probabilities from the probit and logit
in columns (1) and (2) for values of Experience between 0 and 60. Are the probit and logit models
similar?
c. Answer (a) using the results in column (3). Sketch the predicted probabilities from the probit and
linear probability models in columns (1) and (3) as a function of Experience for values of Experience
between 0 and 60. Do you think that the linear probability model is appropriate here? Why or why not?
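As a starting point for the sketches in parts (b) and (c), the predicted probabilities can be computed from the table above. The following is a minimal Python sketch, assuming that each column reports a constant and a slope on Experience only: the probit applies the standard normal CDF to the index, the logit applies the logistic function, and the linear probability model uses the index itself. The names COEFS, normal_cdf, and predicted_probability are illustrative and not part of the assignment.

```python
import math

# (constant, slope on Experience), copied from the table in Question 1.
COEFS = {
    "probit": (0.712, 0.031),
    "logit": (1.059, 0.040),
    "lpm": (0.774, 0.006),
}


def normal_cdf(z):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predicted_probability(model, experience):
    """Predicted Pr(Pass = 1) for a given model and years of experience."""
    const, slope = COEFS[model]
    index = const + slope * experience
    if model == "probit":
        return normal_cdf(index)
    if model == "logit":
        return 1.0 / (1.0 + math.exp(-index))
    if model == "lpm":
        # The linear probability model is not restricted to [0, 1].
        return index
    raise ValueError(f"unknown model: {model}")


if __name__ == "__main__":
    # Tabulate the three predictions over the range used in the sketches.
    for years in range(0, 61, 10):
        row = [predicted_probability(m, years) for m in ("probit", "logit", "lpm")]
        print(years, [round(p, 3) for p in row])
```

Plotting these values against Experience gives the curves requested in (b) and (c); interpreting them is left to the written answer.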
Question 2
Suppose that, for one semester, you can collect the following data on a random sample of college juniors
and seniors for each class taken: a standardized final exam score, percentage of lectures attended, a dummy
variable indicating whether the class is within the student’s major, cumulative grade point average prior to
the start of the semester, and SAT score.
a. Is this data set a cluster sample? Why would you classify it as a cluster sample? Roughly, how
many observations would you expect for the typical student?
b. Write a model that explains final exam score in terms of the percentage of lectures attended and the other
characteristics. Use s to subscript student and c to subscript class. Which variables do not change
within a student?
c. If you pool all of the data and use OLS, what are you assuming about the unobserved student
characteristics that affect performance and attendance rate? What roles do SAT score and prior GPA
play in this regard?
d. If you think SAT score and prior GPA do not adequately capture student ability, how would you
estimate the effect of attendance on final exam performance?
Question 3
From Stock and Watson Chapter 11: Consider a model for new capital investment in a particular industry
(say, manufacturing), where the cross section observations are at the county level and there are T years of
data for each county: