Homework 2

The document outlines a series of econometric problems that utilize fixed effects methods to analyze the impact of education on wages using twin data, the influence of student populations on rental prices in college towns, the effects of open container laws on drunk driving arrests in Florida and Georgia, and the impact of calorie disclosure laws on calorie consumption in California. Each problem includes specific tasks such as estimating regressions, interpreting coefficients, and discussing biases and controls. The exercises emphasize the importance of controlling for omitted variables and using appropriate data structures to derive causal inferences.

Uploaded by

jadelala54

Problem 1

The goal of this exercise is to see how fixed effects methods can be used even when there is no time
dimension in the data. For this purpose, we will use data on twins from a “famous” paper by Orley
Ashenfelter and Alan Krueger.¹ Ashenfelter and Krueger wanted to identify the causal effect of years of
education on wages. Of course, regressing wages on years of education for any sample of individuals is
not going to give us causal effects because of omitted variable bias. To get an unbiased estimate of the
effect of education on wages, the authors used survey data on twins. (You should understand why using
data on twins helps us get rid of omitted variable bias once you have worked through the questions below.)
The data was collected by a team of five interviewers at the 16th Annual Twins Day Festival in Twinsburg, Ohio,
in August 1991. A booth was set up at the festival’s main entrance, and an ad inviting all adult twins
to participate in the survey was placed in the festival program. In addition, the interviewers approached
adult twins for an interview, and almost every pair of twins agreed to be interviewed. We’re going to
use this dataset for this problem. You can find it in the Stata file TWINS.dta posted on Moodle. The dta
file is a modified version of the original data used by Ashenfelter and Krueger.
Load the data into Stata and use the “brow” command to look carefully at how the data is organized:
The data is organized so that each row contains information on one twin.
Twin pairs are identified by an id number, called pairid, that can be used to connect the
twins to each other.
The variable id assigns an id number to each individual twin.
The variable wage is the hourly wage in dollars for each individual twin.
The variables age, educ and gender give each individual twin’s age, years of education and
gender respectively.

1. Estimate a standard OLS regression of the effect of years of education on log of wages. Pay no
attention to the twin aspect in this regression. Include controls for being female, age, and the square
of age. Interpret the coefficients for years of education and being female.

2. A standard concern with the regression in 1 is that the estimated effect of education on wages might
be biased because of omitted variables. An example of an omitted variable that might be biasing
our result is an individual’s ability. Given that ability is omitted from the regression in 1, would you
expect that the estimated effect of education on wages is upward or downward biased, and why?

3. The estimate in 1 suffers from bias due to the exclusion of ability from the regression. We want to
correct for ability bias by estimating returns to years of education using twin fixed effects on our
sample of identical twins. The idea here is that identical twins share the exact same genes. Therefore,
we would expect them to have the same ability. How would you control for ability using identical
twins and the fixed effects model? (Hint: Ability is a factor that varies across different pairs of twins
but does not vary between the two twins within a pair.) Run the relevant regression. (Note: Do not include
age, the square of age and female in the regression. In other words, run a regression of log of wages on
years of education and the fixed effects.)

4. Is the effect of years of education on wages different between the regressions in 3 and 1? What does
this say about the direction of the bias from the regression in 1?

5. Is it possible to include the variables age, the square of age and female in the regression in 3? Why
or why not?
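The idea behind question 3 can be illustrated outside Stata with a small simulation. The sketch below is in Python and uses synthetic data, not TWINS.dta: the true return to education (0.08), the ability loading (0.3), and every other number are assumptions chosen for illustration. Ability is drawn once per pair and shared by both twins, so subtracting pair means from educ and log wage removes it, which is what a regression with pair fixed effects does.

```python
import random


def ols_slope(x, y):
    """Slope from a one-regressor OLS of y on x (intercept included)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate_twins(n_pairs=2000, true_return=0.08, seed=0):
    """Synthetic twin data (assumed data-generating process, not the real sample)."""
    rng = random.Random(seed)
    rows = []  # each row: (pairid, educ, lwage)
    for pairid in range(n_pairs):
        ability = rng.gauss(0, 1)  # unobserved, identical for both twins in a pair
        for _ in range(2):
            educ = 12 + 2 * ability + rng.gauss(0, 1.5)  # schooling rises with ability
            lwage = 1.0 + true_return * educ + 0.3 * ability + rng.gauss(0, 0.2)
            rows.append((pairid, educ, lwage))
    return rows


def demean_within_pairs(rows):
    """Subtract each pair's mean from educ and lwage (the within transformation)."""
    totals = {}
    for pairid, educ, lwage in rows:
        t = totals.setdefault(pairid, [0.0, 0.0, 0])
        t[0] += educ
        t[1] += lwage
        t[2] += 1
    educ_dm, lwage_dm = [], []
    for pairid, educ, lwage in rows:
        sum_educ, sum_lwage, count = totals[pairid]
        educ_dm.append(educ - sum_educ / count)
        lwage_dm.append(lwage - sum_lwage / count)
    return educ_dm, lwage_dm


rows = simulate_twins()
pooled = ols_slope([r[1] for r in rows], [r[2] for r in rows])
educ_dm, lwage_dm = demean_within_pairs(rows)
fixed_effects = ols_slope(educ_dm, lwage_dm)

print(f"pooled OLS slope (ability omitted): {pooled:.3f}")
print(f"twin fixed-effects slope:           {fixed_effects:.3f}")
```

The gap between the two slopes here is built into the simulation by making schooling rise with ability. It says nothing about which direction the estimates will move in the actual twins data.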
¹ Ashenfelter, O., Krueger, A. (1994). Estimates of the Economic Return to Schooling from a New Sample of Twins. The
American Economic Review, Vol. 84, No. 5, pp. 1157-1173.

Problem 2
Use the dataset rental.dta posted on Moodle. The data cover the years 1980 and 1990 and include rental
prices and other variables for college towns (Note: The term “college town” refers to a city where a large
number of university students reside). The idea here is to see whether a stronger presence of students
affects rental rates. Here’s a description of the variables:

city=id for each city
year=year for which data are reported
lpop=log of number of residents
lrent=log of rental prices
lavginc=log of average income
pctstu=student population as a percentage of city population during the school year (i.e. (number of students / number of residents) × 100)

1. Estimate the following regression: lrent = β0 + β1 lpop + β2 lavginc + β3 pctstu + ε. Report the estimates
for each of the coefficients. Interpret the estimate on pctstu.

2. Using the regression in the previous part, test whether a stronger presence of students affects rental
rates at the 0.01 significance level.

3. Estimate the first-difference version of the regression in 1. How do the estimates in the
differenced equation differ from those in the original regression? Which set of regression estimates
do you believe the most?

4. When and why would you use the differenced regression equation?
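As a sketch of what first-differencing does in this two-period setting, suppose the levels equation also contains an unobserved, time-constant city effect and a dummy for 1990. Both are assumptions added here for illustration: the symbols a_i, δ0, and y90 do not appear in the problem statement.

```latex
\begin{align*}
lrent_{it} &= \beta_0 + \delta_0\, y90_t + \beta_1\, lpop_{it} + \beta_2\, lavginc_{it}
              + \beta_3\, pctstu_{it} + a_i + \varepsilon_{it}, \qquad t \in \{1980, 1990\} \\
\Delta lrent_i &= \delta_0 + \beta_1\, \Delta lpop_i + \beta_2\, \Delta lavginc_i
              + \beta_3\, \Delta pctstu_i + \Delta \varepsilon_i
\end{align*}
```

Subtracting the 1980 equation from the 1990 equation removes β0 and a_i, because neither changes over time. The slope coefficients β1, β2, and β3 are the same in both equations, and the coefficient on the 1990 dummy becomes the intercept of the differenced equation.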

Problem 3
In 1985, neither Florida nor Georgia had laws banning open alcohol containers in vehicle passenger
compartments. By 1990, Florida had passed such a law, but Georgia had not.

1. Suppose you can collect random samples of the driving-age population in both states, for 1985 and
1990. Let arrest be a binary variable equal to 1 if a person was arrested for drunk driving during the
year. Write down a regression that allows you to test whether the open container law reduced the
probability of being arrested for drunk driving. Discuss what each of the variables in your regression
controls for, and which coefficient in your model measures the effect of the law.

2. Why might you want to control for other factors in the model? What might some of these factors
be?

3. Now, suppose that you can only collect data for 1985 and 1990 at the county-level for the two states
(Note: states are divided into counties in the U.S.). The dependent variable would be the fraction of
licensed drivers arrested for drunk driving during the year. How does the regression you would run
differ from the regression using individual-level data described in part 1?
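One common way to set up part 1 is a linear probability model with a state dummy, a year dummy, and their interaction. The following is a sketch under assumed notation, not the only acceptable answer: FL (equal to 1 for Florida residents) and y90 (equal to 1 for observations from 1990) are hypothetical variable names introduced here.

```latex
\[
arrest_i = \beta_0 + \beta_1\, FL_i + \beta_2\, y90_i
         + \beta_3\, (FL_i \times y90_i) + \varepsilon_i
\]
```

In this setup β1 absorbs permanent differences in arrest rates between the two states, β2 absorbs changes between 1985 and 1990 that are common to both states, and β3 is the difference-in-differences coefficient. With county-level data the same structure applies, with the county arrest fraction as the dependent variable and the same counties observed in both years.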

Problem 4
On January 1, 2010, California adopted a law that required large restaurant chains with 20 or more locations
in California to disclose calorie information for their food items. Suppose I wanted to know the effect of that
policy on calorie consumption. I have data for Oregon, which does not have such a law. I have data for
2009 and 2010.

1. Describe how you could use a difference-in-differences framework to estimate the effect of the law.

2. Suppose that I added data from 2008. Describe how that extra year of data would be helpful for
testing the validity of the difference-in-differences design (i.e. whether the regression in 1 is capturing
the causal effect of the law).
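The arithmetic behind both questions can be sketched with group means. The Python snippet below uses made-up mean calorie figures, invented purely for illustration and not taken from any dataset. The names mean_calories and diff_in_diff are hypothetical. The same function gives the 2009 to 2010 estimate and, applied to 2008 to 2009, a placebo comparison for a period in which neither state had the law.

```python
# Hypothetical mean calories per purchase by state and year.
# These numbers are invented for illustration only.
mean_calories = {
    ("California", 2008): 850.0,
    ("California", 2009): 860.0,
    ("California", 2010): 830.0,
    ("Oregon", 2008): 800.0,
    ("Oregon", 2009): 812.0,
    ("Oregon", 2010): 820.0,
}


def diff_in_diff(means, treated, control, pre, post):
    """Change in the treated group minus change in the control group."""
    treated_change = means[(treated, post)] - means[(treated, pre)]
    control_change = means[(control, post)] - means[(control, pre)]
    return treated_change - control_change


# Estimated effect of the law: compares 2009 (before) with 2010 (after).
effect = diff_in_diff(mean_calories, "California", "Oregon", 2009, 2010)

# Placebo: the same comparison for 2008 to 2009, before the law existed.
placebo = diff_in_diff(mean_calories, "California", "Oregon", 2008, 2009)

print(f"difference-in-differences estimate: {effect}")
print(f"pre-period placebo estimate:        {placebo}")
```

A placebo estimate close to zero is consistent with the two states having followed similar trends before the law. A large placebo estimate would suggest that the 2009 to 2010 comparison also picks up diverging trends.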
