Assignment10Sol - Copy

Intro to Econometrics NYU

Assignment 10: ECON-UA 266 - Intro to Econometrics

Sahar Parsa

Fall 2024

The tenth assignment is due on Friday, December 6th, 2024. It covers the material related to quasi-experimental
methods.

Question 1
Read the following article: “The Immigration Equation,” by Roger Lowenstein. The New York Times Magazine
Section, July 9, 2006: https://www.nytimes.com/2006/07/09/magazine/the-immigration-equation.html
You can also download it from LexisNexis, which you can access through the Library. Answer the following three
questions:
a. What are the two positions that are taken on the impact of immigration on the labor market by Card
and Borjas?
Solution :
Borjas argued that immigrants hurt the economic prospects of the Americans they compete with. Now that
the most significant contingent of immigrants is poorly educated Mexicans, they hurt poorer Americans,
especially African-Americans, the most.
Card, however, said that immigration is no big deal and that a lot of the opposition to it is most likely social
or cultural.
b. Explain what a natural experiment is and the importance of “natural experiments” in economics. Are
natural experiments part of the quasi-experimental toolkit? Discuss how Card uses a natural experiment
to estimate the impact of illegal immigration on the labor market. What does he find?
Solution :
A natural experiment is an observational study that shares the properties of a randomized controlled experiment
but is generated outside the control of an experimenter. For instance, an earthquake could cause economic
damage that is as good as random, which one could exploit to study the effect of economic loss on voting behavior.
Card and Borjas use such natural experiments to study the effect of immigration on local labor markets. Natural
experiments are part of the quasi-experimental toolkit.
In 1980, 125,000 Cubans were suddenly permitted to immigrate. These immigrants, known as Marielitos, arrived in
South Florida with virtually no advance notice, and approximately half remained in the Miami area. They
joined an already sizable Cuban community and swelled the city’s labor force by 7 percent. Card compared the
aftershocks in Miami (the treatment group) with the labor markets in four cities (Tampa, Atlanta, Houston, and
Los Angeles, the control group) that hadn’t suddenly been injected with immigrants. The influx was not triggered
by labor market conditions in Miami, so cause and effect in this “natural experiment” were clearly delineated.
Card concluded that the Mariel influx appears to have had virtually no effect on the wages or unemployment
rates of less-skilled workers. This finding was supported by two observations. First, Card found that
black workers in Miami did better than those in the control cities. Their wages were fractionally higher than
in 1979, while black wages in the control cities were down. Second, although unemployment in all of the cities
rose the following year, black unemployment in Miami had retreated to below its 1979 level by 1985, while in the
control cities it remained much higher.
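The comparison Card makes can be summarized as a difference-in-differences. The notation below is ours, not the article's: ȳ stands for an average labor market outcome (a wage or an unemployment rate), "pre" and "post" for the periods before and after the Mariel boatlift, and "control" for the four comparison cities.

```latex
\hat{\delta}_{DiD}
  = \left( \bar{y}_{\text{Miami, post}} - \bar{y}_{\text{Miami, pre}} \right)
  - \left( \bar{y}_{\text{control, post}} - \bar{y}_{\text{control, pre}} \right)
```

Under the assumption that Miami would have followed the same trend as the control cities without the influx, the second bracket removes the common change over time, and what remains is attributed to the immigration shock.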
c. Explain the evidence that Borjas generates to support his argument. What data does he use?
Solution :
The graph Borjas generated shows that, during the ’80s and ’90s, for instance, immigrants caused dropouts to
suffer a 5 percent wage decline relative to college graduates. Assuming businesses did not hire any of the new
immigrants, Borjas’s finding would translate to a hefty 9 percent wage loss for the unskilled over those two
decades and lesser declines for other groups.

Question 2
In 1985, neither Florida nor Georgia had laws banning open alcohol containers in vehicle passenger
compartments. By 1990, Florida had passed such a law, but Georgia had not.
a. Suppose you can collect random samples of the driving-age population in both states, for 1985 and
1990. Let arrest be a binary variable equal to unity if a person was arrested for drunk driving during
the year. Without controlling for any other factors, write down a linear probability model that allows
you to test whether the open container law reduced the probability of being arrested for drunk driving.
Which coefficient in your model measures the effect of the law?
Solution :

$$Arrest_{it} = \beta_0 + \beta_1 Y90 + \beta_2 FL_i + \beta_3 \times Y90 \times FL_i + \mu_{it}$$


where β3 is the coefficient of interest. It captures the differential change in the probability of being arrested
in Florida, relative to Georgia, after the law was instituted, so it measures the effect of the law. Y90 is a
dummy equal to one for observations from 1990 and controls for the time fixed effect. FL_i is a dummy equal to
one if person i lives in Florida and controls for the state fixed effect.
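As a rough illustration of why β3 is the difference-in-differences, the sketch below simulates pooled cross-sections and checks that the interaction coefficient equals the double difference of the four cell means. The data, the sample size, and the arrest probabilities are all made up for illustration; the variable names `arrest`, `y90`, and `fl` are our own stand-ins for the variables in the model.

```r
# Simulated (hypothetical) pooled cross-sections, NOT the real arrest data
set.seed(1)
n   <- 4000
fl  <- rbinom(n, 1, 0.5)   # 1 = lives in Florida, 0 = lives in Georgia
y90 <- rbinom(n, 1, 0.5)   # 1 = sampled in 1990, 0 = sampled in 1985

# Assumed arrest probabilities: the law lowers Florida's 1990 rate by 0.02
p      <- 0.05 + 0.01 * y90 + 0.02 * fl - 0.02 * y90 * fl
arrest <- rbinom(n, 1, p)
sim    <- data.frame(arrest, y90, fl)

# Linear probability model with the interaction term
lpm <- lm(arrest ~ y90 * fl, data = sim)
b3  <- unname(coef(lpm)["y90:fl"])

# The same number computed from the four cell means
m   <- with(sim, tapply(arrest, list(fl, y90), mean))
did <- unname((m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"]))

print(c(beta3 = b3, did = did))
```

Because the model is saturated (two dummies and their interaction), the two numbers coincide exactly, not just approximately.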
b. Why might you want to control for other factors in the model? What might some of these factors be?
Solution :
All factors that affect the arrest probability and that change differently over time in the two states should
be controlled for and added to the model, for example the age, race, or gender distributions.
Note that state characteristics that are constant through time are picked up by the Florida fixed effect.
Similarly, aggregate time effects common to both states are picked up by the Y90 dummy. Adding these factors
could also help us obtain a more precise estimator with a lower standard error.
c. Now, suppose that you can only collect data for 1985 and 1990 at the county level for the two states.
The dependent variable would be the fraction of licensed drivers arrested for drunk driving during the
year. How does this data structure differ from the individual-level data described in part (a)? What
econometric method would you use?
Solution :
With county-level data, we have a panel at the county level: the same counties are observed in 1985 and 1990.
Before, we had a pooled cross-section of individuals. Now we can control for unobserved county-level
characteristics using county fixed effects.

$$Arrest_{ct} = \beta_c + \beta_{90} \, Y90 + \beta_3 \times Y90 \times FL_{ct} + \mu_{ct}$$


where β90 is a fixed effect for 1990. Notice that we can’t include dummies for both years together with the
county fixed effects because of perfect multicollinearity. Also, we don’t need a Florida state fixed effect,
because the county fixed effects are more disaggregated than the state and absorb it. You could use the
county-demeaned (within) estimator to estimate the model. Alternatively, you could use a first difference model:

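$$
\begin{aligned}
Arrest_{c,90} &= \beta_c + \beta_{90} + \beta_3 \times FL_{c,90} + \mu_{c,90} \\
Arrest_{c,85} &= \beta_c + \mu_{c,85} \\
Arrest_{c,90} - Arrest_{c,85} &= \beta_{90} + \beta_3 \times FL_{c,90} + \mu_{c,90} - \mu_{c,85}
\end{aligned}
$$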
Note that β90 appears as a constant in our first difference to measure the time fixed effect. Hence, our model
requires a constant with the first difference due to the presence of the time fixed effect.
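The sketch below illustrates, on a simulated two-period county panel, that the first difference regression and the regression with county dummies give the same estimate of β3. Everything here is hypothetical: the number of counties, the arrest rates, and the variable names (`a85`, `a90`, `fl`, `county`) are our own assumptions, not data from the assignment.

```r
# Simulated (hypothetical) county panel with two years, 1985 and 1990
set.seed(2)
n_c    <- 60
county <- 1:n_c
fl     <- rep(c(1, 0), each = n_c / 2)   # first 30 counties are in Florida
alpha  <- rnorm(n_c, sd = 0.01)          # unobserved county effect
a85    <- 0.05 + alpha + rnorm(n_c, sd = 0.005)
a90    <- 0.05 + alpha + 0.01 - 0.02 * fl + rnorm(n_c, sd = 0.005)

# First differences: the intercept is the 1990 effect, the slope is beta_3
fd <- lm(I(a90 - a85) ~ fl)

# Same model on the stacked panel with one dummy per county
panel <- data.frame(
  county = factor(rep(county, 2)),
  y90    = rep(c(0, 1), each = n_c),
  fl     = rep(fl, 2),
  arrest = c(a85, a90)
)
fe <- lm(arrest ~ y90 + y90:fl + county, data = panel)

b3_fd <- unname(coef(fd)["fl"])
b3_fe <- unname(coef(fe)["y90:fl"])
print(c(first_difference = b3_fd, county_dummies = b3_fe))
```

With only two periods the two estimators are numerically identical, so the choice between them is a matter of convenience. With more periods they would generally differ.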

Question 3
In politics, there is a large literature exploring whether Democratic politicians differ from Republican
politicians in office. To that end, scholars have used a variety of outcome variables, including data on
spending and revenues. Consider the following model:

$$Y_i = \beta_0 + \beta_1 \times Democrat_i + \varepsilon_i$$
where Y_i is a measure of spending in city i and Democrat_i is a dummy equal to one if the mayor is a Democrat
and zero otherwise.
a. Can you interpret what β1 is capturing?
Solution :
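β1 captures the difference in average spending between cities with a Democratic mayor and cities without one.
If the party of the mayor is unrelated to the other determinants of spending, it measures the effect of having a
Democratic mayor on city spending.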
b. Can you explain theoretically why β1 should be different from zero? Do you expect it to be positive
or negative?
Solution :
It is commonly believed that Democrats are more likely than Republicans to support social policies, expand
government involvement, and increase government spending. We would therefore expect β1 to be positive.
c. Explain why estimating β1 with OLS will generate a biased estimator.
Solution :
1. One source of endogeneity is voters’ preferences. Factors such as labor market conditions,
voter characteristics, the quality of candidates, the resources available for campaigns, and other unmeasured
characteristics of cities and candidates influence who wins the election and also affect spending, so
they bias estimates of the effect of the mayor’s party.
2. Endogeneity also arises from cities’ economic and demographic characteristics, such as
population, GDP level, and whether the city is located in the South.
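The direction and size of this bias can be written out with the standard omitted variable argument. The expression below is a textbook result for a regression on a single dummy, stated in the notation of the model above:

```latex
\operatorname{plim} \hat{\beta}_1
  = \beta_1 + \frac{\operatorname{Cov}(Democrat_i, \varepsilon_i)}{\operatorname{Var}(Democrat_i)}
  = \beta_1 + \Big( E[\varepsilon_i \mid Democrat_i = 1] - E[\varepsilon_i \mid Democrat_i = 0] \Big)
```

If the unobserved factors in ε_i that raise spending (for example, voters who prefer more public services) are also more common where Democrats win, the term in parentheses is positive and OLS overstates β1.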
d. Suppose you have information about all the elections in all American cities from 1950 to today, including
the margin of victory of the party in office. Can you use this variable and restrict the dataset to close
elections, i.e., those with margins of victory below 1%, to estimate the regression above with OLS?
Solution :
The margin of victory is the difference between the percentage of the vote cast for the winner and for the
runner-up, i.e., the candidate who finishes second. One could use elections in which the winning candidate won by
a small margin, i.e., less than 1%. When so few votes separate the candidates, the outcome of the race can be
seen as almost random. One can then compare races that a Democrat won by a few votes to
races that a Republican won by a few votes. These cities would on average look the same ex ante and would
differ only in the party of the mayor. This gives us a natural experiment for
the effect of the party on government spending. In fact, a number of academic studies have used this method,
known as a regression discontinuity design.
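The logic of restricting the sample to close elections can be illustrated with a small simulation. This is only a sketch under assumptions we chose ourselves: the data are simulated, `pref` is a made-up unobserved voter preference that drives both the election result and spending, `margin` is the Democratic candidate's margin in percentage points, and the true party effect is set to 1.

```r
# Simulated (hypothetical) city elections, NOT real election data
set.seed(3)
n      <- 20000
pref   <- rnorm(n)                       # unobserved voter preferences
margin <- 10 * (pref + rnorm(n))         # Democratic margin, in percentage points
dem    <- as.numeric(margin > 0)         # 1 if the Democrat won
spend  <- 1 * dem + 2 * pref + rnorm(n)  # true party effect is set to 1
cities <- data.frame(spend, dem, margin)

# OLS on all elections: contaminated by voter preferences
b_naive <- unname(coef(lm(spend ~ dem, data = cities))["dem"])

# OLS on close elections only (margin of victory below 1 percentage point)
close_races <- cities[abs(cities$margin) < 1, ]
b_close     <- unname(coef(lm(spend ~ dem, data = close_races))["dem"])

print(c(all_elections = b_naive, close_elections = b_close))
```

In this setup the estimate from all elections is far from the assumed effect of 1, while the close-election estimate is much nearer to it, at the cost of using only a small share of the observations.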

e. Will it lead to an unbiased estimator? Explain why?
Solution :
Because the identity of the mayor elected is determined by a small margin, it is less likely to be
correlated with city-level characteristics that could drive a difference in spending between cities run
by a Democrat and cities run by a Republican.
f. Does it lead to an external validity problem? Explain why?
Solution :
This methodology has external validity issues because the estimated effect applies only to very competitive
races, in which the candidate won by a small margin. Highly competitive races differ from races with a larger
margin, since a larger margin could give the winner more room to implement policies more in line with their
other identities as opposed to their partisan identity.
The estimate therefore applies only to cities with a small margin of victory (competitive races) and captures
the partisan effect only within this subset of cities. This is known as a local average treatment effect.

Question 4 - Data Question


Download the (micro) census data of 1940 and 1950. Keep only white women between 25 and 49 years
old. Download information about the labor force participation and the number of weeks they worked in the
previous year as well as their sex, age, race, and education.
a. Generate the descriptive statistics.
library(tidyverse)  # dplyr verbs and the %>% pipe
library(arsenal)    # tableby() for descriptive statistics tables

# Load the census extract (1940 and 1950)
mydata <- read.csv("census.csv")

# Convert age to numeric (any non-numeric age labels become NA) and drop rows with missing values
mydata <- mydata %>%
  mutate(Age = as.numeric(as.character(age))) %>%
  na.omit()

# Keep the variables of interest for white women aged 25 to 49
white_female <- mydata %>%
  select(year, sex, Age, race, educ, labforce, wkswork1) %>%
  filter(sex == "female", race == "white", Age >= 25, Age <= 49)

newdata1 <- white_female %>%
  group_by(year)

# Table settings: no total column, no tests, and the summary statistics to report
my_controls <- tableby.control(
  total = FALSE,
  test = FALSE,
  numeric.stats = c("meansd", "medianq1q3", "min", "max"),
  cat.stats = c("countpct"),
  ordered.stats = c("countpct"),
  stats.labels = list(
    meansd = "Mean ± SD",
    medianq1q3 = "Median (IQR)",
    min = "Min",
    max = "Max"
  )
)

# Descriptive statistics table by census year
tableby(year ~ Age + educ + labforce + wkswork1,
        data = newdata1,
        control = my_controls) %>%
  summary(text = TRUE, digits = 2) %>%
  knitr::kable(caption = "Descriptive statistics")

Table 1: Descriptive statistics

                                  1940 (N=217967)         1950 (N=294311)
Age
- Mean ± SD                       36.21 (7.18)            35.99 (6.94)
- Median (IQR)                    36.00 (30.00, 42.00)    36.00 (30.00, 42.00)
- Min                             25.00                   25.00
- Max                             49.00                   49.00
educ
- 1 year of college               5740 (2.6%)             2850 (1.0%)
- 2 years of college              7794 (3.6%)             3046 (1.0%)
- 3 years of college              3065 (1.4%)             1404 (0.5%)
- 4 years of college              8321 (3.8%)             3377 (1.1%)
- 5+ years of college             2115 (1.0%)             1422 (0.5%)
- grade 10                        17951 (8.2%)            6332 (2.2%)
- grade 11                        9482 (4.4%)             4502 (1.5%)
- grade 12                        45195 (20.7%)           23837 (8.1%)
- grade 5, 6, 7, or 8             88657 (40.7%)           20933 (7.1%)
- grade 9                         15113 (6.9%)            5155 (1.8%)
- n/a or no schooling             3800 (1.7%)             218750 (74.3%)
- nursery school to grade 4       10734 (4.9%)            2703 (0.9%)
labforce
- no, not in the labor force      158934 (72.9%)          212625 (72.2%)
- yes, in the labor force         59033 (27.1%)           81686 (27.8%)
wkswork1
- Mean ± SD                       12.35 (20.60)           3.70 (12.47)
- Median (IQR)                    0.00 (0.00, 26.00)      0.00 (0.00, 0.00)
- Min                             0.00                    0.00
- Max                             52.00                   52.00

b. Do you observe anything unusual about the dataset? Do you think there could be something wrong
with the data? How would you describe the problem?
Solution :
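According to the descriptive statistics table, the average number of weeks worked decreased from 12.35 to
3.70, even though the share of women in the labor force is almost unchanged (27.1% versus 27.8%). It is also
unusual that the share of women in the "n/a or no schooling" category increased from 1.7% to 74.3%. Because
this category combines "n/a" with no schooling, the jump most likely reflects missing or not-applicable
responses in 1950 rather than a real change in schooling. This suggests the dataset suffers from measurement
error, and possibly recollection error. We should correct this before using the data; otherwise we could
obtain biased estimators if we use these variables as covariates.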
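One way to look into the problem is to tabulate the suspicious categories by census year. The sketch below does this on a tiny made-up data frame that only mimics the column names and labels of the table above; the rows are invented for illustration, and the same two `tapply()` calls could be pointed at the real `white_female` data instead.

```r
# Toy stand-in with the same column names and labels; the rows are made up
toy <- data.frame(
  year     = c(1940, 1940, 1940, 1950, 1950, 1950, 1950),
  educ     = c("grade 12", "grade 9", "n/a or no schooling",
               "n/a or no schooling", "n/a or no schooling",
               "grade 12", "n/a or no schooling"),
  labforce = c("yes, in the labor force", "no, not in the labor force",
               "no, not in the labor force", "yes, in the labor force",
               "yes, in the labor force", "yes, in the labor force",
               "no, not in the labor force"),
  wkswork1 = c(40, 0, 0, 0, 0, 50, 0)
)

# Share of the "n/a or no schooling" category by census year
na_share <- tapply(toy$educ == "n/a or no schooling", toy$year, mean)

# Among women in the labor force, share reporting zero weeks worked
in_lf      <- toy$labforce == "yes, in the labor force"
zero_share <- tapply(toy$wkswork1[in_lf] == 0, toy$year[in_lf], mean)

print(na_share)
print(zero_share)
```

A large share of labor force participants with zero weeks worked in one year only would point to values that are missing or not applicable being recorded as zeros, rather than to a real drop in work.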
