Financial Econometrics Homework 6
1. Introduction
Two main specifications are used to characterize the individual effect a_i in panel analysis:
(i) fixed effects and (ii) random effects. The Hausman test is used to choose between the
fixed effects model and the random effects model.
2. Problem Statement
Standard economic theory predicts that raising the minimum wage results in higher
unemployment in a perfectly competitive labor market. In April 1992, New Jersey (NJ), a U.S.
state, increased its minimum wage to $5.05. This project replicates the study by David Card
and Alan B. Krueger on the effect of this increase in the minimum wage on employment.
Using a DiD methodology, Card and Krueger (1994) found no evidence that the increase in the
minimum wage reduced employment in the fast-food restaurant industry; employment in NJ
rose relative to the control group. In their study, Pennsylvania (PA), a neighboring state
that was not affected by the policy change, serves as the control group. The authors surveyed
a sample of fast-food restaurants in NJ and PA before and after the minimum wage increase.
3. DiD Theory
Difference-in-differences (DiD) compares the change in an outcome over time for a
’treatment group’ versus a ’control group’ in a natural experiment. The main assumption is that,
although treatment and comparison groups may have different levels of the outcome prior to the
start of treatment, their trends in pre-treatment outcomes should be the same. The estimated
impact of the treatment is then the OLS estimate of the parameter δ in the regression:
Y = α0 + a G + b T + δ (G × T) + ε
where G = 1 for individuals belonging to group 1 (the treatment group) and 0 otherwise, and
T = 1 after the treatment date and 0 before.
Indeed, ȳ11 = α0 + a and ȳ12 = α0 + a + b + δ are the historical means of Y for all individuals
belonging to group 1, respectively, before and after the treatment date. ȳ21 = α0 and ȳ22 = α0 + b are
the historical means of Y for all individuals belonging to group 2, before and after the treatment date.
Accordingly, the two differences (after minus before the treatment date) are:
(ȳ12 − ȳ11) = b + δ and (ȳ22 − ȳ21) = b
Finally, the difference-in-differences is:
(ȳ12 − ȳ11) − (ȳ22 − ȳ21) = δ
which can be estimated by OLS provided that the residuals of the mean equations have no
cross and serial correlations.
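As a minimal sketch of the algebra in this section, the four group means can be built from the regression parameters and differenced twice. The numerical values of alpha0, a, b and delta below are hypothetical and chosen only for illustration; they are not estimates from the data.

```r
# Hypothetical parameter values, chosen only for illustration
alpha0 <- 10   # mean of the control group before treatment
a      <- 2    # level difference of the treatment group
b      <- -1   # common time effect
delta  <- 3    # treatment effect

# Group means implied by the four parameters
y11 <- alpha0 + a               # group 1 (treatment), before
y12 <- alpha0 + a + b + delta   # group 1 (treatment), after
y21 <- alpha0                   # group 2 (control), before
y22 <- alpha0 + b               # group 2 (control), after

# First differences over time within each group
diff_treated <- y12 - y11   # equals b + delta
diff_control <- y22 - y21   # equals b

# The second difference removes the common time effect b
did <- diff_treated - diff_control
print(did)
```

The level difference a drops out of each first difference, and the common time effect b drops out of the second difference, leaving only delta.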
The New Jersey-Pennsylvania Data Set in the study of Card and Krueger with 410 observations
is also used in this analysis.
                           chain == 3 ~ "roys",
                           chain == 4 ~ "wendys")) %>%
  # state value label
  mutate(state = case_when(state == 1 ~ "New Jersey",
                           state == 0 ~ "Pennsylvania")) %>%
  # Region dummy
  mutate(region = case_when(southj == 1 ~ "southj",
                            centralj == 1 ~ "centralj",
                            northj == 1 ~ "northj",
                            shore == 1 ~ "shorej",
                            pa1 == 1 ~ "phillypa",
                            pa2 == 1 ~ "eastonpa")) %>%
  # meals value label
  mutate(meals = case_when(meals == 0 ~ "none",
                           meals == 1 ~ "free meals",
                           meals == 2 ~ "reduced price meals",
                           meals == 3 ~ "both free and reduced price meals")) %>%
  # meals2 value label
  mutate(meals2 = case_when(meals2 == 0 ~ "none",
                            meals2 == 1 ~ "free meals",
                            meals2 == 2 ~ "reduced price meals",
                            meals2 == 3 ~ "both free and reduced price meals")) %>%
  # status2 value label
  mutate(status2 = case_when(status2 == 0 ~ "refused second interview",
                             status2 == 1 ~ "answered 2nd interview",
                             status2 == 2 ~ "closed for renovations",
                             status2 == 3 ~ "closed permanently",
                             status2 == 4 ~ "closed for highway construction",
                             status2 == 5 ~ "closed due to Mall fire")) %>%
5. Descriptive Statistics
Pre-treatment Mean
output:
  variable  New Jersey  Pennsylvania
1 emptot         20.4          23.3
2 pct_fte        32.8          35.0
3 wage_st         4.61          4.63
4 hrsopen        14.4          14.5
Post-treatment Mean
output:
  variable  New Jersey  Pennsylvania
1 emptot         21.0          21.2
2 pct_fte        35.9          30.4
3 wage_st         5.08          4.62
4 hrsopen        14.4          14.7
In April 1992, the U.S. state of New Jersey (NJ) raised the minimum wage from $4.25 to $5.05.
Despite the increase in wages, full-time equivalent employment increased in New Jersey
relative to Pennsylvania. Whereas New Jersey stores were initially smaller, employment
gains in New Jersey coupled with losses in Pennsylvania led to a small and statistically
insignificant interstate difference in November 1992 (21.0 versus 21.2).
In Yi,j, i indicates the group (1 = treatment group, 2 = control group) and j indicates before (1)
or after (2) the treatment.
Total Employment (emptot)
Difference                                      Formula            Value
difference of November and February within NJ  NJ Nov - NJ Feb     0.5880214
difference of November and February within PA  PA Nov - PA Feb    -2.165584
difference of NJ and PA within February         NJ Feb - PA Feb    -2.891761
difference of NJ and PA within November         NJ Nov - PA Nov    -0.1381549
The difference-in-differences, that is, the November-February difference within NJ minus the
November-February difference within PA, is calculated as
(NJnov-NJfeb)-(PAnov-PAfeb) = 0.588 - (-2.166) = 2.75.
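The calculation can be checked with a short sketch that uses only the four first differences reported above. The variable names are chosen here for illustration and are not those of the original script. Both orderings of the double difference should give the same value up to rounding of the reported inputs.

```r
# First differences of mean FTE employment (emptot) as reported in the text
nj_nov_minus_feb <- 0.5880214    # NJ Nov - NJ Feb
pa_nov_minus_feb <- -2.165584    # PA Nov - PA Feb
nj_minus_pa_feb  <- -2.891761    # NJ Feb - PA Feb
nj_minus_pa_nov  <- -0.1381549   # NJ Nov - PA Nov

# DiD as the difference of the within-state changes over time
did_over_time <- nj_nov_minus_feb - pa_nov_minus_feb

# DiD as the change in the NJ-PA gap between the two waves
did_over_state <- nj_minus_pa_nov - nj_minus_pa_feb

print(round(c(did_over_time, did_over_state), 2))
```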
7. Counter-theory Results
According to classical economic theory, the increase in the minimum wage should have decreased
employment (the counterfactual). However, full-time equivalent (FTE) employment in New Jersey
increased.
Understanding the DiD technique is greatly aided by visualizing the relationship between the
treatment and control groups. The starting point is the emptot differences for NJ and PA
between February and November that were computed in the preceding step.
In addition, one needs to know what would have happened in New Jersey had the treatment (an
increase in the minimum wage) not occurred. This is the counterfactual outcome (NJ
counterfactual).
According to the DiD assumption, the trends of the treatment and control groups are the same
up until the start of the treatment and would have remained the same without it. Therefore,
without treatment, NJ’s employment (emptot) would have decreased by the same amount as PA’s
from February to November.
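A minimal sketch of the counterfactual calculation, assuming the rounded pre-treatment NJ mean (20.4) and the first differences reported earlier in this report. The variable names are illustrative only, and the result inherits the rounding of the inputs.

```r
# Reported values (rounded in the text)
nj_feb    <- 20.4         # NJ mean of emptot in February 1992
nj_change <- 0.5880214    # NJ Nov - NJ Feb
pa_change <- -2.165584    # PA Nov - PA Feb

# Under parallel trends, NJ would have followed PA's change
nj_nov_counterfactual <- nj_feb + pa_change
nj_nov_observed       <- nj_feb + nj_change

# The gap between observed and counterfactual NJ employment is the DiD estimate
treatment_effect <- nj_nov_observed - nj_nov_counterfactual
print(round(c(nj_nov_counterfactual, nj_nov_observed, treatment_effect), 2))
```

The counterfactual point for November is the one that the line plot draws for the hypothetical NJ series without treatment.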
8. DiD Estimation
All series must be I(0) before the difference-in-differences analysis. Augmented Dickey-Fuller
tests are carried out to verify this:
The results here show only that the full-time equivalent employment series is I(0).
With linear regression, this result can be obtained very easily. First, two dummy variables
are created. One indicates the start of the treatment (time) and is equal to zero before
the treatment and equal to one after the treatment. The other separates the observations
into a treatment and a control group (treated). This dummy variable is equal to one for fast-food
restaurants located in NJ and equal to zero for fast-food restaurants located in PA.
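The equivalence between the dummy-variable regression and the difference of group means can be sketched on toy data. The data frame below is hypothetical (made-up values, two restaurants per group and period) and only illustrates the mechanics; it is not the Card and Krueger data.

```r
# Hypothetical toy data: two restaurants per state-by-period cell (values are made up)
toy <- data.frame(
  treated = c(0, 0, 0, 0, 1, 1, 1, 1),   # 1 = treatment group, 0 = control group
  time    = c(0, 0, 1, 1, 0, 0, 1, 1),   # 1 = after treatment, 0 = before
  y       = c(23, 24, 21, 22, 20, 21, 21, 22)
)

# DiD from the four cell means (rows: treated, columns: time)
cell_means <- tapply(toy$y, list(toy$treated, toy$time), mean)
did_means <- (cell_means["1", "1"] - cell_means["1", "0"]) -
  (cell_means["0", "1"] - cell_means["0", "0"])

# DiD as the interaction coefficient of an OLS regression
fit <- lm(y ~ time + treated + time:treated, data = toy)
did_ols <- unname(coef(fit)["time:treated"])

print(c(did_means = did_means, did_ols = did_ols))
```

Because the model with both dummies and their interaction is saturated, the interaction coefficient reproduces the double difference of the cell means exactly.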
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    23.331      1.072  21.767   <2e-16 ***
time           -2.166      1.516  -1.429   0.1535
treated        -2.892      1.194  -2.423   0.0156 *
time:treated    2.754      1.688   1.631   0.1033
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The Durbin-Watson test statistic always takes a value between 0 and 4, where:
[0, 2): indicates positive autocorrelation
2: indicates no autocorrelation
(2, 4]: indicates negative autocorrelation
According to the Durbin-Watson test, there is autocorrelation in the residuals of the DiD model.
In this method, the DiD estimator is obtained without the need to generate the interaction term by hand.
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)    23.331      1.072  21.767   <2e-16 ***
time           -2.166      1.516  -1.429   0.1535
treated        -2.892      1.194  -2.423   0.0156 *
time:treated    2.754      1.688   1.631   0.1033
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Conclusion: The coefficient for ‘time:treated’ is the difference-in-differences estimator. The
estimated effect of the treatment is positive (2.754) but not statistically significant at the
10% level (p = 0.1033).
Durbin-Watson test
data: did_model2
DW = 1.8398, p-value = 0.008989
H0: There is no autocorrelation
H1: True autocorrelation is greater than 0
The Durbin-Watson test statistic always takes a value between 0 and 4, where:
[0, 2): indicates positive autocorrelation
2: indicates no autocorrelation
(2, 4]: indicates negative autocorrelation
According to the Durbin-Watson test (DW = 1.8398, p-value = 0.008989), the null hypothesis of
no autocorrelation is rejected at the 5% level: there is positive autocorrelation in the
residuals of the DiD model.
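To make the interpretation of the statistic concrete, the following sketch computes it by hand as the sum of squared successive residual differences divided by the sum of squared residuals. The helper durbin_watson and the two residual vectors are hypothetical and introduced only for illustration; in the report itself the statistic comes from lmtest::dwtest.

```r
# Durbin-Watson statistic from a vector of regression residuals
durbin_watson <- function(e) {
  sum(diff(e)^2) / sum(e^2)
}

# Hypothetical residual series, made up for illustration
e_alternating <- c(1, -1, 1, -1, 1, -1)   # sign flips every step: negative autocorrelation
e_persistent  <- c(1, 1, 1, -1, -1, -1)   # long runs of equal sign: positive autocorrelation

dw_alternating <- durbin_watson(e_alternating)
dw_persistent  <- durbin_watson(e_persistent)
print(c(dw_alternating, dw_persistent))
```

The alternating series gives a value above 2 and the persistent series a value below 2, in line with the ranges listed above.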
The key assumption here is the parallel trends assumption: absent treatment, the outcomes
for the control and treatment groups would follow parallel trends. To illustrate a violation,
a confounding variable is added that leads to non-parallel trends. It is assumed that the
outcome y also depends on a confounding variable x that develops differently for the control
and treatment groups over time:
  term          estimate  std.error  statistic  p-value
1 (Intercept)     -3.71      0.0511      -72.5  0
2 time             0.978     0.0723       13.5  9.47e-38
3 treated         39.2       0.0569      688.   0
4 time:treated    50.1       0.0805      622.   0
Conclusion: The coefficient for ‘time:treated’ is the difference-in-differences estimator. The
effect is strongly significant, with the treatment having a positive effect when time-varying
effects are considered.
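How a confounder with group-specific trends distorts the DiD estimate can be sketched on a small noise-free panel. Everything below is hypothetical: the panel, the true treatment effect of 3 and the effect of x of 0.5 are made-up values for illustration and are unrelated to the figures in the table above.

```r
# Hypothetical toy panel: 4 units per group observed before and after (values are made up)
units <- data.frame(
  id      = 1:8,
  treated = rep(c(0, 1), each = 4),
  base    = rep(1:4, times = 2)        # unit-level component of the confounder
)
panel <- merge(units, data.frame(time = c(0, 1)))   # cross join: every unit in both periods

# Confounder x grows by 1 in the control group but by 4 in the treatment group
panel$x <- panel$base + panel$time * ifelse(panel$treated == 1, 4, 1)

# Noise-free outcome with a true treatment effect of 3 and an effect of x of 0.5
panel$y <- 1 + 2 * panel$treated + 1 * panel$time +
  3 * panel$time * panel$treated + 0.5 * panel$x

# DiD without and with the confounder as a control
did_naive      <- unname(coef(lm(y ~ time * treated, data = panel))["time:treated"])
did_controlled <- unname(coef(lm(y ~ time * treated + x, data = panel))["time:treated"])

print(round(c(naive = did_naive, controlled = did_controlled), 3))
```

In this sketch the uncontrolled estimate absorbs the differential growth of x (0.5 times a gap of 3 in its trends), while adding x to the regression recovers the assumed effect.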
So far, this would be the standard way to run the regression if there were no confounding
variables. However, since there are reasons to believe that the number of part-time employees
(emppt) might distort the analysis here, the regression should control for it. In R, emppt is
simply added to the equation. Instead of a regression line, the model now has three explanatory
dimensions (time, treated and emppt), so the regression line becomes a regression plane (or
surface). The results with the data are:
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  15.58246    1.11671  13.954   <2e-16 ***
time         -2.32777    1.36278  -1.708   0.0880 .
treated      -2.56004    1.07323  -2.385   0.0173 *
emppt         0.39645    0.02887  13.730   <2e-16 ***
time:treated  3.09423    1.51806   2.038   0.0419 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Because the p-value is larger than 0.05, the null hypothesis of the test cannot be rejected, which
implies that the DiD estimator is not useful for forecasting future values of EMPTOT.
12. R Codes
# ---------------------------------------------------------------------------------
# Title: Financial Econometrics 6 Difference-in-difference approach
# Member: Hongbo, Allen, Alexandra
# Date: 12/Nov/2022
# ---------------------------------------------------------------------------------
# Install any missing packages once, e.g.:
# install.packages(c("tidyverse", "moments", "tseries", "forecast", "urca",
#                    "tsDyn", "sjlabelled", "ggpubr", "plm", "lmtest"))
library(tidyverse)   # dplyr, tidyr, stringr and ggplot2 are used below
library(moments)
library(tseries)
library(forecast)
library(urca)
library(tsDyn)
library(sjlabelled)
library(ggpubr)
library(plm)
library(lmtest)
# ------------------------------------------------------------------------------
# Section 1 Data preparation for analysis
# ------------------------------------------------------------------------------
# njmin.zip is an archive with 5 files pertaining to the New Jersey - Pennsylvania surveys used
# in Card and Krueger’s book Myth and Measurement, chapter 2.
# This assignment replicates a study by David Card and Alan B. Krueger
# about the effect of a raise in minimum wages on employment.
# -----------------
# Raw Data
# -----------------
# Unzip
unzip(tfile_path, exdir = tdir_path)
# Read codebook
# Region
variable_labels[41] <- "region of restaurant"
head(data_raw)
# -----------------
# Cleaned Data
# -----------------
# -----------------
# Transposed Data
# -----------------
# Structural variables
structure <- data_mod %>%
  select(sheet, chain, co_owned, state, region)
# Wave 1 variables
wave1 <- data_mod %>%
  select(-ends_with("2"), -all_of(names(structure))) %>%
  mutate(observation = "February 1992") %>%
  bind_cols(structure)
# Wave 2 variables
wave2 <- data_mod %>%
  select(ends_with("2")) %>%
  rename_with(~ str_remove(.x, "2$")) %>%   # strip the wave-2 suffix
  mutate(observation = "November 1992") %>%
  bind_cols(structure)
# Final dataset
card_krueger_1994 <- bind_rows(wave1, wave2) %>%
  select(sort(names(.))) %>%               # Sort columns alphabetically
  sjlabelled::copy_labels(data_mod)        # Restore variable labels
# ------------
# Final Data
# ------------
card_krueger_1994_mod <- card_krueger_1994 %>%
  mutate(emptot = empft + nmgrs + 0.5 * emppt,   # full-time equivalent employment
         pct_fte = empft / emptot * 100)         # share of full-time employees (%)
# ------------------------------------------------------------------------------
# Section 2 Descriptive Statistics
# ------------------------------------------------------------------------------
card_krueger_1994_mod %>%
  select(chain, state) %>%
  table() %>%
  prop.table(margin = 2) %>%
  apply(MARGIN = 2,
        FUN = scales::percent_format(accuracy = 0.1)) %>%
  noquote()
# Pre-treatment Means
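card_krueger_1994_mod %>%
  filter(observation == "February 1992") %>%
  group_by(state) %>%
  summarise(emptot = mean(emptot, na.rm = TRUE),
            pct_fte = mean(pct_fte, na.rm = TRUE),
            wage_st = mean(wage_st, na.rm = TRUE),
            hrsopen = mean(hrsopen, na.rm = TRUE)) %>%
  pivot_longer(cols = -state, names_to = "variable") %>%
  pivot_wider(names_from = state, values_from = value)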
# Post-treatment Means
card_krueger_1994_mod %>%
  filter(observation == "November 1992") %>%
  group_by(state) %>%
  summarise(emptot = mean(emptot, na.rm = TRUE),
            pct_fte = mean(pct_fte, na.rm = TRUE),
            wage_st = mean(wage_st, na.rm = TRUE),
            hrsopen = mean(hrsopen, na.rm = TRUE)) %>%
  pivot_longer(cols = -state, names_to = "variable") %>%
  pivot_wider(names_from = state, values_from = value)
# Figure
hist.feb <- card_krueger_1994_mod %>%
  filter(observation == "February 1992") %>%
  ggplot(aes(wage_st, fill = state)) +
  geom_histogram(aes(y = c(..count..[..group.. == 1] / sum(..count..[..group.. == 1]),
                           ..count..[..group.. == 2] / sum(..count..[..group.. == 2])) * 100),
                 alpha = 0.5, position = "dodge", bins = 23) +
  labs(title = "February 1992", x = "Wage range", y = "Percent of stores", fill = "") +
  scale_fill_grey()
# ------------------------------------------------------------------------------
# Section 3 First Difference
# ------------------------------------------------------------------------------
# ------------------------------------------------------------------------------
# Section 4 Average Treatment Effect
# ------------------------------------------------------------------------------
# calculate the difference between the difference of November and February within NJ and PA
(njnov-njfeb)-(panov-pafeb)
# calculate the difference between the difference of NJ and PA within November and February
(njnov-panov)-(njfeb-pafeb)
######Degression
# Combine data
did_plotdata <- bind_rows(differences,
nj_counterfactual,
intervention)
######Line Plot
did_plotdata %>%
  mutate(label = if_else(observation == "November 1992",
                         as.character(state), NA_character_)) %>%
  ggplot(aes(x = observation, y = emptot, group = state)) +
  geom_line(aes(color = state), size = 1.2) +
  geom_vline(xintercept = "Intervention", linetype = "dotted",
             color = "black", size = 1.1) +
  scale_color_brewer(palette = "Accent") +
  scale_y_continuous(limits = c(17, 24)) +
  ggrepel::geom_label_repel(aes(label = label),
                            nudge_x = 0.5, nudge_y = -0.5,
                            na.rm = TRUE) +
  guides(color = "none") +   # hide the color legend; lines are labelled directly
  labs(x = "", y = "FTE Employment (mean)") +
  annotate(
    "text",
    x = "November 1992",
    y = 19.6,
    label = "{Difference-in-Differences}",
    angle = 90,
    size = 3
  )
# ------------------------------------------------------------------------------
# Section 5 DiD Estimator
# ------------------------------------------------------------------------------
colnames(card_krueger_1994_mod)
#####Dummy Variable
card_krueger_1994_mod <- mutate(card_krueger_1994_mod,
  time = ifelse(observation == "November 1992", 1, 0),   # 1 = after the treatment
  treated = ifelse(state == "New Jersey", 1, 0)          # 1 = treatment group (NJ)
)
# residual tests
acf(did_model$residuals)
library(lmtest)
lmtest::dwtest(did_model)
acf(did_model2$residuals)
lmtest::dwtest(did_model2)
######fixed effects
# Within model
did.reg <- plm(emptot ~ time + treated + time:treated,
data = panel, model = "within")
# ------------------------------------------------------------------------------
# Section 6 DiD Estimator with X-variables (control-variables)
# ------------------------------------------------------------------------------
# ------------------------------------------------------------------------------
# Section 7 DiD Estimator with Anticipation Effects
# ------------------------------------------------------------------------------