Financial Econometrics Homework 6


Financial Econometrics

Empirical Application No. 6:


Difference-In-Difference Analysis

YE HONGBO, COLEASA Alexandra, ARIES Allen Jerry


Master 2 Finance Technology Data
UFR 02: École d’économie de la Sorbonne
Université Paris 1 Panthéon-Sorbonne

1. Introduction

Panel Data Regression Model

The general model can be written as:


y_{i,t} = a_i + b_t + x'_{i,t} β + ε_{i,t}
The model represents an attribute y of an entity i at different dates t. The main hypothesis in panel
data analysis is that β is the same for all entities and all dates, and that the errors exhibit no
cross-sectional and no serial correlation. a_i is the individual effect, while b_t represents the time effect.

There are two main specifications for a_i: (i) fixed effects and (ii) random effects. The Hausman
test is used to choose between the fixed effects model and the random effects model in panel
analysis.
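As a sketch of how the Hausman test can be run in R, using the plm package and its bundled Grunfeld panel data purely for illustration (not the Card-Krueger data used later):

```r
# Hausman test: H0 = the random effects estimator is consistent (and efficient);
# rejection favours the fixed effects specification.
library(plm)

data("Grunfeld", package = "plm")  # illustrative panel data shipped with plm
fe <- plm(inv ~ value + capital, data = Grunfeld, model = "within")  # fixed effects
re <- plm(inv ~ value + capital, data = Grunfeld, model = "random")  # random effects
phtest(fe, re)  # a small p-value favours the fixed effects model
```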

2. Problem Statement

Standard economic theory predicts that raising the minimum wage will result in higher unemployment
in a labor market with perfect competition. New Jersey (NJ), a U.S. state, increased its minimum
wage from $4.25 to $5.05. This change took effect in April 1992. In this project, the study by
David Card and Alan B. Krueger on the effect of a raise in minimum wages on employment is
replicated.

Using a DiD methodology, Card and Krueger (1994) demonstrated how the increase in the mini-
mum wage resulted in more jobs being created in the fast-food restaurant industry. In their study,
Pennsylvania (PA), a neighboring state in the United States that was not affected by the policy
change, serves as the control group. A representative sample of fast-food restaurants in NJ and
PA participated in the survey the authors conducted before and after the minimum wage increase.

3. DiD Theory

DiD analysis is a statistical/econometric technique to mimic an experimental research design
using observational study data. It involves studying the differential effect of a treatment on a
'treatment group' versus a 'control group' in a natural experiment. The main assumption is that,
although treatment and comparison groups may have different levels of the outcome prior to the
start of treatment, their trends in pre-treatment outcomes should be the same. The estimated
impact of the treatment is then the OLS estimate of the parameter:

δ = (ȳ_{1,2} − ȳ_{1,1}) − (ȳ_{2,2} − ȳ_{2,1})

Indeed, ȳ_{1,1} = α_0 + a and ȳ_{1,2} = α_0 + a + b + δ are the historical means of Y for all individuals
belonging to group 1, respectively before and after the treatment date; ȳ_{2,1} = α_0 and ȳ_{2,2} = α_0 + b are
the historical means of Y for all individuals belonging to group 2, before and after the treatment date.
Accordingly, the two before/after differences are:
(ȳ_{1,2} − ȳ_{1,1}) = b + δ and (ȳ_{2,2} − ȳ_{2,1}) = b
Finally, the difference-in-differences is:

(ȳ_{1,2} − ȳ_{1,1}) − (ȳ_{2,2} − ȳ_{2,1}) = δ

which can be estimated by OLS provided that the residuals of the mean equations have no
cross-sectional or serial correlation.
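The algebra above means δ can be computed directly from the four cell means. A minimal base-R sketch, using the employment means reported later in Section 6:

```r
# Four cell means of FTE employment (group 1 = treatment/NJ, group 2 = control/PA)
y11 <- 20.4  # treatment, before
y12 <- 21.0  # treatment, after
y21 <- 23.3  # control, before
y22 <- 21.2  # control, after

delta <- (y12 - y11) - (y22 - y21)  # difference-in-differences
delta  # 0.6 - (-2.1) = 2.7 (the unrounded means give 2.75)
```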

4. Data Descriptions and Preparations

The New Jersey-Pennsylvania Data Set in the study of Card and Krueger with 410 observations
is also used in this analysis.

Table 1: Column Location

Name Start End Format Explanation


SHEET 1 3 3.0 sheet number (unique store id)
CHAIN 5 5 1.0 chain 1=bk; 2=kfc; 3=roys; 4=wendys
CO OWNED 7 7 1.0 1 if company owned
STATE 9 9 1.0 1 if NJ; 0 if Pa


Variables used in the model:

data_mod <- data_mod %>%
  # chain value label (four main restaurants)
  mutate(chain = case_when(chain == 1 ~ "bk",
                           chain == 2 ~ "kfc",
                           chain == 3 ~ "roys",
                           chain == 4 ~ "wendys")) %>%
  # state value label
  mutate(state = case_when(state == 1 ~ "New Jersey",
                           state == 0 ~ "Pennsylvania")) %>%
  # region dummy
  mutate(region = case_when(southj == 1 ~ "southj",
                            centralj == 1 ~ "centralj",
                            northj == 1 ~ "northj",
                            shore == 1 ~ "shorej",
                            pa1 == 1 ~ "phillypa",
                            pa2 == 1 ~ "eastonpa")) %>%
  # meals value label (wave 1)
  mutate(meals = case_when(meals == 0 ~ "none",
                           meals == 1 ~ "free meals",
                           meals == 2 ~ "reduced price meals",
                           meals == 3 ~ "both free and reduced price meals")) %>%
  # meals2 value label (wave 2)
  mutate(meals2 = case_when(meals2 == 0 ~ "none",
                            meals2 == 1 ~ "free meals",
                            meals2 == 2 ~ "reduced price meals",
                            meals2 == 3 ~ "both free and reduced price meals")) %>%
  # status2 value label
  mutate(status2 = case_when(status2 == 0 ~ "refused second interview",
                             status2 == 1 ~ "answered 2nd interview",
                             status2 == 2 ~ "closed for renovations",
                             status2 == 3 ~ "closed permanently",
                             status2 == 4 ~ "closed for highway construction",
                             status2 == 5 ~ "closed due to Mall fire"))

Table 3: Distribution of Restaurants

chain New Jersey Pennsylvania


bk 41.1% 44.3%
kfc 20.5% 15.2%
roys 24.8% 21.5%
wendys 13.6% 19.0%

Table 4: Key Variables

Variable Description Formula


emptot (y) Full-time Equivalent Employment emptot = empft + nmgrs + 0.5 * emppt
pct_fte % of Full-time Employees pct_fte = (empft / emptot) * 100
wage_st Starting Wage ($/hr)
hrsopen Number of Hours Open per Day

Table 5: Key Terms


Treatment Group Control Before Treatment After Treatment
New Jersey (NJ) Pennsylvania (PA) February November

5. Descriptive Statistics

Pre-treatment Means
output:
variable New Jersey Pennsylvania
1 emptot 20.4 23.3
2 pct_fte 32.8 35.0
3 wage_st 4.61 4.63
4 hrsopen 14.4 14.5

Post-treatment Means

output:
variable New Jersey Pennsylvania
1 emptot 21.0 21.2
2 pct_fte 35.9 30.4
3 wage_st 5.08 4.62
4 hrsopen 14.4 14.7

In April 1992, the U.S. state of New Jersey (NJ) raised the minimum wage from $4.25 to $5.05.
Despite the increase in wages, full-time equivalent employment increased in New Jersey relative
to Pennsylvania. Whereas New Jersey stores were initially smaller, employment gains in New
Jersey coupled with losses in Pennsylvania led to a small and statistically insignificant interstate
difference in employment.

Figure 1: Starting Wage Distribution, February and November 1992 Comparison

6. Results of First Difference

                   Treatment (NJ)    Control (PA)      Treatment (NJ)   Control (PA)
                   before treatment  before treatment  after treatment  after treatment
Notation           Y_{1,1}           Y_{2,1}           Y_{1,2}          Y_{2,2}
Total employment   20.4              23.3              21.0             21.2

In Y_{i,j}, i indicates the group (1 = treatment, 2 = control) and j indicates before (1) or after (2)
the treatment.

                   Nov − Feb         Nov − Feb         NJ − PA           NJ − PA
                   within NJ         within PA         within February   within November
Formula            NJ_Nov − NJ_Feb   PA_Nov − PA_Feb   NJ_Feb − PA_Feb   NJ_Nov − PA_Nov
Total employment   0.5880214         -2.165584         -2.891761         -0.1381549
Employment

The difference-in-differences, (NJ_Nov − NJ_Feb) − (PA_Nov − PA_Feb) = 0.588 − (−2.166),
is therefore 2.75.

7. Counter-theory Results

According to classical economic theory, the increase in the minimum wage should have decreased
employment (the counterfactual prediction). However, full-time equivalent (FTE) employment
increased after the policy raising the minimum wage came into effect.

Visually representing the relationship between the treatment and control groups greatly aids the
understanding of the DiD technique. We start from the emptot differences for NJ and PA in
February and November computed in the preceding section.

In addition, we need to know what would have happened in New Jersey had the treatment (the
increase in the minimum wage) not taken place. This is the counterfactual outcome
(NJ counterfactual).

According to the DiD hypothesis, the trends of the treatment and control groups are the same up
until the start of the treatment. Therefore, without treatment, NJ's employment (emptot) would
have decreased by the same amount as PA's from February to November.
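Under this assumption, the counterfactual NJ mean in November is obtained by applying PA's observed change to NJ's February level. A minimal sketch with the rounded means from Section 5:

```r
# Parallel trends: apply the control group's change to the treatment group's baseline
njfeb <- 20.4   # NJ, February (pre-treatment)
pafeb <- 23.3   # PA, February
panov <- 21.2   # PA, November

nj_counterfactual_nov <- njfeb + (panov - pafeb)  # 20.4 - 2.1
nj_counterfactual_nov  # 18.3
```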

8. DiD Estimation

8.1 Stationarity Tests

All series must be I(0) before the difference-in-difference analysis. Augmented Dickey-Fuller tests
are run to verify this:

"Level Variable emptot with Drift"


statistic 1pct 5pct 10pct
tau2 -18.89631 -3.43 -2.86 -2.57
phi1 178.53521 6.43 4.59 3.78

"Level Variable emptot with Drift and Trend"


statistic 1pct 5pct 10pct
tau3 -18.98278 -3.96 -3.41 -3.12
phi2 120.11572 6.09 4.68 4.03
phi3 180.17354 8.27 6.25 5.34

The results show that the series of full-time equivalent employment is I(0).

8.2 Dummy Variables

With linear regression, this result can be achieved very easily. First, two dummy variables are
created. One indicates the start of the treatment (time): it equals zero before the treatment and
one after. The other separates the observations into a treatment and a control group (treated): it
equals one for fast-food restaurants located in NJ and zero for those located in PA.
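The two dummies can be built with ifelse(), as in the code of Section 12; a self-contained sketch on a toy data frame:

```r
# time: 0 before treatment (February), 1 after (November)
# treated: 1 for NJ restaurants, 0 for PA restaurants
df <- data.frame(
  state       = c("New Jersey", "Pennsylvania", "New Jersey", "Pennsylvania"),
  observation = c("February 1992", "February 1992", "November 1992", "November 1992")
)
df$time    <- ifelse(df$observation == "November 1992", 1, 0)
df$treated <- ifelse(df$state == "New Jersey", 1, 0)
df
```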

8.3 Model 1: interaction method


The DiD estimator is the interaction between the two dummy variables. This interaction can be
specified with the ":" operator in the formula of the function lm(), in addition to the individual
dummy variables.

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 23.331 1.072 21.767 <2e-16 ***
time -2.166 1.516 -1.429 0.1535
treated -2.892 1.194 -2.423 0.0156 *
time:treated 2.754 1.688 1.631 0.1033
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

Residual standard error: 9.406 on 790 degrees of freedom


(26 observations deleted due to missingness)
Multiple R-squared: 0.007401, Adjusted R-squared: 0.003632
F-statistic: 1.964 on 3 and 790 DF, p-value: 0.118
Conclusion: The coefficient for 'time:treated' is the difference-in-differences estimator. The
treatment effect is positive but not statistically significant at the 5% level (p ≈ 0.10).

8.4 Residual Test for DiD Model


Durbin-Watson test
DW = 1.8398, p-value = 0.008989
H0: There is no autocorrelation
H1: True autocorrelation is greater than 0

The Durbin-Watson test statistic always takes a value between 0 and 4, where:
[0, 2): positive autocorrelation
2: no autocorrelation
(2, 4]: negative autocorrelation
According to the Durbin-Watson test, there is autocorrelation in the residuals of the DiD model.
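The statistic itself is just DW = Σ_t (e_t − e_{t−1})² / Σ_t e_t² ≈ 2(1 − ρ̂), so it can be computed directly from residuals. A base-R sketch on simulated white-noise residuals (not the model's actual residuals):

```r
set.seed(1)
e <- rnorm(100)                    # simulated residuals with no autocorrelation
dw <- sum(diff(e)^2) / sum(e^2)    # Durbin-Watson statistic
dw                                 # close to 2, as expected under H0
```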

8.5 Model 2: the multiplication method

In this method, the DiD estimator is obtained without manually generating the interaction
variable: the formula time*treated expands to both main effects plus their interaction.

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 23.331 1.072 21.767 <2e-16 ***
time -2.166 1.516 -1.429 0.1535
treated -2.892 1.194 -2.423 0.0156 *
time:treated 2.754 1.688 1.631 0.1033
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 9.406 on 790 degrees of freedom


Multiple R-squared: 0.007401, Adjusted R-squared: 0.003632
F-statistic: 1.964 on 3 and 790 DF, p-value: 0.118

Conclusion: The coefficient for 'time:treated' is the difference-in-differences estimator. The
treatment effect is positive but not statistically significant at the 5% level (p ≈ 0.10).

8.6 Residual Test for did_model2

Durbin-Watson test

data: did_model2
DW = 1.8398, p-value = 0.008989
H0: There is no autocorrelation
H1: True autocorrelation is greater than 0

The Durbin-Watson test statistic always takes a value between 0 and 4, where:
[0, 2): positive autocorrelation
2: no autocorrelation
(2, 4]: negative autocorrelation

According to the Durbin-Watson test, there is autocorrelation in the residuals of the DiD model.

Conclusion: Need to introduce control X-variables in the regression

8.7 Fixed effects


In their study the authors show a more precise calculation of the DiD estimator, which only
includes fast-food restaurants with employment responses (emptot) both before and after the
treatment (a so-called balanced sample). In R this result can be generated by computing a fixed
effects model, sometimes also called a within estimator. The R package plm is used to run this
regression with the function plm() and the argument model = "within". Beforehand, the data has
to be declared as a panel with the function pdata.frame(). The variable sheet uniquely identifies
each fast-food restaurant. Additionally, the function coeftest() from the R package lmtest is
needed to obtain the correct standard errors, which must be clustered by sheet.
t test of coefficients:

Estimate Std. Error t value Pr(>|t|)


time2 -2.2833 1.2465 -1.8319 0.06775 .
time2:treated 2.7500 1.3359 2.0585 0.04022 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

9. DiD Estimation with Time-Varying Effects

The key assumption here is the parallel trends assumption: absent treatment, the outcomes for
the control and treatment groups would follow parallel trends. A confounding variable is now
added that leads to non-parallel trends. It is assumed that the outcome y also depends on a
confounding variable x that develops differently for the control and treatment groups over time:
term estimate std.error statistic p-value
1 (Intercept) -3.71 0.0511 -72.5 0
2 time 0.978 0.0723 13.5 9.47e-38
3 treated 39.2 0.0569 688. 0
4 time:treated 50.1 0.0805 622. 0

Conclusion: The coefficient for 'time:treated' is the difference-in-differences estimator. The effect
is strongly significant, with the treatment having a positive effect when time-varying effects are
considered.
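The output above comes from a simulation. A minimal sketch of the kind of data-generating process assumed here, with hypothetical parameters: a confounder x trends differently in the treatment group, so the naive DiD estimate is biased until x is added as a control:

```r
set.seed(123)
n <- 500
treated <- rep(c(0, 1), each = n)   # group indicator
time    <- rep(c(0, 1), times = n)  # period indicator

# Confounder grows faster in the treatment group => non-parallel trends in y
x <- 1 + 2 * time + 3 * time * treated + rnorm(2 * n, sd = 0.5)
y <- 5 + 1.5 * x + rnorm(2 * n)     # true treatment effect is zero

naive <- lm(y ~ time * treated)     # interaction absorbs the confounded trend
ctrl  <- lm(y ~ time * treated + x) # controlling for x removes the bias
coef(naive)["time:treated"]         # around 1.5 * 3 = 4.5, spurious
coef(ctrl)["time:treated"]          # around 0, the true effect
```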

10. DiD Estimation with X-variables (Control Variables)

So far, this would be the standard way to run the regression if there were no confounding variables.
But since there are strong reasons to believe that the number of part-time employees might distort
the analysis, the variable EMPPT (part-time employees) is added as a control. In R, it is simply
added to the equation. Instead of a regression line, there is now a three-dimensional model (time,
treated and EMPPT), so the regression line becomes a regression plane (or surface). This is how
it looks with the data:
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 15.58246 1.11671 13.954 <2e-16 ***
time -2.32777 1.36278 -1.708 0.0880 .
treated -2.56004 1.07323 -2.385 0.0173 *
emppt 0.39645 0.02887 13.730 <2e-16 ***
time:treated 3.09423 1.51806 2.038 0.0419 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.456 on 789 degrees of freedom


Multiple R-squared: 0.1988, Adjusted R-squared: 0.1948
F-statistic: 48.95 on 4 and 789 DF, p-value: < 2.2e-16
After controlling for part-time employees, the DiD estimator becomes significant.

11. DiD Estimation with Anticipation Effects

> grangertest(card_krueger_1994_mod$emptot, card_krueger_1994_mod$time, order = 1)

Granger causality test

Model 1: card_krueger_1994_mod$time ~ Lags(card_krueger_1994_mod$time, 1:1) +
    Lags(card_krueger_1994_mod$emptot, 1:1)
Model 2: card_krueger_1994_mod$time ~ Lags(card_krueger_1994_mod$time, 1:1)
  Res.Df Df      F Pr(>F)
1    765
2    766 -1 0.0224 0.8811

> grangertest(card_krueger_1994_mod$emptot, card_krueger_1994_mod$treated, order = 1)

Granger causality test

Model 1: card_krueger_1994_mod$treated ~ Lags(card_krueger_1994_mod$treated, 1:1) +


Lags(card_krueger_1994_mod$emptot, 1:1)
Model 2: card_krueger_1994_mod$treated ~ Lags(card_krueger_1994_mod$treated, 1:1)
Res.Df Df F Pr(>F)
1 765
2 766 -1 0.0868 0.7684

The null hypothesis cannot be rejected in either test because the p-values are larger than 0.05,
implying that past employment (EMPTOT) does not help predict the treatment variables; there
is no evidence of anticipation effects.

12. R Codes

# ---------------------------------------------------------------------------------
# Title: Financial Econometrics 6 Difference-in-difference approach
# Member: Hongbo, Allen, Alexandra
# Date: 12/Nov/2022
# ---------------------------------------------------------------------------------

# install and load the necessary libraries

library(dynlm)
library(xts)
library(TTR)
library(ggplot2)
library(tidyquant)
library(tidyverse)
library(timetk)
library(tibbletime)
library(broom)
library(quantmod)
install.packages("writexl")
library(writexl)
install.packages("readxl")
library(readxl)
install.packages("seastests")
install.packages("magrittr")
install.packages("dplyr")
library(magrittr)
library(dplyr)
install.packages("moments")
library(moments)
library(tseries)
install.packages("stats")  # note: stats ships with base R, so this call is redundant
library(stats)
install.packages("forecast")
library(forecast)
library(urca)
install.packages("tsDyn")
library(tsDyn)
install.packages("sjlabelled")
library(sjlabelled)
install.packages("ggpubr")
library(ggpubr)
install.packages("plm")
library(plm)
install.packages("lmtest")
library(lmtest)

# ------------------------------------------------------------------------------
# Section 1 Data preparation for analysis
# ------------------------------------------------------------------------------

#njmin.zip is an archive with 5 files pertaining to the New Jersey - Pennsylvania surveys used
#in Card and Krueger’s book Myth and Measurement, chapter 2:

# This assignment replicates a study by David Card and Alan B. Krueger
# about the effect of a raise in minimum wages on employment.

# Temporary file and path


tfile_path <- tempfile()
tdir_path <- tempdir()

# Download zip file


download.file("http://davidcard.berkeley.edu/data_sets/njmin.zip",
destfile = tfile_path)

# -----------------
# Raw Data
# -----------------
# Unzip
unzip(tfile_path, exdir = tdir_path)

# Read codebook

codebook <- read_lines(file = paste0(tdir_path, "/codebook"))

# Generate a vector with variable names


variable_names <- codebook %>%
  `[`(8:59) %>%                      # variable names start at element 8 (sheet)
  `[`(-c(5, 6, 13, 14, 32, 33)) %>%  # remove elements without variable names
  str_sub(1, 8) %>%                  # longest variable name has 8 characters
  str_squish() %>%                   # remove whitespace
  str_to_lower()                     # use lower case only

# Generate a vector with variable labels


variable_labels <- codebook %>%
  `[`(8:59) %>%                      # variable names start at element 8 (sheet)
  `[`(-c(5, 6, 13, 14, 32, 33)) %>%  # remove elements w/o variable names
  sub(".*\\.[0-9]", "", .) %>%
  `[`(-c(5:10)) %>%                  # these elements are combined later on
  str_squish()                       # remove white spaces

# Region
variable_labels[41] <- "region of restaurant"

# Read raw data


data_raw <- read_table2(paste0(tdir_path, "/public.dat"),
col_names = FALSE)

head(data_raw)

# -----------------
# Cleaned Data
# -----------------

# Add variable names


data_mod <- data_raw %>%
  select(-X47) %>%                           # remove empty column
  `colnames<-`(., variable_names) %>%        # assign variable names
  mutate_all(as.numeric) %>%                 # treat all variables as numeric
  mutate(sheet = ifelse(sheet == 407 & chain == 4, 408, sheet))  # fix duplicated sheet id 407

# Process data (currently wide format)


data_mod <- data_mod %>%
# chain value label
mutate(chain = case_when(chain == 1 ~ "bk",
chain == 2 ~ "kfc",
chain == 3 ~ "roys",
chain == 4 ~ "wendys")) %>%
# state value label
mutate(state = case_when(state == 1 ~ "New Jersey",
state == 0 ~ "Pennsylvania")) %>%
# Region dummy
mutate(region = case_when(southj == 1 ~ "southj",
centralj == 1 ~ "centralj",
northj == 1 ~ "northj",
shore == 1 ~ "shorej",
pa1 == 1 ~ "phillypa",
pa2 == 1 ~ "eastonpa")) %>%
# meals value label
mutate(meals = case_when(meals == 0 ~ "none",
meals == 1 ~ "free meals",
meals == 2 ~ "reduced price meals",
meals == 3 ~ "both free and reduced price meals")) %>%
# meals2 value label
mutate(meals2 = case_when(meals2 == 0 ~ "none",
meals2 == 1 ~ "free meals",
meals2 == 2 ~ "reduced price meals",
meals2 == 3 ~ "both free and reduced price meals")) %>%
# status2 value label
mutate(status2 = case_when(status2 == 0 ~ "refused second interview",
status2 == 1 ~ "answered 2nd interview",
status2 == 2 ~ "closed for renovations",
status2 == 3 ~ "closed permanently",
status2 == 4 ~ "closed for highway construction",
status2 == 5 ~ "closed due to Mall fire")) %>%
mutate(co_owned = if_else(co_owned == 1, "yes", "no")) %>%
mutate(bonus = if_else(bonus == 1, "yes", "no")) %>%
mutate(special2 = if_else(special2 == 1, "yes", "no")) %>%
mutate(type2 = if_else(type2 == 1, "phone", "personal")) %>%
select(-southj, -centralj, -northj, -shore, -pa1, -pa2) %>% # now included in region dummy
mutate(date2 = lubridate::mdy(date2)) %>% # Convert date
rename(open2 = open2r) %>% #Fit name to wave 1
rename(firstinc2 = firstin2) %>% # Fit name to wave 1
sjlabelled::set_label(variable_labels) # Add stored variable labels

# -----------------
# Transposed Data
# -----------------

# Structural variables
structure <- data_mod %>%
select(sheet, chain, co_owned, state, region)

# Wave 1 variables
wave1 <- data_mod %>%
select(-ends_with("2"), - names(structure)) %>%
mutate(observation = "February 1992") %>%
bind_cols(structure)

# Wave 2 variables
wave2 <- data_mod %>%
select(ends_with("2")) %>%
rename_all(~str_remove(., "2")) %>%
mutate(observation = "November 1992") %>%
bind_cols(structure)

# Final dataset
card_krueger_1994 <- bind_rows(wave1, wave2) %>%
select(sort(names(.))) %>% # Sort columns alphabetically
sjlabelled::copy_labels(data_mod) # Restore variable labels

# ------------
# Final Data
# ------------
card_krueger_1994_mod <- card_krueger_1994 %>%
mutate(emptot = empft + nmgrs + 0.5 * emppt,
pct_fte = empft / emptot * 100)

# ------------------------------------------------------------------------------
# Section 2 Descriptive Statistics
# ------------------------------------------------------------------------------

card_krueger_1994_mod %>%
select(chain, state) %>%
table() %>%
prop.table(margin = 2) %>%
apply(MARGIN = 2,
FUN = scales::percent_format(accuracy = 0.1)) %>%
noquote

# Pre-treatment Means
card_krueger_1994_mod %>%
filter(observation == "February 1992") %>%
group_by(state) %>%
summarise(emptot = mean(emptot, na.rm = TRUE),
pct_fte = mean(pct_fte, na.rm = TRUE),
wage_st = mean(wage_st, na.rm = TRUE),
hrsopen = mean(hrsopen, na.rm = TRUE)) %>%
pivot_longer(cols=-state, names_to = "variable") %>%
pivot_wider(names_from = state, values_from = value)

# Post-treatment Means
card_krueger_1994_mod %>%
filter(observation == "November 1992") %>%
group_by(state) %>%
summarise(emptot = mean(emptot, na.rm = TRUE),
pct_fte = mean(pct_fte, na.rm = TRUE),
wage_st = mean(wage_st, na.rm = TRUE),
hrsopen = mean(hrsopen, na.rm = TRUE)) %>%
pivot_longer(cols=-state, names_to = "variable") %>%
pivot_wider(names_from = state, values_from = value)

# Figure
hist.feb <- card_krueger_1994_mod %>%
filter(observation == "February 1992") %>%
ggplot(aes(wage_st, fill = state)) +
geom_histogram(aes(y=c(..count..[..group..==1]/sum(..count..[..group..==1]),
..count..[..group..==2]/sum(..count..[..group..==2]))*100),
alpha=0.5, position = "dodge", bins = 23) +
labs(title = "February 1992", x = "Wage range", y = "Percent of stores", fill = "") +
scale_fill_grey()

hist.nov <- card_krueger_1994_mod %>%


filter(observation == "November 1992") %>%
ggplot(aes(wage_st, fill = state)) +
geom_histogram(aes(y=c(..count..[..group..==1]/sum(..count..[..group..==1]),
..count..[..group..==2]/sum(..count..[..group..==2]))*100),
alpha = 0.5, position = "dodge", bins = 23) +
labs(title = "November 1992", x="Wage range", y = "Percent of stores", fill="") +
scale_fill_grey()

ggarrange(hist.feb, hist.nov, ncol = 2,


common.legend = TRUE, legend = "bottom")

# ------------------------------------------------------------------------------
# Section 3 First Difference
# ------------------------------------------------------------------------------

differences <- card_krueger_1994_mod %>%


group_by(observation, state) %>%
summarise(emptot = mean(emptot, na.rm = TRUE))

# Treatment group (NJ) before treatment


njfeb <- differences[1,3]
njfeb
# Control group (PA) before treatment
pafeb <- differences[2,3]
pafeb
# Treatment group (NJ) after treatment
njnov <- differences[3,3]
njnov
# Control group (PA) after treatment
panov <- differences[4,3]
panov

# ------------------------------------------------------------------------------
# Section 4 Average Treatment Effect
# ------------------------------------------------------------------------------

# calculate the difference between the difference of November and February within NJ and PA
(njnov-njfeb)-(panov-pafeb)

# calculate the difference between the difference of NJ and PA within November and February
(njnov-panov)-(njfeb-pafeb)

###### Counterfactual

# Calculate counterfactual outcome


nj_counterfactual <- tibble(
observation = c("February 1992","November 1992"),
state = c("New Jersey (Counterfactual)","New Jersey (Counterfactual)"),
emptot = as.numeric(c(njfeb, njfeb-(pafeb-panov)))
)

# Data points for treatment event


intervention <- tibble(
observation = c("Intervention", "Intervention", "Intervention"),
state = c("New Jersey", "Pennsylvania", "New Jersey (Counterfactual)"),
emptot = c(19.35, 22.3, 19.35)
)

# Combine data
did_plotdata <- bind_rows(differences,
nj_counterfactual,
intervention)

######Line Plot
did_plotdata %>%
mutate(label = if_else(observation == "November 1992",
as.character(state), NA_character_)) %>%
ggplot(aes(x=observation,y=emptot, group=state)) +
geom_line(aes(color=state), size=1.2) +
geom_vline(xintercept = "Intervention", linetype="dotted",
color = "black", size=1.1) +
scale_color_brewer(palette = "Accent") +
scale_y_continuous(limits = c(17,24)) +
ggrepel::geom_label_repel(aes(label = label),
nudge_x = 0.5, nudge_y = -0.5,
na.rm = TRUE) +
guides(color=FALSE) +
labs(x="", y="FTE Employment (mean)") +
annotate(
"text",
x = "November 1992",
y = 19.6,
label = "{Difference-in-Differences}",
angle = 90,
size = 3
)

# ------------------------------------------------------------------------------
# Section 5 DiD Estimator
# ------------------------------------------------------------------------------
colnames(card_krueger_1994_mod)

emptot <- card_krueger_1994_mod$emptot

#####Stationarity Test of the Series


adf.none  <- list(emptot = ur.df(na.omit(emptot), type = 'none',  selectlags = c("BIC")))
adf.drift <- list(emptot = ur.df(na.omit(emptot), type = 'drift', selectlags = c("BIC")))
adf.trend <- list(emptot = ur.df(na.omit(emptot), type = 'trend', selectlags = c("BIC")))

print("Level Variable emptot with None")


cbind(t(adf.none$emptot@teststat), adf.none$emptot@cval)

print("Level Variable emptot with Drift")


cbind(t(adf.drift$emptot@teststat), adf.drift$emptot@cval)

print("Level Variable emptot with Drift and Trend")


cbind(t(adf.trend$emptot@teststat), adf.trend$emptot@cval)

#####Dummy Variable
card_krueger_1994_mod <- mutate(card_krueger_1994_mod,
time = ifelse(observation == "November 1992", 1, 0),
treated = ifelse(state == "New Jersey", 1, 0)
)

######DiD Estimation by interaction model

card_krueger_1994_mod$did = card_krueger_1994_mod$time * card_krueger_1994_mod$treated

did_model <- lm(emptot ~ time + treated + did, data = card_krueger_1994_mod)


summary(did_model)

# residual tests
acf(did_model$residuals)

library(lmtest)
lmtest::dwtest(did_model)

######DiD Estimation by interaction model


did_model2 <- lm(emptot ~ time*treated, data = card_krueger_1994_mod)
summary(did_model2)

acf(did_model2$residuals)

lmtest::dwtest(did_model2)

######fixed effects

# Declare as panel data


panel <- pdata.frame(card_krueger_1994_mod, "sheet")

# Within model
did.reg <- plm(emptot ~ time + treated + time:treated,
data = panel, model = "within")

# obtain clustered standard errors


coeftest(did.reg, vcov = function(x)
vcovHC(x, cluster = "group", type = "HC1"))

# ------------------------------------------------------------------------------
# Section 6 DiD Estimator with X-variables (control-variables)

# ------------------------------------------------------------------------------

# use part-time employees as control variables


summary(lm(emptot ~ time*treated + emppt, card_krueger_1994_mod))

# ------------------------------------------------------------------------------
# Section 7 DiD Estimator with Anticipation Effects
# ------------------------------------------------------------------------------

grangertest(card_krueger_1994_mod$emptot, card_krueger_1994_mod$time, order = 1)


grangertest(card_krueger_1994_mod$emptot, card_krueger_1994_mod$treated, order = 1)
