
1 Assignment Presentation: Patrick Blanchenay Due Wednesday 9th Dec 2020, 11.59pm

This document provides instructions for Assignment 4 in ECO372H1F. Students must submit answers to exercises in a PDF document along with the Stata do-file and log file used to generate the answers. The files must follow specific naming and formatting guidelines. Students are warned against plagiarism and reminded of resources for performing regressions in Stata, including using fixed effects, interaction terms, and formatting output tables. The document outlines the files to submit and technical requirements for submitting the files through Quercus by the deadline.


ECO372H1F Assignment 04

Patrick Blanchenay
Due Wednesday 9th Dec 2020, 11.59pm

1 Assignment presentation
This assignment tests your comprehension of Difference-in-Differences. You must submit the answers to the exercises in
Section 8. You will have to submit three elements:
• A PDF document giving the answers to the exercises
• The unique Stata do-file that you used to generate the answers to all questions
• The unique log file produced by Stata when running the do-file

2 Warning: plagiarism and academic offenses


The files you submit will normally be checked in Turnitin for plagiarism at the time of upload. As a reminder, plagiarism
carries severe penalties that could endanger your curriculum at the University of Toronto.
Any suspicious similarities with other submissions will be carefully examined. Since the assignment is worth 10% of your
final grade, I have to report any suspicion of academic offense to the Undergraduate Chair.
To limit possible infractions, you must cite any sources that you use. Any elements taken from the papers must be
cited between quotes (“ ”). You should not need any external sources, but if you do use them, you must cite them
adequately.

3 Before you start: Stata How-To’s


3.1 Controlling for a categorical variable (aka fixed effects)
As a reminder, if you need to control for a variable that is categorical, the first (tedious) solution is to create dummies
for each of its values, and include all of these dummies minus one. (The dummy left out will be used as reference.) The
easier solution is to use the i. operator in front of the variable name.
For more details, see section 2 in “Stata How-To: OLS Regressions” posted on Quercus.
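As a minimal sketch of the two solutions, the example below uses Stata's built-in auto dataset (an assumption made for illustration; it is not part of the assignment data) and treats rep78 as the categorical variable. Both regressions should return the same coefficients.

```stata
sysuse auto, clear                      // built-in example data, not the assignment data

// Tedious solution: one dummy per value, leaving the first one out as reference
tabulate rep78, generate(rep_)          // creates rep_1 ... rep_5
reg price mpg rep_2 rep_3 rep_4 rep_5, robust
scalar b_manual = _b[rep_3]

// Easier solution: the i. operator builds the dummies on the fly
reg price mpg i.rep78, robust
scalar b_factor = _b[3.rep78]

display "manual dummy: " b_manual "   i.rep78: " b_factor
```

With the i. operator, Stata picks the lowest value as the reference category, which matches leaving out rep_1 in the manual version.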

3.2 Interaction terms


To include interaction terms in a regression, you can create a new variable manually, but Stata can also create interaction
terms “on the fly”. The basic syntax z##w below creates a regression model with three regressors: z on its own,
w on its own, and the interaction term (z×w).

// Using interaction terms on the fly (recommended)


reg y z##w, robust

See section 4 in “Stata How-To: OLS Regressions” posted on Quercus.


Note that when interacting integer variables (such as year), Stata may treat them as categorical variables. You are
instead encouraged to specify such a variable as continuous using the c. operator, as opposed to the i. operator for categorical
variables:
// Interaction terms with continuous or categorical variables
reg logearnings c.educ##i.female, robust // Here education is continuous, female is categorical
reg logearnings c.educ##c.experience, robust // Here both variables are continuous

It is of course possible to add further regressors to any of these models.
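To see that the “on the fly” syntax is equivalent to building the interaction by hand, here is a short sketch on Stata's built-in auto dataset (an assumption for illustration; the variable mpgXforeign is a hypothetical name). The two regressions should have the same fit.

```stata
sysuse auto, clear                      // built-in example data, not the assignment data

// Manual approach: build the interaction term yourself
gen mpgXforeign = mpg * foreign
reg price mpg foreign mpgXforeign, robust
scalar r2_manual = e(r2)

// On-the-fly approach: mpg is continuous (c.), foreign is categorical (i.)
reg price c.mpg##i.foreign, robust

display "R-squared, manual: " r2_manual "   on the fly: " e(r2)
```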

3.3 Formatting regression results into table


Once again in this assignment, you are expected to present regression results in nicely formatted tables using “esttab”,
as opposed to simply copying Stata output.
For more details, see “Stata How-To: Exporting Regression Results using esttab” posted on Quercus.

To make these tables more readable, you are also asked to give a short readable label to any variable that you create:
// Giving variable a short label
gen logwage = ... // creating a new variable
label variable logwage "Log wages" // the label must be attached to the variable just created
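The sketch below puts labelling and esttab together. It assumes that the estout package (which provides eststo and esttab) is installed, for instance via ssc install estout, and it uses Stata's built-in auto dataset rather than the assignment data; logprice is a hypothetical variable name.

```stata
sysuse auto, clear                      // built-in example data, not the assignment data

gen logprice = ln(price)                // creating a new variable
label variable logprice "Log price"     // short label that will appear in the table

eststo clear                            // drop any previously stored estimates
eststo: reg logprice mpg, robust
eststo: reg logprice mpg i.foreign, robust

// se: report standard errors instead of t-statistics
// label: use variable labels instead of variable names
esttab, se label
```

Adding using followed by a filename to the esttab command writes the table to a file instead of the Results window; see the How-To posted on Quercus for the supported formats.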

4 Before you start: preparing the files


1. Decide, on your computer, which folder will be your working directory for this project. It is recommended that you
use a folder with automatic backup such as Dropbox, OneDrive, Google Drive, iCloud Drive.
2. Download the Zip archive ECO372_Assignment4_Dec2020.zip, from Quercus in Assignments > Assignment 4, and
save it in the working directory.
3. Extract the archive ECO372_Assignment4_Dec2020.zip in your working directory. This should extract
ECO372_Assignment4_Dec2020.pdf, a do-file template called EC0372_Assignment4_SURNAME_FirstName.do, and the datasets
for the assignment. Your working directory should look like:
[Working directory]
|- Dynarski2003.pdf
|- ECO372_Assignment4_Dec2020.pdf
|- EC0372_Assignment4_SURNAME_FirstName.do
|- datasets
   |- Dynarski2003.dta

4. Rename the do-file template EC0372_Assignment4_SURNAME_FirstName.do by replacing "SURNAME" and "FirstName"
with your surname and first name, as they appear on ACORN. For instance, mine would be called
EC0372_Assignment4_BLANCHENAY_Patrick.do. Note that there is a 10 point penalty for failing to name your do-file
appropriately.
5. Open the newly-renamed do-file:
• On line 24, set the working directory to the folder where the do-file is.
• On lines 27 and 30, replace BLANCHENAY by your last name as it appears on ACORN, and Patrick by your
first name as it appears on ACORN.
• On line 33, replace 12345678 by your student number.
• Save the do-file before doing any further changes.
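For orientation only, the block below sketches what the set-up part of such a do-file typically does. It is a hypothetical illustration, not the content of the provided template: the local macro names surname and firstname are assumptions, the log filename follows the pattern given in Section 5.3, and the cd line is a placeholder to be replaced by your own folder.

```stata
// Hypothetical sketch only: follow the provided template, not this block
capture log close                       // close any log left open by a previous run
cd "`c(pwd)'"                           // placeholder: put your own working directory here

local surname   "BLANCHENAY"            // as it appears on ACORN
local firstname "Patrick"               // as it appears on ACORN
local studentnumber 12345678

// text: plain-text log rather than SMCL; replace: overwrite the previous log
log using "ECO372_Assignment4_`surname'_`firstname'.log", text replace
display "Student number: `studentnumber'"
log close
```

Opening the log from inside the do-file is what makes the log file reproducible: re-running the do-file regenerates the same log.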

5 Documents to upload
5.1 Results PDF document
Filename: ECO372_Assignment4_SURNAME_FirstName.pdf

The Results PDF should be a single document with your answers to all exercises. Most questions will require you to perform
analyses using Stata and provide suitable explanations and interpretations of the results. You are expected to provide,
whenever possible, properly formatted regression results using the esttab command. You do not need to answer
questions that only ask you to generate a new variable.
The answers you provide should rely only on results that are directly produced by your do-file. Conversely, it is not
always necessary to include ALL Stata results in your Results document. Only include the parts that are used to
answer the questions; but keep tables together (do not only copy isolated numbers).
Answers will be graded based on the quality of the explanations. It is not enough to use Stata output. You have to explain
how the output answers the specific question.
The PDF document must be uploaded to the Quercus assignment by the deadline. This is a necessary but not sufficient
condition for your submission to be complete.

Format
• PDF only. No other file type will be accepted (in particular, no MS Word document).
• Letter-sized. Font should be at least 10 points; everything should be readily readable, including the Stata output.

• Top line of the document should contain : [SURNAME] [First name] - ECO372 Assignment 4
• Second line: Student Number: [Student Number]
• Answers should be clearly numbered, but you do not need to copy the questions.
• Filename should be: EC0372_Assignment4_SURNAME_FirstName.pdf. For instance, mine would be called
EC0372_Assignment4_BLANCHENAY_Patrick.pdf. (It is OK if Quercus adds a number for a resubmission.)

5.2 Do-file
Filename: ECO372_Assignment4_SURNAME_FirstName.do

Use the provided do-file template as a starting base.

You can insert your commands in the space indicated in the provided template. Your code should produce all analyses
and output necessary for all exercises and questions, from one single do-file.
Your do-file must be able to run in one go if placed on a computer with the same datasets available. The only thing I
should need to change in your do-file, to reproduce your results exactly, is the working directory. In particular,
this requires you to keep the do-file in your working directory, and the /datasets/ folder to be in your working directory.
If you’re not sure, try on a classmate’s computer. If you get errors when running your do-file (red lines in Stata output),
correct the errors, then re-run the do-file, until the whole do-file can execute in one pass.
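One way to keep the do-file portable is to refer to the data with a path relative to the working directory. The sketch below assumes the folder layout described in Section 4; the existence check is optional and only there to give a clearer message if the working directory is wrong.

```stata
// Relative path: works on any computer once the working directory is set
capture confirm file "datasets/Dynarski2003.dta"
if _rc != 0 {
    display as error "datasets/Dynarski2003.dta not found; check the working directory"
}
else {
    use "datasets/Dynarski2003.dta", clear
}
```

An absolute path (one that starts from your own drive or user folder) would break as soon as the do-file is run on another computer.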
Comment your code. You do not need to comment every instruction, but you should comment the big steps, or the big
blocks of code. Explain why you are doing such or such instructions, and what you expect Stata to do. Indentation is
also useful to make your code more readable.
Part of your grade depends on code formatting & commenting.
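As a reminder of the comment and line-continuation styles available in a Stata do-file, here is a short sketch on the built-in auto dataset (an assumption for illustration, not the assignment data):

```stata
sysuse auto, clear                      // built-in example data, not the assignment data

* A line starting with * is a comment
// Two slashes start a comment that runs to the end of the line

/* Block comments can span several lines:
   here we regress price on fuel efficiency and origin. */
reg price mpg i.foreign, robust

// Three slashes continue a long command on the next line;
// indenting the continuation makes the structure easier to read
reg price mpg weight length ///
    i.foreign, robust
```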

Format
• Text file only.
• Only ASCII characters should be used; no accented characters, no characters from extended alphabets or writing
systems.
• Filename should be: EC0372_Assignment4_SURNAME_FirstName.do, e.g. EC0372_Assignment4_BLANCHENAY_Patrick.do.
(It is OK if Quercus adds a number for a resubmission.)

5.3 Stata log file


Filename: ECO372_Assignment4_SURNAME_FirstName.log

If you followed the steps in Section 4, your log file should be created automatically when you run your do-file. And
it will be automatically named EC0372_Assignment4_SURNAME_FirstName.log, where SURNAME and Firstname have
been appropriately replaced by your ACORN surname and your ACORN first name. For instance, mine would be called
EC0372_Assignment4_BLANCHENAY_Patrick.log. Again, this should happen automatically if you are using the do-file
template provided, and if you have configured it appropriately (see step 5 in Section 4).
Anything in your log file should come from your do-file, not from instructions typed in Stata command window. That is,
if I re-run your do-file, I should obtain exactly the same log file (apart from the path to the working directory).
If you get errors when running your do-file, correct the errors, then re-run the do-file in its entirety to generate an
error-free log file.

Format
• Text file only, not in SMCL.
• Filename should (automatically) be: EC0372_Assignment4_SURNAME_FirstName.log. (It is OK if Quercus adds a
number for a resubmission.)

6 Submission instructions
By Wednesday 9th Dec 2020, 11.59pm, you should have uploaded all three documents. Your submission will only be
considered complete when you have done all of those things. Failure to complete one or more of those will count as late
submission.

Only the results file, the do-file and the log-file should be uploaded. Do not include the datasets in your submission.
Do not group files in a zip file.
No submission will be accepted on paper, or by email, regardless of any technological problem.

7 Grading
7.1 Rubric
The assignment is worth 100 points, graded according to the following rubric.

Item                              Points
Question 1                        40
Question 2                        40
Code formatting & commenting      10
PDF formatting                    10

For exercise questions, you will be graded on the quality of the answers to the questions. Emphasis will be put on clear
and concise answers that address specifically the question, and show your understanding of the topic and the statistical
issues it raises. Appropriate use of the Stata output in the answer will also be taken into account: use what is necessary,
leave out the irrelevant.
Note on PDF formatting: You are expected to make use of esttab to produce appropriately formatted regression
results, whenever possible. To make these tables more readable, you are also asked to give a short readable label to any
variable that you create. See section 3.3 for more details.
All results files will be checked. Some do-files and log-files will be checked at random.

7.2 Penalties
Note the penalties below, as they can quickly lower your grade:

Problem                                                             Penalty
Late submission (starting immediately at deadline)                  10pts per 24hrs
File names do not follow the prescribed pattern                     5pts per file
Do-file generates errors after modifying working directory          10pts
Do-file does not run in one go after modifying working directory    10pts
Log file does not correspond to do-file                             10pts
Results are used that are not reproducible with the do-file         10pts

7.3 Re-mark request


Clerical errors will be corrected immediately upon notification via the MS Form.
If you believe there is a mistake in your grade, you can ask for your assignment to be re-marked via the MS Form that
will be posted on Quercus, within one week of your assignment grade release.
The whole assignment will be blind re-marked. Your final mark may go up or down.

8 Questions
Reminder: You are expected to make use of esttab to produce nicely formatted regression tables.

Exercise 1: Dynarski (2003)


Note: In this exercise, data comes from a survey, which is made representative by the use of weights. Categories of
individuals underrepresented in the survey compared to the population get assigned a higher weight, so that their
observations count for several observations in the dataset, making the dataset “representative of the whole population”.
The variable wt88 contains the sampling weight associated with each observation. This requires you to tell Stata how to
weight the data. To ensure that any tabulations or regressions are properly weighted, include [weight=wt88] at the
end of your commands but before the comma if you have any options specified; for instance:
reg y x [weight=wt88], robust
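To see what weighting does, consider a toy dataset of two observations (hypothetical values, not taken from the survey): the second observation carries three times the weight of the first, so it counts three times as much in the mean.

```stata
clear
input y w
0 1
1 3
end

summarize y                  // unweighted mean: (0 + 1)/2 = 0.5
summarize y [aweight=w]      // weighted mean: (0*1 + 1*3)/(1 + 3) = 0.75
display "Weighted mean of y: " r(mean)
```

When you write [weight=...] without naming the weight type, Stata picks the default type for that command and says which one it assumed in the output.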

This question asks you to reproduce selected results of the paper “Does Aid Matter? Measuring the Effect of Student Aid
on College Attendance and Completion,” published by Susan Dynarski (2003) in The American Economic Review. This
paper is attached to the assignment, as Dynarski2003.pdf.
a. (10 pts) Describe the outcome, the variable of interest (treatment), and the source of exogenous variation that
the author uses to identify her model. Discuss how this approach is an improvement over attempting to estimate
equation (1) on page 279 of the paper by simply using observational data.
b. Load the dataset Dynarski2003.dta; locate the variable equal to 1 if a youth is a member of a cohort that
graduated from high school before student benefits were eliminated.
c. (10 pts) Replicate the summary statistics highlighted in Table 1 reproduced at the end of this exercise (the table
is also available in the original article). You do not have to replicate the exact formatting. (It is easier to put
the “Father Deceased/Not” and “Before/After Change in Benefits” as rows; and the summary statistics for each
variable “Attend college by 23”, “Yrs of Schooling” as columns.) The suggested way to do this is the table
command. For instance, the command:
table x2 [weight=wt88], by(x1) contents(mean y mean z)

generates a table where each row corresponds to a unique combination of x1 and x2, and each cell in the table
gives the mean of variables y and z respectively.
d. (10 pts) Replicate the highlighted results in Table 2 of the paper reproduced at the end of this assignment. Note
that for Table 2, the standard errors are clustered at the household level, to account for potential correlation of
students within the same household. To cluster standard errors, use the cluster() option instead of robust.
e. (10 pts) State the key assumption for the DD estimates to be valid in this context. What would be the preferred
method for supporting that assumption? What evidence does the author provide in lieu of the preferred evidence
to support this assumption?
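The general shape of a weighted difference-in-differences regression with clustered standard errors can be sketched on simulated data. Everything in this block is hypothetical (the variable names treat, post, hhid, wt and the numbers used to simulate y); it does not use the Dynarski dataset and is not the specification of Table 2.

```stata
clear
set seed 12345                          // hypothetical seed, for reproducibility
set obs 400
gen hhid = ceil(_n/2)                   // hypothetical households of two individuals
gen byte treat = runiform() < 0.5       // hypothetical treatment-group dummy
gen byte post  = runiform() < 0.5       // hypothetical "after the change" dummy
gen wt = 1 + runiform()                 // hypothetical sampling weight

// Simulated outcome: the true coefficient on treat x post is 2
gen y = 1 + 0.5*treat + 0.3*post + 2*treat*post + rnormal()

// Weights go before the comma, cluster() goes after it
reg y i.treat##i.post [weight=wt], cluster(hhid)
display "Estimate on treat x post: " _b[1.treat#1.post]
```

The coefficient on the interaction term is the difference-in-differences estimate; here it should be close to the value of 2 used in the simulation.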

Exercise 2: Facezon’s Headquarters

This exercise gets you to estimate the effect that the arrival of big headquarters has on the wages of local workers.
It has two parts. First, you will be given the data generating process of wages, and
asked to create a fictitious panel dataset of wages. Then, you will “forget” that you created the dataset and imagine that you
are a researcher who just received the data to estimate the “headquarters” effect.
Make sure to insert your Stata commands in the relevant part of the ECO372_Assignment4_Surname_Firstname.do. It
is important to leave the following commands intact:
clear
set seed `studentnumber'
set obs 780
gen workerID = _n

There are 780 workers in our dataset (as set by set obs 780 above), spread over two cities, A and B. Allocate approximately 60% of workers
to city A, and the rest to city B by running the following:
gen byte cityA = (runiform() < 0.6) // allocate approximately 60% of workers to city A
gen byte cityB = 1 - cityA // allocate the rest to city B

Duplicate each observation to create 7 years of data for each individual, from 2012 to 2018, by doing the
following:
expand 7, generate(expandy)
sort workerID expandy
bysort workerID: gen year = 2012+ _n -1 // creates year
drop expandy
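After building the panel, it is worth checking that it has the expected structure. The sketch below rebuilds a panel of the same shape with a hypothetical seed (your do-file uses your student number instead) and then verifies that each worker appears exactly once per year.

```stata
clear
set seed 12345                          // hypothetical seed; the assignment uses your student number
set obs 780
gen workerID = _n

expand 7, generate(expandy)             // 7 copies of each worker
bysort workerID (expandy): gen year = 2012 + _n - 1
drop expandy

isid workerID year                      // stops with an error if worker-year pairs are not unique
assert _N == 780 * 7                    // 780 workers x 7 years
tabulate year                           // each year from 2012 to 2018 should have 780 observations
```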

In this world, the hourly wage of worker i in year t is determined by the following equation:

$$w_{it} = \kappa_0 + \kappa_1 HQ_{it} + \kappa_2 (Y_t \times cityA_i) + \kappa_3 (Y_t \times cityB_i) + u_{it} \tag{2}$$

where
• $Y_t$ is the number of years since 2012 (for instance, for observations in 2014, $Y_{2014} = 2$);
• $HQ_{it}$ is a dummy equal to 1 if worker $i$ is in a city where a big company’s headquarters exist in year $t$;
• $u_{it}$ is the error term (white noise).
This is the “true” model of wages, a.k.a. data-generating process.
a. (6 pts) What is the interpretation of κ1 in Equation (2)?
b. (6 pts) What is the interpretation of κ2 and κ3 in Equation (2)?
Suppose now that after a lengthy selection process, the Internet giant Facezon set up new headquarters in city A in
2016.
c. Create a dummy variable POST equal to 1 only for observations in year 2016 and following, and 0 otherwise. Create
a dummy variable HQ equal to 1 for workers who work in a city with big headquarters, that is, for observations in
city A in years 2016 and following; HQ is equal to 0 otherwise.
d. Generate a white “noise” error term u using the following:
gen u = rnormal(0,1.5) if cityA == 1
replace u = rnormal(0,1) if cityB == 1

Do you notice anything about the error term? What is this called? What does it imply for our estimations?
e. Create a variable ys2012 equal to the number of years elapsed since 2012. For instance, it should be equal to 0
for observations in 2012, and equal to 3 for observations in 2015.
f. Using Equation (2), generate hourly wages w for workers, using the following values: κ0 = 5; κ1 = 2.1; κ2 =
0.8; κ3 = 0.4.
We have now created the dataset. Imagine you are a researcher who does not know how the data was generated. You
are only told that city A welcomed Facezon’s new headquarters in 2016. You would like to estimate the effect that this
had on the wages of workers in city A using a difference-in-differences approach. You receive the dataset, but containing
only the following variables: workerID, year, ys2012, cityA, cityB, POST and w.
g. Drop all the other variables.
h. (4 pts) Write down the equation to estimate this using a regression. NB: this equation will be different from
equation (2).

i. (6 pts) Based on the value of the κ’s given in question f., what would a correct estimation of the causal effect of
headquarters on local wages find?
j. (10 pts) Estimate the equation you set up in question h. How does it compare to your answer to question i.?
Explain the difference. Be specific, using information on how we generated the data.
k. (8 pts) Can you suggest a way to remedy the problem? Re-do the estimation, and check whether your estimate
is statistically different from the value you were expecting in question i.
