Week03 LectureSlidesECO372

The document outlines the agenda and tasks for Week 3 of the ECO372 course, focusing on data analysis and applied econometrics. It includes instructions for downloading lecture materials, an announcement about a research grant application, and details on a lecture activity involving the creation of a dataset to study productivity through multi-tasking. Additionally, it recaps concepts related to selection bias and treatment effects in econometrics.

Uploaded by

Krish Goyal

WEEK 3

ECO372 Data Analysis and Applied Econometrics in Practice



Start-up tasks for today’s lecture


¨ Please open your laptop and ready the following materials (e.g., while we wait for class to start):

1. Download Week03_LectureMaterials.zip and unzip the folder somewhere sensible on your computer.
2. This folder contains code files and data for our examples today.
3. Open Stata and the data file daFakeData.dta. You can do this by going to the folder and double-clicking on the filename “daFakeData.dta”.
4. Find yourself in this data using your “Spy name” from last class.

¨ If you need help: raise your hand or ask your neighbour



Announcement for Research Minded Students!

¨ Economics Undergraduate Research Grant Application
¤ The Economics Undergraduate Research Grant provides funding for the completion of promising undergraduate student research projects.
¤ Each proposal may request an amount of up to $1,000.
¤ Examples of eligible expenses include:
▪ Cost of purchasing data;
▪ Travel for the purpose of conducting research;
▪ Subject fees for experiments;
▪ Specialized equipment that would not otherwise be purchased;
▪ Journal submission fees;
▪ Conference travel.

¨ Deadline: Feb 3rd


¨ Apply here: https://forms.office.com/r/1XyR2zaXUx

Touching base on course activities


¨ Prerequisite Skills Warm-up
¤ No need to worry if you found the quiz difficult: you’re going in cold, and it’s about concepts you learned in the somewhat distant past.
¤ The aim is to jog those concepts loose so we can build them up again together.
¤ I suggest you gauge your understanding; for each incorrect question ask:
▪ Can I easily identify where/why I went wrong? => you have a good understanding
▪ Don’t know why the incorrect responses are incorrect? => go back to general concepts to review; come to student hours.

¨ Student Hours:
¤ Visit us in dedicated ECO372 student hours!
¤ Schedule:
▪ Mon: 2:30-3:30 GE313 (in Dept.)
▪ Wed: 2:30-3:30 GE313 (in Dept.)
▪ Thu: 3:00-4:00 SS2120 (after last lecture)
¤ As always, we are available in the Thursday and Friday meetings for questions

¨ Learning Stata:
¤ Learning a new computing language/application is not a linear process with a narrative attached
¤ It’s more like learning a language where immersion is the key
¤ Continue to work through code over the next few weeks, and these processes will start to become easier and easier

Agenda For Today

¨ Recap SDO and Estimation of Treatment effects


¤ Example 1: health insurance from MM Chapter 1

¨ Lecture activity: create a data set on Multi-Tasking and Productivity


¤ We will look at another example of SDO estimated from a data set we construct together today

¨ Data Work: Week 2 Pet - Spy data


¤ Example 2: SDO decomposition with fake data from last class

¨ Questions:
¤ Example 1 again: answer questions about the health insurance data from MM Chapter 1

Recap SDO

Recap SDO:

¨ We can decompose the simple difference in means (SDO) into two parts:

SDO = E[Y1i| Di = 1] – E[Y0i| Di = 0]
    = {E[Y1i| Di = 1] – E[Y0i| Di = 1]} + {E[Y0i| Di = 1] – E[Y0i| Di = 0]}
    = ATT + Selection Bias

¨ Selection bias = the mean untreated potential outcome (Y0i) among those who received the treatment minus the mean untreated potential outcome among those who did not.
¨ It’s a description of the differences between the two groups if there had never been a treatment in the first place.
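The two-part split can be checked on a toy dataset. The sketch below uses made-up potential outcomes for six hypothetical people (not the class data); because the data are invented, both Y0 and Y1 are visible for everyone, which is never true in real data.

```python
# Hypothetical potential outcomes for six people (made-up numbers).
y0 = [40, 50, 60, 30, 20, 10]   # outcome without treatment
y1 = [70, 80, 90, 50, 40, 30]   # outcome with treatment
d = [1, 1, 1, 0, 0, 0]          # 1 = treated, 0 = untreated

def mean(values):
    return sum(values) / len(values)

treated = [i for i in range(len(d)) if d[i] == 1]
untreated = [i for i in range(len(d)) if d[i] == 0]

# SDO compares observed outcomes: Y1 for the treated, Y0 for the untreated.
sdo = mean([y1[i] for i in treated]) - mean([y0[i] for i in untreated])
# ATT needs the counterfactual Y0 of the treated, which real data never show.
att = mean([y1[i] - y0[i] for i in treated])
# Selection bias: gap in untreated outcomes between the two groups.
selection_bias = mean([y0[i] for i in treated]) - mean([y0[i] for i in untreated])

print("SDO:", sdo, "ATT:", att, "Selection bias:", selection_bias)
```

In this invented example the treated group would have been better off even without treatment, so the SDO overstates the ATT by exactly the selection bias.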
MM Chapter 1 Example of SDO: Health Insurance

Question: Does health insurance improve health?

Notation:
¨ Y0i: health level of person i if they don’t insure
¨ Y1i: health level of person i if they insure

Observed:
Khuzdar chooses to insure: Y1,K
Maria chooses not to insure: Y0,M
MM example with healthcare

Question: Does health insurance improve health?

Following MM, assume for now that the effect of health insurance is δ, the same for everyone (no heterogeneity):
Y1i − Y0i = δ

We would like to find what δ is.

Naïve answer (SDO): compare the health of people who insure to the health of people who don’t:
Y1,K − Y0,M
A positive difference would be read as insurance coverage improving health
MM example with healthcare

Q: Causal effect of health insurance on person i’s health?

δ = Y1i − Y0i

Problem: for a given person i, we observe either Y0i or Y1i, but not both.

                              Observed    Unobserved (counterfactual)
Khuzdar chooses to insure     Y1,K        Y0,K
Maria chooses not to insure   Y0,M        Y1,M
MM example with healthcare

Problem with naïve comparison

Y1,K − Y0,M = (Y1,K − Y0,K) + (Y0,K − Y0,M)

First term (causal effect): potential improvement in Khuzdar’s health if he takes healthcare
Second term (selection bias): difference in potential health between Khuzdar and Maria
MM example with healthcare

Problem with naïve comparison

Y1,K − Y0,M = (Y1,K − Y0,K) + (Y0,K − Y0,M)

Naïve comparison confounds:


¨ True effect of insurance, which we’re interested in
¨ Selection bias, which captures the fact that those who insure and those who don’t
might be different (different baseline health)
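A worked example with hypothetical numbers may help. Assume, purely for illustration, that Khuzdar’s health would be 3 without insurance and 4 with it, that Maria’s health without insurance is 5, and that δ = 1 for everyone. None of these values come from the slides.

```latex
\begin{align*}
Y_{1,K} - Y_{0,M} &= (Y_{1,K} - Y_{0,K}) + (Y_{0,K} - Y_{0,M}) \\
4 - 5 &= (4 - 3) + (3 - 5) \\
-1 &= \underbrace{1}_{\text{causal effect } \delta} + \underbrace{(-2)}_{\text{selection bias}}
\end{align*}
```

Under these assumed values the naïve comparison is negative even though insurance improves health by 1 for everyone, because Maria starts from a higher baseline.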
What if we had more than two people?

Naïve group difference:

Avg_n[Y1i | Di = 1] − Avg_n[Y0i | Di = 0]

Rewrite as:
{Avg_n[Y1i | Di = 1] − Avg_n[Y0i | Di = 1]} + {Avg_n[Y0i | Di = 1] − Avg_n[Y0i | Di = 0]}

First term: estimate of causal effect (ATT)
Second term: “estimate of” selection bias
Expectation and average
¨ Expectation:
¤ averaging infinitely many times, or over the whole population: E[Yi]
¨ Average:
¤ averaging over the sample
¨ If the sample is big, the average should be close to the expectation
¤ Law of Large Numbers
¤ Pages 13-15 in MM
¨ As the sample gets bigger, averages become close to the expectation.
¤ The selection effect in the sample would only converge to zero if the selection effect was actually zero in the population.
¤ Under pure randomization this is true, so selection bias in any given sample will converge to zero as the sample gets bigger.

We would like to estimate the expected effect of the treatment using averages in a sample.
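The Law of Large Numbers can be seen in a short simulation. This is a minimal sketch that assumes a hypothetical outcome drawn uniformly between 0 and 100 (so its expectation is 50); the distribution and the seed are arbitrary choices, not anything from the course data.

```python
import random

random.seed(372)  # arbitrary seed so the run is reproducible

def sample_average(n):
    # Draw n outcomes uniformly from 0..100; the expectation is 50.
    return sum(random.uniform(0, 100) for _ in range(n)) / n

for n in (10, 1_000, 100_000):
    avg = sample_average(n)
    print(n, round(avg, 2), "distance from 50:", round(abs(avg - 50), 2))
```

The distance from 50 typically shrinks as n grows, although any single small sample can land close to or far from the expectation by chance.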
Larger sample size in three-part decomp?
¨ If these differences exist in the population, drawing larger and larger samples will not drive them to zero:

SDO = E[Y1i| Di = 1] – E[Y0i| Di = 0]
    = {E[Y1i| Di = 1] – E[Y0i| Di = 1]} + {E[Y0i| Di = 1] – E[Y0i| Di = 0]}
    = ATT + Selection Bias
    = E[Y1i–Y0i] + (1−π)(ATT-ATU) + {E[Y0i| Di = 1] – E[Y0i| Di = 0]}
    = ATE + Heterogeneous Treatment Effect Bias + Selection Bias

where π is the share of the population that is treated.

¨ Larger sample:
¤ Increases precision of estimates
¤ Does not remove selection bias / heterogeneous treatment effect bias if these biases exist in the population
¤ Need better research design (where we can argue that these biases are zero in the population)
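One way to see where the three-part version comes from is sketched below. It assumes π denotes the treated share, P(Di = 1), which matches how the term (1−π) is used in the decomposition.

```latex
\begin{align*}
ATE &= \pi \, ATT + (1-\pi) \, ATU \\
    &= ATT - (1-\pi)(ATT - ATU) \\
\Rightarrow \quad ATT &= ATE + (1-\pi)(ATT - ATU) \\[4pt]
SDO &= ATT + \{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]\} \\
    &= ATE + (1-\pi)(ATT - ATU) + \{E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]\}
\end{align*}
```

The first line is just the statement that the population average effect is a weighted average of the effect among the treated and the effect among the untreated.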

“Gold Standard” Solution: (Pure) Random Assignment of Treatment

¨ Random assignment of Di zeros the selection bias and heterogeneous treatment effect bias because it makes Di
independent of potential outcomes (Di ╨ {Y0i, Y1i}). (Conditional Independence)

¨ The distributions of Y0i and Y1i are therefore identical across the values of Di.

¨ Therefore, the mean potential outcomes for Y1i and Y0i are the same (in the population) for either the treatment or the
control group:
E[Y1i| Di = 1] – E[Y1i| Di = 0] = 0

E[Y0i| Di = 1] – E[Y0i| Di = 0] = 0.

¨ This kind of randomization of the treatment assignment would eliminate both the selection bias and the heterogeneous
treatment effect bias.

¨ The selection bias zeroes out as follows: E[Y0i| Di = 1] – E[Y0i| Di = 0] = 0.

¨ The heterogeneous treatment effect bias zeroes out as follows:


(ATT – ATU) = {E[Y1i| Di = 1] – E[Y0i| Di = 1] } – {E[Y1i| Di = 0] – E[Y0i| Di = 0] } = 0
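A simulation can illustrate both biases shrinking under random assignment. The sketch below assumes a hypothetical population (Y0 uniform on 0 to 100, individual effects uniform on 0 to 60) and a fair coin flip for D; all of these are illustrative choices.

```python
import random

random.seed(3)  # arbitrary seed so the run is reproducible

def mean(values):
    return sum(values) / len(values)

n = 200_000
# Hypothetical potential outcomes: Y0 uniform on 0..100, effect uniform on 0..60.
y0 = [random.uniform(0, 100) for _ in range(n)]
y1 = [y + random.uniform(0, 60) for y in y0]
# Pure random assignment: a fair coin flip, unrelated to Y0 and Y1.
d = [random.random() < 0.5 for _ in range(n)]

t = [i for i in range(n) if d[i]]       # treated indices
c = [i for i in range(n) if not d[i]]   # control indices

selection_bias = mean([y0[i] for i in t]) - mean([y0[i] for i in c])
att = mean([y1[i] - y0[i] for i in t])
atu = mean([y1[i] - y0[i] for i in c])
ate = mean([y1[i] - y0[i] for i in range(n)])
sdo = mean([y1[i] for i in t]) - mean([y0[i] for i in c])

print("selection bias:", round(selection_bias, 2))
print("ATT - ATU:", round(att - atu, 2))
print("SDO:", round(sdo, 2), "ATE:", round(ate, 2))
```

With a sample this large, both bias terms should be small relative to the ATE, so the SDO lands close to the ATE; in a small sample they can still differ noticeably by chance.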

Lecture activity: Multi-Tasking and Productivity



Creating Data!

¨ In this week’s ECO372 workshop activity, we are creating data that we will use to
better understand the role of randomization in the estimation of treatment effects.

¨ We did this last week with made-up numbers inputted by you


¤ This week we are creating a dataset to estimate a real, live, causal effect studying productivity

¨ One of the hardest things about the potential outcomes framework is imagining
imaginary things (Y0, Y1).
¤ We can get around this by making up fictional data where we know Y0, Y1
¤ We can see examples of real datasets: e.g., RAND HIE and NHIS
¤ Today we’re going to take a third approach: creating our own data, which will be about ourselves.
This can help us in imagining alternative paths because the alternative paths are ones that exist for
ourselves.

Creating Data!

¨ Creating data that we will use to better understand the role of randomization in the
estimation of treatment effects.

¨ So, what’s Y and what’s D?

¨ Context: our general question is whether multi-tasking is a good thing for productivity.
¤ More explicitly, we are going to compare productivity over two different work methods:
▪ single-tasking (focusing on one task at a time) versus
▪ multi-tasking (switching back and forth between tasks).
¤ Here, our outcome, Y, will be a measure of productivity (time to complete a set of tasks correctly), and our treatment, D, is task-switching versus focused work

How will it work?


¨ Each of you has been assigned to a group, and I will email you a link to a set of tasks…
¤ The tasks are silly and have nothing to do with econometrics
¤ About half of you will be doing one thing and the other half another thing
¤ The tasks will take under 5 minutes to complete and are not difficult.

¨ We will collect anonymized data, and we will use this data to:
¤ (a) understand the theory of treatment effect estimation and
¤ (b) gain more skill in taking a spreadsheet of data to Stata for analysis.

¨ Instructions:
¤ Complete each task with an aim to do it as ACCURATELY and as QUICKLY as possible.
¤ Completion of this exercise counts for participation, but the resulting data will be anonymous.

Step 1: create a baseline

¨ Please fill out the following form to get a baseline for our dataset.

https://forms.office.com/r/1AzQEaR84P

Note: This form is different from the one sent by email.



Step 1: create a baseline

Task 1: econometrics is the best
Task 2: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21

Step 2: multi-tasking

¨ Each of you has been assigned to a group, and I will email you a link to a set of tasks in the next few seconds…
¤ About half of you will be doing one thing and the other half another thing

¨ Instructions:
¤ Again, complete each task with an aim to do it as ACCURATELY and as QUICKLY as possible.
¤ This time we will be accessing the link via email.
▪ Do not share your link with others. We want to maintain the correct treatment group assignment.

¨ Sending emails now… Complete the tasks and meet back here in 10 minutes.

What Happened?

¨ Each of you typed a sequence of letters and numbers.


¤ Some of you typed econometrics is the best and then 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
▪ We will call this Focused work

¤ Some of you typed e1c2o3n4o5m6e7t8r9i10c11s12 i13s14 t15h16e17 b18e19s20t21
▪ We will call this Task Switching work

¤ And some of you probably cheated a little by not following the instructions. Oh well… hopefully we measured that as well.

¨ The question will be whether it takes longer to type the same series of characters in the focused survey or in the task switching survey, which cycles back and forth between both sequences.

Now what?

¨ I’ll collect and clean the data, anonymize it and send it to you.

¨ Questions:
¤ Can we estimate the simple difference in completion times between the work methods: task switching vs focused?
▪ I hope so: we will see time to completion, whether the task was performed correctly, and which type of work method was used.
▪ You need to know how to clean data in Stata and run a regression, so let’s try.
¤ Take a few minutes to share your survey type with those around you.
¤ Can we estimate the time effect (causal effect) of task switching versus focused work within this context?
▪ What do you need to know?
¤ We’ll have information on mother tongue and type of keyboard. Do you think these will be related to speed? Will they be related to work method: task switching vs focused?
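The link between the simple difference and a regression can be sketched in a few lines. The example below uses hypothetical completion times (not the class data) and shows that the slope from regressing the outcome on a 0/1 treatment indicator equals the difference in group means. It is written in Python for illustration; the class analysis itself is done in Stata.

```python
# Hypothetical completion times in seconds (not the class data).
seconds = [31.0, 28.5, 35.2, 30.1, 44.8, 39.9, 47.3, 41.0]
switching = [0, 0, 0, 0, 1, 1, 1, 1]   # 1 = task switching, 0 = focused

def mean(values):
    return sum(values) / len(values)

# Simple difference in mean completion time between the two work methods.
sdo = (mean([y for y, d in zip(seconds, switching) if d == 1])
       - mean([y for y, d in zip(seconds, switching) if d == 0]))

# OLS slope from regressing seconds on the 0/1 indicator: cov(D, Y) / var(D).
d_bar, y_bar = mean(switching), mean(seconds)
slope = (sum((d - d_bar) * (y - y_bar) for d, y in zip(switching, seconds))
         / sum((d - d_bar) ** 2 for d in switching))

print("difference in means:", sdo, "regression slope:", slope)
```

Whether that number can be read as a causal effect is a separate question: it depends on how people ended up in each work method.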

Data Work: Week 2 Pet - Spy data


Let’s estimate these (SDO, ATT, ATE, ATU) in a dataset

Pet-Spy Data Example:


¨ Last week, you each provided some fake numbers for Y1i – Y0i

¨ Let’s open the dataset: daFakeData.dta and the code file anFakeData_Week03.do
¤ This is a list of everyone else’s numbers and the code that will crunch our SDO numbers
¤ Find your “Spy name” and confirm your values

Find yourself here in this spreadsheet:
▪ Get a look at our data by clicking the “Data Editor” button at the top of the main Stata window

Pet-Spy Data Example: Y0i = ___ Y1i = ___ δi = ___


¨ Write down (or remind yourself of) your numbers in the header above

¨ Whether we can estimate treatment effects using SDO depends entirely on how variation in D is determined.

¨ I have assigned D to each of you in three different ways:


¤ Randomly: Drand = 1 randomly; yields Yrand from the switching equation
¤ Pet ownership: Dpet = 1 for pet owners; yields Ypet from switching equation
¤ A mystery: Dmystery =1 mysteriously; yields Ymystery from the switching equation

What treatment path do you take under each scenario? Write down your “path”: 0 or 1 for each type of D.
What is this for you: Drand =
What is this for you: Dpet =
What is this for you: Dmystery =

¨ Let’s decompose SDO in the data in each case of D:

SDO = E[Y1i–Y0i] + (1−π)(ATT-ATU) + {E[Y0i| Di = 1] – E[Y0i| Di = 0]}
    = ATE + Heterogeneous Treatment Effect Bias + Selection Bias
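The three-part decomposition can be computed directly when both potential outcomes are known, as they are in fake data. The sketch below is a Python illustration with made-up numbers for five hypothetical spies; the function name `decompose_sdo` and the data are assumptions for this example and are not taken from the course do file.

```python
def mean(values):
    return sum(values) / len(values)

def decompose_sdo(y0, y1, d):
    """Return SDO and its three parts, given potential outcomes and a 0/1 treatment."""
    t = [i for i, di in enumerate(d) if di == 1]   # treated indices
    c = [i for i, di in enumerate(d) if di == 0]   # untreated indices
    pi = len(t) / len(d)                           # share treated
    ate = mean([y1[i] - y0[i] for i in range(len(d))])
    att = mean([y1[i] - y0[i] for i in t])
    atu = mean([y1[i] - y0[i] for i in c])
    het_bias = (1 - pi) * (att - atu)
    sel_bias = mean([y0[i] for i in t]) - mean([y0[i] for i in c])
    sdo = mean([y1[i] for i in t]) - mean([y0[i] for i in c])
    return {"SDO": sdo, "ATE": ate, "het_bias": het_bias, "sel_bias": sel_bias}

# Made-up numbers for five hypothetical spies.
y0 = [10, 20, 30, 40, 50]
y1 = [40, 60, 50, 90, 70]
d = [1, 0, 1, 0, 0]

parts = decompose_sdo(y0, y1, d)
print(parts)
```

In any dataset with at least one treated and one untreated person, the returned SDO equals the sum of the other three entries, up to floating-point rounding.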

Pet-Spy Data Example: Y0i = ___ Y1i = ___ δi = ___


¨ Mystery revealed!

Pretend that Y0i is the amount of money you have at the beginning of this lecture.

If you are “treated,” I will leave you with Y1i at the end.

This means that I will adjust your wallet by 𝛿 dollars during this lecture

(e.g., depending on the treatment assignment, I might owe the class a lot or a little).

Pet-Spy Data Example: SDO decomp with Drand


¨ Let’s decompose SDO in each case:
SDO = E[Y1i–Y0i] + (1−π)(ATT-ATU) + {E[Y0i| Di = 1] – E[Y0i| Di = 0]}
    = ATE + Heterogeneous Treatment Effect Bias + Selection Bias

___ = ___ + ___ + ___

Go to the do file anFakeData_Week03.do and find these numbers for Drand

Sample estimate of: ATE = E[Y1i – Y0i] = E[δi]
Sample estimate of: SDO = E[Y1i| Di = 1] – E[Y0i| Di = 0]
Sample estimate of: π
Sample estimate of: ATT-ATU
Sample estimate of: Selection bias = E[Y0i| Di = 1] – E[Y0i| Di = 0]
Sample estimate of: (1−π)(ATT-ATU)

Pet-Spy Data Example: SDO decomp with Drand


¨ Let’s decompose SDO in each case:
SDO = E[Y1i–Y0i] + (1−π)(ATT-ATU) + {E[Y0i| Di = 1] – E[Y0i| Di = 0]}
    = ATE + Heterogeneous Treatment Effect Bias + Selection Bias

33.6 = 32.0 - (1-.5)2.2 + 2.7
     = 32.0 - 1.1 + 2.7
     = 33.6 !

As N increases:
• the two bias terms (-1.1 and +2.7) will approach 0 (because of randomization)
• and SDO approaches ATE

Pet-Spy Data Example: is Drandi ╨ Y1i, Y0i?


¨ Y0i, Y1i distributions similar given Di
¨ To see this we can get Stata to produce a density plot for Y0i and Y1i in our sample
¤ Command: kdensity
¤ Hint: the “Graphics” drop-down menus in Stata can help you with graphing syntax
¨ Y0i, Y1i distributions in our sample:
[Figure: two kernel density (kdensity) plots over the range 0 to 100. Left panel, “Y0 for Drand=1 and Drand=0”: Y0 | D=0 and Y0 | D=1. Right panel, “Y1 for Drand=1 and Drand=0”: Y1 | D=0 and Y1 | D=1.]

Pet-Spy Data Example: SDO decomp with Dpet


¨ Reminder of SDO decomp with randomization:
33.6 = 32.0 - 1.1 + 2.7

¨ Let’s decompose SDO with assignment based on pet:


SDO = E[Y1i–Y0i] + (1−π)(ATT-ATU) + {E[Y0i| Di = 1] – E[Y0i| Di = 0]}

___ = 32.0 + ___ + ___

Pet-Spy Data Example: SDO decomp with Dpet


¨ Reminder of SDO decomp with randomization:
33.6 = 32.0 - 1.1 + 2.7

¨ Let’s decompose SDO with assignment based on pet:


SDO = E[Y1i–Y0i] + (1−π)(ATT-ATU) + {E[Y0i| Di = 1] – E[Y0i| Di = 0]}

30.7 = 32.0 - (1-.43)4.9 + 1.5

= 32.0 - 2.8 + 1.5

= 30.7 !

Pet-Spy Data Example: seems that Dpeti ╨ Y1i ,Y0i


¨ Y0i, Y1i distributions similar given Di

[Figure: two kernel density (kdensity) plots over the range 0 to 100. Left panel, “Y0 for Dpet=1 and Dpet=0”: Y0 | D=0 and Y0 | D=1, with the values 40.8 and 42.3 marked on the Y0 axis. Right panel, “Y1 for Dpet=1 and Dpet=0”: Y1 | D=0 and Y1 | D=1, with the values 71.5 and 75 marked on the Y1 axis.]

Pet-Spy Data Example: Y0i = ___ Y1i = ___ δi = ___


¨ Mystery:

Pretend that Y0i is the amount of money you have at the beginning of this lecture.

If you are “treated,” I will leave you with Y1i at the end.

This means that I will adjust your wallet by 𝛿 dollars during this lecture

(e.g., I will owe the class ATE*N dollars).



Pet-Spy Data Example: SDO decomp with Dmystery


¨ Reminder of SDO decomp with randomization:
33.6 = 32.0 - 1.1 + 2.7
¨ Reminder of SDO decomp with pet:
30.7 = 32.0 - 2.8 + 1.5

¨ Let’s decompose SDO with assignment based on the mystery D:


SDO = E[Y1i–Y0i] + (1−π)(ATT-ATU) + {E[Y0i| Di = 1] – E[Y0i| Di = 0]}

___ = 32.0 + ___ + ___

Pet-Spy Data Example: SDO decomp with Dmystery


¨ Reminder of SDO decomp with randomization:
33.6 = 32.0 - 1.1 + 2.7
¨ Reminder of SDO decomp with pet:
30.7 = 32.0 - 2.8 + 1.5

¨ Let’s decompose SDO with assignment based on the mystery D:


SDO = E[Y1i–Y0i] + (1−π)(ATT-ATU) + {E[Y0i| Di = 1] – E[Y0i| Di = 0]}

- 5.9 = 32.0 +(1-.87)59.4 - 45.6

= 32.0 + 7.7 - 45.6

= - 5.9 !
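The three reported decompositions can be checked with a few lines of arithmetic. The sketch below reads the figures off the slides; it treats the minus sign in front of (1-π)·2.2 and (1-π)·4.9 as meaning that ATT-ATU is negative in those two cases, which is an interpretation of the slide layout.

```python
# (SDO, ATE, pi, ATT - ATU, selection bias) as read from the slides.
cases = {
    "Drand": (33.6, 32.0, 0.50, -2.2, 2.7),
    "Dpet": (30.7, 32.0, 0.43, -4.9, 1.5),
    "Dmystery": (-5.9, 32.0, 0.87, 59.4, -45.6),
}

for name, (sdo, ate, pi, att_minus_atu, sel_bias) in cases.items():
    rebuilt = ate + (1 - pi) * att_minus_atu + sel_bias
    # Slide figures are rounded to one decimal, so compare after rounding.
    print(name, "rebuilt SDO:", round(rebuilt, 1), "reported SDO:", sdo)
```

The ATE is the same in all three cases because it does not depend on who is assigned to treatment; only the two bias terms change with the assignment rule.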

Pet-Spy Data Example: seems that Dmysteryi is not ╨ Y1i ,Y0i

¨ Y0i, Y1i distributions are not similar given Di

[Figure: two kernel density (kdensity) plots over the range 0 to 100. Left panel, “Y0 for Dmystery=1 and Dmystery=0”: Y0 | D=0 and Y0 | D=1. Right panel, “Y1 for Dmystery=1 and Dmystery=0”: Y1 | D=0 and Y1 | D=1.]

Pet-Spy Data Example:


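¨ What is happening?

[Diagrams relating D and Y under each assignment rule]
¤ D is randomized: diagram with D, Y and X’s, U’s
¤ D is based on pet: diagram with D, Y and Pet
¤ D is 1 for positive δ: diagram with D, Y and preference for $’s

Under randomization, “pet” would be uncorrelated with both Y and D (diagram with D, Y and pet).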
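The mystery rule (D is 1 for positive δ) can be mimicked in a simulation to see why it breaks independence. The sketch below assumes a hypothetical class in which individual effects range from -50 to 100; the distributions and the seed are illustrative choices, not the class data.

```python
import random

random.seed(7)  # arbitrary seed so the run is reproducible

def mean(values):
    return sum(values) / len(values)

n = 10_000
# Hypothetical individual effects: uniform on -50..100, so some people lose money.
delta = [random.uniform(-50, 100) for _ in range(n)]
# Mystery-style rule: treated only if the individual effect is positive.
d = [1 if g > 0 else 0 for g in delta]

att = mean([g for g, di in zip(delta, d) if di == 1])
atu = mean([g for g, di in zip(delta, d) if di == 0])
share_treated = mean(d)

print("share treated:", round(share_treated, 2))
print("ATT:", round(att, 1), "ATU:", round(atu, 1))
```

Because treatment is assigned on the effect itself, the treated all gain and the untreated all would have lost, so ATT and ATU differ by construction and the heterogeneous treatment effect bias cannot be zero.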

Example 1 again: MM Chapter 1 - NHIS versus RAND HIE

¨ Y is health and D is health insurance


¤ Both are measured in two different data sets
¤ But variation in D manifests in two different ways: as an observed choice versus randomized

¨ Selection bias: individuals are different at baseline in a way that is related to potential outcomes.

¨ Consider the observable characteristics measured in both data sets and presented in Tables 1.1 and 1.2, and take, for example, years of education

Practical/economic significance

Statistical significance

Question 1:

¨ Consider the NHIS data. Looking at the insured versus the uninsured group, there
are large differences, on average, in the observable characteristics measured in
Panel B (and these differences are large in both economic and statistical
significance).

What if these were practically/economically insignificant AND statistically insignificant (i.e., a precise zero)?

Question 1:

¨ Consider the NHIS data. Looking at the insured versus the uninsured group, there
are large differences, on average, in the observable characteristics measured in
Panel B (and these differences are large in both economic and statistical
significance).
¨ Suppose instead, we had the opposite: the estimated differences in the observable
characteristics in Panel B across treatment group are very small in both economic
AND statistical significance (i.e., we observe a precise zero effect), and so we have
good evidence that these characteristics are not differentially related to insurance
status, in truth.
¨ In this case, does this mean that selection bias is not a problem for the SDO
comparison of health status for insured versus non-insured individuals? Please
explain why.
¨ Consider this question in groups of 2 or 3.
¨ See page 11 in MM
