Week03 LectureSlidesECO372
¨ Student Hours:
¤ Visit us in dedicated ECO372 student hours!
¤ Schedule:
n Mon: 2:30-3:30 GE313 (in Dept.)
n Wed: 2:30-3:30 GE313 (in Dept.)
n Thu: 3:00-4:00 SS2120 (after last lecture)
¤ As always: we are available in the Thursday and Friday meetings for questions
¨ Learning Stata:
¤ Learning a new computing language/application is not a linear process with a narrative attached
¤ It’s more like learning a language where immersion is the key
¤ Continue to work through code over the next few weeks, and these processes will start to become easier and easier
ECO372
¨ Questions:
¤ Example 1 again: answer questions about the health insurance data from MM Chapter 1
Recap SDO:
¨ Selection bias = the mean potential outcome without treatment (Y0i) among those who
received the treatment, minus the same mean among those who did not receive it:
E[Y0i| Di = 1] – E[Y0i| Di = 0].
¨ It’s a description of the differences between the two groups if there had never been
a treatment in the first place.
MM Chapter 1 Example of SDO: Health Insurance
Notation:
¨ Y0i : health level of person i if they don’t insure
¨ Y1i : health level of person i if they insure
                               Observed
Khuzdar chooses to insure      Y1,Khuzdar
Maria chooses not to insure    Y0,Maria
MM example with healthcare
Following MM, assume for now the effect of healthcare is δ, same for everyone
(no Heterogeneity):
Y1i − Y0i = δ
Naïve answer (SDO): Compare health of people who insure, to health of people who
don’t:
Y1,Khuzdar − Y0,Maria
Positive means insurance coverage improves health
MM example with healthcare
                               Observed       Unobserved (counterfactual)
Khuzdar chooses to insure      Y1,Khuzdar     Y0,Khuzdar
Maria chooses not to insure    Y0,Maria       Y1,Maria
MM example with healthcare
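Y1,Khuzdar − Y0,Maria = (Y1,Khuzdar − Y0,Khuzdar) + (Y0,Khuzdar − Y0,Maria)
Causal effect: Y1,Khuzdar − Y0,Khuzdar, the potential improvement in Khuzdar’s health if he takes healthcare
Selection bias: Y0,Khuzdar − Y0,Maria, the difference in potential health between Khuzdar and Maria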
MM example with healthcare
Rewrite as:
(Avgn[Y1i | Di = 1] − Avgn[Y0i | Di = 1])
+ (Avgn[Y0i | Di = 1] − Avgn[Y0i | Di = 0])
We would like to estimate the expected effect of the treatment using averages in a sample.
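The rewrite above can be checked numerically. The following is a minimal Python sketch on made-up data, not the RAND or NHIS data: the population, the constant effect DELTA = 5, and the selection rule (healthier people are more likely to insure) are all assumptions for illustration. Because the data are simulated, both potential outcomes are visible, so each piece of the decomposition can be computed directly.

```python
import random
from statistics import mean

random.seed(372)

DELTA = 5.0  # assumed constant treatment effect (hypothetical value)

# Hypothetical population: people with higher Y0 are more likely to insure.
people = []
for _ in range(10_000):
    y0 = random.gauss(50, 10)                       # health without insurance
    y1 = y0 + DELTA                                 # health with insurance
    d = 1 if y0 + random.gauss(0, 10) > 50 else 0   # self-selected treatment
    people.append((y0, y1, d))

avg_y1_d1 = mean(y1 for y0, y1, d in people if d == 1)
avg_y0_d1 = mean(y0 for y0, y1, d in people if d == 1)  # unobserved in real data
avg_y0_d0 = mean(y0 for y0, y1, d in people if d == 0)

sdo = avg_y1_d1 - avg_y0_d0             # the comparison we can actually make
causal_effect = avg_y1_d1 - avg_y0_d1   # equals DELTA under a constant effect
selection_bias = avg_y0_d1 - avg_y0_d0  # difference in Y0 between the groups

print(f"SDO            = {sdo:.2f}")
print(f"Causal effect  = {causal_effect:.2f}")
print(f"Selection bias = {selection_bias:.2f}")
```

In this made-up setting the SDO overstates the effect, because the insured group would have been healthier even without insurance.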
Does a larger sample size help in the three-part decomposition?
¨ If these differences exist in the population, drawing larger and larger samples will not drive them
to zero
¨ Random assignment of Di zeros the selection bias and heterogeneous treatment effect bias because it makes Di
independent of potential outcomes (Di ╨ {Y0i, Y1i}). (Conditional Independence)
¨ Therefore, the mean potential outcomes for Y1i and Y0i are the same (in the population) for either the treatment or the
control group:
E[Y1i| Di = 1] – E[Y1i| Di = 0] = 0
E[Y0i| Di = 1] – E[Y0i| Di = 0] = 0.
¨ This kind of randomization of the treatment assignment would eliminate both the selection bias and the heterogeneous
treatment effect bias.
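Both points can be illustrated with a small simulation. This is a sketch under stated assumptions: the distribution of Y0, the self-selection rule, and the sample sizes are invented for illustration. It compares the sample estimate of selection bias, E[Y0i| Di = 1] – E[Y0i| Di = 0], when people choose their own treatment and when a coin flip assigns it.

```python
import random
from statistics import mean

def selection_bias(n, randomized, rng):
    """Sample estimate of E[Y0|D=1] - E[Y0|D=0] for n simulated people."""
    y0_treated, y0_control = [], []
    for _ in range(n):
        y0 = rng.gauss(50, 10)
        if randomized:
            d = rng.random() < 0.5           # coin flip, independent of Y0
        else:
            d = y0 + rng.gauss(0, 10) > 50   # assumed rule: high-Y0 people opt in
        (y0_treated if d else y0_control).append(y0)
    return mean(y0_treated) - mean(y0_control)

rng = random.Random(372)
results = {}
for n in (100, 10_000, 100_000):
    self_selected = selection_bias(n, False, rng)
    coin_flip = selection_bias(n, True, rng)
    results[n] = (self_selected, coin_flip)
    print(f"n = {n:>7}: self-selected {self_selected:6.2f}, randomized {coin_flip:6.2f}")
```

Under self-selection the estimate settles on a nonzero value as n grows. Under random assignment it shrinks toward zero.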
Creating Data!
¨ In this week’s ECO372 workshop activity, we are creating data that we will use to
better understand the role of randomization in the estimation of treatment effects.
¨ One of the hardest things about the potential outcomes framework is imagining
imaginary things (Y0, Y1).
¤ We can get around this by making up fictional data where we know Y0, Y1
¤ We can see examples of real datasets: e.g., RAND HEI and NHIS
¤ Today we’re going to take a third approach: creating our own data, which will be about ourselves.
This can help us in imagining alternative paths because the alternative paths are ones that exist for
ourselves.
Creating Data!
¨ Creating data that we will use to better understand the role of randomization in the
estimation of treatment effects.
¨ Context: our general question is whether multi-tasking is a good thing for productivity.
¤ More explicitly, we are going to compare productivity over two different work methods:
n single-tasking (focusing on one task at a time) versus
n multi-tasking (switching back and forth between tasks).
¤ Here, our outcome, Y, will be a measure of productivity: time to complete a set of tasks correctly, and
our treatment, D, is task-switching versus focused work
¨ We will collect anonymized data, and we will use this data to:
¤ (a) understand the theory of treatment effect estimation and
¤ (b) gain more skill in taking a spreadsheet of data to Stata for analysis.
¨ Instructions:
¤ Complete each task with an aim to do it as ACCURATELY and as QUICKLY as possible.
¤ Completion of this exercise counts for participation, but the resulting data will be anonymous.
¨ Please fill out the following form to get a baseline for our dataset.
https://forms.office.com/r/1AzQEaR84P
Step 2: multi-tasking
¨ Each of you has been assigned to a group, and I will email you a link to a set of
tasks in the next few seconds.
¤ About half of you will be doing one thing and the other half another thing
¨ Instructions:
¤ Again, complete each task with an aim to do it as ACCURATELY and as QUICKLY as possible.
¤ This time we will be accessing the link via email.
n Do not share your link with others. We want to maintain the correct treatment group assignment.
¨ Sending emails now. Complete the tasks and meet back here in 10 minutes.
What Happened?
¤ Some of you probably cheated a little by not following the instructions. Oh well, hopefully we
measured that as well.
¨ The question will be whether it takes longer to type the same series of characters in
the focused survey or in the task switching survey, which cycles back and forth
between both sequences.
Now what?
¨ I’ll collect and clean the data, anonymize it and send it to you.
¨ Questions:
¤ Can we estimate the simple difference in completion times between the work methods: task
switching vs focused?
n I hope so: we will see time to completion, whether the task was performed correctly, and which type of
work method was used.
n You need to know how to clean data in Stata and run a regression, so let’s try.
¤ Take a few minutes to share your survey type with those around you.
¤ Can we estimate the time effect (causal effect) of task switching versus focused work within this
context?
n What do you need to know?
¤ We’ll have information on mother tongue and type of keyboard. Do you think these will be related to
speed? Will they be related to work method: task switching vs focused?
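The link between "run a regression" and the simple difference in completion times can be sketched as follows. This is a minimal Python illustration with invented completion times, not the class data. With a single 0/1 regressor, the OLS slope equals the difference in group means, which is why a regression of completion time on the treatment dummy (in Stata, a `regress` call on the cleaned variables) reports the SDO.

```python
from statistics import mean

# Hypothetical completion times in seconds (made-up numbers, not the class data).
times = [41.0, 38.5, 45.2, 39.9, 52.3, 49.8, 55.1, 47.6]
task_switch = [0, 0, 0, 0, 1, 1, 1, 1]  # D: 1 = task switching, 0 = focused

treated = [t for t, d in zip(times, task_switch) if d == 1]
control = [t for t, d in zip(times, task_switch) if d == 0]
sdo = mean(treated) - mean(control)      # simple difference in mean outcomes

# OLS of time on D with an intercept: slope = cov(D, Y) / var(D).
d_bar, y_bar = mean(task_switch), mean(times)
cov_dy = sum((d - d_bar) * (y - y_bar) for d, y in zip(task_switch, times))
var_d = sum((d - d_bar) ** 2 for d in task_switch)
slope = cov_dy / var_d
intercept = y_bar - slope * d_bar        # equals the control-group mean

print(f"SDO = {sdo:.2f}, OLS slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Whether that slope is a causal effect is a separate question. It depends on how D was assigned, not on the arithmetic.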
¨ Let’s open the dataset: daFakeData.dta and the code file anFakeData_Week03.do
¤ This is a list of everyone else’s numbers and the code that will crunch our SDO numbers
¤ Find your “Spy name” and confirm your values
¨ Whether we can estimate treatment effects using SDO depends entirely on how variation in D is determined.
What treatment path do you take under each scenario? Write down your “path”.
What is this for you: Drand = 0 or 1 for each
What is this for you: Dpet = type of D
Pretend that Y0i is the amount of money you have at the beginning of this lecture.
If you are “treated,” I will leave you with Y1i at the end.
This means that I will adjust your wallet by 𝛿 dollars during this lecture
(e.g., depending on the treatment assignment, I might owe the class a lot or a little).
SDO = ATE + selection bias + heterogeneous treatment effect bias
Go to the do file anFakeData_Week03.do and find these numbers for Drand.
Sample estimate of selection bias: E[Y0i| Di = 1] – E[Y0i| Di = 0]
Sample estimate of heterogeneous treatment effect bias: (1 – π)(ATT – ATU)
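These pieces can be computed in a simulated sample where Y0 and Y1 are both known. The following is a minimal Python sketch, assuming made-up data with person-specific effects and a selection rule in which people with larger gains (and higher Y0) are more likely to take the treatment. Here π is the treated share of the sample. None of these numbers come from daFakeData.dta.

```python
import random
from statistics import mean

rng = random.Random(372)

# Made-up sample: each person has their own effect delta_i = Y1 - Y0.
rows = []
for _ in range(5_000):
    y0 = rng.gauss(50, 10)
    delta_i = rng.gauss(5, 3)
    y1 = y0 + delta_i
    # Assumed selection rule: larger gains and higher Y0 make opting in more likely.
    d = 1 if delta_i + 0.2 * (y0 - 50) + rng.gauss(0, 3) > 5 else 0
    rows.append((y0, y1, d))

treated = [(y0, y1) for y0, y1, d in rows if d == 1]
control = [(y0, y1) for y0, y1, d in rows if d == 0]

pi = len(treated) / len(rows)                   # share of the sample treated
att = mean(y1 - y0 for y0, y1 in treated)       # average effect on the treated
atu = mean(y1 - y0 for y0, y1 in control)       # average effect on the untreated
ate = mean(y1 - y0 for y0, y1, _ in rows)       # average effect overall

sdo = mean(y1 for _, y1 in treated) - mean(y0 for y0, _ in control)
selection_bias = mean(y0 for y0, _ in treated) - mean(y0 for y0, _ in control)
het_bias = (1 - pi) * (att - atu)

print(f"SDO = {sdo:.2f}")
print(f"ATE + selection bias + het. bias = {ate + selection_bias + het_bias:.2f}")
```

The two printed numbers agree because the decomposition is an accounting identity in any sample. Only the SDO on the left-hand side is observable in real data.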
[Figure: kernel density plots (kdensity) of Y0 and Y1; horizontal axes from 0 to 100]
SDO = 32.0 + [selection bias] + [heterogeneous treatment effect bias]
SDO = 30.7!
[Figure: kernel density plots (kdensity) of Y0 and Y1; horizontal axes from 0 to 100; marked values: 40.8, 42.3, 71.5, 75]
SDO = 32.0 + [selection bias] + [heterogeneous treatment effect bias]
SDO = –5.9!
[Figure: kernel density plots (kdensity) of Y0 and Y1; horizontal axes from 0 to 100]
¨ Consider the observable characteristics measured in both data sets and presented
in Tables 1.1 and 1.2, and take, for example, years of education
Practical/economic significance
Statistical significance
Practically/economically insignificant AND statistically insignificant?
Question 1:
¨ Consider the NHIS data. Looking at the insured versus the uninsured group, there
are large differences, on average, in the observable characteristics measured in
Panel B (and these differences are large in both economic and statistical
significance).
¨ Suppose instead, we had the opposite: the estimated differences in the observable
characteristics in Panel B across treatment group are very small in both economic
AND statistical significance (i.e., we observe a precise zero effect), and so we have
good evidence that these characteristics are not differentially related to insurance
status, in truth.
¨ In this case, does this mean that selection bias is not a problem for the SDO
comparison of health status for insured versus non-insured individuals? Please
explain why.
¨ Consider this question in groups of 2 or 3.
¨ See page 11 in MM