AppliedMetrics 2023 Week01 Lecture
Structure of the Course
Weekly lectures
Weekly recitations
Recordings for lecture & section, not office hours (OH)
OH: Aluma, by appointment
Statistical software &
Textbooks:
Nick Huntington-Klein, The Effect
Scott Cunningham, Causal Inference: The Mixtape
Wooldridge, Introductory Econometrics: A Modern Approach
Heiss, Using R for Introductory Econometrics
Grade Breakdown
Homework assignments 15%
• 6 total (drop lowest 1)
• Can work in groups up to 3, must write-up individually
Exam 85%
• Open note up to 20 printed pages front & back +
formula sheet + basic calculator
• Must earn at least 60 (out of 100) to pass the course
Knowledge assessment through guided
case studies
Given a dataset and a research question:
• What method(s) can you use to answer the question
using the data?
• What are the potential pitfalls? If they can be fixed, how
would you do so?
• Implementation:
• HW: actual datasets & RStudio to generate results
• Exam: dataset description & pen and paper;
describe what you would do and how
Types of datasets
• Cross-sectional:
A sample taken at a fixed point in time, e.g. demographic
characteristics and wage of every student in the course
• Time series:
A single unit measured over time, e.g. the per capita
GDP each year
• Panel (longitudinal):
The same cross section sampled over time, e.g. your HW
grades during the course of the semester
• Repeated cross-section:
Different cross-sections sampled over time, e.g. cohorts’
Metrics course grades from each year
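The four dataset types above can be told apart by which units appear in which periods. The following is a minimal sketch with made-up rows; the field names (`student`, `hw`, `year`, `grade`) and the helper `is_balanced_panel` are hypothetical and only mirror the examples on the slide.

```python
# Toy illustration of the four dataset types (all values are made up).

# Cross-sectional: many units, one point in time.
cross_section = [
    {"student": "s1", "age": 24, "wage": 18.0},
    {"student": "s2", "age": 27, "wage": 22.5},
]

# Time series: one unit, many points in time.
time_series = [
    {"year": 2020, "gdp_per_capita": 100.0},
    {"year": 2021, "gdp_per_capita": 103.0},
]

# Panel: the same units observed in every period.
panel = [
    {"student": "s1", "hw": 1, "grade": 90},
    {"student": "s1", "hw": 2, "grade": 85},
    {"student": "s2", "hw": 1, "grade": 70},
    {"student": "s2", "hw": 2, "grade": 80},
]

# Repeated cross-section: different units sampled in each period.
repeated_cross_section = [
    {"student": "a1", "year": 2022, "grade": 88},
    {"student": "a2", "year": 2022, "grade": 75},
    {"student": "b1", "year": 2023, "grade": 92},
    {"student": "b2", "year": 2023, "grade": 81},
]


def units_by_period(rows, unit_key, period_key):
    """Map each period to the set of units observed in it."""
    out = {}
    for row in rows:
        out.setdefault(row[period_key], set()).add(row[unit_key])
    return out


def is_balanced_panel(rows, unit_key, period_key):
    """True when every period contains exactly the same set of units."""
    sets = list(units_by_period(rows, unit_key, period_key).values())
    return all(s == sets[0] for s in sets)


print(is_balanced_panel(panel, "student", "hw"))
print(is_balanced_panel(repeated_cross_section, "student", "year"))
```

The check separates the last two cases: a panel follows the same students across homework assignments, while a repeated cross-section draws a new cohort each year.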
Moodle course website
• Lecture notes
• Relevant R code and data from the lecture
• Class recordings on Panopto
• Homework assignments & solutions
Data from the lab
Andreoni & Miller (2002)
The dictator game & social preferences
elicitation
• Social preferences (altruism) have historically been
measured through the use of the dictator game
Andreoni & Miller (2002) instructions
• You are asked to divide a set of tokens between you and
another subject in the room. You and the other subject
will be paired randomly and you will not be told each
other’s identity.
• As you divide the tokens, you and the other subject will
earn points. Every point that subjects earn will be worth
10 cents. For example, if you earn 58 points you will
make $5.80 in the experiment.
• You can keep all the tokens, keep some and pass some,
or pass all the tokens. The total number of tokens you
hold plus the number of tokens you pass must equal the
total number of tokens
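The payment rule and the allocation constraint in the instructions can be written out as a small sketch. The function names are hypothetical; the 10 cents per point and the requirement that held plus passed tokens equal the total come from the instructions above.

```python
CENTS_PER_POINT = 10  # from the instructions: every point is worth 10 cents


def earnings_in_dollars(points):
    """Convert points earned into dollars."""
    return points * CENTS_PER_POINT / 100


def split_is_valid(hold, passed, total_tokens):
    """Tokens held plus tokens passed must equal the total number of tokens."""
    return hold >= 0 and passed >= 0 and hold + passed == total_tokens


print(earnings_in_dollars(58))     # the example from the instructions
print(split_is_valid(40, 20, 60))  # keep some, pass some
print(split_is_valid(40, 30, 60))  # does not add up to the total
```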
Activity (part 1)
• How many tokens would you keep for yourself?
• Pick a number between 0 and 60 and write it down
and pass it up to me (do not write your name!)
[Question (A) as displayed on the slide]
• Get up and pick one of the numbers from the pile and
stand near where the number is.
Looking at the data: subject heterogeneity
(176 subjects in sample)
[Figure: histogram of tokens kept; x-axis: self (tokens), y-axis: number of subjects]
Looking at the data: subject heterogeneity
From the data:
• Subjects who kept everything for themselves:
72/176 = 40.9%
• Subjects who divided equally:
59/176 = 33.5%
• Subjects who gave more than they kept:
5/176 = 2.8%
[Figure: histogram of tokens kept; x-axis: self (tokens), y-axis: number of subjects]
Mean: 45.4
Median: 50
Std.: 14.3
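As a quick check of the shares reported for question (A), the percentages can be recomputed from the counts on the slide. The dictionary keys are just labels chosen for this sketch.

```python
n_subjects = 176  # subjects in the sample

# Counts taken from the slide.
counts = {
    "kept everything": 72,
    "divided equally": 59,
    "gave more than they kept": 5,
}

# Share of the sample in each group, in percent, rounded to one decimal.
shares = {label: round(100 * count / n_subjects, 1) for label, count in counts.items()}

# Subjects who fall into none of the three groups.
n_other = n_subjects - sum(counts.values())

for label, share in shares.items():
    print(f"{label}: {share}%")
print(f"all other allocations: {n_other} subjects")
```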
Activity 1 (part 2)
• What if the behavior of people in this game depends on
the relative payout to self vs. other?
• The design choice of 1 point for every token for both the
dictator and their partner may be significant
[Figures: questions (B) and (C)]
Relative points to self
[Figures: questions (B) and (C)]
First look at data: (B) high payout to self
[Figure: histogram for question (B); x-axis: self (points), y-axis: number of subjects]
First look at data: (C) high payout to other
[Figure: histogram for question (C); x-axis: self (points), y-axis: number of subjects]
Looking at the % of points to self
[Figure: % of points to self by question, (A), (B), and (C); y-axis: number of subjects]
Data description:
• - tokens to self (an integer between 0 and 60)
• - points per token to self (a number 1 or 2)
• - points per token to other (a number 1 or 2)
• - points per token, self divided by other
(a number 0.5, 1, or 2)
• - points to self out of total possible points
(a number 0 to 1)
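The last two variables in the data description are derived from the first three. A minimal sketch of that derivation follows. The variable names are hypothetical, since the original names are not shown here. The sketch also assumes that "total possible points" means the points earned by self plus the points earned by other; if the course defines it differently, the `share_self` line changes accordingly.

```python
TOTAL_TOKENS = 60  # tokens to divide in questions (A), (B), and (C)


def derive(tokens_self, ppt_self, ppt_other, total_tokens=TOTAL_TOKENS):
    """Derive the price ratio and the share of points to self for one decision.

    tokens_self: tokens kept (integer between 0 and total_tokens)
    ppt_self:    points per token to self (1 or 2)
    ppt_other:   points per token to other (1 or 2)
    """
    tokens_other = total_tokens - tokens_self
    points_self = tokens_self * ppt_self
    points_other = tokens_other * ppt_other
    return {
        # points per token, self divided by other
        "price_ratio": ppt_self / ppt_other,
        # ASSUMPTION: total possible points = points to self + points to other
        "share_self": points_self / (points_self + points_other),
    }


print(derive(30, 1, 1))  # equal token split, equal payout
print(derive(30, 2, 1))  # equal token split, high payout to self
print(derive(60, 1, 2))  # keep everything, high payout to other
```

Under this assumption, an equal split of tokens gives unequal shares of points whenever the payouts differ, which is one reason to look at points rather than tokens.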
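% of points to self vs. price ratio to self
[Figure: % of points to self plotted against price ratio to self; questions ordered (C), (A), (B) along the horizontal axis]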
% of points to self vs. price ratio to self
with jitter
[Figure: the same plot with jitter added; questions ordered (C), (A), (B) along the horizontal axis]
Activity 1 (part 4)
Does changing the relative payout for tokens change the
dictator’s behavior?
Data description:
• - tokens to self (an integer between 0 and 60)
• - points per token to self (a number 1 or 2)
• - points per token to other (a number 1 or 2)
• - points per token, self divided by other
(a number 0.5, 1, or 2)
• - points to self out of total possible points
(a number 0 to 1)
A first causal diagram
Does changing the relative payout for tokens change the
dictator’s behavior?
Relative payout for tokens → Behavior
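• The independent variable is the price ratio (points per token, self divided by other)
• The outcome variable is the share of points to self (points to self out of total possible points)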
• Why not use ?
• We want to test if variation in the independent variable
causes variation in the outcome variable
Regression to answer research question:
Option 1
Does changing the relative payout for tokens change the
dictator’s behavior?
• How do we interpret the estimated intercept and slope?
• Positive slope: when the payout for tokens to self is higher
than the payout for tokens to other, the percent
of points the dictator allocates to themselves is also
higher
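One way to write the Option 1 regression is sketched below. The symbols are assumptions made for this sketch, not the lecture's own notation: share_i is the share of points to self and ratio_i is the price ratio (points per token, self divided by other) for observation i.

```latex
\mathit{share}_i = \beta_0 + \beta_1\,\mathit{ratio}_i + \varepsilon_i
```

In this form, the slope is the change in the expected share of points to self when the price ratio rises by one unit, and the intercept is the expected share at a price ratio of zero, which lies outside the observed values 0.5, 1, and 2.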
Regression to answer research question:
Option 1
• Interpret the estimated coefficients
• Test the null hypothesis that the slope coefficient is zero
Simple hypothesis test review
• To conduct a simple hypothesis test we use a t-test
• Under the classical assumptions A1 – A5:
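For the null hypothesis that the slope coefficient is zero, the t-statistic is the coefficient estimate divided by its standard error
There are 528 – 2 = 526 d.f., so the two-sided t-critical value at the 5% significance level is 1.96
Since |t| exceeds the t-critical value, we reject the null at 5% significance
(can also see this by comparing the p-value to the significance level)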
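The steps of the t-test can be sketched in code. The data below are synthetic (generated with an arbitrary slope and noise level), so the numbers are not the lecture's results; `slope_t_stat` is a hypothetical helper that uses the classical standard error formula for a simple regression. The critical value comes from the normal approximation, which agrees with the t distribution to two decimals at 526 d.f.

```python
import math
import random
from statistics import NormalDist


def slope_t_stat(x, y):
    """OLS slope, its classical standard error, and the t-statistic for H0: slope = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # residual variance uses n - 2 d.f.
    return slope, se, slope / se


# Synthetic data: 176 subjects x 3 price ratios = 528 observations.
random.seed(1)
x = [0.5, 1, 2] * 176
y = [0.6 + 0.1 * xi + random.gauss(0, 0.2) for xi in x]

slope, se, t_stat = slope_t_stat(x, y)

# Two-sided 5% critical value, large-sample (normal) approximation.
t_critical = NormalDist().inv_cdf(0.975)
reject = abs(t_stat) > t_critical

print(f"slope = {slope:.3f}, se = {se:.3f}, t = {t_stat:.2f}")
print(f"critical value = {t_critical:.2f}, reject H0: {reject}")
```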
price ratio to self takes on only 3 values
[Figure: % of points to self at the three price ratios (0.5, 1, and 2), corresponding to questions (C), (A), (B)]
Regression to answer research question:
Option 2: use dummy variables
Does changing the relative payout for tokens change the
dictator’s behavior?
Treat the price ratio as a categorical variable
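With a categorical regressor, the regression on dummy variables reproduces group means: the intercept is the mean outcome in the baseline category, and each dummy coefficient is the difference between that category's mean and the baseline mean. The sketch below uses made-up data and a hypothetical helper `dummy_coefficients`. The choice of price ratio 1 as the baseline is an assumption, since the baseline used in the lecture is not shown here.

```python
from collections import defaultdict


def dummy_coefficients(categories, y, baseline):
    """OLS coefficients from regressing y on dummies for each non-baseline category.

    With one dummy per non-baseline category, the OLS solution is:
    intercept = baseline mean, dummy coefficient = category mean - baseline mean.
    """
    groups = defaultdict(list)
    for c, yi in zip(categories, y):
        groups[c].append(yi)
    means = {c: sum(v) / len(v) for c, v in groups.items()}
    coefs = {"intercept": means[baseline]}
    for c in sorted(means):
        if c != baseline:
            coefs[f"dummy_{c}"] = means[c] - means[baseline]
    return coefs


# Made-up shares of points to self at each price ratio.
price_ratio = [0.5, 0.5, 1, 1, 2, 2]
share_self = [0.60, 0.70, 0.70, 0.80, 0.85, 0.95]

coefs = dummy_coefficients(price_ratio, share_self, baseline=1)
for name, value in coefs.items():
    print(f"{name}: {value:.2f}")
```

Unlike Option 1, this specification does not force the effect of moving from 0.5 to 1 to be proportional to the effect of moving from 1 to 2.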
Regression to answer research question:
Option 2: use dummy variables
• Interpret the coefficients on the two dummy variables
• Test the null hypothesis that both dummy coefficients are zero
Multiple hypothesis testing review
• Use an F-test for the joint hypothesis
• The restricted model is the model with only an intercept
• In this specific case it is the model goodness-of-fit test
• The F-statistic is a function of the residual sum of squares
(RSS) for the unrestricted (original) and restricted models
• It has an F distribution with q and N – M degrees of freedom
• Here q = 2 and N – M is 528 – 3 = 525, so the F-critical value at 5% significance is 3.01
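For reference, the standard form of the F-statistic described above is sketched here. The subscripts R and U (restricted and unrestricted) are notation chosen for this sketch; q, N, and M are as on the slide.

```latex
F = \frac{(\mathrm{RSS}_R - \mathrm{RSS}_U)/q}{\mathrm{RSS}_U/(N - M)}
\;\sim\; F_{q,\,N-M} \quad \text{under } H_0
```

The null is rejected when F exceeds the critical value of the F distribution with q and N – M degrees of freedom at the chosen significance level.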
The sampled observations are not all
independent
• Recall that the data come from 176 subjects, each of whom
answered all three questions
• So we have 528 observations, but each subject appears
three times in our dataset
The sampled observations are not all
independent
Each subject appears three times in our dataset
What to do in panel data contexts
• We will discuss a few options later in the course
• Since we have so many observations, one quick and dirty
solution is to randomly sample only one observation
per individual to use in our analysis
• The new dataset has the same variables, but there are
only 176 observations
Testing with a subset of the data
But which subset to use? ***
• What if we happened to draw a random sample where
the null hypothesis is rejected?
• We can do this same process 1000 times, each time
randomly selecting 1 out of the 3 observations for each
individual in our dataset.
• How many possibilities are there?
• For each replication we estimate the model, test the null
hypothesis and calculate the F-statistic
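The replication procedure described above can be sketched as follows. Everything here is illustrative: the panel is synthetic, the names are hypothetical, and the resulting rejection share is not the lecture's result. The F-statistic is computed from group means, which is equivalent to the dummy-variable regression. The closed-form critical value in `f_critical_2` is valid only when the numerator has 2 degrees of freedom, as it does here.

```python
import random
from collections import defaultdict

random.seed(0)
PRICE_RATIOS = [0.5, 1, 2]
N_SUBJECTS = 176

# Synthetic panel: each subject has one observation per price ratio,
# plus a subject-specific taste term that makes the three rows dependent.
panel = {}
for s in range(N_SUBJECTS):
    taste = random.gauss(0, 0.15)
    panel[s] = [(r, 0.7 + 0.05 * r + taste + random.gauss(0, 0.1)) for r in PRICE_RATIOS]


def draw_subsample(panel):
    """Randomly keep 1 of the 3 observations for each subject."""
    picks = [random.choice(obs) for obs in panel.values()]
    return [c for c, _ in picks], [v for _, v in picks]


def f_stat_group_means(categories, y):
    """F-statistic for H0: all category means are equal (all dummy coefficients zero)."""
    n = len(y)
    groups = defaultdict(list)
    for c, yi in zip(categories, y):
        groups[c].append(yi)
    m = len(groups)  # parameters in the unrestricted model
    q = m - 1        # number of restrictions
    grand_mean = sum(y) / n
    rss_r = sum((yi - grand_mean) ** 2 for yi in y)  # intercept-only model
    rss_u = sum((yi - sum(v) / len(v)) ** 2 for v in groups.values() for yi in v)
    return ((rss_r - rss_u) / q) / (rss_u / (n - m))


def f_critical_2(d2, alpha=0.05):
    """Upper-alpha quantile of F(2, d2); this closed form holds only for 2 numerator d.f."""
    return (d2 / 2) * (alpha ** (-2 / d2) - 1)


n_reps = 1000
crit = f_critical_2(N_SUBJECTS - 3)
f_stats = [f_stat_group_means(*draw_subsample(panel)) for _ in range(n_reps)]
share_rejecting = sum(f > crit for f in f_stats) / n_reps

# Each subject contributes 1 of 3 rows, so there are 3^176 distinct subsamples.
n_possible = 3 ** N_SUBJECTS

print(f"F-critical value (2, {N_SUBJECTS - 3} d.f.): {crit:.2f}")
print(f"share of subsamples rejecting H0: {share_rejecting:.3f}")
print(f"possible subsamples: about 10^{len(str(n_possible)) - 1}")
```

Because the number of possible subsamples is astronomically large, 1,000 random draws are used to approximate the distribution of the test statistic rather than enumerating every subsample.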
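F-stat from 1,000 subsamples of the data
[Figure: distribution of the F-statistic across the 1,000 subsamples, annotated with the F-critical value and the share of subsamples (99.9%) relative to it]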
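1,000 subsamples of the data
[Figure: distribution of the t-statistic across the 1,000 subsamples, annotated with the t-critical value (1.974) and the share of subsamples (49.4%) relative to it]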
Full set of rounds Andreoni & Miller (2002)
What other questions can we answer with
this data?
• What is the exogenous variation in the experiment?
• What is the endogenous variation?
Data description:
• - tokens to self (an integer between 0 and 60)
• - points per token to self (a number 1 or 2)
• - points per token to other (a number 1 or 2)
• - points per token, self divided by other
(a number 0.5, 1, or 2)
• - points to self out of total possible points
(a number 0 to 1)
• - total tokens to allocate (40, 60, 75, 80, or 100)
Causal diagrams generally
Does variation in X (exogenous) cause a change in Y
(endogenous)?
X → Y
Is the effect causal?
Does changing the relative payout for tokens change the
dictator’s behavior?
How does a change in income affect
the dictator's behavior?
Using the full dataset and controlling for
price
How does a change in income affect the dictator's
behavior?
Use the full dataset and control for price
Subsampling
• For rounds 7, 8, and 11 the price ratio is 1
Subsampling
• Treating income as a categorical variable…