AppliedMetrics 2023 Week01 Lecture


Applied Econometrics

Dr. Aluma Dembo



1
Structure of the Course
Weekly lectures
Weekly recitations
Recordings for lecture & section, not office hours (OH)
OH with Aluma by appointment
Statistical software &
Textbooks:
Nick Huntington-Klein, The Effect
Scott Cunningham, Causal Inference: The Mixtape
Wooldridge, Introductory Econometrics: A Modern Approach
Heiss, Using R for Introductory Econometrics

2
Grade Breakdown
Homework assignments 15%
• 6 total (drop lowest 1)
• Can work in groups up to 3, must write-up individually

Exam 85%
• Open note up to 20 printed pages front & back +
formula sheet + basic calculator
• Must earn at least 60 (out of 100) to pass the course

3
Knowledge assessment through guided
case studies
Given a dataset and a research question:
• What method(s) can you use to answer the question
using the data?
• What are the potential pitfalls? If they can be fixed, how
would you do so?
• Implementation:
• HW: actual datasets & RStudio → generate results
• Exam: dataset description & pen and paper → describe what you would do and how

You must pass Intro Metrics to take this course!


4
Goal of this course
• This is a unique course
• Upgrade the econometric tools at your disposal with a
focus on causal inference and complex data methods
• An important preparation for next year’s policy seminar
• Even more important as training for your future career in
any sector you end up working in

• The concurrent data sciences course will teach you how to do things in R; this course will improve your understanding of what to do and why

5
Types of datasets
• Cross-sectional:
A sample taken at a fixed point in time, e.g. demographic
characteristics and wage of every student in the course
• Time series:
A single unit measured over time, e.g. the per capita
GDP each year
• Panel (longitudinal):
The same cross section sampled over time, e.g. your HW
grades during the course of the semester
• Repeated cross-section:
Different cross-sections sampled over time, e.g. cohorts’
Metrics course grades from each year
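
The four dataset types differ mainly in which units are observed in which periods. A minimal sketch of the distinction, written in Python for illustration (the course itself works in R); every record, name, and value below is made up:

```python
# Hypothetical toy records; all names and numbers are made up for illustration.

# Cross-sectional: many units, one point in time.
cross_section = [
    {"student": "s1", "age": 24, "wage": 18.0},
    {"student": "s2", "age": 27, "wage": 21.5},
]

# Time series: one unit, many points in time.
time_series = [
    {"year": 2020, "gdp_per_capita": 100.0},
    {"year": 2021, "gdp_per_capita": 104.0},
]

# Panel: the SAME units are observed in every period.
panel = [
    {"student": "s1", "hw": 1, "grade": 80},
    {"student": "s1", "hw": 2, "grade": 85},
    {"student": "s2", "hw": 1, "grade": 70},
    {"student": "s2", "hw": 2, "grade": 90},
]

# Repeated cross-section: DIFFERENT units are sampled in each period.
repeated_cross_section = [
    {"student": "a1", "cohort": 2022, "grade": 78},
    {"student": "a2", "cohort": 2022, "grade": 88},
    {"student": "b1", "cohort": 2023, "grade": 91},
    {"student": "b2", "cohort": 2023, "grade": 67},
]


def units_by_period(rows, unit_key, period_key):
    """Map each period to the set of units observed in it."""
    out = {}
    for row in rows:
        out.setdefault(row[period_key], set()).add(row[unit_key])
    return out


def same_units_every_period(rows, unit_key, period_key):
    """True when every period contains exactly the same set of units."""
    sets = list(units_by_period(rows, unit_key, period_key).values())
    return all(s == sets[0] for s in sets)
```

Under these toy records, `same_units_every_period(panel, "student", "hw")` holds, while the same check on `repeated_cross_section` (keyed by `"cohort"`) does not.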
6
Moodle course website
• Lecture notes
• Relevant R code and data from the lecture
• Class recordings on Panopto
• Homework assignments & solutions

7
Data from the lab
Andreoni & Miller (2002)

8
The dictator game & social preferences
elicitation
• Social preferences (altruism) have historically been
measured through the use of the dictator game

• A dictator is given an allocation of money
• The dictator is paired with another person
• The dictator decides how much of the allocation to keep and how much to give to the other person
• The payouts of the dictator and the partner are determined by the dictator’s decision

9
Andreoni & Miller (2002) instructions
• You are asked to divide a set of tokens between you and
another subject in the room. You and the other subject
will be paired randomly and you will not be told each
other’s identity.
• As you divide the tokens, you and the other subject will
earn points. Every point that subjects earn will be worth
10 cents. For example, if you earn 58 points you will
make $5.80 in the experiment
(A)

• You can keep all the tokens, keep some and pass some,
or pass all the tokens. The total number of tokens you
hold plus the number of tokens you pass must equal the
total number of tokens
10
Activity (part 1)
• How many tokens would you keep for yourself?
• Pick a number between 0 and 60, write it down, and pass it up to me (do not write your name!)
(A)

• Get up and pick one of the numbers from the pile and
stand near where the number is.

11
Looking at the data: subject heterogeneity
(176 subjects in sample)

[Figure: histogram of tokens kept for self. x-axis: self (tokens), 0 to 60; y-axis: number of subjects]
12
Looking at the data: subject heterogeneity
From the data:
• Subjects who kept everything for themselves: 72/176 = 40.9%
• Subjects who divided equally: 59/176 = 33.5%
• Subjects who gave more than they kept: 5/176 = 2.8%

[Figure: histogram of tokens kept for self. x-axis: self (tokens), 0 to 60; y-axis: number of subjects]

Mean: 45.4
Median: 50
Std.: 14.3
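
The shares and summary statistics above could be computed along the following lines. This is only a sketch in Python (the course uses R); the function name `summarize` and the list `toy` are hypothetical, and the toy values are made up rather than taken from the Andreoni & Miller data:

```python
from statistics import mean, median, stdev


def summarize(tokens_kept, total_tokens=60):
    """Shares of the three behavior types plus basic summary statistics."""
    n = len(tokens_kept)
    half = total_tokens / 2
    return {
        # kept all tokens for themselves
        "kept_everything": sum(t == total_tokens for t in tokens_kept) / n,
        # split the tokens 50/50
        "divided_equally": sum(t == half for t in tokens_kept) / n,
        # passed more tokens than they kept
        "gave_more_than_kept": sum(t < half for t in tokens_kept) / n,
        "mean": mean(tokens_kept),
        "median": median(tokens_kept),
        "std": stdev(tokens_kept),  # sample standard deviation (n - 1)
    }


# Made-up tokens-to-self choices for eight hypothetical subjects.
toy = [60, 60, 30, 45, 20, 60, 30, 50]
summary = summarize(toy)

# The slide's percentages are plain ratios out of 176 subjects.
slide_shares = [round(100 * k / 176, 1) for k in (72, 59, 5)]
```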

13
Activity 1 (part 2)
• What if the behavior of people in this game depends on
the relative payout to self vs. other?
• The design choice of 1 point for every token for both the
dictator and their partner may be significant

(B)

(C)

• For a single individual, do you think the choice in (B) will mimic the choice in (C)? Or will it be different? How?

14
Relative points to self
(B)

(C)

• Since tokens in (B) are not worth the same amount as tokens in (C), we will instead look at the points to self
• We can also look at the ratio: points to self out of the total points (points to self + points to other)
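
The conversion from tokens to points and to the share of points can be sketched as follows, in Python for illustration. The function name is hypothetical, and the per-token payouts used in the example (2 and 1 in (B), 1 and 2 in (C)) are an assumption inferred from the slide titles and the data description, since the question images are not reproduced here:

```python
def points_and_share(tokens_self, total_tokens, pts_per_token_self, pts_per_token_other):
    """Convert a token split into points for each side and the share to self."""
    tokens_other = total_tokens - tokens_self
    points_self = tokens_self * pts_per_token_self
    points_other = tokens_other * pts_per_token_other
    # Share of all points earned by the pair that goes to the dictator.
    share_self = points_self / (points_self + points_other)
    return points_self, points_other, share_self


# A hypothetical dictator who keeps 40 of 60 tokens in every question.
q_a = points_and_share(40, 60, 1, 1)  # equal payouts
q_b = points_and_share(40, 60, 2, 1)  # assumed: high payout to self
q_c = points_and_share(40, 60, 1, 2)  # assumed: high payout to other
```

The same token choice yields different points and different shares across the three questions, which is why tokens alone are not comparable between (B) and (C).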

15
First look at data: (B) high payout to self
[Figure: histogram of points to self in question (B). x-axis: self (points), 25 to 125; y-axis: number of subjects]
16
First look at data: (C) high payout to other
[Figure: histogram of points to self in question (C). x-axis: self (points), 0 to 60; y-axis: number of subjects]
17
Looking at the % of points to self
[Figure: histograms of the share of points to self, by question, for (B) and (C). x-axis: points to self/total points, 0 to 1; y-axis: number of subjects]
18
Testing for a treatment effect
Research question:
• Does changing the relative payout for tokens change the
dictator’s behavior?
Data at our disposal from rounds A, B, & C for 176 subjects:
• tokens to self (an integer between 0 and 60)
• points per token to self (a number 1 or 2)
• points per token to other (a number 1 or 2)
• points per token, self divided by other (a number 0.5, 1, or 2)
• points to self out of total possible points (a number 0 to 1)
19
Activity 1 (part 3)
Does changing the relative payout for tokens change the
dictator’s behavior?

• What graph would you plot to visualize this in the data?
• What is on the x-axis? What is on the y-axis?
• What do you expect to see?
• (bonus): write the R pseudo-code to implement this

Data description:
• tokens to self (an integer between 0 and 60)
• points per token to self (a number 1 or 2)
• points per token to other (a number 1 or 2)
• points per token, self divided by other (a number 0.5, 1, or 2)
• points to self out of total possible points (a number 0 to 1)
20
% of points to self vs. price ratio to self
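[Figure: scatter plot of % of points to self (y-axis) against price ratio to self (x-axis); the three price ratios correspond to questions (C), (A), and (B)]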

21
% of points to self vs. price ratio to self
with jitter
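[Figure: the same scatter plot with jitter added; the three price ratios correspond to questions (C), (A), and (B)]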
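
Jitter matters here because the price ratio takes only three values, so points in a plain scatter plot stack on top of each other. A minimal sketch of the idea in Python (the course uses R, where plotting libraries offer jitter directly); the helper name `jitter` and the width of 0.05 are arbitrary choices for illustration:

```python
import random


def jitter(values, width, seed=0):
    """Add small uniform noise in [-width, width] to each value, for plotting only."""
    rng = random.Random(seed)  # fixed seed so the plot is reproducible
    return [v + rng.uniform(-width, width) for v in values]


# Nine hypothetical observations at the three price ratios.
price_ratio = [0.5, 0.5, 0.5, 1, 1, 1, 2, 2, 2]
price_ratio_jittered = jitter(price_ratio, width=0.05)
```

The jittered values are used only for the x-coordinates of the plot; any regression should still use the original `price_ratio` values.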

22
Activity 1 (part 4)
Does changing the relative payout for tokens change the
dictator’s behavior?

• What regression would you run to answer this question?
• What is the null hypothesis you would test?

Data description:
• tokens to self (an integer between 0 and 60)
• points per token to self (a number 1 or 2)
• points per token to other (a number 1 or 2)
• points per token, self divided by other (a number 0.5, 1, or 2)
• points to self out of total possible points (a number 0 to 1)
23
A first causal diagram
Does changing the relative payout for tokens change the
dictator’s behavior?

Relative payout for tokens → Behavior
• The independent variable is the price ratio (points per token, self divided by other)
• The outcome variable is the % of points to self
• Why not use tokens to self?
• We want to test if variation in the independent variable
causes variation in the outcome variable
24
Regression to answer research question:
Option 1
Does changing the relative payout for tokens change the
dictator’s behavior?

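$$\%\,\text{points to self}_i = \beta_0 + \beta_1\,\text{price ratio}_i + \varepsilon_i$$
• Test the null hypothesis that $\beta_1 = 0$
• How do we interpret $\beta_0$? $\beta_1$?
• $\beta_1 > 0$: when the payout for tokens to self is higher than the payout for tokens to other, the percent of points the dictator allocates to themselves is also higher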

25
Regression to answer research question:
Option 1

• Interpret the estimated coefficients
• Test the null hypothesis that the coefficient on the price ratio is zero

26
Simple hypothesis test review
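• To conduct a simple hypothesis test we use a t-test
• Under the classical assumptions A1 – A5:

$$t = \frac{\hat\beta_j - \beta_j^{0}}{\mathrm{se}(\hat\beta_j)}$$
is t-distributed with N-M degrees of freedom, where $\beta_j^{0}$ is the value of the coefficient under the null hypothesis

For $H_0: \beta_1 = 0$ (the coefficient on the price ratio) we have $t = \hat\beta_1 / \mathrm{se}(\hat\beta_1)$
There are 528 – 2 d.f. so the t-critical value at $\alpha = 0.05$ is 1.96
Since $|t| >$ t-critical value we reject $H_0$ at 5% significance
(we can also see this from the p-value of the significance test)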
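
The t-statistic for the slope of a one-regressor model can be computed by hand. The following is a sketch in Python using only the standard library (in the course this would come from R's regression output); the function name `simple_ols_t` and the six data points are made up for illustration and are not the experimental data:

```python
import math


def simple_ols_t(x, y):
    """OLS of y on a constant and x; returns (b0, b1, se_b1, t) for H0: slope = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                      # slope estimate
    b0 = y_bar - b1 * x_bar             # intercept estimate
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)          # N - M degrees of freedom, with M = 2
    se_b1 = math.sqrt(sigma2_hat / sxx)
    return b0, b1, se_b1, b1 / se_b1


# Made-up toy data: price ratio and share of points to self.
x = [0.5, 0.5, 1, 1, 2, 2]
y = [0.4, 0.6, 0.6, 0.8, 0.8, 1.0]
b0, b1, se_b1, t_stat = simple_ols_t(x, y)
```

The decision rule is then `abs(t_stat) > t_crit`. The value 1.96 applies to the large sample in the lecture; a toy sample with 4 degrees of freedom would need a larger critical value.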

27
price ratio to self takes on only 3 values
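[Figure: scatter plot of % of points to self against price ratio to self; the three price ratios correspond to questions (C), (A), and (B)]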

28
Regression to answer research question:
Option 2 use dummy variables
Does changing the relative payout for tokens change the
dictator’s behavior?
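Treat the price ratio as a categorical variable

$$\%\,\text{points to self}_i = \beta_0 + \beta_1 D_{1i} + \beta_2 D_{2i} + \varepsilon_i$$
• $D_{1i}$ is a dummy variable indicating if the price ratio equals one of its two non-baseline values
• $D_{2i}$ is a dummy variable indicating if the price ratio equals the other non-baseline value

Test the null hypothesis that $\beta_1 = \beta_2 = 0$

• How do we interpret $\beta_1$? $\beta_2$?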
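
With a constant and one dummy per non-baseline category, the OLS coefficients are just group means: the intercept is the mean outcome in the baseline group, and each dummy coefficient is that group's mean minus the baseline mean. A sketch in Python; the function name is hypothetical, the data are made up, and the choice of price ratio 1 as the baseline is an assumption for illustration (the slide's own baseline is not shown):

```python
from statistics import mean


def dummy_regression(price_ratio, share, baseline):
    """Coefficients of share on a constant plus a dummy for each non-baseline ratio."""
    groups = {}
    for r, s in zip(price_ratio, share):
        groups.setdefault(r, []).append(s)
    intercept = mean(groups[baseline])  # mean outcome in the baseline group
    # Each dummy coefficient is the difference from the baseline mean.
    effects = {r: mean(v) - intercept for r, v in groups.items() if r != baseline}
    return intercept, effects


# Made-up toy data: price ratio and share of points to self.
price_ratio = [0.5, 0.5, 1, 1, 2, 2]
share = [0.4, 0.6, 0.6, 0.8, 0.8, 1.0]
intercept, effects = dummy_regression(price_ratio, share, baseline=1)
```

Changing the baseline changes what each coefficient measures, but not the fitted group means or the joint test that all dummy coefficients are zero.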

29
Regression to answer research question:
Option 2 use dummy variables

• Interpret the two dummy coefficients
• Test the null hypothesis that both dummy coefficients are zero

30
Multiple hypothesis testing review
• Use an F-test for the joint hypothesis
• The restricted model is the intercept-only model: $\%\,\text{points to self}_i = \beta_0 + \varepsilon_i$
• In this specific case it is the model goodness-of-fit test
• The F-statistic is a function of the Residual Sum of Squares (RSS) for the unrestricted (original) and restricted models
• It has an F distribution with q and N – M degrees of freedom
• Here q = 2 and N – M is 528 – 3, so the F-critical value at 5% significance is about 3.01
• The F-statistic = 52.5, which is greater than the F-critical value, so we reject the null hypothesis at the 5% significance level
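
For reference, the standard form of this F-statistic can be written out as below. The subscripts $R$ and $U$ (restricted and unrestricted) are notation assumed here, not necessarily the slide's own; $q$ is the number of restrictions and $N - M$ the residual degrees of freedom of the unrestricted model.

```latex
F \;=\; \frac{(RSS_R - RSS_U)/q}{RSS_U/(N - M)} \;\sim\; F_{q,\;N-M}
\quad \text{under } H_0 .

% When the restricted model contains only an intercept, RSS_R equals the
% total sum of squares (TSS), so with R^2 = 1 - RSS_U / TSS:
F \;=\; \frac{(TSS - RSS_U)/q}{RSS_U/(N - M)}
  \;=\; \frac{R^2/q}{(1 - R^2)/(N - M)} .
```

The second line is why the test is the model goodness-of-fit test in this case: it depends on the data only through the unrestricted model's $R^2$.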

31
The sampled observations are not all
independent
• Recall that the data is coming from 176 subjects who each
answered for all three questions
• So we have 528 observations, but each subject appears
three times in our dataset

• This is called a panel dataset, where we have multiple observations for each individual
• We will revisit this in week 10

32
The sampled observations are not all
independent
Each subject appears three times in our dataset

Which assumption(s) might this violate?

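• A1: mean zero random errors; $E[\varepsilon_i] = 0$
• A2: exogeneity of errors; $E[\varepsilon_i \mid x_i] = 0$ for all $i$
• A3: errors in different observations are statistically independent; $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$
• A4: homoskedastic errors; $\mathrm{Var}(\varepsilon_i) = \sigma^2$ for all $i$
• A5: normality of errors; $\varepsilon_i \sim N(0, \sigma^2)$ for all $i$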

33
What to do in panel data contexts
• We will discuss a few options later in the course
• Since we have so many observations, one quick and dirty
solution is to randomly sample only one observation
per individual to use in our analysis
• The new dataset has the same variables, but there are
only 172 observations

34
Testing with a subset of the data

• With fewer observations, one of the dummy coefficients is no longer statistically significant
• But the F-test still rejects the null hypothesis at 5% significance (p-value < 0.05)

35
But which subset to use? **
• What if we happened to draw a random sample where
the null hypothesis is rejected?
• We can do this same process 1000 times, each time
randomly selecting 1 out of the 3 observations for each
individual in our dataset.
• How many possibilities are there?
• For each replication we estimate the model, test the null
hypothesis and calculate the F-statistic

** This is not covered in the exam; it is just for general knowledge
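
The repeated subsampling described above can be sketched as follows, in Python with the standard library only (the course uses R). All names are hypothetical and the panel is simulated, not the experimental data. The joint test on the dummies is computed as a one-way ANOVA F-statistic, which is the same statistic as the regression F-test that all dummy coefficients are zero:

```python
import random


def one_way_f(groups):
    """F-statistic for equal group means; groups maps a category to its values."""
    values = [v for g in groups.values() for v in g]
    n, k = len(values), len(groups)
    grand_mean = sum(values) / n
    means = {c: sum(g) / len(g) for c, g in groups.items()}
    ss_between = sum(len(g) * (means[c] - grand_mean) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def make_toy_panel(n_subjects=60, seed=1):
    """Simulated panel: each subject has one (price_ratio, share) pair per question."""
    rng = random.Random(seed)
    panel = {}
    for subject in range(n_subjects):
        base = rng.uniform(0.3, 0.6)  # made-up subject-specific level
        panel[subject] = [
            (r, min(1.0, max(0.0, base + 0.1 * r + rng.gauss(0, 0.05))))
            for r in (0.5, 1, 2)
        ]
    return panel


def subsample_f_stats(panel, n_reps=1000, seed=0):
    """Draw one observation per subject, n_reps times; return the F-statistics."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_reps):
        groups = {}
        for observations in panel.values():
            ratio, share = rng.choice(observations)  # keep 1 of the 3 observations
            groups.setdefault(ratio, []).append(share)
        stats.append(one_way_f(groups))
    return stats


# Each of the 176 subjects contributes 1 of 3 observations.
n_possible_subsamples = 3 ** 176
```

With 176 subjects there are $3^{176}$ possible subsamples, a number with 84 digits, so 1,000 random draws cover only a tiny fraction of them. The sketch assumes every price ratio appears in each draw, which is all but certain with this many subjects.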

36
F-stat from 1,000 subsamples of the data
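[Figure: distribution of the F-statistic across the 1,000 subsamples, with the F critical value marked]

99.9% of subsamples have an F-statistic above the F critical value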

37
1,000 subsamples of the data
[Figure: distribution of the t-statistic across the 1,000 subsamples, with the t critical value = 1.974 marked]

49.4% of subsamples have $|t| > 1.974$

38
Full set of rounds Andreoni & Miller (2002)

39
What other questions can we answer with
this data?
• What is the external variation in the experiment?
• What is the endogenous variation?

Data description:
• tokens to self (an integer between 0 and 60)
• points per token to self (a number 1 or 2)
• points per token to other (a number 1 or 2)
• points per token, self divided by other (a number 0.5, 1, or 2)
• points to self out of total possible points (a number 0 to 1)
• total tokens to allocate (40, 60, 75, 80, or 100)
40
Causal diagrams generally
Does variation in X (exogenous) cause a change in Y
(endogenous)?

X → Y

• X varies randomly, exogenously
• Variation in Y is the direct result of variation in X

41
Is the effect causal?
Does changing the relative payout for tokens change the
dictator’s behavior?

• We want to test if variation in the independent variable directly causes variation in the outcome variable
• Need to control for other things that might cause variation in the outcome variable
• Approach so far was to select rounds where only the price ratio varied
42
Recitation overview

43
How does a change in the income affect
the dictator's behavior?

• Want to measure how much variation in income causes variation in % points to self
• If we use the full dataset, we need to control for both income and price ratio
• Or we restrict analysis to only rounds where income varies (and price ratio stays constant)

44
Using the full dataset and controlling for
price
How does a change in the income affect the dictator's
behavior?

• The coefficient on income captures the marginal effect of increasing the income by 1 on the % of points to self (controlling for any variation in the price ratio across rounds)
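
One way to write the regression described above is sketched below. The symbol names are assumptions (the slide's own notation is not shown), and income is taken here to mean the total tokens to allocate in a round, which is also an assumption.

```latex
\%\,\text{points to self}_{ir}
  = \beta_0 + \beta_1\,\text{income}_{r}
            + \beta_2\,\text{price ratio}_{r} + \varepsilon_{ir}

% Marginal effect of income, holding the price ratio fixed:
\frac{\partial\, E[\%\,\text{points to self}_{ir} \mid \text{income}_r,\ \text{price ratio}_r]}
     {\partial\, \text{income}_r} = \beta_1
```

Here $i$ indexes subjects and $r$ indexes rounds; under this labeling $\beta_1$ is the income coefficient the bullet refers to.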

45
Use the full dataset and control for price

• The effect of income is statistically significant but much smaller than the effect of the price ratio

46
Subsampling
• For rounds 7, 8, and 11 the price ratio is 1

47
Subsampling
• Treating income as a categorical variable…

48
