Regression 1: Framework

Instructor: Yuta Toyama

Last updated: 2021-06-16


Introduction

2 / 52
Observational Study (観察研究)
Researchers in social science cannot always conduct a randomized controlled trial.

Instead, we need to use observational data in which treatment assignment may not be
random.

An approach in this case is to control for observable characteristics that cause selection bias.

This approach essentially amounts to estimating a linear regression model (線形回帰モデル) by ordinary least squares (OLS, 最小二乗法).

3 / 52
Overview
Introduce the idea of the matching (マッチング) estimator.
Identification of treatment effects under the selection-on-observables assumption.
Linear regression is a special case of matching estimator.
Linear regression: framework, practical topics, inference

4 / 52
Selection on Observables, or Matching

5 / 52
Matching to eliminate selection bias
Idea: Compare individuals with the same observed characteristics X across the treatment
and control groups.

If treatment choice is driven by observed characteristics (such as age, income, gender, etc.),
controlling for such factors would eliminate the selection bias.

Two key assumptions in matching

6 / 52
Assumption 1: Selection on observables
Let Xi denote the observed characteristics (sometimes called covariates (共変量))
age, income, education, race, etc.

Assumption 1:

Di ⊥ (Y0i , Y1i ) |Xi

Conditional on Xi , treatment assignment is random.

This assumption is often referred to by other names:


Selection on observables
Ignorability
Unconfoundedness

7 / 52
Assumption 2: Overlapping assumption
Assumption 2:

P (Di = 1|Xi = x) ∈ (0, 1) ∀x

Given x, we should be able to observe people from both the control and treatment groups.

The probability P (Di = 1|Xi = x) is called propensity score (傾向スコア).

8 / 52
Identification of Treatment Effect Parameters
Assumption 1 (unconfoundedness) implies that

E[Y1i |Di = 1, Xi ] = E[Y1i |Di = 0, Xi ] = E[Y1i |Xi ]

E[Y0i |Di = 1, Xi ] = E[Y0i |Di = 0, Xi ] = E[Y0i |Xi ]

Once you condition on Xi , the argument is essentially the same as in an RCT.

9 / 52
The ATT conditional on Xi = x is given by

E[Y1i − Y0i |Di = 1, Xi ] = E[Y1i |Di = 1, Xi ] − E[Y0i |Di = 1, Xi ]

= E[Y1i |Di = 1, Xi ] − E[Y0i |Di = 0, Xi ]

Assumption 2 (overlapping) is needed to use the following

E[Ydi |Di = d, Xi ] = E[Yi |Di = d, Xi ] for d = 0, 1

Why? The overlapping assumption P (Di = 1|Xi = x) ∈ (0, 1) means that for each x, we
should have people in both the treatment and control groups.

If not, we cannot observe both E[Yi |Di = d, Xi ] for d = 0, 1.

With two assumptions,

E[Y1i − Y0i |Di = 1, Xi ] = E[Yi |Di = 1, Xi ] − E[Yi |Di = 0, Xi ]

where the first term is the average outcome with Xi in the treatment group and the second is the average outcome with Xi in the control group.

10 / 52
ATT: E[Y1i − Y0i |Di = 1]

The ATT is given by

ATT = E[Y1i − Y0i |Di = 1]

    = ∫ E[Y1i − Y0i |Di = 1, Xi = x] fXi(x|Di = 1) dx

    = E[Yi |Di = 1] − ∫ E[Yi |Di = 0, Xi = x] fXi(x|Di = 1) dx

11 / 52
ATE: E[Y1i − Y0i ]

The ATE is

ATE = E[Y1i − Y0i ]

    = ∫ E[Y1i − Y0i |Xi = x] fXi(x) dx

    = ∫ E[Y1i |Di = 1, Xi = x] fXi(x) dx − ∫ E[Y0i |Di = 0, Xi = x] fXi(x) dx

    = ∫ E[Yi |Di = 1, Xi = x] fXi(x) dx − ∫ E[Yi |Di = 0, Xi = x] fXi(x) dx

12 / 52
From Identification to Estimation
We need to estimate two conditional expectations E[Yi |Di = 1, Xi = x] and
E[Yi |Di = 0, Xi = x]

Several ways to implement this.

1. Regression: Nonparametric and Parametric
2. Nearest neighbor matching (最近傍マッチング)
3. Propensity Score Matching (傾向スコアマッチング)

Here, I only explain a parametric regression as a way to implement the matching method.

See Appendix and textbooks for the details of matching estimators.

13 / 52
From Matching to Linear Regression Model
Assume that

E[Yi |Di = 0, Xi = x] = β′xi

E[Yi |Di = 1, Xi = x] = β′xi + τ

Here, the treatment effect is given by τ.

You will have the linear regression model

yi = β′xi + τ Di + ϵi ,  E[ϵi |Di , xi ] = 0

Run a linear regression to obtain the treatment effect parameter τ.
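
As an illustration (not part of the original slides), here is a minimal Python sketch with simulated data in which treatment assignment depends only on observed covariates; in that case OLS of Y on D and X recovers τ. The variable names and the data-generating process are made up for this example.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))                      # observed covariates X_i
p = 1 / (1 + np.exp(-(x[:, 0] + x[:, 1])))       # treatment depends only on X_i
d = rng.binomial(1, p)                           # treatment indicator D_i
tau = 2.0                                        # true treatment effect
y = x @ np.array([1.0, -0.5]) + tau * d + rng.normal(size=n)

# OLS of Y on a constant, D, and X recovers tau under selection on observables
X = sm.add_constant(np.column_stack([d, x]))
res = sm.OLS(y, X).fit()
print(res.params[1])   # estimate of tau, close to 2.0
```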

14 / 52
Linear Regression: Framework

15 / 52
Regression (回帰) framework
The linear regression model (線形回帰モデル) is defined as

Yi = β0 + β1 X1i + ⋯ + βK XKi + ϵi

i: index for observations. i = 1, ⋯ , N .


Yi : dependent variable (被説明変数)

Xki : explanatory variable (説明変数)


ϵi : error term (誤差項)
β : coefficients (係数)

Data (sample): {Yi , Xi1 , … , XiK } for i = 1, … , N

We want to estimate coefficients β.

16 / 52
Ordinary Least Squares (最小二乗法、OLS)
OLS estimators are the minimizers of the sum of squared residuals:

min_{β0 ,⋯,βK} (1/N) ∑_{i=1}^N ( Yi − (β0 + β1 Xi1 + ⋯ + βK XiK) )²

First order conditions characterize the OLS estimator. Denote it by β̂.
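
As a quick illustration (simulated data; names are illustrative only), the OLS estimator can be computed directly from the normal equations, (X′X)⁻¹X′Y, and matches what statsmodels returns:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n, k = 200, 3
X = sm.add_constant(rng.normal(size=(n, k)))      # [1, X_1, ..., X_K]
beta = np.array([1.0, 2.0, -1.0, 0.5])
y = X @ beta + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)      # closed-form OLS: (X'X)^{-1} X'y
print(beta_hat)
print(sm.OLS(y, X).fit().params)                  # same numbers from statsmodels
```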

17 / 52
Residual Regression (残差回帰)
Consider the model

Yi = β0 + αDi + β1 X1i + ⋯ + βK XKi + ϵi

Suppose that you are interested in α (say treatment effect parameter).

Residual regression characterizes the OLS estimator α̂ in the following way.

18 / 52
Frisch–Waugh–Lovell Theorem
1. Run OLS regression of Di on all other explanatory variables 1, X1i , ⋯ , XKi . Obtain the residual û_i^D.

2. Run OLS regression of Yi on all other explanatory variables 1, X1i , ⋯ , XKi . Obtain the residual û_i^Y.

3. Run OLS regression of û_i^Y on û_i^D without a constant term. The OLS estimator α̂ is

α̂ = ( ∑_i û_i^Y û_i^D ) / ( ∑_i (û_i^D)² )
19 / 52
How to use FWL theorem
1. Computational advantage if you are interested in a particular coefficient. We use this idea in the estimation of panel data models.

2. Useful to see how the coefficient of interest is estimated. We will see this later in relation to multicollinearity (多重共線性).

3. Double machine learning (Chernozhukov et al. 2018): estimation of treatment effect parameters when many covariates are available.

20 / 52
Assumptions for OLS
1. Random sample (ランダムサンプル): {Yi , Xi1 , … , XiK } is an i.i.d. (independently and identically distributed) sample.

2. Mean independence: ϵi has zero conditional mean

E[ϵi |Xi1 , … , XiK ] = 0

3. Large outliers are unlikely: the random variables Yi and Xik have finite fourth moments.

4. No perfect multicollinearity (多重共線性): no exact linear relationship among the explanatory variables.

21 / 52
Theoretical Properties of OLS estimator
1. Unbiasedness: Conditional on the explanatory variables X, the expectation of the OLS estimator β̂ is equal to the true value β:

E[β̂|X] = β

2. Consistency: As the sample size N goes to infinity, the OLS estimator β̂ converges to β in probability:

β̂ →p β

3. Asymptotic normality (漸近正規性): discussed later.

22 / 52
Linear Regression: Practical Topics

23 / 52
Interpretation of Regression Coefficients
Remember that

Yi = β0 + β1 X1i + ⋯ + βK XKi + ϵi

The coefficient βk : the effect of Xk on Y ceteris paribus (all other things being equal).
Equivalently, if Xk is a continuous random variable,

∂Y/∂Xk = βk

If we can estimate βk without bias, we can obtain the causal effect of Xk on Y.

24 / 52
Common Specifications in Linear Regression Model
Several specifications are frequently used in empirical analysis:
1. Nonlinear term
2. log specification
3. dummy (categorical) variables
4. interaction terms (交差項)

25 / 52
Nonlinear term (非線形項)
Non-linear relationship between Y and X in a linearly additive form
Yi = β0 + β1 Xi + β2 Xi² + β3 Xi³ + ϵi

As long as the error term ϵi appears in an additively linear way, we can estimate the
coefficients by OLS.
Multicollinearity could be an issue if we include many polynomial terms (多項式).
You can use other non-linear variables such as log(x) and √x.

26 / 52
log specification
Using log changes the interpretation of the coefficient β in terms of scales.

Dependent Explanatory interpretation


Y X 1 unit increase in X causes β units change in Y
log Y X 1 unit increase in X causes 100β% change in Y
Y log X 1% increase in X causes β/100 unit change in Y
log Y log X 1% increase in X causes β% change in Y

27 / 52
Dummy variable (ダミー変数)
A dummy variable takes only the values 1 or 0. It is used to express qualitative information.
Example: Dummy variable for race

whitei = 1 if white, 0 otherwise

The coefficient on a dummy variable captures the difference of the outcome Y between
categories

Yi = β0 + β1 whitei + ϵi

The coefficient β1 captures the difference of Y between white and non-white people.

28 / 52
Interaction term (交差項)
You can add the interaction of two explanatory variables in the regression model.
For example:

wagei = β0 + β1 educi + β2 whitei + β3 educi × whitei + ϵi

where wagei is the earnings of person i and educi is the years of schooling for person i.
The effect of educi is

∂wagei /∂educi = β1 + β3 whitei ,

This allows for heterogeneous effects of education across races.
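
A small sketch of estimating such a specification (simulated data; the variable names follow the slide but the numbers are made up). The formula interface expands educ*white into the three terms above:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 1000
df = pd.DataFrame({
    "educ": rng.integers(8, 21, size=n),
    "white": rng.binomial(1, 0.6, size=n),
})
df["wage"] = (5 + 1.5 * df["educ"] + 2 * df["white"]
              + 0.5 * df["educ"] * df["white"]
              + rng.normal(scale=3, size=n))

# "wage ~ educ * white" expands to educ + white + educ:white
res = smf.ols("wage ~ educ * white", data=df).fit()
print(res.params)   # effect of educ is beta_educ + beta_interaction * white
```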

29 / 52
Measures of Fit
We often use R2 (決定係数) as a measure of the model fit.
Denote the fitted value as ŷi :

ŷi = β̂0 + β̂1 Xi1 + ⋯ + β̂K XiK

Also called prediction from the OLS regression.

30 / 52
R² is defined as

R² = SSE / TSS,

where

SSE = ∑_i (ŷi − ȳ)² ,  TSS = ∑_i (yi − ȳ)²

R² captures the fraction of the variation of Y explained by the regression model.

Adding variables always (weakly) increases R².

31 / 52
In a regression model with multiple explanatory variables, we often use the adjusted R², which adjusts for the number of explanatory variables:

R̄² = 1 − [ (N − 1) / (N − (K + 1)) ] × (SSR / TSS)

where

SSR = ∑_i (ŷi − yi)²  (= ∑_i ûi²)
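
A short sketch (simulated data) computing R² and the adjusted R² from these formulas and checking them against the values reported by statsmodels:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n, k = 300, 2
X = sm.add_constant(rng.normal(size=(n, k)))
y = X @ np.array([1.0, 0.8, -0.3]) + rng.normal(size=n)

res = sm.OLS(y, X).fit()
y_hat = res.fittedvalues
tss = np.sum((y - y.mean()) ** 2)
sse = np.sum((y_hat - y.mean()) ** 2)
ssr = np.sum((y - y_hat) ** 2)

r2 = sse / tss
adj_r2 = 1 - (n - 1) / (n - (k + 1)) * ssr / tss
print(r2, res.rsquared)          # match
print(adj_r2, res.rsquared_adj)  # match
```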

32 / 52
Linear Regression: Inference

33 / 52
Statistical Inference of OLS Estimator
The OLS estimator is a random variable, as it depends on the drawn sample.

We need to conduct statistical inference to evaluate statistical uncertainty of the OLS


estimates.

Plan

Asymptotic distribution (漸近分布) of OLS estimator


Statistical inference:
Homoskedasticity (均一分散) vs Heteroskedasticity (不均一分散)

34 / 52
Asymptotic Normality (漸近正規性) of OLS Estimator
Under the OLS assumptions, the OLS estimator has asymptotic normality:

√N (β̂ − β) →d N(0, V)

V is called the asymptotic variance (matrix), given by

V = (E[xi xi′])⁻¹ E[xi xi′ ϵi²] (E[xi xi′])⁻¹

which is a (K + 1) × (K + 1) matrix, where xi = (1, Xi1 , ⋯ , XiK )′ is a (K + 1) × 1 vector.

35 / 52
We can approximate the distribution of β̂ by

β̂ ∼ N(β, V/N)

The individual coefficient β̂k follows

β̂k ∼ N(βk , Vkk /N)

36 / 52
Estimation of Asymptotic Variance (漸近分散)
V is an unknown object and needs to be estimated.
Consider the estimator V̂ for V using sample analogues:

V̂ = ( (1/N) ∑_{i=1}^N xi xi′ )⁻¹ ( (1/N) ∑_{i=1}^N xi xi′ ϵ̂i² ) ( (1/N) ∑_{i=1}^N xi xi′ )⁻¹

where ϵ̂i = yi − (β̂0 + β̂1 Xi1 + ⋯ + β̂K XiK ) is the residual.
Technically speaking, V̂ converges to V in probability.
We often use the (asymptotic) standard error SE(β̂k ) = √(V̂kk /N).

The standard error is an estimator for the standard deviation of the OLS estimator β̂k .
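
A sketch of this sample-analogue (sandwich) variance estimator on simulated heteroskedastic data; the resulting standard errors match statsmodels' HC0 robust standard errors. The data-generating process is made up for illustration:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 1000
X = sm.add_constant(rng.normal(size=(n, 2)))
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n) * (1 + np.abs(X[:, 1]))  # heteroskedastic errors

res = sm.OLS(y, X).fit()
e = res.resid

# Sandwich: V_hat = A^{-1} B A^{-1} with A = (1/N) sum x_i x_i',  B = (1/N) sum x_i x_i' e_i^2
A = X.T @ X / n
B = (X * e[:, None] ** 2).T @ X / n
V_hat = np.linalg.inv(A) @ B @ np.linalg.inv(A)
se_manual = np.sqrt(np.diag(V_hat) / n)

print(se_manual)
print(res.HC0_se)   # statsmodels' heteroskedasticity-robust (HC0) standard errors match
```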

37 / 52
Hypothesis testing
You might want to test a particular hypothesis regarding those coefficients.
Does x really affect y?
Does the production technology exhibit constant returns to scale?

38 / 52
3 Steps in Hypothesis Testing
Step 1: Consider the null hypothesis H0 and the alternative hypothesis H1

H0 : β1 = k, H1 : β1 ≠ k

where k is a known number that you choose.

Step 2: Define the t-statistic by

tn = (β̂1 − k) / SE(β̂1 )

Step 3: We reject H0 at the α-percent significance level if

|tn | > Cα/2

where Cα/2 is the upper α/2 critical value (the 1 − α/2 quantile) of the standard normal distribution. We say we fail to reject H0 if the above does not hold.
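
A small worked example of the three steps on simulated data (names and numbers are illustrative), using the normal critical value 1.96 for a 5% two-sided test:

```python
import numpy as np
from scipy import stats
import statsmodels.api as sm

rng = np.random.default_rng(6)
X = sm.add_constant(rng.normal(size=(500, 1)))
y = X @ np.array([1.0, 0.3]) + rng.normal(size=500)
res = sm.OLS(y, X).fit(cov_type="HC1")            # robust standard errors

k = 0.0                                           # null hypothesis H0: beta_1 = k
t_stat = (res.params[1] - k) / res.bse[1]
p_value = 2 * (1 - stats.norm.cdf(abs(t_stat)))   # two-sided p-value from N(0,1)
reject = abs(t_stat) > stats.norm.ppf(0.975)      # compare with the 5% critical value 1.96
print(t_stat, p_value, reject)
```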
39 / 52
Caveats on Hypothesis Testing
We often say β̂ is statistically significant (統計的有意) at the 5% level if |tn | > 1.96 when we
set k = 0.

You should also discuss economic significance (経済的有意) of the coefficient in analysis.

Case 1: Small but statistically significant coefficient.


As the sample size N gets large, the SE decreases.

Case 2: Large but statistically insignificant coefficient.


The variable might have an important (economically meaningful) effect.
But you may not be able to estimate the effect precisely with the sample at your hand.

40 / 52
F test
We often test a composite hypothesis that involves multiple parameters such as

H0 : β1 + β2 = 0, H1 : β1 + β2 ≠ 0

We use F test in such a case.

41 / 52
Confidence interval (信頼区間)
95% confidence interval

CIn = { k : |(β̂1 − k) / SE(β̂1 )| ≤ 1.96 }

    = [ β̂1 − 1.96 × SE(β̂1 ),  β̂1 + 1.96 × SE(β̂1 ) ]

Interpretation: If you draw many samples (dataset) and construct the 95% CI for each
sample, 95% of those CIs will include the true parameter.
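
A short sketch constructing the 95% CI from β̂1 ± 1.96 × SE(β̂1) on simulated data and comparing it with the interval statsmodels reports (nearly identical in large samples):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=(500, 1)))
y = X @ np.array([1.0, 0.3]) + rng.normal(size=500)
res = sm.OLS(y, X).fit()

beta1, se1 = res.params[1], res.bse[1]
ci_manual = (beta1 - 1.96 * se1, beta1 + 1.96 * se1)   # 95% CI from the normal approximation
print(ci_manual)
print(res.conf_int(alpha=0.05)[1])                     # statsmodels' CI for beta_1 (very close)
```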

42 / 52
Homoskedasticity vs Heteroskedasticity
The error term ϵi is heteroskedastic (不均一分散) if Var(ϵi |Xi ) depends on Xi . The
asymptotic variance is

V = (E[xi xi′])⁻¹ E[xi xi′ ϵi²] (E[xi xi′])⁻¹

If not, we say ϵi is homoskedastic (均一分散). In this case,

V = (E[xi xi′])⁻¹ σ²

where σ² = Var(ϵi ).

43 / 52
Standard Errors in Practice
Standard errors computed under the heteroskedasticity assumption are called heteroskedasticity-robust
standard errors (不均一分散に頑健な標準誤差).

In many statistical packages (including R and Stata), the standard errors for the OLS
estimators are calculated under the homoskedasticity assumption by default.

However, if the error is heteroskedastic, the standard errors computed under the homoskedasticity
assumption will be underestimated.

In OLS, we should always use heteroskedasticity-robust standard errors.
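
A minimal sketch contrasting the two kinds of standard errors in statsmodels on simulated heteroskedastic data (the HC1 option requests heteroskedasticity-robust standard errors; the data are made up):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
X = sm.add_constant(rng.normal(size=(1000, 1)))
y = X @ np.array([1.0, 2.0]) + rng.normal(size=1000) * (1 + X[:, 1] ** 2)   # heteroskedastic error

homo = sm.OLS(y, X).fit()                       # default: homoskedasticity-only SEs
robust = sm.OLS(y, X).fit(cov_type="HC1")       # heteroskedasticity-robust (HC1) SEs
print(homo.bse)
print(robust.bse)                               # typically larger here
```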

44 / 52
Appendix: Matching Estimator

45 / 52
Estimation Methods
We need to estimate E[Yi |Di = 1, Xi = x] and E[Yi |Di = 0, Xi = x]

Several ways to implement the above idea

1. Regression: Nonparametric and Parametric
2. Nearest neighbor matching
3. Propensity Score Matching

46 / 52
Approach 1: Regression, or Analogue Approach
Let μ̂k (x) be an estimator of μk (x) = E[Yi |Di = k, Xi = x] for k ∈ {0, 1}.

The analog estimator of the ATE is

(1/N) ∑_{i=1}^N ( μ̂1 (Xi ) − μ̂0 (Xi ) )

and the analog estimator of the ATT is

[ N⁻¹ ∑_{i=1}^N Di (Yi − μ̂0 (Xi )) ] / [ N⁻¹ ∑_{i=1}^N Di ]

How to estimate μk (x) = E[Yi |Di = k, Xi = x] ?

47 / 52
Nonparametric Estimation
Suppose that Xi ∈ {x1 , ⋯ , xK } is discrete with small K.
Ex: two demographic characteristics (male/female, white/non-white), so K = 4.
Then, a nonparametric binning estimator is

μ̂k (x) = ∑_{i=1}^N 1{Di = k, Xi = x} Yi / ∑_{i=1}^N 1{Di = k, Xi = x}

Here, I do not put any parametric assumption on μk (x) = E[Yi |Di = k, Xi = x].
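
A small sketch of the binning estimator with two binary covariates (simulated data; column names are illustrative): μ̂k(x) is just the cell mean of Y for each (treatment, X) combination.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(9)
n = 2000
df = pd.DataFrame({
    "male":  rng.binomial(1, 0.5, size=n),
    "white": rng.binomial(1, 0.6, size=n),
    "d":     rng.binomial(1, 0.5, size=n),
})
df["y"] = 1 + 2 * df["d"] + df["male"] - df["white"] + rng.normal(size=n)

# Binning estimator: the cell mean of Y for each (treatment, X) combination
mu_hat = df.groupby(["d", "male", "white"])["y"].mean()
print(mu_hat)
```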

48 / 52
Curse of dimensionality
Issue: Poor performance if K is large due to many covariates.
So many potential groups, too few observations for each group.
With K variables, each of which takes L values, there are L^K possible groups (bins) in total.
This is known as the curse of dimensionality.
Relatedly, if X is a continuous random variable, we can use kernel regression.

49 / 52
Parametric Estimation, or going back to linear regression
If you put a parametric assumption such as

E[Yi |Di = 0, Xi = x] = β′xi

E[Yi |Di = 1, Xi = x] = β′xi + τ

then you will have the model

yi = β′xi + τ Di + ϵi

You can think of the matching estimator as controlling for omitted variable bias by adding
(many) covariates (control variables) xi .
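
A sketch of the analog estimators from the earlier appendix slide, with μ̂0 and μ̂1 fitted by separate OLS regressions on the control and treated subsamples (simulated data; the true effect is set to 2.0 for illustration):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(10)
n = 4000
x = rng.normal(size=(n, 2))
p = 1 / (1 + np.exp(-x[:, 0]))
d = rng.binomial(1, p)
y = x @ np.array([1.0, -0.5]) + 2.0 * d + rng.normal(size=n)

# Fit mu_0 and mu_1 by separate OLS regressions on the control and treated subsamples
X = sm.add_constant(x)
mu0 = sm.OLS(y[d == 0], X[d == 0]).fit()
mu1 = sm.OLS(y[d == 1], X[d == 1]).fit()

mu0_hat = mu0.predict(X)                           # mu_0(X_i) for every i
mu1_hat = mu1.predict(X)
ate_hat = np.mean(mu1_hat - mu0_hat)               # analog ATE estimator
att_hat = np.mean(d * (y - mu0_hat)) / np.mean(d)  # analog ATT estimator
print(ate_hat, att_hat)                            # both near the true effect 2.0
```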

50 / 52
Approach 2: M-Nearest Neighbor Matching
Idea: Find the counterpart in the other group that is close to me.
Define ŷi (0) and ŷi (1) as the estimators of the (hypothetical) outcomes without and with
treatment.

ŷi (0) = yi if Di = 0,  and  ŷi (0) = (1/M) ∑_{j∈LM(i)} yj if Di = 1

LM (i) is the set of M individuals in the opposite group who are "close" to individual i
Several ways to define the distance between Xi and Xj , such as

dist(Xi , Xj ) = ||Xi − Xj ||²

Need to choose (1) M and (2) the measure of distance
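
A minimal sketch of M = 1 nearest neighbor matching for the ATT using plain Euclidean distance (simulated data; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(11)
n = 1000
x = rng.normal(size=(n, 2))
d = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))
y = x @ np.array([1.0, -0.5]) + 2.0 * d + rng.normal(size=n)

x_t, y_t = x[d == 1], y[d == 1]     # treated
x_c, y_c = x[d == 0], y[d == 0]     # control

# For each treated unit, find the single (M = 1) closest control unit by Euclidean distance
dists = ((x_t[:, None, :] - x_c[None, :, :]) ** 2).sum(axis=2)
nearest = dists.argmin(axis=1)
y_hat_0 = y_c[nearest]              # imputed untreated outcome for each treated unit

att_hat = np.mean(y_t - y_hat_0)    # matching estimate of the ATT
print(att_hat)
```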

51 / 52
Approach 3: Propensity Score Matching
Use the propensity score P (Di = 1|Xi = x) as the distance measure to define who is closest.
Step 1: Estimate the propensity score function by logit or probit using a flexible function of Xi .
Step 2: Calculate the propensity score for each observation. Use it to define the pairs.
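
A sketch of these two steps (simulated data; a plain logit on Xi is used for Step 1, and each treated unit is paired with the control unit whose propensity score is closest):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(12)
n = 2000
x = rng.normal(size=(n, 2))
d = rng.binomial(1, 1 / (1 + np.exp(-x[:, 0])))
y = x @ np.array([1.0, -0.5]) + 2.0 * d + rng.normal(size=n)

# Step 1: estimate the propensity score with a logit of D on X
X = sm.add_constant(x)
pscore = sm.Logit(d, X).fit(disp=0).predict(X)

# Step 2: match each treated unit to the control unit with the closest propensity score
ps_t, ps_c = pscore[d == 1], pscore[d == 0]
y_t, y_c = y[d == 1], y[d == 0]
nearest = np.abs(ps_t[:, None] - ps_c[None, :]).argmin(axis=1)
att_hat = np.mean(y_t - y_c[nearest])
print(att_hat)
```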

52 / 52
