Econometrics Final Exam Study Guide

This document provides an overview of nonlinear regression functions and of assessing studies based on multiple regressors. It covers three key topics: (1) nonlinear regression functions, including polynomial, logarithmic, and exponential forms, as well as interactions between independent variables; (2) threats to the internal and external validity of multiple regression studies, such as omitted variable bias, functional form misspecification, errors-in-variables bias, missing-data bias, and simultaneous causality bias; and (3) panel data, which contain observations on multiple entities over time and can be used to control for unobserved factors that do not vary over time but would cause omitted variable bias if excluded from the regression.


Chapter 8 - Nonlinear Regression Functions

I. Nonlinear Regression Functions - General Comments

A. The general nonlinear population regression function

1. Assumptions

a) E(ui | X1i, X2i, ..., Xki) = 0

(1) For any given values of the X's, the mean of ui is zero

b) (X1i, ..., Xki, Yi) are i.i.d.

c) Big outliers are rare

d) No perfect multicollinearity

II. Nonlinear functions of one variable

A. Two complementary approaches

1. Polynomials in X; Ex: Yi = B0 + B1Xi + B2Xi^2 + ... + BrXi^r + ui

a) Joint hypothesis testing can be used to determine non-linearity

Ex: TestScorei = B0 + B1*Incomei + B2*(Incomei)^2 + B3*(Incomei)^3 + ui

H0 : the population coefficients on Income^2 and Income^3 are both zero

H1 : at least one of these coefficients is nonzero

*Run a joint test on income2 and income3 (e.g., Stata’s “test income2 income3”); if p < 0.05, reject the null of linearity. A sketch of the same test in Python follows.
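
A minimal sketch of this joint F-test in Python, using statsmodels and simulated data (no dataset accompanies these notes, so the variable names and true coefficients below are made up):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(0)
    df = pd.DataFrame({"income": rng.uniform(5, 55, 420)})
    df["testscore"] = (640 + 2.0 * df["income"] - 0.02 * df["income"] ** 2
                       + rng.normal(0, 10, 420))
    df["income2"] = df["income"] ** 2   # create the polynomial regressors
    df["income3"] = df["income"] ** 3

    res = smf.ols("testscore ~ income + income2 + income3", data=df).fit()
    # H0: coefficients on income2 and income3 are both zero (linearity)
    print(res.f_test("income2 = 0, income3 = 0"))  # small p-value -> reject H0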

2. Logarithmic transformations of Y and/or X - permit modeling relations in percentage terms

a) Three log regression specifications (a sketch of all three follows):

(1) Linear-log : Yi = B0 + B1*ln(Xi) + ui

(a) A 1% change in X (multiplying X by 1.01) is associated with a 0.01*B1 unit change in Y

(2) Log-linear : ln(Yi) = B0 + B1*Xi + ui

(a) A 1 unit change in X is associated with a 100*B1 % change in Y

(3) Log-log : ln(Yi) = B0 + B1*ln(Xi) + ui

(a) A 1% change in X is associated with a B1 % change in Y
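
A sketch of all three specifications on simulated data (variable names are illustrative; patsy formulas can call np.log directly):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    df = pd.DataFrame({"x": rng.uniform(10, 60, 300)})
    df["y"] = 550 + 35 * np.log(df["x"]) + rng.normal(0, 8, 300)

    linear_log = smf.ols("y ~ np.log(x)", data=df).fit()          # 1% change in X -> 0.01*B1 unit change in Y
    log_linear = smf.ols("np.log(y) ~ x", data=df).fit()          # 1 unit change in X -> 100*B1 % change in Y
    log_log    = smf.ols("np.log(y) ~ np.log(x)", data=df).fit()  # 1% change in X -> B1 % change in Y
    for name, res in [("linear-log", linear_log), ("log-linear", log_linear), ("log-log", log_log)]:
        print(name, res.params.to_dict())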

B. Negative Exponential Growth Regression

1. Yi = B0 - a*e^(-B1*Xi) + ui
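
This specification is nonlinear in the parameters, so plain OLS cannot estimate it; a sketch using nonlinear least squares (scipy's curve_fit, with made-up true values and starting guesses):

    import numpy as np
    from scipy.optimize import curve_fit

    def neg_exp(x, b0, a, b1):
        # Y = B0 - a*exp(-B1*X): rises quickly, then flattens toward the asymptote B0
        return b0 - a * np.exp(-b1 * x)

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 50, 200)
    y = neg_exp(x, 700.0, 80.0, 0.1) + rng.normal(0, 5, 200)

    params, _ = curve_fit(neg_exp, x, y, p0=[600.0, 50.0, 0.05])  # p0 = starting guesses
    print(params)  # estimates of (B0, a, B1), close to (700, 80, 0.1)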

III. Nonlinear functions of two variables: interactions between independent variables

A. Interactions between two binary variables: Yi = B0 + B1D1i + B2D2i + B3(D1i*D2i) + ui

1. D1 and D2 are binary

2. Including the interaction term D1*D2 allows the effect of changing D1 to depend on the level of D2

a) To find the effect of a variable, take the partial derivative (see the sketch below):

(1) Yi = B0 + B1D1i + B2D2i + B3(D1i*D2i) + ui

(a) dY/dD1 = B1 + B3D2 = effect of a change in D1

(b) dY/dD2 = B2 + B3D1 = effect of a change in D2
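
A sketch of the binary-binary interaction on simulated data; the coefficient names ("d1", "d1:d2") follow statsmodels/patsy formula conventions, and the true values are made up:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(3)
    n = 500
    df = pd.DataFrame({"d1": rng.integers(0, 2, n), "d2": rng.integers(0, 2, n)})
    df["y"] = 10 + 2 * df["d1"] + 3 * df["d2"] + 1.5 * df["d1"] * df["d2"] + rng.normal(0, 1, n)

    res = smf.ols("y ~ d1 + d2 + d1:d2", data=df).fit()
    b = res.params
    print("effect of D1 when D2=0:", b["d1"])                # B1
    print("effect of D1 when D2=1:", b["d1"] + b["d1:d2"])   # B1 + B3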

B. Interactions between continuous and binary variables:

1. Yi = B0 + B1Di + B2Xi + B3(Di*Xi) + ui

a) Di is binary, X is continuous; the interaction term allows the effect of X to depend on D

b) Two regression lines are formed, one for D=0 and one for D=1

(1) Find the effect of X by taking the partial derivative (a sketch follows)
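
A sketch recovering the two regression lines from one fitted interaction model (simulated data, illustrative names): the D=0 line has intercept B0 and slope B2, and the D=1 line has intercept B0+B1 and slope B2+B3.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(4)
    n = 400
    df = pd.DataFrame({"d": rng.integers(0, 2, n), "x": rng.uniform(0, 10, n)})
    df["y"] = 5 + 2 * df["d"] + 1.0 * df["x"] + 0.5 * df["d"] * df["x"] + rng.normal(0, 1, n)

    res = smf.ols("y ~ d + x + d:x", data=df).fit()
    b = res.params
    print("D=0 line: intercept", b["Intercept"], "slope", b["x"])
    print("D=1 line: intercept", b["Intercept"] + b["d"], "slope", b["x"] + b["d:x"])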

C. Interactions between two continuous variables:

1. Yi = B0 + B1X1i + B2X2i + B3(X1i*X2i) + ui

a) Both X1 and X2 are continuous

b) The interaction term again lets the effect of one regressor depend on the level of the other

c) Perform a joint hypothesis test on each variable together with the interaction (see the sketch below)
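
A sketch of the marginal effect and the joint test for a continuous-continuous interaction (simulated data; the marginal effect is reported at the sample mean of X2, a common but arbitrary choice):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    n = 400
    df = pd.DataFrame({"x1": rng.uniform(0, 10, n), "x2": rng.uniform(0, 10, n)})
    df["y"] = 3 + 1.0 * df["x1"] + 2.0 * df["x2"] + 0.3 * df["x1"] * df["x2"] + rng.normal(0, 1, n)

    res = smf.ols("y ~ x1 + x2 + x1:x2", data=df).fit()
    # dY/dX1 = B1 + B3*X2, evaluated here at the mean of X2
    print("effect of x1 at mean x2:", res.params["x1"] + res.params["x1:x2"] * df["x2"].mean())
    # Joint test that X1 is irrelevant: its own coefficient and the interaction both zero
    print(res.f_test("x1 = 0, x1:x2 = 0"))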

Chapter 9 - Assessing Studies Based on Multiple Regressors

I. Internal and External Validity

A. Internal Validity - The statistical inferences about causal effects are valid for the

population being studied

B. External Validity - The statistical inferences can be generalized from the

population and setting studied to other populations and settings (where the

“setting” refers to the legal, policy, and physical environment and related salient

features)

C. Threats to External Validity of Multiple Regression Studies

1. Assessing threats to external validity requires detailed substantive

knowledge and judgment on a case-by-case basis

II. Threats to Internal Validity of Multiple Regression Analysis


A. Omitted variable bias

1. Arises if an omitted variable is both:

a) A determinant of Y

b) Correlated with at least one included regressor

2. Solutions:

a) If the omitted causal variable can be measured, include it

b) If you have data on one or more adequate control variables (so that conditional mean independence plausibly holds), include them

c) Possibly, use panel data in which each entity (individual) is

observed more than once (thus providing a control)

d) If the omitted variable(s) cannot be measured, use instrumental

variables regression

e) Run a randomized controlled experiment

(1) If X is randomly assigned, then X is necessarily distributed independently of ui; thus E(ui|X = x) = 0

B. Functional form misspecification

1. Arises if the functional form is incorrect (linear vs. non-linear); ex: an

interaction term is incorrectly omitted

2. Solutions

a) Continuous dependent variable: use the “appropriate” nonlinear specification in X (logarithms, interactions, etc.)


b) Discrete (ex: binary) dependent variable: need an extension of

multiple regression methods (“probit” or “logit” analysis for binary

dependent variables)

C. Errors-in-variables bias

1. Arises from data entry errors in administrative data, recollection errors in surveys, ambiguous questions, intentionally false responses, etc.

2. Solutions:

a) Get better data

b) Develop a specific model of the measurement error process (only

possible if a lot is known about the nature of the measurement

error)

c) Instrumental variables regression
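
A small simulation, assuming classical measurement error (noise uncorrelated with the true X and with u), illustrating how errors in variables bias the OLS slope toward zero; all values below are made up:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(6)
    n = 5000
    x_true = rng.normal(0, 1, n)
    y = 1.0 + 2.0 * x_true + rng.normal(0, 1, n)   # true slope is 2
    x_meas = x_true + rng.normal(0, 1, n)          # classical measurement error

    clean = sm.OLS(y, sm.add_constant(x_true)).fit()
    noisy = sm.OLS(y, sm.add_constant(x_meas)).fit()
    print(clean.params[1])  # ~2.0
    print(noisy.params[1])  # ~1.0: attenuated by the factor var(x)/(var(x) + var(error))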

D. Missing data and sample selection bias

1. Three cases: Data are missing at random, data are missing based on the

value of one or more X’s, and data are missing based in part on the value

of Y or u

a) Cases 1 and 2 don’t introduce bias; case 3 introduces “sample

selection” bias

2. Sample selection bias arises when a selection process:

a) Influences the availability of data and

b) Is related to the dependent variable


3. Solutions:

a) Collect the sample in a way that avoids sample selection

(1) Don’t collect data on the height of GW’s population by

standing outside the basketball locker-room

b) Randomized control experiment

c) Construct a model of the sample selection problem and estimate that model (Heckman’s two-step method)

E. Simultaneous causality bias

1. What if, not only does X cause Y, but Y causes X as well?

a) Ex: a low student-teacher ratio (STR) results in better test scores, but suppose districts with low test scores are given extra resources and, as a result of a political process, also end up with a low STR; then STR and u are correlated

2. Solutions:

a) Run a randomized controlled experiment

b) Develop and estimate a complete model of the two way causal

interaction

c) Use instrumental variables to estimate the causal effect of interest

(effect of X on Y, ignoring effect of Y on X)

*All of these threats imply that E(u|X) ≠ 0 (conditional mean independence fails), in which case OLS is biased and inconsistent

Chapter 10 - Regression with Panel Data


I. Panel Data: What and Why

A. Contains observations on multiple entities at two or more points in time; also

referred to as longitudinal data

1. Balanced Panel: no missing observations; all variables are observed for all

entities and all time periods

B. With panel data we can control for factors that:

1. Vary across entities but do not vary over time

2. Could cause omitted variable bias if they are omitted

3. Are unobserved or unmeasured and therefore cannot be included in the

regression

II. Panel Data with Two Time Periods: Yit = B0 + B1Xit + B2Zi + uit

A. Zi is a factor that does not change over time (at least during the years for which we have data)

1. Suppose Zi is not observed, so its omission could result in omitted variable bias

a) With T=2 years of data, the effect of Zi can be eliminated by analyzing changes between the two years

(1) Any change in Y between t=1 and t=2 cannot be caused by Zi, because Zi does not change

III. Fixed Effects Regression (if T > 2): Yit = B0 + B1Xit + B2Zi + uit, i = 1,...,n, t = 1,...,T

A. We can rewrite this in two useful ways:

1. “n-1 binary regressor” form:

a) Yit = B0 + B1Xit + γ2D2i + … + γnDni + uit

(1) Where D2i = 1 if i = 2 (entity 2) and 0 otherwise

2. “Fixed Effects” form:

a) Yit = B1Xit + ai + uit

(1) ai is called a “state fixed effect” or “state effect”; it is the constant (fixed) effect of being in state i

B. Fixed Effects Regression: Estimation

1. Three estimation methods:

a) “n-1 binary regressors” OLS regression

(1) First create the binary variables D2i,...,Dni

(2) Then estimate Yit = B0 + B1Xit + γ2D2i + … + γnDni + uit by OLS

(3) Inference (hypothesis tests, confidence intervals, etc.) is as usual

(4) This is only practical when n is small

b) “Entity-demeaned” OLS regression (see the sketch below)

c) “Changes” specification, without an intercept (only works for T=2)
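
A sketch of the “entity-demeaned” estimator on a simulated balanced panel (entity count, true coefficient, and variable names are made up; the fixed effect ai is built to be correlated with x so that pooled OLS is visibly biased):

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    rng = np.random.default_rng(7)
    n, T = 50, 5
    df = pd.DataFrame({"i": np.repeat(np.arange(n), T)})
    alpha = rng.normal(0, 2, n)[df["i"]]            # entity fixed effect a_i
    df["x"] = alpha + rng.normal(0, 1, n * T)       # x correlated with a_i
    df["y"] = 2.0 * df["x"] + alpha + rng.normal(0, 1, n * T)

    pooled = sm.OLS(df["y"], sm.add_constant(df["x"])).fit()
    print("pooled OLS slope (biased):", pooled.params["x"])

    # Entity-demeaning: subtract each entity's time average, which wipes out a_i
    df["x_dm"] = df["x"] - df.groupby("i")["x"].transform("mean")
    df["y_dm"] = df["y"] - df.groupby("i")["y"].transform("mean")

    fe = sm.OLS(df["y_dm"], df["x_dm"]).fit()       # no intercept: demeaned data has mean zero
    print("fixed effects slope:", fe.params["x_dm"])  # close to the true B1 = 2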

IV. Regression with Time Fixed Effects

A. An omitted variable may vary over time but not across states

1. Let St denote the combined effect of variables that change over time but not across states: Yit = B0 + B1Xit + B2Zi + B3St + uit

B. Two formulations of regression with time fixed effects

1. “T-1” binary regressor formulation:

a) Yit = B0 + B1Xit + δ2B2t + … + δTBTt + uit

b) Where B2t = 1 when t = 2 (year 2) and 0 otherwise

2. “Time effects” formulation: Yit = B1Xit + λt + uit

C. Time fixed effects: estimation methods

1. “T-1 binary regressor” OLS regression:

a) Yit = B0 + B1Xit + δ2B2t + … + δTBTt + uit

b) Create the binary variables B2,...,BT

c) B2 = 1 if t = year 2, and 0 otherwise

d) Regress using OLS (sketched below with dummy variables)

2. “Year-demeaned” OLS regression

a) Deviate Yit, Xit from year (not entity) averages

b) Estimate by OLS using the “year-demeaned” data
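
A sketch of the T-1 binary regressor approach on a simulated panel (true values made up); patsy's C(t) expands the year variable into T-1 dummies plus an intercept, absorbing the common shock St:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(8)
    n, T = 50, 4
    df = pd.DataFrame({"i": np.repeat(np.arange(n), T), "t": np.tile(np.arange(T), n)})
    s_t = np.array([0.0, 1.0, 2.0, 3.0])[df["t"]]   # shock that varies over time, not entities
    df["x"] = rng.normal(0, 1, n * T)
    df["y"] = 2.0 * df["x"] + s_t + rng.normal(0, 1, n * T)

    res = smf.ols("y ~ x + C(t)", data=df).fit()    # C(t): T-1 year dummies
    print(res.params["x"])  # close to the true B1 = 2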

V. Standard Errors for Fixed Effects Regression

A. LS Assumptions for Panel Data: Yit = B1Xit + ai + uit, i = 1,...,n, t = 1,...,T

1. E(uit | Xi1,...,XiT, ai) = 0

a) uit has mean zero, given the entity fixed effect and the entire history of the X’s for that entity

2. (Xi1,..., XiT, ui1,...,uiT), i = 1,...,n, are i.i.d. draws from their joint distribution

a) Does not require observations to be i.i.d. over time for the same entity

3. (Xit, uit) have finite fourth moments


4. No perfect multicollinearity

Chapter 11 - Regression with a Binary Dependent Variable

I. The Linear Probability Model

A. When Y is binary, the linear regression model: Yi = B0 + B1Xi + ui

1. Called the linear probability model because:

a) Pr(Y=1|X) = B0 + B1Xi

b) The predicted value is a probability:

(1) E(Y|X=x) = Pr(Y=1|X=x) = the probability that Y=1 given x

(2) Y-hat = the predicted probability that Yi=1, given X

c) B1 = the change in the probability that Y=1 for a unit change in x:

(1) B1 = [Pr(Y=1|X = x + Δx) − Pr(Y=1|X = x)]/Δx

B. Advantages

1. Simple to estimate and to interpret

2. Inference is the same as for multiple regression

C. Disadvantages

1. An LPM says that the change in the predicted probability for a given change in X is the same for all values of X, which often doesn’t make sense; the predicted probabilities can also be <0 or >1

a) These disadvantages can be addressed by the following nonlinear probability models

II. Probit and Logit Regression

A. We don’t want the probability that Y=1 to be linear in X; instead we want:


1. Pr(Y=1|X) to be increasing in X for B1 > 0, and

2. 0<= Pr(Y=1|X) <= 1 for all X

B. Probit regression - models the probability that Y=1 using the cumulative standard normal distribution function, Φ(z), evaluated at z = B0 + B1X

1. Probit Model: Pr(Y=1|X) = Φ(B0 + B1X)

a) Where Φ is the cumulative standard normal distribution function and z = B0 + B1X is the “z-value” of the probit model

C. Logit regression - models the probability that Y=1, given X, as the cumulative standard logistic distribution function, evaluated at z = B0 + B1X

1. Logit Model: Pr(Y=1|X) = F(B0 + B1X)

a) Where F is the cumulative logistic distribution function:

(1) F(B0 + B1X) = 1/(1 + e^(-(B0 + B1X)))
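
A sketch fitting both models to data simulated from a probit specification (true coefficients are made up); statsmodels provides Probit and Logit directly:

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import norm

    rng = np.random.default_rng(9)
    n = 2000
    x = rng.normal(0, 1, n)
    y = (rng.uniform(size=n) < norm.cdf(-0.5 + 1.0 * x)).astype(int)  # probit data

    X = sm.add_constant(x)
    probit = sm.Probit(y, X).fit(disp=0)
    logit = sm.Logit(y, X).fit(disp=0)
    print(probit.params)          # close to (-0.5, 1.0)
    print(logit.params)           # logit coefficients differ in scale from probit ones
    print(probit.predict(X)[:5])  # predicted probabilities, always between 0 and 1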

III. Estimation and Inference in Probit and Logit

A. The R^2 and R-bar^2 don’t make sense here, so two other specialized measures are used (both computed in the sketch below):

1. The fraction correctly predicted = the fraction of observations for which the predicted probability is >50% when Yi=1, or is <50% when Yi=0

2. The pseudo-R^2 measures the improvement in the value of the log likelihood relative to having no X’s; it simplifies to the R^2 in the linear model with normally distributed errors
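
A sketch computing both measures for a probit fit on simulated data (same made-up data-generating process as above):

    import numpy as np
    import statsmodels.api as sm
    from scipy.stats import norm

    rng = np.random.default_rng(10)
    n = 2000
    x = rng.normal(0, 1, n)
    y = (rng.uniform(size=n) < norm.cdf(-0.5 + x)).astype(int)
    X = sm.add_constant(x)
    probit = sm.Probit(y, X).fit(disp=0)

    p_hat = probit.predict(X)
    print("fraction correctly predicted:", np.mean((p_hat > 0.5) == (y == 1)))
    print("pseudo-R2:", probit.prsquared)  # McFadden: 1 - llf/llnull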

Chapter 12 - Instrumental Variables Regression

I. IV Regression: Why and What; Two Stage Least Squares


A. Can address three important threats to internal validity: omitted variable bias, simultaneous causality bias, and errors-in-variables bias; all three can result in E(u|X) ≠ 0, which can be fixed using an instrumental variable, Z

B. IV regression breaks X into two parts: a part that might be correlated with u, and a part that is not; by isolating the latter, it is possible to estimate B1

1. This is done using an instrumental variable, Zi, which is correlated with Xi but not with ui

2. Endogenous variable - one that is correlated with u

3. Exogenous variable - one that is uncorrelated with u

C. Two conditions for a valid instrument:

1. Instrument relevance: corr(Zi, Xi) ≠ 0

2. Instrument exogeneity: corr(Zi, ui) = 0

D. IV estimator with one X and one Z

1. Two Stage Least Squares (TSLS)

a) Stage 1 - isolate the part of X that is uncorrelated with u by regressing X on Z using OLS: Xi = π0 + π1Zi + vi

(1) Because Z is uncorrelated with ui, π0 + π1Zi is uncorrelated with ui; we do not know π0 and π1, but we can estimate them

(2) Compute the predicted values of Xi: X-hati = π-hat0 + π-hat1Zi, i = 1,...,n

b) Stage 2 - replace Xi with X-hati in the regression and regress Y on X-hati (both stages are sketched below)
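
A sketch of both stages on simulated data where X is endogenous by construction (instrument strength and true coefficients are made up); note that running the two stages by hand gives the right point estimate but incorrect second-stage standard errors, so dedicated IV routines are used in practice:

    import numpy as np
    import statsmodels.api as sm

    rng = np.random.default_rng(11)
    n = 5000
    z = rng.normal(0, 1, n)                        # instrument: relevant and exogenous
    u = rng.normal(0, 1, n)
    x = 0.8 * z + 0.5 * u + rng.normal(0, 1, n)    # x is endogenous: correlated with u
    y = 1.0 + 2.0 * x + u                          # true B1 = 2

    ols = sm.OLS(y, sm.add_constant(x)).fit()
    print("OLS slope (biased):", ols.params[1])

    # Stage 1: regress X on Z, keep fitted values (the part of X uncorrelated with u)
    stage1 = sm.OLS(x, sm.add_constant(z)).fit()
    x_hat = stage1.fittedvalues

    # Stage 2: regress Y on X-hat; the slope is the TSLS estimate of B1
    stage2 = sm.OLS(y, sm.add_constant(x_hat)).fit()
    print("TSLS slope:", stage2.params[1])         # close to 2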


II. The General IV Regression Model

A. Terminology:

1. Identification - in IV regression, whether the coefficients are identified

depends on the relation between the number of instruments (m) and the

number of endogenous regressors (k)

a) If there are fewer instruments than endogenous regressors, we can’t estimate B1,...,Bk

b) The coefficients B1,...,Bk are said to be:

(1) Exactly identified if m = k

(2) Overidentified if m > k

(3) Underidentified if m < k

B. Summary of Jargon:

1. Yi = B0 + B1Xi1 + … + BkXik + Bk+1Wi1 + … + Bk+rWir + ui

2. Xi1,...,Xik are the endogenous regressors (potentially correlated with ui)

3. Wi1,...,Wir are the included exogenous regressors (uncorrelated with ui) or control variables (included so that Zi is uncorrelated with ui once the W’s are included)

4. B0, B1,...,Bk+r are the unknown regression coefficients

5. Zi1,...,Zim are the m instrumental variables (the excluded exogenous variables)

C. IV Regression Assumptions

1. E(ui | Wi1,...,Wir) = 0

a) I.e., the exogenous regressors are exogenous

b) If the W’s are used as control variables:

(1) E(ui | Wi, Zi) = E(ui | Wi)

2. All variables are i.i.d.

3. The X’s, W’s, Z’s, and Y have nonzero, finite fourth moments

4. The instruments (Zi1,...,Zim) are valid
