Cheat Sheet Econometrics

A random variable is a variable whose value is unknown until observed. Probability density functions describe the probability of values occurring, while cumulative distribution functions describe the probability of a variable being less than or equal to a value. Joint probability density functions describe the joint probability of two random variables occurring together. Statistical independence occurs when one variable does not impact the other. The assumptions of simple linear regression include that the errors have mean zero and are homoskedastic and serially independent. Least squares regression minimizes the sum of squared residuals to estimate the coefficients. The properties of the least squares estimators depend on factors like the sample size and the sum of squares of the independent variable.

A random variable is a variable whose value is unknown until it is observed. If g(X) is a function of X, then g(X) is also random.

Probability Density Function – f(x)
- Probability of each possible value occurring

Cumulative Distribution Function – F(x)
- Probability that X ≤ x

Joint Probability Density Function
- Joint probability of X and Y occurring together
- Marginal probabilities are the PDFs of either the X or Y variable alone
- Conditional probability f(x|y): probability of x occurring if y is assumed to happen

Statistical Independence
- Occurs when x does not impact y
- Holds if f(x,y) = f(x)·f(y), or equivalently if f(x|y) = f(x)
- X and Y are ONLY statistically independent if one of the conditions above is true for every pair of x and y

The error term: e = y − β1 − β2x

Rules of mean: E(c) = c, E(cX) = cE(X), E(X + Y) = E(X) + E(Y)

Variance = measure of dispersion: var(X) = E[(X − E(X))²]

Assumptions of Simple Linear Regression
1. For each value of x, y = β1 + β2x + e
2. E(e) = 0; equivalently, E(y) = β1 + β2x
3. var(e) = σ² (the errors are homoskedastic)
4. The covariance between any pair of random errors ei and ej is zero: cov(ei, ej) = 0
   - Stronger version: the e's are statistically independent, therefore the values of y are statistically independent
5. The variable x is not random, and takes at least two different values
6. (optional) The values of e are normally distributed about their mean if the values of y are, and vice versa
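The SR1–SR5 setup above can be sketched as a small simulation; all parameter values here are illustrative assumptions, not from the text:

```python
# Sketch: simulating the simple linear regression model under SR1-SR5.
# The parameter values (beta1, beta2, sigma) are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

beta1, beta2, sigma = 3.0, 2.0, 1.0   # hypothetical true parameters
x = np.linspace(1.0, 10.0, 50)        # SR5: x fixed, takes >= 2 values

# SR2-SR4: errors have mean zero, constant variance, zero covariance
e = rng.normal(0.0, sigma, size=x.shape)

# SR1: the data-generating process y = beta1 + beta2*x + e
y = beta1 + beta2 * x + e
```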

Rules of Summation
- Σ(cxi) = cΣxi; Σ(xi + yi) = Σxi + Σyi; Σc = Nc (for a constant c)

Covariance
- cov(X,Y) = E[(X − E(X))(Y − E(Y))]

Least squares regression
- General form: ŷi = b1 + b2xi
- Least squares residual: êi = yi − ŷi = yi − b1 − b2xi

Generating the least squares estimates
- Minimise the function: S(b1, b2) = Σ(yi − b1 − b2xi)²
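Minimising the sum of squared residuals has the familiar closed-form solution, sketched below on made-up data:

```python
# Minimising S(b1,b2) = sum((y - b1 - b2*x)^2) gives the closed-form
# least squares estimates; the toy data here are invented.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

def ols(x, y):
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    b1 = y.mean() - b2 * x.mean()
    return b1, b2

b1, b2 = ols(x, y)
resid = y - (b1 + b2 * x)   # least squares residuals; they sum to zero
```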

Correlation
- ρ = cov(X,Y) / (σX·σY)
- If ρ = ±1, perfect positive/negative correlation
- If ρ = 0, no correlation; also cov(X,Y) = 0

Elasticities (for a 1% change in x, there is an {elasticity}% change in y)
- Elasticity of mean expenditure with respect to income: ε = (ΔE(y)/Δx)·(x/E(y)) = β2·x/E(y)
- Often evaluated at a representative point on the regression line, such as the sample means (x̄, ȳ)
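A minimal sketch of the correlation and elasticity-at-the-means calculations, using invented data and a hypothetical slope:

```python
# Correlation rho = cov(X,Y)/(sd(X)*sd(Y)), and the elasticity of a fitted
# line evaluated at the sample means. Data and slope are illustrative.
import numpy as np

x = np.array([10.0, 20.0, 30.0, 40.0])
y = np.array([5.0, 9.0, 13.0, 17.0])    # exactly linear in x here

rho = np.corrcoef(x, y)[0, 1]           # lies in [-1, 1]

b2 = 0.4                                # hypothetical fitted slope
elasticity = b2 * x.mean() / y.mean()   # % change in y for a 1% change in x
```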

The estimator b2
- b2 = Σ(xi − x̄)(yi − ȳ) / Σ(xi − x̄)²
- b2 can be written as a weighted sum of the yi
- E(b2) = β2 (if the model assumptions hold), hence the estimator is unbiased

Normal Distribution – X ~ N(µ, σ²)
- Standard normal distribution: Z = (X − µ)/σ ~ N(0, 1)
- A weighted sum of normal variables is normally distributed

Variances/covariances of the OLS estimators
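The unbiasedness claim E(b2) = β2 above can be checked with a quick Monte Carlo sketch (the true values are illustrative assumptions):

```python
# Sketch: repeatedly draw samples from the SR1-SR5 model and average the
# b2 estimates; the average should be close to the true beta2 = 0.5.
import numpy as np

rng = np.random.default_rng(42)
beta1, beta2 = 1.0, 0.5               # hypothetical true values
x = np.linspace(0.0, 10.0, 30)        # fixed regressor across samples

draws = []
for _ in range(2000):
    y = beta1 + beta2 * x + rng.normal(0.0, 1.0, size=x.shape)
    b2 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    draws.append(b2)

mean_b2 = float(np.mean(draws))       # should be close to beta2
```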

On the variances of the least squares estimators
- The larger the error variance σ², the greater the uncertainty in the statistical model, and the larger the variances and covariance of the least squares estimators
- The larger the sum of squares Σ(xi − x̄)², the smaller the variances of the LSE and the more precisely we estimate the unknown parameters
- The larger the sample size N, the smaller the variances and covariances of the LSE
- The larger the term Σxi², the larger the variance of the LSE b1
- The absolute magnitude of the covariance increases with the magnitude of the sample mean x̄, and the covariance has a sign opposite to that of x̄

Properties of probability distributions
- Mean = key measure of centre
- Can calculate a conditional mean

Simple Regressions
- General form: y = β1 + β2x + e
- Slope of the simple regression: β2 = dE(y)/dx
- Elasticity: β2·x/E(y)

Log-linear function: ln(y) = β1 + β2x
- Slope: dy/dx = β2·y
- Elasticity: β2·x

Hypothesis testing steps (step 1, stating the hypotheses, is listed under Hypothesis Testing below):
2. Calculate the test statistic
3. Decide on α
4. Calculate tc (1−α OR 1−α/2, df = N−2) using t-tables and a sketch of the rejection region
5. Rule: reject H0 if |t| > tc
6. Conclude in terms of the problem at hand
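The variance comparisons above, based on var(b2) = σ²/Σ(xi − x̄)², can be illustrated numerically; the x values are invented:

```python
# var(b2) = sigma^2 / sum((x - xbar)^2): more spread in x lowers the
# variance of b2, less spread raises it. Numbers are illustrative.
import numpy as np

def var_b2(sigma2, x):
    return sigma2 / np.sum((x - np.mean(x)) ** 2)

x_narrow = np.array([4.0, 5.0, 6.0])   # little spread about the mean
x_wide = np.array([1.0, 5.0, 9.0])     # same mean, more spread

v_narrow = var_b2(1.0, x_narrow)       # = 1/2
v_wide = var_b2(1.0, x_wide)           # = 1/32, much smaller
```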
Log-linear function (continued)
- Semi-elasticity: 100·β2 = % change in y for a 1-unit change in x

Gauss-Markov Theorem
- Under assumptions SR1–SR5, the estimators b1 and b2 have the smallest variance of all linear and unbiased estimators of β1 and β2. They are BLUE – Best Linear Unbiased Estimators.

Facts about the GMT
1. The estimators b1 and b2 are best compared to similar estimators – those which are linear and unbiased. It does not state that they are the best of all possible estimators.
2. The estimators are best in their class because of minimum variance. We always want to use the one with the smallest variance, because that estimation rule gives a higher probability of obtaining an estimate close to the true parameter value.
3. If any of the assumptions SR1–SR5 does not hold, then the OLS estimators are not best linear unbiased estimators.
4. The GMT does not depend on the assumption of normality (SR6).
5. The GMT applies to the LS estimators – not to LS estimates from a single sample.

Regression with indicator variables (preview)
- Used to compare the difference between two groups – the slope is the difference between the population means

Goodness of fit and modelling issues
- R² explains the proportion of the variance in y about its mean that is explained by the regression model
- Mention omitted variables if R² is low

Interval estimation and hypothesis testing

Point v Interval Estimates
- Point estimate – b2 is the point estimate of the unknown population parameter in the regression model
- Interval estimate – a range of values in which the true parameter is likely to fall

How to make an interval estimate of β2 (when the population s.d. is unknown)
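The R² measure described under "Goodness of fit" above can be computed as 1 − SSE/SST; the fitted values below are hypothetical:

```python
# R^2 = 1 - SSE/SST: the proportion of the variation in y about its mean
# explained by the regression. The y and yhat values are invented.
import numpy as np

y = np.array([2.0, 4.0, 5.0, 7.0])
yhat = np.array([2.5, 3.5, 5.5, 6.5])   # hypothetical fitted values

sse = np.sum((y - yhat) ** 2)           # sum of squared errors = 1
sst = np.sum((y - y.mean()) ** 2)       # total sum of squares = 13
r2 = 1.0 - sse / sst                    # = 12/13
```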

Normality assumption
- If we make assumption SR6, the least squares estimators are normally distributed

CLT
- If SR1–SR5 hold, and the sample size N is sufficiently large, then the least squares estimators are approximately normal

Estimating σ²
- σ̂² = Σêi² / (N − 2)

Normalise by converting to Z: Z = (b2 − β2)/√var(b2) ~ N(0, 1)
- BUT DO NOT USE Z!
- Use the critical t and its degrees of freedom instead, because the population s.d. is unknown: t = (b2 − β2)/se(b2) ~ t(N−2)
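A minimal computation of σ̂² = Σêi²/(N − 2), using invented residuals:

```python
# Estimating the error variance from least squares residuals.
# The residuals below are hypothetical.
import numpy as np

resid = np.array([0.5, -0.3, 0.1, -0.4, 0.1])   # invented LS residuals
N = resid.size
sigma2_hat = np.sum(resid ** 2) / (N - 2)       # df = N - 2 in simple regression
```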
Estimating Variance and Covariance
- var̂(b2) = σ̂² / Σ(xi − x̄)²
- Collecting these terms gives the variance-covariance matrix of (b1, b2)

Standard error of b2
- se(b2) = √var̂(b2) – i.e. the s.e. describes how b2 varies from sample to sample when used to construct the various b2's

Obtaining Interval Estimates
- Find the critical tc percentile value t(1−α/2, m) for m degrees of freedom
- Then solve b2 ± tc·se(b2) for the upper and lower limits
- Interpretation: 'When the procedure we used is applied to many random samples of data from the same population, then 95% of all the interval estimates constructed using this procedure will contain the true parameter'

Hypothesis Testing
- Steps:
1. State the hypotheses
2–6. Then proceed as in the steps listed earlier: calculate the test statistic, decide on α, find tc, reject H0 if |t| > tc, and conclude in terms of the problem

Multiple Regression
- Assumptions: E(e) = 0, var(e) = σ² [i.e. homoskedastic], cov(ei, ej) = 0
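The interval-estimate and t-test steps above can be put together in a short sketch with hypothetical numbers (tc = 3.182 is the t(0.975, df = 3) table value):

```python
# Interval estimate b2 +/- tc*se(b2) and the t statistic for H0: beta2 = 0.
# b2 and se(b2) are invented numbers, as in a worked example.
b2 = 1.99                 # hypothetical point estimate
se_b2 = 0.05              # hypothetical standard error
tc = 3.182                # t(0.975, df = 3) from t-tables

lower = b2 - tc * se_b2   # lower limit of the interval estimate
upper = b2 + tc * se_b2   # upper limit

t_stat = (b2 - 0.0) / se_b2   # test H0: beta2 = 0 vs H1: beta2 != 0
reject = abs(t_stat) > tc     # reject H0 if |t| > tc
```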


Estimating nonlinear relationships
- Quadratic model: y = β1 + β2x + β3x² + e

Assumptions about explanatory variables
- Explanatory variables are not random (i.e. known prior to finding the value of the dependent variable)
- No explanatory variable is an exact linear function of another – otherwise there is exact collinearity and LS fails

Finding the OLS estimators
- Minimise S(β1, …, βK) = Σ(yi − β1 − β2xi2 − … − βKxiK)²
- The LS estimators are random variables

Estimating σ²
- σ̂² = Σêi² / (N − K), where K = number of β parameters being estimated
- Var-covar matrix: use σ̂² in place of σ²

Hypothesis testing of βk
- *NOTE: df = N − K!*

Model specification tips
1. Choose variables and functional form on the basis of theoretical and general understanding of the relationship
2. If the estimated equation has coefficients with unexpected signs or unrealistic magnitudes, this may be caused by misspecification such as omission of an important variable
3. Can perform significance tests to decide whether a variable or group of variables should be included
4. Can test the adequacy of a model using RESET (rejection suggests the functional form is not good)

Omitted variables
- Omitting a relevant variable leads to a biased estimator
- Can be viewed as (wrongly) setting βomitted = 0

Irrelevant Variables
- Can increase the variance of the included variables, i.e. reduce the precision of those variables

Indicator variables in multiple regression
- E.g., include N, S and E dummies – the base (reference) category will be W
- Can apply an F test to test the joint significance of the dummies

Log-linear models with a dummy variable
- Can approximate the % gap between M/F by δ, the dummy coefficient
- For a better calculation, use 100·(e^δ − 1) (the % change between Dummy = 1 and D = 0)
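For multiple regression, the same minimisation in matrix form gives b = (X'X)⁻¹X'y; the data below are made up:

```python
# OLS in matrix form for multiple regression, with sigma_hat^2 = SSE/(N-K).
# The design matrix and y are invented for illustration.
import numpy as np

N = 6
X = np.column_stack([
    np.ones(N),                               # intercept column
    np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]),
    np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0]),
])
y = np.array([3.0, 4.0, 8.0, 9.0, 13.0, 14.0])

K = X.shape[1]                                # number of beta parameters
b = np.linalg.solve(X.T @ X, X.T @ y)         # OLS estimates
resid = y - X @ b
sigma2_hat = resid @ resid / (N - K)          # df = N - K
```

A defining property of OLS is that the residuals are orthogonal to every regressor (X'ê = 0), which makes a convenient sanity check.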
Interval estimation in multiple regression
- bk ± tc·se(bk), with df = N − K

Testing joint hypotheses
- E.g. H0: β4 = β5 = β6 = 0; H1: any of β4, β5, β6 ≠ 0
- Estimate the unrestricted model with all the xi included
- Estimate the restricted model with x4, x5, x6 excluded from the equation for y
- Calculate SSER and SSEU
- The F statistic determines whether the reduction in the SSE is large or small: F = ((SSER − SSEU)/J) / (SSEU/(N − K))
- Compare with F crit(J, N − K) (J is read horizontally, i.e. across the columns of the F table)
- J = # of restrictions (i.e. terms removed), N = # of observations, K = # of coefficients in the unrestricted model including the constant

Steps in an F test
1. State H0 and H1
2. Specify the test statistic and its distribution
3. Set the significance level and determine the rejection region
4. Calculate the sample value of the test statistic
5. State the conclusion

Testing the significance of the model (test of the overall significance of the regression model)
1. State H0 (all slope coefficients βk = 0) and H1 (at least one ≠ 0)
2. Continue as in the F-test steps above

Relationship between t- and F-tests
- When the F test is for a single β, F = t²

Collinearity
- If corr(x2, x3) = 1, then var(b2) → ∞; likewise if x2 has no variation (i.e. it is collinear with the constant term)
- With exact collinearity we cannot find the LS estimators and cannot obtain estimates of the βk
- Impacts:
  - Estimator SEs are large, so t tests will likely lead to the conclusion that parameter estimates are not significantly different from 0
  - Estimators may be very sensitive to the addition/deletion of a few observations, or to the deletion of an apparently insignificant variable
  - Accurate forecasts may still be possible if the nature of the collinear relationship remains the same within the out-of-sample observations

Indicator variables
- Used to construct models in which some or all of the regression parameters, including the intercept, change for some observations in the sample
- D = 0 is the reference (base) group
- Intercept indicator (dummy) variable: shifts the intercept
- Interaction variable (slope indicator/slope dummy): shifts the slope

Dummy var trap
- Cannot include both L and (NOT L) – this will make exact collinearity with the intercept

Linear Probability Model
- The probability function of a binary y is a Bernoulli distribution: f(y) = p^y(1 − p)^(1−y)
- BUT the variance of the error term is not homoskedastic, and p(x) can be < 0 or > 1 (i.e. problems with the model)

Heteroskedasticity
- When the variance of e is not constant – i.e. it increases/decreases, or some combination. NOT "randomly distributed residuals"!
- E.g. var(e) increases as x increases → y and e are heteroskedastic
- Therefore the LS assumptions are violated, as the variance is a function of x

Two implications of heteroskedasticity
1. The LSE are still linear and unbiased – but not best; there is another, better estimator
2. The standard errors usually computed for the LS estimators are incorrect; CIs and hypothesis tests may be misleading
- Need to use a heteroskedasticity-consistent estimator of var(b2), not the usual formula
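The joint-hypothesis F statistic above can be sketched as a small helper; all inputs are illustrative:

```python
# F statistic for a joint test: F = ((SSE_R - SSE_U)/J) / (SSE_U/(N-K)).
# The SSE values, J, N and K below are invented for illustration.
def f_stat(sse_r, sse_u, J, N, K):
    return ((sse_r - sse_u) / J) / (sse_u / (N - K))

F = f_stat(sse_r=120.0, sse_u=100.0, J=2, N=54, K=4)
# Compare F with the critical value F(J, N-K) at the chosen significance level
```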
Detecting heteroskedasticity
- Visually (informal – there should be no pattern in the residuals)

Lagrange multiplier (Breusch-Pagan) Test
- Substitute ê² as the dependent variable – then the R² from the equation measures the proportion of the variation in ê² explained by the Zs
- Use a chi-square test – test statistic: χ² = N×R², compared with the chi-square critical value
- BUT: a large-sample test only

Hal White test
- Can test for hetero without precise knowledge of the relevant variables – sets the Zs equal to the xs, the x²s, and possibly the cross-products

Heteroskedasticity-robust standard errors
- Help to ensure that CIs and test statistics are correct when there is heteroskedasticity
- BUT do not address the other impacts of hetero – the LS estimator is no longer best
- Failing to address this may not be too serious – with large N, the variance of the LS estimators may be small enough to get adequately precise estimates
- To find an alternative estimator with lower variance, it is necessary to specify a suitable variance function; using LS with robust SEs avoids the need to specify one

Tough MCQs
- When collinear variables are included in an econometric model, coefficient estimates are: d) unbiased but have larger standard errors
- If you reject the null hypothesis when performing a RESET test, what should you conclude? d) an incorrect functional form was used
- How does including an irrelevant variable in a regression model affect the estimated coefficients of the other variables in the model? d) they are unbiased but have larger standard errors
- If X has a negative effect on Y and Z has a positive effect upon Y, and X and Z are negatively correlated, what is the expected consequence of omitting Z from a regression of Y on X? a) the estimated coefficient on X will be biased downwards (too negative)
- What are the consequences of using least squares when heteroskedasticity is present? NONE of: a) no consequences, coefficient estimates are still unbiased; b) confidence intervals and hypothesis testing are inaccurate due to inflated standard errors; c) all coefficient estimates are biased for variables correlated with the error term; d) it requires very large sample sizes to get efficient estimates

Exam Qs

Q: Write down a model that allows the variance of e to differ between men and women. The variance should not depend on other factors.
- σ² = α1 + α2·male
- For full marks the variance function should be in terms of sigma squared – I don't mind what greek letters are used for the coefficients

Q: Is the estimated variance of e higher for men or for women? [5 points]
- The estimated variance of e is lower for men than for women. The estimated coefficient suggests that the variance is lower for men by 28,849.63. Must state that it's lower and say by how much it is lower for full marks.

Q: Is the variance of e statistically different for men and for women? [5 points]
- Hypothesis test of the male coefficient. Required: hypotheses (1 point); test statistic / t critical / alpha OR p-value (3 points); conclusion (1 point)

Q: Conduct an appropriate test for the presence of heteroskedasticity. What do you conclude? Show all working.
- State the equation to use for testing hetero: ê² = α1 + α2·male + v
- Hypotheses (1 point): H0: α2 = 0 (homoskedasticity); H1: α2 ≠ 0 (heteroskedasticity)
- Test statistic (1 point): χ² = N×R² = 706×0.0016 = 1.1296
- Level of significance, df, chi-square critical value – any level of significance can be used. For 0.05 and df = 1, the critical value is 3.841 (1 point)
- Conclusion (1 point): since the test statistic is not greater than the critical value, we cannot reject the null hypothesis of homoskedasticity. There is no heteroskedasticity in the model.

Q: Depending on your result from part (16), what changes should be made to your model?
- Since the test in part (16) concludes that there is no hetero present, we don't need to do anything and can estimate the model as specified.
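The chi-square calculation in the heteroskedasticity test above can be reproduced directly (numbers from the worked answer):

```python
# Breusch-Pagan style test from the worked answer: regress e_hat^2 on the
# candidate variable, then chi2 = N * R^2. N and R^2 are from the text.
N = 706
r2_aux = 0.0016          # R^2 of the auxiliary regression of e_hat^2 on 'male'
chi2 = N * r2_aux        # test statistic = 1.1296
chi2_crit = 3.841        # chi-square critical value, df = 1, alpha = 0.05

reject_homoskedasticity = chi2 > chi2_crit   # False: cannot reject H0
```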
Q: Suppose [equation] includes hetero – what does this mean for [CI/hyp tests]?
- For full marks, I expect an explanation of heteroskedasticity, the consequences, and why the tests are unreliable.
- E.g.: Heteroskedasticity is a violation of the GM assumption of constant error variance (homoskedasticity). The variance of the error term under hetero is no longer constant. (2 points) In the presence of hetero, standard errors will be biased and test statistics therefore unreliable, since they depend on the estimates of the standard errors. (3 points)
- NOTE: the White and Breusch-Pagan tests may give different results
- Heteroskedasticity-consistent standard errors (robust standard errors) are valid in large samples for both hetero- and homoskedastic errors

Q: ln(WAGE) = β1 + β2EDUC + β3EDUC² + β4EXPER + β5EXPER² + β6HRSWK + e

d) Suppose you wish to test the hypothesis that a year of education has the same effect on ln(WAGE) as a year of experience. What null and alternative hypotheses would you set up? (5 marks)
- Education and experience have the same effect on ln(WAGE) if β2 = β4 and β3 = β5. The null and alternative hypotheses are: H0: β2 = β4 and β3 = β5; H1: β2 ≠ β4 or β3 ≠ β5 or both

e) What is the restricted model, assuming that the null hypothesis is true? (5 marks)
- The restricted model assuming the null hypothesis is true is: ln(WAGE) = β1 + β4(EDUC + EXPER) + β5(EDUC² + EXPER²) + β6HRSWK + e

f) Given that the sum of squared errors from the restricted model is SSER = 254.1726, test the hypothesis in (d). (For SSEU use the relevant value from the table of output above. The sample size is N = 1000.)
- F = ((SSER − SSEU)/J) / (SSEU/(N − K)) = ((254.1726 − 222.6674)/2) / (222.6674/994) = 70.32
- The 5% critical value is F = 3.005. Since the F statistic is greater than the F critical value, we reject the null hypothesis and conclude that education and experience have different effects on ln(WAGE).
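The F calculation in part (f) can be reproduced directly (SSE values from the text; K = 6 parameters, so N − K = 994):

```python
# Reproducing the exam F calculation: F = ((SSE_R - SSE_U)/J)/(SSE_U/(N-K))
# with SSE_R = 254.1726, SSE_U = 222.6674, J = 2, N = 1000, K = 6.
sse_r, sse_u = 254.1726, 222.6674
J, N, K = 2, 1000, 6

F = ((sse_r - sse_u) / J) / (sse_u / (N - K))   # about 70.32
f_crit = 3.005                                  # 5% critical value from the text
reject = F > f_crit                             # True: reject H0
```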
