Cheat Sheet Compilation
Old final questions - Atkins Final Spring 2015 - Watchter Final Spring 2014
1) If x and y are complements, an increase in the price of x will lead to a decrease in the quantity demanded of y. If x and y
are substitutes, an increase in the price of x will lead to an increase in the quantity demanded of y. Look at the sign of the
coefficients of the relevant variables to make an assessment.
2) SSE_k ≥ SSE_(k+1) always
3) The large constant estimate in both regressions suggests that there is no implication regarding the impact of the
other variables.
4) Which of the following statements is true regarding var(B^ 2) when estimated by IV using z as an instrument for x?
Instrumental variable estimation leads to larger variance of estimates compared to OLS.
5) What is the null hypothesis when performing an F-test to test the strength of multiple instruments j = 1,...,J? The
instruments are weak, with no coefficient θj on the instruments in the first stage different from 0.
6) In a certain country, most people are poor but some people are very rich. Let X be the wealth of an individual randomly
chosen from the population of that country. Then, the mean of X is greater than the median of X. In addition, the
skewness of X is positive.
7) When you are implementing an instrumental variable regression, you are worried about
a) a potential direct effect of the IV on outcome
b) a weak relationship b/n IV and endogenous variable
c) a remaining correlation of IV and error term
8) The interpretation of slope coefficient βk in the model is: a change in xki by one unit is associated with a βk change in
Y, holding the other K-1 regressors constant.
9) A high R2 or adjusted-R2 does not always mean that an added variable is statistically significant.
10) Omitting a variable which is relevant can result in a negative value for the coefficient of the included variable, even
though the included variable would have a significant positive effect on Y if the omitted variable were included.
11) As sample size N increases, the length of the 100(1 - α)% confidence interval decreases.
12) One will reject H0: βk = 0 against H1: βk > 0 at significance level α: the 100(1 - α)% confidence interval cannot help in
testing this hypothesis.
13) SST is measured differently for y = β0 + β1x1 + e and ln(y) = β0 + β1x1 + e.
14) To test whether or not the population regression is linear rather than a polynomial of order r, use the test of (r-1)
restrictions using the F-statistic.
to test that the effect of xi on y is identical for both values of Di you must use a t-test for H0: β3 = 0.
17) When an exogenous variable is used, IV estimators are consistent and approximately normally distributed in large
samples.
18) What is the null hypothesis when performing an F-test to test the strength of multiple instruments j = 1,..., J? The
instruments are weak, with no coefficient θj on the instruments in the first stage different from 0.
19) TestScore_i = 607.3 + 3.85 Income_i - 0.0423 Income_i^2. The equation suggests a positive relationship b/n test scores
and income until a value of the income of approximately 45.508.
20) ln(WAGE) = 1.439 + 0.0834EDU + 0.0512EXPER + 0.1932WHITE. White employees earn approximately 19.32% more than
non-whites, holding education and experience constant.
The log-linear regression equation can be interpreted such that a one unit increase in an xk variable (independent
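The two numeric answers in items 19 and 20 can be checked with a short sketch. The function names below are illustrative and not from the exams; the coefficients are the ones quoted above. The second function computes the exact percentage gap 100(e^b - 1), which is a standard refinement of the "100 times the coefficient" reading used in item 20, not something the answer key itself states.

```python
import math

def turning_point(linear_coef, quadratic_coef):
    """Income at which d(TestScore)/d(Income) = linear + 2*quadratic*Income equals 0."""
    return -linear_coef / (2.0 * quadratic_coef)

def exact_percent_effect(dummy_coef):
    """Exact percentage difference implied by a dummy coefficient when the
    dependent variable is ln(y): 100 * (exp(b) - 1)."""
    return 100.0 * (math.exp(dummy_coef) - 1.0)

# Item 19: TestScore = 607.3 + 3.85*Income - 0.0423*Income^2
income_star = turning_point(3.85, -0.0423)

# Item 20: coefficient on WHITE in the ln(WAGE) equation
approx_gap = 100.0 * 0.1932           # the approximation used in the answer key
exact_gap = exact_percent_effect(0.1932)

print(round(income_star, 3), round(approx_gap, 2), round(exact_gap, 2))
```

The approximation and the exact figure are close for small coefficients and drift apart as the coefficient grows.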
Buchinsky 2011
● model that uses income to predict monthly expenditures on transportation -- explanatory variable: income
● NOT assumption of SLR Model - parameter estimate of B2 is unbiased.
● OLS model, N increases, variance b2 decreases
● MR model: which does NOT lead to larger variances of least square estimator b2 and var(b2) - larger correlation
between x2 and y
● degrees of freedom in denominator of F-distribution - the number of observations minus number of coefficients
estimated (N-K)
https://www.coursehero.com/file/12911867/CheatSheetCompilation/
● how does omitting relevant variable from regression model affect estimated coefficient of other variables - biased, can
be positive or negative
● when collinear variables are included in an econometric model coefficient estimates are unbiased but have larger
standard errors
● Running auxiliary regressions where each explanatory variable is estimated as a function of the remaining explanatory
variables can help detect collinearity
● adjusted R^2 is better measure than R^2 because it adjust R^2 for the number of variables in the regression
● log log regression - corrx2, x3 = 0; such model cannot be estimated by OLS
● model with B3>0, test B2/B3 = 1-- then you can use t-test (or F-test) considering the null H0: B2-B3=0.
● Gauss Markov does NOT depend on assumption: values of e are normally distributed
● any given linear model, let b1 be the OLS estimator of B1. b1 is a random variable, whereas B1 is not.
● country w mostly poor and some very rich. mean of X is greater than median. Skewness is POSITIVE
● if both male and female dummy variables are in a regression with a constant: regression cannot be estimated due to perfect collinearity
● police regression: in order to measure this impact we need to use an instrumental variable
● regression of one more year of education on the wages for blacks: effect given by B2 + B4.
● restricted model has smaller R^2 compared to the original model
● randomized, controlled experiments are needed to accurately measure treatment effects without omitted-variable bias
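The auxiliary-regression check for collinearity mentioned in the list above can be sketched as follows. This is a minimal illustration on simulated data, not output from any exam; the variable names, the simulated design, and the use of the variance inflation factor 1/(1 - R^2) as a summary are all assumptions made for the example.

```python
import numpy as np

def auxiliary_r2(X, j):
    """R^2 from regressing column j of X on a constant and the remaining columns."""
    target = X[:, j]
    others = np.delete(X, j, axis=1)
    Z = np.column_stack([np.ones(len(target)), others])
    coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
    resid = target - Z @ coef
    sse = resid @ resid
    sst = ((target - target.mean()) ** 2).sum()
    return 1.0 - sse / sst

# Hypothetical data: x3 is almost a linear function of x2, x4 is unrelated.
rng = np.random.default_rng(0)
n = 500
x2 = rng.normal(size=n)
x3 = 0.95 * x2 + 0.1 * rng.normal(size=n)
x4 = rng.normal(size=n)
X = np.column_stack([x2, x3, x4])

# One auxiliary regression per explanatory variable.
r2 = [auxiliary_r2(X, j) for j in range(X.shape[1])]
vif = [1.0 / (1.0 - r) for r in r2]
for name, r, v in zip(["x2", "x3", "x4"], r2, vif):
    print(name, round(r, 3), round(v, 1))
```

A high auxiliary R^2 for x2 and x3 flags them as nearly collinear, while x4 shows an auxiliary R^2 close to zero.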
Atkins - Practice Final Spring 2015
● The interpretation of the slope coeff Bk is - a change in xki by one unit is associated with a Bk change in Y, holding all
other K-1 regressors constant
● R2 is a valid measure of goodness of fit of the regression if - the regression has a constant term
● False - a high R2 or Rbar2 always means that an added variable is statistically significant
● If you had a two regressor regression model, then omitting one variable which is relevant → can result in a neg value for
the coefficient of the included variable, even though the coefficient will have a significant positive effect on Y if the
omitted variable were included
● When collinear variables are included in an econometric model, coefficient estimates are → unbiased but they have
larger standard errors
● If one rejects the null hypoth H0: Bk = 0 against H1: Bk ≠ 0 at the significance level alpha then → it cannot be determined
whether he/she will reject it for H0: Bk = 0 against H1: Bk > 0
● To decide whether (linear reg) or (log regression) fits the data better, you cannot consult the regression R2 because →
the SST are not measured in the same units between the two models
● To test whether or not the population regression function is linear rather than a polynomial of order r → use the test of
(r-1) restrictions using the F-statistic
● The binary variable interaction regression → allows the effect of changing one of the binary independent variables to
are weak with no coeff theta on the instruments in the first stage different from 0
● when you are implementing an instrumental variable regression, you are worried about → all of the following = a
potential direct effect of the instrumental variable on the outcome; a weak relationship between the instrumental
variable and the endogenous variable; a remaining correlation of the instrumental variable and the error term
● Running aux regressions where each explanatory variable is estimated as a function of the remaining explanatory
variables can help detect → collinearity
● The adjusted R2 Rbar2 is a better measure than R2 because → it adjusts R2 for the number of variables in the regression
● In a simple regression model, the Gauss-Markov Th does not depend on → the values of e are normally distributed
● Which cannot cause the least squares estimator to be biased → heteroskedasticity: the random error variance σ^2 is not
constant across different observations
● In a multiple regression, if we change the units of measurement of y by multiplying it by some constant, estimator of B0,
B1, B2 and standard errors of B0, B1, B2 would change → R2 would not change
● With a log model (log-log) → if xi increases by 1%, yi increases by B1 percent
● With a full model and a restricted model → R2 of the full model is always greater than (or equal to) that of the restricted model
● Under heteroskedasticity, the OLS estimators → are still linear but no longer attain the minimum variance (still
unbiased)
be biased if xK is not correlated with x2, ..., x_(K-1)
● One will reject the null hypothesis H0: Bk = 0 against H1: Bk > 0 at the significance level α → the 100(1-α)% confidence
interval cannot help in testing this hypothesis
● Suppose that Var(ei) = σi^2 (the error variance differs across observations). Then the Gauss-Markov th → does not apply to the least squares estimator B1... BK
● If the x’s in the regression x1….xK are uncorrelated, then → we cannot determine the sign for Cov(bk, bl) for all k ≠
l, k,l = 1...K
● In a given data set the larger the R2 → the lower is SSE
● Choosing the right functional form for the regression is important b/c → it allows one to capture the specific feature
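The omitted-variable point in the list above (the included variable's coefficient can turn negative even though its true effect is positive) can be illustrated with a small simulation. Everything here is hypothetical: the data-generating process, the coefficient values, and the helper name `ols` are chosen only to make the sign flip visible.

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients for y on a constant plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical design: x2 has a true effect of +1 on y, but it is strongly
# negatively correlated with x3, which has a larger positive effect (+3).
rng = np.random.default_rng(1)
n = 5000
x2 = rng.normal(size=n)
x3 = -0.9 * x2 + 0.3 * rng.normal(size=n)
y = 1.0 + 1.0 * x2 + 3.0 * x3 + rng.normal(size=n)

b_long = ols(y, x2, x3)    # correctly specified model
b_short = ols(y, x2)       # x3 omitted

print("coefficient on x2, x3 included:", round(b_long[1], 2))
print("coefficient on x2, x3 omitted: ", round(b_short[1], 2))
```

In this design the short-regression coefficient is pulled toward 1 + 3(-0.9) = -1.7, so the bias is negative and large enough to reverse the sign.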
Hypothesis Testing
● If one rejects H0: Bk = 0 for all k = 2,...,K against H1: Bk ≠ 0 for at least one k, at the significance level α1, then he will
reject it for all α2 > α1
● If we were to reject the H0: Bk = 0 against H1: Bk ≠ 0 for some k, k = 2,...,K then we are likely to reject H0: B2 = … = BK =
● Consider the case in which K = 3, let x2i = 1 if the person is a male and zero otherwise, and x3i = 1 if the person lives in
the LA area and zero otherwise, while yi is the individual’s earnings. The average earnings for a female living in LA is B1 +
B3
● If one rejects H0: Bk = 0 against H1: Bk ≠ 0 at significance level α, then it cannot be determined whether he will reject
it for H0: Bk = 0 against H1: Bk > 0
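Two of the hypothesis-testing statements above can be checked numerically: rejecting at α1 implies rejecting at any larger α2, and a two-sided rejection does not settle the one-sided test against Bk > 0. The sketch below assumes a large sample so that the t-statistic is treated as standard normal; the t-values and function names are made up for illustration.

```python
import math

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_two_sided(t):
    """p-value for H1: Bk != 0 (large-sample normal approximation)."""
    return 2.0 * (1.0 - normal_cdf(abs(t)))

def p_right_tail(t):
    """p-value for H1: Bk > 0 (large-sample normal approximation)."""
    return 1.0 - normal_cdf(t)

def reject(p_value, alpha):
    return p_value < alpha

for t in (2.5, -2.5):   # hypothetical t-statistics
    print(
        t,
        reject(p_two_sided(t), 0.05),   # two-sided test at 5%
        reject(p_right_tail(t), 0.05),  # one-sided test at 5%
        reject(p_two_sided(t), 0.10),   # same two-sided test at a larger alpha
    )
```

Both t-values reject the two-sided null at 5% (and therefore at 10%), but only the positive one rejects against Bk > 0, which is why the sign of the estimate is needed before the one-sided conclusion can be drawn.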
Assumptions
Assumptions of Simple Linear Regression Model
SR1: y = β1 + β2x2 + e
MR5: each xki is nonrandom, and is not an exact linear function of the other explanatory variables
MR6: (optional) ei is normally distributed
Math-Based Questions
○ SE = sq rt of variance
● To find the point at which a differentiable function changes direction (the slope changes sign), take the first derivative and set it equal to 0
○ for Testscore = c + a·Income + b·Income^2: dTestscore/dIncome = a + 2b·Income = 0, so Income = -a/(2b)
● t-stat = (λ̂ - c) / se(λ̂), where λ̂ is the estimated value of λ
○ se(λ̂) is always positive, so the sign of λ̂ - c (the sign of λ̂ when c = 0) determines the sign of
the t-stat
○ se(λ̂) = sq rt of variance(λ̂)
○ from there, make se(λ̂) the denominator and solve for the t-stat formula
○ 1 - R_2^2 = SSE_2 / SST (R_2^2 and SSE_2 are the R^2 and SSE of model 2)
○ (R_2^2 - R_1^2) / (1 - R_2^2) = (SSE_1 - SSE_2) / SSE_2
● SST = SSE + SSR
● SSE = SST - SSR
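The R^2 and SSE identities above are what make the two common forms of the F-statistic for J restrictions agree. The sketch below uses the standard textbook formula F = [(SSE_R - SSE_U)/J] / [SSE_U/(N-K)], which is not written out in these notes, and entirely hypothetical numbers for SST, SSE, J, N and K.

```python
def f_stat_from_sse(sse_restricted, sse_unrestricted, j, n, k):
    """F-statistic for j restrictions from the two sums of squared errors."""
    return ((sse_restricted - sse_unrestricted) / j) / (sse_unrestricted / (n - k))

def f_stat_from_r2(r2_restricted, r2_unrestricted, j, n, k):
    """Same statistic written with R^2 (valid when both models share the same SST)."""
    return ((r2_unrestricted - r2_restricted) / j) / ((1.0 - r2_unrestricted) / (n - k))

# Hypothetical values: model 1 is the restricted model, model 2 the full model.
sst = 1000.0
sse_1, sse_2 = 400.0, 300.0
r2_1 = 1.0 - sse_1 / sst     # 1 - R_1^2 = SSE_1 / SST
r2_2 = 1.0 - sse_2 / sst     # 1 - R_2^2 = SSE_2 / SST
j, n, k = 2, 103, 3          # restrictions, observations, coefficients in the full model

f_sse = f_stat_from_sse(sse_1, sse_2, j, n, k)
f_r2 = f_stat_from_r2(r2_1, r2_2, j, n, k)
print(round(f_sse, 4), round(f_r2, 4))
```

The two versions coincide only because SST is the same in both models, which is also why R^2 cannot be compared across y and ln(y) specifications.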
Labs
10/2/15: Regression Analysis in Employment Litigation – Elaine Reardon
● Resolution Economics
○ If a1 is negative then we would not observe discrimination based on the data
● Pay gap is statistically significantly lower for Hispanics
● Too many variables that aren’t justifiable?
● Analyzes complex data for the purpose of assisting counsel in evaluating class certification and merits issues in
employment matters. She has analyzed claims alleging age, race, and gender discrimination in hiring, termination,
and pay equity. She also has considerable experience in wage and hour consulting and litigation, utilizing various
data sources such as surveys, observation studies, and administrative data to assess class certification issues
regarding uncompensated time, missed meals, and exempt/non-exempt status. She has significant experience in
designing, implementing and analyzing scientific surveys, including drawing statistical samples and making
inferences from the results. In connection with her litigation work, Dr. Reardon has served in an expert capacity a
number of times.
● Customer churn
● Targeting potential customers
○ Segmentation
○ Factor analysis to search for redundancies
● Garbage in garbage out (GIGO)
● Turf analysis
Corn Production
● The coefficient B2: a unit increase in capital leads to a 265% increase in the production of corn
● According to the above result, we reject H0: B3 - B4 = 0 against H1: B3 - B4 ≠ 0: at 1% and 5%
● According to the above result, we reject H0: B4 = B5 = 0 against H1: B4 ≠ 0 or B5 ≠ 0: there is not enough information
● Considering the above output results, the consequence of heteroskedasticity in e is: all of the above
●
original model.
● Consider the original model. Suppose that we think that the wage has a higher variability among immigrants than
among natives. Then: the OLS estimator would still be unbiased, but the estimated confidence intervals would not be valid.
● We think that education has a positive effect on wages, but immigrants from poor countries are on average less
educated than natives. We conclude that: the OLS estimator of im_poor has a negative bias. We expect B2 to be
greater than -0.1009196
● if we had measured the wages in cents instead of dollars: the constant would be larger, the coefficient of im_rich
would be the same.
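The cents-versus-dollars answer above (larger constant, same coefficient on im_rich) is what happens when the dependent variable is the log of the wage, since ln(100·wage) = ln(100) + ln(wage). That the lab's model is in logs is an assumption here, inferred from the answer itself; the data below are simulated and the coefficient values are hypothetical.

```python
import numpy as np

# Simulated wages in dollars with a hypothetical immigrant dummy.
rng = np.random.default_rng(2)
n = 200
im_rich = rng.integers(0, 2, size=n).astype(float)
wage_dollars = np.exp(2.5 + 0.2 * im_rich + 0.3 * rng.normal(size=n))

X = np.column_stack([np.ones(n), im_rich])

# Same regression with ln(wage) measured in dollars and in cents.
b_dollars, *_ = np.linalg.lstsq(X, np.log(wage_dollars), rcond=None)
b_cents, *_ = np.linalg.lstsq(X, np.log(100.0 * wage_dollars), rcond=None)

print("constant shift:", round(b_cents[0] - b_dollars[0], 4))
print("slope change:  ", round(b_cents[1] - b_dollars[1], 10))
```

If the wage entered in levels instead, every coefficient and standard error would be multiplied by 100, as in the Atkins note on rescaling y.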
Mroz Data
● Section 4 results: total effect of husband’s and wife’s experience is not significantly different from zero at the α = 0.01
significance level
● 95% CI: (25,618, 41,539)
● having 3 children under 6 years old: no significant statistical effect on family income.
the variance of meat purchases
● The fact that coefficients on prices in model 2 barely change as income is included as regressor implies: that there
is a low correlation of individual income and these prices
● The elasticity of meat purchases with respect to a change in price in meat: cannot be calculated without further
information
● The 95% confidence interval for the effect of price of meat on meat purchases: is approximately [-1.1629, -0.1885]
Boca Raton Output
● The results indicate that a traditional house would cost approx → $18k
● The correlation between the coefficients on the number of bedrooms and the number of baths = -0.01835
● The results indicate that if one tests the hypothesis H0: B2 = ... = BK = 0 against H1: at least one Bk ≠ 0, k = 2...K → one
will reject the null hypoth at α = 0.01
● Holding all other variables constant, the results indicate that, on avg, having a pool → does not have an effect on the
house price
● The 95% CI for the coefficient on SQFT is approx → [74.46,89.20]
● The houses have waterfront → 7.2% of the houses have waterfront
● The results indicate that the effect of having a traditional house with a pool is → negative
● The results indicate that if one tests the null hypoth H0: B3 = B5 = B6 = 0 against H1: at least one Bk ≠ 0, k =
3,5,6 → one will reject the null hypothesis for α = 0.1
● Consider the restricted model with the form WAGE = B1 + B2EDU → the restricted model will have a smaller
R^2 compared to the original model
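A reported confidence interval such as the one for SQFT above can be turned back into a point estimate and a standard error. The sketch assumes the interval is symmetric and was built with the large-sample 5% critical value 1.96; the actual output may have used a t critical value, so the implied standard error is approximate.

```python
def estimate_and_se_from_ci(lower, upper, critical_value=1.96):
    """Recover the point estimate and standard error from a symmetric interval
    estimate +/- critical_value * se. The default 1.96 is an assumption."""
    estimate = (lower + upper) / 2.0
    se = (upper - lower) / (2.0 * critical_value)
    return estimate, se

# 95% CI for the SQFT coefficient quoted in the Boca Raton notes.
sqft_hat, sqft_se = estimate_and_se_from_ci(74.46, 89.20)
t_stat = sqft_hat / sqft_se   # t-statistic for H0: coefficient = 0

print(round(sqft_hat, 2), round(sqft_se, 2), round(t_stat, 1))
```

Because the interval excludes zero, the two-sided test of a zero coefficient rejects at the 5% level, which matches the large implied t-statistic.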