Stat Review Continued
Probability
• Random Variable
– Has some probability p_j of taking on a specific numeric value x_j each time it is drawn.
– Has a “support”: the set of possible values it can take on.
– Examples: flipping a coin, ages in our classroom, level in school.
• Notation:
P(X = x_j) = p_j
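As a minimal sketch of this notation (Python with numpy; the names support and probs are just illustrative), a fair coin as a discrete random variable:

```python
import numpy as np

# A fair coin: support x_j in {0, 1}, probabilities p_j = {0.5, 0.5}.
support = np.array([0, 1])
probs = np.array([0.5, 0.5])

# P(X = x_j) = p_j for each point in the support.
for x, p in zip(support, probs):
    print(f"P(X = {x}) = {p}")

# Each draw takes the value x_j with probability p_j.
draws = np.random.choice(support, size=10, p=probs)
print(draws)
```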
Probability Continued
• P(a < X < b)
– Draw what it looks like for a normal curve with
arbitrary points a and b
• Cumulative Distribution Function
– P(X ≤ x) = F(x)
– Bounded between 0 and 1
– Draw it for Binomial, Uniform, and Normal
distributions
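A short sketch of the CDF for the three distributions named above, assuming scipy is available; each value of F is bounded between 0 and 1:

```python
from scipy import stats

# F(x) = P(X <= x)
print(stats.binom.cdf(3, n=10, p=0.5))  # Binomial(10, 0.5): P(X <= 3)
print(stats.uniform.cdf(0.25))          # Uniform(0, 1):     P(X <= 0.25)
print(stats.norm.cdf(1.96))             # Standard normal:   P(X <= 1.96), ~0.975

# P(a < X < b) for a normal curve is the area between a and b: F(b) - F(a).
a, b = -1.0, 1.0
print(stats.norm.cdf(b) - stats.norm.cdf(a))  # ~0.683
```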
More Stats
• Joint Distributions
– f_{X,Y}(x, y) = P(X = x, Y = y)
• Examples: Return and event announcement
(merger).
• What does it mean for distributions to be
independent?
– f_{X,Y}(x, y) = f_X(x) f_Y(y)
• Example:
– Big negative return on the S&P 500, the price of green tea in
China.
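A small sketch of the independence factorization: a 2x2 joint table (values made up for illustration) and a check that f_{X,Y}(x, y) = f_X(x) f_Y(y) in every cell:

```python
import numpy as np

# Joint distribution f_{X,Y}(x, y) as a table: rows index x, columns index y.
joint = np.array([[0.25, 0.25],
                  [0.25, 0.25]])

# Marginals: f_X sums over y, f_Y sums over x.
f_x = joint.sum(axis=1)
f_y = joint.sum(axis=0)

# Independent iff the joint equals the outer product of the marginals.
print(np.allclose(joint, np.outer(f_x, f_y)))  # True for this table
```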
Covariance
• Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)]
• If X and Y are independent, then Cov(X, Y) = 0.
• Var[aX + bY] = a^2 Var[X] + b^2 Var[Y] + 2ab Cov(X, Y)
• What is Var(X - Y) if X and Y are independent?
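For the question above: independence gives Cov(X, Y) = 0, so Var(X - Y) = Var(X) + Var(Y) (take a = 1, b = -1 in the identity). A quick simulation sketch of the variance identity:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100_000)  # X and Y drawn independently
y = rng.normal(size=100_000)
a, b = 1.0, -1.0              # a = 1, b = -1 gives Var(X - Y)

lhs = np.var(a * x + b * y)
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * np.cov(x, y)[0, 1]
print(lhs, rhs)  # nearly equal; the Cov term is ~0 by independence
```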
Correlation Coefficient ρ
ρ = Cov(X, Y) / (Var(X) Var(Y))^0.5

Regression Coefficient β
β = Cov(X, Y) / Var(X)
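A minimal sketch computing both quantities from the same sample moments (numpy only; the data are simulated):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=10_000)
y = 0.5 * x + rng.normal(size=10_000)  # Y depends on X

cov_xy = np.cov(x, y)[0, 1]
rho = cov_xy / np.sqrt(np.var(x, ddof=1) * np.var(y, ddof=1))  # correlation
beta = cov_xy / np.var(x, ddof=1)                              # regression slope
print(rho, beta)  # beta ~ 0.5; rho rescales beta by sd(X)/sd(Y)
```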
Correlation versus Regression
• Correlation between random variables implies nothing about causation.
• We may investigate the correlation of
sunspots and stock market crashes, but that
does not mean we are determining causation
- i.e. that sunspots cause stock market
crashes, or vice versa.
Correlation and Regression
Continued
• Causation, or dependency, has us positing a model:
– given the value of one variable, X, we expect another variable, Y, to take on a particular value, i.e., that X causes Y.
• We estimate these relationships with
techniques like least squares.
• R^2 is a typical measure of goodness-of-fit.
Dummy Variables in Regressions
• An intercept dummy allows the constant to
be different in the regression depending on
whether the dummy is 1 or 0.
• For instance, we may expect the mean return to be lower on Mondays.
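A hedged sketch of the Monday intercept dummy using statsmodels OLS; the returns here are simulated, not real data:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500
monday = (np.arange(n) % 5 == 0).astype(float)              # 1 on "Mondays", else 0
ret = 0.05 - 0.10 * monday + rng.normal(scale=1.0, size=n)  # lower mean return on Mondays

X = sm.add_constant(monday)
fit = sm.OLS(ret, X).fit()
print(fit.params)  # const ~ 0.05; the dummy shifts the intercept by ~ -0.10
```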
Slope Dummy Variables
• We may also expect that the effect of
explanatory variables on the dependent
variable will change with other conditions,
and for this we need a slope dummy which
allows the slope parameter to be different
depending on the condition.
• For instance, the impact of mood on a Monday may be exaggerated by a soccer defeat.
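A sketch of a slope dummy as an interaction term (simulated data; mood and defeat are illustrative variable names):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
mood = rng.normal(size=n)            # explanatory variable
defeat = rng.integers(0, 2, size=n)  # condition dummy (soccer defeat)
# Slope on mood is 0.5 normally and 1.0 after a defeat.
y = 0.5 * mood + 0.5 * defeat * mood + rng.normal(size=n)

X = sm.add_constant(np.column_stack([mood, defeat, defeat * mood]))
fit = sm.OLS(y, X).fit()
print(fit.params)  # [const, mood, defeat, mood*defeat]; the last is the slope shift
```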
Dummy Variables Continued
• Dummy variables may be used to test for a
change in the intercept or the slope
parameters (test the Monday effect).
• We can include dummies for fewer categories than exist in the data (or else exclude another term, such as the intercept).
– For instance, dummies may be used to model quarterly seasonality, but we can’t include a dummy for each quarter as well as an intercept term.
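A sketch of the quarterly-seasonality example: with an intercept, include dummies for only three of the four quarters (pandas' drop_first handles this); all four plus an intercept would be perfectly collinear:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(4)
quarters = pd.Series(["Q1", "Q2", "Q3", "Q4"] * 25)
y = rng.normal(size=len(quarters))

# drop_first=True omits the Q1 dummy; its mean is absorbed by the intercept.
dummies = pd.get_dummies(quarters, drop_first=True).astype(float)
X = sm.add_constant(dummies)
print(sm.OLS(y, X).fit().params)  # const = Q1 mean; Q2-Q4 are shifts from Q1
```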
Multicollinearity
• One or more regressors are nearly linear
combinations of the other regressors.
• Symptoms:
– High R^2 and F-statistic for the significance of a group of regressors jointly, but all individual variables in the group have low t-statistics.
• Consequences:
– Estimates of coefficients become imprecise,
sensitive to sample window.
• Solutions? More data, simpler questions.
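A simulation sketch of the symptom: two nearly collinear regressors that jointly fit y well (high R^2 and F) while each individual t-statistic is weak:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.01, size=n)  # x2 is nearly a copy of x1
y = x1 + x2 + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(fit.rsquared, fit.fvalue)  # high R^2, large joint F-statistic
print(fit.tvalues[1:])           # individual t-stats are small and unstable
```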
Specification Error
• Basically, use of the wrong model:
– Faulty inclusion or exclusion of variables.
– Mis-measured variables.
– Incorrect form of model.
• This can be pretty serious.
– Faulty omission can lead to invalid inference
and biased estimates.
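A sketch of why faulty omission biases estimates: when an omitted variable z is correlated with an included regressor x, the coefficient on x absorbs part of z's effect (simulated example):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(6)
n = 5_000
x = rng.normal(size=n)
z = 0.8 * x + rng.normal(size=n)  # omitted variable, correlated with x
y = 1.0 * x + 1.0 * z + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()
omit = sm.OLS(y, sm.add_constant(x)).fit()
print(full.params[1])  # ~1.0: unbiased when z is included
print(omit.params[1])  # ~1.8: biased upward when z is omitted
```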
Other Complications
• Heteroskedasticity:
– Can lead to invalid inference.
• Autocorrelation:
– Can lead to invalid inference and biased estimation.
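One common remedy, sketched here: keep the OLS point estimates but report robust standard errors, e.g. White (HC) errors for heteroskedasticity or Newey-West (HAC) errors for autocorrelation, both available through statsmodels' cov_type option:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=1 + np.abs(x), size=n)  # heteroskedastic errors

X = sm.add_constant(x)
plain = sm.OLS(y, X).fit()                                       # classical SEs
white = sm.OLS(y, X).fit(cov_type="HC1")                         # heteroskedasticity-robust
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 5})  # autocorrelation-robust
print(plain.bse, white.bse, hac.bse, sep="\n")
```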
S&P 100 Index and Volatility
[Figure: plot of the S&P 100 index and its volatility. Heteroskedasticity: residuals predictably large in magnitude. Autocorrelation: residuals predictably negative.]