Stat Review Continued

Uploaded by

James Harden

Statistics Review, Lecture 2

Readings and Resources


Read If You Need Extra Background

BH Ch. 2-4, 9, 10, 15, 16, and 21.3

Probability
• Random Variable
– Takes on a specific numeric value ($x_j$) with some probability ($p_j$) each time it is drawn
– Has a “support”: the set of possible values it can take on.
– Examples: flipping a coin, ages in our classroom, level in school
• Notation:
$P(X = x_j) = p_j$
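A minimal sketch of the notation above, assuming a fair coin coded as 0 (tails) and 1 (heads). The names `pmf`, `support`, and `draw` are illustrative, not from the lecture.

```python
import random

# Hypothetical example: a fair coin coded as 0 (tails) and 1 (heads).
# The dictionary maps each value x_j to its probability p_j.
pmf = {0: 0.5, 1: 0.5}

support = sorted(pmf)                   # the values X can take on
total_probability = sum(pmf.values())   # probabilities over the support sum to 1


def draw(pmf, rng=random):
    """Draw one realization of X according to its probabilities."""
    values = list(pmf)
    weights = [pmf[v] for v in values]
    return rng.choices(values, weights=weights, k=1)[0]


print(support, total_probability, draw(pmf))
```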
Probability Continued
• P(a<X<b)
– Draw what it looks like for a normal curve with
arbitrary points a and b
• Cumulative Distribution Function
– P(X<=x)=F(x)
– Bounded between 0 and 1
– Draw it for Binomial, Uniform, and Normal
distributions
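For a continuous distribution, the interval probability follows from the CDF as P(a<X<b) = F(b) - F(a). The sketch below assumes a normal distribution and uses the standard library error function; the function names are illustrative.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """F(x) = P(X <= x) for a normal random variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def prob_between(a, b, mu=0.0, sigma=1.0):
    """P(a < X < b) = F(b) - F(a) for a continuous distribution."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


# Probability that a standard normal falls within one standard deviation of its mean.
p = prob_between(-1.0, 1.0)
print(round(p, 4))  # 0.6827
```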
More Stats

• Joint Distributions
– $f_{X,Y}(x,y) = P(X=x, Y=y)$
• Examples: Return and event announcement (merger).
• What does it mean for distributions to be independent?
– $f_{X,Y}(x,y) = f_X(x)\,f_Y(y)$ for all $x$ and $y$
• Example:
– A big negative return on the S&P 500 and the price of green tea in China.

More Stats

• Conditional Distribution Functions:
– $f_{X|Y}(x|y) = P(X=x \mid Y=y)$
– The probability of X given Y
– These do not have to be predictive (X and Y can happen at the same time).
• Example:
– Probability of a stock market crash given it is October.
– Mistaken conditional relationship: smooth landing of a fighter pilot and praise/punishment after the last landing.
– Regression to the mean.
Example for Conditional Distributions: International Soccer Results and the Stock Market, Edmans et al. 2007

Stock Return      Your Team Loses   Your Team Wins   Marginal Prob.
Bad Return        .30               .02              .32
Average           .15               .18              .33
Good Return       .05               .30              .35
Marginal Prob.    .50               .50              1.00
Example for Conditional
Distributions
• What is the joint distribution?
• What is the marginal distribution?
• What is the probability that the UK market
drops when a UK team wins an
international soccer game?
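One way to work the last question, as a sketch: it assumes the "Your Team" columns of the table apply to a UK team and that "Bad Return" stands for the market dropping. The dictionary layout and function names are illustrative.

```python
# Joint distribution from the table: joint[(stock_return, result)] = P(return, result).
joint = {
    ("bad", "loses"): 0.30, ("bad", "wins"): 0.02,
    ("average", "loses"): 0.15, ("average", "wins"): 0.18,
    ("good", "loses"): 0.05, ("good", "wins"): 0.30,
}


def marginal_over_returns(result):
    """P(result): sum the joint probabilities down one column."""
    return sum(p for (_, res), p in joint.items() if res == result)


def marginal_over_results(stock_return):
    """P(stock_return): sum the joint probabilities across one row."""
    return sum(p for (ret, _), p in joint.items() if ret == stock_return)


def conditional(stock_return, result):
    """P(stock_return | result) = P(stock_return, result) / P(result)."""
    return joint[(stock_return, result)] / marginal_over_returns(result)


p_bad_given_win = conditional("bad", "wins")   # 0.02 / 0.50
p_bad = marginal_over_results("bad")           # unconditional P(bad return)
print(round(p_bad_given_win, 4), round(p_bad, 4))
```

The conditional probability of a bad return given a win is far below the unconditional probability of a bad return, so in this table the return and the match result are not independent.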

Covariance
• $\mathrm{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)]$
• If X and Y are independent, then $\mathrm{Cov}(X,Y) = 0$
• $\mathrm{Var}[aX+bY] = a^2\,\mathrm{Var}[X] + b^2\,\mathrm{Var}[Y] + 2ab\,\mathrm{Cov}(X,Y)$
• What is Var(X-Y) if X and Y are
independent?
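A worked answer to the question above, setting a = 1 and b = -1 in the variance formula and using the fact that independence gives zero covariance:

```latex
\begin{align*}
\mathrm{Var}[X - Y]
  &= (1)^2\,\mathrm{Var}[X] + (-1)^2\,\mathrm{Var}[Y] + 2(1)(-1)\,\mathrm{Cov}(X,Y) \\
  &= \mathrm{Var}[X] + \mathrm{Var}[Y] - 2\,\mathrm{Cov}(X,Y) \\
  &= \mathrm{Var}[X] + \mathrm{Var}[Y]
  \qquad \text{since } \mathrm{Cov}(X,Y) = 0 \text{ under independence.}
\end{align*}
```

The variances add even though the variables are subtracted.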
Correlation Coefficient
$\rho = \dfrac{\mathrm{Cov}(X,Y)}{\left(\mathrm{Var}(X)\,\mathrm{Var}(Y)\right)^{0.5}}$

Regression Coefficient
$\beta = \dfrac{\mathrm{Cov}(X,Y)}{\mathrm{Var}(X)}$
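A short sketch of both coefficients computed from data. The sample values are hypothetical, and the moments divide by n rather than n - 1; that choice cancels in both ratios.

```python
import math


def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Sample analogue of Cov(X,Y) = E[(X - mu_X)(Y - mu_Y)], dividing by n."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def variance(xs):
    return covariance(xs, xs)


def correlation(xs, ys):
    """rho = Cov(X,Y) / (Var(X) Var(Y))^0.5"""
    return covariance(xs, ys) / math.sqrt(variance(xs) * variance(ys))


def regression_beta(xs, ys):
    """beta = Cov(X,Y) / Var(X)"""
    return covariance(xs, ys) / variance(xs)


# Hypothetical data lying exactly on the line y = 1 + 2x.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 7.0, 9.0]
rho = correlation(x, y)
beta = regression_beta(x, y)
print(rho, beta)
```

Because the points lie exactly on a line, the correlation is 1 while the regression coefficient recovers the slope of 2: correlation is unit-free, whereas beta is measured in units of Y per unit of X.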
Correlation versus Regression
• Correlation between random variables implies no causation.
• We may investigate the correlation of sunspots and stock market crashes, but that does not mean we are determining causation, i.e., that sunspots cause stock market crashes or vice versa.

Correlation and Regression
Continued
• Causation, or dependency, has us posit a model in which,
– given the value of one variable, X, we expect another variable, Y, to take on a particular value, i.e., X causes Y.
• We estimate these relationships with techniques like least squares.
• $R^2$ is a typical measure of goodness-of-fit.
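A minimal sketch of a one-regressor least squares fit and its R-squared, on hypothetical data. The function names are illustrative.

```python
def fit_least_squares(xs, ys):
    """Return (alpha, beta) minimizing the sum of squared residuals of y = alpha + beta*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = my - beta * mx
    return alpha, beta


def r_squared(xs, ys, alpha, beta):
    """Share of the variation in y explained by the fitted line."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical observations scattered around a line with slope near 2.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
alpha, beta = fit_least_squares(x, y)
r2 = r_squared(x, y, alpha, beta)
print(round(alpha, 3), round(beta, 3), round(r2, 4))
```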

Dummy Variables in Regressions

• Ordinal variables, usually 0/1, indicating an event.
• There are two kinds of dummy variables we will consider: intercept and slope dummies.

$Y_t = \alpha + \beta X_t + \alpha_{\text{Monday}}\,\text{Monday}_t + \beta_{\text{Monday}}\,X_t \cdot \text{Monday}_t + \varepsilon_t$

– Where $\text{Monday}_t$ is a variable equal to 1 on the first day of the week and 0 otherwise.
Dummy Variables in Regressions
• Ordinal variables, usually 0/1 indicating
an event.
• Intercept dummy.
$\text{PageViews}_t = \alpha + \alpha_{\text{cat}}\,\text{Cat}_t + \beta_{\text{ClipLength}}\,\text{ClipLength}_t + \varepsilon_t$
– $\text{Cat}_t$ is a variable equal to 1 if there is a cat in the video.
– $\text{ClipLength}_t$ is the length of the video.
– I expect a positive coefficient for $\alpha_{\text{cat}}$ and a negative coefficient for $\beta_{\text{ClipLength}}$: longer clips get fewer pageviews, but if there is a cat, more pageviews.
Dummy Variables in Regressions
• Slope dummy.
$\text{PageViews}_t = \alpha + \alpha_{\text{cat}}\,\text{Cat}_t + \beta_{\text{ClipLength}}\,\text{ClipLength}_t + \beta_{\text{Interaction}}\,\text{Cat}_t \cdot \text{ClipLength}_t + \varepsilon_t$
• This captures an interaction effect between having a cat in the video and the length of the video.
• I expect a positive coefficient for $\beta_{\text{Interaction}}$. Longer clips get fewer pageviews, but if there is a cat in the long video, it gets more pageviews than a clip the same length but with no cat.
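To see what the two dummies do, the sketch below plugs hypothetical coefficient values into the pageviews regression (the numbers are made up for illustration, not estimates). The intercept dummy shifts the line up, and the slope dummy changes how steeply pageviews fall with clip length.

```python
def expected_pageviews(cat, clip_length, alpha, alpha_cat, beta_clip, beta_interaction):
    """Fitted value of the pageviews regression, ignoring the error term."""
    return (alpha
            + alpha_cat * cat
            + beta_clip * clip_length
            + beta_interaction * cat * clip_length)


def implied_line(cat, alpha, alpha_cat, beta_clip, beta_interaction):
    """Intercept and slope of the PageViews-versus-ClipLength line when Cat is 0 or 1."""
    intercept = alpha + alpha_cat * cat
    slope = beta_clip + beta_interaction * cat
    return intercept, slope


# Hypothetical coefficients with the signs the slide expects.
coef = dict(alpha=100.0, alpha_cat=50.0, beta_clip=-2.0, beta_interaction=1.5)

no_cat_line = implied_line(0, **coef)   # (100.0, -2.0)
cat_line = implied_line(1, **coef)      # (150.0, -0.5)
print(no_cat_line, cat_line)
print(expected_pageviews(0, 10.0, **coef), expected_pageviews(1, 10.0, **coef))
```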
Dummy Variables in Regressions
• Another example, a CAPM model with a
dummy for Mondays (allows return to be
different for Mondays) and an interaction
term between Monday and the market
return.
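$\text{Return}_t = \alpha + \beta\,\text{MktReturn}_t + \alpha_{\text{Monday}}\,\text{Monday}_t + \beta_{\text{Monday}}\,\text{MktReturn}_t \cdot \text{Monday}_t + \varepsilon_t$
– Where $\text{Monday}_t$ is a variable equal to 1 on the first day of the week and 0 otherwise.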

Dummy Variables in Regressions
Continued
• An intercept dummy allows the constant to
be different in the regression depending on
whether the dummy is 1 or 0.
• For instance, we may expect the mean return to be lower on Mondays.
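A sketch of the intercept dummy on its own, with hypothetical daily returns. When the only regressor is a 0/1 dummy, least squares gives a constant equal to the mean on non-Mondays and a dummy coefficient equal to the Monday mean minus the non-Monday mean.

```python
def ols_on_dummy(ys, dummy):
    """Least squares fit of y = alpha + alpha_dummy * dummy, where dummy is 0/1."""
    n = len(ys)
    d_bar, y_bar = sum(dummy) / n, sum(ys) / n
    s_dy = sum((d - d_bar) * (y - y_bar) for d, y in zip(dummy, ys))
    s_dd = sum((d - d_bar) ** 2 for d in dummy)
    alpha_dummy = s_dy / s_dd
    alpha = y_bar - alpha_dummy * d_bar
    return alpha, alpha_dummy


# Hypothetical daily returns in percent; monday[i] = 1 marks a Monday.
returns = [-0.4, 0.2, 0.1, 0.3, 0.0, -0.2, 0.1, 0.2, 0.0, 0.3]
monday = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]

alpha, alpha_monday = ols_on_dummy(returns, monday)
print(round(alpha, 4), round(alpha_monday, 4))
```

A negative `alpha_monday` here says the mean return in this made-up sample is lower on Mondays.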

Slope Dummy Variables
• We may also expect that the effect of
explanatory variables on the dependent
variable will change with other conditions,
and for this we need a slope dummy which
allows the slope parameter to be different
depending on the condition.
• For instance, the impact of mood on a
Monday may be exaggerated by a soccer
defeat.
Dummy Variables Continued
• Dummy variables may be used to test for a
change in the intercept or the slope
parameters (test the Monday effect).
• We can only include dummy variables for fewer categories than exist in the data (or exclude another term, such as the intercept).
– For instance, dummies may be used to model quarterly seasonality, but we can’t include a dummy for each quarter as well as an intercept term.
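The quarterly case can be checked directly. In this sketch (hypothetical quarter labels, illustrative function name), the four quarter dummies sum to 1 in every row, which is exactly the intercept column, so the regressors would be perfectly collinear. Dropping one quarter as the baseline removes the problem.

```python
def quarter_dummies(quarters, categories):
    """One 0/1 column per category listed, one row per observation."""
    return [[1 if q == c else 0 for c in categories] for q in quarters]


# Hypothetical two years of quarterly observations.
quarters = [1, 2, 3, 4, 1, 2, 3, 4]

all_four = quarter_dummies(quarters, [1, 2, 3, 4])
row_sums_all = [sum(row) for row in all_four]      # every row sums to 1

baseline_q1 = quarter_dummies(quarters, [2, 3, 4])  # Q1 absorbed by the intercept
row_sums_three = [sum(row) for row in baseline_q1]  # no longer constant

print(row_sums_all)
print(row_sums_three)
```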
Multicollinearity
• One or more regressors are nearly linear
combinations of the other regressors.
• Symptoms:
– High $R^2$ and F-statistic for the significance of a group of regressors jointly, but all individual variables in the group have low t-statistics.
• Consequences:
– Estimates of coefficients become imprecise, sensitive to sample window.
• Solutions? More data, simpler questions.
Specification Error
• Basically, use of the wrong model:
– Faulty inclusion or exclusion of variables.
– Mis-measured variables.
– Incorrect form of model.
• This can be pretty serious.
– Faulty omission can lead to invalid inference
and biased estimates.

Other Complications
• Heteroskedasticity:
– Can lead to invalid inference.
• Autocorrelation:
– Can lead to invalid inference and biased estimation.
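One simple way to look for autocorrelation is the lag-1 sample autocorrelation of the regression residuals. This is a sketch with hypothetical residual series, offered as an illustration rather than the lecture's own diagnostic.

```python
def lag1_autocorrelation(residuals):
    """Sample correlation between each residual and the one before it."""
    n = len(residuals)
    m = sum(residuals) / n
    num = sum((residuals[t] - m) * (residuals[t - 1] - m) for t in range(1, n))
    den = sum((e - m) ** 2 for e in residuals)
    return num / den


# Hypothetical residuals: runs of the same sign versus alternating signs.
persistent = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print(lag1_autocorrelation(persistent))    # positive: residuals tend to keep their sign
print(lag1_autocorrelation(alternating))   # negative: residuals tend to flip sign
```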

S&P 100 Index and Volatility
[Figure with two annotated regions]
– Heteroskedasticity: residuals predictably large in magnitude.
– Autocorrelation: residuals predictably negative.
