
Statistics Overview

Part II
Outline
• Covariance
• Correlation
• Simple Linear Regression Model
Measures of the Relationship Between Two Numerical Variables

• Scatter plots allow you to visually examine the relationship between two numerical variables; we now discuss two quantitative measures of such relationships:
• The Covariance
• The Coefficient of Correlation
The Covariance
• The covariance measures the direction of the linear relationship between two numerical variables (X and Y)

• The sample covariance:


$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$$

• Indicates only the direction of the linear relationship, not its relative strength
• No causal effect is implied
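To make the formula concrete, here is a minimal sketch in Python of the sample covariance computed directly from the definition above (the data values are hypothetical):

import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])   # hypothetical X values
y = np.array([1.0, 3.0, 5.0, 11.0])  # hypothetical Y values

n = len(x)
# Sum of cross-deviations divided by n - 1
cov_xy = np.sum((x - x.mean()) * (y - y.mean())) / (n - 1)

# np.cov uses the same n - 1 denominator by default, so the two agree
assert np.isclose(cov_xy, np.cov(x, y)[0, 1])
print(cov_xy)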
Interpreting Covariance

• Covariance between two variables:


cov(X,Y) > 0: X and Y tend to move in the same direction
cov(X,Y) < 0: X and Y tend to move in opposite directions
cov(X,Y) = 0: X and Y have no linear relationship (zero covariance does not imply independence)

• The covariance has a major flaw:
• It is not possible to determine the relative strength of the relationship from the size of the covariance (it indicates only the direction)
Coefficient of Correlation
• Measures the relative strength of the linear
relationship between two numerical variables
• Sample coefficient of correlation:

$$r = \frac{\operatorname{cov}(X, Y)}{S_X S_Y}$$

where

$$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1} \qquad S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}} \qquad S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$$
Features of the Coefficient of Correlation
• The population coefficient of correlation is referred to as ρ.
• The sample coefficient of correlation is referred to as r.
• Both ρ and r have the following features:
• Unit free
• Ranges between –1 and 1
• The closer to –1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship
Scatter Plots of Sample Data with
Various Coefficients of Correlation
[Scatter plots of sample data illustrating r = −1, r = −0.6, r = +1, r = +0.3, and r = 0 (no linear relationship)]
Introduction to Regression Analysis
• Regression analysis is used to:
• Predict the value of a dependent variable based on the value of at least one
independent variable
• Explain the impact of changes in an independent variable on the dependent
variable
• Dependent variable: the variable we wish to predict or explain (Y)
• Independent variable: the variable used to predict or explain the dependent variable (X)
Simple Linear Regression Model

• Only one independent variable, X
• The relationship between X and Y is described by a linear function
• Changes in Y are assumed to be related to changes in X
Types of Relationships

Linear relationships vs. nonlinear relationships

[Scatter plots of linear relationships (positive and negative slopes) and of nonlinear relationships (quadratic/parabolic and exponential)]
Types of Relationships (continued)
Strong relationships vs. weak relationships

[Scatter plots contrasting strong linear relationships (points tightly clustered around a line) with weak linear relationships (points widely scattered)]
Types of Relationships (continued)

No relationship

[Scatter plots in which Y shows no pattern across X]
Simple Linear Regression Model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

where
Yi = dependent variable
β0 = population Y intercept
β1 = population slope coefficient
Xi = independent variable
εi = random error term

β0 + β1 Xi is the linear component; εi is the random error component.
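One way to see what each term contributes is to simulate from the model; the sketch below uses hypothetical parameter values (β0 = 2, β1 = 0.5) and normally distributed errors:

import numpy as np

rng = np.random.default_rng(42)

beta0, beta1 = 2.0, 0.5             # hypothetical population intercept and slope
x = rng.uniform(0, 10, size=100)    # independent variable
eps = rng.normal(0, 1.0, size=100)  # random error term

y = beta0 + beta1 * x + eps         # linear component plus random error

The later sketches reuse these simulated x and y.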
Simple Linear Regression Model
(continued)

[Graph of the model: the regression line has intercept β0 and slope β1; for a given Xi, the observed value Yi differs from the predicted value on the line by the random error εi]
Simple Linear Regression Equation (Prediction Line)

The simple linear regression equation provides an estimate of the population regression line:

$$\hat{Y}_i = b_0 + b_1 X_i$$

where
Ŷi = estimated (or predicted) Y value for observation i
b0 = estimate of the regression intercept
b1 = estimate of the regression slope
Xi = value of X for observation i
Interpretation of the Slope and the Intercept

• b0 is the estimated average value of Y when the value of X is zero
• b1 is the estimated change in the average value of Y as a result of a one-unit increase in X
The Least Squares Method
b0 and b1 are obtained by finding the values that minimize the sum of the squared differences between Yi and Ŷi:

$$\min \sum (Y_i - \hat{Y}_i)^2 = \min \sum \bigl(Y_i - (b_0 + b_1 X_i)\bigr)^2$$
The Least Squares Estimates
Slope: $b_1 = \dfrac{SS_{XY}}{SS_X}$

Intercept: $b_0 = \bar{Y} - b_1 \bar{X}$

where

$$SS_{XY} = \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y}) = \sum_{i=1}^{n} X_i Y_i - n \bar{X} \bar{Y}$$

$$SS_X = \sum_{i=1}^{n} (X_i - \bar{X})^2 = \sum_{i=1}^{n} X_i^2 - n \bar{X}^2$$
Inferences About the Slope

• The standard error of the regression slope coefficient (b1) is estimated by

$$S_{b_1} = \frac{S_{YX}}{\sqrt{SS_X}} = \frac{S_{YX}}{\sqrt{\sum (X_i - \bar{X})^2}}$$

where:
Sb1 = estimate of the standard error of the slope

$$S_{YX} = \sqrt{\frac{SSE}{n - 2}}$$ = standard error of the estimate
Inferences About the Slope: t Test
• t test for a population slope
• Is there a linear relationship between X and Y?
• Null and alternative hypotheses
• H0: β1 = 0 (no linear relationship)
• H1: β1 ≠ 0 (linear relationship does exist)
• Test statistic:

$$t_{STAT} = \frac{b_1 - \beta_1}{S_{b_1}}, \qquad \text{d.f.} = n - 2$$

where:
b1 = regression slope coefficient
β1 = hypothesized slope
Sb1 = standard error of the slope
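A sketch of this t test for the simulated data, using b1, y_hat, x_bar, and n from the previous sketch (scipy is used only for the p-value):

from scipy import stats

sse = np.sum((y - y_hat) ** 2)                   # error sum of squares
s_yx = np.sqrt(sse / (n - 2))                    # standard error of the estimate
s_b1 = s_yx / np.sqrt(np.sum((x - x_bar) ** 2))  # standard error of the slope

t_stat = (b1 - 0) / s_b1                         # H0: beta1 = 0
p_value = 2 * (1 - stats.t.cdf(abs(t_stat), df=n - 2))
print(t_stat, p_value)  # a small p-value rejects H0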
Measures of Variation

• Total variation is made up of two parts:

$$SST = SSR + SSE$$

Total sum of squares: $SST = \sum (Y_i - \bar{Y})^2$
Regression sum of squares: $SSR = \sum (\hat{Y}_i - \bar{Y})^2$
Error sum of squares: $SSE = \sum (Y_i - \hat{Y}_i)^2$

where:
Ȳ = mean value of the dependent variable
Yi = observed value of the dependent variable
Ŷi = predicted value of Y for the given Xi value
Measures of Variation (continued)

• SST = total sum of squares (Total Variation)
• Measures the variation of the Yi values around their mean Ȳ
• SSR = regression sum of squares (Explained Variation)
• Variation attributable to the relationship between X and Y
• SSE = error sum of squares (Unexplained Variation)
• Variation in Y attributable to factors other than X
Measures of Variation
(continued)

[Graph showing, at a given Xi, the decomposition of the deviation of Yi from Ȳ: SST = Σ(Yi − Ȳ)² splits into SSR = Σ(Ŷi − Ȳ)², the explained part, and SSE = Σ(Yi − Ŷi)², the unexplained part]
Coefficient of Determination, r²

• The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable
• The coefficient of determination is also called r-squared and is denoted as r²
$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}, \qquad 0 \le r^2 \le 1$$

• In simple linear regression, r² is the square of the sample correlation coefficient r
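A sketch of the variation decomposition and r² for the same simulated data, confirming both identities above:

sst = np.sum((y - y.mean()) ** 2)      # total variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation
sse = np.sum((y - y_hat) ** 2)         # unexplained variation

assert np.isclose(sst, ssr + sse)      # SST = SSR + SSE
r2 = ssr / sst
# In simple linear regression, r2 equals the squared correlation coefficient
assert np.isclose(r2, np.corrcoef(x, y)[0, 1] ** 2)
print(r2)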
Examples of Approximate r² Values

[Scatter plots with every point exactly on a line, one with positive slope and one with negative slope]

r² = 1: perfect linear relationship between X and Y; 100% of the variation in Y is explained by variation in X.
Examples of Approximate r² Values

[Scatter plots with points loosely clustered around a trend line]

0 < r² < 1: weaker linear relationships between X and Y; some but not all of the variation in Y is explained by variation in X.
Examples of Approximate r² Values

[Scatter plots in which Y shows no trend across X]

r² = 0: no linear relationship between X and Y; the value of Y does not depend on X (none of the variation in Y is explained by variation in X).
Empirical Time
• Collect data from investing.com
• Monthly return of Bitcoin from Jan 2018 to Oct 2024
• Monthly return of the stock market, proxied by the S&P 500, from Jan 2018 to Oct 2024
• Can you find the covariance and correlation between the monthly return of Bitcoin and the monthly return of the stock market?
• Can you regress the Bitcoin monthly return on the S&P 500 monthly return?
• Is the beta estimate significant? If yes, how do you interpret it?
• You can use Excel or any software you like (R, Python, SAS, or Stata); a possible starting sketch follows below
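A possible starting sketch in Python, assuming the two return series have been exported from investing.com into a CSV; the file name and column names (btc_return, sp500_return) are hypothetical and should be adjusted to your data:

import pandas as pd
import statsmodels.api as sm

# Hypothetical file with one row per month, Jan 2018 - Oct 2024
df = pd.read_csv("returns_jan2018_oct2024.csv")
btc = df["btc_return"]
sp500 = df["sp500_return"]

print(btc.cov(sp500))   # sample covariance
print(btc.corr(sp500))  # sample correlation

# Regress Bitcoin monthly return on S&P 500 monthly return
X = sm.add_constant(sp500)  # adds the intercept term b0
model = sm.OLS(btc, X).fit()
print(model.summary())      # read off the beta (slope), its t-stat and p-value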
