
Chapter Three
A brief overview of the classical linear regression model
Sisay D. (PhD)

Lecture Plan
1. Regression, correlation & causation
2. Use of regression analysis
3. Variables in regression models
4. Types of regression analysis
5. Best-fitting straight line – OLS
6. The assumptions underlying the classical linear regression model
7. Quality of straight line fit
8. Exercises

1. Regression, Correlation & Causation

#. What are the differences between correlation, causation & regression analysis?

Correlation versus Regression

• Correlation analysis shows the existence of a relationship between two variables X & Y, but it does not establish a cause-&-effect relationship between them.
• However, regression analysis may show a cause-&-effect relationship between X & Y:
➢ a change in the value of the independent variable X affects or influences (or may cause) a corresponding change in the value of the dependent variable Y, holding other factors constant.

Correlation versus Regression…

• Regression analysis implies (but does not prove) causality between the dependent variable Y & the independent variable X.
• However, correlation analysis implies no causality or dependency but refers simply to the type & degree of association between two variables.
For example: X & Y may be highly correlated because of another variable that strongly affects both.

Correlation versus Regression…

• In correlation analysis both Y & X are treated symmetrically as random variables, with neither designated as dependent.
• Regression is estimation or prediction of the average value of a dependent variable on the basis of the fixed values of other variables.
• In regression analysis Y is the dependent variable & X is the independent variable (or there are many independent variables).


Correlation versus Regression…

• Regression analysis is the statistical technique that formulates an algebraic relationship between two or more variables, so that the value of the dependent variable can be estimated, via the associated parameters, from known values of the independent variables.
• A regression model or equation is the mathematical equation that helps to predict or forecast the value of the dependent variable based on the known independent variables.

Causation versus correlation

• Causation comes from theory rather than statistics. Thus, regression does not necessarily imply causation.
• Correlation measures the strength of linear association between variables.
• In regression, we have a stochastic dependent variable & a non-stochastic (fixed) independent variable, while in correlation the variables involved are all stochastic.

Pearson correlation

• The Pearson correlation measures the degree of linear relationship between two variables. For paired observations (Xi, Yi), it is computed as

r = Σ(Xi − X̄)(Yi − Ȳ) / √[Σ(Xi − X̄)² · Σ(Yi − Ȳ)²]

Pearson correlation…

• The degree of relationship is measured by the numerical value of the correlation.
• A value of 1.00 indicates a perfect positive relationship & −1.00 indicates a perfect negative relationship.
• A value of zero indicates no relationship.
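To make the formula above concrete, here is a minimal Python sketch; the data values are illustrative, not from the lecture.

    import numpy as np

    # Illustrative paired observations (not from the lecture)
    x = np.array([4.0, 5.0, 6.0, 7.0, 8.0])
    y = np.array([13.0, 16.0, 18.0, 18.0, 26.0])

    # r = sum((X - Xbar)(Y - Ybar)) / sqrt(sum((X - Xbar)^2) * sum((Y - Ybar)^2))
    dx, dy = x - x.mean(), y - y.mean()
    r = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

    print(r)                        # manual computation
    print(np.corrcoef(x, y)[0, 1])  # the same value via numpy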

Pearson correlation…

[Figure] Examples of different values for linear correlations: (a) a perfect negative correlation, −1.00; (b) no linear trend, 0.00; (c) a strong positive relationship, approximately +0.90; (d) a relatively weak negative correlation, approximately −0.40.

2. Use of regression analysis

#. When do we apply regression analysis?
#. What are some of the considerations we should make when choosing statistical analysis?


When do we apply regression analysis?

• To characterize the relationship (association or perhaps causality) between dependent & independent variables by determining the extent, direction & strength of the relationship.
• To describe quantitatively or qualitatively the relationship between different observed variables.
• To determine which of several independent variables are important & which are not for describing or predicting a dependent variable.

When do we apply regression analysis?...

• To determine the best mathematical model (or parsimonious model) for describing the relationship between a dependent & one or more independent variables.
• To obtain a quantitative formula or equation to describe the dependent variable as a function of the independent variables.
• To assess the interactive effects of two or more independent variables with regard to a dependent variable.

What are some of the considerations we should make when choosing statistical analysis?

• Some of the major considerations are:
❖ The purpose of investigation,
❖ The mathematical characteristics of the variables,
❖ The statistical assumptions made about the variables,
❖ How the data were collected (sampling procedures).

What are some of the considerations or assumptions that lead to a choice of analysis?

• These are considerations made about the variables (continuous or discrete) that determine what type of models to apply to the data.
• How the data were collected (sampling method); the use of sample characteristics to estimate population characteristics such as the mean, variance, covariance, proportion, etc. depends on the sampling technique used.

3. Types of regression analysis

#. What are the types of regression models?

What are the different types of regression models?

• There are many types of regression models, but here we will deal with only two:
1. Simple regression model
2. Multiple regression model


I. Simple regression model

• A simple regression model is a statistical equation that characterizes the relationship between a dependent variable & only one independent variable.

I. Simple regression model...

EXAMPLE:
• Suppose that we have the following data on the excess returns on a fund manager's portfolio ("fund XXX") together with the excess returns on a market index:

Year, t   Excess return = rXXX,t – rft   Excess return on market index = rmt – rft
1         17.8                           13.7
2         39.0                           23.2
3         12.8                            6.9
4         24.2                           16.8
5         17.2                           12.3

• We have some intuition that the beta on this fund is positive, and we therefore want to find whether there appears to be a relationship between x and y given the data that we have. The first stage would be to form a scatter plot of the two variables.

I. Simple regression model...

[Figure] Scatter plot of the excess return on fund XXX (vertical axis) against the excess return on the market portfolio (horizontal axis).

I. Simple regression model...

Finding a Line of Best Fit
• We can use the general equation for a straight line, y = a + bx, to get the line that best "fits" the data.
• However, this equation (y = a + bx) is completely deterministic.
• Is this realistic? No. So what we do is to add a random disturbance term, u, into the equation:
yt = α + βxt + ut, where t = 1, 2, 3, 4, 5
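As a sketch of the two steps just described (form the scatter plot, then fit a line), here is a hedged Python example using the five fund XXX observations above; numpy and matplotlib are assumed to be available.

    import numpy as np
    import matplotlib.pyplot as plt

    # Excess returns from the fund XXX example above
    x = np.array([13.7, 23.2, 6.9, 16.8, 12.3])   # market index
    y = np.array([17.8, 39.0, 12.8, 24.2, 17.2])  # fund XXX

    # Fit the best-fitting straight line y = a + b*x by least squares
    b, a = np.polyfit(x, y, 1)

    plt.scatter(x, y, label="observations")
    plt.plot(x, a + b * x, label=f"fit: y = {a:.2f} + {b:.2f}x")
    plt.xlabel("Excess return on market portfolio")
    plt.ylabel("Excess return on fund XXX")
    plt.legend()
    plt.show()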

II. Multiple regression model

• A multiple regression model is a mathematical model that characterizes the relationship between a dependent variable & two or more independent variables.
• Using a generic mathematical expression,
Yi = β0 + β1X1i + β2X2i + … + βkXki + εi
• If k = 1, that is, there is only one X-variable, we have the simple regression.

II. Multiple regression model...

• But if k > 1, i.e. more than one x-variable, we have the multiple regression case.
• Using a specific mathematical expression, for example for k = 4,
Yi = β0 + β1X1i + β2X2i + β3X3i + β4X4i + εi
• Where Y = excess returns, X1 = market index, X2 = total net income, X3 = initial investment, & X4 = …
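A minimal Python sketch of the k > 1 case, estimating (β0, β1, β2) by least squares in matrix form; the data are simulated purely for illustration, not taken from the lecture.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 100
    x1 = rng.normal(size=n)
    x2 = rng.normal(size=n)
    # Simulated model: Y = 1.0 + 2.0*X1 - 0.5*X2 + error (illustrative values)
    y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(scale=0.3, size=n)

    X = np.column_stack([np.ones(n), x1, x2])      # design matrix with intercept
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(beta_hat)  # estimates of (b0, b1, b2), close to (1.0, 2.0, -0.5)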


4. Variables in regression models

#. Explain the variables involved in a regression model?
#. What are the justifications for the inclusion of the disturbance term in a regression model?
#. What is the significance of the stochastic term or disturbance term?

Explain the variables involved in a regression model?

• These variables are observable variables, unobservable variables & unknown parameters.
1. Observable Variables
✓ These are the variables whose values are collected from the field through questionnaires, interviews & other data collection mechanisms. Thus,
Yi = the ith value of the dependent variable.
❖ Yi is also called the dependent variable or the regressand or the explained variable.
Xi = the ith value of the independent variable.
❖ Xi is also called the independent variable or the regressor or the explanatory variable.

Explain the variables involved in a regression model?

• Denote the dependent variable by y and the independent variable(s) by x1, x2, ..., xk where there are k independent variables.
• Some alternative names for the y and x variables:

y                      x
dependent variable     independent variables
regressand             regressors
effect variable        causal variables
explained variable     explanatory variables

Explain the variables involved in a regression model?...

2. Unobservable variables
• These are the values that will be determined from the observations & estimated values of the data set.
• εi is the random error term for the ith member of the population.
• εi is also called the disturbance term or the stochastic term.

Explain the variables involved in a regression model?...

3. Unknown Parameters (regression coefficients)
▪ These are the values that will be estimated from the sample data on the dependent & independent variables.
▪ The unknown parameters include the intercept & the slope coefficients, β0, β1, …, βk.

What is the significance of the stochastic term?

• A regression model usually contains variables, a disturbance (stochastic) term, & parameters in a system of structural equations.
• The stochastic disturbance term is a random variable that typically is added to all equations of the model other than identities or equilibrium conditions.


What are the justifications for the inclusion of the disturbance term in a regression model?

(1) Effect of omitted variables from the model
• Due to a variety of reasons, some variables (or factors) which might affect the dependent variable are omitted.
EXAMPLE
Qd = f(p): Qd = α + βpi + ε, a demand equation
❖ Factors like family size, tastes & preferences, prices of other commodities, income, etc. are excluded & taken care of by ε.
• But, due to:
❖ lack of knowledge of the factors that should be included,
❖ difficulty in measurement or lack of data (usually time series data),
❖ some factors being random (unpredictable, e.g. flood), etc.,
they may not be included.

What are the justifications for the inclusion of the disturbance term in a regression model?...

(2) The randomness of human behaviour
• There is an unpredictable element of randomness in human behaviour that can be accounted for by the error term.

What are the justifications for the inclusion of the disturbance term in a regression model?...

(3) Measurement Error
• Deviations of the points from the line may be due to errors of measurement of the variables, arising from the methods of collecting & processing statistical information, etc.
• The variables included in the model may be measured inaccurately, & the stochastic term is expected to account for these errors.
• Problems that arise due to the methods of data collection, the processing of statistical information, etc. can be captured by the error term.

What are the justifications for the inclusion of the disturbance term in a regression model?...

(4) Imperfect specification of the mathematical form of the model
• The equations may be mis-specified in that the particular functional forms chosen may be incorrect.
▪ We may have linearized a non-linear relationship, or we may have left some equation out of the model.
▪ Many variables are simultaneously determined by a system containing many equations.

What are the justifications for the inclusion of the disturbance term in a regression model?...

• Therefore, in order to take all these sources of error into account, a random variable (i.e., an error term, random disturbance term or stochastic term) is included in the regression equation.

5. Best-fitting straight line

#. What are the main methods of determining the best-fitting straight line to a data set?


Method of fitting Straight Line

• There are several methods, but the ones that are widely used are:
1. Ordinary Least Squares (OLS) Method
2. Minimum-Variance (MV) Method, &
3. Maximum Likelihood (ML) Method

Note: The main focus of this lecture is on Ordinary Least Squares (OLS), which is one of the most widely used methods of estimating the parameters.

Ordinary Least Squares (OLS) Method

• The sample regression line is Ŷi = β̂0 + β̂1Xi, & the residual for the ith observation is ei = Yi − Ŷi.
• OLS chooses the estimates β̂0 & β̂1 so as to minimize the sum of squared residuals:
min Σei² = Σ(Yi − β̂0 − β̂1Xi)²

Ordinary Least Squares (OLS) Method…

• Setting the partial derivatives of Σei² with respect to β̂0 & β̂1 equal to zero gives the first-order conditions:
Σ(Yi − β̂0 − β̂1Xi) = 0
ΣXi(Yi − β̂0 − β̂1Xi) = 0

Ordinary Least Squares (OLS) Method…

Therefore, the normal equations are
ΣYi = nβ̂0 + β̂1ΣXi
ΣXiYi = β̂0ΣXi + β̂1ΣXi²


Ordinary Least Squares (OLS) Method…

• Solving the normal equations, the parameter estimates are
β̂1 = (ΣXiYi − ȲΣXi) / (ΣXi² − X̄ΣXi)
β̂0 = Ȳ − β̂1X̄
• The estimators obtained thus are known as the least squares estimators, with unbiased & efficient properties.
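The solution formulas above translate directly into code; a minimal Python sketch, reusing the fund XXX data from earlier for illustration.

    import numpy as np

    x = np.array([13.7, 23.2, 6.9, 16.8, 12.3])
    y = np.array([17.8, 39.0, 12.8, 24.2, 17.2])

    # beta1_hat = (sum(XY) - Ybar*sum(X)) / (sum(X^2) - Xbar*sum(X))
    b1 = ((x * y).sum() - y.mean() * x.sum()) / ((x ** 2).sum() - x.mean() * x.sum())
    # beta0_hat = Ybar - beta1_hat * Xbar
    b0 = y.mean() - b1 * x.mean()

    print(b0, b1)
    print(np.polyfit(x, y, 1))  # cross-check: returns [slope, intercept]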

Ordinary Least Squares (OLS) Method…

Properties of the Ordinary Least Squares estimators
• The properties below hold under the standard assumptions about the disturbance term, usually numbered 1 through 4:
1. E(ut) = 0
2. Var(ut) = σ² < ∞
3. Cov(ui, uj) = 0 for i ≠ j
4. Cov(ut, xt) = 0

Properties of the OLS Estimator

• If assumptions 1 through 4 hold, then the estimators α̂ and β̂ determined by OLS are known as Best Linear Unbiased Estimators (BLUE).
What does the acronym stand for?
• "Estimator" – β̂ is an estimator of the true value of β.
• "Linear" – β̂ is a linear estimator.
• "Unbiased" – on average, the actual values of α̂ and β̂ will be equal to the true values.
• "Best" – means that the OLS estimator β̂ has minimum variance among the class of linear unbiased estimators. The Gauss-Markov theorem proves that the OLS estimator is best.
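The "unbiased" property can be illustrated with a small Monte Carlo experiment; a hedged Python sketch with illustrative true values α = 1 and β = 2 (not from the lecture).

    import numpy as np

    rng = np.random.default_rng(42)
    true_a, true_b = 1.0, 2.0
    slopes = []
    for _ in range(2000):
        x = rng.uniform(0, 10, size=50)
        u = rng.normal(0, 1, size=50)   # E(u) = 0, constant variance
        y = true_a + true_b * x + u
        b, a = np.polyfit(x, y, 1)      # OLS slope and intercept
        slopes.append(b)

    print(np.mean(slopes))  # averages out close to the true slope 2.0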


Properties of the OLS Estimator…

• Consistent
The least squares estimators α̂ and β̂ are consistent. That is, the estimates will converge to their true values as the sample size increases to infinity. We need the assumptions E(xtut) = 0 and Var(ut) = σ² < ∞ to prove this. Consistency implies that
lim(T→∞) Pr[|β̂ − β| > δ] = 0 for any δ > 0
• Unbiased
The least squares estimates of α and β are unbiased. That is, E(α̂) = α and E(β̂) = β. Thus on average the estimated values will be equal to the true values. To prove this also requires the assumption that E(ut) = 0. Unbiasedness is a stronger condition than consistency.
• Efficiency
An estimator β̂ of parameter β is said to be efficient if it is unbiased and no other unbiased estimator has a smaller variance. If the estimator is efficient, we are minimising the probability that it is a long way off from the true value of β.

6. Assumptions about regression

#. What are the main statistical assumptions about a linear regression model or classical linear regression model?

Assumptions about regression

• The main assumptions about a Classical Linear Regression Model (CLRM) are:
◼ Existence,
◼ Continuity,
◼ Independence,
◼ Constant variance,
◼ Normal distribution,
◼ Zero-covariance &
◼ Linearity.

Assumptions about regression…

1. Existence
• For any fixed value of the independent variable X, the dependent variable Y is a random variable with a certain probability distribution, having finite mean & variance.
• A violation of this assumption may indicate that there is no relationship between the variables involved.

Assumptions about regression…

2. Continuity
• The dependent variable is a continuous random variable, whereas the values of the independent variable are fixed; they can take continuous or discrete values.
• Caution must be taken that if the dependent variable is not continuous, then other types of regression models, such as Probit, Logit, Tobit, etc., should be used accordingly.

Assumptions about regression…

3. Independence
• The observations of the explanatory variables, X's, are statistically independent of one another.
• Mathematically, it means the covariance between any two observations is zero: the X observations are independent of each other, denoted as Cov(Xi, Xj) = 0.
• However, it is not unusual for there to be some association between independent variables.


Assumptions about regression…

• A violation of the independence assumption indicates that there is a multicollinearity problem among the explanatory variables,
• which leads to a very high value of the coefficient of determination & inconsistent parameter estimates.

Assumptions about regression…

4. Constant variance
• For any fixed value of the independent variable X, the variance of the dependent variable Y (equivalently, of the error term) is the same constant, σ², across all observations; this is known as homoskedasticity.

Assumptions about regression…

• A violation of this assumption, widely known as heteroskedasticity (non-constant variance), leads to a non-constant error variance,
• indicating very high standard errors & inconsistent sample estimates, which may lead to wider confidence intervals.

Assumptions about regression…

5. Normal distribution
• For any fixed explanatory value X, the response Y has a normal distribution.
• Generally, the observations are normally distributed if, when the observations are graphed, a bell-shaped or normal curve with zero mean appears.
• A violation of this assumption occurs when there are outliers in the dataset, & it leads to problems of wider confidence intervals & wrong hypothesis testing.

Assumptions about regression…

6. Zero Covariance
• The error terms associated with the estimated dependent variable are assumed to be independent of each other about the regression line.
• Mathematically, it means there is minimal correlation between the expected or estimated dependent variable & its associated error term.

Assumptions about regression…

• A violation of this assumption occurs where there is mis-specification of the model or when there are very high or low observed Y values in the data set.
• This perhaps leads to a high mean error & hypothesis-testing problems; as well, the F-value could be meaningless.


Assumptions about regression…

7. Non-autocorrelation
▪ Any pair of error terms should be independent of each other.
▪ Mathematically, it means there is minimal correlation between any pair of error terms: Cov(εi, εj) = 0 for i ≠ j.

Assumptions about regression…

• Correlatedness across error terms, famously known as autocorrelation, occurs where successive disturbance terms are associated with each other.
• This perhaps leads to a high mean error & hypothesis-testing problems; as well, the F-value could be meaningless.

Assumptions about regression…

8. Non-Endogeneity
• None of the independent variables should be correlated with the error term.
• A departure from this assumption (Cov(Xi, εi) ≠ 0), known as the endogeneity problem, occurs where irrelevant variables or lagged dependent variable(s) are introduced as independent variable(s) in a model.
• This leads to high standard errors & inefficient parameter estimates.

Assumptions about regression…

9. Linearity
• The mean value of the response variable (Y) is a straight-line function of the independent variables, X's.
• Mathematically, the relation between a dependent variable & an independent variable is denoted by E(Y|X) = β0 + β1X.
• A violation of this assumption may indicate that there is a non-linear relationship between the response & explanatory variables.
• Thus, the linear regression model may not be applicable or fitted to the data under consideration.

7. Quality of straight line fit

#. How do you test whether the fit (or estimates) is good? Or
#. How do you test the validity of a model?
#. What qualifies a model to be adequately representing the data?

Quality of straight line fit

• What is known as the "Test of Goodness of Fit" method determines whether a regression model is valid or adequately fits the data under investigation.
• This test alone does not determine the fitness; however, it gives a good picture of the overall fitness of a model.


Quality of straight line fit…

• In fact, the closer the observations are to the fitted or estimated regression line, the higher the variation in the dependent variable explained by the estimated regression equation.
• Now, the total variation in the dependent variable, Y, is equal to the explained variation in the dependent variable plus the residual variation.
• Mathematically, it is formulated as
TSS = RSS + ESS
where TSS is the Total Sum of Squares, RSS the Regression (explained) Sum of Squares & ESS the Error (residual) Sum of Squares.

Quality of straight line fit…

• Hence, the OLS method estimates the parameters by minimizing the ESS; equivalently, OLS estimates must have minimized the Mean Square Error (MSE).
• Now, the difference between the Total Sum of Squares (TSS) & the Regression Sum of Squares (RSS) is the Error Sum of Squares (ESS), which is expressed as
ESS = TSS − RSS

Quality of straight line fit…

• By dividing both sides by TSS & decomposing the variation in the response into its explained & residual parts, it is possible to calculate the Coefficient of Determination, R², as follows:
R² = RSS / TSS = 1 − ESS / TSS
• This indicates the proportion of the variation in the response or dependent variable, Y, which is explained by the independent variables in the model.
• The coefficient of determination ranges between 0 & 1 inclusive.

Quality of straight line fit…

• The coefficient of determination measures the proportion of the total variation in Y that has been accounted for by regressing Y on the whole set of regressors/explanatory variables.
• That is, it expresses the proportion of the variation in the dependent variable explained by all the regressors in the model.
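A minimal Python sketch of the decomposition and of R², reusing the fund XXX data for illustration.

    import numpy as np

    x = np.array([13.7, 23.2, 6.9, 16.8, 12.3])
    y = np.array([17.8, 39.0, 12.8, 24.2, 17.2])

    b1 = ((x * y).sum() - y.mean() * x.sum()) / ((x ** 2).sum() - x.mean() * x.sum())
    b0 = y.mean() - b1 * x.mean()
    y_hat = b0 + b1 * x

    tss = ((y - y.mean()) ** 2).sum()      # total variation in Y
    rss = ((y_hat - y.mean()) ** 2).sum()  # explained (regression) sum of squares
    ess = ((y - y_hat) ** 2).sum()         # residual (error) sum of squares

    print(tss, rss + ess)            # TSS = RSS + ESS
    print(rss / tss, 1 - ess / tss)  # both expressions give R²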

Quality of straight line fit…

• Interpretation of R²
• Suppose R² = 90%; this means that the regression line gives a good fit to the observed data, since the line explains 90% of the total variation of the Y values around their mean. The remaining 10% of the total variation in Y is unaccounted for by the regression line & is attributed to the factors captured by the disturbance term.


Exercise I

• The Coca-Cola Company is attempting to develop & estimate a simple linear regression demand model for its soft drinks.
• The company's chief statistician feels that the most important variable affecting sales for now is advertisement.
• The statistician decides to collect data on the variables in a sample of 10 company sales regions that are roughly equal in terms of population.
• Data on the sales & advertising expenditures were obtained from the company's marketing department.

Exercise I…

Region   Sales/week (m) (Y)   Advertisement expenditure/week (X)   Y²      X²     XY
1        13                    4                                    169     16     52
2        16                    5                                    256     25     80
3        18                    6                                    324     36    108
4        18                    7                                    324     49    126
5        26                    8                                    676     64    208
6        22                    9                                    484     81    198
7        28                    9                                    784     81    252
8        26                   10                                    676    100    260
9        32                   11                                   1024    121    352
10       28                   12                                    784    144    336
Totals   ΣY = 227             ΣX = 81                              ΣY² = 5501   ΣX² = 717   ΣXY = 1972

Exercise I…

Mean of adverts: X̄ = 8.1
Mean of sales: Ȳ = 22.7
Using the formulae to get the parameters, we get β̂1 first, since β̂0 depends on β̂1:

β̂1 = (ΣXtYt − ȲΣXt) / (ΣXt² − X̄ΣXt)
   = (1972 − 22.7(81)) / (717 − 8.1(81))
   = 133.3 / 60.9
   = 2.189

Ordinary Least Squares (OLS) Method…

Getting β̂0:
β̂0 = Ȳ − β̂1X̄
   = 22.7 − 2.189(8.1)
   = 22.7 − 17.731
   = 4.969 (the intercept)

Therefore, the estimated regression line is:
Ŷt = 4.969 + 2.189Xt
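As a cross-check, a minimal Python sketch reproducing the hand calculation from the raw Exercise I data.

    import numpy as np

    x = np.array([4, 5, 6, 7, 8, 9, 9, 10, 11, 12], dtype=float)        # advertising
    y = np.array([13, 16, 18, 18, 26, 22, 28, 26, 32, 28], dtype=float)  # sales

    b1 = ((x * y).sum() - y.mean() * x.sum()) / ((x ** 2).sum() - x.mean() * x.sum())
    b0 = y.mean() - b1 * x.mean()

    print(round(b0, 3), round(b1, 3))  # approximately 4.970 and 2.189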

Exercise I…

Interpretation of the Results
• β̂0 = 4.969 – this is the intercept, & it represents the autonomous sales, i.e. sales that do not depend on advertisements. Whether the products are advertised or not, this level of sales (4.969 units) is attained.
• β̂1 = 2.189 – this is the slope, & it represents the sales that are influenced by advertisement of the products.
• It indicates that a unit increase (millions) in advertising expenditure increases sales by 2.189 (million) units in a given sales region.

Ordinary Least Squares (OLS) Method…

Making Predictions
• A regression equation can be used to make predictions concerning the value of Y, given any particular value of X.
• This is done by substituting the particular value of X into the sample regression equation.
• Suppose one is interested in estimating soft drink sales for a region with advertisement expenditures equal to 10.5 (millions).
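Carrying out the substitution just described; a minimal sketch.

    # Substitute X = 10.5 into the estimated line Y_hat = 4.969 + 2.189*X
    b0, b1 = 4.969, 2.189
    x_new = 10.5
    print(b0 + b1 * x_new)  # about 27.95 (millions of units)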


Exercise II

• Use the above Exercise I dataset & run a Simple Linear Regression Model using SPSS or STATA.
• Interpret your result.

Exercise II… Using STATA

      Source |       SS         df       MS        Number of obs =       10
-------------+-------------------------------      F(1, 8)       =    41.44
       Model | 291.771593        1  291.771593     Prob > F      =   0.0002
    Residual | 56.3284072        8   7.0410509     R-squared     =   0.8382
-------------+-------------------------------      Adj R-squared =   0.8180
       Total |      348.1        9  38.6777778     Root MSE      =   2.6535

        sale |    Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
-------------+--------------------------------------------------------------
          ad | 2.188834   .3400244    6.44   0.000     1.404736    2.972932
       _cons | 4.970443   2.879186    1.73   0.123    -1.668971    11.60986
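For readers without SPSS or STATA, a hedged Python alternative using statsmodels; it should reproduce the coefficients shown above (≈ 4.970 and ≈ 2.189).

    import numpy as np
    import statsmodels.api as sm

    x = np.array([4, 5, 6, 7, 8, 9, 9, 10, 11, 12], dtype=float)
    y = np.array([13, 16, 18, 18, 26, 22, 28, 26, 32, 28], dtype=float)

    X = sm.add_constant(x)    # adds the intercept column
    results = sm.OLS(y, X).fit()
    print(results.summary())  # coefficients, t-statistics, R², etc.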
