Chapter 3 - Classical Simple Linear Regression


Simple Linear Regression

 Regression is probably the single most important
tool at the econometrician's disposal.
 Regression analysis is concerned with the study
of the dependence of one variable (the
dependent variable) on one or more other
variable(s) (the explanatory variable(s)), with a
view to estimating and/or predicting the
(population) mean or average value of the
former in terms of the known or fixed (in
repeated sampling) values of the latter.
Classical Linear Regression Model

 In econometrics, there is often a need to
examine the relationship between two or more
financial variables.
 The relationship between variables can be
explored by:
a. Correlation Analysis
b. Building a linear regression model
Correlation Analysis
 The correlation between two variables
measures the degree of linear association
between them.
 Correlation analysis is a group of statistical
techniques used to measure the strength of the
relationship (correlation) between two variables.
 Once a linear relationship is established,
knowledge of the independent variable(s) can
be used to forecast the dependent variable.
THE SIMPLE REGRESSION MODEL

 Regression analysis is concerned with
describing and evaluating the relationship
between a given variable (the explained or
dependent variable) and one or more other
variables (the explanatory or independent
variables).
 In statistical modelling, regression analysis is a
statistical process for estimating the
relationships among variables.
 The explained variable is denoted by y and the
explanatory variable by x.
 Regression is an attempt to explain the
variation in a dependent variable using the
variation in independent variables.
 Regression is thus often interpreted as
describing causation, although regression alone
cannot establish it.
 If the independent variable(s) sufficiently
explain the variation in the dependent
variable, the model can be used for
prediction.
Types of Regression Models

 Models with one explanatory variable are called
simple regression models; models with two or
more explanatory variables are called multiple
regression models.
 Each type may be either linear or non-linear.


Y = a + bX + u
Where:
 Y is the dependent variable
 X is the independent variable (the variable
that drives the dependent variable, i.e. the
level of activity)
 a is the intercept of the trend line on the Y axis
(i.e. the fixed component or starting point)
 b is the gradient (slope) of the trend line
 u is the stochastic error (disturbance) term
 Y - yield
 X - fertilizer
 The agricultural researcher is interested in the
effect of fertilizer on yield, holding other
factors fixed.
 The error term u contains factors such as land
quality, rainfall, and so on.
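As an illustration, the model can be simulated and fitted in
Python. This is a minimal sketch with made-up fertilizer/yield
numbers (hypothetical, not data from this chapter); np.polyfit
performs an ordinary least squares fit of a straight line:

import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "true" model: yield = 2.0 + 0.5 * fertilizer + u
fertilizer = np.linspace(0, 10, 50)          # X: amount of fertilizer
u = rng.normal(0.0, 1.0, fertilizer.size)    # stochastic disturbance term
crop_yield = 2.0 + 0.5 * fertilizer + u      # Y: observed yield

# OLS fit of a straight line; polyfit returns (slope, intercept)
b, a = np.polyfit(fertilizer, crop_yield, 1)
print(f"a (intercept) = {a:.2f}, b (slope) = {b:.2f}")
# The estimates should be close to the true a = 2.0 and b = 0.5;
# they differ only because of the disturbance term u.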
2-5. The Significance of the Stochastic
Disturbance Term

Why not include as many variables as possible in
the model? (i.e. the reasons for using the
disturbance term ui):
+ Vagueness of theory
+ Unavailability of data
+ Core variables vs. peripheral variables
+ Intrinsic randomness in human behaviour
+ Poor proxy variables
+ Principle of parsimony
+ Wrong functional form
 To review the analysis of the relationships
between two variables, consider the following
example.
 The values of a and b can be calculated using
the following formulae:

b = [nΣxy − (Σx)(Σy)] / [nΣx² − (Σx)²]
a = ȳ − b·x̄
Example
Question
 Calculate the correlation coefficient (r) for Tom's
T-shirts using the following formula:

r = [Σxy − (Σx·Σy)/n] / √{[Σx² − (Σx)²/n] · [Σy² − (Σy)²/n]}
Example
 Consider the following example
Serial n   Weight Y (kg)   Age X (years)   xy     x²     y²
1          12              7               84     49     144
2          8               6               48     36     64
3          12              8               96     64     144
4          10              5               50     25     100
5          11              6               66     36     121
6          13              9               117    81     169
Total      Σy = 66         Σx = 41         Σxy = 461   Σx² = 291   Σy² = 742
 Using the information in the table above,
calculate the following:
a) 
b) 
c) Correlation coefficient (r)
d) Fit the regression line
Required
Calculate the following:

a) 
b) 
c) R-squared and comment on the goodness of fit of
the model to the data
d) Fit the regression model and interpret the results
(a worked sketch follows below)
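A minimal Python sketch for the weight/age example, computing r,
R-squared, and the fitted line directly from the column totals in
the table above (so the arithmetic can be checked by hand):

import math

# Column totals from the weight/age table (n = 6 observations)
n = 6
sum_x, sum_y = 41, 66            # Σx (age), Σy (weight)
sum_xy = 461                     # Σxy
sum_x2, sum_y2 = 291, 742        # Σx², Σy²

# Corrected sums of squares and cross-products
sxy = sum_xy - sum_x * sum_y / n
sxx = sum_x2 - sum_x ** 2 / n
syy = sum_y2 - sum_y ** 2 / n

# Correlation coefficient, R-squared, and regression line y = a + b·x
r = sxy / math.sqrt(sxx * syy)
b = sxy / sxx
a = sum_y / n - b * sum_x / n

print(f"r = {r:.2f}, R² = {r * r:.2f}")   # r ≈ 0.76, R² ≈ 0.58
print(f"y = {a:.2f} + {b:.2f}x")          # y ≈ 4.69 + 0.92x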
Sources of Errors in Regression

 The question which remains unanswered is:
why should we add an error term? What are
the sources of the error term u in the
equation?
 Technically, u is known as the stochastic
disturbance or stochastic error term.
 It is a surrogate or proxy for all the omitted or
neglected variables that may affect Y but are
not (or cannot be) included in the regression
model.
Sources of Errors in Regression
1. Unpredictable element of randomness in
human response
 Suppose y = consumption expenditure of a
household and x = disposable income of the
household. There is an unpredictable element
of randomness in each household's
consumption.
 A household does not behave like a machine;
its expenditure pattern fluctuates.
2. Effect of Omitted Variables

 In our example, x is not the only variable
influencing y.
 Family size, the tastes of the family, spending
habits, etc. can all affect the variable y.
 The term u is a catch-all for the effects of all
such variables, some of which may not be
quantifiable and some of which may not even
be identifiable.
3. Measurement Error in y
 This refers to measurement error in the
household consumption figure.
 The argument is that we cannot measure it
accurately.
 For now, let us assume that there is
measurement error in y but not in x.
 Vagueness of theory - The theory, if any,
determining the behaviour of Y may be, and
often is, incomplete.
 Unavailability of data - It is a common
experience in empirical analysis that the data we
would ideally like to have often are not available.
 Wrong functional form - Even if we have
theoretically correct variables explaining a
phenomenon and even if we can obtain data on
these variables, very often we do not know the
form of the functional relationship between the
regressand and the regressors.
The Gauss-Markov Theorem
 Under the classical assumptions, OLS is the Best
Linear Unbiased Estimator (BLUE) of the
population parameters.
 Best = smallest variance among linear unbiased
estimators.
 It is reassuring to know that, under these
assumptions, you cannot find a better linear
unbiased estimator than OLS.
 If one or several of these assumptions fail, OLS
is no longer BLUE.
Regression Line

 We also need to draw a fitted regression line
that best fits the collection of (x, y) data points.
 A better procedure is to find the best straight line
using a criterion that minimises the sum of
squared distances (errors) from the points to the
line, as measured in the Y direction.
 It is possible to use the general equation for a
straight line to get the line that best ‘fits’ the data.
 The researcher would then be seeking to find the
values of the parameters or coefficients, α and β,
which would place the line as close as possible to
all of the data points taken together.
 The most common method used to fit a line to
the data is known as Ordinary Least Squares
(OLS).
 Ordinary Least Squares (OLS) or linear least
squares is a method for estimating the unknown
parameters in a linear regression model, with
the goal of minimizing the sum of the squares of
the differences between the observed responses
in the given dataset and those predicted by a
linear function of a set of explanatory variables.
 This line is known as the least squares line or
fitted regression line.
DERIVING THE ORDINARY LEAST SQUARES ESTIMATES

 Linear regression, also known as linear least
squares, computes the line that best fits the
observations.
 The method of least squares requires that we
choose as estimates of α and β the values that
make the sum of squared residuals as small as
possible.
 Thus the predictions must be based on
estimated parameter values, and testing is
based on estimated values in relation to
hypothesized population values.
Example
 In other words, for the given value of x for
observation t, ŷt is the value of y which the
model would have predicted.
 Note that a hat (ˆ) over a variable or
parameter is used to denote a value estimated
by a model.
 Finally, let ût denote the residual, which is
the difference between the actual value of y
and the value fitted by the model for this data
point, i.e. ût = yt − ŷt.
 The distance of a data point from the fitted
line is its residual: the difference between the
actual value of the dependent variable and the
predicted value.
 The method minimises collectively the vertical
distances from the data points to the fitted
line (y − ŷ).
 To guess is cheap. To guess wrongly is
expensive - Chinese proverb
 The reason that the sum of the squared
distances is minimised, rather than simply a
sum of ût made as close to zero as possible, is
that some points will lie above the line while
others lie below it.
 When a sum that is to be made as close to zero
as possible is formed, the points above the line
count as positive values while those below
count as negatives, and so they cancel out.
 Indeed, any fitted line that goes through the
mean of the observations (i.e. (x̄, ȳ)) would
set the sum of the ût to zero.
 So minimising the Sum of Squared Errors (SSE)
means choosing the line that minimises:

SSE = Σût² = Σ(yt − ŷt)²
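Setting the derivatives of this sum with respect to the estimated
intercept and slope to zero gives the familiar OLS estimators (a
standard result, stated here for reference):

β̂ = Σ(xt − x̄)(yt − ȳ) / Σ(xt − x̄)²,   α̂ = ȳ − β̂·x̄

The second equation confirms that the fitted line passes through
the point of means (x̄, ȳ).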
Explained and Unexplained Variation

 We need to define a line that passes through
the point determined by the mean x-value and
the mean y-value.
 The variation in the dependent (y) variable can
be partitioned as follows:
 Total variation in the dependent (y) variable is
the sum of:
 the variation in the dependent (y) variable explained by
the independent (x) variable, and
 the variation in the dependent (y) variable NOT
explained by the independent (x) variable (the residual).
Total Variation

[Figure: scatter of $ spent on health care (Y) against
income (X), with the fitted line Y = a + bx. The total
deviation of a point around Ȳ is split into the deviation
explained by the regression (Ŷ − Ȳ) and the deviation
unexplained by the regression (Y − Ŷ).]
Explaining Variation

SST = SSR + SSE

(Total deviation = Explained deviation + Unexplained deviation)
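In symbols, writing ŷt for the fitted value and ȳ for the sample
mean, the standard definitions are:

SST = Σ(yt − ȳ)²,   SSR = Σ(ŷt − ȳ)²,   SSE = Σ(yt − ŷt)²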
Example
Suppose Mr Kuwaza observes the selling price
and sales volume of milk for 10 randomly
selected weeks. The data he has collected are
presented in Table 2.1 overleaf.

Required: calculate
a) SSR
b) SST
c) SSE
(a worked sketch follows after the tables below)
Table 2.1
Week   Weekly Sales (1000s of gallons)   Selling Price ($)
1 10 1.30
2 6 2.00
3 5 1.70
4 12 1.50
5 10 1.60
6 15 1.20
7 5 1.60
8 12 1.40
9 17 1.00
10 20 1.10
Y      X       XY      X²      Y²
10 1.30 13.0 1.69 100
6 2.00 12.0 4.00 36
5 1.70 8.5 2.89 25
12 1.50 18.0 2.25 144
10 1.60 16.0 2.56 100
15 1.20 18.0 1.44 225
5 1.60 8.0 2.56 25
12 1.40 16.8 1.96 144
17 1.00 17.0 1.00 289
20 1.10 22.0 1.21 400
Totals: 112 14.40 149.3 21.56 1488
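A minimal Python sketch for the Required items, using the raw data
from Table 2.1 (the column totals above serve as a hand check):

# Data from Table 2.1: weekly milk sales (Y, 1000s of gallons)
# and selling price (X, $) for 10 weeks
y = [10, 6, 5, 12, 10, 15, 5, 12, 17, 20]
x = [1.30, 2.00, 1.70, 1.50, 1.60, 1.20, 1.60, 1.40, 1.00, 1.10]

n = len(y)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS slope and intercept
sxy = sum(xi * yi for xi, yi in zip(x, y)) - n * x_bar * y_bar
sxx = sum(xi ** 2 for xi in x) - n * x_bar ** 2
b = sxy / sxx
a = y_bar - b * x_bar

y_hat = [a + b * xi for xi in x]                        # fitted values

sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)            # explained variation
sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # unexplained variation

print(f"y = {a:.2f} + {b:.2f}x")
print(f"SST = {sst:.1f}, SSR = {ssr:.1f}, SSE = {sse:.1f}")
# Expected from the totals above: SST = 233.6, SSR ≈ 174.2, SSE ≈ 59.4,
# so SSR + SSE = SST (up to rounding).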
 The line that best fits a collection of X-Y data
points is the line that minimises the sum of
squared distances from the points to the line.
 This is known as the least squares line or
fitted regression equation.
 The fitted line will be of the form: ŷ = a + bx
Calculation of Residuals (SST)
Residuals with Predicted Data
Testing Validity of the Model
 In simple linear regression, the validity of the
model is tested using the Coefficient of
Determination.
 A high Coefficient of Determination indicates
that the model fits the data well.
 We can also check the validity of the model by
checking whether SSR > SSE, i.e. whether the
explained variation exceeds the unexplained
variation.
Coefficient of Determination
 The coefficient of determination measures the
percentage of variability in Y that can be
explained through knowledge of the variability
in the independent variable X:

R² = SSR / SST = 1 − SSE / SST

 The more of the variance in Y you can explain, the
more powerful your model.
 Calculate the Coefficient of Determination using
the previous example.
Solution
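Using the SSR and SST values computed from Table 2.1 in the
sketch above:

R² = SSR / SST ≈ 174.2 / 233.6 ≈ 0.75

so roughly 75% of the week-to-week variation in milk sales is
explained by the selling price, which suggests a reasonably
good fit.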
