Simple Linear Regression
Simple Regression
"Regression of y on x":

We refer to y as:
• Dependent variable
• Explained variable
• Regressand

We refer to x as:
• Independent variable
• Explanatory variable
• Regressor
• Covariate
• Control variable

$\beta_0$ = intercept or constant
$\beta_1$ = slope parameter
$u$ = error term or disturbance ($\varepsilon$, $e$, …); it contains all other factors relevant for y that are not explicitly included as separate variables in the regression equation. That is, the effects of all unobserved factors important for y are included in the error term.
Simple Regression
$y = \beta_0 + \beta_1 x + u$

$y_i = \beta_0 + \beta_1 x_i + u_i$ (i identifies individual observations)

[Figure: scatter plot of observations $(x_i, y_i)$ around the fitted regression line, with intercept $\hat{\beta}_0$, slope $\hat{\beta}_1$, and the residual $\hat{u}_3$ marked; x-axis: x, e.g., fertilizer]
Simple Regression
What do we hope to learn from the regression?

Essentially, the effect of x on y, which we extract by looking at the change in y associated with a change in x:

$\Delta y = \beta_1 \Delta x$ if $\Delta u = 0$

Note that we assume a linear relationship between x and y by the functional form we chose; this assumption can be relaxed (varying effect of x on y with differing x).

And $\beta_0$? It is merely the value of y when $x = 0$ and $u = 0$. Rarely of importance for the analysis.
First assumptions
For the estimation of the unknown/unobserved parameters $\beta_0$ and $\beta_1$, we need to make some assumptions.

First assumption: $E(u) = 0$

The assumption is not restrictive, because we can use $\beta_0$ to normalize $E(u)$ to 0 (shifting the regression line until $E(u) = 0$).
First assumptions
One crucial assumption concerns the relationship between u and x:

$E(u \mid x) = E(u)$

The average value of u does not depend on x; u is mean independent of x. At the same time, combined with $E(u) = 0$, this yields the zero conditional mean assumption $E(u \mid x) = 0$.

Is the assumption plausible? Consider soil quality contained in u in the fertilizer example: is average soil quality the same on plots that receive little fertilizer as on plots that receive a lot, i.e., $E(u \mid x = \text{low})$ vs. $E(u \mid x = \text{high})$?
Estimation of the parameters
Ordinary Least Squares (OLS)
The key idea of a regression is the estimation of the population parameters using a sample from the population.
Regression line and sample points
[Figure: population regression line $E(y \mid x) = \beta_0 + \beta_1 x$ with sample points $(x_1, y_1), \dots, (x_4, y_4)$ scattered around it]
Ordinary Least Squares
Our assumption also implies $E(u) = 0$ and $E(xu) = 0$ (x and u are uncorrelated).

By rearranging the regression equation, we can express the error term as:

$u = y - \beta_0 - \beta_1 x$
Ordinary Least Squares
To estimate the model parameters, we form the sample counterparts of the population expectations:

$\frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0$ and $\frac{1}{n} \sum_{i=1}^{n} x_i \left( y_i - \hat{\beta}_0 - \hat{\beta}_1 x_i \right) = 0$

All we need to do now is to choose values for $\hat{\beta}_0$ and $\hat{\beta}_1$ that make the conditions above true. Both parameters can be identified because we have two unknowns and two moment conditions.
Ordinary Least Squares – Finding $\hat{\beta}_0$

Rewrite the first moment condition, using $\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$ as the shorthand expression for the sample mean of y:

$\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$, or $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$
Ordinary Least Squares – Finding $\hat{\beta}_1$

Using the second moment condition:

$\sum_{i=1}^{n} x_i \left( y_i - (\bar{y} - \hat{\beta}_1 \bar{x}) - \hat{\beta}_1 x_i \right) = 0$ [because $\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}$]

$\sum_{i=1}^{n} x_i (y_i - \bar{y}) = \hat{\beta}_1 \sum_{i=1}^{n} x_i (x_i - \bar{x})$ [by rearranging]
Ordinary Least Squares – Finding $\hat{\beta}_1$

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

The slope parameter is the sample covariance between x and y divided by the sample variance of x.
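A minimal R sketch of these closed-form formulas, using simulated data (all names and numbers below are illustrative, not from the slides):

```r
# Simulated sample; true beta0 = 2, beta1 = 0.5 (illustrative values)
set.seed(1)
n <- 100
x <- rnorm(n, mean = 10, sd = 2)   # e.g., fertilizer
y <- 2 + 0.5 * x + rnorm(n)

# Slope: sample covariance divided by sample variance of x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
# Intercept: from the first moment condition
b0 <- mean(y) - b1 * mean(x)

c(b0, b1)
coef(lm(y ~ x))   # matches the closed-form solution
```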
Ordinary Least Squares – Why the name?

Define the fitted value for y when $x = x_i$, $\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i$, and the residual $\hat{u}_i = y_i - \hat{y}_i$.

An intuitive way of optimizing the regression line is to choose $\hat{\beta}_0$ and $\hat{\beta}_1$ such that the sum of squared residuals, $SSR = \sum_{i=1}^{n} \hat{u}_i^2$ (a measure of the "errors" of the regression line), is minimized.

It can be shown that minimizing SSR relies on the same conditions and leads to the same estimators.
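To illustrate the equivalence, a short sketch that minimizes SSR numerically (reusing the simulated x and y from the previous sketch); it should reproduce the closed-form OLS estimates:

```r
# Numerically minimize SSR over (b0, b1)
ssr <- function(b) sum((y - b[1] - b[2] * x)^2)
optim(c(0, 0), ssr)$par   # numerically close to coef(lm(y ~ x))
```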
Algebraic Properties of OLS

• The sum of the OLS residuals is zero: $\sum_{i=1}^{n} \hat{u}_i = 0$
• The sample covariance between the regressor and the OLS residuals is zero: $\sum_{i=1}^{n} x_i \hat{u}_i = 0$
• The OLS regression line always goes through the mean of the sample: $\bar{y} = \hat{\beta}_0 + \hat{\beta}_1 \bar{x}$
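These properties can be verified numerically (continuing the simulated example from above):

```r
fit  <- lm(y ~ x)
uhat <- resid(fit)
sum(uhat)                                    # ~ 0: residuals sum to zero
sum(x * uhat)                                # ~ 0: zero sample covariance with x
coef(fit)[1] + coef(fit)[2] * mean(x) - mean(y)  # ~ 0: line passes through (x-bar, y-bar)
```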
More terminology
We can regard each observation as being made up of:
• An explained part, $\hat{y}_i$
• An unexplained part, $\hat{u}_i$

$y_i = \hat{y}_i + \hat{u}_i$

Further:
• the Total Sum of Squares: $SST = \sum_{i=1}^{n} (y_i - \bar{y})^2$ (measures the variation in y)
• the Explained Sum of Squares: $SSE = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
• the Residual Sum of Squares: $SSR = \sum_{i=1}^{n} \hat{u}_i^2$
• and it holds that SST = SSE + SSR.
Goodness-of-Fit
We can calculate the share of the total sum of squares explained by the model:

$R^2 = \frac{SSE}{SST} = \frac{\text{explained variation}}{\text{total variation}} = 1 - \frac{SSR}{SST} = 1 - \frac{\text{unexplained variation}}{\text{total variation}}$
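A quick numerical check of these identities (continuing the simulated example):

```r
yhat <- fitted(fit)
SST <- sum((y - mean(y))^2)
SSE <- sum((yhat - mean(y))^2)
SSR <- sum((y - yhat)^2)
c(SSE / SST, 1 - SSR / SST, summary(fit)$r.squared)   # all three are identical
```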
Goodness-of-Fit
Interpretation of a low R²? A low R² does not by itself invalidate the regression: the slope estimate can still be unbiased.

However, a low R² suggests that other (unobserved) factors are of far greater importance for explaining the variation in y, compared to the variable(s) included in the regression.
OLS Example
Scatter plot and R output of a regression of test scores on the student-teacher ratio (stratio):

[Figure: scatter plot of test scores against the student-teacher ratio, with the fitted regression line and the corresponding R output]
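A sketch of how such a regression could be run in R. The slides do not name the data source; the code below assumes the CASchools data shipped with the AER package, the standard California test score data, whose variable names differ from the slides:

```r
# Assumes the AER package; CASchools is the usual California school district data
library(AER)
data("CASchools")
CASchools$stratio <- CASchools$students / CASchools$teachers  # student-teacher ratio
CASchools$score   <- (CASchools$read + CASchools$math) / 2    # average test score

fit_str <- lm(score ~ stratio, data = CASchools)
plot(score ~ stratio, data = CASchools)
abline(fit_str)
summary(fit_str)   # slope approx. -2.28, intercept approx. 698.9
```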
OLS Example
Interpretation
Coefficient of STR ($\hat{\beta}_1 = -2.28$):
School districts that have one additional student per teacher show, on average, 2.28 points lower test scores.

Intercept ($\hat{\beta}_0 = 698.9$):
Would mean that school districts with a student-teacher ratio of zero have average test scores of 698.9. Is this meaningful? Hardly: a ratio of zero lies far outside the range of the data.
Units of Measurement

What happens to the regression parameters if the unit of x or y is changed?

Suppose we estimate the salary of CEOs as a function of the company's return on equity (roe):

$\widehat{salary} = \hat{\beta}_0 + \hat{\beta}_1 \, roe$

Rescaling y rescales both $\hat{\beta}_0$ and $\hat{\beta}_1$ by the same factor; rescaling x changes only $\hat{\beta}_1$ (by the inverse factor). The fit of the regression is unaffected.
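A small R sketch of the rescaling rule, using simulated data (not the actual CEO sample; all numbers are illustrative):

```r
# Rescaling y multiplies both coefficients; rescaling x changes only the slope
set.seed(2)
roe    <- runif(50, 0, 30)                     # return on equity, in percent
salary <- 900 + 20 * roe + rnorm(50, sd = 50)  # salary in thousands of dollars

coef(lm(salary ~ roe))
coef(lm(I(salary * 1000) ~ roe))   # salary in dollars: both coefficients x 1000
coef(lm(salary ~ I(roe / 100)))    # roe as a decimal: slope x 100, intercept unchanged
```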
Functional Form

The regression model can be adjusted by incorporating nonlinear terms of the explanatory variable. These additional terms would be additional regressors, making our model a multiple regression (not "simple" anymore). We set this option aside until the next chapter.

Incorporating nonlinearities changes the interpretation of the slope parameter (no longer linear).

One popular transformation: the log transform, taking the natural logarithm of the dependent variable, the independent variable, or both.

Caution: natural logarithm = $\ln(\cdot)$, oftentimes just called "log"
Functional form
The relationship between test scores and the student-teacher ratio looks (somewhat) linear:

[Figure: scatter plot of test scores against the student-teacher ratio]
Functional form
… but how about the relationship between test scores and income?

[Figure: scatter plot of test scores against district income]
Log-level model of the wage equation
$wage = e^{\beta_0 + \beta_1 educ}$
Log-level model of the wage equation
[Figure: two panels over educ = 0 to 10. Left: $wage = e^{\beta_0 + \beta_1 educ}$, an exponential curve (wage from 0 to 70,000). Right: $\ln(wage) = \beta_0 + \beta_1 educ$, a straight line (ln(wage) from 8 to 11.5).]
Functional Form
Taking logs of the dependent and/or independent variables.

"Log differences" can be used to calculate the approximate percentage change:

$\ln(x_1) - \ln(x_0) \approx \frac{x_1 - x_0}{x_0}$ for small changes.

E.g., $\ln(105) - \ln(100) \approx 0.049$, i.e., approximately a 5% change.
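A two-line check of the approximation (the numbers are illustrative):

```r
# Log differences approximate percentage changes only for small changes
log(105) - log(100)   # 0.0488, close to the exact 5% change
log(150) - log(100)   # 0.405, a poor approximation of the exact 50% change
```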
Unbiasedness of OLS estimators

1. Assume the population model is linear in parameters: $y = \beta_0 + \beta_1 x + u$
2. Assume we can use a random sample of size n, $\{(x_i, y_i): i = 1, \dots, n\}$, from the population model. Thus, we can write the sample model $y_i = \beta_0 + \beta_1 x_i + u_i$.
Unbiasedness of OLS estimators

1. Linear in parameters

The population model is linear in the parameters $\beta_0$ and $\beta_1$: $y = \beta_0 + \beta_1 x + u$
Unbiasedness of OLS estimators
2. Random sampling
The condition is met if the individuals in our sample are chosen randomly.

For example, this assumption is oftentimes not fulfilled for time series analysis because of autocorrelation (however, there are ways to correct for it).

[Figure: autocorrelated time series y plotted over time t]
Unbiasedness of OLS estimators
3. There is variation in x: $\sum_{i=1}^{n} (x_i - \bar{x})^2 > 0$

Intuitively, a regression line cannot be estimated if all observations show the same value for x (we cannot explore the change in y if x does not change in our sample).
Unbiasedness of OLS estimators
4. Zero conditional mean: $E(u \mid x) = 0$

Remember: all other factors not explicitly considered/measured as variables in our regression equation are contained in u.
Unbiasedness of OLS estimators
In the following, we want to show that our OLS estimators are indeed unbiased under the assumptions made.

In order to think about unbiasedness, we need to rewrite our estimator in terms of the population parameters:

$\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \, y_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Numerator: $\sum_{i=1}^{n} (x_i - \bar{x}) \, y_i = \sum_{i=1}^{n} (x_i - \bar{x})(\beta_0 + \beta_1 x_i + u_i) = \beta_1 \sum_{i=1}^{n} (x_i - \bar{x})^2 + \sum_{i=1}^{n} (x_i - \bar{x}) \, u_i$

(using $\sum_{i} (x_i - \bar{x}) = 0$ and $\sum_{i} (x_i - \bar{x}) x_i = \sum_{i} (x_i - \bar{x})^2$)
Unbiasedness of OLS estimators

Hence:

$\hat{\beta}_1 = \beta_1 + \frac{\sum_{i=1}^{n} (x_i - \bar{x}) \, u_i}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Taking expectations conditional on the sample values of x, the second term vanishes because $E(u_i \mid x) = 0$, so $E(\hat{\beta}_1) = \beta_1$.
Unbiasedness of OLS estimators
The OLS estimates of $\beta_0$ and $\beta_1$ are unbiased: $E(\hat{\beta}_0) = \beta_0$ and $E(\hat{\beta}_1) = \beta_1$.

The proof of unbiasedness depends on our 4 assumptions; if any assumption fails, then OLS is not necessarily unbiased.
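A small Monte Carlo sketch illustrating unbiasedness, with simulated data in which the true $\beta_1$ is set to 0.5 (the assumptions hold by construction):

```r
# Sampling distribution of beta1-hat is centered on the true value
set.seed(3)
beta1_hat <- replicate(5000, {
  x <- rnorm(50)
  y <- 2 + 0.5 * x + rnorm(50)   # assumptions 1-4 hold by construction
  coef(lm(y ~ x))[2]
})
mean(beta1_hat)   # ~ 0.5
```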
Variance of the OLS estimators
Unbiasedness: we now know that the sampling distribution of our estimator is centered around the true parameter.

The second important piece of information is how reliable our estimators are, that is, how spread out their distribution is.

Like their expected values, we can estimate the standard errors of our estimators from information in the sample.
Standard errors
Deriving $Var(\hat{\beta}_1)$ under the assumption of homoscedasticity ($Var(u \mid x) = \sigma^2$):

$Var(\hat{\beta}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$

Estimating $\sigma^2$ by $\hat{\sigma}^2 = \frac{SSR}{n - 2}$ gives the standard error $se(\hat{\beta}_1) = \sqrt{\hat{\sigma}^2 / \sum_{i=1}^{n} (x_i - \bar{x})^2}$.
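Computing this standard error by hand and comparing it with the lm output (reusing fit, x, and y from the earlier simulated example):

```r
# Standard error of beta1-hat under homoscedasticity
n      <- length(y)
sigma2 <- sum(resid(fit)^2) / (n - 2)          # estimate of the error variance
se_b1  <- sqrt(sigma2 / sum((x - mean(x))^2))
c(se_b1, summary(fit)$coefficients["x", "Std. Error"])   # identical
```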
Standard errors
The greater the error variance $\sigma^2$ (unexplained factors more important), the greater the standard error.

The greater the variation in x (a more heterogeneous or larger sample), the smaller the standard error.
Heteroscedasticity

Consequences of heteroscedasticity ($Var(u \mid x)$ varies with x):

• $\hat{\beta}_0$ and $\hat{\beta}_1$ remain unbiased
• Standard errors need to be adjusted: "robust" standard errors

[Figure: scatter plot in which the spread of the observations around the regression line increases with x]
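One common way to obtain robust standard errors in R, assuming the sandwich and lmtest packages (a sketch, not necessarily the course's exact workflow; reusing fit from the simulated example, where the adjustment is small because the errors are homoscedastic by construction):

```r
# Heteroscedasticity-robust (HC) standard errors for an lm fit
library(sandwich)
library(lmtest)
coeftest(fit, vcov = vcovHC(fit, type = "HC1"))  # same coefficients, adjusted SEs
```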
Summary
We saw how to obtain unbiased OLS estimates of $\beta_0$ and $\beta_1$.

Their unbiasedness depends on 4 critical assumptions; if any assumption fails, then OLS is not necessarily unbiased.
During the course, we will gradually get to know methods to make OLS results more robust.