This document provides an introduction to Ordinary Least Squares (OLS) and simple linear regression, emphasizing the importance of understanding statistical relationships and the distinction between correlation and causation. It discusses the role of deterministic and stochastic relationships in econometrics, the structure of linear regression models, and the significance of the stochastic error term in accounting for unexplained variations. The document also outlines the population and sample regression functions, highlighting the assumptions necessary for effective regression analysis.


An introduction to Ordinary Least Squares (OLS): Simple Linear Regression

Marcio Santetti | Spring 2023

Table of contents

The modern interpretation of regression
Deterministic vs. stochastic relationships
Regression vs. correlation
A note on terminology
Single-equation (simple) linear models
The stochastic error term
    How are ε and X related?
The regression function (finally!)
    The population regression function
    The sample regression function
What is the best method to estimate the sample regression function?
After some quick review of statistical concepts, we will begin our content with the basic—and most important—technique in Econometrics: linear regression. You may have been introduced to it in your Stats courses, but we need to go deeper into its concept in order to adapt it to real-world uses. The key argument is that the world that surrounds us produces diffuse and fuzzy data, and we are far from estimating empirical models free from error. We need to live with this fact, and the best way to treat this inherent error is by minimizing it.¹ The most popular method for that is Ordinary Least Squares (OLS), and we introduce this technique in the context of simple linear regression, that is, a model where we try to explain a variable of interest's behavior in terms of only one explanatory factor.

¹ By minimizing something, keep in mind that we will use its mathematical meaning; that is, we try to make something (in this case, our error) not zero, but as small as possible.

The modern interpretation of regression

For our purposes, the term regression is concerned with the study of the dependence of one variable (dependent) on one or more other variables (explanatory/independent). We do this in order to predict the population mean or average effect of the independent variable(s) on some other variable of interest.²

² As a quick historical note, in case you are curious to know more about the historical origin of the term "regression," make sure to check out the paper by J. Stanton listed on the "Additional Readings" module on theSpring.

In other words, a statistical regression comprises a simple model through which we wish to analyze how a change in one or more independent variables affects a dependent variable of interest. For example, in case we want to estimate the average effect of education on wages, the simplest possible model would be a simple regression, with wages as the dependent variable and education as the independent variable. But how do we do this in practice? Keep going, we'll get there.
Deterministic vs. stochastic relationships

In statistical relationships, we generally deal with random variables. This term should not be something new, but a quick reminder does not hurt: a random variable is a variable whose value is unknown until it is observed, i.e., its outcomes are not predictable, thus following a probability distribution.

As an example, consider the relationship between crop yield and other variables, such as the amount of rainfall, sunshine, and fertilizer use. Such an association is statistical in nature: although relevant, these factors will not enable an agronomist to exactly predict crop yield, due to possible errors involved in measuring these variables, as well as a host of other factors that collectively affect the yield at a certain point in time. Therefore, the random variability in the variable "crop yield" cannot be fully explained, regardless of the number of explanatory variables we list.³

³ What other variables would you consider relevant to explain crop yield, in addition to those already listed? And why?

Therefore, most—if not all—relationships in Economics and other Social Sciences involve uncertainty by definition. When there is no uncertainty involved in an association between two or more variables, we call it a deterministic relationship. The equation of a straight line (y = ax + b) that you learned in high school is an example. If you look at Figure 1, I have just plotted a few x and y points and connected them with a straight line. There is no uncertainty in this model, since all the points are captured by the line. Your Math teacher may have never told you this, but you were being taught deterministic Mathematics.

If you consider the 10 points in this graph as data points, and x and y as data variables, such a relationship can be perfectly explained by a simple straight line. This looks great, but when we look at the real world, uncertainty is everywhere, so we are surrounded by stochastic relationships. Econometrics is concerned with these types of relations.
Figure 1: A simple straight line.

We cannot expect that real-world data behave like a straight line, especially in the Social Sciences, where data are diffuse and fluctuate according to several different factors. A typical scatter diagram between two variables may look something like Figure 2.
The left panel illustrates the relationship between x and y, while
the right panel fits a straight line to these points. Unlike the
high school case, we cannot capture all the points only with a
straight line. This is just a simple representation of how working
with data and trying to explain it through statistical models is
not an easy task. But we will do our best here.
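The contrast between the two kinds of relationship can also be seen numerically: on a deterministic line every point is captured exactly, while adding a random disturbance leaves deviations that no straight line can absorb. A minimal sketch (the line y = 2x + 1 and the noise scale are illustrative choices, not from the text):

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.arange(10.0)  # 10 data points, as in Figure 1

# Deterministic relationship: every point lies exactly on the line y = ax + b
a, b = 2.0, 1.0
y_det = a * x + b

# Stochastic relationship: the same line plus a random disturbance
y_stoch = a * x + b + rng.normal(loc=0.0, scale=2.0, size=x.size)

# A straight line explains the deterministic data perfectly...
print(np.allclose(y_det, a * x + b))               # True
# ...but leaves nonzero deviations in the stochastic data
print(np.max(np.abs(y_stoch - (a * x + b))) > 0)   # True
```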

Figure 2: A stochastic relationship.

Regression vs. correlation

Since in this course we will be mostly interested in statistical relationships, it is important to remark one point: a statistical relationship, in itself, cannot logically imply causation. The only thing regression can do is test whether a significant relationship/association exists, as well as give a quantitative assessment of how the variables are related. However, no causal direction can be inferred from this methodology.
Furthermore, when we jump to regression analysis, the term correlation may be a temptation, but you should actually discard it when interpreting regression results. If you recall from your Stats classes, correlation analysis does not involve setting dependent or independent variables, i.e., x and y are treated symmetrically. When we study regression, we assume an inherent asymmetry between x and y, since we will be trying to explain changes in y in terms of changes in x. Even though these variables may be correlated (that is, have a linear relationship), our interests go way beyond simple correlation coefficients. Therefore, keep correlation out of your future regression interpretations.

Correlation implies a linear association between two variables, without an explicit call for which one is dependent and which one is independent. A closely related measure of association is covariance, which computes how two variables move together in their original units, without standardizing by their scales.
A note on terminology

The table below lists a few synonyms for both dependent and independent variables. These terms can be used interchangeably, with no loss of generality or meaning.

Dependent (y) Independent (x)


Explained Explanatory
Regressand Regressor
Outcome Covariate
Endogenous Exogenous
Controlled Control
Predictand Predictor

Single-equation (simple) linear models

The simplest single-equation linear regression model can be


represented by:

Y = β0 + β1 X

Let’s break down every component of this equation. Once


again, this should not be anything new up to this point, since
you were probably introduced to this type of equation in your
Stats courses. But let us review this topic and also implement
our future notation. First, X and Y are our familiar independent
and dependent variables, respectively. Moreover, we can see
that this is the equation of a line, right? Okay, good.
The stars of our model—so far—are the β coefficients: β0 and β1. The first is called the intercept, or constant, term, and is simply the value that Y assumes when X is set to zero. The second is the slope coefficient, and represents the amount by which Y changes when X changes by one unit. In other words, β1 represents by how much the dependent variable changes, given a marginal change in the independent variable.

For those familiar with calculus, β1 is simply β1 = ∂Y/∂X.
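A quick numerical illustration of these two interpretations, with made-up coefficient values:

```python
# Hypothetical coefficients: beta0 is the intercept, beta1 the slope
beta0, beta1 = 5.0, 2.5

def predict(x):
    """Deterministic part of the model: Y = beta0 + beta1 * X."""
    return beta0 + beta1 * x

print(predict(0))               # 5.0 -> value of Y when X = 0 (the intercept)
print(predict(4) - predict(3))  # 2.5 -> change in Y for a one-unit change in X (the slope)
```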
The stochastic error term

Now, we introduce something new (hopefully): besides the


variation in Y due to changes in X, it is almost certain that X
cannot fully explain changes in Y. This additional variation
comes in part from omitted independent variables (we will deal
with this issue later), but, even if these omitted variables were
added, do you think that only a chosen set of X covariates
would be able to explain 100% of the variation in our variable
of interest? If the answer is no, you are in the right place.
Such unexplained changes in Y may come from factors such as
measurement error, incorrect model specification, or purely random
events—that is, whose value is determined by chance. Does this
ring a bell?
Therefore, this intrinsic and inevitable lack of explanation is captured by the stochastic error term (also known as the residual term), which accounts for the variation in Y that is not explained by the independent variable(s). In simple language, the error reflects our ignorance, or inability to model the entire population relationship. However, its existence should not be an excuse for estimating poor models. We need to do our best to minimize this ignorance.
Let us, then, update our linear regression model by incor-
porating this error term, which we will denote, for now, by
ε:

Y = β0 + β1 X + ε

Now, we have a complete regression model! It has a deterministic part, composed of "β0 + β1 X," as well as a stochastic part, represented by the residual, "ε." Thus, the uncertain part of any statistical/econometric model is represented by an error term, which captures everything that the explanatory variables cannot explain about variations in Y, our dependent variable.
Since we will always be dealing with random variables in this course, let us apply the expectations operator (that is, compute the Expected Value, which you learned in your Stats course) to both sides of this equation:⁴

E(Y|X) = β0 + β1 X + 0

⁴ Let us demystify what the term Expected Value means. It is simply the long-run average (mean) value of a random variable. Nothing else.

Put simply, we are just calculating the average value of Y, given (that is what the "|" symbol stands for) values of X. On average, the expected value of Y is given by the deterministic portion of our regression model,⁵ while, on average, the value of our error term is zero. This is a crucial assumption we are making here, concerning the distribution of our error term. Figure 3 illustrates this latter point, simply showing that the central location of our ε friend is 0.

⁵ If you do not recall the laws of Expected Value from your Stats class, make sure to give them a quick review, so you fully understand what is going on at this point. It is an important step to understand the basic foundations of our class.

However, make sure you understand this point: this does not mean that, when we apply this concept to real data, our error term will show a value of zero. This is a statement about its average, assuming we could run the same regression model with several different samples of the same size for the same variables X and Y. This is related to the sampling distribution of the error term. If you do not recall what sampling distributions mean, make sure to also give them a quick review from Stats.
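This sampling-distribution idea can be simulated: in any single sample the error does not average exactly zero, but across many repeated samples it is centered at zero. A minimal sketch (the population coefficients, sample size, and number of samples are all made-up values):

```python
import numpy as np

rng = np.random.default_rng(0)
beta0, beta1 = 1.0, 0.5   # hypothetical "true" population coefficients
n, n_samples = 50, 2000   # sample size and number of repeated samples

sample_error_means = []
for _ in range(n_samples):
    x = rng.uniform(0, 10, size=n)
    eps = rng.normal(0, 1, size=n)   # error term with E(eps) = 0
    y = beta0 + beta1 * x + eps
    sample_error_means.append(eps.mean())

# In any single sample the average error is not exactly zero...
print(abs(sample_error_means[0]) > 0)                # True
# ...but across many samples it is centered at zero
print(abs(np.mean(sample_error_means)) < 0.02)       # True
```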

Figure 3: The residual's distribution.

How are ε and X related?

If ε and X are uncorrelated, then by definition they are not linearly related. However, the error term can be correlated with functions of X, such as X², for example—we will deal with these issues later. Therefore, correlation is not the most appropriate approach to define this relationship. Instead, we can define the conditional distribution of ε, given any value of X:

E(ε|X) = E(ε)

The above equation simply states that, on average, the value of ε does not depend on X. In other words, ε is mean-independent of X.
In order to make more sense of the above statement, let us look
at a more realistic model, relating wages and education:

wage = β0 + β1 educ + ε

This is the simplest way to estimate how an individual's wage is affected by their education. But before we analyze the relationship between the residual and the independent variable, it is worth asking: since ε contains everything that is not explicitly accounted for in the model, what is contained in there? Think about what else is relevant to explain wages.

Factors such as years of experience, innate ability, gender, race, and many other variables are included in the error term, since the only independent variable is education. To keep things simple, assume that ε contains only ability (abil), which ends up in the error term because it cannot be measured, or no data on it can be obtained.
Then, going back to our previous assumption, it requires that
the average level of ability is the same, regardless of one’s years
of education. Illustrating:

E(abil|educ) = E(abil)

If this assumption is true, then

E(abil|educ = 8) = E(abil|educ = 16) = E(abil)

This means that what is contained in the error term (assumed here to be only ability) is, on average, the same at every level of education.
In case you believe that ability increases with years of education,
congratulations, you are not a robot. However, this indepen-
dence assumption is often useful and theoretically necessary. We
will see this in more depth later on. If you are confused, that
is a sign that you are paying attention. When we look at some
applied examples, this will make sense.
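The assumption can also be illustrated with simulated data: if ability is drawn independently of education, its average is (approximately) the same at every education level. A sketch under made-up distributions (the mean of 100 and the education levels are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

educ = rng.choice([8, 12, 16], size=n)        # years of education
abil = rng.normal(loc=100, scale=15, size=n)  # ability drawn independently of educ

# E(abil | educ = 8) and E(abil | educ = 16) should both be close to E(abil) = 100
mean_8 = abil[educ == 8].mean()
mean_16 = abil[educ == 16].mean()
print(round(mean_8), round(mean_16))  # both close to 100
```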

The regression function (finally!)

The main goal of Statistics (and, therefore, Econometrics) is estimating population parameters based on sample statistics. In order to understand the latter, we begin with the former.
The population regression function

So far, we have worked with the notion of a population regression function, denoting Y and X as the "true" population values for the dependent and independent variables. Stating once again:

Y = β0 + β1 X

E(Y|X) = β0 + β1 X

From these two equations, we know that the average value of Y


changes with X. In case we had access to data from the entire
population of interest, there would be no need for statistical
techniques, such as the ones we will cover in this course. The
solution is, then, working with an appropriate sample extracted
from the overall population.

The sample regression function

We almost never have access to the "true" regression model defining Y. Rather, we work with samples. Let {(xi, yi) : i = 1, 2, 3, ..., n} denote a random sample of n pairs drawn from the whole population of size N. Then, we can write:

yi = β0 + β1 xi + ui

for each i = 1, 2, 3, ..., n.


Notice a few changes in our notation. From now on, we will only deal in practice with sample data. For this course, lower-case letters, such as yi and xi, will denote sample data, extracted from the overall population data represented by Y and X, respectively. You may have also noticed that the sample error term is denoted by ui. We have also introduced i subscripts denoting each individual in the sample. These individuals will depend on the research we are conducting: they may be households, houses, cats, cities, etc. Conceptually, however, all the terms of the sample regression function are the same as before.
Additionally, we assume:

E(u) = 0

Cov(xi , ui ) = 0

That is, the expected value of the sample error term is zero, and the covariance between the independent variable(s) and the residual is zero. Zero covariance means that xi and ui do not move together linearly; it is the sample counterpart of the assumption that, on average, the error does not depend on x.
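A quick numerical preview of these two conditions: fitting a least-squares line (via numpy) to simulated data produces residuals whose sample mean, and whose sample covariance with x, are both zero up to rounding. All numbers in the sketch are illustrative:

```python
import numpy as np

rng = np.random.default_rng(7)
n = 200
x = rng.uniform(0, 10, size=n)
y = 3.0 + 1.5 * x + rng.normal(0, 2, size=n)  # made-up data-generating process

# Least-squares fit; np.polyfit returns the slope first, then the intercept
b1, b0 = np.polyfit(x, y, deg=1)
u = y - (b0 + b1 * x)  # fitted residuals

print(abs(u.mean()) < 1e-8)            # sample mean of residuals is zero (True)
print(abs(np.cov(x, u)[0, 1]) < 1e-7)  # sample covariance with x is zero (True)
```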

What is the best method to estimate the sample regression function?

So far, we have only conceptually defined the regression function, which will be the bread-and-butter of our class. But how do we calculate the β coefficients? Moreover, what is the best straight line to represent the relationships we want to estimate in the future, regardless of our specific research question?

As Econometrics practitioners, our main goal should be explaining variations in yi due to changes in xi, with the role played by our "ignorance," ui, being as small as possible. So, how do we translate these words (i) graphically and (ii) mathematically? I will answer these questions in class. You may stop reading these notes here. The rest will make sense after we discuss these issues in person.

Figure 4: Residuals vs. regression line.

After this graph makes sense in your mind, you can see that Ordinary Least Squares (OLS) is the mathematical technique for obtaining the estimates of β0 and β1 that will make the residuals, ui, as small as possible. In other words, it calculates the β coefficients that minimize the sum of the squared residuals of our model.⁶

⁶ Why OLS, though? Put simply, it is easy to compute (both manually and computationally); it is theoretically appropriate for statistical work; and it has a great number of useful characteristics that we will explore little by little in our course.

OLS gives the "best" estimator possible for β̂0 and β̂1 under a set of assumptions that we will explore in a couple of weeks. Also, pay attention to the "best" term stated in the previous sentence. We will see what this means in future lectures.

Estimator: a mathematical technique that is applied to a sample to produce numerical estimates of the "true" population parameters. Do not confuse estimator with estimate: the estimator is the formula, and an estimate is the final numerical value generated by this formula. To be clear, OLS is an estimator, and so are the formulas for β̂0 and β̂1; the numbers they produce from a given sample are estimates.
Now, we are all set to start playing with real-world data. OLS
will be our preferred estimator until the end of the semester.
Just be aware that there are many other estimators out there,
depending on the circumstances. However, the basic founda-
tions of Econometrics are built upon OLS, and it is important
to master it before moving on to more complex methods.
Keep these formulas in mind, we will use them!

β̂1 = Σᵢ₌₁ⁿ (xi − x̄)(yi − ȳ) / Σᵢ₌₁ⁿ (xi − x̄)² = Cov(xi, yi) / Var(xi)

β̂0 = ȳ − β̂1 x̄
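These formulas can be computed directly in a few lines of code; a sketch that also cross-checks the hand computation against numpy's built-in least-squares fit (the simulated sample is illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(0, 10, size=100)
y = 2.0 + 0.8 * x + rng.normal(0, 1, size=100)  # made-up sample

# Hand-computed OLS coefficients, following the formulas above
b1_hat = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0_hat = y.mean() - b1_hat * x.mean()

# Cross-check against numpy's least-squares polynomial fit
b1_np, b0_np = np.polyfit(x, y, deg=1)
print(np.allclose([b0_hat, b1_hat], [b0_np, b1_np]))  # True
```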

Last question: refresh your memory from your Stats class and interpret the following regression model estimates: ŷi = 103.4 + 6.38xi. This interpretation routine will give you many points in future assignments and exams.
