Chapter 1 Simple Linear Regression

This document discusses simple linear regression models. It defines key terms like dependent and independent variables and the error term. It also explains how to estimate regression coefficients using ordinary least squares on sample data to find the line of best fit.


II. Simple regression model


Definition
• y and x are two variables representing some population, and we are interested in "explaining y in terms of x," or in "studying how y varies with changes in x."
• Examples: y is soybean crop yield and x is the amount of fertilizer; y is the hourly wage and x is years of education; y is a community crime rate and x is the number of police officers.
• Simple linear regression model:

y = β₀ + β₁x + u

• It is also called the two-variable linear regression model or bivariate linear regression model because it relates the two variables x and y.
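
As a quick illustration (not part of the original slides), the following Python sketch simulates data from this model with hypothetical parameter values β₀ = 1 and β₁ = 0.5; the variable names and numbers are made up for demonstration only:

import numpy as np

rng = np.random.default_rng(0)

beta0, beta1 = 1.0, 0.5          # hypothetical intercept and slope
n = 100                          # sample size

x = rng.uniform(0, 10, size=n)   # observed explanatory variable
u = rng.normal(0, 1, size=n)     # unobserved error term with mean zero
y = beta0 + beta1 * x + u        # simple linear regression model: y = b0 + b1*x + u
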
The meaning of each term (1)
• The variables y and x have several different names that are used interchangeably.
• y is called the dependent variable, the explained variable, the response variable, the predicted variable, or the regressand.
• x is called the independent variable, the explanatory variable, the control variable, the predictor variable, or the regressor.
• The terms "dependent variable" and "independent variable" are frequently used in econometrics.
The meaning of each term (2)
• The variable u, called the error term or disturbance in the relationship, represents factors other than x that affect y. A simple regression analysis effectively treats all factors affecting y other than x as being unobserved. We can usefully think of u as standing for "unobserved."
• The equation also addresses the issue of the functional relationship between y and x. If the other factors in u are held fixed, so that the change in u is zero (Δu = 0), then x has a linear effect on y:

Δy = β₁Δx  if  Δu = 0

• Thus, the change in y is simply β₁ multiplied by the change in x. This means that β₁ is the slope parameter in the relationship between y and x, holding the other factors in u fixed; it is of primary interest in applied economics. The intercept parameter β₀ also has its uses, although it is rarely central to an analysis.
Examples
EX1:
Suppose that soybean yield is determined by the model

yield = β₀ + β₁ fertilizer + u,

so that y = yield and x = fertilizer. The economist is interested in the effect of fertilizer on yield, holding other factors fixed. This effect is given by β₁. The error term u contains factors such as land quality, rainfall, and so on. The coefficient β₁ measures the effect of fertilizer on yield, holding other factors fixed: Δyield = β₁Δfertilizer.
EX2:
A model relating a person's wage to observed education and other unobserved factors is

wage = β₀ + β₁ educ + u.

If wage is measured in dollars per hour and educ is years of education, then β₁ measures the change in hourly wage given another year of education, holding all other factors fixed. Some of those factors include labor force experience, innate ability, tenure with the current employer, work ethic, and innumerable other things.
• Before we state the key assumption about how x and u are related, there is one assumption about u that we can always make. As long as the intercept β₀ is included in the equation, nothing is lost by assuming that the average value of u in the population is zero:

E(u) = 0

• Because u and x are random variables, we can define the conditional distribution of u given any value of x. In particular, for any x, we can obtain the expected (or average) value of u for that slice of the population described by the value of x. The crucial assumption is that the average value of u does not depend on the value of x. We can write this as:

E(u | x) = E(u) = 0

• This means that, for any given value of x, the average of the unobservables is the same and therefore must equal the average value of u in the entire population.
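
Stated for completeness (a standard consequence, added here rather than taken from the slide): taking the expectation of y = β₀ + β₁x + u conditional on x and using E(u | x) = 0 gives the population regression function

E(y | x) = β₀ + β₁x,

so the average value of y is a linear function of x, which is exactly what the regression line describes.
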
Deriving the OLS Estimates
• Now that we have discussed the basic ingredients of the simple regression model, we will address the important issue of how to estimate the parameters β₀ and β₁ in the equation y = β₀ + β₁x + u.
• To do this, we need a sample from the population. Let {(xᵢ, yᵢ): i = 1, ..., n} denote a random sample of size n from the population. Since these data come from the population model, we can write:

yᵢ = β₀ + β₁xᵢ + uᵢ,  i = 1, ..., n

• uᵢ is the error term for observation i, since it contains all factors affecting yᵢ other than xᵢ.
• Example: xᵢ might be the annual income and yᵢ the annual savings for family i during a particular year. If we have collected data on 15 families, then n = 15. A scatter plot of such a data set is given in the following figure.
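
As an illustration (assuming numpy and matplotlib are available; the income and savings figures below are invented, not data from the text), a minimal sketch of such a scatter plot for n = 15 families could look like this:

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)

n = 15
income = rng.uniform(20_000, 120_000, size=n)           # xi: annual income (illustrative)
savings = 0.08 * income + rng.normal(0, 2_000, size=n)  # yi: annual savings with noise

plt.scatter(income, savings)
plt.xlabel("annual income (x)")
plt.ylabel("annual savings (y)")
plt.title("Random sample of n = 15 families")
plt.show()
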
Practice: Estimating parameters
• Let us approach the topic of regression analysis with an example. A mail order business adds a new
summer dress to its collection. The purchasing manager needs to know how many dresses to buy so
that by the end of the season the total quantity purchased equals the quantity ordered by customers. To
prevent stock shortages (i.e. customers going without wares) and stock surpluses (i.e. the business is left stuck with extra dresses), the purchasing manager decides to carry out a sales forecast.
• What’s the best way to forecast sales? The economist immediately thinks of several possible
predictors or influencing variables. How high are sales of a similar dress in the previous year? How
high is the price? How large is the image of the dress in the catalogue? How large is the advertising
budget for the dress? But we don’t only want to know which independent variables exert an influence;
we want to know how large the respective influence is. To know that catalogue image size has an
influence on the number of orders does not suffice. We need to find out the number of orders that can
be expected on average when the image size is, say, 50 sq cm.
Let us first consider the case where future
demand is estimated from the sales of a
similar dress from the previous year. The
following figure displays the association as
a scatterplot for 100 dresses of a given price
category, with the future demand plotted on
the y-axis and the demand from the previous
year plotted on the x-axis.

If all the points lay on the angle bisector (an angle bisector divides an angle into two angles of equal measure), the future demand of period (t) would equal the quantities sold in the previous year (t − 1). As is easy to see, this is only rarely the case. The resulting scatterplot contains some large deviations, producing a correlation coefficient of only r = 0.42.
Now if, instead of equivalent dresses from
the previous year, we take into account the
catalogue image size for the current season
(t), we arrive at the scatterplot in the new
following figure. We see immediately that
the data points lie much closer to the line,
which was drawn to best approximate the
course of the data. This line is more suited
for a sales forecast than a line produced
using the “equivalence method” in the
previous Figure.

The relatively large correlation coefficient of r = 0.95 confirms that the linear association between these variables is much stronger. The points lie much closer to the line, which means that the sales forecast will result in fewer costs for stock shortages and stock surpluses. But, again, this applies only to products of the same quality and in a specific price category.
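
The correlation coefficients quoted above (r = 0.42 and r = 0.95) come from the book's dress data, which is not reproduced here; the Python sketch below only shows how such a coefficient could be computed for two generic columns, using invented values:

import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length arrays."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return np.corrcoef(x, y)[0, 1]

# Illustrative values only -- not the dress data from the figures.
x = [30, 40, 47, 55, 60, 70]        # catalogue image size in sq cm
y = [195, 222, 248, 250, 262, 286]  # number of dresses sold
print(round(pearson_r(x, y), 2))
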
The linear equation consists of two components:
1. The intercept is where the line crosses the y-axis. We call this point α. It determines the distance along the y-axis between the line's crossing point and the origin.
2. The slope coefficient (β) indicates the slope of the line. From this coefficient we can determine to what extent catalogue image size impacts demand. If the slope of the line is two, the value on the y-axis changes by two units whenever the value on the x-axis changes by one unit. In other words, the flatter the slope, the less influence the x variable has on y.

The line in the scatterplot in our previous figure can be represented with the algebraic linear equation:

ŷ = α + βx

This equation intersects the y-axis at the value 138, so that α = 138. Its slope is calculated from the slope triangle (quotient): β = 82/40 ≈ 2.1. When the image size increases by 10 sq cm, the demand increases by 21 dresses. The total linear equation is:

ŷ = 138 + 2.1x

For a dress with an image size of 50 sq cm, we can expect sales to be 138 + 2.1 · 50 = 243 dresses.
With an image size of 70 sq cm, the expected sales are 138 + 2.1 · 70 = 285 dresses.
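
These two predictions follow mechanically from the fitted line; a small Python sketch of the arithmetic, with α = 138 and β = 2.1 taken from the text (the helper function name is ours, for illustration only):

alpha, beta = 138.0, 2.1   # intercept and slope of the fitted line from the text

def predict(image_size_sqcm):
    """Expected number of dresses sold for a given catalogue image size."""
    return alpha + beta * image_size_sqcm

print(predict(50))   # 243.0 dresses
print(predict(70))   # 285.0 dresses
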
This linear estimation approximates the average influence of the x variable on the y variable using a mathematical function. The estimated values are indicated by ŷᵢ, and the realized values are indicated by yᵢ. Although the linear estimation runs through the entire quadrant, the association between the x and y variables is only calculated for the area that contains data points, referred to as the data range. If we use the regression function for estimations outside this area (as part of a forecast, for instance), we must assume that the association outside the data range does not differ from the association within the data range.

To better illustrate this point, consider the figure above. The marked data point corresponds to dress model 23, which was advertised with an image size of 47.4 sq cm and which was later sold 248 times. The linear regression estimates average sales of 238 dresses for this image size. The difference between actual sales and estimated sales is referred to as the residual or the error term. It is calculated by:

εᵢ = yᵢ − ŷᵢ

For dress model 23 the residual is: ε₂₃ = y₂₃ − ŷ₂₃ = 248 − 237.5 = 10.5.
In this way, every data point can be expressed as a combination of the result of the linear regression and its residual:

yᵢ = ŷᵢ + εᵢ
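
The residual calculation for dress model 23 can be reproduced from the figures given in the text (image size 47.4 sq cm, actual sales 248); a minimal Python sketch, assuming the same α = 138 and β = 2.1:

alpha, beta = 138.0, 2.1

x_23 = 47.4                        # catalogue image size of dress model 23 (sq cm)
y_23 = 248                         # actual sales of dress model 23

y_hat_23 = alpha + beta * x_23     # fitted value: 138 + 2.1 * 47.4 = 237.54, about 237.5
residual_23 = y_23 - y_hat_23      # 248 - 237.54 = 10.46, about 10.5 as in the text
print(round(y_hat_23, 1), round(residual_23, 1))
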

We have yet to explain which rule applies for determining this line and how it can be derived algebraically. Up to now we have only required that the line run as closely as possible to as many data points as possible, and that deviations above and below the line be kept to a minimum and be distributed nonsystematically. The deviations in the figure between actual demand and the regression line create stock shortages when they are located above the line and stock surpluses when they are located below it.

Since we want to prevent both, we can position the line so that the sum of deviations between the realized points and the points on the line is as close to zero as possible. The problem with this approach is that a variety of possible lines with different qualities of fit all fulfil this condition. A selection of possible lines is shown in the figure above.
The reason for this is simple: the deviations above and below cancel each other out, resulting in a sum of zero.

All lines that run through the bivariate centroid (the value pair (x̄, ȳ) of the averages of x and y) fulfil the condition:

Σᵢ (yᵢ − ŷᵢ) = 0
But in view of the differences in quality among the lines, the condition above makes little sense as a construction criterion. Instead, we need a line that does not allow deviations to cancel each other out yet still limits the total sum of errors. Frequently, statisticians create a line that minimizes the sum of the squared deviations of the actual data points yᵢ from the points on the line ŷᵢ. The minimization of the entire deviation error is:

min Σᵢ (yᵢ − ŷᵢ)²

This method of generating the regression line is called the ordinary least squares method, or OLS. It can be shown that this line also runs through the bivariate centroid, i.e. the value pair (x̄, ȳ), but this time we have only a single regression line, which fulfils the condition of the minimal squared error. If we insert the equation of the regression line ŷᵢ = α + βxᵢ for ŷᵢ, we get:

f(α, β) = Σᵢ (yᵢ − α − βxᵢ)² → min

The minimum can be found by using the necessary conditions for a minimum: differentiating the function f(α, β) once with respect to α and once with respect to β, and setting both derivatives equal to zero.
From what we know about a mean, namely that Σᵢ yᵢ = n·ȳ and Σᵢ xᵢ = n·x̄, the first condition gives:

α = ȳ − βx̄

We should now rearrange the function and simplify it. Substituting α into the second condition and solving for β yields the OLS slope:

β = Σᵢ (xᵢ − x̄)(yᵢ − ȳ) / Σᵢ (xᵢ − x̄)²
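
A compact Python sketch of these closed-form OLS solutions (the helper below simply implements the two formulas above and is applied to invented data, not the textbook's dress data):

import numpy as np

def ols(x, y):
    """Ordinary least squares estimates for the simple regression y = alpha + beta*x."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    beta = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    alpha = y.mean() - beta * x.mean()
    return alpha, beta

# Illustrative values only.
x = [30, 40, 47, 55, 60, 70]
y = [195, 222, 248, 250, 262, 286]
alpha, beta = ols(x, y)
print(alpha, beta)   # the fitted line alpha + beta*x runs through (x-bar, y-bar)
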
