
BES - Lecture 10 - Simple Linear Regression



Lecture 10:
Simple Linear Regression
and Correlation

Outline

• Simple linear regression model


• Required conditions for the model
• Model assessment
• Confidence and prediction intervals
• Model diagnostics
• Outliers

Introduction

• In this lecture, we employ regression analysis to examine the relationship between quantitative variables.
• The technique is used to predict the value of one variable (the dependent variable – y) based on the values of other variables (the independent variables x1, x2, …, xk). In simple linear regression there is a single independent variable, x.


Predicting home prices


Recent family home sales in San Antonio provided the data displayed (partly) on the next slide (San Antonio Realty Watch website, November 2008). We wish to predict home prices using square footage.

Part of the data

What is the dependent variable?


What is the independent variable?

Linear relationship?


The model
• The first-order linear model or simple linear regression model is

    y = β₀ + β₁x + ε

where
    y = dependent variable
    x = independent variable
    β₀ = y-intercept
    β₁ = slope of the line (rise/run)
    ε = error variable

• β₀ and β₁ are unknown, therefore they are estimated from the data.

Least squares method


• Estimates of the coefficients are determined by
– drawing a sample from the population of interest
– calculating sample statistics
– producing a straight line that cuts through the data.

The question is: Which straight line fits best?

[Scatter plot: many different straight lines could be drawn through the same cloud of points.]

Least squares method

The best line is the one that minimises the sum of squared vertical differences between the points and the line.

Let us compare two lines through the four points (1, 2), (2, 4), (3, 1.5) and (4, 3.2). The first line is ŷ = x; the second line is horizontal, ŷ = 2.5.

Sum of squared differences (line 1) = (2 − 1)² + (4 − 2)² + (1.5 − 3)² + (3.2 − 4)² = 7.89
Sum of squared differences (line 2) = (2 − 2.5)² + (4 − 2.5)² + (1.5 − 2.5)² + (3.2 − 2.5)² = 3.99

The smaller the sum of squared differences, the better the fit of the line to the data.
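As a quick check, a minimal R sketch that reproduces both sums of squared differences for these four points:

  # The four sample points
  x <- c(1, 2, 3, 4)
  y <- c(2, 4, 1.5, 3.2)

  # Line 1: y-hat = x;  Line 2: the horizontal line y-hat = 2.5
  sum((y - x)^2)    # 7.89
  sum((y - 2.5)^2)  # 3.99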


Least squares method

The least squares line is

    ŷ = b₀ + b₁x,   with   b₁ = s_xy / s_x²   and   b₀ = ȳ − b₁x̄,

where s_xy is the sample covariance of x and y, and s_x² is the sample variance of x.
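A sketch of these formulas in R, assuming a data frame homes with columns SqFt and Price (hypothetical names, not from the slides):

  # Least squares estimates computed directly from the formulas above
  b1 <- cov(homes$SqFt, homes$Price) / var(homes$SqFt)  # b1 = s_xy / s_x^2
  b0 <- mean(homes$Price) - b1 * mean(homes$SqFt)       # b0 = y-bar - b1 * x-bar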

R output for linear model

[Slide shows the R summary output for the fitted home-prices model.]
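Output of that form comes from code like the following sketch, continuing the hypothetical homes data frame:

  # Fit the simple linear regression of price on square footage
  model <- lm(Price ~ SqFt, data = homes)
  summary(model)  # coefficients, standard errors, t-tests, R-squared, s_e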

Interpreting estimated parameters

[Slide interprets the estimates: b₁ is the estimated change in mean price per additional square foot; b₀ is the intercept, the estimated price at zero square footage, which has no practical meaning here.]


Error variable: Required conditions


• The error ε is a critical part of the regression model.
• Five requirements involving the distribution of ε must be satisfied:
– The mean of ε is zero: E(ε) = 0.
– The standard deviation of ε is a constant (σ_ε) for all values of x.
– The errors are independent.
– The errors are independent of the independent variable x.
– The probability distribution of ε is normal.

Error variable: Required conditions

[Figure: the conditional distributions of y at x₁, x₂ and x₃, each centred at E(y|x) = β₀ + β₁x. The standard deviation remains constant, but the mean value changes with x.]

From the first three assumptions we have: y is normally distributed with mean E(y) = β₀ + β₁x and a constant standard deviation σ_ε.

Assessing the model


• The least squares method will produce a
regression line whether or not there is a linear
relationship between x and y.
• Consequently, it is important to assess how
well the linear model fits the data.
• Several methods are used to assess the
model:
– using descriptive measurements
– testing and/or estimating the coefficients



Sum of squares for errors

– The sum of squares for errors is

    SSE = Σ(yᵢ − ŷᵢ)²,

the sum of the squared vertical differences between the points and the regression line.
– It can serve as a measure of how well the line fits the data.
– This statistic plays a role in every statistical technique we employ to assess the model.

Standard error of estimate

– The mean error is equal to zero.
– If σ_ε is small, the errors tend to be close to zero (close to the mean error), and the model fits the data well.
– Therefore we can use s_ε as a measure of the suitability of using a linear model.
– An unbiased estimator of σ_ε² is given by

    s_ε² = SSE / (n − 2),

and the standard error of estimate is s_ε = √(s_ε²).
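In R, s_ε is reported as the "Residual standard error" line of summary(model); a minimal sketch using the model fitted earlier:

  sigma(model)          # standard error of estimate, s_e
  summary(model)$sigma  # equivalent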

Home Prices example (cont.)

Read the standard error of estimate from the R output and describe what it tells you about the model fit.

Note: The sample mean price is $120,270; comparing s_ε with this mean helps to judge whether the standard error is small.


Coefficient of determination
• When we want to measure the strength of
the linear relationship, we use the
coefficient of determination.


Coefficient of determination

[Figure: the overall variability in y is explained in part by the regression model and remains, in part, unexplained (the error).]

Coefficient of determination

[Figure: two data points (x₁, y₁) and (x₂, y₂) of a certain sample, illustrating the decomposition below.]

Total variation in y = variation explained by the regression line + unexplained variation (error)


Coefficient of determination
• R² measures the proportion of the variation in y that is explained by the variation in x:

    R² = SSR / SST = 1 − SSE / SST,   where SST = total variation in y = SSR + SSE.

• R² takes on any value between zero and one.
– R² = 1: perfect match between the line and the data points.
– R² = 0: there is no linear relationship between x and y.
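In R, R² can be read from the fitted model object (hypothetical model from earlier):

  summary(model)$r.squared  # proportion of variation in y explained by x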

Home prices example (cont.)


Find the coefficient of determination. What does
this statistic tell you about the model?

57% of the variation in the home prices is explained by


the variation in square footage. The rest (43%) remains
unexplained by this model.


Testing the slope


When no relationship exists between two variables,
the regression line should be horizontal.

[Two scatter plots:]

Relationship: different inputs (x) yield different outputs (y); the slope is not equal to zero.
No relationship: different inputs (x) yield the same output (y); the slope is equal to zero.


Testing the slope

H₀: β₁ = 0
H_A: β₁ ≠ 0 (or < 0, or > 0)

– The test statistic is

    t = b₁ / s_b₁,   where the standard error of b₁ is   s_b₁ = s_ε / √((n − 1)s_x²).

– If the error variable is normally distributed, the statistic is Student t-distributed with d.f. = n − 2.

Testing the slope using the R output

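As a sketch, the slope test is read from the coefficients table of summary() (hypothetical homes model from earlier):

  # The 'SqFt' row gives b1, its standard error s_b1, the t statistic,
  # and the two-sided p-value for H0: beta1 = 0
  coef(summary(model))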

Coefficient of correlation

• The coefficient of correlation is used to measure the


strength of a linear association between two variables.
• The coefficient values range between –1 and 1.
– If r = –1 (perfect negative linear association) or r =
+1 (perfect positive linear association): every point
falls on the regression line.
– If r = 0: there is no linear association.
• The coefficient can be used to test for linear
relationships between two variables.



Testing the coefficient of correlation


– When there is no linear relationship between two variables, ρ = 0.
– The hypotheses are:
  H₀: ρ = 0
  H_A: ρ ≠ 0
– The test statistic is

    t = r √((n − 2) / (1 − r²)).

The statistic is Student t-distributed with d.f. = n − 2, provided the variables are bivariate normally distributed.

Home prices example (cont.)


Test the coefficient of correlation to determine if a
linear relationship exists. R output is provided below:

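A command of the following form generates output of that kind (hypothetical column names, as before):

  # Pearson correlation test; the t statistic has n - 2 d.f.
  cor.test(homes$SqFt, homes$Price)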

Using the Regression equation


• Before using the regression model, we need
to assess how well it fits the data.
• If we are satisfied with how well the model fits
the data and the model assumptions are
satisfied, we can use it to make predictions for
y.
Example
– Predict the price of a home with square footage =
1500

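A sketch of this point prediction in R (hypothetical names as before):

  predict(model, newdata = data.frame(SqFt = 1500))  # predicted price at 1500 sq ft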


Prediction interval and confidence interval

• Two intervals can be used to discover how closely the predicted value will match the true value of y:
– prediction interval – for a particular value of y
– confidence interval – for the expected value of y.

The prediction interval:

    ŷ ± t(α/2, n−2) · s_ε · √(1 + 1/n + (x_g − x̄)² / ((n − 1)s_x²))

The confidence interval:

    ŷ ± t(α/2, n−2) · s_ε · √(1/n + (x_g − x̄)² / ((n − 1)s_x²))

The prediction interval is wider than the confidence interval: it carries an extra 1 under the square root for the variation of an individual y around its mean.

Home Prices example (cont.)

– Provide a 95% prediction interval for the price of a home with square footage = 1500.
– R output:

Home Prices example (cont.)

– Provide a 95% confidence interval estimate for the mean price of homes with square footage = 1500.
– R output:
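Both intervals come from predict(); a minimal sketch with the hypothetical names used earlier:

  new_home <- data.frame(SqFt = 1500)
  predict(model, new_home, interval = "prediction", level = 0.95)  # individual price
  predict(model, new_home, interval = "confidence", level = 0.95)  # mean price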


The effect of the given value of x on the intervals

– As x_g moves away from x̄, the interval becomes longer. That is, the shortest interval is found at x_g = x̄.

[Figure: confidence intervals at three values of x_g, widening as x_g moves away from x̄.]

Regression diagnostics
• The three important conditions required for
the validity of the regression analysis are:
– The error variable is normally distributed.
– The error variance is constant for all values of x.
– The errors are independent of each other.
• How can we diagnose violations of these
conditions?


Regression diagnostics
• By examining the residuals (or standardized residuals), we can identify violations of the required conditions.

• For the details → self-study (read the textbooks for guidelines). We will give some examples in the next few slides; a sketch of the basic residual plots follows below.
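A minimal sketch of the standard residual plots in R (model as before):

  par(mfrow = c(1, 2))
  # Residuals vs fitted values: check the constant-variance condition
  plot(fitted(model), resid(model), xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)
  # Normal Q-Q plot of the residuals: check the normality condition
  qqnorm(resid(model)); qqline(resid(model))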


Example: Heteroscedasticity
When the requirement of a constant variance is violated, we have heteroscedasticity.

[Figure: residuals plotted against ŷ; the spread increases with ŷ.]

Example: Homoscedasticity
When the requirement of a constant variance is not violated, we have homoscedasticity.

[Figure: residuals plotted against ŷ; the spread of the data points does not change much.]

Example: Homoscedasticity
When the requirement of a constant variance is not violated, we have homoscedasticity.

[Figure: another residual plot against ŷ; as far as the even spread is concerned, this is a much better situation.]


Example: Non-independence of the error variable

• Data collected over time constitute a time series.
• If the errors are independent, no pattern should be observed when the residuals are examined over time.
• When a pattern is detected, the errors are said to be auto-correlated.
• Autocorrelation can be detected by graphing the residuals against time, as sketched below.
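A sketch of such a plot in R, assuming the observations are stored in time order:

  # Residuals in time order; look for runs or oscillation around zero
  plot(resid(model), type = "b", xlab = "Time", ylab = "Residual")
  abline(h = 0, lty = 2)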

Patterns in the appearance of the residuals over time indicate that autocorrelation exists.

[Two residual-versus-time plots: the first shows runs of positive residuals replaced by runs of negative residuals; the second shows oscillating behaviour of the residuals around zero.]

Outliers
• An outlier is an observation that is unusually
small or large.
• Several possibilities need to be investigated when
an outlier is observed:
– There was an error in recording the value.
– The point does not belong in the sample.
– The observation is valid.
• Identify outliers from the scatter diagram.
• It is customary to suspect that an observation is an outlier if its |standardized residual| > 2 (see the sketch below).
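In R, standardized residuals are available via rstandard(); a minimal sketch:

  std_res <- rstandard(model)  # standardized residuals
  which(abs(std_res) > 2)      # observations to investigate as possible outliers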


An outlier and an influential observation

[Two scatter plots: the outlier causes a shift in the regression line, but some outliers may be very influential.]

Procedure for simple linear regression analysis
– Develop a model that has a theoretical basis.
– Gather data for the two variables in the model.
– Draw the scatter diagram to determine whether a linear
model appears to be appropriate.
– Check the required conditions for the errors.
– Assess the model fit.
– If the model fits the data and the assumptions are
satisfied, use the regression equation.


Summary

• Simple linear regression model


• Required conditions for the model
• Model assessment
• Confidence and prediction intervals
• Model diagnostics
• Outliers

