REGRESSION (PART I)
Weihua Zhou
Department of Mathematics and Statistics
Introduction
So far we have done statistics on one variable at a time. We are now interested in relationships between two variables and in how to use one variable to predict another.
Does weight depend on height?
Does blood pressure level predict life expectancy?
Do SAT scores predict college performance?
Does taking a Statistics course make you a better person?
Dependent and Independent Variables
Most statistical studies examine data on more than one
variable. In many of these settings, the two variables
play different roles.
Definition:
A dependent (response) variable measures an outcome
of a study. An independent (predictor) variable may
help explain or influence changes in a response variable.
Outlier
There is one possible outlier: the hiker with a body weight of 187 pounds seems to be carrying relatively less weight than the other group members.
[Scatterplot examples: a curvilinear relationship and no relationship.]
The Correlation Coefficient
The strength and direction of the relationship between x and y
are measured using the correlation coefficient (Pearson
product moment coefficient of correlation), r.
$$r = \frac{SS_{xy}}{\sqrt{SS_{xx} \cdot SS_{yy}}}$$
where
$$SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n}, \qquad SS_{yy} = \sum y^2 - \frac{(\sum y)^2}{n}, \qquad SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
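As a quick illustration, here is a small Python sketch of this shortcut computation. The function name and the sample data are made up for this example, not taken from the lecture.

```python
import math

def correlation(x, y):
    """Pearson correlation r computed from the shortcut sums of squares."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xx = sum(xi**2 for xi in x) - sum_x**2 / n
    ss_yy = sum(yi**2 for yi in y) - sum_y**2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    return ss_xy / math.sqrt(ss_xx * ss_yy)

# Illustrative data (not from the lecture)
heights = [67, 69, 70, 72, 74]
weights = [155, 165, 180, 190, 205]
print(correlation(heights, weights))  # close to +1: strong positive linear association
```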
Example
[Scatterplot: weight in pounds (y-axis, roughly 150 to 210) versus height in inches (x-axis, 66 to 75).]
r = .8261: strong positive correlation. As the player's height increases, so does his weight.
Interpreting r
• $-1 \le r \le 1$. The sign of r indicates the direction of the linear relationship.
Suppose an experiment is conducted to study the relationship between the percentage of a certain drug in the bloodstream (x) and the length of time it takes to react to a stimulus (y). The results are below.
$$\sum x^2 = 55, \qquad \sum y^2 = 26$$
Find the correlation coefficient and explain it in the context of the problem.
Probabilistic Model
• Probabilistic model:
y = deterministic model + random error
• Random error represents random fluctuation from the
deterministic model
• The probabilistic model is assumed for the population
• Simple linear regression model:
y = α + βx + ε
• Without the random deviation ε, all observed (x, y) points would fall exactly on the deterministic line. The inclusion of ε in the model equation allows points to deviate from the line by random amounts.
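To make the model concrete, here is a minimal simulation sketch; the parameter values (α = 2, β = 0.5, σ = 1) are illustrative assumptions, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative population parameters (assumed for this sketch)
alpha, beta, sigma = 2.0, 0.5, 1.0

x = rng.uniform(0, 10, size=100)
# Random errors: normal, mean 0, constant standard deviation, drawn independently
eps = rng.normal(loc=0.0, scale=sigma, size=100)
# Probabilistic model: deterministic line plus random error
y = alpha + beta * x + eps
```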
Basic Assumptions of the Simple
Linear Regression Model
1. The distribution of ε at any particular x value has mean value 0.
2. The standard deviation of ε is the same for any particular value
of x. This standard deviation is denoted by 𝜎.
3. The distribution of ε at any particular x value is normal.
4. The random errors are independent of one another.
The Distribution of y
The figure below shows the regression model when the conditions are met. The line in the figure is the population regression line $\mu_y = \alpha + \beta x$.
In most cases, no line will pass exactly through all the points in a scatterplot. A
good regression line makes the vertical distances of the points from the line
as small as possible.
Definition:
A residual is the difference between an observed value of the response
variable and the value predicted by the regression line. That is,
residual = observed y – predicted y
residual = y − ŷ
[Figure: scatterplot with a regression line; positive residuals lie above the line, negative residuals below it.]
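A small sketch of this calculation, using a hypothetical fitted line and hypothetical data:

```python
import numpy as np

# Hypothetical fitted line and observations (for illustration only)
a, b = 1.0, 2.0
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.2, 4.8, 7.1, 8.9])

y_hat = a + b * x        # predicted values from the line
residuals = y - y_hat    # observed minus predicted; positive above the line, negative below
print(residuals)
```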
Least-Squares Regression Line
Different regression lines produce different residuals. The regression
line we want is the one that minimizes the sum of the squared
residuals.
Definition:
The least-squares regression line of y on x is the line that makes the sum of
the squared residuals as small as possible.
Least Squares Regression
The sum of squared errors (residuals) in regression is
$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
SSE: sum of squared errors.
The least-squares regression line is the one that minimizes the SSE with respect to the estimates a and b.
[Figure: SSE plotted against a and against b is a parabola; the least-squares estimates a and b sit at its minimum.]
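A brief sketch of how SSE behaves as the slope changes, using made-up data; the values are illustrative only.

```python
import numpy as np

def sse(a, b, x, y):
    """Sum of squared errors for the candidate line y-hat = a + b*x."""
    return float(np.sum((y - (a + b * x)) ** 2))

# Hypothetical data for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# With a fixed at 0, SSE is a parabola in b; the smallest value occurs
# near the least-squares slope (about 2 for these data).
for b in [1.0, 1.5, 2.0, 2.5, 3.0]:
    print(b, sse(0.0, b, x, y))
```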
Sums of Squares, Cross Products, and
Least Squares Estimators
Sums of Squares and Cross Products:
$$SS_{xx} = \sum (x - \bar{x})^2 = \sum x^2 - \frac{(\sum x)^2}{n}$$
$$SS_{yy} = \sum (y - \bar{y})^2 = \sum y^2 - \frac{(\sum y)^2}{n}$$
$$SS_{xy} = \sum (x - \bar{x})(y - \bar{y}) = \sum xy - \frac{(\sum x)(\sum y)}{n}$$
Least Squares Estimators: the fitted line is
$$\hat{y} = a + bx, \qquad b = \frac{SS_{xy}}{SS_{xx}}, \qquad a = \bar{y} - b\bar{x}$$
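A minimal sketch of these estimators computed from the shortcut sums; the data in the usage line are made up for illustration.

```python
def least_squares_fit(x, y):
    """Intercept a and slope b of the least-squares line y-hat = a + b*x,
    computed from the shortcut sums of squares."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    ss_xx = sum(xi**2 for xi in x) - sum_x**2 / n
    ss_xy = sum(xi * yi for xi, yi in zip(x, y)) - sum_x * sum_y / n
    b = ss_xy / ss_xx
    a = sum_y / n - b * (sum_x / n)   # a = y-bar - b * x-bar
    return a, b

# Illustrative data (not from the lecture)
a, b = least_squares_fit([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 7.8, 10.1])
print(a, b)
```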
Suppose an experiment is conducted to study the relationship between the percentage of a certain drug in the bloodstream (x) and the length of time it takes to react to a stimulus (y). The results are below.
$$\sum x^2 = 55, \qquad \sum y^2 = 26$$
Definition:
Extrapolation is the use of a regression line for prediction with values of x outside the range of the data. Don't make predictions using values of x that are much larger or much smaller than those that actually appear in your data.
The estimation of $\sigma^2$
Because 𝜎 2 is a population parameter, we will rarely know its true value. The
best we can do is to estimate it!
The mean square error estimates $\sigma^2$:
$$MSE = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n-2} = \frac{SSE}{d.f.} = \frac{SSE}{n-2}$$
where n is the number of observations and n − 2 is the degrees of freedom (two parameters, α and β, are estimated from the data).
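A short sketch of this estimate; the observed and fitted values below are hypothetical.

```python
import numpy as np

# Hypothetical observed and fitted values (for illustration only)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])
y_hat = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

n = len(y)
sse = np.sum((y - y_hat) ** 2)
mse = sse / (n - 2)       # estimates sigma^2; 2 df lost to estimating alpha and beta
s = np.sqrt(mse)          # estimates sigma
print(mse, s)
```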
$$SST = SS_{yy} = \sum (y - \bar{y})^2$$
For each observation, the deviation of y from its mean splits into two pieces:
$$(y - \bar{y}) = (y - \hat{y}) + (\hat{y} - \bar{y})$$
Total Deviation = Unexplained Deviation (Error) + Explained Deviation (Regression)
[Figure: a data point, the fitted value $\hat{y}$, and the mean $\bar{y}$, with the total, unexplained, and explained deviations marked.]
Squaring and summing over all observations gives
$$\sum (y - \bar{y})^2 = \sum (y - \hat{y})^2 + \sum (\hat{y} - \bar{y})^2$$
$$SST = SSE + SSR$$
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
(the percentage of total variation explained by the regression).
$R^2$: Coefficient of Determination
The coefficient of determination, $R^2$, is a descriptive measure of the strength of the regression relationship: a measure of how well the regression line fits the data.
The coefficient of determination is defined as
$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$
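A small sketch that checks the decomposition SST = SSE + SSR and computes $R^2$ both ways, using made-up data.

```python
import numpy as np

# Hypothetical data (for illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Least-squares fit via the shortcut formulas
n = len(x)
ss_xx = np.sum(x**2) - np.sum(x)**2 / n
ss_xy = np.sum(x * y) - np.sum(x) * np.sum(y) / n
b = ss_xy / ss_xx
a = y.mean() - b * x.mean()
y_hat = a + b * x

sst = np.sum((y - y.mean()) ** 2)      # total variation
sse = np.sum((y - y_hat) ** 2)         # unexplained variation
ssr = np.sum((y_hat - y.mean()) ** 2)  # explained variation

print(np.isclose(sst, sse + ssr))      # SST = SSE + SSR for the least-squares line
print(ssr / sst, 1 - sse / sst)        # the two equivalent forms of R^2 agree
```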
Suppose an experiment is conducted to study the relationship between the percentage of a certain drug in the bloodstream (x) and the length of time it takes to react to a stimulus (y). The results are below.
$$\sum x^2 = 55, \qquad \sum y^2 = 26$$