Slides - Simple Linear Regression

This document provides an overview of linear regression. It defines linear regression as drawing the "best" line through data. It explains that the equation of a line is y=mx+b, where b is the y-intercept and m is the slope. Linear regression aims to estimate the parameters β0 and β1 in the model Yi = β0 + β1Xi + εi to find the line that best fits the data based on minimizing the sum of squared residuals. It discusses interpreting the estimated slope b1 and intercept b0 coefficients and making predictions within the relevant range of the data.


Linear Regression

Regression Analysis
Regression Analysis = drawing the “best” line through data
The equation of a line

y = mx + b

Rearranging:

y = b + mx

(Ex) Y = 10 + .8X

• b is the y-intercept: the line crosses the y-axis at the point (0, 10)

• m is the slope: y increases by .8 when x increases by 1
Simple Linear Regression

• Our goal is to use the data to estimate the "best" line through the data: specifically, the intercept and the slope.

• In linear regression, we typically use the symbol β for the parameters (the values of the slope and intercept).

• So we try to estimate the following β0 and β1:

Y = β0 + β1 X

• β0 is the y-intercept

• β1 is the slope
• However, we cannot find a slope and intercept that perfectly fit the data.

• For example, the slopes between any two observations are different.

• So we do the best we can at drawing ONE line through the data…

• …and the differences between the data and our line are called residuals.
Simple Linear Regression Model:

Yᵢ = β0 + β1 Xᵢ + εᵢ

• β0 is the y-intercept

• β1 is the slope

• εᵢ is the residual for observation i = 1, 2, …, N

• Xᵢ is the independent variable

• Yᵢ is the dependent variable
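To make the model concrete, here is a minimal Python sketch (the parameter values and the noise level are made up for illustration) that generates data from this model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" population parameters, chosen only for illustration
beta0, beta1 = 10.0, 0.8

N = 50
X = rng.uniform(0, 20, size=N)     # independent variable X_i
eps = rng.normal(0, 2.0, size=N)   # residuals epsilon_i
Y = beta0 + beta1 * X + eps        # dependent variable: Y_i = beta0 + beta1*X_i + eps_i
```

In practice we observe only X and Y; the point of regression is to recover estimates of β0 and β1 from them.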

Which Variable Do I Make Y, and Which Do I Make X?

Yᵢ = β0 + β1 Xᵢ + εᵢ

• You will need to carefully read the question to find key words that tell you which is the dependent variable (Y) and which is the independent variable (X).

• A variable that is being "predicted", "explained", "affected", "impacted", etc., is the dependent variable.

• On the other hand, the variable that does the predicting, explaining, affecting, etc., is the independent variable.

(Ex) A professor wants to know how well studying predicts test scores. What are the dependent and independent variables?

The dependent variable is test scores; the independent variable is studying.
What do β0 and β1 tell us?

Yᵢ = β0 + β1 Xᵢ + εᵢ

• They describe the relationship between the independent and dependent variables.

(Ex) Suppose Y is annual sales, and X is customers.

• If the number of customers increases by 1, β1 (the slope coefficient) tells us how much annual sales will change.

• β0, the y-intercept coefficient, tells us the annual sales when X is exactly zero.

➢ Although this can sometimes be interesting, we are usually more interested in the slope, β1.
Practice

Suppose you estimate the following relationship between a manager's salary (in thousands) and their age (in years):

predicted salary = 48.4 + 5.2 × Age

• What is the interpretation of the coefficient on Age?

• What is the interpretation of the y-intercept?
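A quick numeric check of the slope's meaning, as a small Python sketch using the estimates above (the function name is just for illustration):

```python
b0, b1 = 48.4, 5.2   # intercept and slope from the practice problem

def predicted_salary(age):
    """Predicted salary (in thousands) at a given age (in years)."""
    return b0 + b1 * age

# The slope: one additional year of age raises predicted salary
# by 5.2 thousand dollars.
print(predicted_salary(31) - predicted_salary(30))   # ≈ 5.2 (up to float rounding)
```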


Simple Linear Regression Model:

Yᵢ = β0 + β1 Xᵢ + εᵢ

• Note that β0 and β1 are population parameters.

• They represent the real relationship between X and Y.

• But just like with hypothesis testing, we typically only have a sample of data, so we use this to estimate the population parameters.

• The estimates of β0 and β1 are typically denoted b0 and b1.
How do we define the "best" line through the data?

• Intuitively, we would want to make the residuals as small as possible.

Yᵢ = β0 + β1 Xᵢ + εᵢ

• Solve for the residuals:

εᵢ = Yᵢ − β0 − β1 Xᵢ

• Add up the residuals for every observation:

Σᵢ εᵢ = Σᵢ (Yᵢ − β0 − β1 Xᵢ), summing over observations i = 1, …, N
• Note that some residuals are negative and some are positive.

• So when we add them together, the negative values partially offset the positive values.

• This is bad because it will underestimate the total distance between the data points and the line.

• To fix this problem, we square the residuals so they're all positive.

[Figure: scatter plot with the fitted line; points above the line have positive errors, points below have negative errors.]
How do we define the "best" line through the data?

Σᵢ εᵢ² = Σᵢ (Yᵢ − β0 − β1 Xᵢ)²

• This equation gives you the sum of the squared residuals, or SSR.

• Our goal is to minimize this value.

• Since Yᵢ and Xᵢ are data that we've observed (i.e., they can't be changed), we can only adjust β0 and β1 to achieve this goal.

• This is called the Least Squares Method of estimating β0 and β1.
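As an illustration of the Least Squares Method, the following Python sketch (with made-up toy data) minimizes the SSR numerically; the minimizing pair is the least squares estimate of (β0, β1):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, 30)                    # toy data, for illustration only
Y = 10 + 0.8 * X + rng.normal(0, 1.5, 30)

def ssr(params):
    """Sum of squared residuals for a candidate intercept and slope."""
    b0, b1 = params
    return np.sum((Y - b0 - b1 * X) ** 2)

# Choose (b0, b1) to make the SSR as small as possible
b0_hat, b1_hat = minimize(ssr, x0=[0.0, 0.0]).x
print(b0_hat, b1_hat)   # close to the true values 10 and 0.8
```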
Estimation

• In practice, we use calculus to choose b0 and b1 to minimize the sum of the squared residuals.

• But for simple (one-variable) regression, there is an easy formula for the slope b1:

b1 = Cov(X, Y) / Var(X)

• After we compute the slope, we can use it to solve for the intercept with the following formula:

b0 = Ȳ − b1 X̄

• Ȳ is the mean of Y

• X̄ is the mean of X
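A minimal Python sketch of these two formulas, assuming X and Y are arrays of observed data:

```python
import numpy as np

def least_squares(X, Y):
    """Estimate the slope b1 and intercept b0 with the formulas above."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    b1 = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)   # b1 = Cov(X, Y) / Var(X)
    b0 = Y.mean() - b1 * X.mean()                         # b0 = Ybar - b1 * Xbar
    return b0, b1
```

Any consistent choice of ddof works here, since the same factor cancels in the Cov/Var ratio.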
Suppose you want to know how the number of customers near your store affects annual sales. You decide to use simple linear regression.

• Annual sales (in millions) is your dependent variable, Y.

• Number of customers (in millions) is your independent variable, X.

Estimate b1 and b0:

• b1 = 2.07

• b0 = −1.21

Our estimated regression line is:

Ŷᵢ = −1.21 + 2.07 Xᵢ
Interpretation

Ŷᵢ = −1.21 + 2.07 Xᵢ

• If the number of customers increases by 1 million, the annual sales increase by 2.07 million.

• If the number of customers is zero, the average annual sales are −1.21 million.
Predictions

Ŷᵢ = −1.21 + 2.07 Xᵢ

• What are the predicted sales if there are 4 million customers?

Ŷᵢ = −1.21 + 2.07 × 4 = 7.07 million dollars

• NOTE: Whenever possible, base your predictions off the exact coefficient estimates, rather than the rounded numbers.

➢ With the exact numbers in Excel, the prediction would be 7.09 million dollars. This is a slightly more accurate estimate.
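The same prediction as a short Python sketch, using the rounded coefficients from the slide:

```python
b0, b1 = -1.21, 2.07   # rounded estimates from the slide

def predict_sales(customers_millions):
    """Predicted annual sales (millions) for a given number of customers (millions)."""
    return b0 + b1 * customers_millions

print(predict_sales(4))   # ≈ 7.07; the exact, unrounded coefficients give about 7.09
```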
Your Predictions are Limited

• Only make predictions that are within the relevant range of your data.

• In other words, you can predict Y for values of X that are between the smallest and the largest values of X in your data.

• This is called Interpolation.

• Predicting values outside of your relevant range is called extrapolation, and should be avoided.
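One way to respect this rule in code is to refuse to predict outside the observed range of X. A minimal sketch (the function name is just for illustration):

```python
def predict_within_range(x, X_data, b0, b1):
    """Predict Y at x by interpolation only (x must lie inside the data's range)."""
    lo, hi = min(X_data), max(X_data)
    if not lo <= x <= hi:
        raise ValueError(f"x={x} is outside the relevant range [{lo}, {hi}]; "
                         "this would be extrapolation")
    return b0 + b1 * x
```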
In other words…

[Figure: scatter plot marking the observed range of X; "you can predict here" inside the range, "but not here" beyond either end.]
Measures of Variation

• Just like with ANOVA, it can be helpful to break the total variation in the data into 3 different groups.

1. Variation of the observed data around the mean. This is the total sum of squares, or SST:

SST = Σᵢ (Yᵢ − Ȳ)²

2. Variation of the predicted values around the mean. This is the regression sum of squares, or SSR:

SSR = Σᵢ (Ŷᵢ − Ȳ)²

• Note: despite the shared abbreviation, this SSR is not the "sum of the squared residuals" we minimized earlier; that earlier quantity corresponds to the SSE below.

3. Variation of the observed values around the predicted values. This is the error sum of squares, or SSE:

SSE = Σᵢ (Yᵢ − Ŷᵢ)²

Note that SST = SSR + SSE.

[Figure: scatter plot with the fitted line and the mean line Ȳ = 6.63, marking the SST, SSR, and SSE distances.]
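A Python sketch that computes all three sums of squares and checks the decomposition, assuming Y holds the observed values and Y_hat the fitted values from a least squares line:

```python
import numpy as np

def sums_of_squares(Y, Y_hat):
    """Return (SST, SSR, SSE) for observed Y and fitted Y_hat."""
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    Y_bar = Y.mean()
    SST = np.sum((Y - Y_bar) ** 2)       # total variation
    SSR = np.sum((Y_hat - Y_bar) ** 2)   # variation explained by the regression
    SSE = np.sum((Y - Y_hat) ** 2)       # unexplained (error) variation
    # SST = SSR + SSE holds when Y_hat comes from a least squares fit with intercept
    assert np.isclose(SST, SSR + SSE)
    return SST, SSR, SSE
```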
Two Ways to Evaluate a Model Using Variation

1. The coefficient of determination, R²:

R² = SSR / SST

• This measures the amount of variation in Y that is explained by X.

• A high R² means your independent variable, X, is a good predictor of Y.

(Ex) If your R² = .90, then your model explains 90% of the variation in Y. This is considered a very good fit, and should make relatively good predictions.
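As a sketch, R² computed directly from its definition:

```python
import numpy as np

def r_squared(Y, Y_hat):
    """Coefficient of determination: R^2 = SSR / SST."""
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    SSR = np.sum((Y_hat - Y.mean()) ** 2)   # explained variation
    SST = np.sum((Y - Y.mean()) ** 2)       # total variation
    return SSR / SST
```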
2. The standard error of the estimate, Sxy:

Sxy = √( SSE / (n − 2) )

• This is the standard deviation of observations around the prediction line.

• It tells you, on average, how far off a prediction will be.

(Ex) Say, for our previous example with annual sales and customers, we get Sxy = 1.5. Then, on average, our predictions are off by 1.5 (million) dollars.
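And a matching sketch for the standard error of the estimate:

```python
import numpy as np

def std_error_of_estimate(Y, Y_hat):
    """Standard error of the estimate: sqrt(SSE / (n - 2))."""
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    SSE = np.sum((Y - Y_hat) ** 2)   # error sum of squares
    n = len(Y)
    return np.sqrt(SSE / (n - 2))    # n - 2: two parameters (b0, b1) were estimated
```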
