
Regression

Lecture 6
What is Regression?
• Regression analysis is a statistical method that helps us analyze and
understand the relationship between two or more variables of interest.
• It helps us understand which factors are important, which factors can be
ignored, and how they influence each other; in other words, it analyzes
the specific relationships between the independent variables and the
dependent variable.
• In regression, we normally have one dependent variable and one or more
independent variables, and we forecast the value of the dependent variable (Y)
from the values of the independent variables (X1, X2, …, Xk).
• We try to “regress” the value of the dependent variable Y with the help of
the independent variables.
Types of Regression approaches
• There are many types of regression approaches; we will study some of
them here:
• Simple Linear Regression
• Multiple Linear Regression
• Polynomial Regression
• Support Vector for Regression (SVR)
• Decision Tree Regression
• Random Forest Regression
Simple linear regression
• In statistics, simple linear regression is a linear regression model with
a single explanatory variable.
• It concerns two-dimensional sample points with one independent
variable and one dependent variable (conventionally, the x and y
coordinates in a Cartesian coordinate system)
• It finds a linear function (a non-vertical straight line) that, as
accurately as possible, predicts the dependent variable values as a
function of the independent variable.
• Simply put, simple linear regression is used to estimate the
relationship between two quantitative variables.
What can simple linear regression be used
for?
• You can use simple linear regression when you want to know:
• How strong the relationship is between two variables (e.g. the relationship
between rainfall and soil erosion).
• The value of the dependent variable at a certain value of the independent
variable (e.g. the amount of soil erosion at a certain level of rainfall).
Model for simple linear regression
• Consider the equation of a line, given as

  Ŷ = b0 + b1X

• where Ŷ is the predicted value of the dependent variable, X is the independent
variable, b0 is the y-intercept, and b1 is the slope of the line.
• We need to find b0 and b1 to estimate Y using X, such that the error ε
between the predicted value of Y and the original value of Y is minimized.
The Model

• The model has a deterministic and a probabilistic component.
• Deterministic component: Ŷ = b0 + b1X. For example, if most lots sell for
$25,000 and building a house costs about $75 per square foot, the cost of
a house is

  House cost = 25000 + 75(Size)

[Figure: house cost vs. house size, with the line House cost = 25000 + 75(Size)]

• However, house costs vary even among houses of the same size! Since cost
behaves unpredictably, we add a random component:

  House cost = 25000 + 75(Size) + e
• The first-order linear model:

  Y = β0 + β1X + ε

  Y = dependent variable
  X = independent variable
  β0 = Y-intercept
  β1 = slope of the line
  ε = error variable

• β0 and β1 are unknown population parameters and are therefore
estimated from the data.

[Figure: the line Y = β0 + β1X, with intercept β0 and slope β1 = Rise/Run]
Estimating the Coefficients
• The estimates are determined by
• drawing a sample from the population of interest,
• calculating sample statistics.
• producing a straight line that fits the data.

[Figure: scatter plot of sample points. Question: what should be considered a good line?]
General linear model
Working concept of simple linear regression
• The ordinary least squares (OLS) method is usually used to implement
simple linear regression.
• A good line is one that minimizes the sum of squared differences between
the points and the line.
• The accuracy of each predicted value is measured by its squared residual
(the vertical distance between the data point and the fitted line), and the
goal is to make the sum of these squared deviations as small as possible.

[Figure: scatter plot with a fitted line and the vertical residuals]
Let us compare two lines for the four points (1, 2), (2, 4), (3, 1.5) and (4, 3.2).
The first line predicts the values 1, 2, 3 and 4 at these points; the second line
is the horizontal line y = 2.5.

Sum of squared differences (line 1) = (2 − 1)² + (4 − 2)² + (1.5 − 3)² + (3.2 − 4)² = 7.89
Sum of squared differences (line 2) = (2 − 2.5)² + (4 − 2.5)² + (1.5 − 2.5)² + (3.2 − 2.5)² = 3.99

[Figure: the four points with the two candidate lines]

The smaller the sum of squared differences, the better the fit of the
line to the data.
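These sums can be checked with a few lines of Python. Consistent with the differences shown, the first line is taken to be y = x and the second the horizontal line y = 2.5; computed term by term, the first sum comes to 7.89, and the conclusion is unchanged since 7.89 > 3.99:

```python
# The four sample points from the example, as (x, y) pairs
points = [(1, 2), (2, 4), (3, 1.5), (4, 3.2)]

def sum_sq_diff(points, predict):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - predict(x)) ** 2 for x, y in points)

line1 = lambda x: x      # the line y = x (predicts 1, 2, 3, 4)
line2 = lambda x: 2.5    # the horizontal line y = 2.5

print(round(sum_sq_diff(points, line1), 2))  # 7.89
print(round(sum_sq_diff(points, line2), 2))  # 3.99
```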
The Simple Linear Regression Line
Example
The values of x and their corresponding values of y are
shown in the table below:

x: 0  1  2  3  4
y: 2  3  5  4  6

Find the least squares regression line y = ax + b.

Solution:
We use a table to calculate the sums needed for a and b.

x     y     xy     x²
0     2     0      0
1     3     3      1
2     5     10     4
3     4     12     9
4     6     24     16
Σx = 10   Σy = 20   Σxy = 49   Σx² = 30

We now calculate a and b using the least squares regression formulas:
a = (nΣxy − Σx·Σy) / (nΣx² − (Σx)²) = (5·49 − 10·20) / (5·30 − 10²) = 45/50 = 0.9
b = (Σy − a·Σx) / n = (20 − 0.9·10) / 5 = 2.2

We now have the least squares regression line y = 0.9x + 2.2.
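The table and the least squares formulas can be reproduced directly in Python:

```python
# Data from the worked example
xs = [0, 1, 2, 3, 4]
ys = [2, 3, 5, 4, 6]
n = len(xs)

sum_x  = sum(xs)                             # 10
sum_y  = sum(ys)                             # 20
sum_xy = sum(x * y for x, y in zip(xs, ys))  # 49
sum_x2 = sum(x * x for x in xs)              # 30

# Least squares formulas for the line y = a*x + b
a = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
b = (sum_y - a * sum_x) / n

print(a, b)  # 0.9 2.2
```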
Example 2
a) The least squares coefficients are:
a = 8.4
b = 11.6
b) For 2012, t = 2012 − 2005 = 7.
The estimated sales in 2012 are y = 8.4 × 7 + 11.6 = 70.4
million dollars.
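The slide's problem statement is not reproduced here, but the computation in part (b) can be checked; it assumes a fitted sales model y = 8.4t + 11.6 (in million dollars), with t counted in years since 2005:

```python
a, b = 8.4, 11.6        # least squares coefficients from part (a)
t = 2012 - 2005         # years since 2005
sales = a * t + b       # estimated sales, in million dollars
print(round(sales, 1))  # 70.4
```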
Error Variable: Required Conditions for
better performance of simple linear regression
• The error ε is a critical part of the regression model.
• Four requirements involving the distribution of ε must
be satisfied:
• The probability distribution of ε is normal.
• The mean of ε is zero: E(ε) = 0.
• The standard deviation of ε is σε, the same for all values of X.
• The errors associated with different values of Y are all
independent.
The Normality of ε

[Figure: at each of X1, X2, X3 the distribution of Y is normal, centered at
μ1 = β0 + β1X1, μ2 = β0 + β1X2, μ3 = β0 + β1X3. The standard deviation remains
constant, but the mean value changes with X.]

From the first three assumptions we have: Y is normally distributed
with mean E(Y) = β0 + β1X and a constant standard deviation σε.
Assessing the Model
• The least squares method will produce a regression line whether or
not there is a linear relationship between X and Y.
• Consequently, it is important to assess how well the linear model fits
the data.
• Several methods are used to assess the model. All are based on the
sum of squares for errors, SSE.
Sum of Squares for Errors
• This is the sum of squared differences between the points and
the regression line.
• It can serve as a measure of how well the line fits the
data. SSE is defined by

  SSE = Σ (Yi − Ŷi)²,  summed over i = 1, …, n

– A shortcut formula:

  SSE = (n − 1) [ sY² − cov(X, Y)² / sX² ]
Standard Error of Estimate
• The mean error is equal to zero.
• If σε is small, the errors tend to be close to zero (close to
the mean error), and the model fits the data well.
• Therefore, we can use σε as a measure of the
suitability of using a linear model.
• An estimator of σε is given by sε, the standard error of estimate:

  sε = √( SSE / (n − 2) )
Assumptions of simple linear regression
• Simple linear regression is a parametric test, meaning that it makes certain
assumptions about the data. These assumptions are:
• Homogeneity of variance (homoscedasticity): the size of the error in our prediction
doesn’t change significantly across the values of the independent variable.
• Independence of observations: the observations in the dataset were collected using
statistically valid sampling methods, and there are no hidden relationships among
observations.
• Normality: The data follows a normal distribution.
• The relationship between the independent and dependent variable is linear: the line of
best fit through the data points is a straight line (rather than a curve or some sort of
grouping factor).
• If your data do not meet the assumptions of homoscedasticity or normality, you
may be able to use a nonparametric test instead, such as the Spearman rank test.
Example: Data that doesn’t meet the
assumptions
• You think there is a linear relationship between meat consumption
and the incidence of cancer in the U.S.
• However, you find that much more data have been collected at high
rates of meat consumption than at low rates,
• with the result that there is much more variation in the estimate of
cancer rates at the low range than at the high range.
• Because the data violate the assumption of homoscedasticity, they
do not work for linear regression.
Implementing simple linear regression in
Python
1. Import the packages and classes.
2. Import the data.
3. Visualize the data.
4. Handle missing values and clean the data.
5. Split the data into training and test sets.
6. Build the regression model and train it.
7. Check the results of model fitting, using plots, to know whether the
model is satisfactory.
8. Make predictions on unseen data.
Importing packages and data
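The code for this slide is not reproduced in the export, so the following is a minimal sketch. The lecture presumably reads a CSV file (e.g. a salary dataset) with pandas; here a tiny stand-in dataset is inlined so the sketch is self-contained, and the column names are assumptions:

```python
import io
import pandas as pd

# In the lecture the data would come from a file, e.g.:
#   data = pd.read_csv("salary_data.csv")   # hypothetical filename
# Inline stand-in data (note one missing salary and one negative salary):
csv = io.StringIO(
    "YearsExperience,Salary\n"
    "1.1,39343\n2.0,43525\n3.2,54445\n"
    "4.0,\n5.1,-1000\n6.8,91738\n"
)
data = pd.read_csv(csv)
print(data.shape)  # (6, 2)
```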
Visualize the data
Handle missing values and clean the data

• Missing data are present.
• Data cleaning is required, as salary cannot be negative.
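The cleaning step can be sketched with pandas (the dataset and column names are stand-ins, not the lecture's file): drop rows with missing values, then drop rows with a negative salary:

```python
import io
import pandas as pd

# Tiny stand-in dataset with one missing and one negative salary
data = pd.read_csv(io.StringIO(
    "YearsExperience,Salary\n"
    "1.1,39343\n2.0,43525\n3.2,54445\n"
    "4.0,\n5.1,-1000\n6.8,91738\n"
))

data = data.dropna()                # drop rows with missing values
data = data[data["Salary"] >= 0]    # salary cannot be negative
print(len(data))  # 4
```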
Visualizing the processed data
Split the data into training and test sets
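A common way to do this split, and presumably what the lecture's code uses, is scikit-learn's train_test_split helper; the data values below are assumptions for the sketch:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: X = years of experience, y = salary in $1000s
X = np.array([1.1, 2.0, 3.2, 4.0, 5.1, 6.8, 7.9, 8.7]).reshape(-1, 1)
y = np.array([39, 44, 54, 57, 66, 92, 101, 109])

# Hold out 25% of the rows as a test set; random_state fixes the shuffle
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
print(len(X_train), len(X_test))  # 6 2
```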
Build the regression model and train it.

• Import the linear regression class from the linear model module.
• Make an instance of the linear regression class.
• Then train the model using the training data.
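The three steps above can be sketched with scikit-learn (which the slide's wording suggests); the training data here are synthetic, generated from the assumed line y = 3x + 5:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data following y = 3x + 5 (an assumption for the sketch)
X_train = np.arange(10, dtype=float).reshape(-1, 1)
y_train = 3.0 * X_train.ravel() + 5.0

model = LinearRegression()    # an instance of the linear regression class
model.fit(X_train, y_train)   # train the model on the training data

# The fitted slope and intercept recover the line used to generate the data
print(round(model.coef_[0], 2), round(model.intercept_, 2))  # 3.0 5.0
```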
Check the results of model fitting to know
whether the model is satisfactory using plots.
Make predictions using unseen data.
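Continuing the sketch, a trained model predicts y for x-values it never saw during training (same synthetic y = 3x + 5 data as above, an assumption for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Train on synthetic data generated from y = 3x + 5
model = LinearRegression().fit(
    np.arange(10, dtype=float).reshape(-1, 1),
    3.0 * np.arange(10, dtype=float) + 5.0,
)

# Predict for x-values outside the training range
X_unseen = np.array([[12.0], [20.0]])
y_pred = model.predict(X_unseen)
print(np.round(y_pred, 2))  # [41. 65.]
```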
Another Example
Import dataset and visualize
Data cleaning
Visualize the data
Splitting the data
Build model and train it
Predicting the output for unseen data
