
Linear Regression

Linear regression is a machine learning technique for predicting a numeric value based on linear relationships between input variables. It finds the best fitting straight line through the data that minimizes the squared distances between the observed responses in the data and the responses predicted by the linear approximation. The coefficient of determination, R-squared, measures how well the regression line approximates the real data points, ranging from 0 to 1, with values closer to 1 indicating a better fit. Gradient descent is often used to find the slope and intercept that produce the line with the lowest error for a given dataset.
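For concreteness, the fitting step described above can be sketched in pure Python using the closed-form least-squares solution for a single input variable (the function name and sample data are illustrative, not from the original material):

```python
# Fit y = intercept + slope * x by minimizing the sum of squared errors.
# Closed-form least-squares solution for one explanatory variable.

def fit_line(xs, ys):
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # slope = Cov(x, y) / Var(x)
    cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var_x = sum((x - x_mean) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = y_mean - slope * x_mean
    return intercept, slope

# Noiseless data on the line y = 1 + 2x is recovered exactly.
intercept, slope = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope)  # -> 1.0 2.0
```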

Introduction to machine learning

Linear Regression


Linear Regression Models -

a. The term "regression" generally refers to predicting a real number. However, it can also be used for classification (predicting a category or class).

b. The term "linear" in the name "linear regression" refers to the fact that the method models data as a linear combination of the explanatory variables.

c. A linear combination is an expression where one or more variables are each scaled by a constant factor and added together.

d. In the case of linear regression with a single explanatory variable, the linear combination can be expressed as:

response = intercept + constant ∗ explanatory

e. In its most basic form, linear regression fits a straight line to the response variable. The model is designed to fit the line that minimizes the squared differences (also called errors, or residuals) between the observed and predicted responses.
Linear Regression Models -

a. Before we generate a model, we need to understand the degree of relationship between the attributes Y and X.

b. Mathematically, the correlation between two variables indicates how closely their relationship follows a straight line. By default we use Pearson's correlation, which ranges between -1 and +1.

c. Correlations at the extreme possible values of -1 and +1 indicate a perfectly linear relationship between X and Y, whereas a correlation of 0 indicates the absence of a linear relationship.
I. When the r value is small, one needs to test whether it is statistically significant before concluding that a correlation exists.
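Pearson's correlation can be computed directly from its definition; a minimal pure-Python sketch (the sample sequences are illustrative):

```python
import math

def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences:
    Cov(x, y) divided by the product of the standard deviations."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    cov = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - x_mean) ** 2 for x in xs))
    sy = math.sqrt(sum((y - y_mean) ** 2 for y in ys))
    return cov / (sx * sy)

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # close to +1: perfectly increasing
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))  # close to -1: perfectly decreasing
```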

Linear Regression Models -

d. Coefficient of correlation (Pearson's coefficient): r(x, y) = Cov(x, y) / ( StdDev(x) * StdDev(y) )

[Three scatter plots illustrating r near 0, r near -1, and r near +1]

e. Generating a linear model for cases where r is near 0 makes no sense; the model will not be reliable, because for a given value of X there can be many values of Y. Nonlinear models may be better in such cases.

Linear Regression Models (Recap) -

f. Coefficient of correlation (Pearson's coefficient): r(x, y) = Cov(x, y) / ( StdDev(x) * StdDev(y) )

[Quadrant diagram: relative to (Xbar, Ybar), points in the (+ve, +ve) and (-ve, -ve) quadrants push the covariance above 0, while points spread evenly across all four quadrants give a covariance near 0]

http://www.socscistatistics.com/tests/pearson/Default2.aspx

Linear Regression Models -

g. Given Y = f(X), suppose the scatter plot shows an apparent correlation between X and Y. Let's fit a line into the scatter, which shall be our model.

h. But there are an infinite number of lines that can be fit into the scatter. Which one should we consider as the model?

i. This and many other algorithms use gradient descent, or variants of the gradient descent method, for finding the best model.

j. Gradient descent methods use partial derivatives with respect to the parameters (slope and intercept) to minimize the sum of squared errors.
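The gradient-descent idea above can be sketched in a few lines of pure Python; the learning rate, iteration count, and sample data are illustrative choices, not values from the original material:

```python
# Gradient descent on slope m and intercept c to minimize
# SSE = sum((y - (m*x + c))^2), averaged over the n data points.

def gradient_descent(xs, ys, lr=0.01, steps=5000):
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of the mean squared error w.r.t. m and c
        dm = sum(-2 * x * (y - (m * x + c)) for x, y in zip(xs, ys)) / n
        dc = sum(-2 * (y - (m * x + c)) for x, y in zip(xs, ys)) / n
        m -= lr * dm
        c -= lr * dc
    return m, c

# Data lies on y = 1 + 2x, so the iterates converge near m = 2, c = 1.
m, c = gradient_descent([1, 2, 3, 4], [3, 5, 7, 9])
print(round(m, 3), round(c, 3))
```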

Linear Regression Models (Recap) -

k. Whichever line we consider as the model, it will not pass through all the points.
l. The distance between a point and the line (drop a line vertically) is the error in prediction.
m. The line that gives the least sum of squared errors is considered the best line.

Error = Y - (mX + C)

The signed errors can cancel out and sum to 0, so we square all the errors and sum them up. The line that gives us the least sum of squared errors is the best fit.
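The criterion above can be sketched directly: compute the sum of squared errors for several candidate lines and keep the one with the smallest value (the candidate slope/intercept pairs below are illustrative):

```python
def sse(xs, ys, m, c):
    """Sum of squared errors for the line y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))

xs, ys = [1, 2, 3, 4], [3, 5, 7, 9]

# Raw errors could cancel each other out; squared errors cannot.
candidates = [(2.0, 1.0), (2.5, 0.0), (1.5, 2.0)]
best = min(candidates, key=lambda mc: sse(xs, ys, mc[0], mc[1]))
print(best)  # the true line (2.0, 1.0) has SSE = 0
```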

Linear Regression Models -

n. Coefficient of determination: this determines the fitness of a linear model. The closer the points are to the line, the closer R^2 (the coefficient of determination) gets to 1, and the better the model is.

[Scatter plot with fitted line; the model line always passes through (Xbar, Ybar)]
Linear Regression Models -

o. Coefficient of determination (contd.)
I. There are a variety of errors for all those points that don't fall exactly on the line.
II. It is important to understand these errors to judge the goodness of fit of the model, i.e., how representative the model is likely to be in general.
III. Let us look at point P1, one of the given data points, and the associated errors due to the model:

1. P1: the original y data point for a given x
2. P2: the estimated y value for the same x
3. Ybar: the average of all Y values in the data set
4. SST (Total Sum of Squares): the variance of P1 from Ybar, (Y - Ybar)^2
5. SSR (Regression Sum of Squares): (P2 - Ybar)^2, the portion of SST captured by the regression model
6. SSE (Residual Sum of Squares): (P1 - P2)^2

[Diagram: for a point P1 above the fitted line, vertical segments show SSE from P1 to P2, SSR from P2 to Ybar, and SST from P1 to Ybar]
Linear Regression Models -

o. Coefficient of determination (contd.)

1. That model is the most fit where every data point lies on the line, i.e., SSE = 0 for all data points.
2. Hence SSR should be equal to SST, i.e., SSR/SST should be 1.
3. A poor fit will mean a large SSE; SSR/SST will be close to 0.
4. SSR / SST is called r^2 (r squared), or the coefficient of determination.
5. r^2 is always between 0 and 1 and is a measure of the utility of the regression model.

Note: SS in all the terms stands for Sum of Squares. In the diagram only one point is shown, and vertical lines are used to explain the concept. However, these terms make sense only when more than one data point is considered.
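The decomposition above can be sketched in pure Python: compute SST, SSR, and SSE over all the points and take the ratio (the sample data and function name are illustrative):

```python
def r_squared(xs, ys, m, c):
    """Coefficient of determination for the line y = m*x + c, as SSR / SST."""
    y_mean = sum(ys) / len(ys)
    preds = [m * x + c for x in xs]
    sst = sum((y - y_mean) ** 2 for y in ys)            # total variation
    ssr = sum((p - y_mean) ** 2 for p in preds)         # explained by the line
    sse = sum((y - p) ** 2 for y, p in zip(ys, preds))  # residual
    # For the least-squares line, SST = SSR + SSE, so SSR/SST = 1 - SSE/SST.
    return ssr / sst

# Data lies exactly on y = 1 + 2x, so SSE = 0 and r^2 = 1.
print(r_squared([1, 2, 3, 4], [3, 5, 7, 9], 2.0, 1.0))  # -> 1.0
```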
Linear Regression Models -

o. Coefficient of determination (contd.)

[Two scatter plots marking points A and B relative to the fitted line]

In the case of point A, the line explains the variance of the point, whereas for point B there is a small area (light grey) which the line does not represent. The percentage of the total variance that is represented by the line is the coefficient of determination.


Linear Regression Assumptions


The linear regression model is based on a set of assumptions. If the underlying dataset does not meet these assumptions, the data may have to be transformed, or a linear model may not be a good fit.

1. Assumption of linearity. Assumes a linear relation between the dependent / target variable and the independent / predictor variables.

2. Assumption of normality of the error distribution.
a. The errors should be normally distributed across the model.
b. This assumption can be tested using a frequency histogram, or the skew and kurtosis of a normal plot. If the distribution does not approximate a normal distribution, data transformation may be necessary.
c. A scatter plot between the actual values and the predicted values should show the data distributed equally across the model.
d. Another way of doing this is to plot residual values against the predicted values; we should not see any trends.


Linear Regression Assumptions

3. Assumption of homoscedasticity of errors. The variation of the errors or residuals across each of the independent variables should remain constant. There should be no trend visible in plots of errors against predicted values or against the independent variables.

4. Assumption of independence of errors. There should be no trend in the residuals based on the order in which the observations were collected. A scatter plot of the errors against the order in which the data was collected should show no trend. The Durbin-Watson test can also be employed. Ref.:
https://www.investopedia.com/terms/d/durbin-watson-statistic.asp
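As a sketch, the Durbin-Watson statistic can be computed directly from the residuals; values near 2 suggest no first-order autocorrelation (the residual sequences below are illustrative, not from any real fit):

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences divided
    by the sum of squared residuals. Roughly 2 means no first-order
    autocorrelation; values toward 0 or 4 suggest positive or negative
    autocorrelation in the order the observations were collected."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Alternating signs: negative autocorrelation, statistic well above 2.
print(round(durbin_watson([1, -1, 1, -1, 1, -1]), 2))
# Long runs of the same sign: positive autocorrelation, statistic well below 2.
print(round(durbin_watson([1, 1, 1, -1, -1, -1]), 2))
```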


Linear Regression Model -

Advantages -
1. Simple to implement, and the output coefficients are easy to interpret.

Disadvantages -
1. Assumes a linear relationship between the dependent and independent variables; that is, it assumes there is a straight-line relationship between them.
2. Outliers can have huge effects on the regression.
3. Linear regression assumes independence between attributes.
4. Linear regression looks at a relationship between the mean of the dependent variable and the independent variables.
5. Just as the mean is not a complete description of a single variable, linear regression is not a complete description of relationships among variables.
6. Decision boundaries are linear.


Linear Regression Model -

Lab- 1- Estimating mileage based on features of a second hand car

Description – Sample data is available at


https://archive.ics.uci.edu/ml/datasets/Auto+MPG

The dataset has 9 attributes, listed below:
1. mpg: continuous
2. cylinders: multi-valued discrete
3. displacement: continuous
4. horsepower: continuous
5. weight: continuous
6. acceleration: continuous
7. model year: multi-valued discrete
8. origin: multi-valued discrete
9. car name: string (unique for each instance)

Sol : mpg-linear regression.ipynb
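Since the notebook itself is not reproduced here, a minimal pure-Python sketch of the lab's idea follows. The sample rows and the choice of weight as the single predictor are illustrative; the actual lab would load the full Auto MPG dataset, typically with pandas and scikit-learn:

```python
# Illustrative (weight, mpg) rows: heavier cars tend to get lower mileage.
rows = [(2130, 30.0), (2515, 25.0), (3169, 20.0), (3563, 16.0), (4354, 12.0)]
xs = [w for w, _ in rows]
ys = [m for _, m in rows]

# Least-squares fit of mpg = intercept + slope * weight
n = len(xs)
x_mean, y_mean = sum(xs) / n, sum(ys) / n
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
         / sum((x - x_mean) ** 2 for x in xs))
intercept = y_mean - slope * x_mean

# Predict mileage for a hypothetical 3000 lb car; the slope is negative,
# reflecting the inverse weight-mileage relationship.
print(round(intercept + slope * 3000, 1))
```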


Thank You
