Lec 6
1 Regression
PART II: LINEAR REGRESSION: THE INDUCTIVE BIAS
1 Inductive Bias
1.1 A List
1.2 Choice of Loss Function
PART III: LINEAR REGRESSION: APPLICATIONS AND SHORTCOMINGS
Part I
REGRESSION
▶ Regression is a statistical method that estimates the strength and nature of the relationship between a dependent variable and one or more independent variables.
▶ It does so by finding a curve that minimizes the error between the actual and predicted values of the dependent variable over the training data set.
▶ For proper interpretation of regression, several assumptions (the inductive bias) about the data and the model must hold.
▶ Linear regression is one of the most common forms of this method. It establishes a linear relationship between the dependent and independent variables.
LINEAR REGRESSION
INTRODUCTION
LINEAR REGRESSION
REGRESSION LINE
The line showing the linear relationship between the dependent and independent variables is called a regression line. An example of a regression line is shown below:
[Figure: a scatter plot with a fitted regression line of negative slope.]
The regression line may be positive, wherein the dependent variable increases as the independent variable increases; or it may be negative, wherein the dependent variable decreases as the independent variable increases (as in the figure above).
LINEAR REGRESSION: THE PROBLEM
TYPES
Linear regression may be further classified into the following two types:
▶ Simple Linear Regression: Assumes a linear relationship between a single independent variable and a dependent variable.
▶ Multiple Linear Regression: Assumes a linear relationship between two or more independent variables and a dependent variable.
LINEAR REGRESSION: THE PROBLEM
MATHEMATICAL REPRESENTATION
Once a linear relationship has been determined by the algorithm, the general form of each model may be represented as follows:
▶ Simple Linear Regression
y = ax + b + u
▶ Multiple Linear Regression
y = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + b + u
where:
y = dependent variable
x, x_i = independent variable(s)
a, a_i = slope(s) of the variable(s)
b = the y-intercept
u = the regression residual (error term)
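As an illustration (not part of the original slides), here is a minimal NumPy sketch that generates synthetic data from both model forms; the coefficient values a, a_1, a_2, and b are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simple model: y = a*x + b + u, with assumed a = 2.0, b = 1.0
a, b = 2.0, 1.0
x = rng.uniform(0, 10, size=100)
u = rng.normal(0, 1, size=100)            # residual / error term
y = a * x + b + u

# Multiple model: y = a1*x1 + a2*x2 + b + u, with assumed slopes [2.0, -0.5]
a_vec = np.array([2.0, -0.5])
X = rng.uniform(0, 10, size=(100, 2))     # columns are x1, x2
y_multi = X @ a_vec + b + rng.normal(0, 1, size=100)
```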
LINEAR REGRESSION: THE PROBLEM
LOSS FUNCTION: MEAN SQUARED ERROR
▶ The regression line is obtained by minimizing the mean squared error (the loss function) over all points in the training set. The loss function is given as:
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - f(x_i) \right)^2
where f(x) = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n + b.
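A minimal sketch of this loss in NumPy (illustrative only; the line f(x) = 2x + 1 and the data are assumptions):

```python
import numpy as np

def mse(y, y_pred):
    """Mean squared error: (1/N) * sum((y_i - f(x_i))^2)."""
    y, y_pred = np.asarray(y), np.asarray(y_pred)
    return np.mean((y - y_pred) ** 2)

# Example: predictions from an assumed line f(x) = 2x + 1
x = np.array([1.0, 2.0, 3.0])
y = np.array([3.2, 4.9, 7.1])
print(mse(y, 2 * x + 1))  # small value: the line fits these points well
```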
LINEAR REGRESSION: THE SOLUTION
SOLUTION
The best-fit line may be found in the following two ways:
▶ Closed-form (exact) solution:
• It solves the problem in terms of simple functions and mathematical operators.
• The closed-form solution for linear regression (the normal equations) is:
B = (X^\top X)^{-1} X^\top Y
▶ Iterative solution: an approach such as gradient descent, which repeatedly adjusts the coefficients to reduce the loss.
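A minimal NumPy sketch of the closed-form solution (synthetic data and coefficient values are assumptions; a column of ones is appended so the intercept b is estimated together with the slopes):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from an assumed model y = 2*x1 - 0.5*x2 + 1 + noise
X = rng.uniform(0, 10, size=(100, 2))
y = X @ np.array([2.0, -0.5]) + 1.0 + rng.normal(0, 0.5, size=100)

# Augment X with a column of ones so the intercept is part of B
X_aug = np.column_stack([X, np.ones(len(X))])

# Closed form B = (X'X)^{-1} X'Y; solve() is preferred over an explicit inverse
B = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)
print(B)  # approximately [2.0, -0.5, 1.0]
```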
LINEAR REGRESSION: THE SOLUTION
QUALITY OF FIT
▶ The goodness of the fit achieved indicates how strongly the variables are linearly correlated.
▶ The goodness of fit may be measured using the Pearson correlation coefficient, which is given by:
r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \, \sum_i (y_i - \bar{y})^2}}
▶ The closer |r| is to 1, the better the linear fit (r is negative for a negative relationship).
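A short NumPy sketch of this coefficient (the sample data are made up for illustration):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation: covariance normalised by the product of spreads."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    dx, dy = x - x.mean(), y - y.mean()
    return np.sum(dx * dy) / np.sqrt(np.sum(dx**2) * np.sum(dy**2))

x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])   # roughly y = 2x
print(pearson_r(x, y))               # close to 1: strong positive linear fit
```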
Part II
INDUCTIVE BIAS
A LIST
CHOICE OF LOSS FUNCTION
Let us analyse some candidate loss functions to justify the choice of MSE as an appropriate loss function (compared numerically in the sketch after this list).
▶ L1 = (y - f(x)): This loss takes both positive and negative values, which can cancel out and give a near-zero total error on large data sets even when the fit is poor.
▶ L2 = |y - f(x)|: Although errors do not cancel out here, outliers are penalised only at the same (linear) rate as typical points.
▶ L3 = (y - f(x))^2: In this case, the errors do not cancel out, and outliers are penalised more heavily, giving a more appropriate regression line.
Hence, MSE is an appropriate choice of loss function.
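A small sketch (with made-up numbers) showing how the signed errors of L1 cancel while the absolute and squared forms do not:

```python
import numpy as np

y      = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([2.0, 1.0, 4.0, 3.0])   # errors: -1, +1, -1, +1

err = y - y_pred
print(np.mean(err))           # L1: 0.0 -- cancellation hides a poor fit
print(np.mean(np.abs(err)))   # L2: 1.0 -- no cancellation, linear penalty
print(np.mean(err ** 2))      # L3: 1.0 -- no cancellation; squaring would
                              #     dominate for any large outlier error
```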
Part III
APPLICATIONS AND SHORTCOMINGS
▶ Linear regression finds applications in several fields, such as market analysis, financial analysis, environmental health, and medicine.
▶ However, it leaves something to be desired. A linear correlation does not indicate causation, i.e. a connection between two variables does not imply that one causes the other.
▶ Linear regression is sensitive to noise and prone to overfitting.
▶ It is prone to multicollinearity, i.e. the occurrence of correlation between two or more independent variables, which reduces the statistical significance of the individual independent variables (a small illustration follows below).
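A brief NumPy sketch (synthetic, assumed data) of multicollinearity: when two columns of X are nearly identical, X'X becomes ill-conditioned and the fitted coefficients become unstable.

```python
import numpy as np

rng = np.random.default_rng(0)

x1 = rng.uniform(0, 10, size=100)
x2 = x1 + rng.normal(0, 0.01, size=100)   # nearly a copy of x1: collinear
y = 3 * x1 + rng.normal(0, 0.5, size=100)

X = np.column_stack([x1, x2, np.ones(len(x1))])
print(np.linalg.cond(X.T @ X))            # huge condition number

B = np.linalg.solve(X.T @ X, X.T @ y)
print(B)  # the true weight 3 is split unstably between the collinear columns
```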