Unit 3 Notes
Linear Regression
1. Problem Statement
2. Assumptions
3. Equations
4. Cost Function
5. Gradient Descent
6. Multivariate Linear Regression
7. Bayesian Linear Regression
Problem Statement
● The goal of linear regression is to find the best fit line that describes the
relationship between the dependent variable and the independent variable(s).
The best fit line is the one that minimizes the distance between the predicted
values and the actual values of the dependent variable.
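Written out with m as the slope and b as the intercept (the parameter names these notes use later), "minimizing the distance" means choosing the line \hat{y} = m x + b that minimizes the sum of squared residuals:

\[
\min_{m,\, b} \; \sum_{i=1}^{n} \bigl( y_i - (m x_i + b) \bigr)^2
\]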
Assumptions
1. Linearity
2. Independence
3. Homoscedasticity
4. Normality
5. No Multicollinearity
Assumptions in Linear Regression
1. Linearity: The relationship between the dependent variable and the independent variable(s) is linear.
This means that as the value of the independent variable changes, the value of the dependent variable
changes proportionally. For example, the relationship between the number of hours worked and the
amount of money earned should be linear.
2. Independence: The observations are independent of each other. In other words, the value of the
dependent variable for one observation does not affect the value of the dependent variable for another
observation. For example, the height of one person should not affect the height of another person.
3. Homoscedasticity: The variance of the errors is constant across all levels of the independent
variable(s). This means that the spread of the residuals is the same for all values of the independent
variable. For example, the variance of the errors in a model that predicts house prices should be the
same for houses of different sizes.
4. Normality: The errors are normally distributed. This means that the distribution of the residuals should
be symmetric around zero, with most of the residuals falling close to zero and fewer residuals falling
farther away from zero. For example, the distribution of the residuals in a model that predicts student
test scores should be normal.
5. No Multicollinearity: The independent variables are not highly correlated with each other. This means
that there should be no strong linear relationships among the independent variables. For example, in a
model that predicts employee salaries, there should be no strong correlation between the years of
education and the number of years of work experience. (A sketch for checking assumptions 4 and 5 follows this list.)
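As a quick illustration of how the last two assumptions can be checked in practice, the sketch below uses SciPy's Shapiro-Wilk test for normality of the residuals and statsmodels' variance inflation factor (VIF) for multicollinearity. The data here is synthetic and purely illustrative; substitute your own design matrix and residuals.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative stand-ins: a design matrix X of independent variables and
# residuals (y - y_hat) from an already-fitted regression model.
rng = np.random.default_rng(0)
X = pd.DataFrame({"x1": rng.normal(size=100), "x2": rng.normal(size=100)})
residuals = rng.normal(size=100)

# Normality (assumption 4): Shapiro-Wilk test on the residuals.
stat, p = stats.shapiro(residuals)
print(f"Shapiro-Wilk p-value: {p:.3f}")  # a small p-value suggests non-normal errors

# No multicollinearity (assumption 5): variance inflation factors.
for i, col in enumerate(X.columns):
    vif = variance_inflation_factor(X.values, i)
    print(col, round(vif, 2))  # VIFs well above ~10 are a common warning sign
```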
Equations
Linear regression is a widely used statistical technique to model the relationship between a dependent variable
and one or more independent variables.
● The dependent variable is the outcome variable, while the independent variable(s) are the predictor
variable(s).
● The parameters of the linear regression model are estimated using the method of least squares. The
method of least squares minimizes the sum of the squared errors between the predicted and actual
values of the dependent variable.
● Linear regression can be used for both simple linear regression, where there is only one independent
variable, and multiple linear regression, where there is more than one independent variable.
● Linear regression can be used for both continuous and categorical independent variables. However,
categorical variables need to be encoded as dummy variables.
● In simple linear regression, the relationship between the dependent variable and the independent
variable is modeled using a straight line. In multiple linear regression, the relationship between the
dependent variable and the independent variables is modeled using a plane or a hyperplane (see the
equations below).
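In the usual notation, with \beta_0 the intercept, \beta_1, \ldots, \beta_p the coefficients, and \varepsilon the error term, the two forms are:

\[
y = \beta_0 + \beta_1 x + \varepsilon \qquad \text{(simple linear regression)}
\]
\[
y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_p x_p + \varepsilon \qquad \text{(multiple linear regression)}
\]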
Cost Function
Mean Squared Error
In statistics, the mean squared error (MSE) or mean squared deviation (MSD) of an estimator (a procedure for
estimating an unobserved quantity) measures the average of the squares of the errors: that is, the average squared
difference between the estimated values and the actual values.
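For a dataset of n observations with actual values y_i and predictions \hat{y}_i, the standard form is:

\[
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2
\]

With \hat{y}_i = m x_i + b, this is the cost function minimized in the next section.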
How do we find the best values for m and b?
Gradient Descent
Key Points of Gradient Descent
● Gradient descent is a popular optimization algorithm used to find the optimal values of the parameters
in a linear regression model.
● The objective of linear regression is to find the best fit line that describes the relationship between the
independent variable(s) and the dependent variable.
● The best fit line is characterized by its parameters, which can be estimated using the method of least
squares. However, finding the optimal values of the parameters can be computationally expensive,
especially when the dataset is large; gradient descent instead improves the parameters iteratively
(a worked sketch follows).
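The update rules follow from differentiating the MSE with respect to m and b, where \alpha is the learning rate:

\[
m \leftarrow m - \alpha \cdot \frac{\partial \mathrm{MSE}}{\partial m}, \qquad
b \leftarrow b - \alpha \cdot \frac{\partial \mathrm{MSE}}{\partial b}
\]

Below is a minimal sketch of batch gradient descent for simple linear regression. The learning rate, epoch count, and toy data are illustrative assumptions, not values from these notes.

```python
import numpy as np

def gradient_descent(x, y, lr=0.01, epochs=5000):
    """Fit y ≈ m*x + b by minimizing MSE with batch gradient descent."""
    m, b = 0.0, 0.0
    n = len(x)
    for _ in range(epochs):
        y_pred = m * x + b
        # Partial derivatives of MSE = (1/n) * sum((y - y_pred)^2)
        dm = (-2.0 / n) * np.sum(x * (y - y_pred))
        db = (-2.0 / n) * np.sum(y - y_pred)
        m -= lr * dm  # step against the gradient
        b -= lr * db
    return m, b

# Toy data generated from y = 3x + 2 plus noise (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 3 * x + 2 + rng.normal(0, 1, size=100)
m, b = gradient_descent(x, y)
print(m, b)  # should come out close to 3 and 2
```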
● Generally, when it comes to multivariate linear regression, we do not throw in all the independent
variables at once and start minimizing the error function.
● First, one should focus on selecting the independent variables that contribute most to explaining
the dependent variable.
● For this, we go on and construct a correlation matrix for all the independent variables and the
dependent variable from the observed data.
● The correlation value gives us an idea about which variable is significant and by what factor.
● From this matrix we pick independent variables in decreasing order of correlation value and run the
regression model to estimate the coefficients by minimizing the error function.
● We stop when there is no prominent improvement in the estimation function by inclusion of the next
independent feature.
● This method can still get complicated when there is a large number of independent features that
contribute significantly to the dependent variable. (A sketch of this forward-selection procedure
appears below.)
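Below is a minimal sketch of this correlation-driven forward selection. The DataFrame, column names (x1, x2, x3, target), synthetic data, and stopping threshold are illustrative assumptions, not part of these notes.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical dataset: 'target' is the dependent variable, the rest are candidates.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "x1": rng.normal(size=200),
    "x2": rng.normal(size=200),
    "x3": rng.normal(size=200),
})
df["target"] = 2.0 * df["x1"] + 0.5 * df["x2"] + rng.normal(scale=0.1, size=200)

# Correlation matrix; rank candidates by |correlation| with the target.
corr = df.corr()["target"].drop("target").abs().sort_values(ascending=False)

selected, best_mse = [], np.inf
for feature in corr.index:  # decreasing order of correlation value
    trial = selected + [feature]
    model = LinearRegression().fit(df[trial], df["target"])
    mse = np.mean((df["target"] - model.predict(df[trial])) ** 2)
    if best_mse - mse < 1e-3:  # no prominent improvement: stop
        break
    selected, best_mse = trial, mse

print(selected, best_mse)
```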
Key points
The variable we want to predict is called the Dependent Variable, while the variables used to
predict it are termed Independent Variables.