Linear Regression
Linear regression is a type of supervised machine learning algorithm that models the linear relationship
between a dependent variable and one or more independent features. When there is a single independent
feature, it is known as univariate (simple) linear regression; when there is more than one feature, it is known
as multivariate linear regression.
Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product
price, etc.
Linear regression is a powerful tool for understanding and predicting the behaviour
of a variable; however, a few conditions need to be met for it to produce accurate
and dependable results.
The linear regression algorithm models a linear relationship between a dependent variable (y) and one or
more independent variables (x), hence the name linear regression. Because it assumes this linear relationship,
it finds how the value of the dependent variable changes according to the value of the independent variable.
The linear regression model provides a sloped straight line representing the relationship
between the variables.
Simple Linear Regression: This is the simplest form of linear regression; it involves only one independent variable and one dependent variable. The equation for simple linear regression is:

$\hat{y} = \theta_1 + \theta_2 x$

where:

$\hat{y}$ is the predicted value of the dependent variable,
$x$ is the independent variable,
θ1 is the intercept, and
θ2 is the slope (the coefficient of x).
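As a quick illustration, here is a minimal sketch of how this equation produces a prediction; the values θ1 = 2 and θ2 = 3 are made up for the example, not taken from the text:

```python
# Minimal sketch of the simple linear regression equation y_hat = theta1 + theta2 * x.
# theta1 and theta2 are made-up example values, not fitted parameters.
theta1 = 2.0   # intercept
theta2 = 3.0   # coefficient (slope) of x

def predict(x):
    """Return the predicted value of y for a given x."""
    return theta1 + theta2 * x

print(predict(4.0))  # 2.0 + 3.0 * 4.0 = 14.0
```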
Our primary objective when using linear regression is to locate the best-fit line, which
means the error between the predicted and actual values should be kept to a
minimum. The best-fit line is the line with the least error.
The best-fit line equation provides a straight line that represents the relationship between
the dependent and independent variables. The slope of the line indicates how much the
dependent variable changes for a unit change in the independent variable(s).
Because different values for the weights (the coefficients of the line) produce different
regression lines, we use a cost function to compute the best values and obtain the
best-fit line.
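For linear regression the cost function is typically the Mean Squared Error (MSE). A minimal sketch of how it can be computed with NumPy follows; the data values and array names are illustrative, not from the text:

```python
import numpy as np

def mse_cost(theta1, theta2, x, y):
    """Mean Squared Error cost J(theta1, theta2) for the line y_hat = theta1 + theta2 * x."""
    y_pred = theta1 + theta2 * x          # predictions for every data point
    return np.mean((y_pred - y) ** 2)     # average squared error

# Illustrative data: experience (x) vs. salary (y); values are made up.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30.0, 35.0, 41.0, 44.0, 50.0])
print(mse_cost(25.0, 5.0, x, y))
```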
As assumed earlier, our independent feature is the experience, i.e. X,
and the respective salary Y is the dependent variable. Assuming a linear
relationship between X and Y, the salary can be predicted using:

$\hat{Y} = \theta_1 + \theta_2 X$

or, equivalently, for the i-th data point:

$\hat{y}_i = \theta_1 + \theta_2 x_i$

Here,

$\hat{Y}$ (or $\hat{y}_i$) is the predicted salary,
X (or $x_i$) is the experience,
θ1: intercept
θ2: coefficient of x

The model gets the best regression fit line by finding the best θ1 and θ2 values.
Once we find the best θ1 and θ2 values, we get the best-fit line. So when we are
finally using our model for prediction, it will predict the value of y for the input
value of x.
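To make this concrete, here is a minimal sketch of finding θ1 and θ2 with the ordinary least-squares closed form for a single feature; the experience/salary numbers are made up for illustration:

```python
import numpy as np

# Illustrative data: years of experience (X) and salary (Y); values are made up.
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([30.0, 35.0, 41.0, 44.0, 50.0])

# Ordinary least-squares closed form for one feature:
# theta2 = cov(X, Y) / var(X), theta1 = mean(Y) - theta2 * mean(X)
theta2 = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean()) ** 2)
theta1 = Y.mean() - theta2 * X.mean()

print(theta1, theta2)          # fitted intercept and slope
print(theta1 + theta2 * 6.0)   # predicted salary for 6 years of experience
```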
To achieve the best-fit regression line, the model aims to predict the target value
such that the difference between the predicted value and the true value Y is
minimal. So it is very important to update the θ1 and θ2 values to reach the values
that minimize the error between the predicted y value ($\hat{y}$) and the true y
value (y). This error is measured by the cost function, the Mean Squared Error (MSE):

$J(\theta_1, \theta_2) = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)^2$
Gradient Descent for Linear Regression
A linear regression model can be trained using the optimization algorithm gradient
descent, which iteratively modifies the model’s parameters to reduce the mean
squared error (MSE) of the model on a training dataset. The model uses gradient
descent to update the θ1 and θ2 values so that the cost function is reduced (the MSE
is minimized) and the best-fit line is achieved. The idea is to start with random θ1 and θ2
values and then iteratively update them, reaching the minimum cost.
A gradient is nothing but a derivative: it describes how the output of the
function changes when the inputs are varied by a small amount.
Let’s differentiate the cost function J with respect to θ1 and θ2:

$\frac{\partial J}{\partial \theta_1} = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)$

$\frac{\partial J}{\partial \theta_2} = \frac{2}{n} \sum_{i=1}^{n} (\hat{y}_i - y_i)\, x_i$
Finding the coefficients of a linear equation that best fits the training data is the
objective of linear regression. The coefficients can be adjusted by moving in the
direction of the negative gradient of the Mean Squared Error with respect to the
coefficients. If α is the learning rate, the respective updates for the intercept and
the coefficient of X are:

$\theta_1 := \theta_1 - \alpha \frac{\partial J}{\partial \theta_1}$

$\theta_2 := \theta_2 - \alpha \frac{\partial J}{\partial \theta_2}$
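Putting the update rules together, here is a minimal sketch of gradient descent for simple linear regression in NumPy; the data, learning rate, and iteration count are illustrative assumptions:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.01, n_iters=1000):
    """Fit y ~ theta1 + theta2 * x by gradient descent on the MSE cost."""
    theta1, theta2 = 0.0, 0.0            # start from (arbitrary) initial values
    n = len(x)
    for _ in range(n_iters):
        y_pred = theta1 + theta2 * x     # current predictions
        error = y_pred - y
        grad1 = (2.0 / n) * np.sum(error)        # dJ/dtheta1
        grad2 = (2.0 / n) * np.sum(error * x)    # dJ/dtheta2
        theta1 -= alpha * grad1          # step against the gradient
        theta2 -= alpha * grad2
    return theta1, theta2

# Illustrative experience/salary data (made up).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([30.0, 35.0, 41.0, 44.0, 50.0])
theta1, theta2 = gradient_descent(x, y, alpha=0.05, n_iters=5000)
print(theta1, theta2)
```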
Evaluation Metrics for Linear Regression
Total Sum of Squares (TSS): The sum of the squared deviations of the data points from the
mean of the response variable is known as the total sum of squares, or TSS:

$TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2$
Root Mean Squared Error (RMSE): The square root of the residuals’ variance is
the Root Mean Squared Error. It describes how well the observed data points
match the predicted values, i.e. the model’s absolute fit to the data.
In mathematical notation, it can be expressed as:

$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}}$
To obtain an unbiased estimate, the sum of the squared residuals is divided by the
number of degrees of freedom rather than by the total number of data points in the
model. This figure is referred to as the Residual Standard Error (RSE). In mathematical
notation, it can be expressed as (for simple linear regression with two estimated parameters):

$RSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - 2}}$
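A minimal NumPy sketch of these metrics, assuming arrays y (true values) and y_pred (model predictions) from a simple two-parameter linear regression; the numbers are made up for illustration:

```python
import numpy as np

def regression_metrics(y, y_pred):
    """Compute TSS, RMSE, and RSE for a simple (two-parameter) linear regression."""
    n = len(y)
    residuals = y - y_pred
    tss = np.sum((y - y.mean()) ** 2)                # total sum of squares
    rmse = np.sqrt(np.sum(residuals ** 2) / n)       # root mean squared error
    rse = np.sqrt(np.sum(residuals ** 2) / (n - 2))  # residual standard error
    return tss, rmse, rse

# Illustrative values (made up).
y = np.array([30.0, 35.0, 41.0, 44.0, 50.0])
y_pred = np.array([29.8, 34.9, 40.0, 45.1, 50.2])
print(regression_metrics(y, y_pred))
```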
RMSE is not as informative a metric as R-squared: because its value depends on the
units of the variables (it is not a normalized measure), Root Mean Squared Error can
fluctuate when the units of the variables vary.