Unit - 3 - ML - 24

This document covers supervised learning with a focus on regression techniques, including bias, variance, underfitting, and overfitting. It explains linear regression, its types, and its mathematical representation, as well as regularization techniques such as Lasso and Ridge regression, and performance metrics such as MAE, RMSE, and R². It also discusses optimization algorithms such as Batch and Stochastic Gradient Descent.

Unit - 3

Supervised Learning: Regression (06 Hrs)


• Bias, Variance, Generalization, Underfitting, Overfitting
• Linear regression; Lasso regression, Ridge regression; Gradient descent algorithm
• Evaluation Metrics: MAE, RMSE, R²

Dr. Rupali Pawar


Bias and Variance
• Bias:
While making predictions, a difference occurs between the values predicted by the model and the actual/expected values; this difference is known as the bias error, or error due to bias.
Bias is the distance between the average prediction of the model and the truth.
• Variance:
Variance is the difference between the predictions of different models (trained on different samples of the data) for the same data point.
Variance errors are classified as either low variance or high variance.

Underfitting
• In the case of underfitting, the model is not able to learn enough from the training data; as a result, accuracy is reduced and the model produces unreliable predictions.
• An underfitted model has high bias and low variance.
How to avoid underfitting:
• By increasing the training time of the model.
• By increasing the number of features.

Overfitting and how to reduce overfitting

What is overfitting?
• Building a model that matches the training data "too closely", generating a complex model.

Why does it occur?
• Evaluating a model by testing it on the same data that was used to train it.
• Creating a model that is "too complex".

What is the impact of overfitting?
• The model will do well on the training data, but won't generalize to out-of-sample data, i.e. to test data.
• The model will have low bias, but high variance.

How to reduce overfitting
• Cross-validation: by splitting the data into training and testing sets multiple times, cross-validation can help identify whether a model is overfitting or underfitting, and can be used to tune hyperparameters to reduce variance (see the sketch after this list).
• Feature selection: choosing only the relevant features decreases the model's complexity and can reduce the variance error.
• Regularization: L1 or L2 regularization can be used to reduce variance in machine learning models.
• Ensemble methods: combining multiple models improves generalization performance. Bagging, boosting, and stacking are common ensemble methods that can help reduce variance and improve generalization.
• Simplifying the model: reducing the complexity of the model, such as decreasing the number of parameters or layers in a neural network, also helps reduce variance and improve generalization performance.
• Early stopping: early stopping prevents overfitting by stopping the training of a deep learning model when its performance on the validation set stops improving.
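As an illustrative sketch (not part of the original slides), the snippet below uses cross-validation to expose an over-complex model; the synthetic data, polynomial degrees, and number of folds are assumed values chosen only for demonstration.

```python
# Minimal sketch (not from the slides): detecting overfitting with cross-validation.
# The synthetic data, polynomial degrees, and number of folds are illustrative choices.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(30, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.1, size=30)

for degree in (1, 3, 15):                      # low, moderate, high model complexity
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, X, y, cv=5, scoring="r2")
    print(f"degree={degree:2d}  mean CV R^2 = {scores.mean():.3f}")

# A high-degree model that fits the training data "too closely" will usually
# show a worse cross-validated R^2 than a moderate-degree model.
```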

Bias Variance Tradeoff

Linear Regression

• Linear regression is one of the easiest and most popular machine learning algorithms. It is a statistical method that is used for predictive analysis. Linear regression makes predictions for continuous/real or numeric variables such as sales, salary, age, product price, etc.
• The linear regression algorithm models a linear relationship between a target or dependent variable (y) and one or more independent variables (x), hence the name linear regression. Because the relationship is linear, the model describes how the value of the dependent variable changes with the value of the independent variable.
• The linear regression model provides a sloped straight line representing the relationship between the variables.


Types of Linear Regression
• Linear regression can be further divided into two types of algorithm:
• Simple Linear Regression: if a single independent variable is used to predict the value of a numerical dependent variable, then such a linear regression algorithm is called Simple Linear Regression.
• Multiple Linear Regression: if more than one independent variable is used to predict the value of a numerical dependent variable, then such a linear regression algorithm is called Multiple Linear Regression.
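As a quick illustration of the two types (not from the original slides), the sketch below fits one simple and one multiple linear regression model with scikit-learn; the tiny data set, meant to mimic hours studied and attendance versus marks, is invented.

```python
# Minimal sketch (invented data): simple vs. multiple linear regression.
import numpy as np
from sklearn.linear_model import LinearRegression

# Simple linear regression: one independent variable (e.g. hours studied -> marks).
X_simple = np.array([[1], [2], [3], [4], [5]])
y = np.array([35, 45, 50, 62, 70])
simple = LinearRegression().fit(X_simple, y)
print("simple:   intercept a0 =", simple.intercept_, " slope a1 =", simple.coef_)

# Multiple linear regression: two independent variables (e.g. hours studied, attendance %).
X_multi = np.array([[1, 60], [2, 65], [3, 70], [4, 80], [5, 90]])
multi = LinearRegression().fit(X_multi, y)
print("multiple: intercept =", multi.intercept_, " coefficients =", multi.coef_)
```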



Linear Regression

• The linear regression model provides a sloped straight line representing the relationship between the variables.
• Mathematical representation:
y = a0 + a1x + ε

y = dependent variable (target variable)
x = independent variable (predictor variable)
a0 = intercept of the line (gives an additional degree of freedom)
a1 = linear regression coefficient (scale factor applied to each input value)
ε = random error
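A minimal sketch (with invented data) of estimating a0 and a1 follows; it simply applies the standard ordinary least-squares formulas for the slope and intercept.

```python
# Minimal sketch (invented data): estimating a0 and a1 by ordinary least squares.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

# a1 = sum((x - x_mean)(y - y_mean)) / sum((x - x_mean)^2),  a0 = y_mean - a1 * x_mean
a1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
a0 = y.mean() - a1 * x.mean()

y_pred = a0 + a1 * x                 # points on the fitted straight line
residuals = y - y_pred               # ε: the part of y the line does not explain
print(f"a0 = {a0:.3f}, a1 = {a1:.3f}")
```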



Linear Regression

• Positive Linear Relationship:
If the dependent variable increases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is termed a positive linear relationship.

• Negative Linear Relationship:
If the dependent variable decreases on the Y-axis as the independent variable increases on the X-axis, then such a relationship is called a negative linear relationship.


Best Fit Line
• E = Y − Y`
where E denotes the prediction error or residual error,
Y` denotes the predicted value, and
Y denotes the actual value.
• A line that fits the data "best" will be one for which the prediction errors (one for each data point) are as small as possible.
• The fitted line itself is Y` = A + bX, where Y` denotes the predicted value, b denotes the slope of the line, X denotes the independent variable, and A is the Y intercept.

Finding the best fit line
When working with a linear regression model, our main goal is to find the best fit line, which means the error between the predicted values and the actual values should be minimized. The best fit line will have the least error.
Different values of the weights or coefficients of the line (a0, a1) give different regression lines, so we need to calculate the best values for a0 and a1 to find the best fit line; to calculate them we use the cost function.
• Cost function: J(a0, a1) = (1/n) Σ (yi − (a0 + a1xi))², summed over the n data points, i.e. the mean squared error between the actual values yi and the predicted values a0 + a1xi.
• Different values of the weights or coefficients (a0, a1) give different regression lines, and the cost function is used to estimate the values of the coefficients for the best fit line.
• The cost function optimizes the regression coefficients or weights; it measures how well a linear regression model is performing.



Gradient Descent Algorithm
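The worked derivation and plots from this slide are not reproduced here. As a minimal sketch, the code below applies batch gradient descent to the cost function J(a0, a1) defined above for simple linear regression; the data, learning rate, and iteration count are illustrative assumptions.

```python
# Minimal sketch: batch gradient descent for simple linear regression y = a0 + a1*x.
# The data, learning rate, and iteration count are illustrative assumptions.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])

a0, a1 = 0.0, 0.0          # initial guesses for intercept and slope
alpha = 0.01               # learning rate
n = len(x)

for _ in range(5000):
    y_pred = a0 + a1 * x
    error = y_pred - y
    # Partial derivatives of J(a0, a1) = (1/n) * sum((y_pred - y)^2)
    grad_a0 = (2.0 / n) * np.sum(error)
    grad_a1 = (2.0 / n) * np.sum(error * x)
    a0 -= alpha * grad_a0
    a1 -= alpha * grad_a1

cost = np.mean((a0 + a1 * x - y) ** 2)
print(f"a0 = {a0:.3f}, a1 = {a1:.3f}, final cost J = {cost:.4f}")
```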
Linear Regression Numerical



Numerical 1 Method -1 SPPU-Nov_Dec_22



Numerical 1 Method 2



Numerical 2 Method -1 SPPU-Nov_24



Numerical 3 Method -1 SPPU-





Applications of Linear Regression

• Marks scored by students based on the number of hours studied (ideally): here the marks scored in the exam are the dependent variable and the number of hours studied is the independent variable.
• Predicting crop yields based on the amount of rainfall: yield is the dependent variable while the measure of precipitation is the independent variable.
• Predicting the salary of a person based on years of experience: experience becomes the independent variable while salary turns into the dependent variable.
• Weather forecasting.



Logistic Regression
• Logistic regression is one of the most popular machine learning algorithms, and it comes under the supervised learning technique. It is used for predicting a categorical dependent variable using a given set of independent variables.
• Logistic regression predicts the output of a categorical dependent variable. Therefore the outcome must be a categorical or discrete value. It can be Yes or No, 0 or 1, True or False, etc.; but instead of giving the exact values 0 and 1, it gives probabilistic values which lie between 0 and 1.



Sigmoid / logistic function
• The sigmoid (logistic) function is a mathematical function used to map the predicted values to probabilities: hθ(x) = 1 / (1 + e^(−z)), where z = θ0 + θ1x.
• It maps any real value into another value within the range 0 to 1.
• The output of logistic regression must lie between 0 and 1 and cannot go beyond this limit, so it forms a curve like the "S" shape. This S-shaped curve is called the sigmoid function or the logistic function.
• In logistic regression we use the concept of a threshold value, which decides between the classes 0 and 1: values above the threshold tend to 1, and values below the threshold tend to 0.
Cost Function for Logistic Regression
• J(θ0, θ1) = (1/m) Σ [ −y log(hθ(x)) − (1 − y) log(1 − hθ(x)) ], summed over the m training examples
• for y = 0: cost = − log(1 − hθ(x))
• for y = 1: cost = − log(hθ(x))
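As an illustrative sketch (not from the slides), the snippet below implements the sigmoid hypothesis hθ(x), applies a 0.5 threshold, and evaluates the cross-entropy cost; the data and the parameter values θ0, θ1 are assumed for demonstration.

```python
# Minimal sketch: sigmoid hypothesis and logistic (cross-entropy) cost.
# The data and the parameter values theta0, theta1 are illustrative assumptions.
import numpy as np

def h(x, theta0, theta1):
    """hθ(x) = 1 / (1 + e^(-z)) with z = θ0 + θ1*x."""
    z = theta0 + theta1 * x
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, 1.0, 1.5, 3.0, 3.5, 4.0])
y = np.array([0, 0, 0, 1, 1, 1])                # categorical (0/1) target

theta0, theta1 = -4.0, 2.0                      # assumed parameter values
p = h(x, theta0, theta1)                        # probabilities in (0, 1)
labels = (p >= 0.5).astype(int)                 # threshold at 0.5

# J(θ0, θ1) = (1/m) * Σ [ -y*log(hθ(x)) - (1-y)*log(1 - hθ(x)) ]
J = np.mean(-y * np.log(p) - (1 - y) * np.log(1 - p))
print("probabilities:", np.round(p, 3), "labels:", labels, "cost J =", round(J, 4))
```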



Ridge and Lasso Regression
• Ridge and Lasso regression are simple techniques to reduce model complexity and prevent the over-fitting that may result from plain linear regression.
• Ridge regression, also known as L2 regularization, is used for multiple linear regression when the predictors are multicollinear. It adds a penalty proportional to the sum of the squared coefficients, shrinking all of them towards zero (but not exactly to zero).

• Lasso stands for Least Absolute Shrinkage and Selection Operator, and is also known as L1 regularization. It adds a penalty proportional to the sum of the absolute values of the coefficients, shrinking some of them exactly to zero, so it also performs feature selection.
• Lasso is applied when the model is overfitted or is facing computational challenges due to many features.
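A minimal scikit-learn sketch of the two penalties follows (not from the slides); the synthetic data and the regularization strength alpha are illustrative assumptions.

```python
# Minimal sketch: Ridge (L2) and Lasso (L1) regression with scikit-learn.
# Synthetic data and the regularization strength alpha are illustrative choices.
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
# Only the first two features actually matter; the rest are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, size=50)

for name, model in [("OLS", LinearRegression()),
                    ("Ridge", Ridge(alpha=1.0)),
                    ("Lasso", Lasso(alpha=0.1))]:
    model.fit(X, y)
    print(f"{name:6s} coefficients:", np.round(model.coef_, 3))

# Ridge shrinks all coefficients towards zero; Lasso can set some of them
# exactly to zero, effectively performing feature selection.
```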



Performance Metrics
• For linear regression we use the Mean Squared Error (MSE) cost function, which is the average of the squared errors between the predicted values and the actual values.
• For the linear equation y = a0 + a1x, MSE can be calculated as:
MSE = (1/N) Σ (Yi − (a1xi + a0))²

N = total number of observations
Yi = actual value
(a1xi + a0) = predicted value
• Other metrics are:
• R squared (R²)
• Adjusted R squared
• RMSE: Root Mean Squared Error
• MAE: Mean Absolute Error
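As a sketch (not from the slides), the listed metrics can be computed directly with scikit-learn; the actual and predicted values below are invented for illustration.

```python
# Minimal sketch (invented values): MSE, RMSE, MAE and R^2 for a regression model.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

y_true = np.array([3.0, 5.0, 7.5, 9.0, 11.0])   # actual values Yi
y_pred = np.array([2.8, 5.4, 7.1, 9.6, 10.5])   # predicted values a1*xi + a0

mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
mae = mean_absolute_error(y_true, y_pred)
r2 = r2_score(y_true, y_pred)
print(f"MSE={mse:.3f}  RMSE={rmse:.3f}  MAE={mae:.3f}  R^2={r2:.3f}")
```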
Batch Gradient Descent and Stochastic Gradient
Descent

Batch Gradient Descent involves calculations over the full training set at
each step, which is very slow on very large training data. Thus, it
becomes very computationally expensive to do Batch GD. However, this
is great for convex or relatively smooth error manifolds. Also, Batch GD
scales well with the number of features.
Batch Gradient Descent and Stochastic
Gradient Descent
• Stochastic Gradient Descent tries to solve the main problem with Batch Gradient Descent, namely the use of the whole training data to calculate the gradients at each step. SGD is stochastic in nature, i.e. it picks a "random" instance of the training data at each step and then computes the gradient, making it much faster since there is far less data to manipulate at a single time, unlike Batch GD.
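To make the contrast concrete, here is a minimal sketch (not from the slides) of one Batch GD step versus one SGD epoch for the linear model y = a0 + a1x; the data and learning rate are assumed values.

```python
# Minimal sketch: one Batch GD step vs. one SGD epoch for y = a0 + a1*x.
# The data and learning rate are illustrative assumptions.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 5.9, 8.2, 9.8])
alpha = 0.01

# Batch GD: one update per step, computed over the whole training set.
a0, a1 = 0.0, 0.0
error = (a0 + a1 * x) - y
a0 -= alpha * (2 / len(x)) * np.sum(error)
a1 -= alpha * (2 / len(x)) * np.sum(error * x)

# SGD: one update per training sample, visited in shuffled order each epoch.
b0, b1 = 0.0, 0.0
rng = np.random.default_rng(0)
for i in rng.permutation(len(x)):          # shuffle the training set for the epoch
    err_i = (b0 + b1 * x[i]) - y[i]
    b0 -= alpha * 2 * err_i
    b1 -= alpha * 2 * err_i * x[i]

print("batch step:", round(a0, 4), round(a1, 4), "| sgd epoch:", round(b0, 4), round(b1, 4))
```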



Batch Gradient Descent and Stochastic
Gradient Descent
• Batch gradient descent and stochastic gradient descent are both optimization algorithms used to
minimize the cost function in machine learning models, such as linear regression and neural
networks. The main differences between the two are:
• Data Processing Approach:
Batch gradient descent computes the gradient of the cost function with respect to the model
parameters using the entire training dataset in each iteration. Stochastic gradient descent, on the
other hand, computes the gradient using only a single training example or a small subset of examples
in each iteration.
• Convergence Speed:
Batch gradient descent takes longer to converge since it computes the gradient over the entire training dataset in each iteration. Stochastic gradient descent updates the model parameters after processing each example, which can lead to faster convergence.



• Convergence Accuracy:
Batch gradient descent is more accurate since it computes the gradient using the entire training dataset.
Stochastic gradient descent, on the other hand, can be less accurate since it computes the gradient using
a subset of examples, which can introduce more noise and variance in the gradient estimate.
• Computation and Memory Requirements:
Batch gradient descent requires more computation and memory since it needs to process the entire
training dataset in each iteration. Stochastic gradient descent, on the other hand, requires less
computation and memory since it only needs to process a single example or a small subset of examples
in each iteration.
• Optimization of Non-Convex Functions:
Stochastic gradient descent is more suitable for optimizing non-convex functions since it can escape
local minima and find the global minimum. Batch gradient descent, on the other hand, can get stuck in
local minima.



Batch Gradient Descent vs. Stochastic Gradient Descent

• Batch GD computes the gradient using the whole training set; SGD computes the gradient using a single training sample.
• Batch GD is a slow and computationally expensive algorithm; SGD is faster and less computationally expensive than Batch GD.
• Batch GD is not suggested for huge training samples; SGD can be used for large training samples.
• Batch GD is deterministic in nature; SGD is stochastic in nature.
• Batch GD gives the optimal solution, given sufficient time to converge; SGD gives a good solution, but not necessarily the optimal one.
• Batch GD requires no random shuffling of points; for SGD the data samples should be in a random order, which is why we shuffle the training set for every epoch.
• Batch GD can't escape shallow local minima easily; SGD can escape shallow local minima more easily.
• Batch GD converges slowly; SGD reaches convergence much faster.
• Batch GD updates the model parameters only after processing the entire training set; SGD updates the parameters after each data point.
• In Batch GD the learning rate is fixed during training; in SGD the learning rate can be adjusted dynamically.
• Batch GD may suffer from overfitting if the model is too complex for the dataset; SGD can help reduce overfitting by updating the model parameters more frequently.

