Lab 1
AY: 2024-25
Experiment 1
(Regression)
Aim: Implement Linear Regression on the given Dataset and apply Regularization to overcome overfitting
in the model.
Theory:
Linear Regression: Linear regression is a simple statistical method used for predictive analysis; it models the relationship between continuous variables. It shows the linear relationship between the independent variable (X-axis) and the dependent variable (Y-axis), which is why it is called linear regression. If there is a single input variable (x), it is called simple linear regression; if there is more than one input variable, it is called multiple linear regression. The linear regression model gives a sloped straight line describing the relationship between the variables.
Plotting the data makes the linear relationship between the dependent and independent variables visible: as the value of x (the independent variable) increases, the value of y (the dependent variable) increases as well. The line fitted through the points is referred to as the best-fit line. Based on the given data points, we try to plot the line that models the points best. To calculate the best-fit line, linear regression uses the traditional slope-intercept form y = a0 + a1*x, where a0 is the intercept and a1 is the slope.
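As a minimal, illustrative sketch of the slope-intercept fit (the data points here are made up, not from the lab datasets):

import numpy as np

# Hypothetical toy data: y grows roughly linearly with x
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])

# np.polyfit with degree 1 returns [slope a1, intercept a0]
a1, a0 = np.polyfit(x, y, 1)
print(f"best-fit line: y = {a0:.2f} + {a1:.2f}x")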
Cost function: The cost function helps figure out the best possible values for a0 and a1, which give the best-fit line for the data points. The cost function optimizes the regression coefficients (weights) and measures how well a linear regression model is performing; it quantifies the accuracy of the mapping function that maps the input variable to the output variable. This mapping function is also known as the hypothesis function. In linear regression, the Mean Squared Error (MSE) cost function is used, which is the average of the squared errors between the predicted and actual values. For the hypothesis ŷi = a0 + a1*xi, the MSE is

MSE = (1/n) Σ (yi - ŷi)²

where yi are the actual values and ŷi are the predicted values. Using the MSE function, we change the values of a0 and a1 such that the MSE settles at its minimum. The model parameters a0 (intercept) and a1 (slope) can be manipulated to minimize the cost function; they are determined using the gradient descent method so that the cost function value is minimal.
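A minimal sketch of the MSE computation (the function name and toy values are illustrative):

import numpy as np

def mse(y_true, y_pred):
    # average of squared differences between actual and predicted values
    return np.mean((y_true - y_pred) ** 2)

# e.g., for the hypothesis ŷ = a0 + a1*x with guessed parameters
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 4.1, 5.9])
a0, a1 = 0.1, 1.95
print(mse(y, a0 + a1 * x))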
Gradient descent: Gradient descent is a method of updating a0 and a1 to minimize the cost function (MSE). A regression model uses gradient descent to update the coefficients of the line: it starts from randomly chosen coefficient values and then iteratively updates them so as to reduce the cost function until the minimum is reached.
To update a0 and a1, we take gradients from the cost function, i.e., its partial derivatives with respect to a0 and a1:

∂MSE/∂a0 = -(2/n) Σ (yi - ŷi)
∂MSE/∂a1 = -(2/n) Σ (yi - ŷi) * xi

Each iteration then moves the parameters against the gradient: a0 := a0 - α * ∂MSE/∂a0 and a1 := a1 - α * ∂MSE/∂a1, where α is the learning rate.
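A from-scratch sketch of these updates (the learning rate and iteration count are arbitrary illustrative choices):

import numpy as np

def gradient_descent(x, y, alpha=0.01, iters=1000):
    a0, a1 = 0.0, 0.0                  # initial guesses for intercept and slope
    n = len(x)
    for _ in range(iters):
        y_hat = a0 + a1 * x            # current predictions
        error = y - y_hat
        grad_a0 = -(2 / n) * np.sum(error)       # ∂MSE/∂a0
        grad_a1 = -(2 / n) * np.sum(error * x)   # ∂MSE/∂a1
        a0 -= alpha * grad_a0
        a1 -= alpha * grad_a1
    return a0, a1

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])
print(gradient_descent(x, y))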
Regularization: When linear regression is underfitting, there is no other way (given you cannot add more data) than to increase the complexity of the model, either by making it polynomial regression (quadratic, cubic, etc.) or by using another, more complex model to capture structure that linear regression cannot capture due to its simplicity. When linear regression is overfitting, for example when the number of columns (independent variables) approaches the number of observations, there are two ways to mitigate it:
1. Add more observations
2. Regularization
Since adding more observations is time consuming and often not an option, we will use a regularization technique to mitigate overfitting. There are multiple regularization techniques; the two used in this experiment are Lasso (L1) and Ridge (L2) regression.
Lasso Regression
The word “LASSO” denotes Least Absolute Shrinkage and Selection Operator. Lasso regression uses a regularization technique to create its predictions and is often preferred over other regression methods because it gives accurate predictions. The Lasso regression model uses a shrinkage technique: the coefficient values are shrunk towards a central point, similar to the concept of a mean. The Lasso algorithm yields simple, sparse models (i.e., models with fewer parameters), which is well suited to data showing high levels of multicollinearity, or when we would like to automate certain parts of model selection, such as variable selection or parameter elimination. Lasso regression uses the L1 regularization technique, adding the sum of the absolute values of the coefficients as a penalty to the cost function:

Cost = MSE + λ Σ |aj|

It is considered when there are many features, because the L1 penalty can drive coefficients exactly to zero and therefore automatically performs feature selection.
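A minimal sklearn sketch of Lasso (the data is synthetic and alpha = 0.1 is an untuned, illustrative value):

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.metrics import mean_squared_error

# Hypothetical data: 100 samples, 5 features, two of which are irrelevant
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.5, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.1)           # alpha is the L1 penalty strength (λ)
lasso.fit(X, y)
print(lasso.coef_)                 # some coefficients shrink exactly to zero
print(mean_squared_error(y, lasso.predict(X)))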
Ridge Regression
Ridge Regression is another type of regression algorithm and is usually considered when there is a high correlation between the independent variables (multicollinearity). As the correlation increases, the least squares estimates remain unbiased, but their variances grow large, making them unreliable. Therefore, when collinearity in the dataset is very high, Ridge Regression deliberately introduces a small amount of bias by adding a bias matrix (λI) to the normal equation. It is a useful regression method in which the model is less susceptible to overfitting, and hence it works well even if the dataset is very small.
Ridge regression uses the L2 regularization technique, adding the sum of the squared coefficients as a penalty to the cost function:

Cost = MSE + λ Σ aj²

where λ is the penalty parameter; it is denoted by the alpha parameter in the ridge function. Hence, by changing the value of alpha, we control the penalty term: the greater the value of alpha, the higher the penalty, and the more the magnitude of the coefficients is reduced. We can conclude that Ridge shrinks the parameters; it is therefore used to counter multicollinearity, and it also reduces model complexity by shrinking the coefficients.
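The corresponding sklearn sketch for Ridge (again with synthetic data and an untuned alpha; the second column is made deliberately collinear with the first):

import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
X[:, 1] = X[:, 0] + rng.normal(scale=0.01, size=100)   # near-duplicate column
y = X @ np.array([1.0, 1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0)           # alpha is the L2 penalty strength (λ)
ridge.fit(X, y)
print(ridge.coef_)                 # coefficients are shrunk, but not zeroed
print(mean_squared_error(y, ridge.predict(X)))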
Dataset 1: Simulate a sine curve between 60° and 300° with some random noise.
Dataset 2: food_truck_data.csv
1. Perform linear regression on Dataset 1 by computing the cost function and gradient descent from scratch.
2. Use sklearn to perform Linear, Lasso, and Ridge regression on Dataset 2; show the scatter plot with the best-fit line using matplotlib and report the results using MSE.
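One possible way to simulate Dataset 1 (the number of points and the noise scale are assumptions; the handout does not specify them):

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
angles_deg = np.linspace(60, 300, 100)                 # 60° to 300°
x = np.radians(angles_deg)                             # radians for np.sin
y = np.sin(x) + rng.normal(scale=0.15, size=x.shape)   # sine curve + noise

plt.scatter(x, y, s=10)
plt.xlabel("x (radians)")
plt.ylabel("y")
plt.title("Dataset 1: noisy sine curve")
plt.show()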
Writeups: