
Department of Computer Science and Engineering (Data Science)

Subject: Machine Learning – I (DJS23DSC402)

AY: 2024-25

Experiment 1

(Regression)

Aim: Implement Linear Regression on the given Dataset and apply Regularization to overcome overfitting
in the model.

Theory:

• Linear Regression: Linear regression is a simple statistical regression method used for
predictive analysis that models the relationship between continuous variables. It assumes a
linear relationship between the independent variable (X-axis) and the dependent variable
(Y-axis), which is why it is called linear regression. If there is a single input variable (x),
it is called simple linear regression; if there is more than one input variable, it is called
multiple linear regression. The linear regression model produces a sloped straight line
describing the relationship between the variables.

A scatter plot of such data shows the linear relationship between the dependent and
independent variables: as the value of x (the independent variable) increases, the value of y
(the dependent variable) increases as well. The red line in such a plot is referred to as the
best-fit line. Given the data points, we try to plot the line that models them best. To
calculate the best-fit line, linear regression uses the traditional slope-intercept form:


y = a0 + a1*x

where y = dependent variable; x = independent variable; a0 = intercept; a1 = linear regression coefficient (slope).

• Cost function: The cost function helps to figure out the best possible values for a0 and a1,
i.e. those that give the best-fit line for the data points. The cost function is used to
optimize the regression coefficients (weights) and to measure how well a linear regression
model is performing: it measures the accuracy of the mapping function that maps the input
variable to the output variable. This mapping function is also known as the hypothesis
function. In linear regression, the Mean Squared Error (MSE) cost function is used, which is
the average of the squared errors between the predicted values and the actual values. With yi
the actual values and ŷi = a0 + a1*xi the predicted values, MSE is calculated as:

MSE = (1/n) * Σ (yi − ŷi)²

Using the MSE function, we change the values of a0 and a1 so that the MSE value settles
at its minimum. The model parameters a0 (intercept) and a1 (slope) can be manipulated to
minimize the cost function; they are determined using the gradient descent method so that the
cost function value is minimum.
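
As a minimal sketch of this computation (the toy data and the choice of a0, a1 below are illustrative, not from the lab):

import numpy as np

def mse(y_actual, y_pred):
    # average of squared differences between actual and predicted values
    return np.mean((y_actual - y_pred) ** 2)

# toy example: cost of the line y = a0 + a1*x for one choice of (a0, a1)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])
a0, a1 = 0.0, 2.0
print(mse(y, a0 + a1 * x))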

• Gradient descent: Gradient descent is a method of updating a0 and a1 to minimize the cost
function (MSE). A regression model uses gradient descent to update the coefficients of the
line (a0, a1): it starts from a random selection of coefficient values and then iteratively
updates them to reach the minimum of the cost function.


To update a0 and a1, we take gradients from the cost function, i.e. the partial derivatives
with respect to a0 and a1:

∂MSE/∂a0 = (−2/n) * Σ (yi − ŷi)
∂MSE/∂a1 = (−2/n) * Σ (yi − ŷi) * xi

and update both parameters with a learning rate α:

a0 := a0 − α * ∂MSE/∂a0
a1 := a1 − α * ∂MSE/∂a1
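
A minimal from-scratch sketch of this update loop, assuming NumPy; the learning rate and iteration count are illustrative choices, not values from the handout:

import numpy as np

def gradient_descent(x, y, learning_rate=0.01, n_iters=1000):
    # start from arbitrary coefficient values
    a0, a1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        y_pred = a0 + a1 * x                      # current predictions
        error = y - y_pred
        grad_a0 = (-2.0 / n) * np.sum(error)      # ∂MSE/∂a0
        grad_a1 = (-2.0 / n) * np.sum(error * x)  # ∂MSE/∂a1
        a0 -= learning_rate * grad_a0             # step against the gradient
        a1 -= learning_rate * grad_a1
    return a0, a1

Each iteration moves (a0, a1) a small step in the direction that decreases the MSE; with a suitably small learning rate the coefficients settle near the least-squares solution.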

• Regularization: When linear regression is underfitting, there is no other way (given you
cannot add more data) than to increase the complexity of the model, making it polynomial
regression (quadratic, cubic, etc.) or using another, more complex model to capture patterns
that linear regression cannot capture due to its simplicity. When linear regression is
overfitting, typically because the number of columns (independent variables) approaches the
number of observations, there are two ways to mitigate it:
1. Add more observations
2. Regularization
Since adding more observations is time-consuming and often not possible, we will use
regularization to mitigate overfitting. There are multiple regularization techniques; all
share the same concept of adding constraints on the weights of the independent variables
(except theta_0), but they differ in the way of constraining. We will go through the two most
popular regularization techniques: Ridge regression (L2) and Lasso regression (L1).
• Lasso Regression

The word "LASSO" stands for Least Absolute Shrinkage and Selection Operator. Lasso regression
applies a regularization technique to create predictions, using a shrinkage technique: the
coefficient values are shrunk towards a central point, similar to the concept of a mean. The
lasso algorithm yields simple, sparse models (i.e. models with fewer parameters), which makes
it well-suited to data showing high levels of multicollinearity, or when we would like to
automate certain parts of model selection, such as variable selection or parameter
elimination. Lasso regression uses the L1 regularization technique. It is preferred when
there is a large number of features, because it automatically performs feature selection.

The lasso cost function adds the penalty to the least-squares objective:

Residual Sum of Squares + λ * (sum of the absolute values of the coefficients)

Cost = Σ (yi − ŷi)² + λ * Σ |aj|
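
A minimal sklearn sketch of lasso fitting (the alpha value and toy data are illustrative assumptions):

import numpy as np
from sklearn.linear_model import Lasso

X = np.array([[1.0], [2.0], [3.0], [4.0]])   # toy feature matrix
y = np.array([2.1, 3.9, 6.2, 8.1])

lasso = Lasso(alpha=0.1)               # alpha plays the role of λ
lasso.fit(X, y)
print(lasso.intercept_, lasso.coef_)   # coefficients; some may be exactly zero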

• Ridge Regression

Ridge regression is another regression algorithm, usually considered when there is high
correlation between the independent variables. Under such multicollinearity, the least-squares
estimates remain unbiased, but their variance grows very large, making them unreliable. Ridge
regression therefore deliberately introduces a small amount of bias, via a penalty term in its
equation, in exchange for a large reduction in variance. It is a useful regression method in
which the model is less susceptible to overfitting, and hence the model works well even if the
dataset is very small.

The cost function for the ridge regression algorithm adds an L2 penalty to the least-squares objective:

Cost = Σ (yi − ŷi)² + λ * Σ aj²

where λ is the penalty variable; λ is denoted by the alpha parameter in the ridge function.
Hence, by changing the value of alpha, we control the penalty term: the greater the value of
alpha, the higher the penalty, and the more the magnitude of the coefficients is reduced. We
can conclude that ridge regression shrinks the parameters. It is therefore used to counter
multicollinearity, and it also reduces model complexity by shrinking the coefficients.
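
A matching sklearn sketch for ridge (again, the alpha value and toy data are illustrative):

import numpy as np
from sklearn.linear_model import Ridge

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([2.1, 3.9, 6.2, 8.1])

ridge = Ridge(alpha=1.0)               # larger alpha => stronger shrinkage
ridge.fit(X, y)
print(ridge.intercept_, ridge.coef_)   # coefficients shrunk, but typically non-zero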


Lab Assignments to complete in this session

Use the given datasets and perform the following tasks:

Dataset 1: Simulate a sine curve between 60° and 300° with some random noise.

Dataset 2: food_truck_data.csv

1. Perform Linear Regression on Dataset 1 by implementing the cost function and gradient descent from scratch.

2. Use sklearn to perform Linear Regression, Lasso, and Ridge on Dataset 2; show the scatter plot with the
best-fit line using matplotlib and report the results using MSE.
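
A starting-point sketch for both tasks, under stated assumptions: the noise level for Dataset 1 and the column layout of food_truck_data.csv (first column = feature, second = target) are guesses to be adjusted against the actual file:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression, Lasso, Ridge
from sklearn.metrics import mean_squared_error

# Dataset 1: sine curve between 60° and 300° with random noise
angles = np.linspace(60, 300, 100)
x1 = np.radians(angles)
y1 = np.sin(x1) + np.random.normal(0, 0.15, size=x1.shape)

# Dataset 2: assumed two-column layout (feature, target)
df = pd.read_csv("food_truck_data.csv")
X = df.iloc[:, [0]].values
y = df.iloc[:, 1].values

for name, model in [("Linear", LinearRegression()),
                    ("Lasso", Lasso(alpha=0.1)),
                    ("Ridge", Ridge(alpha=1.0))]:
    model.fit(X, y)
    print(name, "MSE:", mean_squared_error(y, model.predict(X)))

# scatter plot with the best-fit line from plain linear regression
plt.scatter(X, y, label="data")
plt.plot(X, LinearRegression().fit(X, y).predict(X), color="red", label="best fit")
plt.legend()
plt.show()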

Writeups:

1. Write the pseudo code of Linear Regression from scratch.
