
19CSE305 – MACHINE LEARNING

CHEDURI SURYA UMA SHANKAR


CH.EN.U4CSE19101

TITLE: Python lab exercise to implement linear regression

AIM: To find the best-fit line for the given data using linear
regression

DATASET:

Auto Insurance in Sweden

In the following data:

X = number of claims
Y = total payment for all the claims, in thousands of Swedish Kronor,
for geographical zones in Sweden
X Y
108 392
19 46
13 1
124 422
40 114
57 170
23 5
14 77
45 214
10 65
5 20
48 248
11 23
23 39
7 48
2 6
24 134
6 50
3 4
23 113
6 14
9 48
9 52
3 13
29 103
7 77
4 11
20 98
7 27
4 38
0 0
25 69
6 14
5 40
22 161
11 57
61 217
12 58
4 12
16 59
13 89
60 202
41 181
37 152
55 162
41 73
11 21
27 92
8 76
3 39
17 142
13 93
13 31
15 32
8 55
29 133
30 194
24 137
9 87
31 20
14 95
53 244
26 187
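
The implementation below reads this data from a CSV file named
ml_lab_1.csv. As a sketch of the assumed file layout (a plain
two-column CSV with an "X,Y" header row, which the iloc-based loading
below relies on), the table above could be written out like this; the
lists are truncated to the first ten pairs for brevity:

# Hypothetical helper, not part of the lab code: writes the table
# above to ml_lab_1.csv with an "X,Y" header row.
import pandas as pd

x = [108, 19, 13, 124, 40, 57, 23, 14, 45, 10]   # first ten X values from the table
y = [392, 46, 1, 422, 114, 170, 5, 77, 214, 65]  # corresponding Y values
pd.DataFrame({'X': x, 'Y': y}).to_csv('ml_lab_1.csv', index=False)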

IMPLEMENTATION DETAILS:
In statistics, linear regression is a linear approach to modelling the
relationship between a dependent variable and one or more
independent variables. Let X be the independent variable and Y be the
dependent variable. We will define a linear relationship between these
two variables as follows:

Y = mX + c

Our objective is to determine the values of m and c such that the line
corresponding to those values is the best-fitting line, i.e. the one
that gives the minimum error.

Loss Function:
The loss is the error in our predicted values of m and c. We will use
the Mean Squared Error function to calculate the loss. There are three
steps in this function:
• Find the difference between the actual y value and the predicted
y value (ȳ = mx + c) for a given x.
• Square this difference.
• Find the mean of these squares over every value in X.

E = (1/n) Σᵢ (yᵢ − ȳᵢ)²

Here yᵢ is the actual value and ȳᵢ is the predicted value. Substituting
the value of ȳᵢ = mxᵢ + c:

E = (1/n) Σᵢ (yᵢ − (mxᵢ + c))²

So we square the error and find the mean, hence the name Mean
Squared Error.
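
As a quick illustration, this loss can be computed directly with NumPy.
The sketch below assumes X and Y are NumPy arrays (or pandas Series)
of equal length; mse is a hypothetical helper, not part of the lab code:

import numpy as np

def mse(X, Y, m, c):
    # Mean Squared Error of the line y = m*x + c over the data (X, Y)
    Y_pred = m * X + c                 # predicted values ȳᵢ = m*xᵢ + c
    return np.mean((Y - Y_pred) ** 2)  # mean of the squared differences

# Example: a perfect fit gives zero loss
print(mse(np.array([1, 2, 3]), np.array([2, 4, 6]), 2, 0))  # 0.0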

The Gradient Descent Algorithm:

Gradient descent is an iterative optimization algorithm for finding the
minimum of a function. Here, that function is our Loss Function.

Now, apply gradient descent to m and c and approach the optimum step
by step:

• Initially let m = 0 and c = 0. Let L be our learning rate. This
controls how much the values of m and c change with each step. L
could be a small value like 0.0001 for good accuracy.
• Calculate the partial derivative of the loss function with respect
to m, and plug the current values of x, y, m and c into it to
obtain the derivative value Dₘ:

Dₘ = (−2/n) Σᵢ xᵢ (yᵢ − ȳᵢ)

Dₘ is the value of the partial derivative with respect to m.
Similarly, the partial derivative with respect to c, Dc, is:

Dc = (−2/n) Σᵢ (yᵢ − ȳᵢ)

• Now we update the current values of m and c using the following
equations:

m = m − L × Dₘ
c = c − L × Dc

• We repeat this process until our loss function is a very small
value or ideally 0 (which means 0 error or 100% accuracy). The
values of m and c that we are left with then are the optimum
values.
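
Putting the derivative formulas and the update rule together, a single
gradient-descent step can be written as a small helper function. This
is a sketch; gradient_step is a hypothetical name and is not part of
the lab code below:

import numpy as np

def gradient_step(X, Y, m, c, L):
    # Perform one gradient-descent update of (m, c) with learning rate L
    n = len(X)
    Y_pred = m * X + c                         # current predictions
    D_m = (-2 / n) * np.sum(X * (Y - Y_pred))  # partial derivative wrt m
    D_c = (-2 / n) * np.sum(Y - Y_pred)        # partial derivative wrt c
    return m - L * D_m, c - L * D_c            # updated (m, c)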

CODE:
# Making the imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = (12.0, 9.0)  # plot size

# Preprocessing input data
data = pd.read_csv('ml_lab_1.csv')
X = data.iloc[:, 0]  # get all rows from column 0 (number of claims)
Y = data.iloc[:, 1]  # get all rows from column 1 (total payment)
plt.scatter(X, Y)    # draw a scatter plot of the raw data
plt.show()           # display the figure

# Building the model
m = 0         # m is the slope of the line
c = 0         # c is the y-intercept
L = 0.0001    # the learning rate
iters = 1000  # the number of iterations to perform gradient descent

n = float(len(X))  # number of elements in X

# Performing gradient descent
for i in range(iters):
    Y_pred = m * X + c                      # current predicted value of Y
    D_m = (-2 / n) * sum(X * (Y - Y_pred))  # derivative wrt m
    D_c = (-2 / n) * sum(Y - Y_pred)        # derivative wrt c
    m = m - L * D_m                         # update m
    c = c - L * D_c                         # update c

print(m, c)

# Making predictions
Y_pred = m * X + c

plt.scatter(X, Y)
plt.plot([min(X), max(X)], [min(Y_pred), max(Y_pred)],
         color='green')  # predicted (best-fit) line
plt.show()
RESULT:
m = 3.70478986556524, c = 1.6365274116383624
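
As a sanity check (not part of the original exercise), the
gradient-descent result can be compared against the closed-form
least-squares fit, here via numpy.polyfit, assuming X and Y are the
arrays loaded above:

import numpy as np

slope, intercept = np.polyfit(X, Y, 1)  # closed-form degree-1 fit
print(slope, intercept)  # should be close to the m and c found above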

OBSERVATIONS:
• When the learning rate is set to 0.0001, we get a line with
minimum error, and there are 13 points very close to the line.
• When the learning rate is set to 0.001, the fit is slightly worse:
the points lie a little further from the fitted line.
• When the learning rate is set to 0.00001, we again get a line with
minimum error and 13 points very close to it, almost identical to
the result obtained when L = 0.0001.
• When the learning rate is 0.1, we do not get any line: the updates
to m and c grow without bound and gradient descent diverges.

Hence, we can say 0.0001 is the ideal learning rate for fitting this
data.
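
These observations can be reproduced with a small sweep over the
learning rates tried above. This is a sketch, assuming X, Y and n are
already defined as in the code section; the finiteness check is an
added guard, since large rates make the updates overflow:

import numpy as np

for L in [0.1, 0.001, 0.0001, 0.00001]:
    m, c = 0.0, 0.0
    for _ in range(1000):
        Y_pred = m * X + c
        m -= L * (-2 / n) * np.sum(X * (Y - Y_pred))  # gradient step for m
        c -= L * (-2 / n) * np.sum(Y - Y_pred)        # gradient step for c
        if not (np.isfinite(m) and np.isfinite(c)):
            break  # the updates blew up: gradient descent diverged
    loss = np.mean((Y - (m * X + c)) ** 2)
    print(f"L={L}: m={m}, c={c}, MSE={loss}")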
