
2 Linear Regression

This document describes implementing a linear regression model in Python to predict salary based on years of experience. It loads salary and experience data, calculates the slope and intercept of the best fit line using the linear regression formula, makes predictions on test data, and plots the results. Linear regression finds the best fit straight line to model the relationship between one continuous dependent variable (salary) and one continuous independent variable (years of experience). It is well suited for this problem since the goal is to predict a numeric output (salary) based on a single numeric input (years of experience) and the relationship can be reasonably approximated with a straight line.

Uploaded by

Rushabh Vashikar
Copyright
© All Rights Reserved

EXPERIMENT NUMBER: 02

Date of Performance:

Date of Submission:

Aim: Python Implementation of Linear Regression

Software: Python

Software Platform: Anaconda, Google Colab, Visual Studio

Lab Outcome: LO1: Implement various Machine learning models

Theory: In simple linear regression, there is one independent variable and one dependent
variable. The model estimates the slope and intercept of the line of best fit, which represents the
relationship between the variables as a straight line Y = B0 + B1*X, where B0 is the intercept and B1 is the slope.

The slope represents the change in the dependent variable for each unit change in the independent
variable, while the intercept represents the predicted value of the dependent variable when the
independent variable is zero.

The goal of the linear regression algorithm is to find the values of B0 and B1 that give the best
fit line. The best fit line is the line with the least error, that is, the line that minimizes the sum of
squared differences between the predicted values and the actual values.
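To make "least error" concrete, the following minimal sketch measures the error of a candidate line as the sum of squared errors (SSE) and checks that moving away from the least-squares coefficients increases it. The helper names `sse` and `least_squares` are hypothetical and chosen only for illustration; the five data points are the rows shown by data.head() in the Methodology section.

```python
# Illustrative sketch: sum of squared errors (SSE) for a line y = b0 + b1*x.
# The five points are the rows displayed by data.head() in this experiment.
xs = [1.1, 1.3, 1.5, 2.0, 2.2]
ys = [39343.0, 46205.0, 37731.0, 43525.0, 39891.0]


def sse(b0, b1, xs, ys):
    """Sum of squared differences between actual and predicted values."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


def least_squares(xs, ys):
    """Closed-form intercept b0 and slope b1 that minimise the SSE."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1


b0, b1 = least_squares(xs, ys)
best = sse(b0, b1, xs, ys)

# Shifting either coefficient away from the least-squares values raises the error.
shifted_intercept = sse(b0 + 500.0, b1, xs, ys)
shifted_slope = sse(b0, b1 + 500.0, xs, ys)
print(best, shifted_intercept, shifted_slope)
```

Only five points are used here to keep the sketch short, so its coefficients differ from those obtained on the full 30-row dataset.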
Methodology:
# Python (Jupyter) program for linear regression
import pandas as p
from sklearn.linear_model import LinearRegression
import numpy as n
import matplotlib.pyplot as mtp

data = p.read_csv("salary_Data.csv")

data.head()

   YearsExperience   Salary
0              1.1  39343.0
1              1.3  46205.0
2              1.5  37731.0
3              2.0  43525.0
4              2.2  39891.0

X = data.iloc[:,0]
X

0 1.1
1 1.3
2 1.5
3 2.0
4 2.2
5 2.9
6 3.0
7 3.2
8 3.2
9 3.7
10 3.9
11 4.0
12 4.0
13 4.1
14 4.5
15 4.9
16 5.1
17 5.3
18 5.9
19 6.0
20 6.8
21 7.1
22 7.9
23 8.2
24 8.7
25 9.0
26 9.5
27 9.6
28 10.3
29 10.5
Name: YearsExperience, dtype: float64

y= data.iloc[:,1]

y
0 39343.0
1 46205.0
2 37731.0
3 43525.0
4 39891.0
5 56642.0
6 60150.0
7 54445.0
8 64445.0
9 57189.0
10 63218.0
11 55794.0
12 56957.0
13 57081.0
14 61111.0
15 67938.0
16 66029.0
17 83088.0
18 81363.0
19 93940.0
20 91738.0
21 98273.0
22 101302.0
23 113812.0
24 109431.0
25 105582.0
26 116969.0
27 112635.0
28 122391.0
29 121872.0
Name: Salary, dtype: float64

xy = X*y

x_squared = X**2

n = len(data)  # number of observations; this rebinds the name n, which was the NumPy alias
n

30

sum_y = sum(y)
sum_y

2280090.0

sum_x = sum(X)
sum_x

159.4

sum_x_squared = sum(x_squared)

sum_xy = sum(xy)
a = ((sum_y*sum_x_squared)-(sum_x*sum_xy))/((n*sum_x_squared)-sum_x**2)
a

25792.20019866868

b = ((n*sum_xy)-(sum_x*sum_y))/((n*sum_x_squared)-sum_x**2)
b

9449.962321455077
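The sums-based formulas above give the intercept a (B0 in the theory) and the slope b (B1). As a cross-check, the sketch below wraps them in a function and compares them with the algebraically equivalent mean-centred form, b = Sxy / Sxx and a = mean(y) - b * mean(x). The function names `fit_by_sums` and `fit_by_means` are hypothetical, and the data are typed in from the X and y listings above instead of being read from salary_Data.csv.

```python
import math

# Years of experience and salary, copied from the X and y listings above.
X = [1.1, 1.3, 1.5, 2.0, 2.2, 2.9, 3.0, 3.2, 3.2, 3.7,
     3.9, 4.0, 4.0, 4.1, 4.5, 4.9, 5.1, 5.3, 5.9, 6.0,
     6.8, 7.1, 7.9, 8.2, 8.7, 9.0, 9.5, 9.6, 10.3, 10.5]
y = [39343.0, 46205.0, 37731.0, 43525.0, 39891.0, 56642.0,
     60150.0, 54445.0, 64445.0, 57189.0, 63218.0, 55794.0,
     56957.0, 57081.0, 61111.0, 67938.0, 66029.0, 83088.0,
     81363.0, 93940.0, 91738.0, 98273.0, 101302.0, 113812.0,
     109431.0, 105582.0, 116969.0, 112635.0, 122391.0, 121872.0]


def fit_by_sums(xs, ys):
    """Intercept a and slope b from raw sums, as in the listing above."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * v for x, v in zip(xs, ys))
    sum_x_squared = sum(x * x for x in xs)
    denom = n * sum_x_squared - sum_x ** 2
    a = (sum_y * sum_x_squared - sum_x * sum_xy) / denom
    b = (n * sum_xy - sum_x * sum_y) / denom
    return a, b


def fit_by_means(xs, ys):
    """Same line, computed from deviations about the means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    b = s_xy / s_xx
    a = mean_y - b * mean_x
    return a, b


a_sums, b_sums = fit_by_sums(X, y)
a_means, b_means = fit_by_means(X, y)
print(a_sums, b_sums)
print(a_means, b_means)
```

Both forms should agree with the values of a and b printed above up to floating-point rounding. The mean-centred form avoids subtracting two large, nearly equal sums, which tends to matter more on larger datasets.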

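# Predicted salary for 1.7 years of experience.
# The result is not assigned to y, so y still holds the Salary column used later.
a + b*1.7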

41857.136145142314

x_test = [1.7,2.5,6.5,1,2.2]
y_pred = []
for x in x_test:
    y_pred.append(a + b * x)

y_pred

[41857.136145142314,
49417.10600230638,
87216.95528812669,
35242.16252012376,
46582.11730586985]

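l = LinearRegression()
# scikit-learn expects a 2-D feature array, so reshape X into a single column.
# The Series method is used because the name n was rebound to len(data) earlier.
x_reshaped = X.to_numpy().reshape(-1, 1)
print(l.fit(x_reshaped, y))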

LinearRegression()

Output: Python Program with output

prediction = l.predict(x_reshaped)
prediction

array([ 36187.15875227, 38077.15121656, 39967.14368085, 44692.12484158,
46582.11730587, 53197.09093089, 54142.08716303, 56032.07962732,
56032.07962732, 60757.06078805, 62647.05325234, 63592.04948449,
63592.04948449, 64537.04571663, 68317.03064522, 72097.0155738 ,
73987.00803809, 75877.00050238, 81546.97789525, 82491.9741274 ,
90051.94398456, 92886.932681 , 100446.90253816, 103281.8912346 ,
108006.87239533, 110841.86109176, 115566.84225249, 116511.83848464,
123126.81210966, 125016.80457395])
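The library predictions can be compared with the hand-computed line. The short sketch below takes the values of a and b printed earlier and recomputes a + b*x for three experience values, then compares each result with the corresponding entry of the prediction array above. The list name `checks` and the tolerance are illustrative choices, not part of the original program.

```python
import math

# Intercept and slope as printed for a and b earlier in this experiment.
a = 25792.20019866868
b = 9449.962321455077

# (years of experience, value printed in the scikit-learn prediction array)
checks = [
    (1.1, 36187.15875227),
    (2.2, 46582.11730587),
    (10.5, 125016.80457395),
]

max_gap = 0.0
for x, library_value in checks:
    manual_value = a + b * x  # prediction from the closed-form line
    max_gap = max(max_gap, abs(manual_value - library_value))
    print(x, manual_value, library_value)

# The printed array is rounded to 8 decimals, so a small tolerance is allowed.
print(math.isclose(max_gap, 0.0, abs_tol=1e-6))
```

Agreement to within display rounding indicates that LinearRegression fitted the same least-squares line as the manual formulas.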

mtp.plot(x_reshaped,prediction)
mtp.scatter(x_test,y_pred,color="red")
mtp.scatter(X,y,color="orange")

<matplotlib.collections.PathCollection at 0x1fdb7275340>
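Conclusion: Linear regression was implemented in Python, both from the closed-form formulas for the intercept and slope and with scikit-learn's LinearRegression, and the two approaches gave matching predictions.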

Compare with simple linear regression and justify for the given problem statement which one is more suitable
Marks Obtained and Signature:-

R1          R2          R3          R4          R5          Signature
(2 Marks)   (3 Marks)   (4 Marks)   (4 Marks)   (2 Marks)
