Implementation of Simple Linear Regression Algorithm Using Python

Uploaded by ayushisahoo2004

4/4/24, 11:12 PM Untitled - Jupyter Notebook

Implementation of Simple Linear Regression Algorithm using Python

Step-1: Data Pre-processing


First, we will import three important libraries, which help us load the dataset, plot graphs, and create the Simple Linear Regression model.

In [1]: import numpy as np
        import pandas as pd
        import matplotlib.pyplot as plt

Load the dataset


In [2]: Thomas_df = pd.read_csv("Iris (1).csv")
Thomas_df

Out[2]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species

0 1 5.1 3.5 1.4 0.2 Iris-setosa

1 2 4.9 3.0 1.4 0.2 Iris-setosa

2 3 4.7 3.2 1.3 0.2 Iris-setosa

3 4 4.6 3.1 1.5 0.2 Iris-setosa

4 5 5.0 3.6 1.4 0.2 Iris-setosa

... ... ... ... ... ... ...

145 146 6.7 3.0 5.2 2.3 Iris-virginica

146 147 6.3 2.5 5.0 1.9 Iris-virginica

147 148 6.5 3.0 5.2 2.0 Iris-virginica

148 149 6.2 3.4 5.4 2.3 Iris-virginica

149 150 5.9 3.0 5.1 1.8 Iris-virginica

150 rows × 6 columns

Data frame columns

localhost:8888/notebooks/ML class/Untitled.ipynb 1/12



In [3]: Thomas_df.columns

Out[3]: Index(['Id', 'SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm',
               'Species'],
              dtype='object')

Data frame head


Definition and usage: the head() method returns a specified number of rows from the top of the DataFrame (five by default).

In [4]: Thomas_df.head()

Out[4]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species

0 1 5.1 3.5 1.4 0.2 Iris-setosa

1 2 4.9 3.0 1.4 0.2 Iris-setosa

2 3 4.7 3.2 1.3 0.2 Iris-setosa

3 4 4.6 3.1 1.5 0.2 Iris-setosa

4 5 5.0 3.6 1.4 0.2 Iris-setosa

Data frame tail


Definition and usage: the tail() method returns a specified number of rows from the bottom of the DataFrame (five by default).

In [5]: Thomas_df.tail()

Out[5]:
Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species

145 146 6.7 3.0 5.2 2.3 Iris-virginica

146 147 6.3 2.5 5.0 1.9 Iris-virginica

147 148 6.5 3.0 5.2 2.0 Iris-virginica

148 149 6.2 3.4 5.4 2.3 Iris-virginica

149 150 5.9 3.0 5.1 1.8 Iris-virginica

Data frame shape

In [6]: Thomas_df.shape

Out[6]: (150, 6)


After that, we need to extract the dependent and independent variables from the dataset. Here the dependent variable (target) y is SepalLengthCm, and the remaining numeric columns form the feature matrix x.
In [7]: columns = Thomas_df.select_dtypes(include=['number']).columns
x = Thomas_df[columns].drop(columns=['Id', 'SepalLengthCm'])
y = Thomas_df['SepalLengthCm']
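The extraction step above can be sketched on a hypothetical two-row frame (the column names mirror the Iris dataset, but the values are made up for illustration): select_dtypes(include=['number']) keeps only the numeric columns, and drop(columns=...) then removes the Id column and the target.

```python
import pandas as pd

# Hypothetical mini-frame mirroring the Iris columns (values are made up)
df = pd.DataFrame({
    "Id": [1, 2],
    "SepalLengthCm": [5.1, 4.9],
    "SepalWidthCm": [3.5, 3.0],
    "Species": ["Iris-setosa", "Iris-setosa"],
})

# Keep numeric columns, then drop the Id and the target column
num_cols = df.select_dtypes(include=["number"]).columns
X = df[num_cols].drop(columns=["Id", "SepalLengthCm"])
y = df["SepalLengthCm"]

features = list(X.columns)  # only SepalWidthCm remains as a feature here
```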

In [8]: x

Out[8]:
SepalWidthCm PetalLengthCm PetalWidthCm

0 3.5 1.4 0.2

1 3.0 1.4 0.2

2 3.2 1.3 0.2

3 3.1 1.5 0.2

4 3.6 1.4 0.2

... ... ... ...

145 3.0 5.2 2.3

146 2.5 5.0 1.9

147 3.0 5.2 2.0

148 3.4 5.4 2.3

149 3.0 5.1 1.8

150 rows × 3 columns

In [9]: y

Out[9]: 0 5.1
1 4.9
2 4.7
3 4.6
4 5.0
...
145 6.7
146 6.3
147 6.5
148 6.2
149 5.9
Name: SepalLengthCm, Length: 150, dtype: float64

Split dataset


In [10]: from sklearn.model_selection import train_test_split
         from sklearn.linear_model import LinearRegression
         from sklearn.metrics import mean_absolute_error

In [11]: x_train,x_test,y_train,y_test=train_test_split(x,y, test_size=1/3,random_state
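As a quick sanity check (a sketch on synthetic data, not the Iris frame itself): with 150 samples and test_size=1/3, train_test_split should leave 100 rows for training and 50 for testing.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 150 synthetic samples with 2 features, mimicking the dataset's size
X = np.arange(300, dtype=float).reshape(150, 2)
y = np.arange(150, dtype=float)

# test_size=1/3 puts ceil(150 / 3) = 50 rows in the test split
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=1/3, random_state=0)
sizes = (len(X_tr), len(X_te))  # (100, 50)
```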

In [12]: x_train

Out[12]:
SepalWidthCm PetalLengthCm PetalWidthCm

69 2.5 3.9 1.1

135 3.0 6.1 2.3

56 3.3 4.7 1.6

80 2.4 3.8 1.1

123 2.7 4.9 1.8

... ... ... ...

9 3.1 1.5 0.1

103 2.9 5.6 1.8

67 2.7 4.1 1.0

117 3.8 6.7 2.2

47 3.2 1.4 0.2

100 rows × 3 columns


In [13]: x_test


Out[13]:
SepalWidthCm PetalLengthCm PetalWidthCm

114 2.8 5.1 2.4

62 2.2 4.0 1.0

33 4.2 1.4 0.2

107 2.9 6.3 1.8

7 3.4 1.5 0.2

100 3.3 6.0 2.5

40 3.5 1.3 0.3

86 3.1 4.7 1.5

76 2.8 4.8 1.4

71 2.8 4.0 1.3

134 2.6 5.6 1.4

51 3.2 4.5 1.5

73 2.8 4.7 1.2

54 2.8 4.6 1.5

63 2.9 4.7 1.4

37 3.1 1.5 0.1

78 2.9 4.5 1.5

90 2.6 4.4 1.2

45 3.0 1.4 0.3

16 3.9 1.3 0.4

121 2.8 4.9 2.0

66 3.0 4.5 1.5

24 3.4 1.9 0.2

8 2.9 1.4 0.2

126 2.8 4.8 1.8

22 3.6 1.0 0.2

44 3.8 1.9 0.4

97 2.9 4.3 1.3

93 2.3 3.3 1.0

26 3.4 1.6 0.4

137 3.1 5.5 1.8

84 3.0 4.5 1.5

27 3.5 1.5 0.2

127 3.0 4.9 1.8

132 2.8 5.6 2.2

59 2.7 3.9 1.4


18 3.8 1.7 0.3

83 2.7 5.1 1.6

61 3.0 4.2 1.5

92 2.6 4.0 1.2

112 3.0 5.5 2.1

2 3.2 1.3 0.2

141 3.1 5.1 2.3

43 3.5 1.6 0.6

10 3.7 1.5 0.2

60 2.0 3.5 1.0

116 3.0 5.5 1.8

144 3.3 5.7 2.5

119 2.2 5.0 1.5

108 2.5 5.8 1.8

In [14]: y_train

Out[14]: 69 5.6
135 7.7
56 6.3
80 5.5
123 6.3
...
9 4.9
103 6.3
67 5.8
117 7.7
47 4.6
Name: SepalLengthCm, Length: 100, dtype: float64


In [15]: y_test

Out[15]: 114 5.8
62 6.0
33 5.5
107 7.3
7 5.0
100 6.3
40 5.0
86 6.7
76 6.8
71 6.1
134 6.1
51 6.4
73 6.1
54 6.5
63 6.1
37 4.9
78 6.0
90 5.5
45 4.8
16 5.4
121 5.6
66 5.6
24 4.8
8 4.4
126 6.2
22 4.6
44 5.1
97 6.2
93 5.0
26 5.0
137 6.4
84 5.4
27 5.2
127 6.1
132 6.4
59 5.2
18 5.7
83 6.0
61 5.9
92 5.8
112 6.8
2 4.7
141 6.9
43 5.0
10 5.4
60 5.0
116 6.5
144 6.7
119 6.0
108 6.7
Name: SepalLengthCm, dtype: float64


Step-2: Fitting the Simple Linear Regression to the Training Set
In [16]: from sklearn.linear_model import LinearRegression

In [17]: model = LinearRegression()
         model.fit(x_train, y_train)

Out[17]: ▾ LinearRegression
LinearRegression()
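After fitting, the learned parameters are available as coef_ (one slope per feature) and intercept_. A minimal sketch on synthetic data where the true relationship is y = 2x + 1 (the data here is illustrative, not from the notebook):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Exact linear data: y = 2*x + 1
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = 2.0 * X.ravel() + 1.0

m = LinearRegression().fit(X, y)
slope = m.coef_[0]        # close to 2.0
intercept = m.intercept_  # close to 1.0
```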

Prediction of test set results


In [18]: y_pred = model.predict(x_test)
x_pred = model.predict(x_train)

In [19]: y_pred

Out[19]: array([5.90763683, 5.64942975, 5.46582046, 7.32357947, 5.03281158,
6.85611663, 4.87129874, 6.41802591, 6.37416826, 5.82257278,
6.86804939, 6.3264746 , 6.43639669, 6.14881748, 6.36031157,
4.9112593 , 6.13496079, 6.07563694, 4.62980368, 5.05668896,
6.0320937 , 6.19879872, 5.34359009, 4.63592727, 6.09432214,
4.77201433, 5.45901878, 6.1194946 , 5.1694053 , 4.97058315,
6.82969833, 6.19879872, 5.09664952, 6.29969264, 6.43603302,
5.61107868, 5.37359105, 6.40349114, 5.96571485, 5.76485843,
6.5559758 , 4.74974645, 6.16911217, 4.89449802, 5.2243254 ,
5.13328074, 6.7658604 , 6.62303275, 6.07656836, 6.67975459])


In [20]: x_pred

Out[20]: array([5.6932874 , 6.8822205 , 6.47574026, 5.55175484, 6.10817882,
6.53729061, 5.73968598, 5.98823605, 6.55182538, 6.39285346,
6.38418894, 4.9189924 , 6.19360655, 5.91412409, 5.57563221,
5.63105897, 6.34645488, 7.2820094 , 5.2243254 , 4.53664286,
6.3489958 , 6.17559944, 5.18820084, 5.5312679 , 6.57341517,
4.84129777, 6.45508189, 4.99403576, 4.88515543, 5.68809522,
7.12532506, 6.04179997, 4.76972674, 6.86675431, 6.69911787,
6.71909815, 6.50599455, 5.36585796, 5.11050621, 6.4428347 ,
6.90287887, 4.10524349, 6.59370987, 4.69976521, 5.91827451,
6.54211911, 4.74974645, 5.08279283, 7.13886734, 4.94899336,
4.62207058, 5.36746746, 5.50338309, 6.85450712, 7.41061669,
5.01895489, 4.9112593 , 4.95511696, 6.21104591, 6.19106563,
4.67205183, 4.91447831, 6.1194946 , 4.89288852, 7.24842576,
5.23324324, 7.83589247, 6.25490357, 5.54963868, 6.22199801,
5.18275533, 7.43059698, 5.2182018 , 4.98283033, 7.03439075,
4.89127902, 6.67913759, 5.9002674 , 5.75100175, 5.51630836,
6.38833936, 6.18296886, 6.19038754, 6.44734879, 4.85515446,
5.54402174, 6.48762378, 6.19360655, 5.03281158, 6.35257847,
6.02794328, 6.34967389, 5.8141616 , 4.94126027, 5.08440233,
4.9112593 , 6.77971708, 6.04631406, 7.92905329, 4.82744108])


Step-3: Visualizing the Training and Test set results


In [21]: import matplotlib.pyplot as plt

         # With three features, the fitted model is a plane in 4-D space, so a
         # single straight line against one feature cannot represent it;
         # predictions are therefore shown as points against the first feature.
         plt.figure(figsize=(10, 7))
         plt.scatter(x_train.iloc[:, 0], y_train, color="blue", label="Actual values")
         plt.scatter(x_train.iloc[:, 0], x_pred, color="red", label="Predicted values")
         plt.title("Training set")
         plt.xlabel("SepalWidthCm")
         plt.ylabel("Sepal Length (cm)")
         plt.legend()
         plt.show()

         plt.figure(figsize=(10, 7))
         plt.scatter(x_test.iloc[:, 0], y_test, color="blue", label="Actual values")
         plt.scatter(x_test.iloc[:, 0], y_pred, color="red", label="Predicted values")
         plt.title("Testing set")
         plt.xlabel("SepalWidthCm")
         plt.ylabel("Sepal Length (cm)")
         plt.legend()
         plt.show()

         mae = mean_absolute_error(y_test, y_pred)
         print("Here is the Linear Regression Mean Absolute Error:", mae)


Here is the Linear Regression Mean Absolute Error: 0.25316544984473643
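The MAE reported above is simply the average absolute gap between predictions and true values. A minimal sketch (with made-up numbers) verifying that mean_absolute_error matches the hand computation:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

y_true = np.array([5.0, 6.0, 7.0])
y_hat = np.array([5.5, 5.5, 7.5])  # each prediction is off by 0.5

mae_sklearn = mean_absolute_error(y_true, y_hat)
mae_manual = float(np.mean(np.abs(y_true - y_hat)))  # 0.5
```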
