The document performs linear regression on two datasets. It imports packages, loads and prepares the datasets, checks for linearity, builds linear regression models, and plots the actual and predicted outputs. The linear regression is performed to predict species from sepal length using one dataset, and to predict gender from estimated salary using a second dataset. The models' coefficients and intercepts are output.

RegNo: 19MIS1106

Name: SAM MELVIN M

Course Code – SWE4012 – Machine Learning Lab

Slot: L11+L12

Faculty: Dr. M. Premalatha


Linear regression
Importing packages

In [1]:
import matplotlib.pyplot as plt

import numpy as np

import pandas as pd

%matplotlib inline

Importing data

In [2]:
df1=pd.read_csv('C:\\Users\\ADMIN\\Desktop\\E1-SWE4012\\Lab\\LR_dataset1.csv')

df1.head()

Out[2]: SepalLengthCm Species

0 5.1 Iris-setosa

1 4.9 Iris-setosa

2 4.7 Iris-setosa

3 4.6 Iris-setosa

4 5.0 Iris-setosa

In [3]:
df2=pd.read_csv('C:\\Users\\ADMIN\\Desktop\\E1-SWE4012\\Lab\\LR_dataset2.csv')

df2.head()

Out[3]: EstimatedSalary Gender

0 19000 Male

1 20000 Male

2 43000 Female

3 57000 Female

4 76000 Male
Converting categorical to numerical
In [4]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

label = le.fit_transform(df1['Species'])

label

Out[4]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,

2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,

2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [5]:
df1.drop("Species", axis=1, inplace=True)

df1["Species"] = label

df1

Out[5]: SepalLengthCm Species

0 5.1 0

1 4.9 0

2 4.7 0

3 4.6 0

4 5.0 0

... ... ...

145 6.7 2

146 6.3 2

147 6.5 2

148 6.2 2

149 5.9 2

150 rows × 2 columns

In [6]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

label = le.fit_transform(df2['Gender'])

label

Out[6]:
array([1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0,
1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1,

0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1,

1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0,

1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0,

0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1,

1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0,

1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0,

0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0,

1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1,

0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1,

0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,

1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0,

0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0,

1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0,

1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1,

0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1,

0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0,

1, 0, 1, 0])

In [7]:
df2.drop("Gender", axis=1, inplace=True)

df2["Gender"] = label

df2

Out[7]: EstimatedSalary Gender

0 19000 1

1 20000 1

2 43000 0

3 57000 0

4 76000 1

... ... ...

395 41000 0

396 23000 1

397 20000 0

398 33000 1

399 36000 0

400 rows × 2 columns
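LabelEncoder assigns integer codes in the sorted order of the class names, which is why Iris-setosa maps to 0 above and Female maps to 0 here. A minimal sketch (on made-up labels, not the lab data) showing how to inspect and invert the mapping:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(['Male', 'Male', 'Female', 'Female', 'Male'])

# classes_ holds the original labels in sorted order; the index is the code
print(le.classes_)   # ['Female' 'Male']
print(codes)         # [1 1 0 0 1]

# inverse_transform recovers the original strings from the integer codes
print(le.inverse_transform(codes))
```

This also shows that the integer codes carry no numeric meaning of their own, which matters when they are used as a regression target below.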

Checking Linearity
In [8]:
plt.scatter(df1['SepalLengthCm'],df1['Species'])

plt.title("Plot")

plt.xlim(0,10)

plt.ylim(0,5)

plt.xticks(np.arange(0,10,0.5))

plt.legend()

plt.xlabel("SepalLengthCm")

plt.ylabel("Species")

plt.show()

No handles with labels found to put in legend.

In [9]:
import seaborn as sb

data_corr = df1.corr()

sb.heatmap(data_corr,annot=True)

Out[9]: <AxesSubplot:>

In [10]:
plt.scatter(df2['EstimatedSalary'],df2['Gender'])

plt.title("Plot")

plt.xlim(10000,100000)

plt.ylim(0,2)

plt.xticks(np.arange(10000,100000,10000))

plt.legend()

plt.xlabel("EstimatedSalary")

plt.ylabel("Gender")

plt.show()

No handles with labels found to put in legend.

In [11]:
import seaborn as sb

data_corr = df2.corr()

sb.heatmap(data_corr,annot=True)

Out[11]: <AxesSubplot:>
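df.corr() computes pairwise Pearson correlations, and values near ±1 in the heatmap indicate a linear relationship worth modeling. A toy sketch (synthetic columns, not the lab data) of what the heatmap values mean:

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1.0, 2.0, 3.0, 4.0, 5.0],
    'y': [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly linear in x
})

corr = df.corr()            # 2x2 Pearson correlation matrix
print(corr.loc['x', 'y'])   # 1.0 for a perfect linear relationship
```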

Modeling
In [12]:
from sklearn import linear_model

regress = linear_model.LinearRegression()

train_x = np.asanyarray(df1[['SepalLengthCm']])

train_y = np.asanyarray(df1[['Species']])

#print(train_x)

#print(train_y)

regress.fit(train_x,train_y)

# The coefficients

print ('Coefficients: ', regress.coef_)

print ('Intercept: ',regress.intercept_)

regress1 = linear_model.LinearRegression()

train_x1 = np.asanyarray(df2[['EstimatedSalary']])

train_y1 = np.asanyarray(df2[['Gender']])

#print(train_x)

#print(train_y)

regress1.fit (train_x1,train_y1)

# The coefficients

print ('Coefficients: ', regress1.coef_)

print ('Intercept: ',regress1.intercept_)

Coefficients: [[0.77421249]]

Intercept: [-3.52398166]

Coefficients: [[-8.8715045e-07]]

Intercept: [0.55187209]
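The fitted line is just ŷ = coef_·x + intercept_, which is exactly what the plotting cells below rely on. A small sketch on synthetic data (not the lab datasets) confirming that predict() matches the manual formula:

```python
import numpy as np
from sklearn import linear_model

# synthetic data with a known linear relationship y = 2x + 1
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2 * x + 1

model = linear_model.LinearRegression()
model.fit(x, y)

# predict() is exactly coef_ * x + intercept_
manual = model.coef_[0][0] * x + model.intercept_[0]
print(np.allclose(model.predict(x), manual))  # True
```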

Plot outputs
In [13]:
plt.scatter(df1.SepalLengthCm, df1.Species, color='blue',label="Actual")

plt.plot(train_x, regress.coef_[0][0]*train_x + regress.intercept_[0], '-r',label="Predicted")


plt.title("Placement plot")

plt.xlim(0,10)

plt.ylim(0,5)

plt.xticks(np.arange(0,10,0.5))

plt.legend()

plt.xlabel("SepalLengthCm")

plt.ylabel("Species")

plt.show()

In [14]:
plt.scatter(df2.EstimatedSalary, df2.Gender, color='blue',label="Actual")

plt.plot(train_x1, regress1.coef_[0][0]*train_x1 + regress1.intercept_[0], '-r',label="Predicted")


plt.title("Placement plot")

plt.xlim(10000,100000)

plt.ylim(0,2)

plt.xticks(np.arange(10000,100000,10000))

plt.legend()

plt.xlabel("EstimatedSalary")

plt.ylabel("Gender")

plt.show()

Predictions
In [15]:
y_predicted = regress.predict(train_x)

for i in range(0,len(train_x)):
    print(train_y[i],y_predicted[i])

df1['Predicted'] = y_predicted

print(y_predicted)

print(df1.head())

[0] [0.42450205]

[0] [0.26965955]

[0] [0.11481705]

[0] [0.0373958]

[0] [0.3470808]

... (142 more actual/predicted pairs) ...

[2] [1.50839954]

[2] [1.27613579]

[2] [1.04387204]

[[ 0.42450205]

 [ 0.26965955]

 ...

 [ 1.27613579]

 [ 1.04387204]]

SepalLengthCm Species Predicted

0 5.1 0 0.424502

1 4.9 0 0.269660

2 4.7 0 0.114817

3 4.6 0 0.037396

4 5.0 0 0.347081

In [16]:
y1_predicted = regress1.predict(train_x1)

for i in range(0,len(train_x1)):
    print(train_y1[i],y1_predicted[i])

df2['Predicted'] = y1_predicted

print(y1_predicted)

print(df2.head())

[1] [0.53501623]

[1] [0.53412908]

[0] [0.51372462]

[0] [0.50130451]

[1] [0.48444866]

... (393 more actual/predicted pairs) ...

[1] [0.52259613]

[0] [0.51993467]

[[0.53501623]

 [0.53412908]

 ...

 [0.52259613]

 [0.51993467]]

EstimatedSalary Gender Predicted

0 19000 1 0.535016

1 20000 1 0.534129

2 43000 0 0.513725

3 57000 0 0.501305

4 76000 1 0.484449

Performance measurement
In [17]:
from sklearn import metrics

print('Mean Absolute Error:', metrics.mean_absolute_error(train_y, y_predicted))

print('Mean Squared Error:', metrics.mean_squared_error(train_y, y_predicted))

print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(train_y, y_predicted)))

Mean Absolute Error: 0.4176851492362822

Mean Squared Error: 0.2583986123119253

Root Mean Squared Error: 0.5083292361372945

In [18]:
from sklearn import metrics

print('Mean Absolute Error:', metrics.mean_absolute_error(train_y1, y1_predicted))

print('Mean Squared Error:', metrics.mean_squared_error(train_y1, y1_predicted))

print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(train_y1, y1_predicted)))

Mean Absolute Error: 0.49797455487682124

Mean Squared Error: 0.24898727743841056

Root Mean Squared Error: 0.4989862497488388
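MAE, MSE and RMSE are simple averages over the residuals, with RMSE just the square root of MSE. A small sketch on toy numbers (not the lab outputs) showing the metrics functions agree with the hand formulas:

```python
import numpy as np
from sklearn import metrics

y_true = np.array([0.0, 1.0, 2.0, 1.0])
y_pred = np.array([0.5, 1.0, 1.5, 2.0])

mae = metrics.mean_absolute_error(y_true, y_pred)   # mean(|error|)
mse = metrics.mean_squared_error(y_true, y_pred)    # mean(error**2)
rmse = np.sqrt(mse)                                 # RMSE = sqrt(MSE)

# identical to computing them by hand from the residuals
err = y_pred - y_true
print(np.isclose(mae, np.mean(np.abs(err))))        # True
print(np.isclose(rmse, np.sqrt(np.mean(err ** 2)))) # True
```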

In [19]:
from sklearn.metrics import r2_score

test_x = np.asanyarray(df1[['SepalLengthCm']])

test_y = np.asanyarray(df1[['Species']])

test_y_predicted = regress.predict(test_x)

print("Mean absolute error (MAE):" , np.mean(np.absolute(test_y_predicted - test_y)))

print("Mean square error (MSE): " , np.mean((test_y_predicted - test_y) ** 2))

print("R2-score:" , r2_score(test_y, test_y_predicted))

Mean absolute error (MAE): 0.4176851492362822

Mean square error (MSE): 0.2583986123119253

R2-score: 0.6124020815321121

In [20]:
from sklearn.metrics import r2_score


test_x1 = np.asanyarray(df2[['EstimatedSalary']])

test_y1 = np.asanyarray(df2[['Gender']])

test_y1_predicted = regress1.predict(test_x1)

print("Mean absolute error (MAE):" , np.mean(np.absolute(test_y1_predicted - test_y1)))

print("Mean square error (MSE): " , np.mean((test_y1_predicted - test_y1) ** 2))

print("R2-score:" , r2_score(test_y1, test_y1_predicted))

Mean absolute error (MAE): 0.49797455487682124

Mean square error (MSE): 0.24898727743841056

R2-score: 0.003652351186832492

Comparison and Inference


The R2-score for dataset 1 is 0.612, while the R2-score for dataset 2 is only 0.0037. Since the
R2-score for dataset 1 is much higher, sepal length explains far more of the variance in species
than estimated salary explains for gender, so dataset 1 is much better suited to a linear
regression model than dataset 2.
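R² is defined as 1 − SS_res/SS_tot, so a score near 0 means the model barely improves on predicting the mean of the target. A sketch on toy values (not the lab data) showing r2_score matches the hand formula:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([0.0, 1.0, 2.0, 3.0])
y_pred = np.array([0.1, 0.9, 2.2, 2.8])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
manual = 1 - ss_res / ss_tot

print(np.isclose(r2_score(y_true, y_pred), manual))  # True
```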

Using training and testing data


In [21]:
from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(df1[['SepalLengthCm']], df1[['Species']], test_size=0.3)


print(train_x)

print("Testing")

print(test_x)

from sklearn import linear_model

regress = linear_model.LinearRegression()

regress.fit (train_x,train_y)

# The coefficients

print ('Coefficients: ', regress.coef_)

print ('Intercept: ',regress.intercept_)

SepalLengthCm

81 5.5

133 6.3

137 6.4

75 6.6

109 7.2

.. ...

71 6.1

106 4.9

14 5.8

92 5.8

102 7.1

[105 rows x 1 columns]

Testing

SepalLengthCm

73 6.1

18 5.7

118 7.7

78 6.0

76 6.8

31 5.4

64 5.6

141 6.9

68 6.2

82 5.8

110 6.5

12 4.8

36 5.5

9 4.9

19 5.1

56 6.3

104 6.5

69 5.6

55 5.7

132 6.4

29 4.7

127 6.1

26 5.0

128 6.4

131 7.9

145 6.7

108 6.7

143 6.8

45 4.8

30 4.8

22 4.6

15 5.7

65 6.7

11 4.8

42 4.4

146 6.3

51 6.4

27 5.2

4 5.0

32 5.2

142 5.8

85 6.0

86 6.7

16 5.4

10 5.4

Coefficients: [[0.74418421]]

Intercept: [-3.29101915]
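train_test_split shuffles the rows before splitting, so the selected rows (and hence the coefficients above) differ on every run unless a random_state is fixed. A minimal sketch on synthetic data; the 70/30 split here mirrors the row counts above but the exact parameters used in the cell are an assumption:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape(-1, 1)
y = np.arange(10)

# 30% held out for testing; random_state pins the shuffle for reproducibility
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

print(len(X_tr), len(X_te))  # 7 3
```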

In [22]:
y_predicted = regress.predict(test_x)

print(test_x)

print(test_y)

print(y_predicted)

SepalLengthCm

73 6.1

18 5.7

118 7.7

78 6.0

76 6.8

31 5.4

64 5.6

141 6.9

68 6.2

82 5.8

110 6.5

12 4.8

36 5.5

9 4.9

19 5.1

56 6.3

104 6.5

69 5.6

55 5.7

132 6.4

29 4.7

127 6.1

26 5.0

128 6.4

131 7.9

145 6.7

108 6.7

143 6.8

45 4.8

30 4.8

22 4.6

15 5.7

65 6.7

11 4.8

42 4.4

146 6.3

51 6.4

27 5.2

4 5.0

32 5.2

142 5.8

85 6.0

86 6.7

16 5.4

10 5.4

Species

73 1

18 0

118 2

78 1

76 1

31 0

64 1

141 2

68 1

82 1

110 2

12 0

36 0

9 0

19 0

56 1

104 2

69 1

55 1

132 2

29 0

127 2

26 0

128 2

131 2

145 2

108 2

143 2

45 0

30 0

22 0

15 0

65 1

11 0

42 0

146 2

51 1

27 0

4 0

32 0

142 2

85 1

86 1

16 0

10 0

[[ 1.24850451]

[ 0.95083083]

[ 2.43919924]

[ 1.17408609]

[ 1.76943345]

[ 0.72757557]

[ 0.87641241]

[ 1.84385188]

[ 1.32292293]

[ 1.02524925]

[ 1.54617819]

[ 0.28106504]

[ 0.80199399]

[ 0.35548346]

[ 0.5043203 ]

[ 1.39734135]

[ 1.54617819]

[ 0.87641241]

[ 0.95083083]

[ 1.47175977]

[ 0.20664662]

[ 1.24850451]

[ 0.42990188]

[ 1.47175977]

[ 2.58803608]

[ 1.69501503]

[ 1.69501503]

[ 1.76943345]

[ 0.28106504]

[ 0.28106504]

[ 0.1322282 ]

[ 0.95083083]

[ 1.69501503]

[ 0.28106504]

[-0.01660864]

[ 1.39734135]

[ 1.47175977]

[ 0.57873872]

[ 0.42990188]

[ 0.57873872]

[ 1.02524925]

[ 1.17408609]

[ 1.69501503]

[ 0.72757557]

[ 0.72757557]]

In [23]:
from sklearn import metrics

print('Mean Absolute Error:', metrics.mean_absolute_error(test_y, y_predicted))

print('Mean Squared Error:', metrics.mean_squared_error(test_y, y_predicted))

print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(test_y, y_predicted)))

Mean Absolute Error: 0.44066483796662653

Mean Squared Error: 0.25967220415563674

Root Mean Squared Error: 0.5095804197137452

In [24]:
from sklearn.model_selection import cross_val_score

accuracy = cross_val_score(regress, df1[['SepalLengthCm']], df1[['Species']], cv = 5, scoring='r2')


print(accuracy)

[ 0. -0.20701597 0. -0.97441394 0. ]
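cross_val_score refits the model on k−1 folds and scores it on the held-out fold, returning one score per fold; the negative entries above mark folds where the linear fit does worse than simply predicting the mean. A sketch on perfectly linear synthetic data (the scoring choice is an assumption), where every fold scores 1.0:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + 2   # exact, noise-free linear relation

# 5-fold cross-validation: one R2 score per held-out fold
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring='r2')
print(scores)   # all ~1.0 because the data is perfectly linear
```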

In [25]:
from sklearn.model_selection import train_test_split

train_x1, test_x1, train_y1, test_y1 = train_test_split(df2[['EstimatedSalary']], df2[['Gender']], test_size=0.3)


print(train_x1)

print("Testing")

print(test_x1)

from sklearn import linear_model

regress1 = linear_model.LinearRegression()

regress1.fit (train_x1,train_y1)

# The coefficients

print ('Coefficients: ', regress1.coef_)

print ('Intercept: ',regress1.intercept_)

EstimatedSalary

157 75000

109 80000

17 26000

347 108000

24 23000

.. ...

71 27000

106 35000

270 133000

348 77000

102 86000

[280 rows x 1 columns]

Testing

EstimatedSalary

209 22000

280 88000

33 44000

210 96000

93 28000

.. ...

60 20000

79 17000

285 93000

305 54000

281 61000

[120 rows x 1 columns]

Coefficients: [[-5.40299639e-07]]

Intercept: [0.52742508]
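As a quick sanity check (a hand computation, not part of the original lab), the fitted line can be evaluated directly from the printed coefficient and intercept for the first test salary shown above:

```python
coef = -5.40299639e-07   # regress1.coef_ as printed above
intercept = 0.52742508   # regress1.intercept_ as printed above
salary = 22000           # first test row (index 209)

# y = intercept + coef * x
pred = intercept + coef * salary
print(round(pred, 8))  # ≈ 0.51553849
```

This matches the first entry of the predicted output in the next cell, confirming the model is a straight line in EstimatedSalary.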

In [26]:
y1_predicted = regress1.predict(test_x1)

print(test_x1)

print(test_y1)

print(y1_predicted)

EstimatedSalary

209 22000

280 88000

33 44000

210 96000

93 28000

.. ...

60 20000

79 17000

285 93000

305 54000

281 61000

[120 rows x 1 columns]

Gender

209 0

280 0

33 0

210 0

93 0

.. ...

60 1

79 0

285 0

305 1

281 1

[120 rows x 1 columns]

[[0.51553849]

[0.47987871]

[0.5036519 ]

[0.47555631]

[0.51229669]

[0.4939265 ]

[0.46961302]

[0.48258021]

[0.48690261]

[0.4923056 ]

[0.4923056 ]

[0.50905489]

[0.5014907 ]

[0.51499819]

[0.48041901]

[0.46907272]

[0.50689369]

[0.48744291]

[0.5047325 ]

[0.4950071 ]

[0.51283699]

[0.48420111]

[0.51499819]

[0.48636231]

[0.51715939]

[0.51499819]

[0.48420111]

[0.4987892 ]

[0.45988763]

[0.51607879]

[0.50527279]

[0.48960411]

[0.44962193]

[0.48474141]

[0.46961302]

[0.48690261]

[0.4977086 ]

[0.4923056 ]

[0.51121609]

[0.4906847 ]

[0.50959519]

[0.4977086 ]

[0.4993295 ]

[0.51661909]

[0.5009504 ]

[0.45232343]

[0.4944668 ]

[0.48906381]

[0.50527279]

[0.5041922 ]

[0.48474141]

[0.48312051]

[0.4977086 ]

[0.48420111]

[0.48420111]

[0.51067579]

[0.47231452]

[0.46366972]

[0.48744291]

[0.48636231]

[0.4998698 ]

[0.496628 ]

[0.47879811]

[0.48041901]

[0.4955474 ]

[0.45340403]

[0.47879811]

[0.44962193]

[0.4955474 ]

[0.46637122]

[0.45286373]

[0.48149961]

[0.4960877 ]

[0.51661909]

[0.50905489]

[0.5004101 ]

[0.45556523]

[0.51337729]

[0.502031 ]

[0.48528171]

[0.496628 ]

[0.51229669]

[0.45502493]

[0.50635339]

[0.51337729]

[0.46258912]

[0.4993295 ]

[0.48744291]

[0.48095931]

[0.48366081]

[0.48095931]

[0.4982489 ]

[0.4955474 ]

[0.48420111]

[0.51878029]

[0.48474141]

[0.45718613]

[0.4928459 ]

[0.45016223]

[0.46691152]

[0.46150852]

[0.48906381]

[0.51391759]

[0.51715939]

[0.5004101 ]

[0.44638013]

[0.51175639]

[0.46961302]

[0.50905489]

[0.4944668 ]

[0.51067579]

[0.48906381]

[0.50635339]

[0.50635339]

[0.46691152]

[0.51661909]

[0.51823999]

[0.47717721]

[0.4982489 ]

[0.4944668 ]]

In [27]:
from sklearn import metrics

print('Mean Absolute Error:', metrics.mean_absolute_error(test_y1, y1_predicted))

print('Mean Squared Error:', metrics.mean_squared_error(test_y1, y1_predicted))

print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(test_y1, y1_predicted)))

Mean Absolute Error: 0.4977105615001104

Mean Squared Error: 0.24813962286392463

Root Mean Squared Error: 0.4981361489230877

In [28]:
from sklearn.model_selection import cross_val_score

accuracy1 = cross_val_score(regress1, df2[['EstimatedSalary']], df2[['Gender']], cv = 5, scoring='r2')  # scoring argument truncated in source; 'r2' inferred from the printed scores


print(accuracy1)

[ 0.00712845 -0.04048316 0.00279302 -0.01524552 -0.02317653]

Comparison and Inference


On comparing the mean absolute error, mean squared error, and root mean squared error
of dataset 1 and dataset 2, we can conclude that dataset 1 fits the linear
regression model better than dataset 2.
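For reference, the three error metrics compared here are tightly related; a minimal sketch with made-up numbers (not the lab data) showing how each is computed:

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    errs = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errs) / len(errs)   # mean absolute error
    mse = sum(e * e for e in errs) / len(errs)    # mean squared error
    return mae, mse, sqrt(mse)                    # RMSE is just sqrt(MSE)

mae, mse, rmse = regression_metrics([1, 0, 2, 1], [1.2, 0.4, 1.5, 1.1])
print(mae, mse, rmse)  # 0.3, 0.115, ~0.339
```

MSE penalizes large errors more heavily than MAE, and RMSE rescales MSE back to the units of the target, which is why all three are reported together.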

Bayes Classification
In [29]:
from sklearn.model_selection import train_test_split

from sklearn.naive_bayes import CategoricalNB

Dataset 1:
In [30]:
import pandas as pd

In [31]:
df1=pd.read_csv('C:\\Users\\ADMIN\\Desktop\\E1-SWE4012\\Lab\\Bayes_dataset1.csv')

In [32]:
df1.head()

Out[32]: SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species

0 5.1 3.5 1.4 0.2 Iris-setosa

1 4.9 3.0 1.4 0.2 Iris-setosa

2 4.7 3.2 1.3 0.2 Iris-setosa

3 4.6 3.1 1.5 0.2 Iris-setosa

4 5.0 3.6 1.4 0.2 Iris-setosa


In [33]:
df1.isnull().sum()

Out[33]:
SepalLengthCm 0

SepalWidthCm 0

PetalLengthCm 0

PetalWidthCm 0

Species 0

dtype: int64

In [34]:
features=df1.columns.tolist()

features.remove('Species')

features

Out[34]:
['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']

In [35]:
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder

encoder=OrdinalEncoder()

data_encoded=encoder.fit_transform(df1[features])

df1_encoded=pd.DataFrame(data_encoded, columns=features)

In [36]:
encoder=LabelEncoder()

target_encoded=encoder.fit_transform(df1['Species'])

df1_encoded['Species']=target_encoded

encoder.inverse_transform(target_encoded)

Out[36]:
array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica'], dtype=object)
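LabelEncoder assigns integer codes in sorted order of the class names, which is why the inverse transform above recovers the species strings exactly. A minimal pure-Python stand-in for the same mapping (an illustration, not sklearn's implementation):

```python
labels = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-setosa']

classes = sorted(set(labels))                   # alphabetical, as LabelEncoder does
to_code = {c: i for i, c in enumerate(classes)}

encoded = [to_code[l] for l in labels]          # [0, 1, 2, 0]
decoded = [classes[i] for i in encoded]         # round-trips back to the names
print(encoded, decoded == labels)
```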

In [37]:
df1_encoded.head()

Out[37]: SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species

0 8.0 14.0 4.0 1.0 0

1 6.0 9.0 4.0 1.0 0

2 4.0 11.0 3.0 1.0 0

3 3.0 10.0 5.0 1.0 0

4 7.0 15.0 4.0 1.0 0

In [38]:
X_train, X_test, y_train, y_test = train_test_split(df1_encoded.drop('Species',axis=1), df1_encoded['Species'], test_size=0.2)  # call truncated in source; 0.2 inferred from the printed 120/30 split

In [39]:
print(X_train.shape)

print(y_test.shape)

(120, 4)

(30,)

In [40]:
cnb = CategoricalNB()

In [41]:
cnb.fit(X_train, y_train)

y_predict = cnb.predict(X_test)

print(y_predict)

[2 1 0 2 0 2 0 1 1 1 1 1 1 2 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0]
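Under the hood, CategoricalNB scores each class by its prior probability times the Laplace-smoothed per-feature category likelihoods, then picks the argmax. A rough pure-Python sketch of that idea on toy data (not sklearn's exact implementation):

```python
from math import log

def categorical_nb_predict(X_train, y_train, x, alpha=1.0):
    # Score each class c by log P(c) + sum_j log P(x_j | c),
    # with Laplace smoothing alpha on the per-category counts.
    classes = sorted(set(y_train))
    n_cats = [len(set(row[j] for row in X_train)) for j in range(len(x))]
    best, best_lp = None, float('-inf')
    for c in classes:
        rows = [r for r, yc in zip(X_train, y_train) if yc == c]
        lp = log(len(rows) / len(X_train))          # class prior
        for j, v in enumerate(x):
            count = sum(1 for r in rows if r[j] == v)
            lp += log((count + alpha) / (len(rows) + alpha * n_cats[j]))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
print(categorical_nb_predict(X, y, [0, 0]))  # 0
print(categorical_nb_predict(X, y, [1, 1]))  # 1
```

This is why the encoded (ordinal) features from the previous cells are required: the classifier counts category matches rather than treating the values as continuous.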

In [42]:
print("Number of mislabeled points out of a total %d points : %d"%(X_test.shape[0], (y_test != y_predict).sum()))

Number of mislabeled points out of a total 30 points : 2

In [43]:
from sklearn import metrics

print('Accuracy:', metrics.accuracy_score(y_test, y_predict))

print('Confusion Matrix\n',metrics.confusion_matrix(y_test, y_predict))

Accuracy: 0.9333333333333333

Confusion Matrix

[[11 0 0]

[ 0 12 1]

[ 0 1 5]]

Dataset 2
In [44]:
df2=pd.read_csv('C:\\Users\\ADMIN\\Desktop\\E1-SWE4012\\Lab\\Bayes_dataset2.csv')

In [45]:
df2.head()

Out[45]: Age EstimatedSalary Purchased Gender

0 19 19000 0 Male

1 35 20000 0 Male

2 26 43000 0 Female

3 27 57000 0 Female

4 19 76000 0 Male

In [46]:
df2.isnull().sum()

Out[46]:
Age 0

EstimatedSalary 0

Purchased 0

Gender 0

dtype: int64

In [47]:
features=df2.columns.tolist()

features.remove('Gender')

features

Out[47]:
['Age', 'EstimatedSalary', 'Purchased']

In [48]:
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder

encoder=OrdinalEncoder()

data_encoded=encoder.fit_transform(df2[features])

df2_encoded=pd.DataFrame(data_encoded, columns=features)

In [49]:
encoder=LabelEncoder()

target_encoded=encoder.fit_transform(df2['Gender'])

df2_encoded['Gender']=target_encoded

encoder.inverse_transform(target_encoded)

Out[49]:
array(['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female',

'Female', 'Male', 'Female', 'Female', 'Female', 'Male', 'Male',

'Male', 'Male', 'Male', 'Male', 'Male', 'Female', 'Male', 'Female',

'Male', 'Female', 'Male', 'Male', 'Male', 'Female', 'Male', 'Male',

'Male', 'Female', 'Female', 'Female', 'Male', 'Male', 'Female',

'Male', 'Female', 'Female', 'Female', 'Female', 'Male', 'Male',

'Female', 'Male', 'Male', 'Female', 'Male', 'Female', 'Female',

'Female', 'Female', 'Female', 'Female', 'Female', 'Female', 'Male',

'Male', 'Female', 'Male', 'Male', 'Female', 'Male', 'Female',

'Male', 'Male', 'Female', 'Female', 'Female', 'Male', 'Female',

'Female', 'Female', 'Male', 'Male', 'Male', 'Female', 'Female',

'Female', 'Male', 'Male', 'Male', 'Male', 'Female', 'Female',

'Male', 'Female', 'Male', 'Male', 'Male', 'Female', 'Male',

'Female', 'Female', 'Female', 'Female', 'Male', 'Male', 'Female',

'Male', 'Male', 'Female', 'Female', 'Female', 'Male', 'Female',

'Male', 'Male', 'Female', 'Female', 'Female', 'Male', 'Male',

'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Female', 'Male',

'Female', 'Male', 'Female', 'Female', 'Male', 'Male', 'Male',

'Female', 'Male', 'Male', 'Male', 'Female', 'Female', 'Male',

'Female', 'Male', 'Female', 'Male', 'Male', 'Female', 'Male',

'Male', 'Female', 'Female', 'Female', 'Female', 'Male', 'Male',

'Female', 'Male', 'Male', 'Female', 'Male', 'Female', 'Male',

'Male', 'Male', 'Female', 'Male', 'Male', 'Female', 'Male',

'Female', 'Female', 'Female', 'Female', 'Male', 'Female', 'Male',

'Male', 'Female', 'Female', 'Female', 'Female', 'Female', 'Male',

'Male', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female',

'Male', 'Female', 'Female', 'Male', 'Male', 'Male', 'Female',

'Male', 'Male', 'Male', 'Male', 'Female', 'Female', 'Male', 'Male',

'Male', 'Male', 'Female', 'Female', 'Female', 'Female', 'Female',

'Female', 'Female', 'Female', 'Female', 'Male', 'Female', 'Male',

'Male', 'Female', 'Male', 'Male', 'Female', 'Male', 'Female',

'Male', 'Male', 'Male', 'Female', 'Male', 'Female', 'Male',

'Female', 'Female', 'Female', 'Male', 'Male', 'Male', 'Female',

'Male', 'Male', 'Female', 'Female', 'Female', 'Male', 'Male',

'Female', 'Female', 'Female', 'Female', 'Female', 'Female', 'Male',

'Female', 'Female', 'Male', 'Female', 'Female', 'Female', 'Female',

'Female', 'Male', 'Female', 'Female', 'Female', 'Male', 'Female',

'Female', 'Male', 'Female', 'Male', 'Male', 'Female', 'Male',

'Female', 'Female', 'Male', 'Male', 'Female', 'Male', 'Male',

'Male', 'Female', 'Female', 'Female', 'Male', 'Male', 'Female',

'Male', 'Female', 'Female', 'Female', 'Male', 'Female', 'Male',

'Male', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female',

'Male', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female',

'Male', 'Female', 'Female', 'Male', 'Female', 'Female', 'Male',

'Female', 'Female', 'Female', 'Female', 'Female', 'Male', 'Male',

'Male', 'Female', 'Female', 'Male', 'Female', 'Female', 'Female',

'Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male',

'Male', 'Male', 'Female', 'Male', 'Male', 'Female', 'Male',

'Female', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male',

'Female', 'Male', 'Male', 'Female', 'Male', 'Female', 'Female',

'Male', 'Male', 'Male', 'Female', 'Male', 'Male', 'Male', 'Female',

'Female', 'Female', 'Male', 'Female', 'Female', 'Male', 'Male',

'Female', 'Female', 'Male', 'Female', 'Male', 'Female', 'Female',

'Female', 'Female', 'Male', 'Female', 'Male', 'Male', 'Female',

'Male', 'Female', 'Male', 'Female', 'Male', 'Male', 'Female',

'Male', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male',

'Female', 'Male', 'Female'], dtype=object)

In [50]:
df2_encoded.head()

Out[50]: Age EstimatedSalary Purchased Gender

0 1.0 4.0 0.0 1

1 17.0 5.0 0.0 1

2 8.0 26.0 0.0 0

3 9.0 39.0 0.0 0

4 1.0 57.0 0.0 1

In [51]:
X_train, X_test, y_train, y_test = train_test_split(df2_encoded.drop('Gender',axis=1), df2_encoded['Gender'], test_size=0.2)  # call truncated in source; 0.2 inferred from the printed 320/80 split

In [52]:
print(X_train.shape)

print(y_test.shape)

(320, 3)

(80,)

In [53]:
cnb = CategoricalNB()

In [54]:
cnb.fit(X_train, y_train)

y_predict = cnb.predict(X_test)

print(y_predict)

[1 0 1 1 0 0 1 1 0 1 1 0 0 1 0 1 0 1 1 1 0 1 0 0 0 1 1 1 0 1 1 1 0 0 1 0 1

1 1 0 0 1 0 1 0 0 0 0 0 1 1 1 1 1 1 0 0 0 1 0 1 1 1 0 1 0 1 0 0 1 1 1 0 0

0 0 1 1 1 1]

In [55]:
print("Number of mislabeled points out of a total %d points : %d"%(X_test.shape[0], (y_test != y_predict).sum()))

Number of mislabeled points out of a total 80 points : 42

In [56]:
from sklearn import metrics

print('Accuracy:', metrics.accuracy_score(y_test, y_predict))

print('Confusion Matrix\n',metrics.confusion_matrix(y_test, y_predict))

Accuracy: 0.475

Confusion Matrix

[[17 23]

[19 21]]

Comparison and Inference


Accuracy for dataset 1 is 0.9333, while accuracy for dataset 2 is 0.475. This shows
that the Bayes classifier predicts dataset 1 far more accurately than dataset 2.
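Both accuracies can be recovered directly from the confusion matrices printed above, since correct predictions sit on the diagonal:

```python
cm1 = [[11, 0, 0], [0, 12, 1], [0, 1, 5]]  # dataset 1 confusion matrix
cm2 = [[17, 23], [19, 21]]                 # dataset 2 confusion matrix

def accuracy_from_cm(cm):
    # accuracy = (sum of diagonal) / (sum of all cells)
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

print(accuracy_from_cm(cm1), accuracy_from_cm(cm2))  # ~0.9333 and 0.475
```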

In [ ]:

