The document performs linear regression on two datasets. It imports packages, loads and prepares the datasets, checks for linearity, builds linear regression models, and plots the actual and predicted outputs. The linear regression is performed to predict species from sepal length using one dataset, and to predict gender from estimated salary using a second dataset. The models' coefficients and intercepts are output.

RegNo: 19MIS1106

Name: SAM MELVIN M

Course Code – SWE4012 – Machine Learning Lab

Slot: L11+L12

Faculty: Dr. M. Premalatha


Linear regression
Importing packages

In [1]:
import matplotlib.pyplot as plt

import numpy as np

import pandas as pd

%matplotlib inline

Importing data

In [2]:
df1=pd.read_csv('C:\\Users\\ADMIN\\Desktop\\E1-SWE4012\\Lab\\LR_dataset1.csv')

df1.head()

Out[2]: SepalLengthCm Species

0 5.1 Iris-setosa

1 4.9 Iris-setosa

2 4.7 Iris-setosa

3 4.6 Iris-setosa

4 5.0 Iris-setosa

In [3]:
df2=pd.read_csv('C:\\Users\\ADMIN\\Desktop\\E1-SWE4012\\Lab\\LR_dataset2.csv')

df2.head()

Out[3]: EstimatedSalary Gender

0 19000 Male

1 20000 Male

2 43000 Female

3 57000 Female

4 76000 Male
Converting categorical to numerical
In [4]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

label = le.fit_transform(df1['Species'])

label

Out[4]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,

0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,

2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,

2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

In [5]:
df1.drop("Species", axis=1, inplace=True)

df1["Species"] = label

df1

Out[5]: SepalLengthCm Species

0 5.1 0

1 4.9 0

2 4.7 0

3 4.6 0

4 5.0 0

... ... ...

145 6.7 2

146 6.3 2

147 6.5 2

148 6.2 2

149 5.9 2

150 rows × 2 columns

In [6]:
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()

label = le.fit_transform(df2['Gender'])

label

Out[6]:
array([1, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0,
1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1,

0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1,

1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0,

1, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 1, 0, 1, 1, 0,

0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1,

1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 1, 1, 0,

1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0,

0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, 0, 0,

1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1,

0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 0, 1, 1,

0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0,

1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0,

0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0,

1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0,

1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1,

0, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 1,

0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0, 1, 0, 0,

1, 0, 1, 0])

In [7]:
df2.drop("Gender", axis=1, inplace=True)

df2["Gender"] = label

df2

Out[7]: EstimatedSalary Gender

0 19000 1

1 20000 1

2 43000 0

3 57000 0

4 76000 1

... ... ...

395 41000 0

396 23000 1

397 20000 0

398 33000 1

399 36000 0

400 rows × 2 columns
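LabelEncoder assigns integer codes in the sorted order of the class names, which is why Iris-setosa maps to 0 above and Female maps to 0 here. A minimal sketch (on made-up labels, not the lab data) showing how to inspect and invert the mapping:

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
codes = le.fit_transform(['Male', 'Male', 'Female', 'Female', 'Male'])

# classes_ holds the original labels in sorted order; the index is the code
print(le.classes_)   # ['Female' 'Male']
print(codes)         # [1 1 0 0 1]

# inverse_transform recovers the original strings from the integer codes
print(le.inverse_transform(codes))
```

This also shows that the integer codes carry no numeric meaning of their own, which matters when they are used as a regression target below.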

Checking Linearity
In [8]:
plt.scatter(df1['SepalLengthCm'],df1['Species'])

plt.title("Plot")

plt.xlim(0,10)

plt.ylim(0,5)

plt.xticks(np.arange(0,10,0.5))

plt.legend()

plt.xlabel("SepalLengthCm")

plt.ylabel("Species")

plt.show()

No handles with labels found to put in legend.

In [9]:
import seaborn as sb

data_corr = df1.corr()

sb.heatmap(data_corr,annot=True)

Out[9]: <AxesSubplot:>

In [10]:
plt.scatter(df2['EstimatedSalary'],df2['Gender'])

plt.title("Plot")

plt.xlim(10000,100000)

plt.ylim(0,2)

plt.xticks(np.arange(10000,100000,10000))

plt.legend()

plt.xlabel("EstimatedSalary")

plt.ylabel("Gender")

plt.show()

No handles with labels found to put in legend.

In [11]:
import seaborn as sb

data_corr = df2.corr()

sb.heatmap(data_corr,annot=True)

Out[11]: <AxesSubplot:>
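df.corr() computes pairwise Pearson correlations, and values near ±1 in the heatmap indicate a linear relationship worth modeling. A toy sketch (synthetic columns, not the lab data) of what the heatmap values mean:

```python
import pandas as pd

df = pd.DataFrame({
    'x': [1.0, 2.0, 3.0, 4.0, 5.0],
    'y': [2.0, 4.0, 6.0, 8.0, 10.0],   # perfectly linear in x
})

corr = df.corr()            # 2x2 Pearson correlation matrix
print(corr.loc['x', 'y'])   # 1.0 for a perfect linear relationship
```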

Modeling
In [12]:
from sklearn import linear_model

regress = linear_model.LinearRegression()

train_x = np.asanyarray(df1[['SepalLengthCm']])

train_y = np.asanyarray(df1[['Species']])

#print(train_x)

#print(train_y)

regress.fit(train_x,train_y)

# The coefficients

print ('Coefficients: ', regress.coef_)

print ('Intercept: ',regress.intercept_)

regress1 = linear_model.LinearRegression()

train_x1 = np.asanyarray(df2[['EstimatedSalary']])

train_y1 = np.asanyarray(df2[['Gender']])

#print(train_x)

#print(train_y)

regress1.fit (train_x1,train_y1)

# The coefficients

print ('Coefficients: ', regress1.coef_)

print ('Intercept: ',regress1.intercept_)

Coefficients: [[0.77421249]]

Intercept: [-3.52398166]

Coefficients: [[-8.8715045e-07]]

Intercept: [0.55187209]
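The fitted line is just ŷ = coef_·x + intercept_, which is exactly what the plotting cells below rely on. A small sketch on synthetic data (not the lab datasets) confirming that predict() matches the manual formula:

```python
import numpy as np
from sklearn import linear_model

# synthetic data with a known linear relationship y = 2x + 1
x = np.array([[1.0], [2.0], [3.0], [4.0]])
y = 2 * x + 1

model = linear_model.LinearRegression()
model.fit(x, y)

# predict() is exactly coef_ * x + intercept_
manual = model.coef_[0][0] * x + model.intercept_[0]
print(np.allclose(model.predict(x), manual))  # True
```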

Plot outputs
In [13]:
plt.scatter(df1.SepalLengthCm, df1.Species, color='blue',label="Actual")

plt.plot(train_x, regress.coef_[0][0]*train_x + regress.intercept_[0], '-r',label="Predicted")


plt.title("Placement plot")

plt.xlim(0,10)

plt.ylim(0,5)

plt.xticks(np.arange(0,10,0.5))

plt.legend()

plt.xlabel("SepalLengthCm")

plt.ylabel("Species")

plt.show()

In [14]:
plt.scatter(df2.EstimatedSalary, df2.Gender, color='blue',label="Actual")

plt.plot(train_x1, regress1.coef_[0][0]*train_x1 + regress1.intercept_[0], '-r',label="Predicted")


plt.title("Placement plot")

plt.xlim(10000,100000)

plt.ylim(0,2)

plt.xticks(np.arange(10000,100000,10000))

plt.legend()

plt.xlabel("EstimatedSalary")

plt.ylabel("Gender")

plt.show()

Predictions
In [15]:
y_predicted = regress.predict(train_x)

for i in range(0,len(train_x)):
    print(train_y[i],y_predicted[i])

df1['Predicted'] = y_predicted

print(y_predicted)

print(df1.head())

[0] [0.42450205]

[0] [0.26965955]

[0] [0.11481705]

[0] [0.0373958]

[0] [0.3470808]

... (142 more actual/predicted pairs) ...

[2] [1.50839954]

[2] [1.27613579]

[2] [1.04387204]

[[ 0.42450205]

 [ 0.26965955]

 ...

 [ 1.27613579]

 [ 1.04387204]]

SepalLengthCm Species Predicted

0 5.1 0 0.424502

1 4.9 0 0.269660

2 4.7 0 0.114817

3 4.6 0 0.037396

4 5.0 0 0.347081

In [16]:
y1_predicted = regress1.predict(train_x1)

for i in range(0,len(train_x1)):
    print(train_y1[i],y1_predicted[i])

df2['Predicted'] = y1_predicted

print(y1_predicted)

print(df2.head())

[1] [0.53501623]

[1] [0.53412908]

[0] [0.51372462]

[0] [0.50130451]

[1] [0.48444866]

... (393 more actual/predicted pairs) ...

[1] [0.52259613]

[0] [0.51993467]

[[0.53501623]

 [0.53412908]

 ...

 [0.52259613]

 [0.51993467]]

EstimatedSalary Gender Predicted

0 19000 1 0.535016

1 20000 1 0.534129

2 43000 0 0.513725

3 57000 0 0.501305

4 76000 1 0.484449

Performance measurement
In [17]:
from sklearn import metrics

print('Mean Absolute Error:', metrics.mean_absolute_error(train_y, y_predicted))

print('Mean Squared Error:', metrics.mean_squared_error(train_y, y_predicted))

print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(train_y, y_predicted)))

Mean Absolute Error: 0.4176851492362822

Mean Squared Error: 0.2583986123119253

Root Mean Squared Error: 0.5083292361372945

In [18]:
from sklearn import metrics

print('Mean Absolute Error:', metrics.mean_absolute_error(train_y1, y1_predicted))

print('Mean Squared Error:', metrics.mean_squared_error(train_y1, y1_predicted))

print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(train_y1, y1_predicted)))

Mean Absolute Error: 0.49797455487682124

Mean Squared Error: 0.24898727743841056

Root Mean Squared Error: 0.4989862497488388
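MAE, MSE and RMSE are simple averages over the residuals, with RMSE just the square root of MSE. A small sketch on toy numbers (not the lab outputs) showing the metrics functions agree with the hand formulas:

```python
import numpy as np
from sklearn import metrics

y_true = np.array([0.0, 1.0, 2.0, 1.0])
y_pred = np.array([0.5, 1.0, 1.5, 2.0])

mae = metrics.mean_absolute_error(y_true, y_pred)   # mean(|error|)
mse = metrics.mean_squared_error(y_true, y_pred)    # mean(error**2)
rmse = np.sqrt(mse)                                 # RMSE = sqrt(MSE)

# identical to computing them by hand from the residuals
err = y_pred - y_true
print(np.isclose(mae, np.mean(np.abs(err))))        # True
print(np.isclose(rmse, np.sqrt(np.mean(err ** 2)))) # True
```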

In [19]:
from sklearn.metrics import r2_score

test_x = np.asanyarray(df1[['SepalLengthCm']])

test_y = np.asanyarray(df1[['Species']])

test_y_predicted = regress.predict(test_x)

print("Mean absolute error (MAE):" , np.mean(np.absolute(test_y_predicted - test_y)))

print("Mean square error (MSE): " , np.mean((test_y_predicted - test_y) ** 2))

print("R2-score:" , r2_score(test_y, test_y_predicted))

Mean absolute error (MAE): 0.4176851492362822

Mean square error (MSE): 0.2583986123119253

R2-score: 0.6124020815321121

In [20]:
from sklearn.metrics import r2_score


test_x1 = np.asanyarray(df2[['EstimatedSalary']])

test_y1 = np.asanyarray(df2[['Gender']])

test_y1_predicted = regress1.predict(test_x1)

print("Mean absolute error (MAE):" , np.mean(np.absolute(test_y1_predicted - test_y1)))

print("Mean square error (MSE): " , np.mean((test_y1_predicted - test_y1) ** 2))

print("R2-score:" , r2_score(test_y1, test_y1_predicted))

Mean absolute error (MAE): 0.49797455487682124

Mean square error (MSE): 0.24898727743841056

R2-score: 0.003652351186832492

Comparison and Inference


The R2-score for dataset 1 is 0.612, while the R2-score for dataset 2 is only 0.0037. Since the
R2-score for dataset 1 is much higher, sepal length explains far more of the variance in species
than estimated salary explains for gender, so dataset 1 is much better suited to a linear
regression model than dataset 2.
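R² is defined as 1 − SS_res/SS_tot, so a score near 0 means the model barely improves on predicting the mean of the target. A sketch on toy values (not the lab data) showing r2_score matches the hand formula:

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([0.0, 1.0, 2.0, 3.0])
y_pred = np.array([0.1, 0.9, 2.2, 2.8])

ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
manual = 1 - ss_res / ss_tot

print(np.isclose(r2_score(y_true, y_pred), manual))  # True
```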

Using training and testing data


In [21]:
from sklearn.model_selection import train_test_split

train_x, test_x, train_y, test_y = train_test_split(df1[['SepalLengthCm']], df1[['Species']], test_size=0.3)


print(train_x)

print("Testing")

print(test_x)

from sklearn import linear_model

regress = linear_model.LinearRegression()

regress.fit (train_x,train_y)

# The coefficients

print ('Coefficients: ', regress.coef_)

print ('Intercept: ',regress.intercept_)

SepalLengthCm

81 5.5

133 6.3

137 6.4

75 6.6

109 7.2

.. ...

71 6.1

106 4.9

14 5.8

92 5.8

102 7.1

[105 rows x 1 columns]

Testing

SepalLengthCm

73 6.1

18 5.7

118 7.7

78 6.0

76 6.8

31 5.4

64 5.6

141 6.9

68 6.2

82 5.8

110 6.5

12 4.8

36 5.5

9 4.9

19 5.1

56 6.3

104 6.5

69 5.6

55 5.7

132 6.4

29 4.7

127 6.1

26 5.0

128 6.4

131 7.9

145 6.7

108 6.7

143 6.8

45 4.8

30 4.8

22 4.6

15 5.7

65 6.7

11 4.8

42 4.4

146 6.3

51 6.4

27 5.2

4 5.0

32 5.2

142 5.8

85 6.0

86 6.7

16 5.4

10 5.4

Coefficients: [[0.74418421]]

Intercept: [-3.29101915]
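train_test_split shuffles the rows before splitting, so the selected rows (and hence the coefficients above) differ on every run unless a random_state is fixed. A minimal sketch on synthetic data; the 70/30 split here mirrors the row counts above but the exact parameters used in the cell are an assumption:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(10).reshape(-1, 1)
y = np.arange(10)

# 30% held out for testing; random_state pins the shuffle for reproducibility
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

print(len(X_tr), len(X_te))  # 7 3
```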

In [22]:
y_predicted = regress.predict(test_x)

print(test_x)

print(test_y)

print(y_predicted)

SepalLengthCm

73 6.1

18 5.7

118 7.7

78 6.0

76 6.8

31 5.4

64 5.6

141 6.9

68 6.2

82 5.8

110 6.5

12 4.8

36 5.5

9 4.9

19 5.1

56 6.3

104 6.5

69 5.6

55 5.7

132 6.4

29 4.7

127 6.1

26 5.0

128 6.4

131 7.9

145 6.7

108 6.7

143 6.8

45 4.8

30 4.8

22 4.6

15 5.7

65 6.7

11 4.8

42 4.4

146 6.3

51 6.4

27 5.2

4 5.0

32 5.2

142 5.8

85 6.0

86 6.7

16 5.4

10 5.4

Species

73 1

18 0

118 2

78 1

76 1

31 0

64 1

141 2

68 1

82 1

110 2

12 0

36 0

9 0

19 0

56 1

104 2

69 1

55 1

132 2

29 0

127 2

26 0

128 2

131 2

145 2

108 2

143 2

45 0

30 0

22 0

15 0

65 1

11 0

42 0

146 2

51 1

27 0

4 0

32 0

142 2

85 1

86 1

16 0

10 0

[[ 1.24850451]

[ 0.95083083]

[ 2.43919924]

[ 1.17408609]

[ 1.76943345]

[ 0.72757557]

[ 0.87641241]

[ 1.84385188]

[ 1.32292293]

[ 1.02524925]

[ 1.54617819]

[ 0.28106504]

[ 0.80199399]

[ 0.35548346]

[ 0.5043203 ]

[ 1.39734135]

[ 1.54617819]

[ 0.87641241]

[ 0.95083083]

[ 1.47175977]

[ 0.20664662]

[ 1.24850451]

[ 0.42990188]

[ 1.47175977]

[ 2.58803608]

[ 1.69501503]

[ 1.69501503]

[ 1.76943345]

[ 0.28106504]

[ 0.28106504]

[ 0.1322282 ]

[ 0.95083083]

[ 1.69501503]

[ 0.28106504]

[-0.01660864]

[ 1.39734135]

[ 1.47175977]

[ 0.57873872]

[ 0.42990188]

[ 0.57873872]

[ 1.02524925]

[ 1.17408609]

[ 1.69501503]

[ 0.72757557]

[ 0.72757557]]

In [23]:
from sklearn import metrics

print('Mean Absolute Error:', metrics.mean_absolute_error(test_y, y_predicted))

print('Mean Squared Error:', metrics.mean_squared_error(test_y, y_predicted))

print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(test_y, y_predicted)))

Mean Absolute Error: 0.44066483796662653

Mean Squared Error: 0.25967220415563674

Root Mean Squared Error: 0.5095804197137452

In [24]:
from sklearn.model_selection import cross_val_score

accuracy = cross_val_score(regress, df1[['SepalLengthCm']], df1[['Species']], cv = 5, scoring='r2')


print(accuracy)

[ 0. -0.20701597 0. -0.97441394 0. ]
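cross_val_score refits the model on k−1 folds and scores it on the held-out fold, returning one score per fold; the negative entries above mark folds where the linear fit does worse than simply predicting the mean. A sketch on perfectly linear synthetic data (the scoring choice is an assumption), where every fold scores 1.0:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X = np.arange(20, dtype=float).reshape(-1, 1)
y = 3 * X.ravel() + 2   # exact, noise-free linear relation

# 5-fold cross-validation: one R2 score per held-out fold
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring='r2')
print(scores)   # all ~1.0 because the data is perfectly linear
```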

In [25]:
from sklearn.model_selection import train_test_split

train_x1, test_x1, train_y1, test_y1 = train_test_split(df2[['EstimatedSalary']], df2[['Gender']], test_size=0.3)


print(train_x1)

print("Testing")

print(test_x1)

from sklearn import linear_model

regress1 = linear_model.LinearRegression()

regress1.fit (train_x1,train_y1)

# The coefficients

print ('Coefficients: ', regress1.coef_)

print ('Intercept: ',regress1.intercept_)

EstimatedSalary

157 75000

109 80000

17 26000

347 108000

24 23000

.. ...

71 27000

106 35000

270 133000

348 77000

102 86000

[280 rows x 1 columns]

Testing

EstimatedSalary

209 22000

280 88000

33 44000

210 96000

93 28000

.. ...

60 20000

79 17000

285 93000

305 54000

281 61000

[120 rows x 1 columns]

Coefficients: [[-5.40299639e-07]]

Intercept: [0.52742508]
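As a quick sanity check (a hand computation, not part of the original lab), the fitted line can be evaluated directly from the printed coefficient and intercept for the first test salary shown above:

```python
coef = -5.40299639e-07   # regress1.coef_ as printed above
intercept = 0.52742508   # regress1.intercept_ as printed above
salary = 22000           # first test row (index 209)

# y = intercept + coef * x
pred = intercept + coef * salary
print(round(pred, 8))  # ≈ 0.51553849
```

This matches the first entry of the predicted output in the next cell, confirming the model is a straight line in EstimatedSalary.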

In [26]:
y1_predicted = regress1.predict(test_x1)

print(test_x1)

print(test_y1)

print(y1_predicted)

EstimatedSalary

209 22000

280 88000

33 44000

210 96000

93 28000

.. ...

60 20000

79 17000

285 93000

305 54000

281 61000

[120 rows x 1 columns]

Gender

209 0

280 0

33 0

210 0

93 0

.. ...

60 1

79 0

285 0

305 1

281 1

[120 rows x 1 columns]

[[0.51553849]

[0.47987871]

[0.5036519 ]

[0.47555631]

[0.51229669]

[0.4939265 ]

[0.46961302]

[0.48258021]

[0.48690261]

[0.4923056 ]

[0.4923056 ]

[0.50905489]

[0.5014907 ]

[0.51499819]

[0.48041901]

[0.46907272]

[0.50689369]

[0.48744291]

[0.5047325 ]

[0.4950071 ]

[0.51283699]

[0.48420111]

[0.51499819]

[0.48636231]

[0.51715939]

[0.51499819]

[0.48420111]

[0.4987892 ]

[0.45988763]

[0.51607879]

[0.50527279]

[0.48960411]

[0.44962193]

[0.48474141]

[0.46961302]

[0.48690261]

[0.4977086 ]

[0.4923056 ]

[0.51121609]

[0.4906847 ]

[0.50959519]

[0.4977086 ]

[0.4993295 ]

[0.51661909]

[0.5009504 ]

[0.45232343]

[0.4944668 ]

[0.48906381]

[0.50527279]

[0.5041922 ]

[0.48474141]

[0.48312051]

[0.4977086 ]

[0.48420111]

[0.48420111]

[0.51067579]

[0.47231452]

[0.46366972]

[0.48744291]

[0.48636231]

[0.4998698 ]

[0.496628 ]

[0.47879811]

[0.48041901]

[0.4955474 ]

[0.45340403]

[0.47879811]

[0.44962193]

[0.4955474 ]

[0.46637122]

[0.45286373]

[0.48149961]

[0.4960877 ]

[0.51661909]

[0.50905489]

[0.5004101 ]

[0.45556523]

[0.51337729]

[0.502031 ]

[0.48528171]

[0.496628 ]

[0.51229669]

[0.45502493]

[0.50635339]

[0.51337729]

[0.46258912]

[0.4993295 ]

[0.48744291]

[0.48095931]

[0.48366081]

[0.48095931]

[0.4982489 ]

[0.4955474 ]

[0.48420111]

[0.51878029]

[0.48474141]

[0.45718613]

[0.4928459 ]

[0.45016223]

[0.46691152]

[0.46150852]

[0.48906381]

[0.51391759]

[0.51715939]

[0.5004101 ]

[0.44638013]

[0.51175639]

[0.46961302]

[0.50905489]

[0.4944668 ]

[0.51067579]

[0.48906381]

[0.50635339]

[0.50635339]

[0.46691152]

[0.51661909]

[0.51823999]

[0.47717721]

[0.4982489 ]

[0.4944668 ]]

In [27]:
from sklearn import metrics

print('Mean Absolute Error:', metrics.mean_absolute_error(test_y1, y1_predicted))

print('Mean Squared Error:', metrics.mean_squared_error(test_y1, y1_predicted))

print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(test_y1, y1_predicted)))

Mean Absolute Error: 0.4977105615001104

Mean Squared Error: 0.24813962286392463

Root Mean Squared Error: 0.4981361489230877

In [28]:
from sklearn.model_selection import cross_val_score

accuracy1 = cross_val_score(regress1, df2[['EstimatedSalary']], df2[['Gender']], cv = 5, scoring='r2')  # scoring argument truncated in source; 'r2' inferred from the printed scores


print(accuracy1)

[ 0.00712845 -0.04048316 0.00279302 -0.01524552 -0.02317653]

Comparison and Inference


On comparing the mean absolute error, mean squared error, and root mean squared error
of dataset 1 and dataset 2, we can conclude that dataset 1 fits the linear
regression model better than dataset 2.
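For reference, the three error metrics compared here are tightly related; a minimal sketch with made-up numbers (not the lab data) showing how each is computed:

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    errs = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errs) / len(errs)   # mean absolute error
    mse = sum(e * e for e in errs) / len(errs)    # mean squared error
    return mae, mse, sqrt(mse)                    # RMSE is just sqrt(MSE)

mae, mse, rmse = regression_metrics([1, 0, 2, 1], [1.2, 0.4, 1.5, 1.1])
print(mae, mse, rmse)  # 0.3, 0.115, ~0.339
```

MSE penalizes large errors more heavily than MAE, and RMSE rescales MSE back to the units of the target, which is why all three are reported together.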

Bayes Classification
In [29]:
from sklearn.model_selection import train_test_split

from sklearn.naive_bayes import CategoricalNB

Dataset 1:
In [30]:
import pandas as pd

In [31]:
df1=pd.read_csv('C:\\Users\\ADMIN\\Desktop\\E1-SWE4012\\Lab\\Bayes_dataset1.csv')

In [32]:
df1.head()

Out[32]: SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species

0 5.1 3.5 1.4 0.2 Iris-setosa

1 4.9 3.0 1.4 0.2 Iris-setosa

2 4.7 3.2 1.3 0.2 Iris-setosa

3 4.6 3.1 1.5 0.2 Iris-setosa

4 5.0 3.6 1.4 0.2 Iris-setosa


In [33]:
df1.isnull().sum()

Out[33]:
SepalLengthCm 0

SepalWidthCm 0

PetalLengthCm 0

PetalWidthCm 0

Species 0

dtype: int64

In [34]:
features=df1.columns.tolist()

features.remove('Species')

features

Out[34]:
['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']

In [35]:
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder

encoder=OrdinalEncoder()

data_encoded=encoder.fit_transform(df1[features])

df1_encoded=pd.DataFrame(data_encoded, columns=features)

In [36]:
encoder=LabelEncoder()

target_encoded=encoder.fit_transform(df1['Species'])

df1_encoded['Species']=target_encoded

encoder.inverse_transform(target_encoded)

Out[36]:
array(['Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-setosa', 'Iris-setosa',

'Iris-setosa', 'Iris-setosa', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-versicolor', 'Iris-versicolor', 'Iris-versicolor',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica', 'Iris-virginica',

'Iris-virginica', 'Iris-virginica'], dtype=object)
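LabelEncoder assigns integer codes in sorted order of the class names, which is why the inverse transform above recovers the species strings exactly. A minimal pure-Python stand-in for the same mapping (an illustration, not sklearn's implementation):

```python
labels = ['Iris-setosa', 'Iris-versicolor', 'Iris-virginica', 'Iris-setosa']

classes = sorted(set(labels))                   # alphabetical, as LabelEncoder does
to_code = {c: i for i, c in enumerate(classes)}

encoded = [to_code[l] for l in labels]          # [0, 1, 2, 0]
decoded = [classes[i] for i in encoded]         # round-trips back to the names
print(encoded, decoded == labels)
```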

In [37]:
df1_encoded.head()

Out[37]: SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species

0 8.0 14.0 4.0 1.0 0

1 6.0 9.0 4.0 1.0 0

2 4.0 11.0 3.0 1.0 0

3 3.0 10.0 5.0 1.0 0

4 7.0 15.0 4.0 1.0 0

In [38]:
X_train, X_test, y_train, y_test = train_test_split(df1_encoded.drop('Species',axis=1), df1_encoded['Species'], test_size=0.2)  # call truncated in source; 0.2 inferred from the printed 120/30 split

In [39]:
print(X_train.shape)

print(y_test.shape)

(120, 4)

(30,)

In [40]:
cnb = CategoricalNB()

In [41]:
cnb.fit(X_train, y_train)

y_predict = cnb.predict(X_test)

print(y_predict)

[2 1 0 2 0 2 0 1 1 1 1 1 1 2 1 0 1 1 0 0 2 1 0 0 2 0 0 1 1 0]
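Under the hood, CategoricalNB scores each class by its prior probability times the Laplace-smoothed per-feature category likelihoods, then picks the argmax. A rough pure-Python sketch of that idea on toy data (not sklearn's exact implementation):

```python
from math import log

def categorical_nb_predict(X_train, y_train, x, alpha=1.0):
    # Score each class c by log P(c) + sum_j log P(x_j | c),
    # with Laplace smoothing alpha on the per-category counts.
    classes = sorted(set(y_train))
    n_cats = [len(set(row[j] for row in X_train)) for j in range(len(x))]
    best, best_lp = None, float('-inf')
    for c in classes:
        rows = [r for r, yc in zip(X_train, y_train) if yc == c]
        lp = log(len(rows) / len(X_train))          # class prior
        for j, v in enumerate(x):
            count = sum(1 for r in rows if r[j] == v)
            lp += log((count + alpha) / (len(rows) + alpha * n_cats[j]))
        if lp > best_lp:
            best, best_lp = c, lp
    return best

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 0, 1, 1]
print(categorical_nb_predict(X, y, [0, 0]))  # 0
print(categorical_nb_predict(X, y, [1, 1]))  # 1
```

This is why the encoded (ordinal) features from the previous cells are required: the classifier counts category matches rather than treating the values as continuous.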

In [42]:
print("Number of mislabeled points out of a total %d points : %d"%(X_test.shape[0], (y_test != y_predict).sum()))

Number of mislabeled points out of a total 30 points : 2

In [43]:
from sklearn import metrics

print('Accuracy:', metrics.accuracy_score(y_test, y_predict))

print('Confusion Matrix\n',metrics.confusion_matrix(y_test, y_predict))

Accuracy: 0.9333333333333333

Confusion Matrix

[[11 0 0]

[ 0 12 1]

[ 0 1 5]]

Dataset 2
In [44]:
df2=pd.read_csv('C:\\Users\\ADMIN\\Desktop\\E1-SWE4012\\Lab\\Bayes_dataset2.csv')

In [45]:
df2.head()

Out[45]: Age EstimatedSalary Purchased Gender

0 19 19000 0 Male

1 35 20000 0 Male

2 26 43000 0 Female

3 27 57000 0 Female

4 19 76000 0 Male

In [46]:
df2.isnull().sum()

Out[46]:
Age 0

EstimatedSalary 0

Purchased 0

Gender 0

dtype: int64

In [47]:
features=df2.columns.tolist()

features.remove('Gender')

features

Out[47]:
['Age', 'EstimatedSalary', 'Purchased']

In [48]:
from sklearn.preprocessing import OrdinalEncoder, LabelEncoder

encoder=OrdinalEncoder()

data_encoded=encoder.fit_transform(df2[features])

df2_encoded=pd.DataFrame(data_encoded, columns=features)

In [49]:
encoder=LabelEncoder()

target_encoded=encoder.fit_transform(df2['Gender'])

df2_encoded['Gender']=target_encoded

encoder.inverse_transform(target_encoded)

Out[49]:
array(['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female',

'Female', 'Male', 'Female', 'Female', 'Female', 'Male', 'Male',

'Male', 'Male', 'Male', 'Male', 'Male', 'Female', 'Male', 'Female',

'Male', 'Female', 'Male', 'Male', 'Male', 'Female', 'Male', 'Male',

'Male', 'Female', 'Female', 'Female', 'Male', 'Male', 'Female',

'Male', 'Female', 'Female', 'Female', 'Female', 'Male', 'Male',

'Female', 'Male', 'Male', 'Female', 'Male', 'Female', 'Female',

'Female', 'Female', 'Female', 'Female', 'Female', 'Female', 'Male',

'Male', 'Female', 'Male', 'Male', 'Female', 'Male', 'Female',

'Male', 'Male', 'Female', 'Female', 'Female', 'Male', 'Female',

'Female', 'Female', 'Male', 'Male', 'Male', 'Female', 'Female',

'Female', 'Male', 'Male', 'Male', 'Male', 'Female', 'Female',

'Male', 'Female', 'Male', 'Male', 'Male', 'Female', 'Male',

'Female', 'Female', 'Female', 'Female', 'Male', 'Male', 'Female',

'Male', 'Male', 'Female', 'Female', 'Female', 'Male', 'Female',

'Male', 'Male', 'Female', 'Female', 'Female', 'Male', 'Male',

'Male', 'Male', 'Male', 'Male', 'Male', 'Male', 'Female', 'Male',

'Female', 'Male', 'Female', 'Female', 'Male', 'Male', 'Male',

'Female', 'Male', 'Male', 'Male', 'Female', 'Female', 'Male',

'Female', 'Male', 'Female', 'Male', 'Male', 'Female', 'Male',

'Male', 'Female', 'Female', 'Female', 'Female', 'Male', 'Male',

'Female', 'Male', 'Male', 'Female', 'Male', 'Female', 'Male',

'Male', 'Male', 'Female', 'Male', 'Male', 'Female', 'Male',

'Female', 'Female', 'Female', 'Female', 'Male', 'Female', 'Male',

'Male', 'Female', 'Female', 'Female', 'Female', 'Female', 'Male',

'Male', 'Female', 'Male', 'Female', 'Female', 'Male', 'Female',

'Male', 'Female', 'Female', 'Male', 'Male', 'Male', 'Female',

'Male', 'Male', 'Male', 'Male', 'Female', 'Female', 'Male', 'Male',

'Male', 'Male', 'Female', 'Female', 'Female', 'Female', 'Female',

'Female', 'Female', 'Female', 'Female', 'Male', 'Female', 'Male',

'Male', 'Female', 'Male', 'Male', 'Female', 'Male', 'Female',

'Male', 'Male', 'Male', 'Female', 'Male', 'Female', 'Male',

'Female', 'Female', 'Female', 'Male', 'Male', 'Male', 'Female',

'Male', 'Male', 'Female', 'Female', 'Female', 'Male', 'Male',

'Female', 'Female', 'Female', 'Female', 'Female', 'Female', 'Male',

'Female', 'Female', 'Male', 'Female', 'Female', 'Female', 'Female',

'Female', 'Male', 'Female', 'Female', 'Female', 'Male', 'Female',

'Female', 'Male', 'Female', 'Male', 'Male', 'Female', 'Male',

'Female', 'Female', 'Male', 'Male', 'Female', 'Male', 'Male',

'Male', 'Female', 'Female', 'Female', 'Male', 'Male', 'Female',

'Male', 'Female', 'Female', 'Female', 'Male', 'Female', 'Male',

'Male', 'Male', 'Male', 'Female', 'Female', 'Male', 'Female',

'Male', 'Male', 'Female', 'Male', 'Female', 'Male', 'Female',

'Male', 'Female', 'Female', 'Male', 'Female', 'Female', 'Male',

'Female', 'Female', 'Female', 'Female', 'Female', 'Male', 'Male',

'Male', 'Female', 'Female', 'Male', 'Female', 'Female', 'Female',

'Male', 'Female', 'Male', 'Female', 'Male', 'Female', 'Male',

'Male', 'Male', 'Female', 'Male', 'Male', 'Female', 'Male',

'Female', 'Male', 'Female', 'Female', 'Male', 'Female', 'Male',

'Female', 'Male', 'Male', 'Female', 'Male', 'Female', 'Female',

'Male', 'Male', 'Male', 'Female', 'Male', 'Male', 'Male', 'Female',

'Female', 'Female', 'Male', 'Female', 'Female', 'Male', 'Male',

'Female', 'Female', 'Male', 'Female', 'Male', 'Female', 'Female',

'Female', 'Female', 'Male', 'Female', 'Male', 'Male', 'Female',

'Male', 'Female', 'Male', 'Female', 'Male', 'Male', 'Female',

'Male', 'Male', 'Female', 'Male', 'Female', 'Female', 'Male',

'Female', 'Male', 'Female'], dtype=object)

In [50]:
df2_encoded.head()

Out[50]: Age EstimatedSalary Purchased Gender

0 1.0 4.0 0.0 1

1 17.0 5.0 0.0 1

2 8.0 26.0 0.0 0

3 9.0 39.0 0.0 0

4 1.0 57.0 0.0 1

In [51]:
X_train, X_test, y_train, y_test = train_test_split(df2_encoded.drop('Gender',axis=1), df2_encoded['Gender'], test_size=0.2)  # call truncated in source; 0.2 inferred from the printed 320/80 split

In [52]:
print(X_train.shape)

print(y_test.shape)

(320, 3)

(80,)

In [53]:
cnb = CategoricalNB()

In [54]:
cnb.fit(X_train, y_train)

y_predict = cnb.predict(X_test)

print(y_predict)

[1 0 1 1 0 0 1 1 0 1 1 0 0 1 0 1 0 1 1 1 0 1 0 0 0 1 1 1 0 1 1 1 0 0 1 0 1

1 1 0 0 1 0 1 0 0 0 0 0 1 1 1 1 1 1 0 0 0 1 0 1 1 1 0 1 0 1 0 0 1 1 1 0 0

0 0 1 1 1 1]

In [55]:
print("Number of mislabeled points out of a total %d points : %d"%(X_test.shape[0], (y_test != y_predict).sum()))

Number of mislabeled points out of a total 80 points : 42

In [56]:
from sklearn import metrics

print('Accuracy:', metrics.accuracy_score(y_test, y_predict))

print('Confusion Matrix\n',metrics.confusion_matrix(y_test, y_predict))

Accuracy: 0.475

Confusion Matrix

[[17 23]

[19 21]]

Comparison and Inference


Accuracy for dataset 1 is 0.9333, while accuracy for dataset 2 is 0.475. This shows
that the Bayes classifier predicts dataset 1 far more accurately than dataset 2.
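Both accuracies can be recovered directly from the confusion matrices printed above, since correct predictions sit on the diagonal:

```python
cm1 = [[11, 0, 0], [0, 12, 1], [0, 1, 5]]  # dataset 1 confusion matrix
cm2 = [[17, 23], [19, 21]]                 # dataset 2 confusion matrix

def accuracy_from_cm(cm):
    # accuracy = (sum of diagonal) / (sum of all cells)
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

print(accuracy_from_cm(cm1), accuracy_from_cm(cm2))  # ~0.9333 and 0.475
```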

In [ ]:

