
UNIVERSITY OF MUMBAI
DEPARTMENT OF COMPUTER SCIENCE

M.Sc. Computer Science with Spl. in Data Science – Semester II


ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
JOURNAL
2021-2022

Seat No.


UNIVERSITY OF MUMBAI
DEPARTMENT OF COMPUTER SCIENCE

CERTIFICATE
This is to certify that the work entered in this journal was done in the University
Department of Computer Science laboratory by
Mr./Ms. ARCHANA SUKUMARAN NAIR
Seat No. for the course of M.Sc.
Computer Science with Spl. in Data Science - Semester II (CBCS) (Revised)
during the academic year 2021-2022 in a satisfactory manner.

Subject In-charge Head of Department

External Examiner


Index

Sr. no. Name of the practical Page No. Sign

1 A) MINI-MAX
  B) ALPHA-BETA PRUNING
2 A) BINARY CLASSIFICATION
  B) MULTI-CLASS CLASSIFICATION
3 A) LINEAR REGRESSION
  B) POLYNOMIAL REGRESSION
4 FIND-S ALGORITHM
5 A) DECISION TREE ALGORITHM
  B) RANDOM FOREST ALGORITHM
6 SUPPORT VECTOR MACHINE
7 A) K-NEAREST NEIGHBOUR ALGORITHM
  B) K-MEANS ALGORITHM
8 A) NAIVE BAYES AND GAUSSIAN NAIVE BAYES MODEL
  B) LOGISTIC REGRESSION


Practical 1a:
Aim: Write a program to implement Mini-Max Algorithm.

Theory:
The mini-max algorithm is a kind of backtracking algorithm used in decision
making and game theory, assuming that both players play optimally. The two
players are called the maximizer and the minimizer: the maximizer tries to
obtain the highest score possible, while the minimizer tries to do the
opposite and obtain the lowest.

Algorithm: function minimax(node, depth, maximizingPlayer)

if depth == 0 or node is a terminal node then
    return the static evaluation of node

if maximizingPlayer then                  (for the maximizer)
    maxEva = -infinity
    for each child of node do
        eva = minimax(child, depth-1, false)
        maxEva = max(maxEva, eva)         (keeps the maximum of the values)
    return maxEva
else                                      (for the minimizer)
    minEva = +infinity
    for each child of node do
        eva = minimax(child, depth-1, true)
        minEva = min(minEva, eva)         (keeps the minimum of the values)
    return minEva

Tree based Example:

Code:
Input
Q. Write a program to implement Min-max Algorithm
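The notebook's code cells for this practical were captured as images and did not survive extraction; the following is a minimal Python sketch of the pseudocode above, on a hypothetical complete binary game tree whose leaf scores are supplied as a list.

import math

def minimax(node, depth, maximizing_player, scores, height):
    # reached a leaf of the game tree: return its static score
    if depth == height:
        return scores[node]
    if maximizing_player:
        return max(minimax(node * 2, depth + 1, False, scores, height),
                   minimax(node * 2 + 1, depth + 1, False, scores, height))
    else:
        return min(minimax(node * 2, depth + 1, True, scores, height),
                   minimax(node * 2 + 1, depth + 1, True, scores, height))

scores = [3, 5, 2, 9, 12, 5, 23, 23]  # hypothetical leaf values
height = int(math.log2(len(scores)))  # depth of the complete binary tree (3 here)
print("The optimal value is:", minimax(0, 0, True, scores, height))  # prints 12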

Practical 1B:
1. B) Write a program to implement Alpha-Beta Pruning using the Mini-Max Algorithm
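As in Practical 1a, the original code here was an image; the following is a minimal sketch of the same minimax search with alpha-beta pruning, again over a hypothetical list of leaf scores. Alpha tracks the best value the maximizer can guarantee so far and beta the best for the minimizer; a subtree is pruned as soon as beta <= alpha.

MIN, MAX = float('-inf'), float('inf')

def alphabeta(node, depth, maximizing_player, scores, height, alpha, beta):
    if depth == height:            # reached a leaf
        return scores[node]
    if maximizing_player:
        best = MIN
        for i in range(2):         # two children per node
            val = alphabeta(node * 2 + i, depth + 1, False, scores, height, alpha, beta)
            best = max(best, val)
            alpha = max(alpha, best)
            if beta <= alpha:      # prune the remaining children
                break
        return best
    else:
        best = MAX
        for i in range(2):
            val = alphabeta(node * 2 + i, depth + 1, True, scores, height, alpha, beta)
            best = min(best, val)
            beta = min(beta, best)
            if beta <= alpha:
                break
        return best

scores = [3, 5, 6, 9, 1, 2, 0, -1]  # hypothetical leaf values
print("The optimal value is:", alphabeta(0, 0, True, scores, 3, MIN, MAX))  # prints 5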


Practical 2A:
Q. Write a program to input a dataset and perform binary classification.


CODE:
In [1]:
# A) AIM : WAP to input dataset and perform Binary classification.
# Evaluate the model based on classification metrics and infer your result.

In [2]:

#import libraries

In [3]:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

In [4]:

#load data

In [5]:

titanic_data=pd.read_csv('C:/Users/archa/Data Science/Semester 2/AI & ML/Practicals/titanic

In [6]:

len(titanic_data)

Out[6]:

891

titanic_data.head()

Out[77]:

PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare

0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500


In [78]:

titanic_data.index

Out[78]:

RangeIndex(start=0, stop=891, step=1)

In [79]:

titanic_data.columns

Out[79]:

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',


'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
dtype='object')

In [80]:

titanic_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB

In [81]:

titanic_data.dtypes

Out[81]:

PassengerId int64
Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
dtype: object

In [82]:

titanic_data.describe()

Out[82]:

PassengerId Survived Pclass Age SibSp Parch Fare

count 891.000000 891.000000 891.000000 714.000000 891.000000 891.000000 891.000000


mean 446.000000 0.383838 2.308642 29.699118 0.523008 0.381594 32.204208

std 257.353842 0.486592 0.836071 14.526497 1.102743 0.806057 49.693429

min 1.000000 0.000000 1.000000 0.420000 0.000000 0.000000 0.000000

25% 223.500000 0.000000 2.000000 20.125000 0.000000 0.000000 7.910400

50% 446.000000 0.000000 3.000000 28.000000 0.000000 0.000000 14.454200

75% 668.500000 1.000000 3.000000 38.000000 1.000000 0.000000 31.000000

max 891.000000 1.000000 3.000000 80.000000 8.000000 6.000000 512.329200

DATA ANALYSIS
In [83]:

#Data Analysis
#Import Seaborn for visually analysing the data

#Find out how many survived vs died using the countplot method of seaborn

In [84]:

sns.countplot(x='Survived',data=titanic_data)

Out[84]:

<AxesSubplot:xlabel='Survived', ylabel='count'>

In [85]:

In [86]:

sns.countplot(x='Survived',data=titanic_data,hue='Sex')

Out[86]:

<AxesSubplot:xlabel='Survived', ylabel='count'>

In [87]:

#Check for null

In [88]:

titanic_data.isna()

Out[88]:

PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin

0 False False False False False False False False False False True
1 False False False False False False False False False False False

2 False False False False False False False False False False True

3 False False False False False False False False False False False

4 False False False False False False False False False False True

... ... ... ... ... ... ... ... ... ... ... ...

886 False False False False False False False False False False True

887 False False False False False False False False False False False

888 False False False False False True False False False False True

889 False False False False False False False False False False False

890 False False False False False False False False False False True

891 rows × 12 columns

In [89]:

#Check how many values are null


In [90]:

titanic_data.isna().sum()

Out[90]:

PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64
In [91]:

#find the distribution for the age column


In [92]:

sns.displot(x='Age',data=titanic_data)

Out[92]:

<seaborn.axisgrid.FacetGrid at 0x1758099fe20>

DATA CLEANING

In [93]:

#fill age column


In [94]:

titanic_data['Age'].fillna(titanic_data['Age'].mean(),inplace=True)
In [95]:

#verify null value


In [96]:

titanic_data['Age'].isna().sum()

Out[96]:

0
In [97]:

#Drop cabin column


In [98]:

titanic_data.drop('Cabin',axis=1,inplace=True)
In [99]:

#see the contents of the data


In [100]:

titanic_data.head()

Out[100]:

PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare

0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500


Preparing Data for Model


In [101]:

#Check for the non-numeric column

In [102]:

titanic_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 891 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(4)
memory usage: 76.7+ KB

In [135]:

titanic_data.size

Out[135]:

7128

In [103]:

# We can see Name, Sex, Ticket and Embarked are non-numerical. Name, Embarked and Ticket num

In [104]:

#convert sex column to numerical values

In [105]:

gender=pd.get_dummies(titanic_data['Sex'],drop_first=True)

In [106]:

titanic_data['Gender']=gender

In [107]:

titanic_data.head()

Out[107]:

PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare

0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500

In [108]:

#drop the columns which are not required

In [109]:

titanic_data.drop(['Name','Sex','Ticket','Embarked'],axis=1,inplace=True)

In [110]:

titanic_data.head()

Out[110]:

PassengerId Survived Pclass Age SibSp Parch Fare Gender

0 1 0 3 22.0 1 0 7.2500 1


1 2 1 1 38.0 1 0 71.2833 0

2 3 1 3 26.0 0 0 7.9250 0

3 4 1 1 35.0 1 0 53.1000 0

4 5 0 3 35.0 0 0 8.0500 1

In [111]:

#Seperate Dependent and Independent variables

In [112]:

x=titanic_data[['PassengerId','Pclass','Age','SibSp','Parch','Fare','Gender']]
y=titanic_data['Survived']

In [113]:

y

Out[113]:

0 0
1 1
2 1
3 1
4 0
..
886 0
887 1
888 0
889 1
890 0
Name: Survived, Length: 891, dtype: int64

DATA MODELLING
In [114]:

#Building Model using Logistic Regression


#import train test split method

In [115]:

from sklearn.model_selection import train_test_split

In [116]:

#train test split

In [117]:

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)

In [118]:

#import Logistic Regression

In [119]:

from sklearn.linear_model import LogisticRegression

In [120]:

#Fit Logistic Regression

In [121]:

lr=LogisticRegression()

In [122]:

lr.fit(x_train,y_train)

C:\Users\archa\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Out[122]:

LogisticRegression()
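The fix the warning suggests, raising max_iter (e.g. LogisticRegression(max_iter=1000)) or standard-scaling the features before fitting, would silence it; the partially converged model is still used for prediction below.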

In [123]:

#predict

In [124]:

predict=lr.predict(x_test)

In [125]:

#print confusion matrix

In [126]:

from sklearn.metrics import confusion_matrix

In [127]:

pd.DataFrame(confusion_matrix(y_test,predict),columns=['Predicted No','Predicted Yes'],inde

Out[127]:

Predicted No Predicted Yes

Actual No 152 23
Actual Yes 37 83

In [128]:

#import classification report

In [129]:

from sklearn.metrics import classification_report

In [130]:

print(classification_report(y_test,predict))

precision recall f1-score support

0 0.80 0.87 0.84 175


1 0.78 0.69 0.73 120

accuracy 0.80 295


macro avg 0.79 0.78 0.78 295
weighted avg 0.80 0.80 0.79 295

In [131]:

PRACTICAL 2B: MULTICLASS CLASSIFICATION

CODE:
In [ ]:

# NAME : Archana Nair


# SUBJECT : Artificial Intelligence & Machine Learning
# COURSE : M.Sc. Computer Science with Specialization in Data Science
# A) AIM : WAP to input dataset and perform Multiclass classification.
# Evaluate the model based on classification metrics and infer your result.

In [ ]:

#import libraries

In [1]:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [ ]:

In [2]:
dataset=pd.read_csv('C:/Users/archa/Data Science/Semester 2/AI & ML/Practicals/Social_Netwo

In [4]:

dataset.head()

Out[4]:

User ID Gender Age EstimatedSalary Purchased

0 15624510 Male 19 19000 0


1 15810944 Male 35 20000 0

2 15668575 Female 26 43000 0

3 15603246 Female 27 57000 0

4 15804002 Male 19 76000 0

In [76]:

len(dataset)

Out[76]:

400

In [6]:

dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 User ID 400 non-null int64
1 Gender 400 non-null object
2 Age 400 non-null int64
3 EstimatedSalary 400 non-null int64
4 Purchased 400 non-null int64
dtypes: int64(4), object(1)
memory usage: 15.8+ KB

In [71]:

dataset.shape

Out[71]:

(400, 5)

In [8]:

dataset.index

Out[8]:

RangeIndex(start=0, stop=400, step=1)

In [10]:

dataset.columns

Out[10]:

Index(['User ID', 'Gender', 'Age', 'EstimatedSalary', 'Purchased'], dtype='object')

In [74]:

dataset.dtypes

Out[74]:

User ID int64
Gender object
Age int64
EstimatedSalary int64
Purchased int64
dtype: object

In [12]:

X=dataset.iloc[:,1:4]

In [13]:

X=pd.get_dummies(X)

In [14]:

X=X.values

In [15]:

X

Out[15]:
array([[ 19, 19000, 0, 1],
[ 35, 20000, 0, 1],
[ 26, 43000, 1, 0],
...,
[ 50, 20000, 1, 0],
[ 36, 33000, 0, 1],
[ 49, 36000, 1, 0]], dtype=int64)

DATA ANALYSIS
In [17]:

sns.jointplot(x='Age',y='EstimatedSalary',data=dataset, hue = 'Purchased', kind= 'scatter')


In [19]:

sns.jointplot(x='Age',y='EstimatedSalary',data=dataset, hue = 'Purchased', kind= 'hist');

DATA MODELLING

In [20]:

#Splitting the dataset into the Train set and Test set

In [21]:

X = dataset.iloc[:, [2, 3]].values


y = dataset.iloc[:, -1].values

In [22]:

#Splitting the dataset into the Train set and Test set

In [36]:

from sklearn.model_selection import train_test_split


x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state =

In [25]:

#Feature Scaling

In [41]:

from sklearn.preprocessing import StandardScaler


sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

In [60]:

from sklearn.neighbors import KNeighborsClassifier

In [61]:

KNN= KNeighborsClassifier(n_neighbors=5,
weights='uniform',
algorithm='kd_tree',
leaf_size=30,
p=2,
metric='minkowski',
n_jobs=-1)

In [62]:

KNN.fit(x_train,y_train)

Out[62]:

KNeighborsClassifier(algorithm='kd_tree', n_jobs=-1)

In [75]:

KNN.predict(x_test)

Out[75]:

array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1,
1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1], dtype=int64)

In [63]:

y_pred=KNN.predict(x_test)

In [64]:

from sklearn.metrics import classification_report

print(classification_report(y_test.reshape(-1,1),y_pred))

precision recall f1-score support

0 0.96 0.94 0.95 68


1 0.88 0.91 0.89 32

accuracy 0.93 100


macro avg 0.92 0.92 0.92 100
weighted avg 0.93 0.93 0.93 100

In [65]:

from sklearn.model_selection import cross_val_score

In [66]:

print('Cross val',cross_val_score(KNN,y_test.reshape(-1,1),y_pred,cv=10))
print('Cross val',np.mean(cross_val_score(KNN,y_test.reshape(-1,1),y_pred,)))

Cross val [0.8 1. 1. 0.9 0.9 1. 1. 1. 0.8 0.9]


Cross val 0.93


PRACTICAL 3A: LINEAR REGRESSION


CODE:

# A) AIM : WAP to implement Linear Regression

In [2]:

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score,mean_squared_error

In [3]:

df = pd.read_csv('C:/Users/archa/Downloads/vgsales.csv')
df.head()

Out[3]:

Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales O

0 1 Wii Sports Wii 2006.0 Sports Nintendo 41.49 29.02 3.77
1 2 Super Mario Bros. NES 1985.0 Platform Nintendo 29.08 3.58 6.81
2 3 Mario Kart Wii Wii 2008.0 Racing Nintendo 15.85 12.88 3.79
3 4 Wii Sports Resort Wii 2009.0 Sports Nintendo 15.75 11.01 3.28
4 5 Pokemon Red/Pokemon Blue GB 1996.0 Role-Playing Nintendo 11.27 8.89 10.22


In [4]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Rank 16598 non-null int64
1 Name 16598 non-null object
2 Platform 16598 non-null object
3 Year 16327 non-null float64
4 Genre 16598 non-null object
5 Publisher 16540 non-null object
6 NA_Sales 16598 non-null float64
7 EU_Sales 16598 non-null float64
8 JP_Sales 16598 non-null float64
9 Other_Sales 16598 non-null float64
10 Global_Sales 16598 non-null float64
dtypes: float64(6), int64(1), object(4)

In [5]:

df.describe()

Out[5]:

Rank Year NA_Sales EU_Sales JP_Sales Other_Sales G

count 16598.000000 16327.000000 16598.000000 16598.000000 16598.000000 16598.000000 16


mean 8300.605254 2006.406443 0.264667 0.146652 0.077782 0.048063

std 4791.853933 5.828981 0.816683 0.505351 0.309291 0.188588

min 1.000000 1980.000000 0.000000 0.000000 0.000000 0.000000

25% 4151.250000 2003.000000 0.000000 0.000000 0.000000 0.000000

50% 8300.500000 2007.000000 0.080000 0.020000 0.000000 0.010000

75% 12449.750000 2010.000000 0.240000 0.110000 0.040000 0.040000

max 16600.000000 2020.000000 41.490000 29.020000 10.220000 10.570000

In [6]:

df.isnull().sum()

Out[6]:

Rank 0
Name 0
Platform 0
Year 271
Genre 0
Publisher 58
NA_Sales 0
EU_Sales 0
JP_Sales 0
Other_Sales 0
Global_Sales 0
dtype: int64

In [7]:

df.drop(["Rank","Name","Year","Publisher"],axis=1,inplace=True)
df.head()

Out[7]:

Platform Genre NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales

0 Wii Sports 41.49 29.02 3.77 8.46 82.74


1 NES Platform 29.08 3.58 6.81 0.77 40.24

2 Wii Racing 15.85 12.88 3.79 3.31 35.82

3 Wii Sports 15.75 11.01 3.28 2.96 33.00

4 GB Role-Playing 11.27 8.89 10.22 1.00 31.37

In [8]:
dums = pd.get_dummies(df[["Platform","Genre"]])
dums.head()

Out[8]:

Platform_2600 Platform_3DO Platform_3DS Platform_DC Platform_DS Platform_GB Platfor

0 0 0 0 0 0 0
1 0 0 0 0 0 0

2 0 0 0 0 0 0

3 0 0 0 0 0 0

4 0 0 0 0 0 1

5 rows × 43 columns

In [9]:

dums.drop(["Platform_2600","Genre_Misc"],axis=1,inplace=True)

In [10]:

final_df= pd.concat([df,dums],axis=1)
final_df.drop(["Platform","Genre"],axis=1,inplace=True)
final_df.head()

Out[10]:

NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Platform_3DO Platform_3DS Pla

0 41.49 29.02 3.77 8.46 82.74 0 0


1 29.08 3.58 6.81 0.77 40.24 0 0

2 15.85 12.88 3.79 3.31 35.82 0 0

3 15.75 11.01 3.28 2.96 33.00 0 0

4 11.27 8.89 10.22 1.00 31.37 0 0

5 rows × 46 columns

In [11]:

import seaborn as sns


import matplotlib.pyplot as plt
g = sns.regplot(final_df.Global_Sales,final_df.EU_Sales,ci=None,scatter_kws= {"color":"r","
plt.xlim(-2,85)
plt.ylim(bottom=0)


C:\Users\archa\anaconda3\lib\site-packages\seaborn\_decorators.py:36: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.

(0.0, 30.471405021832812)

In [12]:

final_df.EU_Sales[df.EU_Sales>15]
#this value is in index 0.

Out[12]:

0 29.02
Name: EU_Sales, dtype: float64
In [13]:

df_outlier = final_df.drop([0],axis=0)

In [14]:

import matplotlib.pyplot as plt


g = sns.regplot(df_outlier.Global_Sales,df_outlier.EU_Sales,ci=None,scatter_kws= {"color":"
plt.xlim(-2,45)
plt.ylim(bottom=0)

C:\Users\archa\anaconda3\lib\site-packages\seaborn\_decorators.py:36: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.


(0.0, 13.524113383535223)

In [15]:

x = df_outlier[["EU_Sales"]]
y = df_outlier["Global_Sales"]

In [16]:

reg = LinearRegression()
model = reg.fit(x,y)

In [17]:

model.score(x,y)

In [18]:
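(The body of this cell was lost in extraction; a roughly 70/30 split like the following sketch, with a hypothetical random_state, reproduces the shapes printed in the next cell.)

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=42)  # random_state assumed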

In [19]:

print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

(11617, 1)
(11617,)
(4980, 1)
(4980,)

In [20]:

lm = LinearRegression()
model = lm.fit(x_train,y_train)

In [21]:

from sklearn.metrics import mean_squared_error


y_pred = model.predict(x_test)
np.sqrt(mean_squared_error(y_test,y_pred))

PRACTICAL 3B: POLYNOMIAL REGRESSION


CODE:

In [ ]:

# NAME : Archana Nair


# SUBJECT : Artificial Intelligence & Machine Learning
# COURSE : M.Sc. Computer Science with Specialization in Data Science
# A) AIM : WAP to implement Polynomial Regression

In [1]:

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

In [3]:

#importing datasets
data_set= pd.read_csv('C:/Users/archa/Downloads/Position_Salaries.csv')
data_set.head()

Out[3]:

Position Level Salary

0 Business Analyst 1 45000


1 Junior Consultant 2 50000

2 Senior Consultant 3 60000

3 Manager 4 80000

4 Country Manager 5 110000

In [4]:

#Extracting Independent and dependent Variable


x= data_set.iloc[:, 1:2].values
y= data_set.iloc[:, 2].values

In [5]:

#Fitting the Linear Regression to the dataset


from sklearn.linear_model import LinearRegression
lin_regs= LinearRegression()
lin_regs.fit(x,y)

Out[5]:

LinearRegression()

In [6]:

#Fitting the Polynomial regression to the dataset


from sklearn.preprocessing import PolynomialFeatures
poly_regs= PolynomialFeatures(degree= 2)
x_poly= poly_regs.fit_transform(x)
lin_reg_2 =LinearRegression()
lin_reg_2.fit(x_poly, y)

Out[6]:

LinearRegression()

In [7]:

#Visualizing the result for the Linear Regression model


mtp.scatter(x,y,color="blue")
mtp.plot(x,lin_regs.predict(x), color="red")
mtp.title("Bluff detection model(Linear Regression)")
mtp.xlabel("Position Levels")
mtp.ylabel("Salary")
mtp.show()


In [8]:

#Visualizing the result for Polynomial Regression


mtp.scatter(x,y,color="blue")
mtp.plot(x, lin_reg_2.predict(poly_regs.fit_transform(x)), color="red")
mtp.title("Bluff detection model(Polynomial Regression)")
mtp.xlabel("Position Levels")
mtp.ylabel("Salary")
mtp.show()

In [10]:

lin_pred = lin_regs.predict([[6.5]])
print(lin_pred)

[330378.78787879]

In [11]:

poly_pred = lin_reg_2.predict(poly_regs.fit_transform([[6.5]]))
print(poly_pred)

[189498.10606061]

PRACTICAL 4:
FIND-S ALGORITHM FOR FINDING HYPOTHESIS BASED ON TRAINING SAMPLES.


CODE:
In [ ]:

# A) AIM : WAP to implement the Find-S Algorithm

In [8]:

import pandas as pd
import numpy as np

In [16]:

d = pd.read_csv("C:/Users/archa/Downloads/ws.csv")

d.head()

Out[16]:

Sunny Warm Normal Strong Warm.1 Same Yes

0 Sunny Warm High Strong Warm Same Yes


1 Rainy Cold High Strong Warm Change No

2 Sunny Warm High Strong Cool Change Yes

In [17]:

t = np.array(d)[:,-1]
print("The target is: ",t)

The target is: ['Yes' 'No' 'Yes']

In [18]:

def fun(c, t):
    # take the first positive example as the initial specific hypothesis
    for i, val in enumerate(t):
        if val == "Yes":
            specific_hypothesis = c[i].copy()
            break
    # generalise any attribute that disagrees on a later positive example
    for i, val in enumerate(c):
        if t[i] == "Yes":
            for x in range(len(specific_hypothesis)):
                if val[x] != specific_hypothesis[x]:
                    specific_hypothesis[x] = '?'
    return specific_hypothesis

a = np.array(d)[:,:-1]  # attribute columns of the training data
print("The final hypothesis is:", fun(a, t))

PRACTICAL 5A: DECISION TREE

CODE:
In [33]:
# A) AIM : WAP to implement Decision Tree Algorithm

In [20]:

# importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [21]:
data = pd.read_csv("C:/Users/archa/Downloads/WineQuality.csv")
data.head()

Out[21]:

sugar chlorides free.sulfur.dioxide total.sulfur.dioxide density pH sulphates alcohol quality

1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5

2.6 0.098 25.0 67.0 0.9968 3.20 0.68 9.8 5

2.3 0.092 15.0 54.0 0.9970 3.26 0.65 9.8 5

1.9 0.075 17.0 60.0 0.9980 3.16 0.58 9.8 6

1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5

In [22]:

data.shape

Out[22]:

(1599, 13)
In [25]:
data.isna().sum()
Out[25]:
Unnamed: 0 0
fixed.acidity 0
volatile.acidity 0
citric.acid 0
residual.sugar 0
chlorides 0
free.sulfur.dioxide 0
total.sulfur.dioxide 0
density 0
pH 0
sulphates 0
alcohol 0
quality 0
dtype: int64

In [26]:

# creating X and y

X = data.drop(columns = 'quality')
y = data['quality']

In [27]:

# splitting data into training and testing data

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state =

In [28]:

# scaling our data

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [29]:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

Out[29]:

DecisionTreeClassifier()

In [30]:

y_pred = clf.predict(X_test)

In [31]:

clf.score(X_train, y_train)

Out[31]:

1.0

In [32]:

clf.score(X_test, y_test)

Out[32]:

0.6229166666666667
PRACTICAL 5B: RANDOM FOREST


CODE:
# A) AIM : WAP to implement Random Forest Algorithm

In [20]:

# importing libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [21]:

df = pd.read_csv("C:/Users/archa/Downloads/temps.csv")
df.head()

Out[21]:

year month day week temp_2 temp_1 average actual friend

0 2019 1 1 Fri 45 45 45.6 45 29


1 2019 1 2 Sat 44 45 45.7 44 61

2 2019 1 3 Sun 45 44 45.8 41 56

3 2019 1 4 Mon 44 41 45.9 40 53

4 2019 1 5 Tues 41 40 46.0 44 41

In [34]:

df.dtypes

Out[34]:

year int64
month int64
day int64
temp_2 int64
temp_1 int64
average float64
friend int64
week_Fri uint8
week_Mon uint8
week_Sat uint8
week_Sun uint8
week_Thurs uint8
week_Tues uint8
week_Wed uint8
dtype: object

In [22]:

# the shape of our features


df.shape

Out[22]:

(348, 9)
In [23]:

# column names
df.columns

Out[23]:

Index(['year', 'month', 'day', 'week', 'temp_2', 'temp_1', 'average', 'actual',
       'friend'], dtype='object')
In [24]:

# checking for null values


df.isnull().sum()

Out[24]:

year 0
month 0
day 0
week 0
temp_2 0
temp_1 0
average 0
actual 0
friend 0
dtype: int64

In [25]:

# One-hot encode categorical features


df = pd.get_dummies(df)
df.head(5)

Out[25]:

year month day temp_2 temp_1 average actual friend week_Fri week_Mon week_Sat

0 2019 1 1 45 45 45.6 45 29 1 0 0
1 2019 1 2 44 45 45.7 44 61 0 0 1

2 2019 1 3 45 44 45.8 41 56 0 0 0

3 2019 1 4 44 41 45.9 40 53 0 1 0

4 2019 1 5 41 40 46.0 44 41 0 0 0

In [26]:

print('Shape of features after one-hot encoding:', df.shape)

Shape of features after one-hot encoding: (348, 15)

In [27]:

# Labels are the values we want to predict


labels = df['actual']

# Remove the labels from the features


df = df.drop('actual', axis = 1)

# Saving feature names for later use


feature_list = list(df.columns)

In [35]:

# Using Skicit-learn to split data into training and testing sets


from sklearn.model_selection import train_test_split

# Split the data into training and testing sets


train_features, test_features, train_labels, test_labels = train_test_split(df, labels, tes

In [29]:

print('Training Features Shape:', train_features.shape)


print('Training Labels Shape:', train_labels.shape)
print('Testing Features Shape:', test_features.shape)
print('Testing Labels Shape:', test_labels.shape)

Training Features Shape: (278, 14)


Training Labels Shape: (278,)
Testing Features Shape: (70, 14)
Testing Labels Shape: (70,)

In [30]:

# Training the Forest


# Import the model we are using
from sklearn.ensemble import RandomForestRegressor

# Instantiate model
rf = RandomForestRegressor(n_estimators= 1000, random_state=42)

# Train the model on training data


rf.fit(train_features, train_labels);

In [31]:

#Make prediction on test data
# Use the forest's predict method on the test data
predictions = rf.predict(test_features)

# Calculate the absolute errors


errors = abs(predictions - test_labels)

# Print out the mean absolute error (mae)


print('Mean Absolute Error:', round(np.mean(errors), 2), 'degrees.')

Mean Absolute Error: 3.78 degrees.

In [32]:

# Calculate mean absolute percentage error (MAPE)


mape = 100 * (errors / test_labels)

# Calculate and display accuracy


accuracy = 100 - np.mean(mape)
print('Accuracy:', round(accuracy, 2), '%.')

Accuracy: 94.02 %.
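In symbols, with actual values y_i and predictions ŷ_i, MAPE = (100/n) · Σ |y_i − ŷ_i| / y_i, and the accuracy reported above is simply 100 − MAPE.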

PRACTICAL 6: SUPPORT VECTOR MACHINE (LSVM/Kernel SVM/Soft Margin SVM)

In [ ]:
#WAP to implement Support Vector Machine (LSVM/Kernel SVM/Soft Margin SVM)

#import libraries

In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:

#load data

In [4]:

df = pd.read_csv('C:/Users/archa/Data Science/Semester 2/AI & ML/Practicals/Social_Network_

In [5]:

df.head()

Out[5]:

User ID Gender Age EstimatedSalary Purchased

0 15624510 Male 19 19000 0
1 15810944 Male 35 20000 0
2 15668575 Female 26 43000 0
3 15603246 Female 27 57000 0
4 15804002 Male 19 76000 0

In [6]:

df.shape

Out[6]:

(400, 5)
In [7]:

df.info

Out[7]:

<bound method DataFrame.info of User ID Gender Age EstimatedSalary


Purchased
0 15624510 Male 19 19000 0
1 15810944 Male 35 20000 0
2 15668575 Female 26 43000 0
3 15603246 Female 27 57000 0
4 15804002 Male 19 76000 0
..        ...     ...  ...    ...  ...
395  15691863  Female   46  41000    1
396 15706071 Male 51 23000 1
397 15654296 Female 50 20000 1
398 15755018 Male 36 33000 0
399 15594041 Female 49 36000 1

[400 rows x 5 columns]>

In [11]:

df.columns

Out[11]:

Index(['User ID', 'Gender', 'Age', 'EstimatedSalary', 'Purchased'], dtype='o bject')

In [12]:

x = df.iloc[:,[2,3]]
y = df.iloc[:,4]

In [13]:

x.head()

Out[13]:

Age EstimatedSalary

0 19 19000
1 35 20000

2 26 43000

3 27 57000

4 19 76000

In [14]:

y.head()

Out[14]:

0 0
1 0
2 0
3 0
4 0
Name: Purchased, dtype: int64

In [77]:

#splitting the dataset into Training & Testing set

from sklearn.model_selection import train_test_split


x_train, x_test, y_train,y_test =train_test_split(x,y,test_size=0.75,random_state=0)

In [78]:

print("Training data:",x_train.shape)
print("Testing data",x_test.shape)

Training data: (100, 2)

Testing data (300, 2)


In [79]:

#feature scaling

from sklearn.preprocessing import StandardScaler


sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)

In [80]:

from sklearn.svm import SVC


classifier = SVC(kernel='linear',random_state=0)
classifier.fit(x_train,y_train)

Out[80]:

SVC(kernel='linear', random_state=0)
In [81]:

#predicting test set results


y_pred = classifier.predict(x_test)
y_pred

Out[81]:
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1,
0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0,
1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1,
1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1,
1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1,
0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0], dtype=int64)
In [82]:

from sklearn.metrics import accuracy_score


accuracy_score(y_test, y_pred)

Out[82]:

0.7966666666666666

In [83]:

#Plotting Data points


import matplotlib.pyplot as plt

plt.scatter(x_test[:, 0], x_test[:, 1], c=y_test)
plt.xlabel('Age')
plt.ylabel('Estimated Salary')


In [88]:

import matplotlib.pyplot as plt

from sklearn.svm import SVC


classifier = SVC(kernel='linear',random_state=0)
classifier.fit(x_train,y_train)

y_pred = classifier.predict(x_test)

#Plotting Data points


plt.scatter(x_test[:, 0], x_test[:, 1], c=y_test)

plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.title('Test data')

#Creating hyperplane
w = classifier.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(-2, 2)
yy = a * xx -(classifier.intercept_[0]) / w[1]
#Plot Hyperplane
plt.plot(xx, yy)
plt.show()
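The plotted line is the decision boundary w0·x + w1·y + b = 0 rearranged as y = −(w0/w1)·x − b/w1, which is exactly the slope a and the intercept term computed from classifier.coef_ and classifier.intercept_ above.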

In [55]:

from sklearn.metrics import classification_report


classification_report(y_test,y_pred)

C:\Users\archa\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1245: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
C:\Users\archa\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1245: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
C:\Users\archa\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1245: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))

Out[55]:

'              precision    recall  f1-score   support\n\n           0       0.00      0.00      0.00        68\n           1       0.32      1.00      0.48        32\n\n    accuracy                           0.32       100\n   macro avg       0.16      0.50      0.24       100\nweighted avg       0.10      0.32      0.16       100\n'

In [57]:

print(classification_report(y_test,y_pred))

precision recall f1-score support

0 0.00 0.00 0.00 68


1 0.32 1.00 0.48 32

accuracy 0.32 100


macro avg 0.16 0.50 0.24 100
weighted avg 0.10 0.32 0.16 100

PRACTICAL 7A: K-NEAREST NEIGHBOUR

CODE:

In [ ]:

# NAME : Archana Nair


# SUBJECT : Artificial Intelligence & Machine Learning
# COURSE : M.Sc. Computer Science with Specialization in Data Science
# A) AIM : WAP to implement KNN Algorithm
# Evaluate the model based on classification metrics and infer your result.

In [31]:

#Import python libraries


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')

In [32]:

#Load the dataset


df = pd.read_csv('C:/Users/archa/Downloads/diabetes.csv')

df.head()

Out[32]:

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunctio

0 6 148 72 35 0 33.6 0.62


1 1 85 66 29 0 26.6 0.35

2 8 183 64 0 0 23.3 0.67

3 1 89 66 23 94 28.1 0.16

4 0 137 40 35 168 43.1 2.28

In [33]:

df.shape

Out[33]:

(768, 9)
In [57]:

df.dtypes

Out[57]:

Pregnancies int64
Glucose int64
BloodPressure int64
SkinThickness int64
Insulin int64
BMI float64
DiabetesPedigreeFunction float64
Age int64
Outcome int64
dtype: object

In [34]:

X = df.drop('Outcome',axis=1).values
y = df['Outcome'].values

In [35]:

#importing train_test_split
from sklearn.model_selection import train_test_split

In [36]:

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.4,random_state=42, stratif

In [37]:

#import KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier

#Setup arrays to store training and test accuracies


neighbors = np.arange(1,9)
train_accuracy =np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

for i,k in enumerate(neighbors):

#Setup a knn classifier with k neighbors


knn = KNeighborsClassifier(n_neighbors=k)

#Fit the model


knn.fit(X_train, y_train)

#Compute accuracy on the training set


train_accuracy[i] = knn.score(X_train, y_train)

#Compute accuracy on the test set


test_accuracy[i] = knn.score(X_test, y_test)

In [38]:

#Plotting
plt.title('k-NN Varying number of neighbors')
plt.plot(neighbors, test_accuracy, label='Testing Accuracy')
plt.plot(neighbors, train_accuracy, label='Training accuracy')
plt.legend()
plt.xlabel('Number of neighbors')
plt.ylabel('Accuracy')
plt.show()


In [39]:

#Setup a knn classifier with k neighbors


knn = KNeighborsClassifier(n_neighbors=7)

In [40]:

#Fit the model


knn.fit(X_train,y_train)

Out[40]:

KNeighborsClassifier(n_neighbors=7)

In [41]:

knn.score(X_test,y_test)

Out[41]:

0.7305194805194806

In [42]:

#import confusion_matrix
from sklearn.metrics import confusion_matrix

In [43]:

y_pred = knn.predict(X_test)

In [44]:

confusion_matrix(y_test,y_pred)

Out[44]:

array([[165, 36],
[ 47, 60]], dtype=int64)

In [45]:

#import classification_report
from sklearn.metrics import classification_report

In [46]:

print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.78      0.82      0.80       201
           1       0.62      0.56      0.59       107

    accuracy                           0.73       308
   macro avg       0.70      0.69      0.70       308
weighted avg       0.73      0.73      0.73       308

In [47]:

y_pred_proba = knn.predict_proba(X_test)[:,1]

In [48]:

from sklearn.metrics import roc_curve

In [49]:

fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

In [50]:

plt.plot([0,1],[0,1],'k--')
plt.plot(fpr,tpr, label='Knn')
plt.xlabel('fpr')
plt.ylabel('tpr')
plt.title('Knn(n_neighbors=7) ROC curve')
plt.show()

In [51]:

#Area under ROC curve


from sklearn.metrics import roc_auc_score
roc_auc_score(y_test,y_pred_proba)

Out[51]:

0.7345050448691124

In [52]:

#import GridSearchCV
from sklearn.model_selection import GridSearchCV

In [53]:

#In case of classifier like knn the parameter to be tuned is n_neighbors


param_grid = {'n_neighbors':np.arange(1,50)}

In [54]:

knn = KNeighborsClassifier()
knn_cv= GridSearchCV(knn,param_grid,cv=5)
knn_cv.fit(X,y)

Out[54]:

GridSearchCV(cv=5, estimator=KNeighborsClassifier(),
param_grid={'n_neighbors': array([ 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])})

In [55]:

knn_cv.best_score_

Out[55]:

0.7578558696205755

In [56]:

knn_cv.best_params_

Out[56]:

{'n_neighbors': 14}

Thus a kNN classifier with 14 neighbours achieves the best cross-validated score/accuracy of 0.7578, i.e. about 76%.

PRACTICAL 7B: K-MEANS ALGORITHM


CODE:

In [ ]:
# B) AIM : WAP to implement KMeans Algorithm
# Evaluate the model based on classification metrics and infer your result.

In [5]:

#import libraries
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt

In [6]:

data = pd.read_csv("C:/Users/archa/Downloads/Mall_Customers.csv")
data.head()

Out[6]:

CustomerID Genre Age Annual Income (k$) Spending Score (1-100)

0 1 Male 19 15 39
1 2 Male 21 15 81

2 3 Female 20 16 6

3 4 Female 23 16 77

4 5 Female 31 17 40

In [7]:

data = data.drop(columns = ['CustomerID'])


data.head()

Out[7]:

Genre Age Annual Income (k$) Spending Score (1-100)

0 Male 19 15 39
1 Male 21 15 81

2 Female 20 16 6

3 Female 23 16 77

4 Female 31 17 40

In [11]:

data.shape

Out[11]:

(200, 4)

In [8]:

data.dtypes

Out[8]:

Genre object
Age int64
Annual Income (k$) int64
Spending Score (1-100) int64
dtype: object

In [9]:

#analyse missing values


data.isna().sum()

Out[9]:

Genre 0
Age 0
Annual Income (k$) 0
Spending Score (1-100) 0
dtype: int64
In [5]:

sns.pairplot(data)

Out[5]:

<seaborn.axisgrid.PairGrid at 0x20172d70730>

In [6]:

data.columns

Out[6]:

Index(['Genre', 'Age', 'Annual Income (k$)', 'Spending Score (1-100)'], dtype='object')
In [7]:

col = ['Genre', 'Age', 'Annual Income (k$)', 'Spending Score (1-100)']

for i in col:
    plt.figure(figsize =(5,3), dpi = 100)
    plt.hist(x = i, data = data)
    plt.xlabel(i)  # label each histogram with its own column name
    plt.show()

In [10]:

data['Genre'] = data['Genre'].map({'Male':1,'Female':2 }) # we can use get_dummies instead


data.head()

Out[10]:

Genre Age Annual Income (k$) Spending Score (1-100)

0 1 19 15 39
1 1 21 15 81

2 2 20 16 6

3 2 23 16 77

4 2 31 17 40

In [11]:

data.dtypes

Out[11]:

Genre int64
Age int64
Annual Income (k$) int64
Spending Score (1-100) int64
dtype: object

In [26]:

#scaling transformation
#1. z-score normalization using StandardScaler (zero mean, unit variance)
#2. Min-Max normalization using MinMaxScaler (0 to 1)
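A sketch of the Min-Max alternative named above (the notebook itself proceeds with StandardScaler on the income/spending columns; the variable name here is hypothetical):

from sklearn.preprocessing import MinMaxScaler

# rescales each column to the [0, 1] range instead of zero mean/unit variance
data_minmax = MinMaxScaler().fit_transform(data.iloc[:, 2:4])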

In [12]:

df_customer = data.iloc[:,2:4]
df_customer.head()

Out[12]:

Annual Income (k$) Spending Score (1-100)

0 15 39
1 15 81

2 16 6

3 16 77

4 17 40

In [13]:

from sklearn.preprocessing import StandardScaler
data_scaled = StandardScaler().fit_transform(df_customer)
data_scaled

Out[13]:

array([[-1.73899919, -0.43480148],
[-1.73899919, 1.19570407],
[-1.70082976, -1.71591298],
[-1.70082976, 1.04041783],
[-1.66266033, -0.39597992],
[-1.66266033, 1.00159627],
[-1.62449091, -1.71591298],
[-1.62449091, 1.70038436],
[-1.58632148, -1.83237767],
[-1.58632148, 0.84631002],
[-1.58632148, -1.4053405 ],
[-1.58632148, 1.89449216],
[-1.54815205, -1.36651894],
[-1.54815205, 1.04041783],
[-1.54815205, -1.44416206],
[-1.54815205, 1.11806095],
[-1.50998262, -0.59008772],
[-1.50998262, 0.61338066],

In [14]:

# Finding the optimal number of K


wcss= []
for i in range(2,11):
kmodel = KMeans(n_clusters = i, init = 'random')
kmodel.fit(data_scaled)
wcss.append(kmodel.inertia_)
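WCSS (within-cluster sum of squares) is what KMeans exposes as inertia_: the sum of squared distances from each point to its nearest cluster centre. The "elbow" in the plot below, where WCSS stops falling sharply, suggests the number of clusters to use; here it is 5.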

In [15]:

wcss

Out[15]:

[269.01679374906655,
157.70400815035939,
108.92131661364358,
65.56840815571681,
55.103778121150555,
44.86475569922555,
37.24321153347672,
33.85792110528426,
30.684270071530346]

In [16]:


#Plotting
plt.figure(figsize = (8,6), dpi=100)
plt.plot(range(2,11),wcss, marker = 'o', c='blue', markerfacecolor='red')
plt.xlabel('No of Clusters')
plt.ylabel('WCSS')
plt.show()

In [17]:

# Creating the final Kmeans model with no of clusters = 5


Kmodel_final = KMeans(n_clusters = 5, init = 'k-means++').fit(data_scaled)

In [18]:

cl = Kmodel_final.predict(data_scaled)

In [19]:

cl

Out[19]:

array([0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3,
0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 1,
0, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 2, 1, 2, 4, 2, 4, 2,
1, 2, 4, 2, 4, 2, 4, 2, 4, 2, 1, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
4, 2])

In [20]:

# Adding the clusters to a new column in the dataset


df_customer['cl']=cl
df_customer.head()

Out[20]:

Annual Income (k$) Spending Score (1-100) cl

0 15 39 0
1 15 81 3
2 16 6 0
3 16 77 3
4 17 40 0

In [27]:

# Visualization of clusters
plt.figure(figsize = (6,4), dpi = 100)
plt.scatter(x=df_customer['Annual Income (k$)'],y=df_customer['Spending Score (1-100)'],c=c
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score')
plt.show()

c1 = high income, low spender; c2 = high income, high spender; c3 = low income, high spender; c4 = low income, low spender; c5 = moderate income, moderate spender

Conclusion
Mall customer data is clustered into 5 clusters. The green cluster indicates people who have a high spending score but a low annual income. The purple cluster shows people who have a low annual income and a low spending score. The blue cluster shows people who have an average annual income and an average spending score. The sea-green cluster indicates people who have a high annual income and a high spending score. The yellow cluster shows people who have a low spending score and a high annual income.


PRACTICAL 8A:
NAIVE BAYES MODEL AND GAUSSIAN NAIVE BAYES MODEL

CODE:

In [ ]:

# NAME : Archana Nair


# SUBJECT : Artificial Intelligence & Machine Learning
# COURSE : M.Sc. Computer Science with Specialization in Data Science
# A) AIM : WAP to implement Naive Bayes Model and Gaussian Naive Bayes Model

In [29]:

# importing libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

In [30]:

#Reading Dataset
df = pd.read_csv("C:/Users/archa/Downloads/Social_Network_Ads.csv")
df.head()

Out[30]:

User ID Gender Age EstimatedSalary Purchased

0 15624510 Male 19 19000 0


1 15810944 Male 35 20000 0

2 15668575 Female 26 43000 0

3 15603246 Female 27 57000 0

4 15804002 Male 19 76000 0

In [42]:

df.shape

Out[42]:

(400, 5)


In [43]:

df.describe()

Out[43]:

User ID Age EstimatedSalary Purchased

count 4.000000e+02 400.000000 400.000000 400.000000


mean 1.569154e+07 37.655000 69742.500000 0.357500

std 7.165832e+04 10.482877 34096.960282 0.479864

min 1.556669e+07 18.000000 15000.000000 0.000000

25% 1.562676e+07 29.750000 43000.000000 0.000000

50% 1.569434e+07 37.000000 70000.000000 0.000000

75% 1.575036e+07 46.000000 88000.000000 1.000000

max 1.581524e+07 60.000000 150000.000000 1.000000

In [44]:

df.dtypes

Out[44]:

User ID int64
Gender object
Age int64
EstimatedSalary int64
Purchased int64
dtype: object
In [46]:

df.isna().sum()

Out[46]:

User ID 0
Gender 0
Age 0
EstimatedSalary 0
Purchased 0
dtype: int64

In [31]:

X = df.iloc[:, [1, 2, 3]].values


y = df.iloc[:, -1].values

In [45]:

sns.pairplot(df)

Out[45]:

<seaborn.axisgrid.PairGrid at 0x1f0c97d8dc0>

In [32]:

from sklearn.preprocessing import LabelEncoder


le = LabelEncoder()
X[:,0] = le.fit_transform(X[:,0])
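(An In [33] cell is missing here; a split like the following sketch, with a hypothetical random_state, matches the 80-row test set used from In [34] onward.)

In [33]:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)  # random_state assumed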
In [34]:

#Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [35]:

#Training the Naive Bayes model on the training set


from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

Out[35]:

GaussianNB()

In [36]:

y_pred = classifier.predict(X_test)

In [37]:

y_pred

Out[37]:
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1], dtype=int64)

In [38]:

y_test

Out[38]:
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1], dtype=int64)

In [39]:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))

precision recall f1-score support

0 0.93 0.97 0.95 58


1 0.90 0.82 0.86 22

accuracy 0.93 80
macro avg 0.92 0.89 0.90 80
weighted avg 0.92 0.93 0.92 80

In [40]:

from sklearn.metrics import confusion_matrix


pd.DataFrame(confusion_matrix(y_test,y_pred),columns=['Predicted No','Predicted Yes'],index

Out[40]:

Predicted No Predicted Yes

Actual No 56 2
Actual Yes 4 18

In [41]:

accuracy = metrics.accuracy_score(y_test, y_pred)


accuracy

Out[41]:

0.925
PRACTICAL 8B: LOGISTIC REGRESSION

CODE:
In [1]:
# A) AIM : WAP to implement Logistic Regression

In [25]:

# importing libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
In [26]:

df = pd.read_csv("C:/Users/archa/Downloads/heart.csv")
df.head()

Out[26]:

age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal target

0 52 1 0 125 212 0 1 168 0 1.0 2 2 3 0


1 53 1 0 140 203 1 0 155 1 3.1 0 0 3 0

2 70 1 0 145 174 0 1 125 1 2.6 0 0 3 0

3 61 1 0 148 203 0 1 161 0 0.0 2 1 3 0

4 62 0 0 138 294 1 1 106 0 1.9 1 3 2 0

In [27]:

df.shape

Out[27]:

(1025, 14)
In [28]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1025 entries, 0 to 1024
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 1025 non-null int64
1 sex 1025 non-null int64
2 cp 1025 non-null int64
3 trestbps 1025 non-null int64
4 chol 1025 non-null int64
5 fbs 1025 non-null int64

6 restecg 1025 non-null int64
7 thalach 1025 non-null int64
8 exang 1025 non-null int64
9 oldpeak 1025 non-null float64
10 slope 1025 non-null int64
11 ca 1025 non-null int64
12 thal 1025 non-null int64
13 target 1025 non-null int64
dtypes: float64(1), int64(13)
memory usage: 112.2 KB

In [29]:

df.describe()

Out[29]:

              age         sex          cp    trestbps        chol         fbs   restecg
count  1025.00000  1025.00000  1025.00000  1025.00000   1025.0000  1025.00000  1025.000
mean     54.43414     0.69561     0.94243   131.61170    246.0000     0.14926     0.529
std       9.07229     0.46037     1.02964    17.51670     51.5925     0.35652     0.527
min      29.00000     0.00000     0.00000    94.00000    126.0000     0.00000     0.000
25%      48.00000     0.00000     0.00000   120.00000    211.0000     0.00000     0.000
50%      56.00000     1.00000     1.00000   130.00000    240.0000     0.00000     1.000
75%      61.00000     1.00000     2.00000   140.00000    275.0000     0.00000     1.000
max      77.00000     1.00000     3.00000   200.00000    564.0000     1.00000     2.000

In [30]:

df.target.value_counts()

Out[30]:

1 526
0 499
Name: target, dtype: int64
In [31]:

df.isna().sum()

Out[31]:

age 0
sex 0
cp 0
trestbps 0
chol 0
fbs 0
restecg 0
thalach 0
exang 0
oldpeak 0
slope 0
ca 0
thal 0
target 0
dtype: int64


In [48]:

sns.pairplot(df)

Out[48]:

<seaborn.axisgrid.PairGrid at 0x1a723a1ec40>

In [32]:

sns.countplot(x="target", data=df, palette="bwr")


plt.show()

In [33]:
sns.countplot(x='sex', data=df, palette="mako_r")
plt.xlabel("Sex (0 = female, 1= male)")
plt.show()

In [34]:

df = df.drop(columns = ['cp', 'thal', 'slope'])


df.head()

Out[34]:

age sex trestbps chol fbs restecg thalach exang oldpeak ca target

0 52 1 125 212 0 1 168 0 1.0 2 0


1 53 1 140 203 1 0 155 1 3.1 0 0

2 70 1 145 174 0 1 125 1 2.6 0 0

3 61 1 148 203 0 1 161 0 0.0 1 0

4 62 0 138 294 1 1 106 0 1.9 3 0

In [35]:

y = df.target.values
x = df.drop(['target'], axis = 1)
In [36]:

x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.2,random_state=0)


In [37]:

log_reg = LogisticRegression()

In [38]:

log_reg.fit(x_train, y_train)

C:\Users\archa\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Out[38]:

LogisticRegression()

In [39]:

y_pred = log_reg.predict(x_test)

In [40]:

from sklearn.metrics import classification_report


print(classification_report(y_test, y_pred))

precision recall f1-score support

0 0.86 0.70 0.78 98


1 0.77 0.90 0.83 107

accuracy 0.80 205


macro avg 0.82 0.80 0.80 205
weighted avg 0.81 0.80 0.80 205

In [46]:

from sklearn.metrics import confusion_matrix


pd.DataFrame(confusion_matrix(y_test,y_pred),columns=['Predicted No','Predicted Yes'],index

Out[46]:

Predicted No Predicted Yes

Actual No 69 29
Actual Yes 11 96

In [47]:

accuracy = metrics.accuracy_score(y_test, y_pred)


accuracy

Out[47]:

0.80487804878
