
UNIVERSITY OF MUMBAI
DEPARTMENT OF COMPUTER SCIENCE

M.Sc. Computer Science with Spl. in Data Science – Semester II


ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
JOURNAL
2021-2022

Seat No.


UNIVERSITY OF MUMBAI
DEPARTMENT OF COMPUTER SCIENCE

CERTIFICATE
This is to certify that the work entered in this journal was done in the University
Department of Computer Science laboratory by
Mr./Ms. ARCHANA SUKUMARAN NAIR
Seat No. for the course of M.Sc.
Computer Science with Spl. in Data Science - Semester II (CBCS) (Revised)
during the academic year 2021-2022 in a satisfactory manner.

Subject In-charge Head of Department

External Examiner


Index

Sr. no. Name of the practical Page No. Sign

1 A) MINI-MAX
  B) ALPHA-BETA PRUNING
2 A) BINARY CLASSIFICATION
  B) MULTI-CLASS CLASSIFICATION
3 A) LINEAR REGRESSION
  B) POLYNOMIAL REGRESSION
4 FIND-S ALGORITHM
5 A) DECISION TREE ALGORITHM
  B) RANDOM FOREST ALGORITHM
6 SUPPORT VECTOR MACHINE
7 A) K-NEAREST NEIGHBOUR ALGORITHM
  B) K-MEANS ALGORITHM
8 A) NAIVE BAYES AND GAUSSIAN NAIVE BAYES MODEL
  B) LOGISTIC REGRESSION


Practical 1a:
Aim: Write a program to implement Mini-Max Algorithm.

Theory:
The mini-max algorithm is a kind of backtracking algorithm used in decision
making and game theory, assuming that both players play optimally. The two
players are called the maximizer and the minimizer: the maximizer tries to
obtain the highest score possible, while the minimizer tries to do the
opposite and obtain the lowest.

Algorithm: function minimax(node, depth, maximizingPlayer)

if depth == 0 or node is a terminal node then
    return the static evaluation of node

if maximizingPlayer then                  (for the maximizer)
    maxEva = -infinity
    for each child of node do
        eva = minimax(child, depth-1, false)
        maxEva = max(maxEva, eva)         (keeps the maximum of the values)
    return maxEva
else                                      (for the minimizer)
    minEva = +infinity
    for each child of node do
        eva = minimax(child, depth-1, true)
        minEva = min(minEva, eva)         (keeps the minimum of the values)
    return minEva

Tree based Example:

Code:
Input
Q. Write a program to implement Min-max Algorithm
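The notebook's code cells for this practical were captured as images and did not survive extraction; the following is a minimal Python sketch of the pseudocode above, on a hypothetical complete binary game tree whose leaf scores are supplied as a list.

import math

def minimax(node, depth, maximizing_player, scores, height):
    # reached a leaf of the game tree: return its static score
    if depth == height:
        return scores[node]
    if maximizing_player:
        return max(minimax(node * 2, depth + 1, False, scores, height),
                   minimax(node * 2 + 1, depth + 1, False, scores, height))
    else:
        return min(minimax(node * 2, depth + 1, True, scores, height),
                   minimax(node * 2 + 1, depth + 1, True, scores, height))

scores = [3, 5, 2, 9, 12, 5, 23, 23]  # hypothetical leaf values
height = int(math.log2(len(scores)))  # depth of the complete binary tree (3 here)
print("The optimal value is:", minimax(0, 0, True, scores, height))  # prints 12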

Practical 1B:
1. B) Write a program to implement Alpha-Beta Pruning using the Mini-Max Algorithm
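As in Practical 1a, the original code here was an image; the following is a minimal sketch of the same minimax search with alpha-beta pruning, again over a hypothetical list of leaf scores. Alpha tracks the best value the maximizer can guarantee so far and beta the best for the minimizer; a subtree is pruned as soon as beta <= alpha.

MIN, MAX = float('-inf'), float('inf')

def alphabeta(node, depth, maximizing_player, scores, height, alpha, beta):
    if depth == height:            # reached a leaf
        return scores[node]
    if maximizing_player:
        best = MIN
        for i in range(2):         # two children per node
            val = alphabeta(node * 2 + i, depth + 1, False, scores, height, alpha, beta)
            best = max(best, val)
            alpha = max(alpha, best)
            if beta <= alpha:      # prune the remaining children
                break
        return best
    else:
        best = MAX
        for i in range(2):
            val = alphabeta(node * 2 + i, depth + 1, True, scores, height, alpha, beta)
            best = min(best, val)
            beta = min(beta, best)
            if beta <= alpha:
                break
        return best

scores = [3, 5, 6, 9, 1, 2, 0, -1]  # hypothetical leaf values
print("The optimal value is:", alphabeta(0, 0, True, scores, 3, MIN, MAX))  # prints 5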


Practical 2A:
Q. Write a program to input a dataset and perform binary classification.


CODE:
In [1]:
# A) AIM : WAP to input dataset and perform Binary classification.
# Evaluate the model based on classification metrics and infer your result.

In [2]:

#import libraries

In [3]:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression

In [4]:

#load data

In [5]:

titanic_data=pd.read_csv('C:/Users/archa/Data Science/Semester 2/AI & ML/Practicals/titanic

In [6]:

len(titanic_data)

Out[6]:

891

titanic_data.head()

Out[77]:

PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare

0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500


In [78]:

titanic_data.index

Out[78]:

RangeIndex(start=0, stop=891, step=1)

In [79]:

titanic_data.columns

Out[79]:

Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',


'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
dtype='object')

In [80]:

titanic_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB

In [81]:

titanic_data.dtypes

Out[81]:

PassengerId int64
Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
dtype: object

In [82]:

titanic_data.describe()

Out[82]:

PassengerId Survived Pclass Age SibSp Parch Fare

count 891.000000 891.000000 891.000000 714.000000 891.000000 891.000000 891.000000


mean 446.000000 0.383838 2.308642 29.699118 0.523008 0.381594 32.204208

std 257.353842 0.486592 0.836071 14.526497 1.102743 0.806057 49.693429

min 1.000000 0.000000 1.000000 0.420000 0.000000 0.000000 0.000000

25% 223.500000 0.000000 2.000000 20.125000 0.000000 0.000000 7.910400

50% 446.000000 0.000000 3.000000 28.000000 0.000000 0.000000 14.454200

75% 668.500000 1.000000 3.000000 38.000000 1.000000 0.000000 31.000000

max 891.000000 1.000000 3.000000 80.000000 8.000000 6.000000 512.329200

DATA ANALYSIS
In [83]:

#Data Analysis
#Import Seaborn for visually analysing the data

#Find out how many survived vs died using the countplot method of seaborn

In [84]:

sns.countplot(x='Survived',data=titanic_data)

Out[84]:

<AxesSubplot:xlabel='Survived', ylabel='count'>

In [85]:

In [86]:

sns.countplot(x='Survived',data=titanic_data,hue='Sex')

Out[86]:

<AxesSubplot:xlabel='Survived', ylabel='count'>

In [87]:

#Check for null

In [88]:

titanic_data.isna()

Out[88]:

PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin

0 False False False False False False False False False False True
1 False False False False False False False False False False False

2 False False False False False False False False False False True

3 False False False False False False False False False False False

4 False False False False False False False False False False True

... ... ... ... ... ... ... ... ... ... ... ...

886 False False False False False False False False False False True

887 False False False False False False False False False False False

888 False False False False False True False False False False True

889 False False False False False False False False False False False

890 False False False False False False False False False False True

891 rows × 12 columns

In [89]:

#Check how many values are null


In [90]:

titanic_data.isna().sum()

Out[90]:

PassengerId 0
Survived 0
Pclass 0
Name 0
Sex 0
Age 177
SibSp 0
Parch 0
Ticket 0
Fare 0
Cabin 687
Embarked 2
dtype: int64
In [91]:

#find the distribution for the age column


In [92]:

sns.displot(x='Age',data=titanic_data)

Out[92]:

<seaborn.axisgrid.FacetGrid at 0x1758099fe20>

DATA CLEANING

In [93]:

#fill age column


In [94]:

titanic_data['Age'].fillna(titanic_data['Age'].mean(),inplace=True)
In [95]:

#verify null value


In [96]:

titanic_data['Age'].isna().sum()

Out[96]:

0
In [97]:

#Drop cabin column


In [98]:

titanic_data.drop('Cabin',axis=1,inplace=True)
In [99]:

#see the contents of the data


In [100]:

titanic_data.head()

Out[100]:

PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare

0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500


Preparing Data for Model


In [101]:

#Check for the non-numeric column

In [102]:

titanic_data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 891 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(4)
memory usage: 76.7+ KB

In [135]:

titanic_data.size

Out[135]:

7128

In [103]:

# We can see Name, Sex, Ticket and Embarked are non-numerical. Name, Embarked and Ticket num

In [104]:

#convert sex column to numerical values

In [105]:

gender=pd.get_dummies(titanic_data['Sex'],drop_first=True)

In [106]:

titanic_data['Gender']=gender

In [107]:

titanic_data.head()

Out[107]:

PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare

0 1 0 3 Braund, Mr. Owen Harris male 22.0 1 0 A/5 21171 7.2500
1 2 1 1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 0 PC 17599 71.2833
2 3 1 3 Heikkinen, Miss. Laina female 26.0 0 0 STON/O2. 3101282 7.9250
3 4 1 1 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 0 113803 53.1000
4 5 0 3 Allen, Mr. William Henry male 35.0 0 0 373450 8.0500

In [108]:

#drop the columns which are not required

In [109]:

titanic_data.drop(['Name','Sex','Ticket','Embarked'],axis=1,inplace=True)

In [110]:

titanic_data.head()

Out[110]:

PassengerId Survived Pclass Age SibSp Parch Fare Gender

0 1 0 3 22.0 1 0 7.2500 1


1 2 1 1 38.0 1 0 71.2833 0

2 3 1 3 26.0 0 0 7.9250 0

3 4 1 1 35.0 1 0 53.1000 0

4 5 0 3 35.0 0 0 8.0500 1

In [111]:

#Seperate Dependent and Independent variables

In [112]:

x=titanic_data[['PassengerId','Pclass','Age','SibSp','Parch','Fare','Gender']]
y=titanic_data['Survived']

In [113]:

y

Out[113]:

0 0
1 1
2 1
3 1
4 0
..
886 0
887 1
888 0
889 1
890 0
Name: Survived, Length: 891, dtype: int64

DATA MODELLING
In [114]:

#Building Model using Logistic Regression


#import train test split method

In [115]:

from sklearn.model_selection import train_test_split

In [116]:

#train test split

In [117]:

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=42)

In [118]:

#import Logistic Regression

In [119]:

from sklearn.linear_model import LogisticRegression

In [120]:

#Fit Logistic Regression

In [121]:

lr=LogisticRegression()

In [122]:

lr.fit(x_train,y_train)

C:\Users\archa\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Out[122]:

LogisticRegression()
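The fix the warning suggests, raising max_iter (e.g. LogisticRegression(max_iter=1000)) or standard-scaling the features before fitting, would silence it; the partially converged model is still used for prediction below.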

In [123]:

#predict

In [124]:

predict=lr.predict(x_test)

In [125]:

#print confusion matrix

In [126]:

from sklearn.metrics import confusion_matrix

In [127]:

pd.DataFrame(confusion_matrix(y_test,predict),columns=['Predicted No','Predicted Yes'],inde

Out[127]:

Predicted No Predicted Yes

Actual No 152 23
Actual Yes 37 83

In [128]:

#import classification report

In [129]:

from sklearn.metrics import classification_report

In [130]:

print(classification_report(y_test,predict))

precision recall f1-score support

0 0.80 0.87 0.84 175


1 0.78 0.69 0.73 120

accuracy 0.80 295


macro avg 0.79 0.78 0.78 295
weighted avg 0.80 0.80 0.79 295

In [131]:

PRACTICAL 2B: MULTICLASS CLASSIFICATION

CODE:
In [ ]:

# NAME : Archana Nair


# SUBJECT : Artificial Intelligence & Machine Learning
# COURSE : M.Sc. Computer Science with Specialization in Data Science
# A) AIM : WAP to input dataset and perform Multiclass classification.
# Evaluate the model based on classification metrics and infer your result.

In [ ]:

#import libraries

In [1]:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

In [ ]:

In [2]:
dataset=pd.read_csv('C:/Users/archa/Data Science/Semester 2/AI & ML/Practicals/Social_Netwo

In [4]:

dataset.head()

Out[4]:

User ID Gender Age EstimatedSalary Purchased

0 15624510 Male 19 19000 0


1 15810944 Male 35 20000 0

2 15668575 Female 26 43000 0

3 15603246 Female 27 57000 0

4 15804002 Male 19 76000 0

In [76]:

len(dataset)

Out[76]:

400

In [6]:

dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 User ID 400 non-null int64
1 Gender 400 non-null object
2 Age 400 non-null int64
3 EstimatedSalary 400 non-null int64
4 Purchased 400 non-null int64
dtypes: int64(4), object(1)
memory usage: 15.8+ KB

In [71]:

dataset.shape

Out[71]:

(400, 5)

In [8]:

dataset.index

Out[8]:

RangeIndex(start=0, stop=400, step=1)

In [10]:

dataset.columns

Out[10]:

Index(['User ID', 'Gender', 'Age', 'EstimatedSalary', 'Purchased'], dtype='object')

In [74]:

dataset.dtypes

Out[74]:

User ID int64
Gender object
Age int64
EstimatedSalary int64
Purchased int64
dtype: object

In [12]:

X=dataset.iloc[:,1:4]

In [13]:

X=pd.get_dummies(X)

In [14]:

X=X.values

In [15]:

X

Out[15]:
array([[ 19, 19000, 0, 1],
[ 35, 20000, 0, 1],
[ 26, 43000, 1, 0],
...,
[ 50, 20000, 1, 0],
[ 36, 33000, 0, 1],
[ 49, 36000, 1, 0]], dtype=int64)

DATA ANALYSIS
In [17]:

sns.jointplot(x='Age',y='EstimatedSalary',data=dataset, hue = 'Purchased', kind= 'scatter')


In [19]:

sns.jointplot(x='Age',y='EstimatedSalary',data=dataset, hue = 'Purchased', kind= 'hist');

DATA MODELLING

In [20]:

#Splitting the dataset into the Train set and Test set

In [21]:

X = dataset.iloc[:, [2, 3]].values


y = dataset.iloc[:, -1].values

In [22]:

#Splitting the dataset into the Train set and Test set

In [36]:

from sklearn.model_selection import train_test_split


x_train, x_test, y_train, y_test = train_test_split(X, y, test_size = 0.25, random_state =

In [25]:

#Feature Scaling

In [41]:

from sklearn.preprocessing import StandardScaler


sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)

In [60]:

from sklearn.neighbors import KNeighborsClassifier

In [61]:

KNN= KNeighborsClassifier(n_neighbors=5,
weights='uniform',
algorithm='kd_tree',
leaf_size=30,
p=2,
metric='minkowski',
n_jobs=-1)

In [62]:

KNN.fit(x_train,y_train)

Out[62]:

KNeighborsClassifier(algorithm='kd_tree', n_jobs=-1)

In [75]:

KNN.predict(x_test)

Out[75]:

array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1,
1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1], dtype=int64)

In [63]:

y_pred=KNN.predict(x_test)

In [64]:

from sklearn.metrics import classification_report

print(classification_report(y_test.reshape(-1,1),y_pred))

precision recall f1-score support

0 0.96 0.94 0.95 68


1 0.88 0.91 0.89 32

accuracy 0.93 100


macro avg 0.92 0.92 0.92 100
weighted avg 0.93 0.93 0.93 100

In [65]:

from sklearn.model_selection import cross_val_score

In [66]:

print('Cross val',cross_val_score(KNN,y_test.reshape(-1,1),y_pred,cv=10))
print('Cross val',np.mean(cross_val_score(KNN,y_test.reshape(-1,1),y_pred,)))

Cross val [0.8 1. 1. 0.9 0.9 1. 1. 1. 0.8 0.9]


Cross val 0.93


PRACTICAL 3A: LINEAR REGRESSION


CODE:

# A) AIM : WAP to implement Linear Regression

In [2]:

import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score,mean_squared_error

In [3]:

df = pd.read_csv('C:/Users/archa/Downloads/vgsales.csv')
df.head()

Out[3]:

Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales O

0 1 Wii Sports Wii 2006.0 Sports Nintendo 41.49 29.02 3.77
1 2 Super Mario Bros. NES 1985.0 Platform Nintendo 29.08 3.58 6.81
2 3 Mario Kart Wii Wii 2008.0 Racing Nintendo 15.85 12.88 3.79
3 4 Wii Sports Resort Wii 2009.0 Sports Nintendo 15.75 11.01 3.28
4 5 Pokemon Red/Pokemon Blue GB 1996.0 Role-Playing Nintendo 11.27 8.89 10.22


In [4]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Rank 16598 non-null int64
1 Name 16598 non-null object
2 Platform 16598 non-null object
3 Year 16327 non-null float64
4 Genre 16598 non-null object
5 Publisher 16540 non-null object
6 NA_Sales 16598 non-null float64
7 EU_Sales 16598 non-null float64
8 JP_Sales 16598 non-null float64
9 Other_Sales 16598 non-null float64
10 Global_Sales 16598 non-null float64
dtypes: float64(6), int64(1), object(4)

In [5]:

df.describe()

Out[5]:

Rank Year NA_Sales EU_Sales JP_Sales Other_Sales G

count 16598.000000 16327.000000 16598.000000 16598.000000 16598.000000 16598.000000 16


mean 8300.605254 2006.406443 0.264667 0.146652 0.077782 0.048063

std 4791.853933 5.828981 0.816683 0.505351 0.309291 0.188588

min 1.000000 1980.000000 0.000000 0.000000 0.000000 0.000000

25% 4151.250000 2003.000000 0.000000 0.000000 0.000000 0.000000

50% 8300.500000 2007.000000 0.080000 0.020000 0.000000 0.010000

75% 12449.750000 2010.000000 0.240000 0.110000 0.040000 0.040000

max 16600.000000 2020.000000 41.490000 29.020000 10.220000 10.570000

In [6]:

df.isnull().sum()

Out[6]:

Rank 0
Name 0
Platform 0
Year 271
Genre 0
Publisher 58
NA_Sales 0
EU_Sales 0
JP_Sales 0
Other_Sales 0
Global_Sales 0
dtype: int64

In [7]:

df.drop(["Rank","Name","Year","Publisher"],axis=1,inplace=True)
df.head()

Out[7]:

Platform Genre NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales

0 Wii Sports 41.49 29.02 3.77 8.46 82.74


1 NES Platform 29.08 3.58 6.81 0.77 40.24

2 Wii Racing 15.85 12.88 3.79 3.31 35.82

3 Wii Sports 15.75 11.01 3.28 2.96 33.00

4 GB Role-Playing 11.27 8.89 10.22 1.00 31.37

In [8]:
dums = pd.get_dummies(df[["Platform","Genre"]])
dums.head()

Out[8]:

Platform_2600 Platform_3DO Platform_3DS Platform_DC Platform_DS Platform_GB Platfor

0 0 0 0 0 0 0
1 0 0 0 0 0 0

2 0 0 0 0 0 0

3 0 0 0 0 0 0

4 0 0 0 0 0 1

5 rows × 43 columns

In [9]:

dums.drop(["Platform_2600","Genre_Misc"],axis=1,inplace=True)

In [10]:

final_df= pd.concat([df,dums],axis=1)
final_df.drop(["Platform","Genre"],axis=1,inplace=True)
final_df.head()

Out[10]:

NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales Platform_3DO Platform_3DS Pla

0 41.49 29.02 3.77 8.46 82.74 0 0


1 29.08 3.58 6.81 0.77 40.24 0 0

2 15.85 12.88 3.79 3.31 35.82 0 0

3 15.75 11.01 3.28 2.96 33.00 0 0

4 11.27 8.89 10.22 1.00 31.37 0 0

5 rows × 46 columns

In [11]:

import seaborn as sns


import matplotlib.pyplot as plt
g = sns.regplot(final_df.Global_Sales,final_df.EU_Sales,ci=None,scatter_kws= {"color":"r","
plt.xlim(-2,85)
plt.ylim(bottom=0)


C:\Users\archa\anaconda3\lib\site-packages\seaborn\_decorators.py:36: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.

(0.0, 30.471405021832812)

In [12]:

final_df.EU_Sales[df.EU_Sales>15]
#this value is in index 0.

Out[12]:

0 29.02
Name: EU_Sales, dtype: float64
In [13]:

df_outlier = final_df.drop([0],axis=0)

In [14]:

import matplotlib.pyplot as plt


g = sns.regplot(df_outlier.Global_Sales,df_outlier.EU_Sales,ci=None,scatter_kws= {"color":"
plt.xlim(-2,45)
plt.ylim(bottom=0)

C:\Users\archa\anaconda3\lib\site-packages\seaborn\_decorators.py:36: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.


(0.0, 13.524113383535223)

In [15]:

x = df_outlier[["EU_Sales"]]
y = df_outlier["Global_Sales"]

In [16]:

reg = LinearRegression()
model = reg.fit(x,y)

In [17]:

model.score(x,y)

In [18]:
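(The body of this cell was lost in extraction; a roughly 70/30 split like the following sketch, with a hypothetical random_state, reproduces the shapes printed in the next cell.)

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=42)  # random_state assumed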

In [19]:

print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

(11617, 1)
(11617,)
(4980, 1)
(4980,)

In [20]:

lm = LinearRegression()
model = lm.fit(x_train,y_train)

In [21]:

from sklearn.metrics import mean_squared_error


y_pred = model.predict(x_test)
np.sqrt(mean_squared_error(y_test,y_pred))

PRACTICAL 3B: POLYNOMIAL REGRESSION


CODE:

In [ ]:

# NAME : Archana Nair


# SUBJECT : Artificial Intelligence & Machine Learning
# COURSE : M.Sc. Computer Science with Specialization in Data Science
# A) AIM : WAP to implement Polynomial Regression

In [1]:

# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd

In [3]:

#importing datasets
data_set= pd.read_csv('C:/Users/archa/Downloads/Position_Salaries.csv')
data_set.head()

Out[3]:

Position Level Salary

0 Business Analyst 1 45000


1 Junior Consultant 2 50000

2 Senior Consultant 3 60000

3 Manager 4 80000

4 Country Manager 5 110000

In [4]:

#Extracting Independent and dependent Variable


x= data_set.iloc[:, 1:2].values
y= data_set.iloc[:, 2].values

In [5]:

#Fitting the Linear Regression to the dataset


from sklearn.linear_model import LinearRegression
lin_regs= LinearRegression()
lin_regs.fit(x,y)

Out[5]:

LinearRegression()

In [6]:

#Fitting the Polynomial regression to the dataset


from sklearn.preprocessing import PolynomialFeatures
poly_regs= PolynomialFeatures(degree= 2)
x_poly= poly_regs.fit_transform(x)
lin_reg_2 =LinearRegression()
lin_reg_2.fit(x_poly, y)

Out[6]:

LinearRegression()

In [7]:

#Visualizing the result for the Linear Regression model


mtp.scatter(x,y,color="blue")
mtp.plot(x,lin_regs.predict(x), color="red")
mtp.title("Bluff detection model(Linear Regression)")
mtp.xlabel("Position Levels")
mtp.ylabel("Salary")
mtp.show()


In [8]:

#Visualizing the result for Polynomial Regression


mtp.scatter(x,y,color="blue")
mtp.plot(x, lin_reg_2.predict(poly_regs.fit_transform(x)), color="red")
mtp.title("Bluff detection model(Polynomial Regression)")
mtp.xlabel("Position Levels")
mtp.ylabel("Salary")
mtp.show()

In [10]:

lin_pred = lin_regs.predict([[6.5]])
print(lin_pred)

[330378.78787879]

In [11]:

poly_pred = lin_reg_2.predict(poly_regs.fit_transform([[6.5]]))
print(poly_pred)

[189498.10606061]

PRACTICAL 4:
FIND-S ALGORITHM FOR FINDING HYPOTHESIS BASED ON TRAINING SAMPLES.


CODE:
In [ ]:

# A) AIM : WAP to implement the Find-S Algorithm

In [8]:

import pandas as pd
import numpy as np

In [16]:

d = pd.read_csv("C:/Users/archa/Downloads/ws.csv")

d.head()

Out[16]:

Sunny Warm Normal Strong Warm.1 Same Yes

0 Sunny Warm High Strong Warm Same Yes


1 Rainy Cold High Strong Warm Change No

2 Sunny Warm High Strong Cool Change Yes

In [17]:

t = np.array(d)[:,-1]
print("The target is: ",t)

The target is: ['Yes' 'No' 'Yes']

In [18]:

def fun(c, t):
    # take the first positive example as the initial specific hypothesis
    for i, val in enumerate(t):
        if val == "Yes":
            specific_hypothesis = c[i].copy()
            break
    # generalise any attribute that disagrees on a later positive example
    for i, val in enumerate(c):
        if t[i] == "Yes":
            for x in range(len(specific_hypothesis)):
                if val[x] != specific_hypothesis[x]:
                    specific_hypothesis[x] = '?'
    return specific_hypothesis

a = np.array(d)[:,:-1]  # attribute columns of the training data
print("The final hypothesis is:", fun(a, t))

PRACTICAL 5A: DECISION TREE

CODE:
In [33]:
# A) AIM : WAP to implement Decision Tree Algorithm

In [20]:

# importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [21]:
data = pd.read_csv("C:/Users/archa/Downloads/WineQuality.csv")
data.head()

Out[21]:

sugar chlorides free.sulfur.dioxide total.sulfur.dioxide density pH sulphates alcohol quality

1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5

2.6 0.098 25.0 67.0 0.9968 3.20 0.68 9.8 5

2.3 0.092 15.0 54.0 0.9970 3.26 0.65 9.8 5

1.9 0.075 17.0 60.0 0.9980 3.16 0.58 9.8 6

1.9 0.076 11.0 34.0 0.9978 3.51 0.56 9.4 5

In [22]:

data.shape

Out[22]:

(1599, 13)
In [25]:
data.isna().sum()
Out[25]:
Unnamed: 0 0
fixed.acidity 0
volatile.acidity 0
citric.acid 0
residual.sugar 0
chlorides 0
free.sulfur.dioxide 0
total.sulfur.dioxide 0
density 0
pH 0
sulphates 0
alcohol 0
quality 0
dtype: int64

In [26]:

# creating X and y

X = data.drop(columns = 'quality')
y = data['quality']

In [27]:

# splitting data into training and testing data

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.30, random_state =

In [28]:

# scaling our data

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)

In [29]:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)

Out[29]:

DecisionTreeClassifier()

In [30]:

y_pred = clf.predict(X_test)

In [31]:

clf.score(X_train, y_train)

Out[31]:

1.0

In [32]:

clf.score(X_test, y_test)

Out[32]:

0.6229166666666667
PRACTICAL 5B: RANDOM FOREST


CODE:
# A) AIM : WAP to implement Random Forest Algorithm

In [20]:

# importing libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [21]:

df = pd.read_csv("C:/Users/archa/Downloads/temps.csv")
df.head()

Out[21]:

year month day week temp_2 temp_1 average actual friend

0 2019 1 1 Fri 45 45 45.6 45 29


1 2019 1 2 Sat 44 45 45.7 44 61

2 2019 1 3 Sun 45 44 45.8 41 56

3 2019 1 4 Mon 44 41 45.9 40 53

4 2019 1 5 Tues 41 40 46.0 44 41

In [34]:

df.dtypes

Out[34]:

year int64
month int64
day int64
temp_2 int64
temp_1 int64
average float64
friend int64
week_Fri uint8
week_Mon uint8
week_Sat uint8
week_Sun uint8
week_Thurs uint8
week_Tues uint8
week_Wed uint8
dtype: object

In [22]:

# the shape of our features


df.shape

Out[22]:

(348, 9)
In [23]:

# column names
df.columns

Out[23]:

Index(['year', 'month', 'day', 'week', 'temp_2', 'temp_1', 'average', 'actual',
       'friend'], dtype='object')
In [24]:

# checking for null values


df.isnull().sum()

Out[24]:

year 0
month 0
day 0
week 0
temp_2 0
temp_1 0
average 0
actual 0
friend 0
dtype: int64

In [25]:

# One-hot encode categorical features


df = pd.get_dummies(df)
df.head(5)

Out[25]:

year month day temp_2 temp_1 average actual friend week_Fri week_Mon week_Sat

0 2019 1 1 45 45 45.6 45 29 1 0 0
1 2019 1 2 44 45 45.7 44 61 0 0 1

2 2019 1 3 45 44 45.8 41 56 0 0 0

3 2019 1 4 44 41 45.9 40 53 0 1 0

4 2019 1 5 41 40 46.0 44 41 0 0 0

In [26]:

print('Shape of features after one-hot encoding:', df.shape)

Shape of features after one-hot encoding: (348, 15)

In [27]:

# Labels are the values we want to predict


labels = df['actual']

# Remove the labels from the features


df = df.drop('actual', axis = 1)

# Saving feature names for later use


feature_list = list(df.columns)

In [35]:

# Using Skicit-learn to split data into training and testing sets


from sklearn.model_selection import train_test_split

# Split the data into training and testing sets


train_features, test_features, train_labels, test_labels = train_test_split(df, labels, tes

In [29]:

print('Training Features Shape:', train_features.shape)


print('Training Labels Shape:', train_labels.shape)
print('Testing Features Shape:', test_features.shape)
print('Testing Labels Shape:', test_labels.shape)

Training Features Shape: (278, 14)


Training Labels Shape: (278,)
Testing Features Shape: (70, 14)
Testing Labels Shape: (70,)

In [30]:

# Training the Forest


# Import the model we are using
from sklearn.ensemble import RandomForestRegressor

# Instantiate model
rf = RandomForestRegressor(n_estimators= 1000, random_state=42)

# Train the model on training data


rf.fit(train_features, train_labels);

In [31]:

#Make prediction on test data
# Use the forest's predict method on the test data
predictions = rf.predict(test_features)

# Calculate the absolute errors


errors = abs(predictions - test_labels)

# Print out the mean absolute error (mae)


print('Mean Absolute Error:', round(np.mean(errors), 2), 'degrees.')

Mean Absolute Error: 3.78 degrees.

In [32]:

# Calculate mean absolute percentage error (MAPE)


mape = 100 * (errors / test_labels)

# Calculate and display accuracy


accuracy = 100 - np.mean(mape)
print('Accuracy:', round(accuracy, 2), '%.')

Accuracy: 94.02 %.
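In symbols, with actual values y_i and predictions ŷ_i, MAPE = (100/n) · Σ |y_i − ŷ_i| / y_i, and the accuracy reported above is simply 100 − MAPE.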

PRACTICAL 6: SUPPORT VECTOR MACHINE (LSVM/Kernel SVM/Soft Margin SVM)

In [ ]:
#WAP to implement Support Vector Machine (LSVM/Kernel SVM/Soft Margin SVM)

#import libraries

In [1]:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

In [2]:

#load data

In [4]:

df = pd.read_csv('C:/Users/archa/Data Science/Semester 2/AI & ML/Practicals/Social_Network_

In [5]:

df.head()

Out[5]:

User ID Gender Age EstimatedSalary Purchased

0 15624510 Male 19 19000 0
1 15810944 Male 35 20000 0
2 15668575 Female 26 43000 0
3 15603246 Female 27 57000 0
4 15804002 Male 19 76000 0

In [6]:

df.shape

Out[6]:

(400, 5)
In [7]:

df.info

Out[7]:

<bound method DataFrame.info of User ID Gender Age EstimatedSalary


Purchased
0 15624510 Male 19 19000 0
1 15810944 Male 35 20000 0
2 15668575 Female 26 43000 0
3 15603246 Female 27 57000 0
4 15804002 Male 19 76000 0
..        ...     ...  ...    ...  ...
395  15691863  Female   46  41000    1
396 15706071 Male 51 23000 1
397 15654296 Female 50 20000 1
398 15755018 Male 36 33000 0
399 15594041 Female 49 36000 1

[400 rows x 5 columns]>

In [11]:

df.columns

Out[11]:

Index(['User ID', 'Gender', 'Age', 'EstimatedSalary', 'Purchased'], dtype='o bject')

In [12]:

x = df.iloc[:,[2,3]]
y = df.iloc[:,4]

In [13]:

x.head()

Out[13]:

Age EstimatedSalary

0 19 19000
1 35 20000

2 26 43000

3 27 57000

4 19 76000

In [14]:

y.head()

Out[14]:

0 0
1 0
2 0
3 0
4 0
Name: Purchased, dtype: int64

In [77]:

#splitting the dataset into Training & Testing set

from sklearn.model_selection import train_test_split


x_train, x_test, y_train,y_test =train_test_split(x,y,test_size=0.75,random_state=0)

In [78]:

print("Training data:",x_train.shape)
print("Testing data",x_test.shape)

Training data: (100, 2)

Testing data (300, 2)


In [79]:

#feature scaling

from sklearn.preprocessing import StandardScaler


sc_x = StandardScaler()
x_train = sc_x.fit_transform(x_train)
x_test = sc_x.transform(x_test)

In [80]:

from sklearn.svm import SVC


classifier = SVC(kernel='linear',random_state=0)
classifier.fit(x_train,y_train)

Out[80]:

SVC(kernel='linear', random_state=0)
In [81]:

#predicting test set results


y_pred = classifier.predict(x_test)
y_pred

Out[81]:
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1,
0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0,
1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1,
1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1,
1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1,
0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0], dtype=int64)
In [82]:

from sklearn.metrics import accuracy_score


accuracy_score(y_test, y_pred)

Out[82]:

0.7966666666666666

In [83]:

#Plotting Data points


import matplotlib.pyplot as plt

plt.scatter(x_test[:, 0], x_test[:, 1], c=y_test)
plt.xlabel('Age')
plt.ylabel('Estimated Salary')


In [88]:

import matplotlib.pyplot as plt

from sklearn.svm import SVC


classifier = SVC(kernel='linear',random_state=0)
classifier.fit(x_train,y_train)

y_pred = classifier.predict(x_test)

#Plotting Data points


plt.scatter(x_test[:, 0], x_test[:, 1], c=y_test)

plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.title('Test data')

#Creating hyperplane
w = classifier.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(-2, 2)
yy = a * xx -(classifier.intercept_[0]) / w[1]
#Plot Hyperplane
plt.plot(xx, yy)
plt.show()
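The plotted line is the decision boundary w0·x + w1·y + b = 0 rearranged as y = −(w0/w1)·x − b/w1, which is exactly the slope a and the intercept term computed from classifier.coef_ and classifier.intercept_ above.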

In [55]:

from sklearn.metrics import classification_report


classification_report(y_test,y_pred)

C:\Users\archa\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1245: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
C:\Users\archa\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1245: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
C:\Users\archa\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1245: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))

Out[55]:

'              precision    recall  f1-score   support\n\n           0       0.00      0.00      0.00        68\n           1       0.32      1.00      0.48        32\n\n    accuracy                           0.32       100\n   macro avg       0.16      0.50      0.24       100\nweighted avg       0.10      0.32      0.16       100\n'

In [57]:

print(classification_report(y_test,y_pred))

precision recall f1-score support

0 0.00 0.00 0.00 68


1 0.32 1.00 0.48 32

accuracy 0.32 100


macro avg 0.16 0.50 0.24 100
weighted avg 0.10 0.32 0.16 100

PRACTICAL 7A: K-NEAREST NEIGHBOUR

CODE:

In [ ]:

# NAME : Archana Nair


# SUBJECT : Artificial Intelligence & Machine Learning
# COURSE : M.Sc. Computer Science with Specialization in Data Science
# A) AIM : WAP to implement KNN Algorithm
# Evaluate the model based on classification metrics and infer your result.

In [31]:

#Import python libraries


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')

In [32]:

#Load the dataset


df = pd.read_csv('C:/Users/archa/Downloads/diabetes.csv')

df.head()

Out[32]:

Pregnancies Glucose BloodPressure SkinThickness Insulin BMI DiabetesPedigreeFunctio

0 6 148 72 35 0 33.6 0.62


1 1 85 66 29 0 26.6 0.35

2 8 183 64 0 0 23.3 0.67

3 1 89 66 23 94 28.1 0.16

4 0 137 40 35 168 43.1 2.28

In [33]:

df.shape

Out[33]:

(768, 9)
In [57]:

df.dtypes

Out[57]:

Pregnancies int64
Glucose int64
BloodPressure int64
SkinThickness int64
Insulin int64
BMI float64
DiabetesPedigreeFunction float64
Age int64
Outcome int64
dtype: object

In [34]:

X = df.drop('Outcome',axis=1).values
y = df['Outcome'].values

In [35]:

#importing train_test_split
from sklearn.model_selection import train_test_split

In [36]:

X_train,X_test,y_train,y_test = train_test_split(X,y,test_size=0.4,random_state=42, stratif

In [37]:

#import KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier

#Setup arrays to store training and test accuracies


neighbors = np.arange(1,9)
train_accuracy =np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))

for i,k in enumerate(neighbors):

#Setup a knn classifier with k neighbors


knn = KNeighborsClassifier(n_neighbors=k)

#Fit the model


knn.fit(X_train, y_train)

#Compute accuracy on the training set


train_accuracy[i] = knn.score(X_train, y_train)

#Compute accuracy on the test set


test_accuracy[i] = knn.score(X_test, y_test)

In [38]:

#Plotting
plt.title('k-NN Varying number of neighbors')
plt.plot(neighbors, test_accuracy, label='Testing Accuracy')
plt.plot(neighbors, train_accuracy, label='Training accuracy')
plt.legend()
plt.xlabel('Number of neighbors')
plt.ylabel('Accuracy')
plt.show()


In [39]:

#Setup a knn classifier with k neighbors


knn = KNeighborsClassifier(n_neighbors=7)

In [40]:

#Fit the model


knn.fit(X_train,y_train)

Out[40]:

KNeighborsClassifier(n_neighbors=7)

In [41]:

knn.score(X_test,y_test)

Out[41]:

0.7305194805194806

In [42]:

#import confusion_matrix
from sklearn.metrics import confusion_matrix

In [43]:

y_pred = knn.predict(X_test)

In [44]:

confusion_matrix(y_test,y_pred)

Out[44]:

array([[165, 36],
[ 47, 60]], dtype=int64)

In [45]:

#import classification_report
from sklearn.metrics import classification_report

In [46]:

print(classification_report(y_test,y_pred))

              precision    recall  f1-score   support

           0       0.78      0.82      0.80       201
           1       0.62      0.56      0.59       107

    accuracy                           0.73       308
   macro avg       0.70      0.69      0.70       308
weighted avg       0.73      0.73      0.73       308

In [47]:

y_pred_proba = knn.predict_proba(X_test)[:,1]

In [48]:

from sklearn.metrics import roc_curve

In [49]:

fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)

In [50]:

plt.plot([0,1],[0,1],'k--')
plt.plot(fpr,tpr, label='Knn')
plt.xlabel('fpr')
plt.ylabel('tpr')
plt.title('Knn(n_neighbors=7) ROC curve')
plt.show()

In [51]:

#Area under ROC curve


from sklearn.metrics import roc_auc_score
roc_auc_score(y_test,y_pred_proba)

Out[51]:

0.7345050448691124

In [52]:

#import GridSearchCV
from sklearn.model_selection import GridSearchCV

In [53]:

#In case of classifier like knn the parameter to be tuned is n_neighbors


param_grid = {'n_neighbors':np.arange(1,50)}

In [54]:

knn = KNeighborsClassifier()
knn_cv= GridSearchCV(knn,param_grid,cv=5)
knn_cv.fit(X,y)

Out[54]:

GridSearchCV(cv=5, estimator=KNeighborsClassifier(),
param_grid={'n_neighbors': array([ 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])})

In [55]:

knn_cv.best_score_

Out[55]:

0.7578558696205755

In [56]:

knn_cv.best_params_

Out[56]:

{'n_neighbors': 14}

Thus a kNN classifier with 14 neighbours achieves the best cross-validated score/accuracy of 0.7578, i.e. about 76%.

PRACTICAL 7B: K-MEANS ALGORITHM


CODE:

In [ ]:
# B) AIM : WAP to implement KMeans Algorithm
# Evaluate the model based on classification metrics and infer your result.

In [5]:

#import libraries
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt

In [6]:

data = pd.read_csv("C:/Users/archa/Downloads/Mall_Customers.csv")
data.head()

Out[6]:

CustomerID Genre Age Annual Income (k$) Spending Score (1-100)

0 1 Male 19 15 39
1 2 Male 21 15 81

2 3 Female 20 16 6

3 4 Female 23 16 77

4 5 Female 31 17 40

In [7]:

data = data.drop(columns = ['CustomerID'])


data.head()

Out[7]:

Genre Age Annual Income (k$) Spending Score (1-100)

0 Male 19 15 39
1 Male 21 15 81

2 Female 20 16 6

3 Female 23 16 77

4 Female 31 17 40

In [11]:

data.shape

Out[11]:

(200, 4)

In [8]:

data.dtypes

Out[8]:

Genre object
Age int64
Annual Income (k$) int64
Spending Score (1-100) int64
dtype: object

In [9]:

#analyse missing values


data.isna().sum()

Out[9]:

Genre 0
Age 0
Annual Income (k$) 0
Spending Score (1-100) 0
dtype: int64
In [5]:

sns.pairplot(data)

Out[5]:

<seaborn.axisgrid.PairGrid at 0x20172d70730>

In [6]:

data.columns

Out[6]:

Index(['Genre', 'Age', 'Annual Income (k$)', 'Spending Score (1-100)'], dtype='object')
In [7]:

col = ['Genre', 'Age', 'Annual Income (k$)', 'Spending Score (1-100)']

for i in col:
    plt.figure(figsize =(5,3), dpi = 100)
    plt.hist(x = i, data = data)
    plt.xlabel(i)  # label each histogram with its own column name
    plt.show()

In [10]:

data['Genre'] = data['Genre'].map({'Male':1,'Female':2 }) # we can use get_dummies instead


data.head()

Out[10]:

Genre Age Annual Income (k$) Spending Score (1-100)

0 1 19 15 39
1 1 21 15 81

2 2 20 16 6

3 2 23 16 77

4 2 31 17 40

In [11]:

data.dtypes

Out[11]:

Genre int64
Age int64
Annual Income (k$) int64
Spending Score (1-100) int64
dtype: object

In [26]:

#scaling transformation
#1. z-score normalization using StandardScaler (zero mean, unit variance)
#2. Min-Max normalization using MinMaxScaler (0 to 1)
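A sketch of the Min-Max alternative named above (the notebook itself proceeds with StandardScaler on the income/spending columns; the variable name here is hypothetical):

from sklearn.preprocessing import MinMaxScaler

# rescales each column to the [0, 1] range instead of zero mean/unit variance
data_minmax = MinMaxScaler().fit_transform(data.iloc[:, 2:4])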

In [12]:

df_customer = data.iloc[:,2:4]
df_customer.head()

Out[12]:

Annual Income (k$) Spending Score (1-100)

0 15 39
1 15 81

2 16 6

3 16 77

4 17 40

In [13]:

from sklearn.preprocessing import StandardScaler
data_scaled = StandardScaler().fit_transform(df_customer)
data_scaled

Out[13]:

array([[-1.73899919, -0.43480148],
[-1.73899919, 1.19570407],
[-1.70082976, -1.71591298],
[-1.70082976, 1.04041783],
[-1.66266033, -0.39597992],
[-1.66266033, 1.00159627],
[-1.62449091, -1.71591298],
[-1.62449091, 1.70038436],
[-1.58632148, -1.83237767],
[-1.58632148, 0.84631002],
[-1.58632148, -1.4053405 ],
[-1.58632148, 1.89449216],
[-1.54815205, -1.36651894],
[-1.54815205, 1.04041783],
[-1.54815205, -1.44416206],
[-1.54815205, 1.11806095],
[-1.50998262, -0.59008772],
[-1.50998262, 0.61338066],

In [14]:

# Finding the optimal number of K


wcss= []
for i in range(2,11):
kmodel = KMeans(n_clusters = i, init = 'random')
kmodel.fit(data_scaled)
wcss.append(kmodel.inertia_)
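WCSS (within-cluster sum of squares) is what KMeans exposes as inertia_: the sum of squared distances from each point to its nearest cluster centre. The "elbow" in the plot below, where WCSS stops falling sharply, suggests the number of clusters to use; here it is 5.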

In [15]:

wcss

Out[15]:

[269.01679374906655,
157.70400815035939,
108.92131661364358,
65.56840815571681,
55.103778121150555,
44.86475569922555,
37.24321153347672,
33.85792110528426,
30.684270071530346]

In [16]:


#Plotting
plt.figure(figsize = (8,6), dpi=100)
plt.plot(range(2,11),wcss, marker = 'o', c='blue', markerfacecolor='red')
plt.xlabel('No of Clusters')
plt.ylabel('WCSS')
plt.show()

In [17]:

# Creating the final Kmeans model with no of clusters = 5


Kmodel_final = KMeans(n_clusters = 5, init = 'k-means++').fit(data_scaled)

In [18]:

cl = Kmodel_final.predict(data_scaled)

In [19]:

cl

Out[19]:

array([0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3,
0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 1,
0, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 2, 1, 2, 4, 2, 4, 2,
1, 2, 4, 2, 4, 2, 4, 2, 4, 2, 1, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
4, 2])

In [20]:

# Adding the clusters to a new column in the dataset


df_customer['cl']=cl
df_customer.head()

Out[20]:

Annual Income (k$) Spending Score (1-100) cl

0 15 39 0
1 15 81 3
2 16 6 0
3 16 77 3
4 17 40 0

In [27]:

# Visualization of clusters
plt.figure(figsize = (6,4), dpi = 100)
plt.scatter(x=df_customer['Annual Income (k$)'],y=df_customer['Spending Score (1-100)'],c=c
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score')
plt.show()

c1 = high income, low spender; c2 = high income, high spender; c3 = low income, high spender; c4 = low income, low spender; c5 = moderate income, moderate spender

Conclusion
Mall customer data is clustered into 5 clusters. The green cluster indicates people who have a high spending score but a low annual income. The purple cluster shows people who have a low annual income and a low spending score. The blue cluster shows people who have an average annual income and an average spending score. The sea-green cluster indicates people who have a high annual income and a high spending score. The yellow cluster shows people who have a low spending score and a high annual income.


PRACTICAL 8A:
NAIVE BAYES MODEL AND GAUSSIAN NAIVE BAYES MODEL

CODE:

In [ ]:

# NAME : Archana Nair


# SUBJECT : Artificial Intelligence & Machine Learning
# COURSE : M.Sc. Computer Science with Specialization in Data Science
# A) AIM : WAP to implement Naive Bayes Model and Gaussian Naive Bayes Model

In [29]:

# importing libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

In [30]:

#Reading Dataset
df = pd.read_csv("C:/Users/archa/Downloads/Social_Network_Ads.csv")
df.head()

Out[30]:

User ID Gender Age EstimatedSalary Purchased

0 15624510 Male 19 19000 0


1 15810944 Male 35 20000 0

2 15668575 Female 26 43000 0

3 15603246 Female 27 57000 0

4 15804002 Male 19 76000 0

In [42]:

df.shape

Out[42]:

(400, 5)


In [43]:

df.describe()

Out[43]:

User ID Age EstimatedSalary Purchased

count 4.000000e+02 400.000000 400.000000 400.000000


mean 1.569154e+07 37.655000 69742.500000 0.357500

std 7.165832e+04 10.482877 34096.960282 0.479864

min 1.556669e+07 18.000000 15000.000000 0.000000

25% 1.562676e+07 29.750000 43000.000000 0.000000

50% 1.569434e+07 37.000000 70000.000000 0.000000

75% 1.575036e+07 46.000000 88000.000000 1.000000

max 1.581524e+07 60.000000 150000.000000 1.000000

In [44]:

df.dtypes

Out[44]:

User ID int64
Gender object
Age int64
EstimatedSalary int64
Purchased int64
dtype: object
In [46]:

df.isna().sum()

Out[46]:

User ID 0
Gender 0
Age 0
EstimatedSalary 0
Purchased 0
dtype: int64

In [31]:

X = df.iloc[:, [1, 2, 3]].values


y = df.iloc[:, -1].values

In [45]:

sns.pairplot(df)

Out[45]:

<seaborn.axisgrid.PairGrid at 0x1f0c97d8dc0>

In [32]:

from sklearn.preprocessing import LabelEncoder


le = LabelEncoder()
X[:,0] = le.fit_transform(X[:,0])
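(An In [33] cell is missing here; a split like the following sketch, with a hypothetical random_state, matches the 80-row test set used from In [34] onward.)

In [33]:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)  # random_state assumed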
In [34]:

#Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)

In [35]:

#Training the Naive Bayes model on the training set


from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)

Out[35]:

GaussianNB()

In [36]:

y_pred = classifier.predict(X_test)

In [37]:

y_pred

Out[37]:
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1], dtype=int64)

In [38]:

y_test

Out[38]:
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1], dtype=int64)

In [39]:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))

precision recall f1-score support

0 0.93 0.97 0.95 58


1 0.90 0.82 0.86 22

accuracy 0.93 80
macro avg 0.92 0.89 0.90 80
weighted avg 0.92 0.93 0.92 80

In [40]:

from sklearn.metrics import confusion_matrix


pd.DataFrame(confusion_matrix(y_test,y_pred),columns=['Predicted No','Predicted Yes'],index

Out[40]:

Predicted No Predicted Yes

Actual No 56 2
Actual Yes 4 18

In [41]:

accuracy = metrics.accuracy_score(y_test, y_pred)


accuracy

Out[41]:

0.925
PRACTICAL 8B: LOGISTIC REGRESSION

CODE:
In [1]:
# A) AIM : WAP to implement Logistic Regression

In [25]:

# importing libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
In [26]:

df = pd.read_csv("C:/Users/archa/Downloads/heart.csv")
df.head()

Out[26]:

age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal target

0 52 1 0 125 212 0 1 168 0 1.0 2 2 3 0


1 53 1 0 140 203 1 0 155 1 3.1 0 0 3 0

2 70 1 0 145 174 0 1 125 1 2.6 0 0 3 0

3 61 1 0 148 203 0 1 161 0 0.0 2 1 3 0

4 62 0 0 138 294 1 1 106 0 1.9 1 3 2 0

In [27]:

df.shape

Out[27]:

(1025, 14)
In [28]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1025 entries, 0 to 1024
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 1025 non-null int64
1 sex 1025 non-null int64
2 cp 1025 non-null int64
3 trestbps 1025 non-null int64
4 chol 1025 non-null int64
5 fbs 1025 non-null int64

6 restecg 1025 non-null int64
7 thalach 1025 non-null int64
8 exang 1025 non-null int64
9 oldpeak 1025 non-null float64
10 slope 1025 non-null int64
11 ca 1025 non-null int64
12 thal 1025 non-null int64
13 target 1025 non-null int64
dtypes: float64(1), int64(13)
memory usage: 112.2 KB

In [29]:

df.describe()

Out[29]:

              age         sex          cp    trestbps        chol         fbs   restecg
count  1025.00000  1025.00000  1025.00000  1025.00000   1025.0000  1025.00000  1025.000
mean     54.43414     0.69561     0.94243   131.61170    246.0000     0.14926     0.529
std       9.07229     0.46037     1.02964    17.51670     51.5925     0.35652     0.527
min      29.00000     0.00000     0.00000    94.00000    126.0000     0.00000     0.000
25%      48.00000     0.00000     0.00000   120.00000    211.0000     0.00000     0.000
50%      56.00000     1.00000     1.00000   130.00000    240.0000     0.00000     1.000
75%      61.00000     1.00000     2.00000   140.00000    275.0000     0.00000     1.000
max      77.00000     1.00000     3.00000   200.00000    564.0000     1.00000     2.000

In [30]:

df.target.value_counts()

Out[30]:

1 526
0 499
Name: target, dtype: int64
In [31]:

df.isna().sum()

Out[31]:

age 0
sex 0
cp 0
trestbps 0
chol 0
fbs 0
restecg 0
thalach 0
exang 0
oldpeak 0
slope 0
ca 0
thal 0
target 0
dtype: int64


In [48]:

sns.pairplot(df)

Out[48]:

<seaborn.axisgrid.PairGrid at 0x1a723a1ec40>

In [32]:

sns.countplot(x="target", data=df, palette="bwr")


plt.show()

In [33]:
sns.countplot(x='sex', data=df, palette="mako_r")
plt.xlabel("Sex (0 = female, 1= male)")
plt.show()

In [34]:

df = df.drop(columns = ['cp', 'thal', 'slope'])


df.head()

Out[34]:

age sex trestbps chol fbs restecg thalach exang oldpeak ca target

0 52 1 125 212 0 1 168 0 1.0 2 0


1 53 1 140 203 1 0 155 1 3.1 0 0

2 70 1 145 174 0 1 125 1 2.6 0 0

3 61 1 148 203 0 1 161 0 0.0 1 0

4 62 0 138 294 1 1 106 0 1.9 3 0

In [35]:

y = df.target.values
x = df.drop(['target'], axis = 1)
In [36]:

x_train, x_test, y_train, y_test = train_test_split(x,y,test_size = 0.2,random_state=0)


In [37]:

log_reg = LogisticRegression()

In [38]:

log_reg.fit(x_train, y_train)

C:\Users\archa\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(

Out[38]:

LogisticRegression()

In [39]:

y_pred = log_reg.predict(x_test)

In [40]:

from sklearn.metrics import classification_report


print(classification_report(y_test, y_pred))

precision recall f1-score support

0 0.86 0.70 0.78 98


1 0.77 0.90 0.83 107

accuracy 0.80 205


macro avg 0.82 0.80 0.80 205
weighted avg 0.81 0.80 0.80 205

In [46]:

from sklearn.metrics import confusion_matrix


pd.DataFrame(confusion_matrix(y_test,y_pred),columns=['Predicted No','Predicted Yes'],index

Out[46]:

Predicted No Predicted Yes

Actual No 69 29
Actual Yes 11 96

In [47]:

accuracy = metrics.accuracy_score(y_test, y_pred)


accuracy

Out[47]:

0.80487804878
