UNIVERSITY OF MUMBAI
DEPARTMENT OF COMPUTER SCIENCE
Seat No.
M.Sc. Computer Science with Specialization in Data Science, Semester II: Artificial Intelligence and Machine Learning Journal, 2021-22
UNIVERSITY OF MUMBAI
DEPARTMENT OF COMPUTER SCIENCE
CERTIFICATE
This is to certify that the work entered in this journal was done in the University
Department of Computer Science laboratory by
Mr./Ms. ARCHANA SUKUMARAN NAIR
Seat No. for the course of M.Sc.
Computer Science with Spl. in Data Science - Semester II (CBCS) (Revised)
during the academic year 2021-2022 in a satisfactory manner.
External Examiner
Index
1 A) MINI-MAX
B) ALPHA-BETA PRUNING
2 A) BINARY CLASSIFICATION
B) MULTI-CLASS CLASSIFICATION
3 A) LINEAR REGRESSION
B) POLYNOMIAL REGRESSION
4) FIND-S ALGORITHM
5 A) DECISION TREE
B) RANDOM FOREST
6) SUPPORT VECTOR MACHINE
7 A) K-NEAREST NEIGHBOUR
B) K-MEANS
8 A) NAIVE BAYES AND GAUSSIAN NAIVE BAYES
B) LOGISTIC REGRESSION
Practical 1a:
Aim: Write a program to implement Mini-Max Algorithm.
Theory:
The mini-max algorithm is a kind of backtracking algorithm used in decision
making and game theory, assuming that both players play optimally. Of the
two players, the maximizer tries to obtain the highest score possible, while
the minimizer tries to do the opposite.
Code:
Q. Write a program to implement the Mini-Max Algorithm.
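The code pages for this practical did not survive extraction. A minimal sketch of a Mini-Max implementation, assuming the usual textbook setup of leaf scores stored in a list over a complete binary game tree (the scores list is illustrative, not the original input):

import math

def minimax(depth, node_index, is_max, scores, max_depth):
    # at a leaf, return its static score
    if depth == max_depth:
        return scores[node_index]
    if is_max:
        # maximizer picks the larger of the two child values
        return max(minimax(depth + 1, node_index * 2, False, scores, max_depth),
                   minimax(depth + 1, node_index * 2 + 1, False, scores, max_depth))
    # minimizer picks the smaller of the two child values
    return min(minimax(depth + 1, node_index * 2, True, scores, max_depth),
               minimax(depth + 1, node_index * 2 + 1, True, scores, max_depth))

scores = [3, 5, 2, 9, 12, 5, 23, 23]        # illustrative leaf values
max_depth = int(math.log2(len(scores)))     # depth of the complete binary tree
print("The optimal value is:", minimax(0, 0, True, scores, max_depth))
# prints: The optimal value is: 12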
Practical 1B:
Aim: Write a program to implement Alpha-Beta Pruning with the Mini-Max Algorithm.
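The code pages for this practical were also lost. A sketch of Mini-Max with Alpha-Beta pruning under the same assumptions (illustrative leaf values over a complete binary tree of depth 3):

MIN, MAX = float('-inf'), float('inf')

def alpha_beta(depth, node_index, is_max, values, alpha, beta, max_depth):
    if depth == max_depth:
        return values[node_index]
    if is_max:
        best = MIN
        for i in range(2):
            val = alpha_beta(depth + 1, node_index * 2 + i, False, values, alpha, beta, max_depth)
            best = max(best, val)
            alpha = max(alpha, best)
            if beta <= alpha:      # remaining children cannot improve the result
                break
        return best
    best = MAX
    for i in range(2):
        val = alpha_beta(depth + 1, node_index * 2 + i, True, values, alpha, beta, max_depth)
        best = min(best, val)
        beta = min(beta, best)
        if beta <= alpha:          # prune
            break
    return best

values = [3, 5, 6, 9, 1, 2, 0, -1]   # illustrative leaf values
print("The optimal value is:", alpha_beta(0, 0, True, values, MIN, MAX, 3))
# prints: The optimal value is: 5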
Practical 2A:
Aim: Write a program to input a dataset and perform binary classification.
CODE:
In [1]:
# A) AIM : WAP to input dataset and perform Binary classification.
# Evaluate the model based on classification metrics and infer your result.
In [2]:
#import libraries
In [3]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
In [4]:
#load data
In [5]:
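The load cell did not survive extraction; the 891-row frame below matches the standard Titanic training set, so the cell was presumably something like (filename and path assumed):

titanic_data = pd.read_csv('C:/Users/archa/Downloads/titanic.csv')  # assumed filename and path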
In [6]:
len(titanic_data)
Out[6]:
891
titanic_data.head()
Out[77]:
   PassengerId  Survived  Pclass  Name                                                Sex     Age   SibSp  Parch  Ticket            Fare
0            1         0       3  Braund, Mr. Owen Harris                             male    22.0      1      0  A/5 21171          7.2500
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0  PC 17599          71.2833
2            3         1       3  Heikkinen, Miss. Laina                              female  26.0      0      0  STON/O2. 3101282   7.9250
3            4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0      1      0  113803            53.1000
4            5         0       3  Allen, Mr. William Henry                            male    35.0      0      0  373450             8.0500
In [78]:
titanic_data.index
Out[78]:
RangeIndex(start=0, stop=891, step=1)
In [79]:
titanic_data.columns
Out[79]:
Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp',
       'Parch', 'Ticket', 'Fare', 'Cabin', 'Embarked'],
      dtype='object')
In [80]:
titanic_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 714 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Cabin 204 non-null object
11 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB
In [81]:
titanic_data.dtypes
Out[81]:
PassengerId int64
Survived int64
Pclass int64
Name object
Sex object
Age float64
SibSp int64
Parch int64
Ticket object
Fare float64
Cabin object
Embarked object
dtype: object
In [82]:
titanic_data.describe()
Out[82]:
DATA ANALYSIS
In [83]:
#Data Analysis
#Import Seaborn for visually analysing the data
#Find out how many survived vs Died using countplot method of seaboarn
In [84]:
sns.countplot(x='Survived',data=titanic_data)
Out[84]:
<AxesSubplot:xlabel='Survived', ylabel='count'>
In [85]:
In [86]:
sns.countplot(x='Survived',data=titanic_data,hue='Sex')
Out[86]:
<AxesSubplot:xlabel='Survived', ylabel='count'>
In [87]:
In [88]:
titanic_data.isna()
Out[88]:
PassengerId Survived Pclass Name Sex Age SibSp Parch Ticket Fare Cabin
0 False False False False False False False False False False True
1 False False False False False False False False False False False
2 False False False False False False False False False False True
3 False False False False False False False False False False False
4 False False False False False False False False False False True
... ... ... ... ... ... ... ... ... ... ... ...
886 False False False False False False False False False False True
887 False False False False False False False False False False False
888 False False False False False True False False False False True
889 False False False False False False False False False False False
890 False False False False False False False False False False True
In [89]:
titanic_data.isna().sum()
Out[90]:
PassengerId      0
Survived         0
Pclass           0
Name             0
Sex              0
Age            177
SibSp            0
Parch            0
Ticket           0
Fare             0
Cabin          687
Embarked         2
dtype: int64
In [91]:
sns.displot(x='Age',data=titanic_data)
Out[92]:
<seaborn.axisgrid.FacetGrid at 0x1758099fe20>
DATA CLEANING
In [93]:
titanic_data['Age'].fillna(titanic_data['Age'].mean(),inplace=True)
In [95]:
titanic_data['Age'].isna().sum()
Out[96]:
0
In [97]:
titanic_data.drop('Cabin',axis=1,inplace=True)
In [99]:
titanic_data.head()
Out[100]:
   PassengerId  Survived  Pclass  Name                                                Sex     Age   SibSp  Parch  Ticket            Fare
0            1         0       3  Braund, Mr. Owen Harris                             male    22.0      1      0  A/5 21171          7.2500
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0  PC 17599          71.2833
2            3         1       3  Heikkinen, Miss. Laina                              female  26.0      0      0  STON/O2. 3101282   7.9250
3            4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0      1      0  113803            53.1000
In [102]:
titanic_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 PassengerId 891 non-null int64
1 Survived 891 non-null int64
2 Pclass 891 non-null int64
3 Name 891 non-null object
4 Sex 891 non-null object
5 Age 891 non-null float64
6 SibSp 891 non-null int64
7 Parch 891 non-null int64
8 Ticket 891 non-null object
9 Fare 891 non-null float64
10 Embarked 889 non-null object
dtypes: float64(2), int64(5), object(4)
memory usage: 76.7+ KB
In [135]:
titanic_data.size
Out[135]:
7128
In [103]:
# We can see Name, Sex, Ticket and Embarked are non-numerical. Name, Embarked and Ticket add no numeric signal here, so they will be dropped; Sex will be one-hot encoded below.
In [104]:
In [105]:
gender=pd.get_dummies(titanic_data['Sex'],drop_first=True)
In [106]:
titanic_data['Gender']=gender
In [107]:
titanic_data.head()
Out[107]:
   PassengerId  Survived  Pclass  Name                                                Sex     Age   SibSp  Parch  Ticket            Fare
0            1         0       3  Braund, Mr. Owen Harris                             male    22.0      1      0  A/5 21171          7.2500
1            2         1       1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1      0  PC 17599          71.2833
2            3         1       3  Heikkinen, Miss. Laina                              female  26.0      0      0  STON/O2. 3101282   7.9250
3            4         1       1  Futrelle, Mrs. Jacques Heath (Lily May Peel)       female  35.0      1      0  113803            53.1000
4            5         0       3  Allen, Mr. William Henry                            male    35.0      0      0  373450             8.0500
In [108]:
In [109]:
titanic_data.drop(['Name','Sex','Ticket','Embarked'],axis=1,inplace=True)
In [110]:
titanic_data.head()
Out[110]:
   PassengerId  Survived  Pclass   Age  SibSp  Parch     Fare  Gender
0            1         0       3  22.0      1      0   7.2500       1
1            2         1       1  38.0      1      0  71.2833       0
2            3         1       3  26.0      0      0   7.9250       0
3            4         1       1  35.0      1      0  53.1000       0
4            5         0       3  35.0      0      0   8.0500       1
In [111]:
In [112]:
x=titanic_data[['PassengerId','Pclass','Age','SibSp','Parch','Fare','Gender']]
y=titanic_data['Survived']
In [113]:
y
Out[113]:
0 0
1 1
2 1
3 1
4 0
..
886 0
887 1
888 0
889 1
890 0
Name: Survived, Length: 891, dtype: int64
DATA MODELLING
In [114]:
In [115]:
In [116]:
In [117]:
In [118]:
In [119]:
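The cells In [114] to In [120] were lost in extraction; they evidently split the data into train and test sets, since x_train and y_train appear below. The 295-row confusion matrix later on is consistent with roughly a one-third test split; a hedged sketch:

from sklearn.model_selection import train_test_split
# test_size and random_state are assumptions, chosen to match the ~295-row test set
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33, random_state=1)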
In [120]:
In [121]:
lr=LogisticRegression()
In [122]:
lr.fit(x_train,y_train)
C:\Users\archa\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
In [123]:
#predict
In [124]:
predict=lr.predict(x_test)
In [125]:
In [126]:
In [127]:
Out[127]:
            Predicted No  Predicted Yes
Actual No            152             23
Actual Yes            37             83
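The cells In [125] to In [127] that produced this table were lost; a plausible reconstruction, assuming a pandas crosstab was used:

from sklearn.metrics import classification_report   # used by the cell below
pd.crosstab(y_test, predict, rownames=['Actual'], colnames=['Predicted'])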
In [128]:
In [129]:
In [130]:
print(classification_report(y_test,predict))
In [131]:
PRACTICAL 2B : MULTICLASS CLASSIFICATION
CODE:
In [ ]:
In [ ]:
#import libraries
In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
In [ ]:
In [2]:
dataset=pd.read_csv('C:/Users/archa/Data Science/Semester 2/AI & ML/Practicals/Social_Netwo
In [4]:
dataset.head()
Out[4]:
In [76]:
len(dataset)
Out[76]:
400
In [6]:
dataset.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 400 entries, 0 to 399
Data columns (total 5 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 User ID 400 non-null int64
1 Gender 400 non-null object
2 Age 400 non-null int64
3 EstimatedSalary 400 non-null int64
4 Purchased 400 non-null int64
dtypes: int64(4), object(1)
memory usage: 15.8+ KB
In [71]:
dataset.shape
Out[71]:
(400, 5)
In [8]:
dataset.index
Out[8]:
In [10]:
dataset.columns
Out[10]:
In [74]:
dataset.dtypes
Out[74]:
User ID int64
Gender object
Age int64
EstimatedSalary int64
Purchased int64
dtype: object
In [12]:
X=dataset.iloc[:,1:4]
In [13]:
X=pd.get_dummies(X)
In [14]:
X=X.values
In [15]:
X
Out[15]:
array([[ 19, 19000, 0, 1],
[ 35, 20000, 0, 1],
[ 26, 43000, 1, 0],
...,
[ 50, 20000, 1, 0],
[ 36, 33000, 0, 1],
[ 49, 36000, 1, 0]], dtype=int64)
DATA ANALYSIS
In [17]:
In [19]:
DATA MODELLING
In [20]:
#Splitting the dataset into the Train set and Test set
In [21]:
In [22]:
In [36]:
In [25]:
#Feature Scaling
In [41]:
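The target definition, train/test split, feature scaling, and import cells were lost. The 100-element prediction array below is consistent with a 25% test split of the 400 rows; a hedged sketch:

from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

y = dataset.iloc[:, 4].values                 # Purchased (assumed target column)
# test_size and random_state are assumptions
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
sc = StandardScaler()                         # scale features before KNN
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)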
In [60]:
In [61]:
KNN= KNeighborsClassifier(n_neighbors=5,
weights='uniform',
algorithm='kd_tree',
leaf_size=30,
p=2,
metric='minkowski',
n_jobs=-1)
In [62]:
KNN.fit(x_train,y_train)
Out[62]:
KNeighborsClassifier(algorithm='kd_tree', n_jobs=-1)
In [75]:
KNN.predict(x_test)
Out[75]:
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1,
1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 1, 1], dtype=int64)
In [63]:
y_pred=KNN.predict(x_test)
In [64]:
print(classification_report(y_test.reshape(-1,1),y_pred))
In [65]:
In [66]:
print('Cross val',cross_val_score(KNN,y_test.reshape(-1,1),y_pred,cv=10))
print('Cross val',np.mean(cross_val_score(KNN,y_test.reshape(-1,1),y_pred,)))
PRACTICAL 3A: LINEAR REGRESSION
CODE:
In [2]:
import numpy as np
import pandas as pd
import seaborn as sns
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score,mean_squared_error
In [3]:
df = pd.read_csv('C:/Users/archa/Downloads/vgsales.csv')
df.head()
Out[3]:
   Rank  Name                      Platform  Year    Genre         Publisher  NA_Sales  EU_Sales  JP_Sales
1     2  Super Mario Bros.         NES       1985.0  Platform      Nintendo      29.08      3.58      6.81
2     3  Mario Kart Wii            Wii       2008.0  Racing        Nintendo      15.85     12.88      3.79
3     4  Wii Sports Resort         Wii       2009.0  Sports        Nintendo      15.75     11.01      3.28
4     5  Pokemon Red/Pokemon Blue  GB        1996.0  Role-Playing  Nintendo      11.27      8.89     10.22
(row 0 and the trailing Other_Sales/Global_Sales columns were cut off in extraction)
In [4]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 16598 entries, 0 to 16597
Data columns (total 11 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Rank 16598 non-null int64
1 Name 16598 non-null object
2 Platform 16598 non-null object
3 Year 16327 non-null float64
4 Genre 16598 non-null object
5 Publisher 16540 non-null object
6 NA_Sales 16598 non-null float64
7 EU_Sales 16598 non-null float64
8 JP_Sales 16598 non-null float64
9 Other_Sales 16598 non-null float64
10 Global_Sales 16598 non-null float64
In [5]:
df.describe()
Out[5]:
In [6]:
df.isnull().sum()
Out[6]:
Rank 0
Name 0
Platform 0
Year 271
Genre 0
Publisher 58
NA_Sales 0
EU_Sales 0
JP_Sales 0
Other_Sales 0
Global_Sales 0
dtype: int64
In [7]:
df.drop(["Rank","Name","Year","Publisher"],axis=1,inplace=True)
df.head()
Out[7]:
In [8]:
dums = pd.get_dummies(df[["Platform","Genre"]])
dums.head()
Out[8]:
(only the last few of the 43 one-hot indicator columns survived extraction; every visible value is 0 except a single 1 in row 4)
5 rows × 43 columns
In [9]:
dums.drop(["Platform_2600","Genre_Misc"],axis=1,inplace=True)
In [10]:
final_df= pd.concat([df,dums],axis=1)
final_df.drop(["Platform","Genre"],axis=1,inplace=True)
final_df.head()
Out[10]:
5 rows × 46 columns
In [11]:
C:\Users\archa\anaconda3\lib\site-packages\seaborn\_decorators.py:36: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.
(0.0, 30.471405021832812)
In [12]:
final_df.EU_Sales[df.EU_Sales>15]
#this value is in index 0.
Out[12]:
0 29.02
Name: EU_Sales, dtype: float64
In [13]:
df_outlier = final_df.drop([0],axis=0)
In [14]:
C:\Users\archa\anaconda3\lib\site-packages\seaborn\_decorators.py:36: FutureWarning: Pass the following variables as keyword args: x, y. From version 0.12, the only valid positional argument will be `data`, and passing other arguments without an explicit keyword will result in an error or misinterpretation.
(0.0, 13.524113383535223)
In [15]:
x = df_outlier[["EU_Sales"]]
y = df_outlier["Global_Sales"]
In [16]:
In [ ]:
reg = LinearRegression()
model = reg.fit(x,y)
In [17]:
model.score(x,y)
In [18]:
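The split cell was lost; the shapes printed below (11617 train and 4980 test rows of 16597) imply a 30% test split:

from sklearn.model_selection import train_test_split
# test_size matches the printed shapes; random_state is a guess
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)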
In [19]:
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
(11617, 1)
(11617,)
(4980, 1)
(4980,)
In [20]:
lm = LinearRegression()
model = lm.fit(x_train,y_train)
In [21]:
PRACTICAL 3B: POLYNOMIAL REGRESSION
CODE:
In [ ]:
In [1]:
# importing libraries
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
In [3]:
#importing datasets
data_set= pd.read_csv('C:/Users/archa/Downloads/Position_Salaries.csv')
data_set.head()
Out[3]:
(head() table lost in extraction; only row 3, 'Manager, level 4, salary 80000', survived)
In [4]:
In [5]:
Out[5]:
LinearRegression()
In [6]:
Out[6]:
LinearRegression()
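The extraction and fitting cells (In [4] to In [6]) were lost. The predictions printed below (330378.79 for the plain linear model, 189498.11 for the polynomial one) match the standard Position_Salaries workflow with degree-2 polynomial features, so a hedged reconstruction is:

from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

x = data_set.iloc[:, 1:2].values      # Level column as a 2-D array
y = data_set.iloc[:, 2].values        # Salary
lin_regs = LinearRegression().fit(x, y)            # plain linear fit (Out[5])
poly_regs = PolynomialFeatures(degree=2)           # degree assumed from the printed prediction
x_poly = poly_regs.fit_transform(x)
lin_reg_2 = LinearRegression().fit(x_poly, y)      # linear fit on polynomial features (Out[6])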
In [7]:
In [8]:
In [10]:
lin_pred = lin_regs.predict([[6.5]])
print(lin_pred)
[330378.78787879]
In [11]:
poly_pred = lin_reg_2.predict(poly_regs.fit_transform([[6.5]]))
print(poly_pred)
[189498.10606061]
PRACTICAL 4:
FIND-S ALGORITHM FOR FINDING A HYPOTHESIS BASED ON TRAINING SAMPLES.
CODE:
In [ ]:
In [8]:
import pandas as pd
import numpy as np
In [16]:
d = pd.read_csv("C:/Users/archa/Downloads/ws.csv")
d.head()
Out[16]:
In [17]:
t = np.array(d)[:,-1]
print("The target is: ",t)
In [18]:
def fun(c,t):
for i, val in enumerate(t):
if val == "Yes":
specific_hypothesis = c[i].copy()
break
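Only the first half of the function survived extraction; a standard completion of FIND-S (the attribute matrix a and the final print are assumptions consistent with the fragment above):

def fun(c, t):
    # initialise the hypothesis with the first positive example
    for i, val in enumerate(t):
        if val == "Yes":
            specific_hypothesis = c[i].copy()
            break
    # generalise: replace any attribute that disagrees with a later
    # positive example by the wildcard '?'
    for i, val in enumerate(c):
        if t[i] == "Yes":
            for j in range(len(specific_hypothesis)):
                if val[j] != specific_hypothesis[j]:
                    specific_hypothesis[j] = '?'
    return specific_hypothesis

a = np.array(d)[:, :-1]              # attribute columns (assumed definition)
print("The final hypothesis is:", fun(a, t))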
PRACTICAL 5A: DECISION TREE
CODE:
In [33]:
# A) AIM : WAP to implement Decision Tree Algorithm
In [20]:
# importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [21]:
data = pd.read_csv("C:/Users/archa/Downloads/WineQuality.csv")
data.head()
Out[21]:
In [22]:
data.shape
Out[22]:
(1599, 13)
In [25]:
data.isna().sum()
Out[25]:
Unnamed: 0 0
fixed.acidity 0
volatile.acidity 0
citric.acid 0
residual.sugar 0
chlorides 0
free.sulfur.dioxide 0
total.sulfur.dioxide 0
density 0
pH 0
sulphates 0
alcohol 0
quality 0
dtype: int64
In [26]:
# creating X and y
X = data.drop(columns = 'quality')
y = data['quality']
In [27]:
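The split cell and the imports were lost. The test score below equals 299/480, and 480 is 30% of the 1599 rows, so a 30% test split appears to have been used; a hedged sketch:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# test_size matches the apparent 480-row test set; random_state is a guess
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)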
In [28]:
scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
In [29]:
clf = DecisionTreeClassifier()
clf.fit(X_train, y_train)
Out[29]:
DecisionTreeClassifier()
In [30]:
y_pred = clf.predict(X_test)
In [31]:
clf.score(X_train, y_train)
Out[31]:
1.0
In [32]:
clf.score(X_test, y_test)
Out[32]:
0.6229166666666667
PRACTICAL 5B: RANDOM FOREST
CODE:
# AIM : WAP to implement Random Forest Algorithm
In [20]:
# importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
In [21]:
df = pd.read_csv("C:/Users/archa/Downloads/temps.csv")
df.head()
Out[21]:
In [34]:
df.dtypes
Out[34]:
year int64
month int64
day int64
temp_2 int64
temp_1 int64
average float64
friend int64
week_Fri uint8
week_Mon uint8
week_Sat uint8
week_Sun uint8
week_Thurs uint8
week_Tues uint8
week_Wed uint8
dtype: object
In [22]:
df.shape
Out[22]:
(348, 9)
In [23]:
# column names
df.columns
Out[23]:
Index(['year', 'month', 'day', 'week', 'temp_2', 'temp_1', 'average',
       'actual', 'friend'],
      dtype='object')
In [24]:
df.isna().sum()
Out[24]:
year 0
month 0
day 0
week 0
temp_2 0
temp_1 0
average 0
actual 0
friend 0
dtype: int64
In [25]:
Out[25]:
year month day temp_2 temp_1 average actual friend week_Fri week_Mon week_Sat
0 2019 1 1 45 45 45.6 45 29 1 0 0
1 2019 1 2 44 45 45.7 44 61 0 0 1
2 2019 1 3 45 44 45.8 41 56 0 0 0
3 2019 1 4 44 41 45.9 40 53 0 1 0
4 2019 1 5 41 40 46.0 44 41 0 0 0
In [26]:
In [27]:
In [35]:
In [29]:
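The feature-preparation and split cells (In [26] to In [29]) were lost; a sketch following the usual workflow for this dataset, assuming 'actual' is the label, one-hot encoding of the 'week' column (visible in the dtypes above), and a 25% test split:

from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor

df = pd.get_dummies(df)                    # one-hot encode 'week' into week_* columns
labels = np.array(df['actual'])            # target: the actual temperature
features = np.array(df.drop('actual', axis=1))
train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.25, random_state=42)   # parameters assumed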
In [30]:
# Instantiate model
rf = RandomForestRegressor(n_estimators= 1000, random_state=42)
In [31]:
#Make prediction on test data
# Use the forest's predict method on the test data
predictions = rf.predict(test_features)
In [32]:
Accuracy: 94.02 %.
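The fit and metric cells were lost; the printed accuracy is consistent with the common MAPE-based formulation, so a hedged reconstruction is:

rf.fit(train_features, train_labels)        # the lost In [31], run before predicting
errors = abs(predictions - test_labels)     # absolute errors on the test set
mape = 100 * (errors / test_labels)         # mean absolute percentage error
print('Accuracy:', round(100 - np.mean(mape), 2), '%.')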
PRACTICAL 6: SUPPORT VECTOR MACHINE
In [ ]:
#WAP to implement Support Vector Machine (LSVM/Kernel SVM/Soft Margin SVM)
#import libraries
In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
In [2]:
#load data
In [4]:
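The load cell was lost; the shape and columns below match the Social_Network_Ads.csv used in the other practicals, so presumably:

df = pd.read_csv('C:/Users/archa/Downloads/Social_Network_Ads.csv')   # assumed path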
In [5]:
df.head()
Out[5]:
In [6]:
df.shape
Out[6]:
(400, 5)
In [7]:
df.info()
In [11]:
df.columns
Out[11]:
In [12]:
x = df.iloc[:,[2,3]]
y = df.iloc[:,4]
In [13]:
x.head()
Out[13]:
Age EstimatedSalary
0 19 19000
1 35 20000
2 26 43000
3 27 57000
4 19 76000
In [14]:
y.head()
Out[14]:
0 0
1 0
2 0
3 0
4 0
Name: Purchased, dtype: int64
In [77]:
In [78]:
print("Training data:",x_train.shape)
print("Testing data",x_test.shape)
#feature scaling
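The split, scaling, and training cells were lost. The prediction array below has 300 entries out of 400 rows, so a 75% test split appears to have been used; a hedged reconstruction:

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# sizes inferred from the 300-row predictions; random_state assumed
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.75, random_state=0)
sc = StandardScaler()
x_train = sc.fit_transform(x_train)
x_test = sc.transform(x_test)
classifier = SVC(kernel='linear', random_state=0)   # matches Out[80] below
classifier.fit(x_train, y_train)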
In [80]:
Out[80]:
SVC(kernel='linear', random_state=0)
In [81]:
Out[81]:
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 1,
0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1,
0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0,
1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1,
1, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1,
1, 0, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 1,
0, 0, 0, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0], dtype=int64)
In [82]:
Out[82]:
0.7966666666666666
In [83]:
In [88]:
y_pred = classifier.predict(x_test)
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.title('Test data')
#Creating hyperplane
w = classifier.coef_[0]
a = -w[0] / w[1]
xx = np.linspace(-2, 2)
yy = a * xx -(classifier.intercept_[0]) / w[1]
#Plot Hyperplane
plt.plot(xx, yy)
plt.show()
In [55]:
C:\Users\archa\anaconda3\lib\site-packages\sklearn\metrics\_classification.py:1245: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples. Use `zero_division` parameter to control this behavior.
  _warn_prf(average, modifier, msg_start, len(result))
(the same warning is emitted three times)
Out[55]:
In [57]:
print(classification_report(y_test,y_pred))
PRACTICAL 7A: K-NEAREST NEIGHBOUR
CODE:
In [ ]:
In [31]:
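The import and load cells did not survive extraction; the columns and the (768, 9) shape identify the Pima Indians Diabetes dataset, so presumably:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df = pd.read_csv('C:/Users/archa/Downloads/diabetes.csv')   # assumed filename and path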
In [32]:
df.head()
Out[32]:
(head() table lost in extraction; only part of row 3 survived)
In [33]:
df.shape
Out[33]:
(768, 9)
In [57]:
df.dtypes
Out[57]:
Pregnancies int64
Glucose int64
BloodPressure int64
SkinThickness int64
Insulin int64
BMI float64
DiabetesPedigreeFunction float64
Age int64
Outcome int64
dtype: object
In [34]:
X = df.drop('Outcome',axis=1).values
y = df['Outcome'].values
In [35]:
#importing train_test_split
from sklearn.model_selection import train_test_split
In [36]:
In [37]:
#import KNeighborsClassifier
from sklearn.neighbors import KNeighborsClassifier
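The split cell and the accuracy-versus-k loop were lost (the plot below uses neighbors, train_accuracy and test_accuracy). The confusion matrix later sums to 308 rows, consistent with a 40% test split; a hedged sketch:

# parameters assumed to match the 308-row test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42, stratify=y)

neighbors = np.arange(1, 9)
train_accuracy = np.empty(len(neighbors))
test_accuracy = np.empty(len(neighbors))
for i, k in enumerate(neighbors):
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracy[i] = knn.score(X_train, y_train)
    test_accuracy[i] = knn.score(X_test, y_test)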
In [38]:
#Plotting
plt.title('k-NN Varying number of neighbors')
plt.plot(neighbors, test_accuracy, label='Testing Accuracy')
plt.plot(neighbors, train_accuracy, label='Training accuracy')
plt.legend()
plt.xlabel('Number of neighbors')
plt.ylabel('Accuracy')
plt.show()
In [39]:
In [40]:
Out[40]:
KNeighborsClassifier(n_neighbors=7)
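The lost cells behind this output were presumably:

knn = KNeighborsClassifier(n_neighbors=7)
knn.fit(X_train, y_train)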
In [41]:
knn.score(X_test,y_test)
Out[41]:
0.7305194805194806
In [42]:
#import confusion_matrix
from sklearn.metrics import confusion_matrix
In [43]:
y_pred = knn.predict(X_test)
In [44]:
confusion_matrix(y_test,y_pred)
Out[44]:
array([[165, 36],
[ 47, 60]], dtype=int64)
#import classification_report
from sklearn.metrics import classification_report
In [46]:
print(classification_report(y_test,y_pred))
In [47]:
y_pred_proba = knn.predict_proba(X_test)[:,1]
In [48]:
In [49]:
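In [49] evidently computed the ROC points used by the plot below; a sketch:

from sklearn.metrics import roc_curve
fpr, tpr, thresholds = roc_curve(y_test, y_pred_proba)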
In [50]:
plt.plot([0,1],[0,1],'k--')
plt.plot(fpr,tpr, label='Knn')
plt.xlabel('fpr')
plt.ylabel('tpr')
plt.title('Knn(n_neighbors=7) ROC curve')
plt.show()
In [51]:
Out[51]:
0.7345050448691124
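Out[51] matches the area under the ROC curve; the lost cell was presumably:

from sklearn.metrics import roc_auc_score
roc_auc_score(y_test, y_pred_proba)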
In [52]:
#import GridSearchCV
from sklearn.model_selection import GridSearchCV
In [53]:
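The grid definition is recoverable from Out[54] below:

param_grid = {'n_neighbors': np.arange(1, 50)}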
In [54]:
knn = KNeighborsClassifier()
knn_cv= GridSearchCV(knn,param_grid,cv=5)
knn_cv.fit(X,y)
Out[54]:
GridSearchCV(cv=5, estimator=KNeighborsClassifier(),
param_grid={'n_neighbors': array([ 1, 2, 3, 4, 5, 6, 7,
8, 9, 10, 11, 12, 13, 14, 15, 16, 17,
18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34,
35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49])})
In [55]:
knn_cv.best_score_
Out[55]:
0.7578558696205755
In [56]:
knn_cv.best_params_
Out[56]:
{'n_neighbors': 14}
Thus a kNN classifier with 14 neighbours achieves the best cross-validated accuracy of about 0.7579, i.e. roughly 76%.
PRACTICAL 7B: K-MEANS ALGORITHM
CODE:
In [ ]:
# B) AIM : WAP to implement KMeans Algorithm
# Evaluate the model based on classification metrics and infer your result.
In [5]:
#import libraries
import pandas as pd
import numpy as np
import seaborn as sns
from sklearn.cluster import KMeans
from sklearn.model_selection import GridSearchCV
import matplotlib.pyplot as plt
In [6]:
data = pd.read_csv("C:/Users/archa/Downloads/Mall_Customers.csv")
data.head()
Out[6]:
   CustomerID   Genre  Age  Annual Income (k$)  Spending Score (1-100)
0           1    Male   19                  15                      39
1           2    Male   21                  15                      81
2           3  Female   20                  16                       6
3           4  Female   23                  16                      77
4           5  Female   31                  17                      40
In [7]:
Out[7]:
    Genre  Age  Annual Income (k$)  Spending Score (1-100)
0    Male   19                  15                      39
1    Male   21                  15                      81
2  Female   20                  16                       6
3  Female   23                  16                      77
4  Female   31                  17                      40
In [11]:
data.shape
Out[11]:
(200, 4)
In [8]:
data.dtypes
Out[8]:
Genre object
Age int64
Annual Income (k$) int64
Spending Score (1-100) int64
dtype: object
In [9]:
Out[9]:
Genre 0
Age 0
Annual Income (k$) 0
Spending Score (1-100) 0
dtype: int64
In [5]:
sns.pairplot(data)
Out[5]:
<seaborn.axisgrid.PairGrid at 0x20172d70730>
In [6]:
data.columns
Out[6]:
Index(['Genre', 'Age', 'Annual Income (k$)', 'Spending Score (1-100)'], dtype='object')
In [10]:
Out[10]:
   Genre  Age  Annual Income (k$)  Spending Score (1-100)
0      1   19                  15                      39
1      1   21                  15                      81
2      2   20                  16                       6
3      2   23                  16                      77
4      2   31                  17                      40
In [11]:
data.dtypes
Out[11]:
Genre int64
Age int64
Annual Income (k$) int64
Spending Score (1-100) int64
dtype: object
In [26]:
#scaling transformation
#1. z-score normalization using StandardScaler (zero mean, unit variance)
#2. min-max normalization using MinMaxScaler (0 to 1)
In [12]:
df_customer = data.iloc[:,2:4]
df_customer.head()
Out[12]:
   Annual Income (k$)  Spending Score (1-100)
0                  15                      39
1                  15                      81
2                  16                       6
3                  16                      77
4                  17                      40
In [13]:
from sklearn.preprocessing import StandardScaler
data_scaled = StandardScaler().fit_transform(df_customer)
data_scaled
Out[13]:
array([[-1.73899919, -0.43480148],
[-1.73899919, 1.19570407],
[-1.70082976, -1.71591298],
[-1.70082976, 1.04041783],
[-1.66266033, -0.39597992],
[-1.66266033, 1.00159627],
[-1.62449091, -1.71591298],
[-1.62449091, 1.70038436],
[-1.58632148, -1.83237767],
[-1.58632148, 0.84631002],
[-1.58632148, -1.4053405 ],
[-1.58632148, 1.89449216],
[-1.54815205, -1.36651894],
[-1.54815205, 1.04041783],
[-1.54815205, -1.44416206],
[-1.54815205, 1.11806095],
[-1.50998262, -0.59008772],
[-1.50998262, 0.61338066],
       ...])
In [14]:
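The elbow loop was lost; a sketch consistent with the nine WCSS values below (k from 2 to 10):

wcss = []
for k in range(2, 11):
    km = KMeans(n_clusters=k, random_state=42)   # random_state is a guess
    km.fit(data_scaled)
    wcss.append(km.inertia_)                     # within-cluster sum of squares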
In [15]:
wcss
Out[15]:
[269.01679374906655,
157.70400815035939,
108.92131661364358,
65.56840815571681,
55.103778121150555,
44.86475569922555,
37.24321153347672,
33.85792110528426,
30.684270071530346]
In [16]:
#Plotting
plt.figure(figsize = (8,6), dpi=100)
plt.plot(range(2,11),wcss, marker = 'o', c='blue', markerfacecolor='red')
plt.xlabel('No of Clusters')
plt.ylabel('WCSS')
plt.show()
In [17]:
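The final-model cell was lost; given the five clusters discussed below, presumably:

Kmodel_final = KMeans(n_clusters=5, random_state=42).fit(data_scaled)   # random_state assumed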
In [18]:
cl = Kmodel_final.predict(data_scaled)
In [19]:
cl
Out[19]:
array([0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3,
0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 3, 0, 1,
0, 3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 4, 2, 1, 2, 4, 2, 4, 2,
1, 2, 4, 2, 4, 2, 4, 2, 4, 2, 1, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2, 4, 2,
4, 2])
In [20]:
Out[20]:
   Annual Income (k$)  Spending Score (1-100)  cluster
0                  15                      39        0
1                  15                      81        3
2                  16                       6        0
3                  16                      77        3
4                  17                      40        0
In [27]:
# Visualization of clusters
plt.figure(figsize = (6,4), dpi = 100)
plt.scatter(x=df_customer['Annual Income (k$)'],y=df_customer['Spending Score (1-100)'],c=cl)
plt.xlabel('Annual Income (k$)')
plt.ylabel('Spending Score')
plt.show()
c1 = high income, low spender; c2 = high income, high spender; c3 = low income, high spender; c4 = low income, low spender; c5 = moderate income, moderate spender
Conclusion
Mall customer data is clustered into 5 clusters. The green cluster indicates people who have a high spending score but a low annual income. The purple cluster shows people who have a low annual income and a low spending score. The blue cluster shows people with an average annual income and an average spending score. The sea-green cluster indicates people with a high annual income and a high spending score. The yellow cluster shows people with a low spending score and a high annual income.
PRACTICAL 8A:
NAIVE BAYES MODEL AND GAUSSIAN NAIVE BAYES MODEL
CODE:
In [ ]:
In [29]:
# importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn import metrics
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix
In [30]:
#Reading Dataset
df = pd.read_csv("C:/Users/archa/Downloads/Social_Network_Ads.csv")
df.head()
Out[30]:
In [42]:
df.shape
Out[42]:
(400, 5)
In [43]:
df.describe()
Out[43]:
In [44]:
df.dtypes
Out[44]:
User ID int64
Gender object
Age int64
EstimatedSalary int64
Purchased int64
dtype: object
In [46]:
df.isna().sum()
Out[46]:
User ID 0
Gender 0
Age 0
EstimatedSalary 0
Purchased 0
dtype: int64
In [31]:
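The feature-selection and split cells were lost. The 80-element test arrays below imply a 20% test split of the 400 rows; a hedged sketch:

from sklearn.model_selection import train_test_split
X = df.iloc[:, [2, 3]].values    # Age and EstimatedSalary (assumed feature choice)
y = df.iloc[:, 4].values         # Purchased
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)  # random_state assumed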
In [45]:
sns.pairplot(df)
Out[45]:
<seaborn.axisgrid.PairGrid at 0x1f0c97d8dc0>
In [32]:
#Feature scaling
from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
In [35]:
Out[35]:
GaussianNB()
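The cell behind Out[35] was presumably:

from sklearn.naive_bayes import GaussianNB
classifier = GaussianNB()
classifier.fit(X_train, y_train)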
In [36]:
y_pred = classifier.predict(X_test)
In [37]:
y_pred
Out[37]:
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1,
0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1], dtype=int64)
In [38]:
y_test
Out[38]:
array([0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1,
0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0,
1, 0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1,
0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1], dtype=int64)
In [39]:
from sklearn.metrics import classification_report
print(classification_report(y_test, y_pred))
              precision    recall  f1-score   support
(the per-class rows did not survive extraction)
    accuracy                           0.93        80
   macro avg       0.92      0.89      0.90        80
weighted avg       0.92      0.93      0.92        80
In [40]:
Out[40]:
            Predicted No  Predicted Yes
Actual No             56              2
Actual Yes             4             18
In [41]:
Out[41]:
0.925
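The crosstab and accuracy cells (In [40] and In [41]) are consistent with:

pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
metrics.accuracy_score(y_test, y_pred)   # 0.925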
PRACTICAL 8B: LOGISTIC REGRESSION
CODE:
In [1]:
# AIM : WAP to implement Logistic Regression
In [25]:
# importing libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
In [26]:
df = pd.read_csv("C:/Users/archa/Downloads/heart.csv")
df.head()
Out[26]:
age sex cp trestbps chol fbs restecg thalach exang oldpeak slope ca thal target
In [27]:
df.shape
Out[27]:
(1025, 14)
In [28]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1025 entries, 0 to 1024
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 1025 non-null int64
1 sex 1025 non-null int64
2 cp 1025 non-null int64
3 trestbps 1025 non-null int64
4 chol 1025 non-null int64
5 fbs 1025 non-null int64
6 restecg 1025 non-null int64
7 thalach 1025 non-null int64
8 exang 1025 non-null int64
9 oldpeak 1025 non-null float64
10 slope 1025 non-null int64
11 ca 1025 non-null int64
12 thal 1025 non-null int64
13 target 1025 non-null int64
dtypes: float64(1), int64(13)
memory usage: 112.2 KB
In [29]:
df.describe()
Out[29]:
In [30]:
df.target.value_counts()
Out[30]:
1 526
0 499
Name: target, dtype: int64
In [31]:
df.isna().sum()
Out[31]:
age 0
sex 0
cp 0
trestbps 0
chol 0
fbs 0
restecg 0
thalach 0
exang 0
oldpeak 0
slope 0
ca 0
thal 0
target 0
dtype: int64
In [48]:
sns.pairplot(df)
Out[48]:
<seaborn.axisgrid.PairGrid at 0x1a723a1ec40>
In [32]:
In [33]:
sns.countplot(x='sex', data=df, palette="mako_r")
plt.xlabel("Sex (0 = female, 1= male)")
plt.show()
In [34]:
Out[34]:
age sex trestbps chol fbs restecg thalach exang oldpeak ca target
In [35]:
y = df.target.values
x = df.drop(['target'], axis = 1)
In [36]:
log_reg = LogisticRegression()
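The split cell (In [37]) was lost; the 205-row confusion matrix below implies a 20% test split of the 1025 rows:

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)  # random_state assumed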
In [38]:
log_reg.fit(x_train, y_train)
C:\Users\archa\anaconda3\lib\site-packages\sklearn\linear_model\_logistic.py:763: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
Out[38]:
LogisticRegression()
In [39]:
y_pred = log_reg.predict(x_test)
In [40]:
In [46]:
Out[46]:
            Predicted No  Predicted Yes
Actual No             69             29
Actual Yes            11             96
In [47]:
Out[47]:
0.80487804878
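The crosstab and accuracy cells are consistent with:

pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
metrics.accuracy_score(y_test, y_pred)   # about 0.8049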