ML Lab Manual R20
S.No. LIST OF EXPERIMENTS
1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.
2. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
4. Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression b) Logistic Regression c) Binary Classifier
5. Develop a program for Bias, Variance, Remove Duplicates, Cross-Validation.
7. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
8. Write a program to implement the k-Nearest Neighbor algorithm to classify the iris data set. Print both correct and wrong predictions.
9. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
10. Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.
11. Apply the EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.
12. Exploratory Data Analysis for Classification using Pandas or Matplotlib.
13. Write a Python program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
14. Write a program to implement Support Vector Machines and Principal Component Analysis.
Experiment – 1:
Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a
given set of training data samples. Read the training data from a .CSV file.
Aim: Demonstration of FIND-S algorithm for finding the most specific hypothesis
Program code:
import csv

# read the training data from a CSV file (file name assumed; adjust the path as needed)
with open('finds.csv') as f:
    reader = csv.reader(f)
    your_list = list(reader)

# start from the most specific hypothesis: six attributes assumed, matching the data below
h = [['0', '0', '0', '0', '0', '0']]

for i in your_list:
    print(i)
    if i[-1] == "True":            # FIND-S looks only at positive examples
        j = 0
        for x in i:
            if x != "True":
                if h[0][j] == '0':
                    h[0][j] = x    # first positive example: copy the attribute value
                elif h[0][j] != x:
                    h[0][j] = '?'  # conflicting value: generalize to a wildcard
                else:
                    pass           # value already matches the hypothesis
            j = j + 1
print(h)
Output
Experiment – 2:
For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the
training examples.
Program code:
class Holder:
    '''
    Holds the attribute names and the set of values each attribute can take.
    '''
    def __init__(self, attr):
        self.attributes = attr
        self.factors = {}
        for i in attr:
            self.factors[i] = []

    def add_values(self, factor, values):
        self.factors[factor] = values


class CandidateElimination:
    '''
    Maintains the specific boundary S and the general boundary G of the
    version space while scanning the training examples.
    '''
    def __init__(self, data, fact):
        self.num_factors = len(data[0][0])
        self.attr = fact.attributes
        self.factors = fact.factors
        self.dataset = data

    def initializeS(self):
        ''' The most specific hypothesis: every attribute rejects every value. '''
        return [tuple(['-'] * self.num_factors)]

    def initializeG(self):
        ''' The most general hypothesis: every attribute accepts any value. '''
        return [tuple(['?'] * self.num_factors)]

    def is_positive(self, trial_set):
        ''' A trial_set is a pair (instance, label); 'Y' marks a positive example. '''
        return trial_set[1] == 'Y'

    def match_factor(self, value1, value2):
        ''' Check whether two attribute values match, necessary while checking the
        consistency of a training trial_set with a hypothesis. '''
        return value1 == '?' or value2 == '?' or value1 == value2

    def consistent(self, hypothesis, instance):
        ''' Check whether the instance is covered by the hypothesis. '''
        for i, factor in enumerate(hypothesis):
            if not self.match_factor(factor, instance[i]):
                return False
        return True

    def remove_inconsistent_G(self, hypotheses, instance):
        ''' For a positive example, drop the hypotheses in G that fail to cover it. '''
        G_new = hypotheses[:]
        for g in hypotheses:
            if not self.consistent(g, instance):
                G_new.remove(g)
        return G_new

    def remove_inconsistent_S(self, hypotheses, instance):
        ''' For a negative example, drop the hypotheses in S that cover it. '''
        S_new = hypotheses[:]
        for s in hypotheses:
            if self.consistent(s, instance):
                S_new.remove(s)
        return S_new

    def remove_more_general(self, hypotheses):
        ''' After generalizing S for a positive trial_set, any hypothesis in S
        more general than another in S should be removed. '''
        S_new = hypotheses[:]
        for hypo in hypotheses:
            for other in hypotheses:
                if hypo != other and self.more_general(other, hypo) and other in S_new:
                    S_new.remove(other)
        return S_new

    def remove_more_specific(self, hypotheses):
        ''' After specializing G for a negative trial_set, any hypothesis in G
        more specific than another in G should be removed. '''
        G_new = hypotheses[:]
        for hypo in hypotheses:
            for other in hypotheses:
                if hypo != other and self.more_specific(other, hypo) and other in G_new:
                    G_new.remove(other)
        return G_new

    def generalize_inconsistent_S(self, hypothesis, instance):
        ''' When an inconsistent hypothesis for a positive trial_set is seen in the
        specific boundary S, it should be minimally generalized to be consistent
        with the trial_set; this yields one hypothesis. '''
        hypo = list(hypothesis)
        for i, factor in enumerate(hypo):
            if factor == '-':
                hypo[i] = instance[i]   # fill an empty slot with the observed value
            elif not self.match_factor(factor, instance[i]):
                hypo[i] = '?'           # conflicting value: generalize to a wildcard
        return tuple(hypo)

    def specialize_inconsistent_G(self, hypothesis, instance):
        ''' When an inconsistent hypothesis for a negative trial_set is seen in the
        general boundary G, it should be minimally specialized to exclude the
        trial_set; this yields a set of hypotheses. '''
        specializations = []
        hypo = list(hypothesis)
        for i, factor in enumerate(hypo):
            if factor == '?':
                for value in self.factors[self.attr[i]]:
                    if value != instance[i]:
                        new_hypo = hypo[:]
                        new_hypo[i] = value
                        specializations.append(tuple(new_hypo))
        return specializations

    def get_general(self, generalization, G):
        ''' Keep a generalization of S only if some member of G is more general than it. '''
        for g in G:
            if self.more_general(g, generalization):
                return generalization
        return None

    def get_specific(self, specializations, S):
        ''' Checks if there is a more specific hypothesis in S for each hypothesis
        in the specializations of an inconsistent g. '''
        valid_specializations = []
        for hypo in specializations:
            for s in S:
                if self.more_specific(s, hypo) or s == self.initializeS()[0]:
                    valid_specializations.append(hypo)
                    break
        return valid_specializations

    def exists_general(self, hypothesis, G):
        ''' Used to check if there exists a more general hypothesis in the
        general boundary of the version space. '''
        for g in G:
            if self.more_general(g, hypothesis):
                return True
        return False

    def exists_specific(self, hypothesis, S):
        ''' Used to check if there exists a more specific hypothesis in the
        specific boundary of the version space. '''
        for s in S:
            if self.more_specific(s, hypothesis):
                return True
        return False

    def more_general(self, hyp1, hyp2):
        ''' Check whether hyp1 is more general than hyp2. '''
        for i, j in zip(hyp1, hyp2):
            if i == '?':
                continue
            elif j == '?':
                if i != '?':
                    return False
            elif i != j:
                return False
        return True

    def more_specific(self, hyp1, hyp2):
        ''' Check whether hyp1 is more specific than hyp2. '''
        return self.more_general(hyp2, hyp1)

    def run_algorithm(self):
        '''
        Initialize the specific and general boundaries, and loop the dataset
        against the algorithm.
        '''
        G = self.initializeG()
        S = self.initializeS()
        for trial_set in self.dataset:
            if self.is_positive(trial_set):      # positive example
                G = self.remove_inconsistent_G(G, trial_set[0])
                S_new = S[:]
                for s in S:
                    if not self.consistent(s, trial_set[0]):
                        S_new.remove(s)
                        generalization = self.generalize_inconsistent_S(s, trial_set[0])
                        generalization = self.get_general(generalization, G)
                        if generalization:
                            S_new.append(generalization)
                S = self.remove_more_general(S_new)
                print(S)
            else:                                # negative example
                S = self.remove_inconsistent_S(S, trial_set[0])
                G_new = G[:]
                for g in G:
                    if self.consistent(g, trial_set[0]):
                        G_new.remove(g)
                        specializations = self.specialize_inconsistent_G(g, trial_set[0])
                        specializations = self.get_specific(specializations, S)
                        G_new.extend(specializations)
                G = self.remove_more_specific(G_new)
                print(G)


dataset = [(('sunny', 'warm', 'normal', 'strong', 'warm', 'same'), 'Y'),
           (('sunny', 'warm', 'high', 'strong', 'warm', 'same'), 'Y'),
           (('rainy', 'cold', 'high', 'strong', 'warm', 'change'), 'N'),
           (('sunny', 'warm', 'high', 'strong', 'cool', 'change'), 'Y')]

attributes = ('Sky', 'Temp', 'Humidity', 'Wind', 'Water', 'Forecast')
f = Holder(attributes)
# attribute domains, taken from the values occurring in the dataset above
f.add_values('Sky', ('sunny', 'rainy'))
f.add_values('Temp', ('warm', 'cold'))
f.add_values('Humidity', ('normal', 'high'))
f.add_values('Wind', ('strong',))
f.add_values('Water', ('warm', 'cool'))
f.add_values('Forecast', ('same', 'change'))

a = CandidateElimination(dataset, f)  # pass the dataset to the algorithm class and call the run_algorithm method
a.run_algorithm()
Output
[('sunny', '?', '?', '?', '?', '?'), ('?', 'warm', '?', '?', '?', '?'), ('?', '?', '?', '?', '?', 'same')]
[('sunny', 'warm', '?', 'strong', '?', '?')]
Experiment-3:
Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an
appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
Program code:
import pandas as pd
import numpy as np

def entropy(labels):
    # Shannon entropy of a pandas Series of class labels
    p = labels.value_counts(normalize=True)
    return float(-(p * np.log2(p)).sum())

def best_attribute(df, target):
    # choose the attribute with the highest information gain
    base = entropy(df[target])
    def gain(col):
        rem = sum(len(s) / len(df) * entropy(s[target]) for _, s in df.groupby(col))
        return base - rem
    return max(df.columns.drop(target), key=gain)

def id3(df, target):
    # recursively build the tree as nested dicts: {attribute: {value: subtree}}
    if df[target].nunique() == 1:      # pure node: return the class label
        return df[target].iloc[0]
    if len(df.columns) == 1:           # no attributes left: return the majority class
        return df[target].mode()[0]
    attr = best_attribute(df, target)
    return {attr: {v: id3(s.drop(columns=attr), target) for v, s in df.groupby(attr)}}

def print_tree(tree, level=0):
    # print the tree with indentation: attribute, then value, then subtree
    if not isinstance(tree, dict):
        print('  ' * level + str(tree))
        return
    attr = next(iter(tree))
    print('  ' * level + attr)
    for value, subtree in tree[attr].items():
        print('  ' * (level + 1) + str(value))
        print_tree(subtree, level + 2)

def classify(tree, sample):
    # walk the tree with a {attribute: value} sample until a leaf is reached
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][sample[attr]]
    return tree

data = pd.read_csv('Tennis.csv')   # assumes a header row; the last column is the class label
tree = id3(data, data.columns[-1])
print_tree(tree)
# classify a new sample (the keys and values here are illustrative; they must match the CSV header)
print(classify(tree, {'outlook': 'sunny', 'temperature': 'cool',
                      'humidity': 'high', 'wind': 'strong'}))
Input:
Tennis.csv
Output
outlook
  overcast
    yes
  rain
    wind
  sunny
    humidity
      high
        no
      normal
        yes
Experiment – 4:
Exercises to solve real-world problems using the following machine learning methods: a) Linear Regression b) Logistic Regression
Aim:
To solve real-world problems using the machine learning methods Linear Regression and Logistic Regression.
Dataset: std_marks.csv, constructed from students' lab internal and external marks.
Program code:
import pandas as pd
from sklearn import linear_model
import matplotlib.pyplot as plt
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
data=pd.read_csv(r"E:\sudhakar\std_marks.csv")
print('First 5 rows of the data set are:')
print(data.head())
dim=data.shape
print('Dimensions of the data set are',dim)
print('Statistics of the data are:')
print(data.describe())
print('Correlation matrix of the data set is:')
print(data.corr())
x_set=data[['internal']]
print('First 5 rows of features set are:')
print(x_set.head())
y_set=data[['external']]
print('First 5 rows of target set are:')
print(y_set.head())
x_train,x_test, y_train, y_test = train_test_split(x_set,y_set, test_size = 0.3)
model=linear_model.LinearRegression()
model.fit(x_train,y_train)
print('Regression coefficient is',float(model.coef_))
print('Regression intercept is',float(model.intercept_))
y_pred=model.predict(x_test)
y_preds=[]
for i in y_pred:
    y_preds.append(float(i))
print('Predicted values for test data are:')
print(y_preds)
print('mean squared error is ',mean_squared_error(y_test,y_pred))
plt.scatter(x_test,y_test,color='blue',label='actual y values')
plt.plot(x_test,y_pred,color='red',linewidth=3,label='predicted regression line')
plt.ylabel('y value')
plt.xlabel('x value')
plt.title('simple linear regression')
plt.legend(loc='best')
plt.show()
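Once the line is fit, the same model can score an unseen mark. A small usage sketch (the internal mark 18 is a hypothetical value, not taken from the dataset):

new_internal = pd.DataFrame({'internal': [18]})  # hypothetical new student's internal mark
print('Predicted external mark:', float(model.predict(new_internal)))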
Output screen shots:
Exercise 1b:
Program code:
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.preprocessing import StandardScaler
data=pd.read_csv(r"E:\sudhakar\heart.csv")
print('The first 5 rows of the data set are:')
print(data.head())
dim=data.shape
print('Dimensions of the data set are',dim)
print('Statistics of the data are:')
print(data.describe())
print('Correlation matrix of the data set is:')
print(data.corr())
class_lbls=data['target'].unique()
class_labels=[]
for x in class_lbls:
class_labels.append(str(x))
print('Class labels are:')
print(class_labels)
sns.countplot(data['target'])
col_names=data.columns
feature_names=col_names[:-1]
feature_names=list(feature_names)
print('Feature names are:')
print(feature_names)
x_set = data.drop(['target'], axis=1)
print('First 5 rows of features set are:')
print(x_set.head())
y_set=data[['target']]
print('First 5 rows of features set are:')
print(y_set.head())
scaler=StandardScaler()
x_train,x_test, y_train, y_test = train_test_split(x_set,y_set, test_size = 0.3)
scaler.fit(x_train)
x_train=scaler.transform(x_train)
model = LogisticRegression()
model.fit(x_train, y_train)
x_test=scaler.transform(x_test)
y_pred=model.predict(x_test)
print('Predicted class labels for test data are:')
print(y_pred)
print("Accuracy:",accuracy_score(y_test, y_pred))
print("Precision:",precision_score(y_test, y_pred))
print("Recall:",recall_score(y_test, y_pred))
print(classification_report(y_test,y_pred,target_names=class_labels))
cm=confusion_matrix(y_test,y_pred)
df_cm = pd.DataFrame(cm, columns=class_labels, index = class_labels)
df_cm.index.name = 'Actual'
df_cm.columns.name = 'Predicted'
sns.set(font_scale=1.5)
sns.heatmap(df_cm, annot=True,cmap="Blues",fmt='d')
plt.show()
Experiment – 5:
Develop a program for Bias, Variance, Remove Duplicates, Cross-Validation.
Program code:
import pandas as pd
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn import linear_model
import matplotlib.pyplot as plt
from statistics import mean,stdev
data=pd.read_csv(r"E:\machine learning\datasets\winequality.csv")
dim=data.shape
print('Dimensions of the data set are',dim)
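The listing ends after loading the data. A continuation sketch covering the four tasks in the title, assuming winequality.csv has a numeric 'quality' column (the binary recoding threshold is our choice):

# remove duplicate rows before modelling
print('Duplicate rows:', data.duplicated().sum())
data = data.drop_duplicates()
print('Dimensions after removing duplicates', data.shape)

# binary target assumed: good wine (quality >= 7) vs the rest
X = data.drop('quality', axis=1)
y = (data['quality'] >= 7).astype(int)

# k-fold cross-validation of a logistic-regression classifier
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=10)
print('Cross-validation scores:', scores)
# a common lab proxy: the mean cross-validation accuracy reflects bias,
# the spread of the scores across folds reflects variance
print('Mean accuracy (low mean suggests high bias):', mean(scores))
print('Std of accuracy (high spread suggests high variance):', stdev(scores))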
Experiment – 7:
Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
Program Code
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)  # values recovered from the normalized Input below
y = np.array(([92], [86], [89]), dtype=float)        # targets, see Actual Output below
X = X / np.amax(X, axis=0)   # normalize each feature to [0, 1]
y = y / 100

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 5000                # number of training iterations
lr = 0.1                    # learning rate
inputlayer_neurons = 2      # number of features in the input data
hiddenlayer_neurons = 3     # number of neurons at the hidden layer
output_neurons = 1          # number of neurons at the output layer
# weight and bias initialization
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))
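A sketch of the training loop that drives the forward pass (output = sigmoid(outinp)) and the error terms EO, EH and d_hiddenlayer, assuming the standard gradient-descent updates with the epoch and lr values initialized above:

for i in range(epoch):
    # Forward propagation
    hinp = np.dot(X, wh) + bh
    hlayer_act = sigmoid(hinp)           # hidden-layer activations
    outinp = np.dot(hlayer_act, wout) + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output                      # error at the output layer
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)            # error propagated back to the hidden layer
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad

    # gradient-descent weight and bias updates
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n" + str(output))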
Input:
[[ 0.66666667 1. ]
[ 0.33333333 0.55555556]
[ 1. 0.66666667]]
Actual Output:
[[0.92]
 [0.86]
 [0.89]]
Predicted Output:
[[0.89559591]
 [0.88142069]
 [0.8928407 ]]
Experiment-8:
Write a program to implement k-Nearest Neighbor algorithm to classify the iris data set. Print both correct and wrong predictions.
Program Code:
import csv
import random
import math
import operator

def loadDataset(filename, split, trainingSet, testSet):
    # read the iris CSV and split the rows randomly into training and test sets
    with open(filename) as csvfile:
        dataset = [row for row in csv.reader(csvfile) if row]
    for x in range(len(dataset)):
        dataset[x] = [float(v) for v in dataset[x][:4]] + [dataset[x][4]]
        if random.random() < split:
            trainingSet.append(dataset[x])
        else:
            testSet.append(dataset[x])

def euclideanDistance(instance1, instance2, length):
    return math.sqrt(sum(pow(instance1[x] - instance2[x], 2) for x in range(length)))

def getNeighbors(trainingSet, testInstance, k):
    # rank the training instances by distance to the test instance; keep the k closest
    length = len(testInstance) - 1
    distances = [(train, euclideanDistance(testInstance, train, length)) for train in trainingSet]
    distances.sort(key=operator.itemgetter(1))
    return [distances[x][0] for x in range(k)]

def getResponse(neighbors):
    # majority vote over the neighbours' class labels
    classVotes = {}
    for neighbor in neighbors:
        classVotes[neighbor[-1]] = classVotes.get(neighbor[-1], 0) + 1
    sortedVotes = sorted(classVotes.items(), key=operator.itemgetter(1), reverse=True)
    return sortedVotes[0][0]

def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if testSet[x][-1] == predictions[x]:
            correct += 1
    return correct / float(len(testSet)) * 100.0

def main():
    trainingSet = []
    testSet = []
    loadDataset('iris.data', 0.67, trainingSet, testSet)  # assumes a local copy of the UCI iris file
    predictions = []
    k = 3
    for x in range(len(testSet)):
        result = getResponse(getNeighbors(trainingSet, testSet[x], k))
        predictions.append(result)
        flag = 'correct' if result == testSet[x][-1] else 'wrong'
        print('predicted=' + repr(result) + ', actual=' + repr(testSet[x][-1]) + ' -> ' + flag)
    print('Accuracy: ' + repr(getAccuracy(testSet, predictions)) + '%')

main()
OUTPUT
[[11 0 0]
[0 9 1]
[0 1 8]]
Experiment – 9:
Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points.
Program Code
def localWeight(point, X, ymat, k):   # signature assumed; wei is the kernel weight matrix
    wei = kernel(point, X, k)
    W = (X.T * (wei * X)).I * (X.T * (wei * ymat.T))
    return W

mtip = np1.mat(tip)
m = np1.shape(mbill)[1]
# set k here
ypred = localWeightRegression(X, mtip, 2)
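Only these lines of the listing survive, so a self-contained sketch of the whole program in the same style is given below, assuming np1 above is numpy and a tips.csv data file with total_bill and tip columns:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def kernel(point, xmat, k):
    # Gaussian weights: points near the query point get weights close to 1
    m = np.shape(xmat)[0]
    weights = np.mat(np.eye(m))
    for j in range(m):
        diff = point - xmat[j]
        weights[j, j] = float(np.exp(diff * diff.T / (-2.0 * k ** 2)))
    return weights

def localWeight(point, xmat, ymat, k):
    # solve the weighted least-squares problem for one query point
    wei = kernel(point, xmat, k)
    return (xmat.T * (wei * xmat)).I * (xmat.T * (wei * ymat.T))

def localWeightRegression(xmat, ymat, k):
    # fit a separate local model at every data point
    m = np.shape(xmat)[0]
    ypred = np.zeros(m)
    for i in range(m):
        ypred[i] = float(xmat[i] * localWeight(xmat[i], xmat, ymat, k))
    return ypred

data = pd.read_csv('tips.csv')        # assumed data file with total_bill and tip columns
bill = np.array(data.total_bill)
tip = np.array(data.tip)
mbill = np.mat(bill)
mtip = np.mat(tip)
m = np.shape(mbill)[1]
one = np.mat(np.ones(m))
X = np.hstack((one.T, mbill.T))       # prepend a bias column
ypred = localWeightRegression(X, mtip, 2)   # k = 2 sets the kernel width

idx = bill.argsort()                  # sort by bill so the fitted curve draws cleanly
plt.scatter(bill, tip, color='green', label='data')
plt.plot(bill[idx], ypred[idx], color='red', linewidth=3, label='LWR fit')
plt.xlabel('total_bill')
plt.ylabel('tip')
plt.legend()
plt.show()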
Output
Experiment-10:
Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model
to perform this task. Built-in Java classes/API can be used to write the program. Calculate the
accuracy, precision, and recall for your data set.
Aim: classification of set of documents using Naive Bayesian classification
Program code
import pandas as pd
msg=pd.read_csv('naivetext1.csv',names=['message','label'])
print('The dimensions of the dataset',msg.shape)
msg['labelnum']=msg.label.map({'pos':1,'neg':0})
X = msg.message
y = msg.labelnum
print(X)
print(y)
#splitting the dataset into train and test data
from sklearn.model_selection import train_test_split
xtrain,xtest,ytrain,ytest=train_test_split(X,y)
print(xtest.shape)
print(xtrain.shape)
print(ytest.shape)
print(ytrain.shape)
#output of count vectoriser is a sparse matrix
from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm=count_vect.transform(xtest)
print(count_vect.get_feature_names())
df=pd.DataFrame(xtrain_dtm.toarray(),columns=count_vect.get_feature_names())
print(df)
#tabular representation
print(xtrain_dtm)
#sparse matrix representation
# Training Naive Bayes (NB) classifier on training data
from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)
#printing accuracy metrics
from sklearn import metrics
print('Accuracy metrics')
print('Accuracy of the classifer is',metrics.accuracy_score(ytest,predicted))
print('Confusion matrix')
print(metrics.confusion_matrix(ytest,predicted))
print('Recall and Precison ')
print(metrics.recall_score(ytest,predicted))
print(metrics.precision_score(ytest,predicted))
# docs_new = ['I like this place', 'My boss is not my saviour']
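The commented docs_new line above hints at scoring new documents; a short usage sketch with the trained clf and count_vect (labels mapped as 1 = pos, 0 = neg above):

docs_new = ['I like this place', 'My boss is not my saviour']
predicted_new = clf.predict(count_vect.transform(docs_new))
for doc, label in zip(docs_new, predicted_new):
    print(doc, '->', 'pos' if label == 1 else 'neg')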
OUTPUT
['about', 'am', 'amazing', 'an', 'and', 'awesome', 'beers', 'best', 'boss', 'can', 'deal',
'do', 'enemy', 'feel', 'fun', 'good', 'have', 'horrible', 'house', 'is', 'like', 'love', 'my',
'not', 'of', 'place', 'restaurant', 'sandwich', 'sick', 'stuff', 'these', 'this', 'tired', 'to',
'today', 'tomorrow', 'very', 'view', 'we', 'went', 'what', 'will', 'with', 'work']
(document-term matrix of the training set: one row per document, one count column per vocabulary word; the wide tabular output is truncated here)
Experiment-11:
Apply EM algorithm to cluster a Heart Disease Data Set. Use the same data set for clustering using k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.
Aim: Implementation of EM algorithm to cluster a Heart Disease Data Set
Program Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
X, y_true = make_blobs(n_samples=100, centers=4, cluster_std=0.60, random_state=0)
X = X[:, ::-1]
#flip axes for better plotting
from sklearn.mixture import GaussianMixture
gmm = GaussianMixture(n_components=4).fit(X)
labels = gmm.predict(X)
plt.scatter(X[:, 0], X[:, 1], c=labels, s=40, cmap='viridis')
plt.show()
probs = gmm.predict_proba(X)   # soft cluster responsibilities from EM
print(probs[:5].round(3))
size = 50 * probs.max(1) ** 2
# square emphasizes differences in certainty
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=size)
plt.show()
from matplotlib.patches import Ellipse
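The Ellipse import is only useful if the Gaussian components are drawn. A sketch of one way to overlay them, using each component's mean and covariance from the fitted gmm (the 2-sigma scaling is an arbitrary choice):

def draw_gmm_ellipses(gmm, ax):
    # one ellipse per Gaussian component, oriented by its covariance
    for mean, covar in zip(gmm.means_, gmm.covariances_):
        vals, vecs = np.linalg.eigh(covar)        # principal axes of the covariance
        angle = np.degrees(np.arctan2(vecs[1, 0], vecs[0, 0]))
        width, height = 2 * 2 * np.sqrt(vals)     # 2-sigma extent along each axis
        ax.add_patch(Ellipse(mean, width, height, angle=angle, alpha=0.25))

ax = plt.gca()
ax.scatter(X[:, 0], X[:, 1], c=labels, s=40, cmap='viridis')
draw_gmm_ellipses(gmm, ax)
plt.show()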
Output
[[1 ,0, 0, 0]
[0 ,0, 1, 0]
[1 ,0, 0, 0]
[1 ,0, 0, 0]
[1 ,0, 0, 0]]
K-Means:
from sklearn.cluster import KMeans
#from sklearn import metrics
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv("kmeansdata.csv")
df1 = pd.DataFrame(data)
print(df1)
f1 = df1['Distance_Feature'].values
f2 = df1['Speeding_Feature'].values
X = np.array(list(zip(f1, f2)))   # feature matrix: one row per driver
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.title('Dataset')
plt.xlabel('Distance_Feature')
plt.ylabel('Speeding_Feature')
plt.scatter(f1, f2)
plt.show()
# create new plot and data
colors = ['b', 'g', 'r']
markers = ['o', 'v', 's']
# KMeans algorithm, K = 3
kmeans_model = KMeans(n_clusters=3).fit(X)
for i, l in enumerate(kmeans_model.labels_):
    plt.plot(f1[i], f2[i], color=colors[l], marker=markers[l], ls='None')
plt.xlim([0, 100])
plt.ylim([0, 50])
plt.show()

kmeansdata.csv:
Driver_ID,Distance_Feature,Speeding_Feature
3423311935,71.24,28
3423313212,52.53,25
3423313724,64.54,27
3423311373,55.69,22
3423310999,54.58,25
3423313857,41.91,10
3423312432,58.64,20
3423311434,52.02,8
3423311328,31.25,34
3423312488,44.31,19
3423311254,49.35,40
3423312943,58.07,45
3423312536,44.22,22
3423311542,55.73,19
3423312176,46.63,43
3423314176,52.97,32
3423314202,46.25,35
3423311346,51.55,27
3423310666,57.05,26
3423313527,58.45,30
3423312182,43.42,23
3423313590,55.68,37
3423312268,55.15,18
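The experiment asks for a comment on clustering quality, which neither listing computes. One way to compare the two algorithms on the same X is the silhouette score (higher is better); GaussianMixture plays the role of EM in this sketch:

from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

km_labels = KMeans(n_clusters=3).fit_predict(X)
gmm_labels = GaussianMixture(n_components=3).fit(X).predict(X)
print('k-Means silhouette :', silhouette_score(X, km_labels))
print('EM (GMM) silhouette:', silhouette_score(X, gmm_labels))
# k-Means makes hard, spherical-cluster assignments; EM gives soft assignments
# that tolerate elliptical clusters, so the labels can differ near cluster borders.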
Experiment -12
Aim: Exploratory data analysis for classification using pandas and Matplotlib
Dataset: tae.csv- The data consist of evaluations of teaching performance over three regular
semesters and two summer semesters of 151 teaching assistant (TA) assignments at the Statistics
Department of the University of Wisconsin-Madison. The scores were divided into 3 roughly equal-
sized categories ("low", "medium", and "high") to form the class variable. The data set is collected
from https://archive.ics.uci.edu/ml/datasets/Teaching+Assistant+Evaluation
Program code:
import pandas as pd
import matplotlib.pyplot as plt
print('pandas version is', pd.__version__)
data = pd.read_csv(r"E:\sudhakar\tae.csv",header=None)
col_names=['native_speaker','instructor','course','semester','class_size','score']
data.columns=col_names
print('Data type of target variable is:',data['score'].dtype)
print('Converting target variable data type to categorical')
data['score']=data['score'].astype('category')
print('After conversion, data type of target variable is:',data['score'].dtype)
print('Dimensions of the data set:')
print(data.shape)
print('The first 5 rows of the data set are:')
print(data.head())
print('The last 5 rows of the data set are:')
print(data.tail())
print('Randomly selected 5 rows of the data set are:')
print(data.sample(5))
print('The columns of the data set are:')
print(data.columns.tolist())
print('Names and data types of attributes are:')
print(data.dtypes)
print('Converting native_speaker data type to categorical')
data['native_speaker']=data['native_speaker'].astype('category')
print('After conversion,Names and data types of attributes are:')
print(data.dtypes)
print('Information of the data set attributes:')
print(data.info())
print('Statistics of the numerical attributes of the data set are:')
print(data.describe())
print('Statistics of all the attributes of the data set are:')
print(data.describe(include='all'))
print('Correlation matrix of the numerical attributes of the data set is:')
corr=data.corr()
print(corr)
print('Distribution of the target variable is:')
print(data['score'].value_counts())
print('Target class distribution w.r.t \'native_speaker\' attribute')
print(pd.crosstab(data.native_speaker,data.score))
print(pd.crosstab(data.native_speaker,data.score,normalize='index'))
print('Target class distribution w.r.t \'native_speaker\' attribute using groupby')
print(data.groupby('native_speaker').score.value_counts())
print('Checking for null values:')
print(data.isnull().sum())
data.dropna(subset=['instructor'],axis=0,inplace=True)
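matplotlib is imported above but never used, so a few plots round out the exploratory analysis. A sketch (the column choices are ours; any numeric column works):

# class distribution as a bar chart
data['score'].value_counts().plot(kind='bar', title='score distribution')
plt.show()
# histogram of a numeric attribute
data['class_size'].plot(kind='hist', title='class_size histogram')
plt.show()
# class_size spread per score category
data.boxplot(column='class_size', by='score')
plt.show()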
Experiment -13:
Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate
the diagnosis of heart patients using standard Heart Disease Data Set.
import bayespy as bp
import numpy as np
import csv
from colorama import init
from colorama import Fore, Back, Style
init()
# value encodings for the categorical attributes (the levels below are assumed
# from the standard form of this listing; adjust them to match your CSV)
ageEnum = {'SuperSeniorCitizen': 0, 'SeniorCitizen': 1, 'MiddleAged': 2, 'Youth': 3, 'Teen': 4}
genderEnum = {'Male': 0, 'Female': 1}
familyHistoryEnum = {'Yes': 0, 'No': 1}
dietEnum = {'High': 0, 'Medium': 1, 'Low': 2}
lifeStyleEnum = {'Athlete': 0, 'Active': 1, 'Moderate': 2, 'Sedentary': 3}
cholesterolEnum = {'High': 0, 'BorderLine': 1, 'Normal': 2}
heartDiseaseEnum = {'Yes': 0, 'No': 1}
data = []
with open('heart_disease_data.csv') as csvfile:   # assumed file name
    lines = csv.reader(csvfile)
    for x in lines:
        data.append([ageEnum[x[0]], genderEnum[x[1]], familyHistoryEnum[x[2]], dietEnum[x[3]],
                     lifeStyleEnum[x[4]], cholesterolEnum[x[5]], heartDiseaseEnum[x[6]]])
# Training data assembled as an integer-coded numpy array
data = np.array(data)
N = len(data)
p_age = bp.nodes.Dirichlet(1.0*np.ones(5))
age = bp.nodes.Categorical(p_age, plates=(N,))
age.observe(data[:,0])
p_gender = bp.nodes.Dirichlet(1.0*np.ones(2))
gender = bp.nodes.Categorical(p_gender, plates=(N,))
gender.observe(data[:,1])
p_familyhistory = bp.nodes.Dirichlet(1.0*np.ones(2))
familyhistory = bp.nodes.Categorical(p_familyhistory, plates=(N,))
familyhistory.observe(data[:,2])
p_diet = bp.nodes.Dirichlet(1.0*np.ones(3))
diet = bp.nodes.Categorical(p_diet, plates=(N,))
diet.observe(data[:,3])
p_lifestyle = bp.nodes.Dirichlet(1.0*np.ones(4))
lifestyle = bp.nodes.Categorical(p_lifestyle, plates=(N,))
lifestyle.observe(data[:,4])
p_cholesterol = bp.nodes.Dirichlet(1.0*np.ones(3))
cholesterol = bp.nodes.Categorical(p_cholesterol, plates=(N,))
cholesterol.observe(data[:,5])
# assumed structure: heart disease conditioned on all six observed attributes,
# with one Dirichlet prior per combination of parent values
p_heartdisease = bp.nodes.Dirichlet(np.ones(2), plates=(5, 2, 2, 3, 4, 3))
heartdisease = bp.nodes.MultiMixture([age, gender, familyhistory, diet, lifestyle, cholesterol],
                                     bp.nodes.Categorical, p_heartdisease)
heartdisease.observe(data[:,6])
p_heartdisease.update()
# Interactive Test
m=0
while m == 0:
print("\n")
res = bp.nodes.MultiMixture([int(input('Enter Age: ' + str(ageEnum))), int(input('Enter Gender: ' +
str(genderEnum))), int(input('Enter FamilyHistory: ' + str(familyHistoryEnum))), int(input('Enter
dietEnum: ' + str(dietEnum))), int(input('Enter LifeStyle: ' + str(lifeStyleEnum))), int(input('Enter
Cholesterol: ' + str(cholesterolEnum)))], bp.nodes.Categorical,
p_heartdisease).get_moments()[0][heartDiseaseEnum['Yes']]
print("Probability(HeartDisease) = " + str(res))
#print(Style.RESET_ALL)
m = int(input("Enter for Continue:0, Exit :1 "))
OUTPUT
Experiment – 14:
Write a program to implement Support Vector Machines and Principal Component Analysis.
Program code:
import pandas as pd
import seaborn as sns
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
from sklearn.metrics import accuracy_score
from sklearn.metrics import recall_score
from sklearn.metrics import precision_score
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
data = pd.read_csv(r"E:\sudhakar\haberman.csv", header=None)
#age=age of the patient
#year=Patient's year of operation (year - 1900)
#pos_axil_nodes=Number of positive axillary nodes detected
#survival_status: 1 - the patient survived 5 years or longer
#                 2 - the patient died within 5 years
col_names=['age','year','pos_axil_nodes','survival_status']
data.columns=col_names
#we removed the attribute year of operation
data=data.drop(['year'], axis=1)
print('The first 5 rows of the data set are:')
print(data.head())
dim=data.shape
print('Dimensions of the data set are',dim)
print('Statistics of the data are:')
print(data.describe())
print('Correlation matrix of the data set is:')
print(data.corr())
class_lbls=data['survival_status'].unique()
class_labels=[]
for x in class_lbls:
class_labels.append(str(x))
print('Class labels are:')
print(class_labels)
sns.countplot(data['survival_status'])
col_names=data.columns
feature_names=col_names[:-1]
feature_names=list(feature_names)
print('Feature names are:')
print(feature_names)
x_set = data.drop(['survival_status'], axis=1)
print('First 5 rows of features set are:')
print(x_set.head())
y_set=data['survival_status']
print('First 5 rows of target variable are:')
print(y_set.head())
# train/test split, scaling and a linear-kernel SVC; this step is assumed and
# follows the same pattern as Experiment 4b
scaler = StandardScaler()
x_train, x_test, y_train, y_test = train_test_split(x_set, y_set, test_size = 0.3)
scaler.fit(x_train)
x_train = scaler.transform(x_train)
model = SVC(kernel='linear')
model.fit(x_train, y_train)
x_test=scaler.transform(x_test)
y_pred=model.predict(x_test)
print('Predicted class labels for test data are:')
print(y_pred)
print("Accuracy:",accuracy_score(y_test, y_pred))
print("Precision:",precision_score(y_test, y_pred))
print("Recall:",recall_score(y_test, y_pred))
print(classification_report(y_test,y_pred,target_names=class_labels))
cm=confusion_matrix(y_test,y_pred)
df_cm = pd.DataFrame(cm, columns=class_labels, index = class_labels)
df_cm.index.name = 'Actual'
df_cm.columns.name = 'Predicted'
sns.set(font_scale=1.5)
sns.heatmap(df_cm, annot=True,cmap="Blues",fmt='d')
plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train, s=30, cmap=plt.cm.Paired)
plt.xlabel('age')
plt.ylabel('pos_axil_nodes')
plt.title('Data points in traning data set')
plt.scatter(x_train[:, 0], x_train[:, 1], c=y_train, s=30, cmap=plt.cm.Paired)
plt.xlabel('age')
plt.ylabel('pos_axil_nodes')
plt.title('support vectors and decision boundary')
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
# create grid to evaluate model
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
Z = model.decision_function(xy).reshape(XX.shape)
ax.contour(XX, YY, Z, colors='red', levels=[-1, 0, 1], alpha=0.5,
linestyles=['--', '-', '--'])
# plot support vectors
ax.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1], s=30,
facecolors='green')
plt.show()
Experiment – 14 (continued): Principal Component Analysis
Program code:
import pandas as pnd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler as SS
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression as LR

DS = pnd.read_csv('Wine.csv')
# Now, we will distribute the dataset into two components "X" and "Y"
X = DS.iloc[: , 0:13].values
Y = DS.iloc[: , 13].values
# split into training and test sets (test_size assumed)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.2, random_state = 0)
SC = SS()
X_train = SC.fit_transform(X_train)
X_test = SC.transform(X_test)
PCa = PCA(n_components = 2)   # keep the two leading principal components (assumed)
X_train = PCa.fit_transform(X_train)
X_test = PCa.transform(X_test)
explained_variance = PCa.explained_variance_ratio_
classifier_1 = LR (random_state = 0)
classifier_1.fit(X_train, Y_train)
Output:
LogisticRegression(random_state=0)
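The listing stops at fitting the classifier. A short continuation sketch to finish the experiment, evaluating on the PCA-reduced test set:

from sklearn.metrics import confusion_matrix, accuracy_score
print('Explained variance of the two components:', explained_variance)
Y_pred = classifier_1.predict(X_test)
print('Confusion matrix:')
print(confusion_matrix(Y_test, Y_pred))
print('Accuracy:', accuracy_score(Y_test, Y_pred))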