
MISRIMAL NAVAJEE MUNOTH JAIN ENGINEERING COLLEGE

OWNED AND MANAGED BY TAMILNADU EDUCATIONAL AND MEDICAL FOUNDATION


A JAIN MINORITY INSTITUTION
APPROVED BY AICTE & PROGRAMMES ACCREDITED BY NBA, NEW DELHI (UG PROGRAMMES – MECH, AI&DS, ECE, CSE, IT)
ALL PROGRAMMES RECOGNIZED BY THE GOVERNMENT OF TAMIL NADU AND AFFILIATED TO ANNA UNIVERSITY, CHENNAI
GURU MARUDHARKESARI BUILDING, JYOTHI NAGAR, RAJIV GANDHI SALAI, OMR THORAIPAKKAM, CHENNAI - 600 097.

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

AL3461 MACHINE LEARNING LABORATORY

REGULATION-2021

NAME :

REGISTER NUMBER :

YEAR : II

SEMESTER : IV

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

VISION
To produce high-quality, creative, and ethical engineers and technologists contributing
effectively to the ever-advancing field of Artificial Intelligence and Data Science.

MISSION
To educate future software engineers with strong fundamentals by continuously
improving the teaching-learning methodologies using contemporary aids.
To produce ethical engineers/researchers by instilling the values of humility,
humaneness, honesty and courage to serve the society.
To create a knowledge hub of Artificial Intelligence and Data Science with an
everlasting urge to learn by developing, maintaining and continuously improving
its resources.
MISRIMAL NAVAJEE MUNOTH JAIN ENGINEERING COLLEGE

Register No:

BONAFIDE CERTIFICATE

This is to certify that this is a bonafide record of the work done by


Mr./Ms. ______________________ of II YEAR / IV SEM

B.Tech ARTIFICIAL INTELLIGENCE AND DATA SCIENCE in


AL3461- MACHINE LEARNING LABORATORY during the Academic year 2022 – 2023.

Faculty-in-charge Head of the Department

Submitted for the University Practical Examination held on :

Internal Examiner External Examiner

DATE:

AL3461 MACHINE LEARNING LABORATORY

COURSE OUTCOMES

CO1  Understand the implementation procedures for the machine learning algorithms.

CO2  Design Java/Python programs for various learning algorithms.

CO3  Apply appropriate machine learning algorithms to data sets.

CO4  Identify and apply machine learning algorithms to solve real-world problems.
AL3461 MACHINE LEARNING LABORATORY
CONTENT

S.NO.  TOPIC  (DATE / PAGE NO / SIGNATURE columns left blank for entry)

1. For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

2. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.

3. Write a program to implement the naïve Bayesian classifier for a sample training dataset stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets.

4. Implement naïve Bayesian Classifier model to classify a set of documents and measure the accuracy, precision, and recall.

5. Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using standard Heart Disease Data Set.

6. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms.

7. Write a program to implement k-Nearest Neighbour algorithm to classify the iris dataset. Print both correct and wrong predictions.

8. Write a program to implement Decision Tree classification model.

9. Implement Logistic regression Algorithm with a dataset and measure the accuracy score and confusion matrix.

10. Implement Linear regression Algorithm with a dataset and measure the accuracy score.
SYLLABUS
AL3461 MACHINE LEARNING LABORATORY
COURSE OBJECTIVES
➢ To gain practical knowledge of implementing machine learning algorithms to solve real-time problems
➢ To implement supervised learning algorithms and their applications
➢ To understand unsupervised learning techniques such as clustering and the EM algorithm
➢ To understand the theoretical and practical aspects of probabilistic graphical models.

Tools: Python, Numpy, Scipy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh

Suggested Exercises:

1. For a given set of training data examples stored in a .CSV file, implement and demonstrate the
Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent
with the training examples.
2. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the
same using appropriate data sets.
3. Write a program to implement the naïve Bayesian classifier for a sample training data set stored
as a .CSV file and compute the accuracy with a few test data sets.
4. Implement naïve Bayesian Classifier model to classify a set of documents and measure the
accuracy, precision, and recall.
5. Write a program to construct a Bayesian network considering medical data. Use this model to
demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
6. Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for
clustering using the k-Means algorithm. Compare the results of these two algorithms.
7. Write a program to implement k-Nearest Neighbour algorithm to classify the iris data set. Print
both correct and wrong predictions.
8. Write a program to implement Decision Tree classification model.
9. Implement Logistic regression Algorithm with a dataset and measure the accuracy score and
confusion matrix.
10. Implement Linear regression Algorithm with a dataset and measure the accuracy score.
Ex.No: 1
Date :

For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples

AIM:
To implement and demonstrate the Candidate-Elimination algorithm to output a
description of the set of all hypotheses consistent with the training examples.

DATASET: trainingdata1.xlsx

Link: https://docs.google.com/spreadsheets/d/1X-SG3qz2zCkWvGXMQ2GlsP76zvf7hMD4/edit?usp=share_link&ouid=107168863405783275058&rtpof=true&sd=true

ALGORITHM:

Step 1: Load Data set

Step 2: Importing the dataset.

Step 3: For each training example:

Step 4: If the example is a positive example:

        if attribute_value == hypothesis_value:
            do nothing
        else:
            replace the attribute value with '?' (i.e., generalize it)

Step 5: If the example is a negative example:

        make the general hypothesis more specific.

PROGRAM/SOURCE CODE:
import numpy as np
import pandas as pd
data = pd.DataFrame(data=pd.read_excel('trainingdata1.xlsx'))
print(data)
Origin Manufacturer color Decade Type Example Type
0 Japan Honda blue 1980 economy positive
1 Japan Toyota green 1970 sports positive
2 Japan Toyota blue 1990 economy negative
3 USA Chrysler red 1980 economy positive
4 Japan Honda white 1980 economy positive

concepts = np.array(data.iloc[:,0:-1])
target = np.array(data.iloc[:,-1])
print("concept:",concepts)
print("target:",target)

concept: [['Japan' 'Honda' 'blue' 1980 'economy']
 ['Japan' 'Toyota' 'green' 1970 'sports']
 ['Japan' 'Toyota' 'blue' 1990 'economy']
 ['USA' 'Chrysler' 'red' 1980 'economy']
 ['Japan' 'Honda' 'white' 1980 'economy']]
target: ['positive' 'positive' 'negative' 'positive' 'positive']

def learn(concepts, target):
    specific_h = concepts[0].copy()
    print("Initialization of specific_h and general_h")
    print("specific_h: ", specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print("general_h: ", general_h)
    print("concepts: ", concepts)
    for i, h in enumerate(concepts):
        # the Example Type column of this dataset uses the labels 'positive'/'negative'
        if target[i] == "positive":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'                  # generalize the specific boundary
                    general_h[x][x] = '?'
        if target[i] == "negative":
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]      # specialize the general boundary
                else:
                    general_h[x][x] = '?'
    print("\nSteps of Candidate Elimination Algorithm: ", i+1)
    print("Specific_h: ", i+1)
    print(specific_h, "\n")
    print("general_h :", i+1)
    print(general_h)
    # drop fully general rows ('?' in every one of the 5 attribute positions) from the general boundary
    indices = [i for i, val in enumerate(general_h) if val == ['?', '?', '?', '?', '?']]
    print("\nIndices", indices)
    for i in indices:
        general_h.remove(['?', '?', '?', '?', '?'])
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("\nFinal Specific_h:", s_final, sep="\n")

OUTPUT :
Initialization of specific_h and general_h
specific_h: ['Japan' 'Honda' 'blue' 1980 'economy']
general_h: [['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'],
['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?',
'?', '?', '?']]
concepts: [['Japan' 'Honda' 'blue' 1980 'economy']
['Japan' 'Toyota' 'green' 1970 'sports']
['Japan' 'Toyota' 'blue' 1990 'economy']
['USA' 'Chrysler' 'red' 1980 'economy']
['Japan' 'Honda' 'white' 1980 'economy']]

Steps of Candidate Elimination Algorithm:  5

Specific_h:  5
['?' '?' '?' '?' '?']

general_h : 5
[['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?'], ['?', '?', '?',
'?', '?'], ['?', '?', '?', '?', '?'], ['?', '?', '?', '?', '?']]

Indices [0, 1, 2, 3, 4]

Final Specific_h:
['?' '?' '?' '?' '?']
RESULT:
Thus, the program to implement and demonstrate the Candidate-Elimination algorithm, which outputs a
description of the set of all hypotheses consistent with the training examples, has been executed successfully using Python.

Ex.No: 2
Date :

Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets

AIM:
Build an Artificial Neural Network by implementing the Backpropagation algorithm and
test the same using appropriate data sets.

ALGORITHM:

Step 1: Import `numpy`.


Step 2: Define and normalize the input and output data.
Step 3: Define the sigmoid function and its derivative.
Step 4: Initialize the hyperparameters and random weights and biases.
Step 5: Train the neural network using forward and backpropagation.
Step 6: Predict the output for new data using the trained weights and biases.
Step 7: Print the input, actual output, and predicted output.

PROGRAM/SOURCE CODE:
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X/np.amax(X, axis=0)   # normalize each input feature by its column maximum
y = y/100                  # scale target marks into the (0, 1) range of the sigmoid

# Sigmoid Function
def sigmoid(x):
    return 1/(1 + np.exp(-x))

# Derivative of Sigmoid Function
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000                 # number of training iterations
lr = 0.1                     # learning rate
inputlayer_neurons = 2       # number of features in the data set
hiddenlayer_neurons = 3      # number of hidden layer neurons
output_neurons = 1           # number of neurons at the output layer

# weight and bias initialization
# np.random.uniform draws a random matrix of the given shape uniformly from [0, 1)
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    hiddengrad = derivatives_sigmoid(hlayer_act)   # how much the hidden layer weights contributed to the error
    d_hiddenlayer = EH * hiddengrad

    # weight and bias updates
    wout += hlayer_act.T.dot(d_output) * lr        # dot product of next-layer error and current-layer output
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    #bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)

OUTPUT:
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[0.92]
 [0.86]
 [0.89]]
Predicted Output:
[[0.99999908]
[0.99999712]
[0.99999904]]

RESULT:
Thus, the program to Build an Artificial Neural Network by implementing the Backpropagation
algorithm using python has been executed successfully.

Ex.No: 3
Date :

Write a program to implement the naïve Bayesian classifier for a sample training dataset stored as a .CSV file. Compute the accuracy of the classifier, considering few test data sets

AIM:
To implement the naïve Bayesian classifier for a sample training dataset stored as a .CSV
file and compute the accuracy with a few test data sets.

DATASET: pima_indian.csv

LINK: https://drive.google.com/file/d/18PcjOtDELvR8wY4-iCiAXm1wNuox67a7/view?usp=share_link

ALGORITHM:

Step 1: Import `pandas`, `train_test_split` from `sklearn.model_selection`, `GaussianNB` from
`sklearn.naive_bayes`, and `metrics` from `sklearn`.

Step 2: Load the dataset and split it into training and testing datasets using `train_test_split`.

Step 3: Train the Naive Bayes classifier on the training data using
`GaussianNB().fit(xtrain, ytrain.ravel())`.

Step 4: Predict the class labels for the testing data using `clf.predict(xtest)`.

Step 5: Print the accuracy of the classifier using `metrics.accuracy_score()`.

PROGRAM/SOURCE CODE:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics

df = pd.read_csv("pima_indian.csv")
feature_col_names = ['num_preg', 'glucose_conc', 'diastolic_bp', 'thickness', 'insulin', 'bmi', 'diab_pred', 'age']
predicted_class_names = ['diabetes']

X = df[feature_col_names].values          # these are the factors for the prediction
y = df[predicted_class_names].values      # this is what we want to predict

# splitting the dataset into train and test data
xtrain, xtest, ytrain, ytest = train_test_split(X, y, test_size=0.33)
print ('\n the total number of Training Data :',ytrain.shape)
print ('\n the total number of Test Data :',ytest.shape)

# Training Naive Bayes (NB) classifier on training data.

clf = GaussianNB().fit(xtrain,ytrain.ravel())
predicted = clf.predict(xtest)
predictTestData= clf.predict([[6,148,72,35,0,33.6,0.627,50]])

#printing Confusion matrix, accuracy, Precision and Recall

print('\n Confusion matrix')


print(metrics.confusion_matrix(ytest,predicted))

print('\n Accuracy of the classifier is',metrics.accuracy_score(ytest,predicted))

print('\n The value of Precision', metrics.precision_score(ytest,predicted))

print('\n The value of Recall', metrics.recall_score(ytest,predicted))

print("Predicted Value for individual Test Data:", predictTestData)

OUTPUT:
the total number of Training Data : (514, 1)

the total number of Test Data : (254, 1)

Confusion matrix
[[148 23]
[ 35 48]]
Accuracy of the classifier is 0.7716535433070866

The value of Precision 0.676056338028169

The value of Recall 0.5783132530120482


Predicted Value for individual Test Data: [1]

RESULT:
Thus, the program to implement the naïve Bayesian classifier for a sample training dataset stored as a
.CSV file using Python has been executed successfully.

Ex.No: 4
Date :

Implement naïve Bayesian Classifier model to classify a set of documents and measure the accuracy, precision, and recall

AIM:
Implement naïve Bayesian Classifier model to classify a set of documents and
measure the accuracy, precision, and recall

DATASET: naivetext.csv

LINK: https://drive.google.com/file/d/1sEpbtiB9qP6DdpqlvbB_6OL8aBsJqe8s/view?usp=share_link

ALGORITHM:

Step 1: Import required libraries.

Step 2: Load the dataset and convert labels to numerical values.

Step 3: Split the dataset into training and test data.

Step 4: Convert messages into document-term matrices using CountVectorizer.

Step 5: Train a Multinomial Naive Bayes classifier on the training data.

Step 6: Use the trained classifier to make predictions on the test data.

Step 7: Calculate and print the accuracy, confusion matrix, precision, and recall.

PROGRAM/SOURCE CODE:

import pandas as pd
msg = pd.read_csv('naivetext.csv', names=['message', 'label'])
print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum

print(X)
print(y)

from sklearn.model_selection import train_test_split


xtrain,xtest,ytrain,ytest=train_test_split(X,y)
print ('\n The total number of Training Data :',ytrain.shape)
print ('\n The total number of Test Data :',ytest.shape)

#output of count vectoriser is a sparse matrix


from sklearn.feature_extraction.text import CountVectorizer
count_vect = CountVectorizer()
xtrain_dtm = count_vect.fit_transform(xtrain)
xtest_dtm=count_vect.transform(xtest)
df=pd.DataFrame(xtrain_dtm.toarray())

# Training Naive Bayes (NB) classifier on training data.


from sklearn.naive_bayes import MultinomialNB
clf = MultinomialNB().fit(xtrain_dtm,ytrain)
predicted = clf.predict(xtest_dtm)

#printing accuracy, Confusion matrix, Precision and Recall


from sklearn import metrics
print('\n Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('\n Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('\n The value of Precision', metrics.precision_score(ytest, predicted))
print('\n The value of Recall', metrics.recall_score(ytest, predicted))

OUTPUT:
The dimensions of the dataset (18, 2)
0 I love this sandwich
1 This is an amazing place
2 I feel very good about these beers
3 This is my best work
4 What an awesome view
5 I do not like this restaurant
6 I am tired of this stuff
7 I can't deal with this
8 He is my sworn enemy
9 My boss is horrible
10 This is an awesome place
11 I do not like the taste of this juice
12 I love to dance
13 I am sick and tired of this place
14 What a great holiday
15 That is a bad locality to stay
16 We will have good fun tomorrow
17 I went to my enemy's house today

Name: message, dtype: object


0 1
1 1
2 1
3 1
4 1
5 0
6 0
7 0
8 0
9 0
10 1
11 0
12 1
13 0
14 1
15 0
16 1
17 0
Name: labelnum, dtype: int64

The total number of Training Data : (13,)

The total number of Test Data : (5,)

Accuracy of the classifier is

Confusion matrix
[[2 0]
[0 3]]

The value of Precision 1.0

The value of Recall 1.0

RESULT:

Thus, the program to implement the naïve Bayesian Classifier model to classify a
set of documents and measure the accuracy, precision, and recall using Python has
been executed successfully.

Ex.No: 5
Date :

Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using standard Heart Disease Data Set

AIM:
To construct a Bayesian network considering medical data. Use this model to demonstrate
the diagnosis of heart patients using standard Heart Disease Data Set.

DATASET: heart.csv

LINK: https://drive.google.com/file/d/10C80zeowRWEGazpPZw_n0wK4f_rRlRbL/view?usp=share_link

ALGORITHM:

Step 1: Install 'pgmpy' package.


Step 2: Import required libraries and classes.
Step 3: Read heart disease dataset in CSV format.
Step 4: Replace missing values with 'NaN'.

Step 5: Define a Bayesian network model

Step 6: Learn CPDs of the model from the dataset using MLE.

Step 7: Perform inference with the Bayesian network using 'VariableElimination' class.

Step 8: Compute and print the probabilities of heart disease given evidence of 'restecg=1'
and 'cp=2' using the 'query' method of 'VariableElimination' object.

PROGRAM/SOURCE CODE:

!pip install pgmpy

import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination

heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?',np.nan)

print('Sample instances from the dataset are given below')


heartDisease.head()

print('\n Attributes and datatypes')


print(heartDisease.dtypes)

model = BayesianModel([('age', 'heartdisease'), ('gender', 'heartdisease'), ('exang', 'heartdisease'),
                       ('cp', 'heartdisease'), ('heartdisease', 'restecg'), ('heartdisease', 'chol')])

print('\nLearning CPD using Maximum likelihood estimators')


model.fit(heartDisease,estimator=MaximumLikelihoodEstimator)

print('\n Inferencing with Bayesian Network:')


HeartDiseasetest_infer = VariableElimination(model)

print('\n 1. Probability of HeartDisease given evidence= restecg')


q1=HeartDiseasetest_infer.query(variables=['heartdisease'],evidence={'restecg':1})
print(q1)

print('\n 2. Probability of HeartDisease given evidence= cp ')


q2=HeartDiseasetest_infer.query(variables=['heartdisease'],evidence={'cp':2})
print(q2)
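The same VariableElimination object can also be queried with several pieces of evidence at once. A minimal sketch, assuming the model and inference object built above, that combines the two observations used in the separate queries:

# probability of heart disease given both restecg = 1 and cp = 2 as evidence
q3 = HeartDiseasetest_infer.query(variables=['heartdisease'], evidence={'restecg': 1, 'cp': 2})
print(q3)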

RESULT:
Thus, the program to construct a Bayesian network from medical data and use this
model to demonstrate the diagnosis of heart patients using the standard Heart
Disease Data Set has been executed successfully.

Ex.No: 6
Date :

Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms

AIM:
To Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set
for clustering using the k-Means algorithm. Compare the results of these two algorithms.

DATASET: iris.csv

LINK: https://drive.google.com/file/d/1-lseekjQ6h1xHKETLYlm7a_IfZ-3sAOY/view?usp=share_link

ALGORITHM:

Step 1: Import required libraries/modules/packages.

Step 2: Read the dataset from the given CSV file into a pandas dataframe.

Step 3: Extract the input features from the dataset and store them in a new dataframe X.

Step 4: Create a KMeans model with three clusters and fit it to the input data X.

Step 5: Create a Gaussian Mixture model with three components and fit it to the input data
X.

Step 6: Print the accuracy score and confusion matrix of both models.

PROGRAM/SOURCE CODE:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn import metrics

names = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width', 'Class']
dataset = pd.read_csv("iris.csv", names=names)

X = dataset.iloc[:, :-1]
label = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
y = [label[c] for c in dataset.iloc[:, -1]]

plt.figure(figsize=(14, 7))
colormap = np.array(['red', 'lime', 'black'])

# REAL PLOT
plt.subplot(1, 3, 1)
plt.title('Real')
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y])

# K-MEANS PLOT
model = KMeans(n_clusters=3, random_state=0).fit(X)
plt.subplot(1, 3, 2)
plt.title('KMeans')
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_])

print('The accuracy score of K-Means: ', metrics.accuracy_score(y, model.labels_))
print('The confusion matrix of K-Means:\n', metrics.confusion_matrix(y, model.labels_))

# GMM (EM) PLOT
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
y_cluster_gmm = gmm.predict(X)
plt.subplot(1, 3, 3)
plt.title('GMM Classification')
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm])

print('The accuracy score of EM: ', metrics.accuracy_score(y, y_cluster_gmm))
print('The confusion matrix of EM:\n ', metrics.confusion_matrix(y, y_cluster_gmm))
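Note that k-Means and the Gaussian Mixture model assign arbitrary cluster numbers (0, 1, 2) that need not match the label encoding used for y, so accuracy_score on the raw cluster labels can look low even when the clustering itself is good. An optional sketch, assuming the variables from the program above, that relabels each cluster by its majority true class before scoring:

import numpy as np

def remap_clusters(true_labels, cluster_labels):
    # relabel each cluster with the most common true class found inside it
    true_labels = np.asarray(true_labels)
    mapped = np.zeros_like(cluster_labels)
    for c in np.unique(cluster_labels):
        mask = (cluster_labels == c)
        mapped[mask] = np.bincount(true_labels[mask]).argmax()
    return mapped

print('K-Means accuracy after relabelling: ', metrics.accuracy_score(y, remap_clusters(y, model.labels_)))
print('EM (GMM) accuracy after relabelling: ', metrics.accuracy_score(y, remap_clusters(y, y_cluster_gmm)))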

RESULT:
Thus, the program to apply the EM algorithm and the k-Means algorithm to cluster a set of data
stored in a .CSV file and compare the results of the two algorithms has been executed
successfully.

Ex.No: 7
Date :

Write a program to implement k-Nearest Neighbour algorithm to classify the iris dataset. Print both correct and wrong predictions

AIM:
To implement k-Nearest Neighbour algorithm to classify the iris data set. Print both
correct and wrong predictions

DATASET: iris.csv

LINK: https://drive.google.com/file/d/1vVwpEuIyb-r3uVtrvbZxBMDsX5wpq-ml/view?usp=share_link

ALGORITHM:

Step 1: Load the Iris dataset from a CSV file into a pandas dataframe.

Step 2: Split the dataset into the input features X and output class y.

Step 3: Split the dataset into training and testing sets.

Step 4: Initialize a KNN classifier with n_neighbors set to 5 and fit it to the training data.
Step 5: Predict the class labels for the testing set using the KNN classifier.

Step 6: Calculate and print the confusion matrix, classification report, and accuracy score
of the KNN classifier.

Step 7: Print the accuracy of the classifier.

PROGRAM/SOURCE CODE:

import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split
from sklearn import metrics

names = ['sepal-length', 'sepal-width', 'petal-length', 'petal-width', 'Class']

# Read dataset to pandas dataframe


dataset = pd.read_csv("iris.csv", names=names)
X = dataset.iloc[:, :-1]
y = dataset.iloc[:, -1]
print(X.head())

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y, test_size=0.10)

classifier = KNeighborsClassifier(n_neighbors=5).fit(Xtrain, ytrain)

ypred = classifier.predict(Xtest)

i = 0
print("\n-------------------------------------------------------------------------")
print('%-25s %-25s %-25s' % ('Original Label', 'Predicted Label', 'Correct/Wrong'))
print("-------------------------------------------------------------------------")

for label in ytest:
    print('%-25s %-25s' % (label, ypred[i]), end="")
    if label == ypred[i]:
        print(' %-25s' % 'Correct')
    else:
        print(' %-25s' % 'Wrong')
    i = i + 1

print ("-------------------------------------------------------------------------")
print("\nConfusion Matrix:\n",metrics.confusion_matrix(ytest, ypred))
print ("-------------------------------------------------------------------------")
print("\nClassification Report:\n",metrics.classification_report(ytest, ypred))
print ("-------------------------------------------------------------------------")
print('Accuracy of the classifer is %0.2f' % metrics.accuracy_score(ytest,ypred))
print ("-------------------------------------------------------------------------")

RESULT:
Thus, the program to implement k-Nearest Neighbour algorithm to classify the iris
data set using Python has been executed successfully.

Ex.No: 8
Date :

Write a program to implement Decision Tree classification model using a .CSV file to measure the accuracy

AIM:
To implement Decision Tree classification model using a .CSV file to measure the
accuracy.

DATASET: data_cleaned.csv

LINK: https://drive.google.com/file/d/1VbrVnGcblK7e2PXVQlMkUTtxTkaVVYBW/view?usp=share_link

ALGORITHM:

Step 1: Load cleaned data and split into training and validation sets.

Step 2: Train a decision tree classifier on the training set.

Step 3: Evaluate classifier accuracy on training and validation sets.

Step 4: Use predict() to get predicted outcomes for validation set.

Step 5: Use predict_proba() to get survival probabilities for validation set.

Step 6: Create new predictions based on a probability threshold of 0.7.

Step 7: Iterate through different values of max_depth to generate train and validation
accuracy scores.

Step 8: Visualize train and validation accuracy scores using a line graph.

Step 9: Create a decision tree classifier with max_depth of 8 and max_leaf_nodes of 25.

Step 10: Fit the classifier to the training set and evaluate accuracy on training and validation
sets.

Step 11: Use graphviz to create a visualization of the decision tree.

Step 12: Display the visualization using matplotlib.pyplot.

PROGRAM/SOURCE CODE:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

data = pd.read_csv('data_cleaned.csv')
print(data.shape)
data.isnull().sum()

y = data['Survived']
X = data.drop(['Survived'], axis=1)

from sklearn.model_selection import train_test_split
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=101, stratify=y, test_size=0.25)
y_train.value_counts(normalize=True)
y_valid.value_counts(normalize=True)
X_train.shape, y_train.shape
X_valid.shape, y_valid.shape

from sklearn.tree import DecisionTreeClassifier
dt_model = DecisionTreeClassifier(random_state=10)
dt_model.fit(X_train, y_train)
dt_model.score(X_train, y_train)
dt_model.score(X_valid, y_valid)
dt_model.predict(X_valid)
dt_model.predict_proba(X_valid)

# predictions from a probability threshold of 0.7 instead of the default 0.5
y_pred = dt_model.predict_proba(X_valid)[:, 1]
y_new = []
for i in range(len(y_pred)):
    if y_pred[i] <= 0.7:
        y_new.append(0)
    else:
        y_new.append(1)

from sklearn.metrics import accuracy_score
accuracy_score(y_valid, y_new)

# train and validation accuracy for different tree depths
train_accuracy = []
validation_accuracy = []
for depth in range(1, 30):
    dt_model = DecisionTreeClassifier(max_depth=depth, random_state=10)
    dt_model.fit(X_train, y_train)
    train_accuracy.append(dt_model.score(X_train, y_train))
    validation_accuracy.append(dt_model.score(X_valid, y_valid))

frame = pd.DataFrame({'max_depth': range(1, 30), 'train_acc': train_accuracy, 'valid_acc': validation_accuracy})
frame.head(15)

plt.figure(figsize=(12, 6))
plt.plot(frame['max_depth'], frame['train_acc'], marker='o')
plt.plot(frame['max_depth'], frame['valid_acc'], marker='o')
plt.xlabel('Depth of tree')
plt.ylabel('performance')
plt.legend()

dt_model = DecisionTreeClassifier(max_depth=8, max_leaf_nodes=25, random_state=10)
dt_model
dt_model.fit(X_train, y_train)
dt_model.score(X_train, y_train)
dt_model.score(X_valid, y_valid)

# visualize the first levels of the tree with graphviz
from sklearn import tree
!pip install graphviz
decision_tree = tree.export_graphviz(dt_model, out_file='tree.dot', feature_names=X_train.columns, max_depth=2, filled=True)
!dot -Tpng tree.dot -o tree.jpg
image = plt.imread('tree.jpg')
plt.figure(figsize=(15, 15))
plt.imshow(image)
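The loop in the program that turns predicted probabilities into class labels with a 0.7 threshold can also be written in a single vectorized line. A minimal equivalent sketch, assuming dt_model, X_valid and y_valid from the program above:

# vectorized form of the 0.7-threshold loop: probability above 0.7 becomes class 1, otherwise 0
y_new = (dt_model.predict_proba(X_valid)[:, 1] > 0.7).astype(int)
print('Validation accuracy at the 0.7 threshold:', accuracy_score(y_valid, y_new))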

OUTPUT:

RESULT:
Thus, the program to implement a Decision Tree classification model
using a .CSV file and measure the accuracy using Python has been executed successfully.

Ex.No: 9
Date :

Implement Logistic regression Algorithm with a dataset and measure the accuracy score and confusion matrix

AIM:
To implement the Logistic Regression algorithm with a dataset and measure the accuracy
score and confusion matrix.

DATASET: summa.csv

LINK: https://drive.google.com/file/d/1w8C2PmuZkDOuVEhIwTdBb3LJMW7HIJ1R/view?usp=share_link

ALGORITHM:

Step 1: Import necessary libraries and load the dataset using pandas.read_csv().
Step 2: Extract the 'temp' and 'label' columns and reshape them.

Step 3: Create a scatter plot with a logistic regression line using seaborn.regplot().

Step 4: Split the data into training and testing sets using train_test_split().

Step 5: Initialize a LogisticRegression model object and fit the training data.

Step 6: Predict the y values for the testing data and calculate the accuracy score using
accuracy_score().

Step 7: Generate a confusion matrix to evaluate the model's performance on the entire
dataset.

PROGRAM/SOURCE CODE:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score,confusion_matrix
df=pd.read_csv('summa.csv')
x=df['temp'].values
y=df['label'].values
x=x.reshape(-1,1)
y=y.reshape(-1,1)
plt.scatter(x,y)
sns.regplot(x=x,
y=y,
data=df,
logistic=True,
line_kws={'color':'black'},
scatter_kws={'color':'green'})

xtrain,xtest,ytrain,ytest=train_test_split(x,y,test_size=0.2)
lr=LogisticRegression()
lr.fit(xtrain,ytrain)

#accuracy on test data


ypred=lr.predict(xtest)
print("accuracy score:",accuracy_score(ytest,ypred))

print("confusion matrix:","\n",confusion_matrix(y,lr.predict(x)))

RESULT:

Thus, the program to implement the Logistic Regression algorithm with a dataset and
measure the accuracy score and confusion matrix using Python has been executed
successfully.

Ex.No: 10
Date :

Implement Linear regression Algorithm with a dataset and measure the accuracy score

AIM:
Implement Linear regression Algorithm with a dataset and measure the accuracy score.

DATASET: linear_data.csv

LINK: https://drive.google.com/file/d/1zSPRRkSxqLPYWsQTCQHjAqUfNv14zrqU/view?usp=share_link

ALGORITHM:

Step 1: Import necessary libraries and mount Google Drive to access the dataset using
drive.mount().

Step 2: Load the 'linear_data.csv' dataset using pandas.read_csv() function and store it in a
DataFrame variable called df.

Step 3: Extract the 'x' and 'y' columns and reshape them.

Step 4: Split the data into training and testing sets using train_test_split() with a test size of
0.25.

Step 5: Initialize a LinearRegression model object and fit the training data.

Step 6: Predict the y values for the testing data using lr.predict().

Step 7: Create a scatter plot of the 'x' and 'y' data with a linear regression line using
matplotlib.pyplot.scatter() and matplotlib.pyplot.plot().

Step 8: Calculate the R-squared score of the model using r2_score() function by comparing
the predicted y values with the actual y values in the testing set.

Step 9: Print the calculated R-squared score.

PROGRAM/SOURCE CODE:

import matplotlib.pyplot as plt


import pandas as pd
from sklearn.linear_model import LinearRegression,LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score,r2_score
from google.colab import drive
drive.mount('/content/drive')

df=pd.read_csv('/linear_data.csv')
x=df['x'].values
y=df['y'].values
x=x.reshape(-1,1)
y=y.reshape(-1,1)
xtrain,xtest,ytrain,ytest=train_test_split(x,y,test_size=0.25)
lr=LinearRegression()
lr.fit(xtrain,ytrain)
y_pred=lr.predict(xtest)
plt.scatter(x,y,c='blue')
plt.plot(xtest,y_pred)
plt.xlabel("X-axis")
plt.ylabel("Y-axis")

accuracy = r2_score(ytest, y_pred)
print("R2_score:", accuracy)

RESULT:
Thus, the program to implement the Linear Regression algorithm with a
dataset and measure the accuracy score using Python has been executed successfully.
