
Machine Learning Lab (IT804) Jan-Jun 2021

Experiment No: 7
Objective: Write a program to construct a Bayesian network considering medical data. Use
this model to demonstrate the diagnosis of heart patients using standard Heart Disease Data Set.
You can use Java/Python ML library classes/API.

Description:
A Bayesian network is a directed acyclic graph in which each edge corresponds to a conditional
dependency, and each node corresponds to a unique random variable.
A Bayesian network consists of two major parts: a directed acyclic graph and a set of conditional probability distributions.
- The directed acyclic graph is a set of random variables represented by nodes.
- The conditional probability distribution of a node (random variable) is defined for every possible outcome of the preceding causal node(s).
For illustration, consider the following example. Suppose we attempt to turn on our computer,
but the computer does not start (observation/evidence). We would like to know which of the
possible causes of computer failure is more likely. In this simplified illustration, we assume only
two possible causes of this misfortune: electricity failure and computer malfunction.
The corresponding directed acyclic graph is depicted in the figure below.

Fig: Directed acyclic graph representing two independent possible causes of a computer failure.
The goal is to calculate the posterior conditional probability distribution of each of the possible
unobserved causes given the observed evidence, i.e. P [Cause | Evidence].
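To make this concrete, the posterior can be computed with pgmpy, the same library used in the program below. This is only a sketch: the structure and every probability value here are illustrative assumptions for the two-cause example, not quantities estimated from any dataset.

from pgmpy.models import BayesianModel
from pgmpy.factors.discrete import TabularCPD
from pgmpy.inference import VariableElimination

# Two independent causes with one observed effect
model = BayesianModel([('Electricity', 'Computer'), ('Malfunction', 'Computer')])

# Assumed priors: state 0 = OK, state 1 = failure/malfunction
cpd_e = TabularCPD('Electricity', 2, [[0.9], [0.1]])
cpd_m = TabularCPD('Malfunction', 2, [[0.95], [0.05]])

# P(Computer | Electricity, Malfunction): the computer starts only if both are OK
cpd_c = TabularCPD('Computer', 2,
                   [[1.0, 0.0, 0.0, 0.0],   # state 0 = starts
                    [0.0, 1.0, 1.0, 1.0]],  # state 1 = does not start
                   evidence=['Electricity', 'Malfunction'], evidence_card=[2, 2])

model.add_cpds(cpd_e, cpd_m, cpd_c)
infer = VariableElimination(model)

# P[Cause | Evidence]: posterior of each cause given "computer does not start"
print(infer.query(variables=['Electricity'], evidence={'Computer': 1}))
print(infer.query(variables=['Malfunction'], evidence={'Computer': 1}))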
Data Set:

Title: Heart Disease Databases


The Cleveland database contains 76 attributes, but all published experiments refer to using a
subset of 14 of them. In particular, the Cleveland database is the only one that has been used by
ML researchers to this date. The "Heartdisease" field refers to the presence of heart disease in the
patient. It is integer valued from 0 (no presence) to 4.

Database     0    1    2    3    4   Total
Cleveland   164   55   36   35   13    303
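These counts can be reproduced with a short pandas check (assuming the same heart.csv file and 'heartdisease' column name used in the program below):

import pandas as pd

df = pd.read_csv('heart.csv')
print(df['heartdisease'].value_counts().sort_index())  # instances per class 0..4
print('Total:', len(df))                               # 303 instances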

Attribute Information:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
   - Value 1: typical angina
   - Value 2: atypical angina
   - Value 3: non-anginal pain
   - Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: fasting blood sugar > 120 mg/dl (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
   - Value 0: normal
   - Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or depression of > 0.05 mV)
   - Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
   - Value 1: upsloping
   - Value 2: flat
   - Value 3: downsloping
12. ca: number of major vessels (0-3) colored by fluoroscopy
13. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect


14. Heartdisease: integer valued from 0 (no presence) to 4.


Some instances from the dataset:

age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  ca  thal  Heartdisease
 63    1   1       145   233    1        2      150      0      2.3      3   0     6             0
 67    1   4       160   286    0        2      108      1      1.5      2   3     3             2
 67    1   4       120   229    0        2      129      1      2.6      2   2     7             1
 41    0   2       130   204    0        2      172      0      1.4      1   0     3             0
 62    0   4       140   268    0        2      160      0      3.6      3   2     3             3
 60    1   4       130   206    0        2      132      1      2.4      2   2     7             4

Program:
import numpy as np
import pandas as pd
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination

# Read the Cleveland Heart Disease data; '?' marks missing values
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

# Display sample instances from the dataset
print('Sample instances from the dataset are given below')
print(heartDisease.head())

# Display the attribute names and datatypes
print('\n Attributes and datatypes')
print(heartDisease.dtypes)

# Create the Bayesian network structure (each edge encodes a conditional dependency)
model = BayesianModel([('age', 'heartdisease'), ('sex', 'heartdisease'),
                       ('exang', 'heartdisease'), ('cp', 'heartdisease'),
                       ('heartdisease', 'restecg'), ('heartdisease', 'chol')])


# Learn the CPDs using Maximum Likelihood Estimation
print('\n Learning CPD using Maximum likelihood estimators')
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Inference with the Bayesian network
print('\n Inferencing with Bayesian Network:')
HeartDiseasetest_infer = VariableElimination(model)

# Compute the probability of heartdisease given restecg = 1
print('\n 1. Probability of HeartDisease given evidence restecg = 1')
q1 = HeartDiseasetest_infer.query(variables=['heartdisease'], evidence={'restecg': 1})
print(q1)

# Compute the probability of heartdisease given cp = 2
print('\n 2. Probability of HeartDisease given evidence cp = 2')
q2 = HeartDiseasetest_infer.query(variables=['heartdisease'], evidence={'cp': 2})
print(q2)

Output:


Experiment No: 8
Objective: Apply EM algorithm to cluster a set of data stored in a .CSV file. Use the same
data set for clustering using k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library classes/API in the
program.
Description:
The expectation-maximization (EM) algorithm is an approach for performing maximum likelihood estimation in the presence of latent variables. It does this by first estimating the values of the latent variables, then optimizing the model, and repeating these two steps until convergence. It is an effective and general approach, most commonly used for density estimation with missing data, as in clustering algorithms such as the Gaussian Mixture Model.
Algorithm:
1. Given a set of incomplete data, consider a set of starting parameters.
2. Expectation step (E-step): Using the observed available data of the dataset, estimate (guess) the values of the missing data.
3. Maximization step (M-step): The complete data generated in the expectation (E) step is used to update the parameters.
4. Repeat step 2 and step 3 until convergence.

The essence of the Expectation-Maximization algorithm is to use the available observed data of the dataset to estimate the missing data, and then to use that completed data to update the values of the parameters.
Initially, a set of starting values for the parameters is chosen. A set of incomplete observed data is given to the system with the assumption that the observed data comes from a specific model.
The next step is the "Expectation" step or E-step. In this step, we use the observed data to estimate or guess the values of the missing or incomplete data; it updates the latent variables.
The next step is the "Maximization" step or M-step. In this step, we use the complete data generated in the preceding E-step to update the values of the parameters; it updates the hypothesis.


Finally, in the fourth step, we check whether the values are converging; if they are, we stop, otherwise we repeat the E-step and M-step until convergence occurs.

Fig: flowchart of EM algorithm
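As a minimal illustration of the E- and M-steps, the loop below fits a one-dimensional mixture of two Gaussians with equal, fixed variances to synthetic data. This is only a sketch of the idea; the program that follows uses scikit-learn's GaussianMixture instead.

import numpy as np

rng = np.random.default_rng(0)
# Synthetic 1-D data drawn from two Gaussians (assumed for illustration)
x = np.concatenate([rng.normal(0, 1, 100), rng.normal(5, 1, 100)])

# Starting parameters: two means and a mixing weight (variance held fixed at 1)
mu = np.array([x.min(), x.max()])
pi = 0.5

for _ in range(50):
    # E-step: responsibility of component 1 for each point
    p0 = (1 - pi) * np.exp(-(x - mu[0])**2 / 2)
    p1 = pi * np.exp(-(x - mu[1])**2 / 2)
    r = p1 / (p0 + p1)
    # M-step: update the parameters from the soft assignments
    mu[0] = np.sum((1 - r) * x) / np.sum(1 - r)
    mu[1] = np.sum(r * x) / np.sum(r)
    pi = r.mean()

print(mu, pi)  # the means should approach 0 and 5, and pi should approach 0.5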

Program:
import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
import sklearn.metrics as sm
import pandas as pd
import numpy as np

# (When running in a Jupyter notebook, use: %matplotlib inline)


iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
X.columns = ['Sepal_Length','Sepal_Width','Petal_Length','Petal_Width']
y = pd.DataFrame(iris.target)
y.columns = ['Targets']

# K-Means clustering


model = KMeans(n_clusters=3)
model.fit(X)

# Inspect the cluster labels assigned by KMeans
print(model.labels_)

# View the results: set the size of the plot
plt.figure(figsize=(14,7))

# Create a colormap
colormap = np.array(['red', 'lime', 'black'])

# Plot the Original Classifications


plt.subplot(1, 2, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real Classification')

# Plot the Models Classifications


plt.subplot(1, 2, 2)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[model.labels_], s=40)
plt.title('K Mean Classification')

# View the results


# Set the size of the plot
plt.figure(figsize=(14,7))
#print('The accuracy score : ',sm.accuracy_score(y, model.labels_))
#sm.confusion_matrix(y, model.labels_)

# Relabel clusters to match the class labels; the identity mapping [0, 1, 2]
# assumes the discovered clusters already align with the true classes
predY = np.choose(model.labels_, [0, 1, 2]).astype(np.int64)


print (predY)


# Plot the original classifications
plt.subplot(1, 2, 1)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y.Targets], s=40)
plt.title('Real Classification')
# Plot Predicted with corrected values
plt.subplot(1, 2, 2)
plt.scatter(X.Petal_Length,X.Petal_Width, c=colormap[predY], s=40)
plt.title('K Mean Classification')

# Note: cluster labels are arbitrary, so these scores depend on the label permutation
print('The accuracy score of K-Means: ', sm.accuracy_score(y, model.labels_))
print('The Confusion matrix of K-Means: ', sm.confusion_matrix(y, model.labels_))

from sklearn import preprocessing


scaler = preprocessing.StandardScaler()
scaler.fit(X)
xsa = scaler.transform(X)
xs = pd.DataFrame(xsa, columns = X.columns)
#xs.sample(5)

from sklearn.mixture import GaussianMixture


gmm = GaussianMixture(n_components=3)
gmm.fit(xs)

y_cluster_gmm = gmm.predict(xs)
#y_cluster_gmm

plt.subplot(2, 2, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm], s=40)
plt.title('GMM Classification')

print('The accuracy score of EM: ', sm.accuracy_score(y, y_cluster_gmm))
print('The Confusion matrix of EM: ', sm.confusion_matrix(y, y_cluster_gmm))

# Display the plots (needed when running as a script)
plt.show()
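Because cluster labels are arbitrary, comparing the accuracy scores above is only fair after aligning each clustering's labels with the true classes. A sketch of one way to do this (using SciPy's Hungarian algorithm; not part of the original program) is:

import numpy as np
import sklearn.metrics as sm
from scipy.optimize import linear_sum_assignment

def aligned_accuracy(y_true, y_pred):
    # Find the cluster-to-class mapping that maximizes the matched counts
    cm = sm.confusion_matrix(y_true, y_pred)
    rows, cols = linear_sum_assignment(-cm)  # negate to maximize
    mapping = dict(zip(cols, rows))
    return sm.accuracy_score(y_true, np.array([mapping[c] for c in y_pred]))

print('K-Means accuracy (aligned):', aligned_accuracy(y.Targets, model.labels_))
print('EM/GMM accuracy (aligned):', aligned_accuracy(y.Targets, y_cluster_gmm))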
Output:

[[1, 0, 0, 0]
 [0, 0, 1, 0]
 [1, 0, 0, 0]
 [1, 0, 0, 0]
 [1, 0, 0, 0]]
