ML Lab Experiments (1) - Pages-4
Experiment No: 7
Objective: Write a program to construct a Bayesian network considering medical data. Use
this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
You can use Java/Python ML library classes/API.
Description:
A Bayesian network is a directed acyclic graph in which each edge corresponds to a conditional
dependency, and each node corresponds to a unique random variable.
A Bayesian network consists of two major parts: a directed acyclic graph and a set of conditional
probability distributions.
In the directed acyclic graph, nodes represent the random variables and edges represent the
dependencies between them.
The conditional probability distribution of a node (random variable) is defined for every
possible combination of outcomes of its parent node(s).
For illustration, consider the following example. Suppose we attempt to turn on our computer,
but the computer does not start (observation/evidence). We would like to know which of the
possible causes of computer failure is more likely. In this simplified illustration, we assume only
two possible causes of this misfortune: electricity failure and computer malfunction.
The corresponding directed acyclic graph is depicted in the figure below.
Fig: Directed acyclic graph representing two independent possible causes of a computer failure.
The goal is to calculate the posterior conditional probability distribution of each of the possible
unobserved causes given the observed evidence, i.e. P [Cause | Evidence].
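The computer-failure example above can be worked through numerically. The following is a minimal sketch in plain Python that computes P[Cause | Evidence] by enumerating the joint distribution. All probability values and names in it (P_ELECTRICITY_FAILURE, P_COMPUTER_MALFUNCTION, P_NO_START) are hypothetical and chosen only for illustration; they do not come from the lab file.

```python
# Hypothetical prior probabilities of the two independent causes.
P_ELECTRICITY_FAILURE = 0.1
P_COMPUTER_MALFUNCTION = 0.2

# Hypothetical conditional probability table:
# P(computer does not start | electricity failure, computer malfunction).
P_NO_START = {
    (True, True): 1.0,
    (True, False): 1.0,
    (False, True): 0.5,
    (False, False): 0.0,
}


def posterior_given_no_start():
    """Return P(electricity failure | no start) and P(malfunction | no start)."""
    joint = {}
    for e in (True, False):
        for m in (True, False):
            p_e = P_ELECTRICITY_FAILURE if e else 1 - P_ELECTRICITY_FAILURE
            p_m = P_COMPUTER_MALFUNCTION if m else 1 - P_COMPUTER_MALFUNCTION
            # Joint probability of (e, m, computer does not start).
            joint[(e, m)] = p_e * p_m * P_NO_START[(e, m)]
    evidence = sum(joint.values())  # P(computer does not start)
    p_e_post = sum(p for (e, _), p in joint.items() if e) / evidence
    p_m_post = sum(p for (_, m), p in joint.items() if m) / evidence
    return p_e_post, p_m_post


p_electricity, p_malfunction = posterior_given_no_start()
print(f"P(electricity failure | no start)  = {p_electricity:.3f}")
print(f"P(computer malfunction | no start) = {p_malfunction:.3f}")
```

Enumeration is practical here only because the network has two binary causes. Larger networks, such as one built over the heart disease attributes, are normally queried with a library inference routine instead.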
Data Set:
Laboratory File
Machine Learning Lab (IT804) Jan-Jun 2021
Class distribution (number of instances per diagnosis value 0 to 4):
Database      0    1    2    3    4   Total
Cleveland   164   55   36   35   13     303
Attribute Information:
1. age: age in years
2. sex: sex (1 = male; 0 = female)
3. cp: chest pain type
Value 1: typical angina
Value 2: atypical angina
Value 3: non-anginal pain
Value 4: asymptomatic
4. trestbps: resting blood pressure (in mm Hg on admission to the hospital)
5. chol: serum cholesterol in mg/dl
6. fbs: (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
7. restecg: resting electrocardiographic results
Value 0: normal
Value 1: having ST-T wave abnormality (T wave inversions and/or ST elevation or
depression of > 0.05 mV)
Value 2: showing probable or definite left ventricular hypertrophy by Estes' criteria
8. thalach: maximum heart rate achieved
9. exang: exercise induced angina (1 = yes; 0 = no)
10. oldpeak: ST depression induced by exercise relative to rest
11. slope: the slope of the peak exercise ST segment
Value 1: upsloping
Value 2: flat
Value 3: downsloping
12. thal: 3 = normal; 6 = fixed defect; 7 = reversible defect
Program:
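import numpy as np
import pandas as pd
import csv
from pgmpy.estimators import MaximumLikelihoodEstimator
# Note: later pgmpy releases deprecate BayesianModel in favour of BayesianNetwork;
# adjust this import if it fails on your installed version.
from pgmpy.models import BayesianModel
from pgmpy.inference import VariableElimination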
Output:
Experiment No: 8
Objective: Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same
data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and
comment on the quality of clustering. You can add Java/Python ML library classes/API in the
program.
Description:
The expectation-maximization (EM) algorithm is an approach for performing maximum likelihood
estimation in the presence of latent variables. It first estimates the values of the latent
variables, then optimizes the model parameters, and repeats these two steps until convergence. It
is an effective and general approach and is most commonly used for density estimation with
missing data, for example when clustering with a Gaussian Mixture Model.
Algorithm:
1. Given a set of incomplete data, consider a set of starting parameters.
2. Expectation step (E-step): Using the observed available data of the dataset, estimate
(guess) the values of the missing data.
3. Maximization step (M-step): Complete data generated after the expectation (E) step is
used in order to update the parameters.
4. Repeat step 2 and step 3 until convergence.
The essence of the Expectation-Maximization algorithm is to use the available observed data of the
dataset to estimate the missing data, and then to use the completed data to update the values of the
parameters.
First, a set of initial values for the parameters is chosen. A set of incomplete observed
data is given to the system, under the assumption that the observed data comes from a specific
model.
The next step is the Expectation step (E-step). In this step, we use the observed data
to estimate (guess) the values of the missing or incomplete data. It is basically used to
update the variables.
The next step is the Maximization step (M-step). In this step, we use the complete data
generated in the preceding E-step to update the values of the parameters.
It is basically used to update the hypothesis.
In the fourth step, we check whether the values are converging. If they are, stop;
otherwise repeat step 2 and step 3 (the E-step and the M-step) until
convergence occurs.
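The four steps above can be sketched for the simplest clustering case, a one-dimensional mixture of Gaussians, where the "missing data" is the unknown cluster membership of each point. This is an illustrative sketch using NumPy only, not the program of this experiment. The function names (gaussian_pdf, em_gmm_1d), the quantile-based initialisation, the tolerance and the synthetic data are all assumptions made for the example.

```python
import numpy as np


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)


def em_gmm_1d(x, n_components=2, max_iter=200, tol=1e-6):
    """Fit a 1-D Gaussian mixture to x with the EM algorithm."""
    # Step 1: starting parameters (assumed initialisation from data quantiles).
    weights = np.full(n_components, 1.0 / n_components)
    means = np.quantile(x, np.linspace(0.25, 0.75, n_components))
    variances = np.full(n_components, np.var(x))
    prev_log_likelihood = -np.inf

    for _ in range(max_iter):
        # Step 2 (E-step): estimate the missing cluster memberships as
        # responsibilities, i.e. P(component k | point i), shape (n, k).
        dens = weights * gaussian_pdf(x[:, None], means, variances)
        total = dens.sum(axis=1, keepdims=True)
        resp = dens / total

        # Step 3 (M-step): update the parameters from the completed data.
        nk = resp.sum(axis=0)
        weights = nk / x.size
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6

        # Step 4: stop once the log-likelihood no longer changes noticeably.
        log_likelihood = np.log(total).sum()
        if abs(log_likelihood - prev_log_likelihood) < tol:
            break
        prev_log_likelihood = log_likelihood

    return weights, means, variances, resp


# Synthetic data: two groups centred at 0 and 5 (hypothetical values).
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 200), rng.normal(5.0, 1.0, 200)])

weights, means, variances, resp = em_gmm_1d(x)
labels = resp.argmax(axis=1)  # hard cluster assignment per point
print("weights:", weights)
print("means:", means)
print("variances:", variances)
```

Each point keeps a soft membership in every component (resp), which is the main difference from k-Means, where each point belongs to exactly one cluster at every iteration.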
Program:
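import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans
import sklearn.metrics as sm
import pandas as pd
import numpy as np
# Assumed setup (not present in this extract): load the Iris data set into X.
# Petal_Length and Petal_Width are used later in the program; the sepal column names are assumed.
iris = datasets.load_iris()
X = pd.DataFrame(iris.data, columns=['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width'])
# K Means Cluster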
model = KMeans(n_clusters=3)
model.fit(X)
# Create a colormap
colormap = np.array(['red', 'lime', 'black'])
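# Assumed setup (not present in this extract): standardize X and fit a 3-component GMM.
from sklearn import preprocessing
from sklearn.mixture import GaussianMixture
xs = preprocessing.StandardScaler().fit_transform(X)
gmm = GaussianMixture(n_components=3).fit(xs)
# Predict the mixture component (cluster) of each sample
y_cluster_gmm = gmm.predict(xs)
plt.subplot(2, 2, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm], s=40)
plt.title('GMM Classification')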
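Since the objective asks for a comparison of the two algorithms, a numerical score can complement the scatter plots. The sketch below is one possible way to do this, assuming the data is the Iris data set (which the Petal_Length and Petal_Width column names suggest). The use of the adjusted Rand index, the standardization step and the fixed random_state values are choices made for this example and are not taken from the original program.

```python
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Load Iris and standardize the four features (assumed preprocessing).
iris = datasets.load_iris()
X_scaled = StandardScaler().fit_transform(iris.data)
y_true = iris.target

# Cluster the same data with k-Means and with a GMM fitted by EM.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
gmm_labels = GaussianMixture(n_components=3, random_state=0).fit(X_scaled).predict(X_scaled)

# The adjusted Rand index compares a clustering with the true species labels
# and does not depend on how the cluster ids happen to be numbered.
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_gmm = adjusted_rand_score(y_true, gmm_labels)
print(f"k-Means adjusted Rand index: {ari_kmeans:.3f}")
print(f"GMM (EM) adjusted Rand index: {ari_gmm:.3f}")
```

A score of 1 means the clusters match the species labels exactly, while a score near 0 means the agreement is no better than chance. Comparing raw accuracy instead would first require mapping cluster ids to class labels, because the ids assigned by either algorithm are arbitrary.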
[[1 ,0, 0, 0]
[0 ,0, 1, 0]
[1 ,0, 0, 0]
[1 ,0, 0, 0]
[1 ,0, 0, 0]]