Module 5-Part 2
INTRODUCTION TO
MACHINE LEARNING
MODULE-5 (UNSUPERVISED LEARNING) ENSEMBLE METHODS,
VOTING, BAGGING, BOOSTING. UNSUPERVISED LEARNING -
CLUSTERING METHODS - SIMILARITY MEASURES, K-MEANS
CLUSTERING, EXPECTATION-MAXIMIZATION FOR SOFT
CLUSTERING, HIERARCHICAL CLUSTERING METHODS, DENSITY-BASED
CLUSTERING
MODULE 5 - PART II
Expectation-Maximization for Soft Clustering,
Hierarchical Clustering Methods, Density-Based
Clustering
Dendrograms
Hierarchical clustering can be represented by a rooted binary tree. The
nodes of the tree represent groups or clusters.
The root node represents the entire data set, and the terminal nodes each
represent one of the individual observations (singleton clusters). Each
nonterminal node has two daughter nodes.
[Figure: dendrogram over the points a, b, c, d, e. Read bottom-up, the singletons are merged step by step (Step 0 to Step 4), e.g. {d, e}, then {c, d, e}, up to the single cluster {a, b, c, d, e}; read top-down, the same tree describes divisive (DIANA) clustering, splitting {a, b, c, d, e} back into singletons (Step 4 to Step 0).]
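For illustration, a dendrogram like the one sketched above can be produced with SciPy's hierarchical-clustering utilities. The five 2-D points below (standing in for a, b, c, d, e) are invented for this sketch; only the linkage and dendrogram calls are the actual SciPy API.

import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

# Five made-up 2-D observations, playing the roles of a, b, c, d, e.
X = np.array([[0.0, 0.0],   # a
              [1.0, 0.5],   # b
              [4.0, 4.0],   # c
              [2.0, 3.0],   # d
              [4.2, 4.1]])  # e

# linkage() agglomerates bottom-up; method="complete" measures the
# distance between clusters by their farthest pair of points.
Z = linkage(X, method="complete")

dendrogram(Z, labels=["a", "b", "c", "d", "e"])
plt.ylabel("merge distance")
plt.show()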
The new set of clusters is C2: {a}, {b}, {d}, {c, e}. The distances from the
merged cluster {c, e} to the other clusters are recomputed using the
maximum formula:
d({c, e}, {a}) = max{d(c, a), d(e, a)} = max{3, 11} = 11.
d({c, e}, {b}) = max{d(c, b), d(e, b)} = max{7, 10} = 10.
Among the pairwise distances between the clusters in C2, the minimum is
the distance between {b} and {d}, so these two clusters are merged. The
new set of clusters is C3: {a}, {b, d}, {c, e}.
Among the pairwise distances between the clusters in C3, the minimum is
the distance between {a} and {b, d}, so these are merged next, and we
compute
d({a, b, d}, {c, e}) = max{d(a, c), d(a, e), d(b, c), d(b, e), d(d, c), d(d, e)}.
This example used complete-linkage clustering, which measures cluster
distances with the "maximum formula". Single-linkage clustering instead
uses the "minimum formula", that is, the following formula to compute the
distance between two clusters A and B:
d(A, B) = min{ d(x, y) : x ∈ A, y ∈ B }.
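Both linkage formulas translate directly into code. The small sketch below defines them as Python functions and checks the complete-linkage one against the distances quoted in the example; only the four pairwise distances given above are filled in, the rest of the distance matrix is not listed here.

# Cluster-to-cluster distances, written directly from the two formulas.
def single_linkage(A, B, d):
    # minimum distance over all cross-cluster pairs
    return min(d[x, y] for x in A for y in B)

def complete_linkage(A, B, d):
    # maximum distance over all cross-cluster pairs
    return max(d[x, y] for x in A for y in B)

# Pairwise distances quoted in the worked example above.
d = {("c", "a"): 3, ("e", "a"): 11,
     ("c", "b"): 7, ("e", "b"): 10}
d.update({(y, x): v for (x, y), v in d.items()})  # make it symmetric

print(complete_linkage({"c", "e"}, {"a"}, d))  # 11
print(complete_linkage({"c", "e"}, {"b"}, d))  # 10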
Divisive clustering algorithms begin with the entire data set as a single
cluster, and recursively divide one of the existing clusters into two
daughter clusters at each iteration, in a top-down fashion.
DIANA (DIvisive ANAlysis)
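As a rough illustration of a single divisive step, the sketch below implements the "splinter group" heuristic commonly associated with DIANA: seed a new cluster with the most eccentric point, then repeatedly move over any point that is, on average, closer to the splinter group than to the remaining points. The function name, the index-based distance matrix, and the toy data are choices made for this sketch, not the full DIANA algorithm.

import numpy as np

def diana_split(points, d):
    # Split `points` (indices into the distance matrix d) into
    # (splinter, remainder) using average dissimilarities.
    rest = list(points)

    def avg(i, group):
        return np.mean([d[i][j] for j in group]) if group else 0.0

    # Seed the splinter group with the most eccentric point: the one
    # with the largest average dissimilarity to all the others.
    seed = max(rest, key=lambda i: avg(i, [j for j in rest if j != i]))
    splinter, rest = [seed], [j for j in rest if j != seed]

    # Move points that are on average closer to the splinter group
    # than to the rest, until no such point is left.
    while len(rest) > 1:
        gains = {i: avg(i, [j for j in rest if j != i]) - avg(i, splinter)
                 for i in rest}
        best = max(gains, key=gains.get)
        if gains[best] <= 0:
            break
        splinter.append(best)
        rest.remove(best)
    return splinter, rest

# Toy 4-point distance matrix: points 0,1 are close, points 2,3 are close.
d = np.array([[0, 2, 9, 8],
              [2, 0, 8, 9],
              [9, 8, 0, 3],
              [8, 9, 3, 0]])
print(diana_split(range(4), d))  # ([2, 3], [0, 1])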
Expectation-Maximization (EM)
The log-likelihood of the observed data X under parameters θ is
L(θ | X) = Σ_t log p(x^t | θ).
EM maximizes this likelihood indirectly, by alternating two steps (here Z
denotes the hidden variables and L_C the complete-data log-likelihood):
E-step: Q(θ | θ^(l)) = E[ L_C(θ | X, Z) | X, θ^(l) ]
M-step: θ^(l+1) = arg max_θ Q(θ | θ^(l))
Each iteration is guaranteed not to decrease the observed-data
log-likelihood:
L(θ^(l+1) | X) ≥ L(θ^(l) | X).
The Gaussian mixture model writes the density of each data point xi as
p(xi | θ) = Σ_{k=1}^{K} πk N(xi | μk, Σk),
where N(xi | μk, Σk) is the Gaussian probability density function with mean
μk and covariance Σk, and θ represents all the parameters {πk, μk, Σk}.
The EM algorithm estimates these parameters θ by maximizing the
likelihood of the observed data.
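For concreteness, here is a minimal sketch of this log-likelihood in Python/NumPy, using SciPy's multivariate normal density; the function name gmm_log_likelihood is our own label for the formula above.

import numpy as np
from scipy.stats import multivariate_normal

def gmm_log_likelihood(X, pi, mu, Sigma):
    # log L(theta | X) = sum_i log( sum_k pi_k * N(x_i | mu_k, Sigma_k) )
    density = sum(pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
                  for k in range(len(pi)))
    return np.log(density).sum()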
1. Initialize
Choose initial values for the parameters θ^(0) = {πk, μk, Σk} (for example,
random means and equal mixing weights).
2. E-Step (Expectation Step)
In the E-step, we compute the responsibilities
γ(zik) = πk^(t) N(xi | μk^(t), Σk^(t)) / Σ_{j=1}^{K} πj^(t) N(xi | μj^(t), Σj^(t)).
Here, γ(zik) is the expected membership of the i-th data point in the k-th
Gaussian based on the current estimates of the parameters θ^(t).
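A minimal NumPy sketch of this E-step (the helper name e_step and the per-component loop are choices made here; it continues the conventions of the snippet above):

import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, pi, mu, Sigma):
    # gamma[i, k] = pi_k N(x_i | mu_k, Sigma_k), normalized over k,
    # i.e. the responsibility formula above.
    N, K = X.shape[0], len(pi)
    gamma = np.zeros((N, K))
    for k in range(K):
        gamma[:, k] = pi[k] * multivariate_normal.pdf(X, mean=mu[k], cov=Sigma[k])
    gamma /= gamma.sum(axis=1, keepdims=True)  # rows now sum to one
    return gamma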
3. M-Step (Maximization Step)
In the M-step, we update the parameters πk, μk, and Σk by maximizing the
expected complete-data log-likelihood, which incorporates the
responsibilities γ(zik). With Nk = Σ_{i=1}^{N} γ(zik) denoting the effective
number of points assigned to component k, the updates are:
πk = Nk / N
μk = (1 / Nk) Σ_{i=1}^{N} γ(zik) xi
Σk = (1 / Nk) Σ_{i=1}^{N} γ(zik) (xi − μk)(xi − μk)^T
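A matching sketch of these M-step updates, with the same array conventions (nothing here guards against a component collapsing onto a single point, which a real implementation would need to handle):

import numpy as np

def m_step(X, gamma):
    # Closed-form updates of pi_k, mu_k, Sigma_k from the responsibilities.
    N, K = gamma.shape
    Nk = gamma.sum(axis=0)                      # effective cluster sizes
    pi = Nk / N                                 # mixing weights
    mu = (gamma.T @ X) / Nk[:, None]            # responsibility-weighted means
    Sigma = []
    for k in range(K):
        diff = X - mu[k]                        # deviations from the new mean
        Sigma.append((gamma[:, k, None] * diff).T @ diff / Nk[k])
    return pi, mu, np.array(Sigma)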
4. Iterate
Repeat the E-step and M-step until the parameters θ = {πk, μk, Σk}
converge, i.e., until the change in the parameters between iterations falls
below a chosen threshold or the log-likelihood stops increasing
significantly.
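Putting the pieces together, here is a minimal EM loop in the same sketch, reusing the e_step and m_step helpers above; the initialization (random data points as means, a shared regularized covariance) and the stopping rule (log-likelihood improvement below tol) are simple choices made for illustration, not the only options.

import numpy as np
from scipy.stats import multivariate_normal

def fit_gmm(X, K, n_iter=100, tol=1e-6, seed=0):
    # Run EM on data X (shape N x D) with K components until the
    # log-likelihood improvement falls below tol.
    rng = np.random.default_rng(seed)
    N, D = X.shape
    pi = np.full(K, 1.0 / K)                       # equal mixing weights
    mu = X[rng.choice(N, size=K, replace=False)]   # K random points as means
    Sigma = np.array([np.cov(X.T) + 1e-6 * np.eye(D)] * K)

    prev_ll = -np.inf
    for _ in range(n_iter):
        gamma = e_step(X, pi, mu, Sigma)           # E-step: responsibilities
        pi, mu, Sigma = m_step(X, gamma)           # M-step: parameter updates
        # Observed-data log-likelihood, used for the convergence check.
        dens = sum(pi[k] * multivariate_normal.pdf(X, mu[k], Sigma[k])
                   for k in range(K))
        ll = np.log(dens).sum()
        if ll - prev_ll < tol:                     # stopped improving
            break
        prev_ll = ll
    return pi, mu, Sigma, gamma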
THANK YOU