
AMT305 – INTRODUCTION TO MACHINE LEARNING

MODULE-5 (UNSUPERVISED LEARNING): ENSEMBLE METHODS, VOTING, BAGGING, BOOSTING. UNSUPERVISED LEARNING - CLUSTERING METHODS - SIMILARITY MEASURES, K-MEANS CLUSTERING, EXPECTATION-MAXIMIZATION FOR SOFT CLUSTERING, HIERARCHICAL CLUSTERING METHODS, DENSITY-BASED CLUSTERING

MODULE 5 – PART II
Expectation-Maximization for soft clustering, Hierarchical Clustering Methods, Density-based clustering



HIERARCHICAL CLUSTERING
• Hierarchical clustering is a method of cluster analysis that seeks to build a hierarchy of clusters (or groups) in a given dataset.
• Hierarchical clustering produces clusters in which the clusters at each level of the hierarchy are created by merging clusters at the next lower level.
• The decision of whether two clusters are to be merged is based on the measure of dissimilarity between the clusters.



Hierarchical clustering - Dendrograms
• A dendrogram is a tree diagram used to illustrate the arrangement of the clusters produced by hierarchical clustering.
• Hierarchical clustering can be represented by a rooted binary tree. The nodes of the tree represent groups or clusters.
• The root node represents the entire data set. The terminal nodes each represent one of the individual observations (singleton clusters). Each nonterminal node has two daughter nodes.



Contd…




Methods for hierarchical clustering
There are two methods for the hierarchical clustering of a dataset: the agglomerative method (or bottom-up method) and the divisive method (or top-down method).



Contd…

[Figure: Agglomerative clustering (AGNES) proceeds bottom-up over Steps 0-4, merging {a}, {b}, {c}, {d}, {e} into {a, b} and {d, e}, then {c, d, e}, and finally {a, b, c, d, e}; divisive clustering (DIANA) performs the same steps top-down in reverse, starting from the full data set.]






Measures of distance between groups of data points
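Commonly used measures (with d(x, y) denoting the distance between individual points x of cluster A and y of cluster B) are the standard linkage rules:

$$d_{\min}(A, B) = \min_{x \in A,\, y \in B} d(x, y) \quad \text{(single linkage)}$$

$$d_{\max}(A, B) = \max_{x \in A,\, y \in B} d(x, y) \quad \text{(complete linkage)}$$

$$d_{\mathrm{avg}}(A, B) = \frac{1}{|A|\,|B|} \sum_{x \in A} \sum_{y \in B} d(x, y) \quad \text{(average linkage)}$$

The problems below use the complete-linkage (maximum) and single-linkage (minimum) versions.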









Algorithm for agglomerative hierarchical clustering



Problem-1

The complete-linkage clustering uses the "maximum formula", that is, the following formula to compute the distance between two clusters A and B:
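In symbols, this is the standard complete-linkage (maximum) rule:

$$d(A, B) = \max_{x \in A,\, y \in B} d(x, y)$$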
Contd…
1. Dataset: {a, b, c, d, e}.
   Initial clustering (singleton sets) C1: {a}, {b}, {c}, {d}, {e}.
2. The following table gives the distances between the various clusters in C1 (the entries can be read off from the pairwise distances used in the computations below):

        a    b    c    d    e
   a    0    9    3    6   11
   b    9    0    7    5   10
   c    3    7    0    9    2
   d    6    5    9    0    8
   e   11   10    2    8    0



Contd…
• In the above table, the minimum distance is the distance between the clusters {c} and {e}.
• Also d({c}, {e}) = 2.
• We merge {c} and {e} to form the cluster {c, e}.
• The new set of clusters C2: {a}, {b}, {d}, {c, e}.



Contd…
• Let us compute the distance of {c, e} from the other clusters.
• d({c, e}, {a}) = max{d(c, a), d(e, a)} = max{3, 11} = 11.
• d({c, e}, {b}) = max{d(c, b), d(e, b)} = max{7, 10} = 10.
• d({c, e}, {d}) = max{d(c, d), d(e, d)} = max{9, 8} = 9.
• The following table gives the distances between the various clusters in C2.



Contd…
• In the above table, the minimum distance is the distance between the clusters {b} and {d}.
• Also d({b}, {d}) = 5.
• We merge {b} and {d} to form the cluster {b, d}.
• The new set of clusters C3: {a}, {b, d}, {c, e}.
• Let us compute the distance of {b, d} from the other clusters.
• d({b, d}, {a}) = max{d(b, a), d(d, a)} = max{9, 6} = 9.
• d({b, d}, {c, e}) = max{d(b, c), d(b, e), d(d, c), d(d, e)} = max{7, 10, 9, 8} = 10.



Contd…
• In the above table, the minimum distance is the distance between the clusters {a} and {b, d}.
• Also d({a}, {b, d}) = 9.
• We merge {a} and {b, d} to form the cluster {a, b, d}.
• The new set of clusters C4: {a, b, d}, {c, e}.

Contd…
• Only two clusters are left. We merge them to form a single cluster containing all the data points.
• We have d({a, b, d}, {c, e}) = max{d(a, c), d(a, e), d(b, c), d(b, e), d(d, c), d(d, e)} = max{3, 11, 7, 10, 9, 8} = 11.

Dendrogram for the data given
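As a cross-check, the same complete-linkage clustering can be reproduced with SciPy (a sketch assuming NumPy, SciPy and Matplotlib are available; these libraries are not part of the original notes). The merge heights 2, 5, 9 and 11 match the steps worked out above, and changing `method='complete'` to `'single'` gives the single-linkage version used in the next problem.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

labels = ['a', 'b', 'c', 'd', 'e']
# Pairwise distance matrix from the worked example above.
D = np.array([
    [ 0,  9,  3,  6, 11],
    [ 9,  0,  7,  5, 10],
    [ 3,  7,  0,  9,  2],
    [ 6,  5,  9,  0,  8],
    [11, 10,  2,  8,  0],
], dtype=float)

# linkage() expects a condensed (upper-triangular) distance vector.
Z = linkage(squareform(D), method='complete')
print(Z)                       # merges at heights 2, 5, 9, 11, as derived above
dendrogram(Z, labels=labels)   # draws the dendrogram for the given data
plt.show()
```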




The single-linkage clustering uses the “minimum formula”, that is, the
following formula to compute the distance between two clusters A and B:
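In symbols, this is the standard single-linkage (minimum) rule:

$$d(A, B) = \min_{x \in A,\, y \in B} d(x, y)$$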



Solution
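A brief outline of the single-linkage merges for the same distance data (the detailed tables follow the same pattern as in the complete-linkage problem, now taking minima): the smallest entry is d(c, e) = 2, so {c} and {e} merge first; next d({c, e}, {a}) = min{3, 11} = 3, so {a} joins to give {a, c, e}; then d({b}, {d}) = 5 gives {b, d}; finally d({a, c, e}, {b, d}) = min{9, 6, 7, 9, 10, 8} = 6 merges everything into a single cluster.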












Dendrogram for the hierarchical clustering



Algorithm for divisive hierarchical clustering

Divisive clustering algorithms begin with the entire data set as a single cluster and recursively divide one of the existing clusters into two daughter clusters at each iteration, in a top-down fashion.
DIANA (DIvisive ANAlysis) is a representative algorithm of this kind.









Contd…

For example, the average dissimilarity of a to the remaining objects is ¼ (d(a, b) + d(a, c) + d(a, d) + d(a, e)) = ¼ (9 + 3 + 6 + 11) = 7.25.
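A minimal sketch of this first DIANA step (assuming the same pairwise distance matrix as in the agglomerative problem; NumPy is not part of the original notes). It computes each object's average dissimilarity to the others; the value 7.25 above corresponds to object a.

```python
import numpy as np

labels = ['a', 'b', 'c', 'd', 'e']
# Same pairwise distance matrix as in the agglomerative example.
D = np.array([
    [ 0,  9,  3,  6, 11],
    [ 9,  0,  7,  5, 10],
    [ 3,  7,  0,  9,  2],
    [ 6,  5,  9,  0,  8],
    [11, 10,  2,  8,  0],
], dtype=float)

# Average dissimilarity of each object to all the other objects.
avg = D.sum(axis=1) / (len(labels) - 1)
for lab, v in zip(labels, avg):
    print(lab, v)        # a: 7.25, b: 7.75, c: 5.25, d: 7.0, e: 7.75
# DIANA moves the object with the largest average dissimilarity into a new
# "splinter" cluster and then reassigns points between the two groups.
```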















DENSITY-BASED CLUSTERING
• In density-based clustering, clusters are defined as areas of higher density than the remainder of the data set.
• Objects in the sparse areas that are required to separate clusters are usually considered to be noise and border points.
• The most popular density-based clustering method is DBSCAN (Density-Based Spatial Clustering of Applications with Noise).
• The algorithm grows regions with sufficiently high density into clusters, and discovers clusters of arbitrary shape in spatial databases with noise.
• It defines a cluster as a maximal set of density-connected points.
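A short usage sketch with scikit-learn (an assumption; the library and the toy dataset are not named in the notes) showing the two DBSCAN parameters, eps and MinPts (min_samples), and the fact that noise points receive the label -1:

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaving half-moons: clusters of arbitrary shape plus a little noise.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

# eps = neighbourhood radius, min_samples = MinPts (minimum points for a core point).
db = DBSCAN(eps=0.2, min_samples=5).fit(X)

labels = db.labels_                       # -1 marks noise points
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print("clusters:", n_clusters, "noise points:", np.sum(labels == -1))
```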












DBSCAN ALGORITHM
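The algorithm itself can be sketched as follows (a plain-NumPy illustration, not the notes' own pseudocode): every unvisited point is tested against eps and MinPts; core points start clusters that are grown along density-connected neighbours, and points that end up in no cluster are labelled noise.

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Return one label per point: 0..k-1 for clusters, -1 for noise."""
    n = len(X)
    UNVISITED, NOISE = -2, -1
    labels = np.full(n, UNVISITED)
    # Precompute the eps-neighbourhood of every point.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    cluster = 0
    for i in range(n):
        if labels[i] != UNVISITED:
            continue
        if len(neighbors[i]) < min_pts:        # not a core point: noise, for now
            labels[i] = NOISE
            continue
        # i is a core point: grow a new cluster of density-connected points.
        labels[i] = cluster
        seeds = list(neighbors[i])
        j = 0
        while j < len(seeds):
            q = seeds[j]
            if labels[q] == NOISE:             # noise reachable from a core point = border point
                labels[q] = cluster
            elif labels[q] == UNVISITED:
                labels[q] = cluster
                if len(neighbors[q]) >= min_pts:   # q is also a core point: keep expanding
                    seeds.extend(neighbors[q])
            j += 1
        cluster += 1
    return labels
```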



EXPECTATION-MAXIMISATION ALGORITHM (EM ALGORITHM)

• The maximum likelihood estimation (MLE) method is a method for estimating the parameters of a statistical model, given observations.
• The method attempts to find the parameter values that maximize the likelihood function, or equivalently the log-likelihood function.
• The expectation-maximisation algorithm (often abbreviated as the EM algorithm) is used to find maximum likelihood estimates of the parameters of a statistical model.















Contd…
• Log likelihood with a mixture model:

$$\mathcal{L}(\Phi \mid X) = \log \prod_{t} p(x^t \mid \Phi) = \sum_{t} \log \sum_{i=1}^{k} p(x^t \mid \mathcal{G}_i)\, P(\mathcal{G}_i)$$

• Assume hidden variables z, which, when known, make optimization much simpler.
• Complete likelihood, Lc(Φ | X, Z), in terms of x and z.
• Incomplete likelihood, L(Φ | X), in terms of x.



Contd…
Iterate the two steps:
1. E-step: Estimate z given X and the current Φ.
2. M-step: Find the new Φ given z, X, and the old Φ.

$$\text{E-step:}\quad Q(\Phi \mid \Phi^{(l)}) = E\!\left[\mathcal{L}_C(\Phi \mid X, Z) \,\middle|\, X, \Phi^{(l)}\right]$$

$$\text{M-step:}\quad \Phi^{(l+1)} = \arg\max_{\Phi}\, Q(\Phi \mid \Phi^{(l)})$$

An increase in Q increases the incomplete likelihood:

$$\mathcal{L}(\Phi^{(l+1)} \mid X) \ge \mathcal{L}(\Phi^{(l)} \mid X)$$



EM algorithm for Gaussian Mixtures
The Expectation-Maximization (EM) algorithm is widely used to fit
Gaussian Mixture Models (GMMs).
Gaussian Mixture Models are probabilistic models that assume the data is
generated from a mixture of several Gaussian distributions, each with its
own mean and covariance.
The challenge with GMMs is that we don't know which Gaussian
component generated each data point (this is a latent variable or a hidden
part of the data).



Gaussian Mixture Model
• Assume we have a dataset X = {x1, x2, …, xn}, and we want to fit it using a mixture of K Gaussians. Each Gaussian has its own mean μk, covariance Σk, and a mixture weight πk, where:
• μk is the mean of the k-th Gaussian component.
• Σk is the covariance matrix of the k-th Gaussian component.
• πk is the prior probability that a data point comes from the k-th Gaussian (also called the mixture weight).



Contd…
The overall probability density function for the data is given by the
weighted sum of the individual Gaussian components:
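In the standard form (with K components and parameters θ = {πk, μk, Σk}):

$$p(x_i \mid \theta) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)$$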

where N(xi | μk, Σk) is the Gaussian probability density function with mean μk and covariance Σk, and θ represents all the parameters {πk, μk, Σk}.
The EM algorithm will estimate these parameters θ by maximizing the
likelihood of the observed data.



Contd…
• Steps in the EM Algorithm:
• 1. Initialization
• Start with initial guesses for the parameters θ.
• This can be done randomly or using some clustering method (such as k-means) to assign initial cluster memberships.
• 2. E-Step (Expectation Step)
• In the E-step, we compute the posterior probability that each data point xi belongs to each Gaussian component. These probabilities are called responsibilities, denoted by γ(zik), where:
• zik = 1 means data point xi was generated by Gaussian k.



Contd…
• The responsibility γ(zik) is the probability that the i-th data point belongs to the k-th Gaussian:
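In the standard form, using the current parameter estimates θ^(t):

$$\gamma(z_{ik}) = \frac{\pi_k^{(t)}\, \mathcal{N}\!\left(x_i \mid \mu_k^{(t)}, \Sigma_k^{(t)}\right)}{\sum_{j=1}^{K} \pi_j^{(t)}\, \mathcal{N}\!\left(x_i \mid \mu_j^{(t)}, \Sigma_j^{(t)}\right)}$$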

• Here, γ(zik) is the expected membership of the i-th data point in the k-th
Gaussian based on the current estimates of the parameters θ^(t).
3. M-Step (Maximization Step)
In the M-step, we update the parameters πk, μk, and Σk by maximizing the
expected complete-data log-likelihood, which incorporates the
responsibilities γ(zik).



Contd…
• The new estimates of the parameters are calculated as follows:
• Update the mixture weights:
• The weight πk^(t+1) is the proportion of data points assigned to the k-th
Gaussian:
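A standard form of this update, writing n for the number of data points, is

$$\pi_k^{(t+1)} = \frac{1}{n} \sum_{i=1}^{n} \gamma(z_{ik})$$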

• Update the means:


• The mean μk^(t+1) is the weighted average of the data points assigned to the
k-th Gaussian:
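In the standard form:

$$\mu_k^{(t+1)} = \frac{\sum_{i=1}^{n} \gamma(z_{ik})\, x_i}{\sum_{i=1}^{n} \gamma(z_{ik})}$$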



Contd…
Update the covariance matrices:
The covariance matrix Σk^(t+1) is the weighted average of the outer products of the differences between the data points and the updated mean μk^(t+1):
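In the standard form:

$$\Sigma_k^{(t+1)} = \frac{\sum_{i=1}^{n} \gamma(z_{ik})\, \left(x_i - \mu_k^{(t+1)}\right)\left(x_i - \mu_k^{(t+1)}\right)^{\mathsf T}}{\sum_{i=1}^{n} \gamma(z_{ik})}$$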

4. Iterate
Repeat the E-step and M-step until the parameters θ={πk,μk,Σk}
converge, i.e., when the change in the parameters between iterations is
below a certain threshold or when the log-likelihood stops increasing
significantly.
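Putting the four steps together, here is a minimal NumPy/SciPy sketch of EM for a Gaussian mixture (an illustration under the assumptions above, not code from the notes; scikit-learn's GaussianMixture implements the same procedure in production form):

```python
import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, k, n_iter=100, tol=1e-6, seed=0):
    """Minimal EM for a Gaussian mixture model with full covariances."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # 1. Initialization: random data points as means, identity covariances, uniform weights.
    mu = X[rng.choice(n, size=k, replace=False)].copy()
    sigma = np.array([np.eye(d) for _ in range(k)])
    pi = np.full(k, 1.0 / k)
    prev_ll = -np.inf
    for _ in range(n_iter):
        # 2. E-step: responsibilities gamma[i, j] = P(component j | x_i, current parameters).
        dens = np.column_stack([
            pi[j] * multivariate_normal.pdf(X, mean=mu[j], cov=sigma[j])
            for j in range(k)
        ])
        gamma = dens / dens.sum(axis=1, keepdims=True)
        # 3. M-step: re-estimate weights, means and covariances from the responsibilities.
        Nk = gamma.sum(axis=0)                    # effective number of points per component
        pi = Nk / n
        mu = (gamma.T @ X) / Nk[:, None]
        for j in range(k):
            diff = X - mu[j]
            sigma[j] = (gamma[:, j, None] * diff).T @ diff / Nk[j]
            sigma[j] += 1e-6 * np.eye(d)          # small ridge for numerical stability
        # 4. Iterate until the log-likelihood stops increasing significantly.
        ll = np.log(dens.sum(axis=1)).sum()
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return pi, mu, sigma, gamma

# Example (hypothetical data): pi, mu, sigma, gamma = em_gmm(X, k=3) for an (n, d) array X.
```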



Applications of the EM Algorithm

1. Clustering: EM is used for clustering in the context of Gaussian Mixture Models (GMMs), where clusters are modeled as Gaussian distributions.
2. Missing Data Problems: EM can handle cases where some data is missing by treating the missing values as latent variables.



MODULE 5 – PART II ENDS

“Wish you all the best dears!!!!”

THANK YOU

