ML.5-Clustering Techniques (Week 9)

Nhân bản – Phụng sự – Khai phóng

Chapter 5

Clustering Techniques
Machine Learning
CONTENTs

• Clustering Problems

• K-Means

• DBSCAN

• Gaussian Mixtures

Machine Learning 2
CONTENTs

•Clustering Problems
• K-Means

• DBSCAN

• Gaussian Mixtures

Machine Learning 3
Clustering Problem
• Unsupervised learning
• Sometimes the data form clusters, where examples within a cluster are similar to each other and examples in different clusters are dissimilar.

• Grouping data points into clusters, with no labels, is called clustering.

Machine Learning 4
Clustering Problem

• Assume the data $\{x^{(1)}, \ldots, x^{(N)}\}$ lives in a Euclidean space, $x^{(n)} \in \mathbb{R}^d$.

• Assume the data belongs to K classes (patterns).
• Assume data points from the same class are similar, i.e., close in Euclidean distance.
How can we identify those classes (which data points belong to each class)? A small synthetic example follows below.
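To make the setup concrete, here is a small synthetic example (an illustrative sketch, not part of the original slides; it assumes scikit-learn and matplotlib are available): it generates three well-separated 2-D blobs and plots them without using the labels.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# Three well-separated groups of 2-D points; the returned labels are ignored,
# since clustering has to work without them.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=42)
plt.scatter(X[:, 0], X[:, 1], s=10)
plt.show()
```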

Machine Learning 5
CONTENTs

• Clustering Problems

•K-Means
• DBSCAN

• Gaussian Mixtures

Machine Learning 6
K-means

• Initialization: randomly initialize cluster centers


• The algorithm iteratively alternates between two steps (a minimal sketch follows below):
• Assignment step: assign each data point to the closest cluster center.
• Refitting step: move each cluster center to the center of gravity (mean) of the data points assigned to it.
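A minimal NumPy sketch of this alternation (an illustrative implementation under the assumption that X is an (N, d) array; it is not the course's reference code):

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-Means: alternate the assignment and refitting steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]   # random initialization
    for _ in range(n_iters):
        # Assignment step: each point goes to its closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Refitting step: each center moves to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Calling kmeans(X, k=3) returns the final centers and the cluster index of every point.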

Machine Learning 7
K-means

• K-means assumes there are K clusters, and each point is close to its cluster center (the mean of the points in the cluster).
• If we knew the cluster assignments, we could easily compute the means.
• If we knew the means, we could easily compute the cluster assignments.
A chicken-and-egg problem! One can show that finding the exact optimum is NP-hard.
• A very simple (and useful) heuristic: start randomly and alternate between the two steps!

Machine Learning 8
K-means

Machine Learning 9
K-means

Machine Learning 10
K-means
• Finding the Optimal Number of Clusters

• Bad choices for the number of clusters

Machine Learning 11
K-means
• Finding the Optimal Number of Clusters

• Selecting the number of clusters k using the “elbow rule” (sketched below)
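A sketch of the elbow rule, assuming scikit-learn's KMeans and an already-loaded data matrix X (both are assumptions, not given on the slide):

```python
from sklearn.cluster import KMeans

# X is assumed to be an (n_samples, n_features) array that is already loaded.
inertias = []
for k in range(1, 10):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(km.inertia_)   # within-cluster sum of squared distances
# Plot k against inertia; the "elbow" where the curve stops dropping sharply
# is a reasonable choice for the number of clusters.
```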

Machine Learning 12
K-means
• Limits of K-Means
• K-Means does not behave very well when the clusters have varying sizes, different densities, or non-spherical shapes.

K-Means fails to cluster these ellipsoidal blobs properly

Machine Learning 13
CONTENTs

• Clustering Problems

• K-Means

•DBSCAN
• Gaussian Mixtures

Machine Learning 14
DBSCAN
• DBSCAN – Density-Based Spatial Clustering of Applications with Noise
• Core, Border, and Noise points

Machine Learning 15
DBSCAN
• Clusters as continuous regions of high density.
• DBSCAN algorithm:
• For each instance, the algorithm counts how many instances are located within a small distance ε (epsilon) of it. This region is called the instance's ε-neighborhood.
• If an instance has at least min_samples instances in its ε-neighborhood (including itself), then it is considered a core instance.
• All instances in the neighborhood of a core instance belong to the same cluster.
• Any instance that is not a core instance and does not have one in its neighborhood is considered an anomaly (noise). A usage sketch follows below.
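A usage sketch with scikit-learn's DBSCAN (the data matrix X and the values of eps and min_samples are illustrative assumptions):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# X is assumed to be an (n_samples, n_features) array that is already loaded;
# eps and min_samples are illustrative values, not recommendations.
db = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = db.labels_                          # cluster index per point; -1 marks anomalies
core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True    # True for core instances
```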

Machine Learning 16
DBSCAN: Algorithm

• Let ClusterCount = 0. For every point p:
• 1. If p is not a core point, assign a null label to it [e.g., zero].
• 2. If p is a core point, a new cluster is formed [with label ClusterCount := ClusterCount + 1].
Then find all points density-reachable from p and assign them to this cluster [reassign the zero labels, but not the others].
• Repeat this process until all of the points have been visited.
(Since all the zero labels of border points have been reassigned in step 2, the remaining points with a zero label are noise.)

Machine Learning 17
DBSCAN: Complexity

• Time complexity: O(n²), since for each point it has to be determined whether it is a core point; this can be reduced to O(n·log n) in lower-dimensional spaces by using efficient spatial index structures (n is the number of objects to be clustered).
• Space complexity: O(n).

Machine Learning 18
DBSCAN

Machine Learning 19
DBSCAN

• DBSCAN clustering using two different neighborhood radii

Machine Learning 20
DBSCAN: Optimal Eps
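One common heuristic for choosing ε, sketched here as an illustrative example (it assumes scikit-learn and a data matrix X): sort every point's distance to its k-th nearest neighbor, with k playing the role of min_samples, and read ε off the knee of the resulting curve.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

# X is assumed to be an (n_samples, n_features) array; k plays the role of min_samples.
k = 5
nn = NearestNeighbors(n_neighbors=k).fit(X)
dists, _ = nn.kneighbors(X)      # note: each point counts itself as its first neighbor
k_dist = np.sort(dists[:, -1])   # distance to the k-th neighbor, sorted ascending
# Plot k_dist and read eps off the sharpest bend ("knee") of the curve.
```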

Machine Learning 21
CONTENTs

• Clustering Problems

• K-Means

• DBSCAN

•Gaussian Mixtures

Machine Learning 22
Gaussian Bayes Classifier Reminder

Machine Learning 23
Predicting wealth from age

Machine Learning 24
Learning modelyear, mpg ---> maker

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2 \end{pmatrix}$$

Machine Learning 25
General: O(m²) parameters

$$\Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1m} \\ \sigma_{12} & \sigma_2^2 & \cdots & \sigma_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ \sigma_{1m} & \sigma_{2m} & \cdots & \sigma_m^2 \end{pmatrix}$$

Machine Learning 26
Aligned: O(m) parameters

$$\Sigma = \begin{pmatrix} \sigma_1^2 & 0 & 0 & \cdots & 0 & 0 \\ 0 & \sigma_2^2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \sigma_3^2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma_{m-1}^2 & 0 \\ 0 & 0 & 0 & \cdots & 0 & \sigma_m^2 \end{pmatrix}$$

Machine Learning 27
Spherical: O(1) covariance parameters

$$\Sigma = \begin{pmatrix} \sigma^2 & 0 & 0 & \cdots & 0 & 0 \\ 0 & \sigma^2 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \sigma^2 & \cdots & 0 & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma^2 & 0 \\ 0 & 0 & 0 & \cdots & 0 & \sigma^2 \end{pmatrix} = \sigma^2 I$$

Machine Learning 29
Making a Classifier from a Density Estimator

By input type (categorical inputs only / real-valued inputs only / mixed real & categorical):

• Classifier (inputs → predicted category): Joint BC and Naïve BC / Gauss BC / Dec Tree
• Density Estimator (inputs → probability): Joint DE and Naïve DE / Gauss DE / (none listed)
• Regressor (inputs → predicted real number): (none listed on this slide)

Machine Learning 31
Next… back to Density Estimation

What if we want to do density estimation with multimodal or clumpy data?

Machine Learning 32
The GMM assumption

• There are k components. The i'th component is called wi.
• Component wi has an associated mean vector μi.

Machine Learning 33
The GMM assumption

• There are k components. The i'th component is called wi.
• Component wi has an associated mean vector μi.
• Each component generates data from a Gaussian with mean μi and covariance matrix σ²I.

Assume that each datapoint is generated according to the following recipe:

Machine Learning 34
The GMM assumption

• There are k components. The i'th component is called wi.
• Component wi has an associated mean vector μi.
• Each component generates data from a Gaussian with mean μi and covariance matrix σ²I.

Assume that each datapoint is generated according to the following recipe:
1. Pick a component at random. Choose component i with probability P(wi).

Machine Learning 35
The GMM assumption

• There are k components. The i'th component is called wi.
• Component wi has an associated mean vector μi.
• Each component generates data from a Gaussian with mean μi and covariance matrix σ²I.

Assume that each datapoint is generated according to the following recipe:
1. Pick a component at random. Choose component i with probability P(wi).
2. Datapoint ~ N(μi, σ²I)
Machine Learning 36
The General GMM assumption

• There are k components. The i'th component is called wi.
• Component wi has an associated mean vector μi.
• Each component generates data from a Gaussian with mean μi and covariance matrix Σi.

Assume that each datapoint is generated according to the following recipe:
1. Pick a component at random. Choose component i with probability P(wi).
2. Datapoint ~ N(μi, Σi)  (a sampling sketch follows below)
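A small NumPy sketch of this generative recipe (illustrative only; the mixture parameters priors, mus and Sigmas are assumed to be given):

```python
import numpy as np

def sample_gmm(n, priors, mus, Sigmas, seed=0):
    """Draw n points by the recipe above (priors, mus, Sigmas are assumed
    to be given arrays of mixture parameters, with priors summing to 1)."""
    rng = np.random.default_rng(seed)
    comps = rng.choice(len(priors), size=n, p=priors)              # step 1: pick a component
    return np.array([rng.multivariate_normal(mus[i], Sigmas[i])    # step 2: x ~ N(mu_i, Sigma_i)
                     for i in comps])
```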
Machine Learning 37
Unsupervised Learning: not as hard as it looks

Sometimes easy.
Sometimes impossible.
And sometimes in between.

(In case you're wondering what these diagrams are, they show 2-d unlabeled data (x vectors) distributed in 2-d space; the top one has three very clear Gaussian centers.)

Machine Learning 38
Computing likelihoods in unsupervised case
We have x1, x2, …, xN.
We know P(w1), P(w2), …, P(wk).
We know σ.

P(x | wi, μ1, …, μk) = the probability that an observation from class wi would have value x, given the class means μ1, …, μk.

Can we write an expression for that?

39
Likelihoods in the unsupervised case
We have x1, x2, …, xn.
We have P(w1), …, P(wk). We have σ.
We can define, for any x, P(x | wi, μ1, μ2, …, μk).

Can we define P(x | μ1, μ2, …, μk)?

Can we define P(x1, x2, …, xn | μ1, μ2, …, μk)?

[Yes, if we assume the xi's were drawn independently.]
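Written out, these are the same expressions that reappear on the later "Back to Unsupervised Learning of GMMs" slide:

$$P(x \mid \mu_1,\ldots,\mu_k) = \sum_{j=1}^{k} P(x \mid w_j, \mu_1,\ldots,\mu_k)\,P(w_j)$$

$$P(x_1,\ldots,x_n \mid \mu_1,\ldots,\mu_k) = \prod_{i=1}^{n} P(x_i \mid \mu_1,\ldots,\mu_k)$$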

40
Unsupervised Learning: Mediumly Good News

We now have a procedure such that, if you give me a guess at μ1, μ2, …, μk, I can tell you the probability of the unlabeled data given those μ's.

Suppose the x's are 1-dimensional. (From Duda and Hart.)

There are two classes; w1 and w2


P(w1) = 1/3 P(w2) = 2/3 σ=1.
There are 25 unlabeled datapoints
x1 = 0.608
x2 = -1.590
x3 = 0.235
x4 = 3.949
:
x25 = -0.712

Machine Learning 41
Duda & Hart’s Example
Graph of log P(x1, x2, …, x25 | μ1, μ2) against μ1 and μ2.

Max likelihood = (μ1 = -2.13, μ2 = 1.668).

There is also a local (but not global) maximum, very close to the global one, at (μ1 = 2.085, μ2 = -1.257)*.
* It corresponds to switching w1 and w2.
Machine Learning 42
Duda & Hart’s Example
We can graph the probability density function of the data given our μ1 and μ2 estimates.

We can also graph the true function from which the data was randomly generated.

• They are close. Good.
• The 2nd solution tries to put the “2/3” hump where the “1/3” hump should go, and vice versa.
• In this example unsupervised learning is almost as good as supervised. If x1 … x25 are given the class labels that were used to generate them, then the supervised result is (μ1 = -2.176, μ2 = 1.684). Unsupervised got (μ1 = -2.13, μ2 = 1.668).

Machine Learning 43
Finding the max likelihood μ1, μ2, …, μk
We can compute P(data | μ1, μ2, …, μk).
How do we find the μi's which give the maximum likelihood?

• The normal max likelihood trick: set
$$\frac{\partial}{\partial \mu_i} \log P(\text{data} \mid \mu_1,\ldots,\mu_k) = 0$$
and solve for the μi's. Here you get non-linear, non-analytically-solvable equations.
• Use gradient descent: slow but doable.
• Use a much faster, cuter, and recently very popular method…

44
Expectation Maximization

Machine Learning 45
The E.M. Algorithm
• We’ll get back to unsupervised learning soon.
• But now we’ll look at an even simpler case with hidden information.
• The EM algorithm
❑ Can do trivial things, such as the contents of the next few slides.
❑ An excellent way of doing our unsupervised learning problem, as we’ll see.
❑ Many, many other uses, including inference of Hidden Markov Models (future
lecture).

46
Silly Example
Let events be “grades in a class”
w1 = Gets an A P(A) = ½
w2 = Gets a B P(B) = μ
w3 = Gets a C P(C) = 2μ
w4 = Gets a D P(D) = ½-3μ
(Note 0 ≤ μ ≤1/6)
Assume we want to estimate μ from data. In a given class there were
a A’s
b B’s
c C’s
d D’s
What’s the maximum likelihood estimate of μ given a,b,c,d ?

47
Trivial Statistics
P(A) = ½,  P(B) = μ,  P(C) = 2μ,  P(D) = ½ - 3μ

$$P(a,b,c,d \mid \mu) = K\,(\tfrac{1}{2})^{a}\,\mu^{b}\,(2\mu)^{c}\,(\tfrac{1}{2}-3\mu)^{d}$$
$$\log P(a,b,c,d \mid \mu) = \log K + a\log\tfrac{1}{2} + b\log\mu + c\log 2\mu + d\log(\tfrac{1}{2}-3\mu)$$

For the max-likelihood μ, set ∂ log P / ∂μ = 0:
$$\frac{\partial \log P}{\partial \mu} = \frac{b}{\mu} + \frac{2c}{2\mu} - \frac{3d}{\tfrac{1}{2}-3\mu} = 0$$

This gives the max-likelihood estimate
$$\mu = \frac{b+c}{6\,(b+c+d)}$$

So if the class got A = 14, B = 6, C = 9, D = 10, then the max-likelihood μ = 1/10.

Machine Learning 49
Same Problem with Hidden Information
Someone tells us that:
• Number of High grades (A's + B's) = h
• Number of C's = c
• Number of D's = d

(Remember: P(A) = ½, P(B) = μ, P(C) = 2μ, P(D) = ½ - 3μ.)

What is the max-likelihood estimate of μ now?

Machine Learning 50
Same Problem with Hidden Information
Someone tells us that:
• Number of High grades (A's + B's) = h
• Number of C's = c
• Number of D's = d

(Remember: P(A) = ½, P(B) = μ, P(C) = 2μ, P(D) = ½ - 3μ.)

What is the max-likelihood estimate of μ now?

We can answer this question circularly:

EXPECTATION: If we knew the value of μ we could compute the expected values of a and b. Since the ratio a : b should be the same as the ratio ½ : μ,
$$a = \frac{\tfrac{1}{2}}{\tfrac{1}{2}+\mu}\,h, \qquad b = \frac{\mu}{\tfrac{1}{2}+\mu}\,h$$

MAXIMIZATION: If we knew the expected values of a and b we could compute the maximum-likelihood value of μ:
$$\mu = \frac{b+c}{6\,(b+c+d)}$$
Machine Learning 51
E.M. for our Trivial Problem
(Remember: P(A) = ½, P(B) = μ, P(C) = 2μ, P(D) = ½ - 3μ.)

We begin with a guess for μ.
We iterate between EXPECTATION and MAXIMIZATION to improve our estimates of μ and of a and b.

Define μ(t) = the estimate of μ on the t'th iteration, and b(t) = the estimate of b on the t'th iteration.

μ(0) = initial guess

E-step:
$$b(t) = \frac{\mu(t)\,h}{\tfrac{1}{2}+\mu(t)} = \mathbb{E}\left[\,b \mid \mu(t)\,\right]$$

M-step:
$$\mu(t+1) = \frac{b(t)+c}{6\,(b(t)+c+d)} = \text{max-likelihood estimate of } \mu \text{ given } b(t)$$

Continue iterating until converged.
Good news: converging to a local optimum is assured.
Bad news: I said “local” optimum.
Machine Learning 52
E.M. Convergence
• The convergence proof is based on the fact that Prob(data | μ) must increase or remain the same between iterations [NOT OBVIOUS]
• But it can never exceed 1 [OBVIOUS]
So it must therefore converge [OBVIOUS]

In our example, suppose we had h = 20, c = 10, d = 10, μ(0) = 0:

t    μ(t)      b(t)
0    0         0
1    0.0833    2.857
2    0.0937    3.158
3    0.0947    3.185
4    0.0948    3.187
5    0.0948    3.187
6    0.0948    3.187

Convergence is generally linear: the error decreases by a constant factor each time step.
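A few lines of Python that reproduce this table (an illustrative check, not part of the slides; it uses the slide's values h = 20, c = 10, d = 10 and μ(0) = 0):

```python
h, c, d = 20, 10, 10
mu, b = 0.0, 0.0
for t in range(1, 7):
    mu = (b + c) / (6 * (b + c + d))   # M-step: max-likelihood mu given the current b
    b = mu * h / (0.5 + mu)            # E-step: expected number of B's given mu
    print(t, round(mu, 4), round(b, 3))
# The printed values match the table above (up to rounding in the last digit),
# converging to mu ~ 0.0948 and b ~ 3.187.
```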
Machine Learning 53
Back to Unsupervised Learning of GMMs
Remember:
• We have unlabeled data x1, x2, …, xR
• We know there are k classes
• We know P(w1), P(w2), P(w3), …, P(wk)
• We don't know μ1, μ2, …, μk

We can write P(data | μ1, …, μk):

$$P(\text{data} \mid \mu_1,\ldots,\mu_k) = p(x_1 \ldots x_R \mid \mu_1,\ldots,\mu_k) = \prod_{i=1}^{R} p(x_i \mid \mu_1,\ldots,\mu_k)$$
$$= \prod_{i=1}^{R}\sum_{j=1}^{k} p(x_i \mid w_j, \mu_1,\ldots,\mu_k)\,P(w_j) = \prod_{i=1}^{R}\sum_{j=1}^{k} K \exp\!\left(-\frac{1}{2\sigma^2}\,(x_i-\mu_j)^2\right) P(w_j)$$
Machine Learning 54
E.M. for GMMs

For max likelihood, we know
$$\frac{\partial}{\partial \mu_j} \log P(\text{data} \mid \mu_1,\ldots,\mu_k) = 0$$

Some wild'n'crazy algebra turns this into: “For max likelihood, for each j,
$$\mu_j = \frac{\sum_{i=1}^{R} P(w_j \mid x_i, \mu_1,\ldots,\mu_k)\; x_i}{\sum_{i=1}^{R} P(w_j \mid x_i, \mu_1,\ldots,\mu_k)}$$
This is a set of nonlinear equations in the μj's.”

If, for each xi, we knew the probability that xi belongs to class wj, i.e. P(wj | xi, μ1, …, μk), then we could easily compute the μj's.

If we knew each μj, then we could easily compute P(wj | xi, μ1, …, μk) for each wj and xi.

…I feel an EM experience coming on!!


55
E.M. for GMMs
Iterate. On the t'th iteration let our estimates be
λt = { μ1(t), μ2(t), …, μc(t) }

E-step: Compute the “expected” classes of all datapoints for each class (just evaluate a Gaussian at xk):
$$P(w_i \mid x_k, \lambda_t) = \frac{p(x_k \mid w_i, \lambda_t)\,P(w_i \mid \lambda_t)}{p(x_k \mid \lambda_t)} = \frac{p\!\left(x_k \mid \mu_i(t), \sigma^2 I\right)\, p_i(t)}{\sum_{j=1}^{c} p\!\left(x_k \mid \mu_j(t), \sigma^2 I\right)\, p_j(t)}$$

M-step: Compute the max-likelihood μ given our data's class membership distributions:
$$\mu_i(t+1) = \frac{\sum_k P(w_i \mid x_k, \lambda_t)\; x_k}{\sum_k P(w_i \mid x_k, \lambda_t)}$$
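A compact NumPy sketch of exactly these two steps for this simplified model (shared spherical covariance σ²I and fixed priors are held constant; X, the initial means, the priors and sigma2 are assumed to be given):

```python
import numpy as np

def em_gmm_means(X, mus, priors, sigma2, n_iters=50):
    """EM for the simplified GMM above: priors P(w_i) and the shared spherical
    covariance sigma2 * I are fixed; only the means are re-estimated."""
    for _ in range(n_iters):
        # E-step: responsibilities P(w_i | x_k); the shared Gaussian normalizing
        # constant cancels in the normalization, so unnormalized exponentials suffice.
        d2 = ((X[:, None, :] - mus[None, :, :]) ** 2).sum(axis=2)   # (N, k) squared distances
        resp = priors * np.exp(-d2 / (2.0 * sigma2))
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means
        mus = (resp.T @ X) / resp.sum(axis=0)[:, None]
    return mus
```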

Machine Learning 56
E.M. Convergence
• This algorithm is REALLY USED, and in high-dimensional state spaces too, e.g. Vector Quantization for Speech Data.

• Your lecturer will (unless out of time) give you a nice intuitive explanation of why this rule works.
• As with all EM procedures, convergence to a local optimum is guaranteed.
57
E.M. for General GMMs
(pi(t) is shorthand for the estimate of P(wi) on the t'th iteration.)

Iterate. On the t'th iteration let our estimates be
λt = { μ1(t), …, μc(t),  Σ1(t), …, Σc(t),  p1(t), …, pc(t) }

E-step: Compute the “expected” classes of all datapoints for each class (just evaluate a Gaussian at xk):
$$P(w_i \mid x_k, \lambda_t) = \frac{p(x_k \mid w_i, \lambda_t)\,P(w_i \mid \lambda_t)}{p(x_k \mid \lambda_t)} = \frac{p\!\left(x_k \mid \mu_i(t), \Sigma_i(t)\right)\, p_i(t)}{\sum_{j=1}^{c} p\!\left(x_k \mid \mu_j(t), \Sigma_j(t)\right)\, p_j(t)}$$

M-step: Compute the max-likelihood parameters given our data's class membership distributions:
$$\mu_i(t+1) = \frac{\sum_k P(w_i \mid x_k, \lambda_t)\; x_k}{\sum_k P(w_i \mid x_k, \lambda_t)} \qquad
\Sigma_i(t+1) = \frac{\sum_k P(w_i \mid x_k, \lambda_t)\left[x_k-\mu_i(t+1)\right]\left[x_k-\mu_i(t+1)\right]^{T}}{\sum_k P(w_i \mid x_k, \lambda_t)}$$
$$p_i(t+1) = \frac{\sum_k P(w_i \mid x_k, \lambda_t)}{R}, \qquad R = \#\text{records}$$
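In practice the same E-step/M-step loop is available off the shelf; a short sketch assuming scikit-learn's GaussianMixture and an already-loaded data matrix X (the choice n_components=3 is illustrative):

```python
from sklearn.mixture import GaussianMixture

# X is assumed to be an (n_samples, n_features) array that is already loaded.
gm = GaussianMixture(n_components=3, covariance_type='full', random_state=42)
gm.fit(X)                      # runs the E-step / M-step loop until convergence
print(gm.means_)               # the mu_i at convergence
print(gm.covariances_)         # the Sigma_i
print(gm.weights_)             # the mixing proportions p_i = P(w_i)
labels = gm.predict(X)         # hard cluster assignments
resp = gm.predict_proba(X)     # soft responsibilities P(w_i | x_k)
```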
Machine Learning 58
Gaussian Mixture Example: Start

Advance apologies: in black and white this example will be incomprehensible.

Machine Learning 59
After first iteration

Machine Learning 60
After 2nd iteration

Machine Learning 61
After 3rd iteration

Machine Learning 62
After 4th iteration

Machine Learning 63
After 5th iteration

Machine Learning 64
After 6th iteration

Machine Learning 65
After 20th iteration

Machine Learning 66
Some Bio Assay data

Machine Learning 67
GMM clustering of the assay data

Machine Learning 68
Resulting Density Estimator

Machine Learning 69
SUMMARY

• Clustering Problems

• K-Means

• DBSCAN

• Gaussian Mixtures

Machine Learning 70
Nhân bản – Phụng sự – Khai phóng

Enjoy the Course…!

Machine Learning 71
