AI Chapter 3 Part 5
Institute of Technology
University of Gondar
Biomedical Engineering Department
Outline:
» Clustering
o K-Means clustering
o Hierarchical clustering
Clustering
» Desirable (good) clustering:
o Higher intra-cluster similarity (cohesion)
o Higher inter-cluster separation
» The distance measure determines the similarity between two elements, and it influences the shape of the clusters.
$d = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$ (Euclidean distance)
K-Means Clustering: Distance Measure
Manhattan distance: $d = |q_x - p_x| + |q_y - p_y|$
Euclidean distance: $d = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$
How Does the K-Means Algorithm Work?
K-Means Clustering: Steps
1. Choose K, the number of clusters.
2. Select K points at random as the initial centroids.
3. Assign each data point to its closest centroid.
4. Next, the centroids are calculated again once we have our new clusters (as the mean of their elements).
5. Repeat steps 3 and 4 until the centroids no longer change.
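A minimal NumPy sketch of these steps (our own variable names; assumes no cluster ever becomes empty):

```python
import numpy as np

def k_means(X, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Steps 1-2: pick k of the data points at random as initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Step 3: assign each point to its closest centroid (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: recompute each centroid as the mean of its members
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centroids no longer change
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```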
Elbow Method:
» The elbow method runs K-Means clustering on the dataset for a range of values of 'K', the number of clusters.
» The sum of squared errors is defined as the sum of the squared distances between each member of a cluster and its centroid: $WSS = \sum_{i=1}^{m} (x_i - c_i)^2$
In the WSS-versus-k plot, if we see a very slow change in the values of WSS after k = 2, we take that elbow-point value as the final number of clusters.
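A sketch of the elbow method using scikit-learn (assumed available; the data here is a placeholder). KMeans exposes the WSS defined above as `inertia_`:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(200, 2))  # placeholder data

wss = []
for k in range(1, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    wss.append(km.inertia_)  # sum of squared distances to the assigned centroid

# Plot k against wss and pick the k where the curve bends (the "elbow").
for k, w in enumerate(wss, start=1):
    print(k, round(w, 1))
```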
Example Problem
» Cluster the following eight points (with (x, y) representing locations) into three clusters:
A1 (2, 10), A2 (2, 5), A3 (8, 4), A4 (5, 8), A5 (7, 5), A6 (6, 4), A7 (1, 2), A8 (4, 9)
» The distance function between two points Aj = (x1, y1) and Ci = (x2, y2) is defined as: dis(Aj, Ci) = |x2 - x1| + |y2 - y1| (Manhattan distance)
» Use the k-means algorithm to find the optimal centroids that group the given data into three clusters.
Iteration 1
First, we list all points in the first column of the table below. The initial cluster centers (centroids) are (2, 10), (8, 4), and (1, 2), chosen randomly.
Next, we calculate the distance from each point to each of the three centroids, using the distance function: dis(point i, mean j) = |x2 - x1| + |y2 - y1|
» Starting from point A1, calculate the distance to each of the three means using the distance function:
dis(A1, mean1) = |2 - 2| + |10 - 10| = 0 + 0 = 0
dis(A1, mean2) = |8 - 2| + |4 - 10| = 6 + 6 = 12
dis(A1, mean3) = |1 - 2| + |2 - 10| = 1 + 8 = 9
o Fill these values in the table and decide which cluster the point (2, 10) should be placed in: the one where the point has the shortest distance to the mean, i.e. mean 1 (cluster 1), since the distance is 0.
» Analogously, we fill in the rest of the table and place each point in one of the clusters.
» Next, we need to re-compute the new cluster centers. We do so by taking the mean of all points in each cluster.
» For Cluster 1, we have three points and take their average as the new centroid, i.e. ((2+5+4)/3, (10+8+9)/3) = (3.67, 9)
» For Cluster 2, we have three points. The new centroid is: ((8+7+6)/3, (4+5+4)/3) = (7, 4.33)
» For Cluster 3, we have two points. The new centroid is: ((2+1)/2, (5+2)/2) = (1.5, 3.5)
» Since the centroids changed in Iteration 1 (epoch 1), we go to the next iteration (epoch 2) using the new means we computed.
o The iteration continues until the centroids do not change anymore.
Second epoch
» Using the new centroids, compute the cluster members again.

Data point  | d to (3.67, 9) | d to (7, 4.33) | d to (1.5, 3.5) | Cluster
A1 (2, 10)  | 2.67           | 10.67          | 7               | C1
A2 (2, 5)   | 5.67           | 5.67           | 2               | C3
A3 (8, 4)   | 9.33           | 1.33           | 7               | C2
A4 (5, 8)   | 2.33           | 5.67           | 8               | C1
A5 (7, 5)   | 7.33           | 0.67           | 7               | C2
A6 (6, 4)   | 7.33           | 1.33           | 5               | C2
A7 (1, 2)   | 9.67           | 8.33           | 2               | C3
A8 (4, 9)   | 0.33           | 7.67           | 8               | C1
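The whole worked example can be reproduced with a few lines of Python (points and initial centroids taken from the problem statement; Manhattan distance as defined above):

```python
points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
centroids = [(2, 10), (8, 4), (1, 2)]  # initial centroids from the example

def manhattan(p, q):
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

for epoch in range(1, 100):
    # Assignment step: nearest centroid under the Manhattan distance
    clusters = {name: min(range(3), key=lambda j: manhattan(p, centroids[j]))
                for name, p in points.items()}
    # Update step: each centroid becomes the mean of its member points
    new_centroids = []
    for j in range(3):
        members = [points[n] for n, c in clusters.items() if c == j]
        new_centroids.append((sum(x for x, _ in members) / len(members),
                              sum(y for _, y in members) / len(members)))
    if new_centroids == centroids:  # converged: centroids unchanged
        break
    centroids = new_centroids

print(clusters)   # A1, A4, A8 -> 0 (C1); A3, A5, A6 -> 1 (C2); A2, A7 -> 2 (C3)
print(centroids)  # [(3.67, 9.0), (7.0, 4.33), (1.5, 3.5)] (rounded)
```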
Pros and Cons: K-Means Clustering
Pros:
o Simple and understandable
Cons:
o Must define number of clusters
o Hard clustering only: each point belongs to exactly one cluster
Fuzzy C-Means Clustering
1. Initialise the membership matrix $\mu_{ik}$ (the degree to which point $x_k$ belongs to cluster $i$) at random.
2. Cluster centres: $c_i = \frac{\sum_{k=1}^{n} (\mu_{ik})^m x_k}{\sum_{k=1}^{n} (\mu_{ik})^m}$
3. Update memberships: $\mu_{ik} = \frac{1}{\sum_{j=1}^{M} \left( d_{ik} / d_{jk} \right)^{2/(m-1)}}$
4. Repeat steps 2 and 3 until the memberships stop changing.
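A NumPy sketch of these update equations (our own code; m = 2 as the usual fuzzifier; rows of u are points, columns are clusters):

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: random membership matrix, each row normalised to sum to 1
    u = rng.random((len(X), c))
    u /= u.sum(axis=1, keepdims=True)
    for _ in range(n_iters):
        # Step 2: cluster centres as membership-weighted means
        w = u ** m
        centres = (w.T @ X) / w.sum(axis=0)[:, None]
        # Step 3: memberships from the distances d_ik to each centre
        d = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        u_new = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1))).sum(axis=2)
        # Step 4: stop when the memberships no longer change
        if np.allclose(u_new, u):
            break
        u = u_new
    return centres, u
```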
Pros:
» Allows a data point to be in multiple clusters
Cons:
» Need to define C, the number of clusters
Divisive: Hierarchical Clustering
» Top-down approach: begin with the whole set and proceed to divide it into successively smaller clusters.
o Start with all sample units in a single cluster of size n.
o Then, at each step of the algorithm, a cluster is partitioned into a pair of daughter clusters, selected to maximize the distance between the two daughters.
o The algorithm stops when the sample units are partitioned into n clusters of size 1.
Dendrogram: Shows How the Clusters Are Merged/Split
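A short sketch of drawing a dendrogram with SciPy (SciPy's `linkage` builds the tree bottom-up, i.e. the agglomerative direction, but the resulting dendrogram can equally be read top-down; the data here is a placeholder):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram

X = np.random.default_rng(0).normal(size=(20, 2))  # placeholder data

# Each row of Z records one merge: the two clusters joined and their distance
Z = linkage(X, method="ward")
dendrogram(Z)
plt.xlabel("sample index")
plt.ylabel("merge distance")
plt.show()
```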
Pros and Cons: Hierarchical Clustering
Pros:
o No need to specify the number of clusters in advance
o The dendrogram gives an informative picture of the cluster structure
Cons:
o Computationally expensive for large datasets
o Merge/split decisions cannot be undone once made
Cluster Evaluation
How do we know the clusters are valid or, at least, good enough?
Indirect Evaluation:
» In some applications, clustering is not the primary task but is used to help perform another task.
» We can use the performance on the primary task to compare clustering methods, as sketched below.
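A hedged sketch of what indirect evaluation could look like: two clusterers are compared by how much their cluster labels, used as extra features, help a downstream classifier (the data and the 5-cluster choice here are illustrative placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))             # placeholder features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # placeholder primary-task labels

for clusterer in (KMeans(n_clusters=5, n_init=10, random_state=0),
                  AgglomerativeClustering(n_clusters=5)):
    labels = clusterer.fit_predict(X)
    onehot = np.eye(5)[labels]            # cluster membership as one-hot features
    score = cross_val_score(LogisticRegression(), np.hstack([X, onehot]), y).mean()
    print(type(clusterer).__name__, round(score, 3))
```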
Assignment 2
1. Write Python algorithms for SVM, clustering, and value-based machine learning methods.
» Reinforcement learning
» Expert system