
Artificial Intelligence

Institute of Technology
University of Gondar
Biomedical Engineering Department

By Ewunate Assaye (MSc.)


Unsupervised Learning

Outline:
» Clustering
o K-means clustering

o Fuzzy C-means clustering

o Hierarchical clustering

Clustering

» Clustering is the process of dividing a dataset into groups consisting of similar data points.

» It groups objects based on the information found in the data describing the objects or their relationships.

» The goal of clustering is to determine the intrinsic grouping in a set of unlabeled (i.e., unsupervised) data.

» Desirable (good) clustering shows:
o High inter-cluster separation

o High intra-cluster homogeneity


Types of Clustering
K-Means Clustering - Partition Clustering

» Objects are classified into a predefined number of groups such that they are:
o as dissimilar as possible from one group to another, and

o as similar as possible within each group.


K-Means Clustering - Distance Measures

» The distance measure determines the similarity between two elements and influences the shape of the clusters.

» Types of distance measures:

o Euclidean distance
✓ The ordinary straight-line distance between two points in Euclidean space:

$d = \sqrt{\sum_{i=1}^{n} (q_i - p_i)^2}$
K-Means Clustering - Distance Measures

» Manhattan distance

✓ The simple sum of the horizontal and vertical components, i.e., the distance between two points measured along axes at right angles:

$d = \sum_{i=1}^{n} |q_i - p_i|$

» Squared Euclidean distance

✓ Uses the same equation as the Euclidean distance metric, but does not take the square root:

$d = \sum_{i=1}^{n} (q_i - p_i)^2$
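The three measures differ only in how coordinate differences are aggregated. A minimal sketch in Python (NumPy only; the function names are illustrative):

```python
import numpy as np

def euclidean(p, q):
    """Straight-line distance: square root of the sum of squared differences."""
    return np.sqrt(np.sum((np.asarray(q) - np.asarray(p)) ** 2))

def manhattan(p, q):
    """Sum of absolute coordinate differences (axes at right angles)."""
    return np.sum(np.abs(np.asarray(q) - np.asarray(p)))

def squared_euclidean(p, q):
    """Euclidean distance without the square root."""
    return np.sum((np.asarray(q) - np.asarray(p)) ** 2)

# For example, for the points (2, 10) and (8, 4):
# euclidean -> 8.49, manhattan -> 12, squared_euclidean -> 72
```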
How Does the K-Means Algorithm Work?

K-Means Clustering: Steps

1. Decide the number of clusters to be made.

2. Provide initial centroids for all the clusters (by guessing).

3. The algorithm calculates the Euclidean distance of each point from each centroid and assigns the point to the closest cluster.

4. Next, the centroids are recalculated as the mean of the elements in each new cluster.

5. The distances of the points from the cluster centers are calculated again, and each point is assigned to the closest cluster.

6. Again, the new centroid for each cluster is calculated.

7. These steps are repeated until the centroids repeat or the new centroids are very close to the previous ones (see the sketch below).
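A minimal sketch of this loop in Python (NumPy only; `points` is assumed to be an (n, 2) array, and k, the tolerance, and the random initialization are illustrative choices; the sketch ignores the empty-cluster edge case):

```python
import numpy as np

def kmeans(points, k, max_iters=100, tol=1e-6, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: guess initial centroids by sampling k distinct points.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: Euclidean distance from every point to every centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)  # assign each point to the closest cluster
        # Steps 4-6: recompute each centroid as the mean of its members.
        new_centroids = np.array([points[labels == j].mean(axis=0)
                                  for j in range(k)])
        # Step 7: stop once the centroids barely move.
        if np.linalg.norm(new_centroids - centroids) < tol:
            break
        centroids = new_centroids
    return centroids, labels
```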
How to Decide the Number of Clusters?

Elbow Method:
» The elbow method runs K-Means clustering on the dataset for a range of values of K, where K is the number of clusters.

» For each K, the within-cluster sum of squares (WSS) is the sum of the squared distances between each member of a cluster and its centroid:

$WSS = \sum_{i=1}^{m} (x_i - c_i)^2$

o where $x_i$ is a data point and $c_i$ is the centroid closest to it.

» When a plot of WSS against K shows only a very slow change after some value (e.g., after K = 2), that elbow point is taken as the final number of clusters. The sketch below computes such a curve.
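An illustrative elbow-curve sketch using scikit-learn (assuming scikit-learn and matplotlib are available; the data reuses the eight points from the example problem that follows):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

points = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                   [7, 5], [6, 4], [1, 2], [4, 9]])

ks = range(1, 8)
wss = []
for k in ks:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    wss.append(km.inertia_)  # inertia_ is the WSS defined above

plt.plot(ks, wss, marker="o")
plt.xlabel("number of clusters K")
plt.ylabel("WSS")
plt.show()  # choose the K at the 'elbow' of this curve
```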
Example Problem

» Cluster the following eight points (with (x, y) representing locations) into three clusters:

A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9).

o Assume that the initial cluster centers are:

A1(2, 10), A3(8, 4) and A7(1, 2).

» The distance function between two points Aj = (x1, y1) and Ci = (x2, y2) is defined as:

dis(Aj, Ci) = |x2 - x1| + |y2 - y1|

» Use the k-means algorithm to find optimal centroids that group the given data into three clusters.
Iteration 1

First we list all points in the first column of the table below. The initial cluster centers (centroids) are (2, 10), (8, 4) and (1, 2).

Data Points | Cluster 1, centroid (2, 10) | Cluster 2, centroid (8, 4) | Cluster 3, centroid (1, 2) | Cluster
A1 (2, 10)  | 0  | 12 | 9  | 1
A2 (2, 5)   | 5  | 7  | 4  | 3
A3 (8, 4)   | 12 | 0  | 9  | 2
A4 (5, 8)   | 5  | 7  | 10 | 1
A5 (7, 5)   | 10 | 2  | 9  | 2
A6 (6, 4)   | 10 | 2  | 7  | 2
A7 (1, 2)   | 9  | 9  | 0  | 3
A8 (4, 9)   | 3  | 9  | 10 | 1

Next, we calculate the distance from each point to each of the three centroids using the distance function: dis(point i, mean j) = |x2 - x1| + |y2 - y1|.
Iteration 1
» Starting from point A1, calculate the distance to each of the three means using the distance function:
dis(A1, mean1) = |2 - 2| + |10 - 10| = 0 + 0 = 0
dis(A1, mean2) = |8 - 2| + |4 - 10| = 6 + 6 = 12
dis(A1, mean3) = |1 - 2| + |2 - 10| = 1 + 8 = 9
o Fill these values in the table and decide which cluster the point (2, 10) should be placed in: the one where the point has the shortest distance to the mean, i.e., mean 1 (cluster 1), since the distance is 0.

» Next, go to the second point A2 and calculate the distances:

dis(A2, mean1) = |2 - 2| + |10 - 5| = 0 + 5 = 5
dis(A2, mean2) = |8 - 2| + |4 - 5| = 6 + 1 = 7
dis(A2, mean3) = |1 - 2| + |2 - 5| = 1 + 3 = 4
o So we fill these values in the table and assign the point (2, 5) to cluster 3, since mean 3 is at the shortest distance from A2.

» Analogously, we fill in the rest of the table and place each point in one of the clusters.
Iteration 1
» Next, we re-compute the cluster centers by taking the mean of all points in each cluster.
» Cluster 1 has three points, so the new centroid is their average:
((2 + 5 + 4)/3, (10 + 8 + 9)/3) = (3.67, 9)
» Cluster 2 has three points. The new centroid is:
((8 + 7 + 6)/3, (4 + 5 + 4)/3) = (7, 4.33)
» Cluster 3 has two points. The new centroid is:
((2 + 1)/2, (5 + 2)/2) = (1.5, 3.5)
» Since the centroids changed in Iteration 1 (epoch 1), we go to the next iteration (epoch 2) using the new means we computed.
o The iteration continues until the centroids no longer change; the sketch below reproduces this run.
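A short sketch that reproduces this worked example: k-means with the Manhattan distance on the eight points, starting from A1, A3 and A7 (NumPy only; the loop bound is an arbitrary safety cap):

```python
import numpy as np

points = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                   [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)
centroids = points[[0, 2, 6]].copy()  # initial centers A1, A3, A7

for epoch in range(1, 10):
    # dis(Aj, Ci) = |x2 - x1| + |y2 - y1|
    dists = np.abs(points[:, None, :] - centroids[None, :, :]).sum(axis=2)
    labels = dists.argmin(axis=1)
    new_centroids = np.array([points[labels == j].mean(axis=0)
                              for j in range(3)])
    print(f"epoch {epoch}: clusters = {labels + 1}")
    print(f"new centroids:\n{new_centroids.round(2)}")
    if np.allclose(new_centroids, centroids):
        break  # centroids unchanged: converged
    centroids = new_centroids
```

Epoch 1 prints the centroids (3.67, 9), (7, 4.33) and (1.5, 3.5) computed above, and the run stops in epoch 2 when nothing changes.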
Second Epoch
» Using the new centroids, compute the cluster members again.

Data Points | Cluster 1, centroid (3.67, 9) | Cluster 2, centroid (7, 4.33) | Cluster 3, centroid (1.5, 3.5) | Cluster
A1 (2, 10)  | 2.67 | 10.67 | 7 | C1
A2 (2, 5)   | 5.67 | 5.67  | 2 | C3
A3 (8, 4)   | 9.33 | 1.33  | 7 | C2
A4 (5, 8)   | 2.33 | 5.67  | 8 | C1
A5 (7, 5)   | 7.33 | 0.67  | 7 | C2
A6 (6, 4)   | 7.33 | 1.33  | 5 | C2
A7 (1, 2)   | 9.67 | 8.33  | 2 | C3
A8 (4, 9)   | 0.33 | 7.67  | 8 | C1

» After the 2nd epoch the results are:

cluster 1: {A1, A4, A8} with new centroid = (3.67, 9);
cluster 2: {A3, A5, A6} with new centroid = (7, 4.33);
cluster 3: {A2, A7} with new centroid = (1.5, 3.5)
Final Results
» In the 2nd epoch there is no change in cluster membership or centroids, so the algorithm stops.
» The result of the clustering is shown in the figure on the original slide.

Pros and Cons: K-Means Clustering

Pros:
o Simple and understandable

o Items are automatically assigned to clusters

Cons:
o The number of clusters must be defined in advance

o Hard clustering: each point belongs to exactly one cluster

o Unable to handle noisy data and outliers

Fuzzy C-Means Clustering

» Fuzzy C-means is an extension of K-means, the popular simple clustering technique.

» Fuzzy clustering (also referred to as soft clustering) is a form of clustering in which each point can belong to more than one cluster.

Fuzzy C-Means Clustering

How does FCM work?

1. Initialize the number of clusters M, the fuzzifier m, and the membership matrix $\mu$.

2. Compute the cluster centres: $c_i = \dfrac{\sum_{k=1}^{n} (\mu_{ik})^m x_k}{\sum_{k=1}^{n} (\mu_{ik})^m}$

3. Update the memberships: $\mu_{ik} = \dfrac{1}{\sum_{j=1}^{M} \left( \dfrac{d_{ik}}{d_{jk}} \right)^{\frac{2}{m-1}}}$

4. Stopping criterion: repeat steps 2-3 until $\| U_{current} - U_{before} \|$ is small, where $U$ is the fuzzy partition matrix.

» For example, with M = 4 clusters, a point's membership vector might be x = [0.1, 0.2, 0.6, 0.1] for FCM, versus x = [0, 0, 1, 0] for K-means. A minimal sketch follows.
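A minimal FCM sketch following the update rules above (NumPy only; M, m, the tolerance and the random initialization are illustrative choices):

```python
import numpy as np

def fcm(points, M=3, m=2.0, max_iters=100, tol=1e-5, seed=0):
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(points)
    # Random initial memberships; each column (one per point) sums to 1.
    U = rng.random((M, n))
    U /= U.sum(axis=0)
    for _ in range(max_iters):
        Um = U ** m                                   # mu_ik^m
        # Step 2: cluster centres as membership-weighted means.
        centres = (Um @ points) / Um.sum(axis=1, keepdims=True)
        # Step 3: membership update from the distances d_ik.
        d = np.linalg.norm(points[None, :, :] - centres[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)                         # guard against d = 0
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=0)                 # equals the formula above
        # Step 4: stop when the fuzzy partition barely changes.
        if np.linalg.norm(U_new - U) < tol:
            return centres, U_new
        U = U_new
    return centres, U
```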


Pros and cons: Fuzzy C-Means Clustering

Pros:
» Allows a data point to be in multiple clusters

» A more natural representation of the behavior of genes

» Genes are usually involved in multiple functions

Cons:
» Need to define C, the number of clusters

» Need to determine membership cut-off values

» Clusters are sensitive to initial assignment of centroids

» Fuzzy c-means is not a deterministic algorithm


Hierarchical Clustering

» The clusters have a tree-like structure, i.e., a parent-child relationship.

» Hierarchical clustering is an alternative approach that builds a hierarchy (from the bottom up or the top down) and does not require us to specify the number of clusters beforehand.

Agglomerative: Hierarchical Clustering

» Bottom-up approach: begin with each element as a separate cluster and merge them into successively larger clusters.
o Start with all sample units in n clusters of size 1.
o Then, at each step of the algorithm, the pair of clusters with the shortest distance is combined into a single cluster.
o The algorithm stops when all sample units are grouped into one cluster of size n.
Divisive: Hierarchical Clustering

» Top-down approach: begin with the whole set and proceed to divide it into successively smaller clusters.
o Start with all sample units in a single cluster of size n.
o Then, at each step of the algorithm, a cluster is partitioned into a pair of daughter clusters, selected to maximize the distance between them.
o The algorithm stops when the sample units are partitioned into n clusters of size 1.

Dendrogram: Shows How the Clusters are Merged/Split

Decompose the data objects into several levels of nested partitioning (a tree of clusters), called a dendrogram.

A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster (see the sketch below).
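An illustrative agglomerative-clustering sketch using SciPy's hierarchy module (assuming scipy and matplotlib are available; the data reuses the eight points from the K-means example):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

points = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                   [7, 5], [6, 4], [1, 2], [4, 9]])

# Bottom-up merging; 'single' linkage combines the pair of clusters
# with the shortest distance at each step.
Z = linkage(points, method="single")

dendrogram(Z, labels=[f"A{i}" for i in range(1, 9)])
plt.ylabel("merge distance")
plt.show()

# Cutting the dendrogram to obtain three clusters:
labels = fcluster(Z, t=3, criterion="maxclust")
print(labels)
```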

Pros and Cons: Hierarchical Clustering

Pros:

» No assumption of a particular number of clusters

» May correspond to meaningful taxonomies

Cons:

» Once a decision is made to combine two clusters, it can’t be undone

» Too slow for large datasets

Cluster Evaluation
How do we know the clusters are valid or, at least, good enough?

Indirect Evaluation:

» In some applications, clustering is not the primary task, but is used to help perform another task.

» We can use the performance on the primary task to compare clustering methods.

» For instance, consider designing a recommender system whose primary task is to provide book-purchasing recommendations to online shoppers.
o If we can cluster books according to their features, we might be able to provide better recommendations.
o We can evaluate different clustering algorithms based on how well they help with the recommendation task.
o Here, we assume that the recommendations can be reliably evaluated.
Cluster Evaluation

o Sum of squares within clusters (SSW): $SSW = \sum_{i=1}^{N} (x_i - c_p)^2$
✓ where N is the number of data points, $x_i$ is a data instance, and $c_p$ is the center of the cluster to which $x_i$ belongs.

o Sum of squares between clusters (SSB): $SSB = \sum_{i=1}^{M} n_i (\bar{x}_i - \bar{x})^2$
✓ where M is the number of clusters, $n_i$ is the number of elements in cluster i, $\bar{x}_i$ is the mean of cluster i, and $\bar{x}$ is the mean of the cluster means.
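A small sketch computing both quantities for the clustering found in the worked example (NumPy only; the grand mean is taken as the mean of the cluster means, per the definition above):

```python
import numpy as np

points = np.array([[2, 10], [2, 5], [8, 4], [5, 8],
                   [7, 5], [6, 4], [1, 2], [4, 9]], dtype=float)
labels = np.array([0, 2, 1, 0, 1, 1, 2, 0])  # clusters from the worked example

centers = np.array([points[labels == j].mean(axis=0) for j in range(3)])
sizes = np.array([(labels == j).sum() for j in range(3)])

# SSW: squared distance of every point to its own cluster center.
ssw = ((points - centers[labels]) ** 2).sum()

# SSB: size-weighted squared distance of each center to the grand mean.
grand_mean = centers.mean(axis=0)  # mean of the cluster means
ssb = (sizes * ((centers - grand_mean) ** 2).sum(axis=1)).sum()

print(f"SSW = {ssw:.2f}, SSB = {ssb:.2f}")  # low SSW, high SSB -> good clustering
```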


Cluster Evaluation

» Find all credit applicants who have no credit risk.

» Identify customers with similar buying habits.

Assignment 2

1. Write Python implementations of SVM, clustering, and value-based machine learning methods.

Submit via [email protected] before July 12, 2022.


Reading Assignment

» Reinforcement learning

» Expert system

» Natural language processing
