
Topic 10:

Machine Learning:
Unsupervised Learning
Term 2-ARTI 106
Computer Track
2024-2025
Learning outcomes

The main learning objectives of this topic are:

❑ Define unsupervised learning.
❑ Explain clustering and give some real-life application examples.
❑ Explain how the K-means clustering algorithm works.
❑ Discuss the strengths and weaknesses of K-means.
Outline

❑ Definition of unsupervised learning.
❑ What clustering is and its goals.
❑ Real-life examples using clustering.
❑ K-means clustering algorithm steps.
❑ Strengths and weaknesses of K-means.
Supervised learning vs.
unsupervised learning

❑ Supervised learning: discover patterns in the data that relate data
attributes to a target (class) attribute.
❑ These patterns are then used to predict the values of the target attribute
in future data instances.
❑ Unsupervised learning: the data have no target attribute.
❑ There is no supervision (no labels/responses), only inputs 𝒙𝟏, 𝒙𝟐, …, 𝒙𝑵.
❑ The purpose of unsupervised learning is to find natural partitions in the
training set.
❑ Two general strategies for unsupervised learning are clustering and
hierarchical clustering.
What is Unsupervised
Learning Useful for?

❑ Collecting and labeling a large set of sample patterns can be very
expensive, and training for supervised learning needs a lot of
computation time.
❑ Unsupervised learning is good at finding patterns and relationships in
data without being told what to look for (unlabeled data). This can help
you learn new things about your data.
Clustering

❑ Clustering is often called an unsupervised learning task because no
class values denoting an a priori grouping of the data instances are
given, as they are in supervised learning.
❑ Clustering is the classification of objects into different groups, or more
precisely, the partitioning of a data set into subsets (clusters), so that the
data in each subset (ideally) share some common trait, often according
to some defined distance measure.
Goal of clustering

❑ The goal of clustering is to group data points that are close (or similar) to
each other.
❑ Input: 𝑁 unlabeled examples 𝒙1, 𝒙2, …, 𝒙𝑁.
❑ Output: the examples grouped into 𝐾 “homogeneous” partitions.
What is clustering for?
Let us see some real-life examples:
❑ Anomaly detection: Unsupervised learning can identify
unusual patterns or deviations from normal behavior in data,
enabling the detection of fraud, intrusion, or system failures.
❑ Image analysis: Unsupervised learning can group images
based on their content, facilitating tasks such as image
classification, object detection, and image retrieval.
❑ Marketing: segment customers according to their similarities for
targeted marketing.
Unsupervised Learning
Methods

Common unsupervised learning methods include:

❑ K-means
❑ The EM algorithm
❑ Gaussian mixture models
❑ Principal Component Analysis (PCA)
How does the K-means clustering
algorithm work?

Watch this tutorial about how K-means works.
K-means algorithm
❑ K-means is an algorithm that clusters n objects, based on their attributes,
into k partitions, where k < n.
❑ Let the set of data points (or instances) D be:
❑ {x1, x2, …, xn},
❖ where xi = (xi1, xi2, …, xir) is a vector in a real-valued space X ⊆ Rr,
and r is the number of attributes (dimensions) in the data.
❑ The k-means algorithm partitions the given data into k clusters.
❖ Each cluster has a cluster center, called the centroid.
❖ k is specified by the user.
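As a concrete illustration of this setup, here is a minimal sketch in Python/NumPy (the data values, n, r, and k are made up for the example):

import numpy as np

# A made-up data set D with n = 6 instances and r = 2 attributes:
# each row is one vector xi = (xi1, xi2) in R^2.
D = np.array([
    [1.0, 2.0],
    [1.5, 1.8],
    [5.0, 8.0],
    [8.0, 8.0],
    [1.0, 0.6],
    [9.0, 11.0],
])

n, r = D.shape   # n objects, r attributes (dimensions)
k = 2            # number of clusters, specified by the user (k < n)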
K-means algorithm steps:

❑ Given k, the k-means algorithm works as follows (a minimal code sketch
follows below):

1) Define k.
2) Randomly choose k data points (seeds) to be the initial
centroids (cluster centers).
3) Assign each data point to the closest centroid.
4) Re-compute the centroids using the current cluster
memberships.
5) If a convergence criterion is not met, go to step 3);
otherwise stop.
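Below is a minimal from-scratch sketch of these steps in Python (NumPy only; the function name, the convergence test via np.allclose, and the max_iters cap are assumptions made for illustration, not part of the original slides):

import numpy as np

def kmeans(D, k, max_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 2: randomly choose k data points (seeds) as the initial centroids.
    centroids = D[rng.choice(len(D), size=k, replace=False)]
    for _ in range(max_iters):
        # Step 3: assign each data point to the closest centroid
        # (Euclidean distance from every point to every centroid).
        distances = np.linalg.norm(D[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Step 4: re-compute each centroid as the mean of its current members;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            D[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Step 5: stop when the centroids no longer move (convergence),
        # otherwise repeat the assignment step.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

For example, kmeans(D, k=2) on the array D shown earlier returns two centroids and a cluster label for each data point.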
[Figure slides: step-by-step illustration of how K-means works, steps 1–3.]
Common distance measures

❑ The distance measure determines how the similarity of two elements is
calculated, and it influences the shape of the clusters.
❑ Common measures include:
❖ The Euclidean distance (also called the 2-norm distance), given by:
d(xi, xj) = sqrt( (xi1 − xj1)² + (xi2 − xj2)² + … + (xir − xjr)² )
❖ The Manhattan distance (also called the taxicab norm or 1-norm), given by:
d(xi, xj) = |xi1 − xj1| + |xi2 − xj2| + … + |xir − xjr|
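As a small check of both formulas, here is a sketch in Python/NumPy (the two example vectors are made up):

import numpy as np

xi = np.array([1.0, 2.0, 3.0])   # an example point with r = 3 attributes
xj = np.array([4.0, 0.0, 3.0])

euclidean = np.sqrt(np.sum((xi - xj) ** 2))   # 2-norm distance
manhattan = np.sum(np.abs(xi - xj))           # 1-norm (taxicab) distance

print(euclidean)  # about 3.606
print(manhattan)  # 5.0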
[Figure slides: step-by-step illustration of how K-means works, steps 3–5.]
Strengths of k-means
❑ Simple: easy to understand and to implement.
❑ Efficient: time complexity is O(tkn),
❖ where n is the number of data points,
❖ k is the number of clusters, and
❖ t is the number of iterations.
❑ Flexible: K-means adapts easily to changes; if the resulting clusters are
unsatisfactory, it is straightforward to adjust k or re-run the algorithm
with different initial centroids.
❑ K-means is the most popular clustering algorithm (a short library usage
example follows below).
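For everyday use, K-means is also available in standard libraries; here is a minimal sketch using scikit-learn's KMeans (assuming scikit-learn is installed; the data array is the same made-up example as before):

import numpy as np
from sklearn.cluster import KMeans

D = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
              [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

# n_clusters is k; n_init controls how many random initializations are tried.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(D)

print(km.cluster_centers_)  # the k centroids
print(km.labels_)           # the cluster index assigned to each data point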
Weaknesses of k-means

❑ The user needs to specify k.
❑ The algorithm is sensitive to outliers (a small illustration follows below).
❖ Outliers are data points that are very far away from other data points.
❖ Outliers could be errors in the data recording or special data points with
very different values.
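To see why, recall that each centroid is the mean of its cluster members, and a single extreme value can drag a mean far away from the rest of the cluster; a tiny sketch with made-up numbers:

import numpy as np

cluster = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1]])
print(cluster.mean(axis=0))          # centroid stays near [1.0, 1.0]

with_outlier = np.vstack([cluster, [[50.0, 50.0]]])
print(with_outlier.mean(axis=0))     # centroid is pulled to about [13.25, 13.25]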
[Figure slide: weaknesses of k-means, problems with outliers.]
Thank you for your attention
