Chapter 3 p4

Unsupervised machine learning involves training models on unlabeled datasets, primarily using clustering and association algorithms. Clustering groups similar data points together, while association rules identify relationships between variables, aiding in tasks like market analysis. K-Means is a popular clustering algorithm that partitions data into predefined clusters based on proximity to centroids, with advantages in simplicity and efficiency but challenges with sensitivity to outliers and scalability.


Unsupervised Machine Learning

Unsupervised learning is a type of machine learning in which models are trained
on an unlabeled dataset and are allowed to act on that data without any
supervision.
Types of Unsupervised Learning Algorithms:
•Clustering:
• Clustering is a method of grouping objects into clusters
such that objects with the most similarities remain in one
group and have few or no similarities with the objects of
another group.
• Cluster analysis finds the commonalities between the data
objects and categorizes them according to the presence or
absence of those commonalities.

•Association:
• An association rule is an unsupervised learning method
used for finding relationships between variables in large
databases.
• It determines the sets of items that occur together in the
dataset.
• Association rules make marketing strategies more effective.
For example, people who buy item X (say, bread) also tend
to purchase item Y (butter or jam). A typical example of
association rules is Market Basket Analysis.
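The idea behind such rules can be sketched in base R by counting how often items co-occur across shopping baskets. The basket data below is made up purely for illustration; real analyses use dedicated packages such as arules:

```r
# Toy transactions (hypothetical data for illustration)
baskets <- list(
  c("bread", "butter", "jam"),
  c("bread", "butter"),
  c("bread", "milk"),
  c("butter", "jam")
)

# Support of an itemset = fraction of baskets containing all its items
support <- function(items, baskets) {
  mean(sapply(baskets, function(b) all(items %in% b)))
}

# Confidence of the rule {bread} -> {butter}
conf <- support(c("bread", "butter"), baskets) / support("bread", baskets)

support(c("bread", "butter"), baskets)  # 0.5: bread and butter co-occur in 2 of 4 baskets
conf                                    # ~0.67: two thirds of bread buyers also buy butter
```

A rule like {bread} -> {butter} is considered interesting when both its support and its confidence exceed chosen thresholds.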
Advantages of Unsupervised Learning

• Unsupervised learning is used for more complex tasks
compared to supervised learning, because it does not
require labeled input data.
• Unsupervised learning is often preferable because unlabeled
data is easier to obtain than labeled data.

Disadvantages of Unsupervised Learning


• Unsupervised learning is intrinsically more difficult
than supervised learning because there is no corresponding
output to learn from.
• The results of an unsupervised learning algorithm may be
less accurate, since the input data is not labeled and the
algorithm does not know the exact output in advance.
Clustering in Machine Learning

• Clustering or cluster analysis is a machine learning technique
that groups an unlabeled dataset. It can be defined as "a
way of grouping the data points into different clusters
consisting of similar data points. The objects with
possible similarities remain in a group that has few or
no similarities with another group."
• It does this by finding similar patterns in the unlabeled
dataset, such as shape, size, color, or behavior, and divides
the data according to the presence or absence of those
patterns.
• It is an unsupervised learning method, hence no supervision is
provided to the algorithm, and it deals with an unlabeled
dataset.
• The clustering technique is commonly used for statistical
data analysis.
• Example: Let's understand the clustering technique with the
real-world example of a shopping mall. When we visit a mall,
we can observe that things with similar usage are grouped
together: t-shirts are in one section and trousers in
another, and in the produce section apples, bananas,
mangoes, etc. are grouped separately, so that we can easily
find things. The clustering technique works in the same
way.

• The clustering technique can be widely used in various
tasks. Some of the most common uses of this technique are:
1. Market segmentation
2. Statistical data analysis
3. Social network analysis
4. Image segmentation
5. Anomaly detection, etc.
The below diagram explains the working of the clustering
algorithm. We can see that the different fruits are divided into
several groups with similar properties.
Types of Clustering
Broadly speaking, clustering can be divided into two subgroups:
• Hard Clustering: in hard clustering, each data point either belongs to a cluster
completely or not. For example, in the retail scenario above, each customer is put into
exactly one of the 3 groups.
• Soft Clustering: in soft clustering, instead of putting each data point into exactly one
cluster, a probability or likelihood of that data point belonging to each cluster is
assigned. For example, in the same scenario, each customer is assigned a probability of
belonging to each of the 3 clusters of the retail store.
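The difference can be illustrated with a small sketch: hard clustering assigns a point to its single nearest cluster center, while soft clustering turns distances into membership probabilities. The centers, the data point, and the temperature constant below are all hypothetical choices for illustration:

```r
# Two hypothetical 1-D cluster centers and one data point
centers <- c(2, 8)
x <- 4

# Hard assignment: index of the nearest center
hard <- which.min(abs(x - centers))   # 1: x is closer to the center at 2

# Soft assignment: turn negative squared distances into probabilities
# (dividing by 10 is an arbitrary "temperature" that softens the split)
d2 <- (x - centers)^2
probs <- exp(-d2 / 10) / sum(exp(-d2 / 10))
round(probs, 3)                       # higher membership for the nearer cluster
```

Note that the soft memberships always sum to 1, so each point spreads its "weight" across all clusters instead of committing to one.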
Types of clustering algorithms
Connectivity models:
• As the name suggests, these models are based on the notion that data points closer
together in data space are more similar to each other than data points lying farther
apart.
• These models can follow two approaches. In the first approach, they start by
treating every data point as a separate cluster and then aggregate clusters as the
distance decreases.
• In the second approach, all data points start in a single cluster, which is then
partitioned as the distance increases. The choice of distance function is subjective.
• These models are very easy to interpret but lack scalability for handling big datasets.
Examples of these models are the hierarchical clustering algorithm and its variants.
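The bottom-up (agglomerative) approach described above is available in base R through hclust; a minimal sketch on the iris measurements (the choice of complete linkage and of 3 clusters here is illustrative):

```r
# Agglomerative (bottom-up) hierarchical clustering on the iris measurements
data(iris)
d <- dist(iris[, -5])                # pairwise Euclidean distances
hc <- hclust(d, method = "complete") # merge the closest clusters step by step

# Cut the resulting dendrogram to obtain 3 clusters
groups <- cutree(hc, k = 3)
table(groups)
```

Cutting the dendrogram at different heights yields different numbers of clusters, which is why these models are easy to interpret but expensive on large datasets (the distance matrix alone is O(n²)).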
Centroid models:
• These are iterative clustering algorithms in which the notion of similarity is derived
from the closeness of a data point to the centroid of a cluster.
• The K-Means clustering algorithm is a popular algorithm that falls into this category.
In these models, the number of clusters has to be specified beforehand, which makes
it important to have prior knowledge of the dataset. These models run iteratively to
find a local optimum.
Distribution models:
• These clustering models are based on the notion of how probable it is that all data
points in a cluster belong to the same distribution (for example, Normal or
Gaussian). These models often suffer from overfitting.
• A popular example of these models is the Expectation-Maximization algorithm,
which uses multivariate normal distributions.

Density models:
• These models search the data space for regions of varying density of data points.
• They isolate the different density regions and assign the data points within each
region to the same cluster.
• Popular examples of density models are DBSCAN and OPTICS.
K-Means Clustering Algorithm

K-Means Clustering is an unsupervised learning algorithm that is used
to solve clustering problems in machine learning and data science.
In this topic, we will learn what the k-means clustering algorithm is
and how it works, along with an R implementation of k-means
clustering.
What is K-Means Algorithm?

K-Means Clustering is an Unsupervised Learning algorithm which
groups an unlabeled dataset into different clusters. Here K defines
the number of pre-defined clusters that need to be created in the
process: if K=2 there will be two clusters, for K=3 there will be
three clusters, and so on.
It is an iterative algorithm that divides the unlabeled dataset into k
different clusters in such a way that each data point belongs to only
one group, whose members share similar properties.
It allows us to cluster the data into different groups and is a
convenient way to discover the categories of groups in an unlabeled
dataset on its own, without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a
centroid. The main aim of this algorithm is to minimize the sum of
distances between the data points and their corresponding cluster
centroids.

The algorithm takes the unlabeled dataset as input, divides the dataset
into k clusters, and repeats the process until it finds the best
clusters. The value of k must be predetermined in this algorithm.
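Since k must be fixed in advance, a common heuristic (not covered in the text above) is the "elbow method": run k-means for several values of k and look for the point where the total within-cluster sum of squares stops dropping sharply. A base-R sketch, using the iris measurements as example data:

```r
# Elbow method: total within-cluster sum of squares for k = 1..6
data(iris)
iris_1 <- iris[, -5]
set.seed(240)

wss <- sapply(1:6, function(k)
  kmeans(iris_1, centers = k, nstart = 20)$tot.withinss)

plot(1:6, wss, type = "b",
     xlab = "Number of clusters k",
     ylab = "Total within-cluster SS")  # the bend ("elbow") suggests a good k
```

For iris, the curve bends around k = 2 or 3, which matches the three species in the data.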

The k-means clustering algorithm mainly performs two tasks:

•Determines the best values for the K center points (centroids) by an
iterative process.
•Assigns each data point to its closest k-center. The data points that
are near a particular k-center form a cluster.
Hence each cluster contains data points with some commonalities and is
distinct from the other clusters.
The below diagram explains the working of the K-means
Clustering Algorithm:
•Step 1: Choose K cluster centers in the feature space randomly.
•Step 2: Assign each observation to its nearest cluster center
(centroid). This results in initial groups of observations.
•Step 3: Shift each centroid to the mean of the coordinates within its
group.
•Step 4: Reassign observations according to the new centroids. New
boundaries are created, so observations may move from one group to
another.
•Repeat until no observation changes groups.
How does the K-Means Algorithm Work?
The working of the K-Means algorithm is explained in the steps below:
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as centroids. (They need not come from the input
dataset.)
Step-3: Assign each data point to its closest centroid, which will form the
predefined K clusters.
Step-4: Compute the variance and place a new centroid for each cluster.
Step-5: Repeat step 3, i.e. reassign each data point to the new closest centroid
of each cluster.
Step-6: If any reassignment occurred, go to step 4; otherwise go to FINISH.
Step-7: The model is ready.
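The steps above can be sketched directly in base R. This is a minimal illustration of the algorithm, not the optimized built-in kmeans() used in the example below; the function name kmeans_sketch and the guard against empty clusters are our own additions:

```r
# A from-scratch sketch of the k-means steps (illustration only)
kmeans_sketch <- function(X, k, iters = 100) {
  X <- as.matrix(X)
  centroids <- X[sample(nrow(X), k), , drop = FALSE]  # Step 2: random initial centroids
  cl <- rep(0L, nrow(X))
  for (i in seq_len(iters)) {
    # Steps 3/5: assign each point to its closest centroid (squared Euclidean distance)
    d <- sapply(seq_len(k), function(j) colSums((t(X) - centroids[j, ])^2))
    new_cl <- max.col(-d)
    if (all(new_cl == cl)) break        # Step 6: stop when no point changes cluster
    cl <- new_cl
    # Step 4: move each non-empty cluster's centroid to the mean of its points
    for (j in seq_len(k))
      if (any(cl == j)) centroids[j, ] <- colMeans(X[cl == j, , drop = FALSE])
  }
  list(cluster = cl, centers = centroids)
}

set.seed(240)
res <- kmeans_sketch(iris[, -5], 3)
table(res$cluster)
```

Because the initial centroids are random, different seeds can give different local optima, which is why the built-in kmeans() offers nstart to rerun from several initializations and keep the best result.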
Performing K-Means Clustering on Dataset

# Loading data
data(iris)

# Structure
str(iris)

# Installing packages
install.packages("ClusterR")
install.packages("cluster")

# Loading packages
library(ClusterR)
library(cluster)

# Removing initial label of
# Species from original dataset
iris_1 <- iris[, -5]

# Fitting K-Means clustering model
# to the dataset
set.seed(240) # Setting seed
kmeans.re <- kmeans(iris_1, centers = 3, nstart = 20)
kmeans.re

# Cluster identification for
# each observation
kmeans.re$cluster

# Confusion matrix
cm <- table(iris$Species, kmeans.re$cluster)
cm
# Model evaluation and visualization
plot(iris_1[c("Sepal.Length", "Sepal.Width")],
     col = kmeans.re$cluster,
     main = "K-means with 3 clusters")

## Plotting cluster centers
kmeans.re$centers
kmeans.re$centers[, c("Sepal.Length", "Sepal.Width")]

# cex is the point size, pch is the plotting symbol
points(kmeans.re$centers[, c("Sepal.Length", "Sepal.Width")],
       col = 1:3, pch = 8, cex = 3)
## Visualizing clusters
y_kmeans <- kmeans.re$cluster
clusplot(iris_1[, c("Sepal.Length", "Sepal.Width")],
y_kmeans,
lines = 0,
shade = TRUE,
color = TRUE,
labels = 2,
plotchar = FALSE,
span = TRUE,
main = paste("Cluster iris"),
xlab = 'Sepal.Length',
ylab = 'Sepal.Width')
Advantages of the K-means algorithm
1) Simple and easy to understand and implement.
2) K-means is the most popular clustering algorithm because it
provides easily interpretable clustering results.
3) Fast and efficient in terms of computational cost; excellent
for pre-clustering in comparison to other clustering
algorithms.
Disadvantages of the K-means algorithm
1) The algorithm is only applicable if the mean is defined.
For categorical data, K-modes is used instead, where the
centroid is represented by the most frequent values.
2) The algorithm is sensitive to outliers (outliers are data
points that are very far away from the other data points).
3) The algorithm can be slow and does not scale to a very
large number of data points.
