DMI COLLEGE OF ENGINEERING

(AN AUTONOMOUS INSTITUTION)

APPROVED BY AICTE, AFFILIATED TO ANNA UNIVERSITY,
ACCREDITED BY NBA, ISO CERTIFIED INSTITUTION
PALANCHUR – NAZARATHPET P.O., CHENNAI – 600 123

DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

YEAR / SEMESTER / DEPT: III / V / AIDS
ACADEMIC YEAR: AUGUST 2024 - NOVEMBER 2024

SUBJECT CODE / NAME: CCS 334 / BIG DATA ANALYTICS

CONTENT BEYOND THE SYLLABUS

CLUSTERING IN MACHINE LEARNING

Clustering, or cluster analysis, is a machine learning technique that groups an unlabelled dataset. It
can be defined as "a way of grouping the data points into different clusters consisting of similar data
points. Objects that are similar remain in the same group, which has few or no similarities with any
other group."

It does this by finding similar patterns in the unlabelled dataset, such as shape, size, color or behavior,
and dividing the data according to the presence or absence of those patterns.

It is an unsupervised learning method, so no supervision is provided to the algorithm, and it works with
unlabelled data.

After clustering, each cluster or group is assigned a cluster ID. An ML system can use this ID to simplify
the processing of large and complex datasets.

Clustering is used in a wide variety of tasks. Some of the most common uses are:

o Market Segmentation
o Statistical data analysis
o Social network analysis
o Image segmentation
o Anomaly detection, etc.
Apart from these general uses, Amazon uses clustering in its recommendation system to recommend products
based on past searches, and Netflix uses it to recommend movies and web series to its users based on their
watch history.

The diagram below illustrates how a clustering algorithm works: different fruits are divided into several
groups with similar properties.

Types of Clustering Methods

Clustering methods are broadly divided into hard clustering (each data point belongs to exactly one group)
and soft clustering (a data point can belong to more than one group). Various other approaches to
clustering also exist. The main clustering methods used in machine learning are:

1. Partitioning Clustering
2. Density-Based Clustering
3. Distribution Model-Based Clustering
4. Hierarchical Clustering
5. Fuzzy Clustering

Partitioning Clustering
It is a type of clustering that divides the data into non-hierarchical groups. It is also known as the centroid-
based method. The most common example of partitioning clustering is the K-Means Clustering algorithm.

In this type, the dataset is divided into a set of k groups, where k is the pre-defined number of groups.
The cluster centers are placed so that each data point is closer to the centroid of its own cluster than to
the centroid of any other cluster.
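
The K-Means procedure described above can be sketched in plain Python as Lloyd's algorithm. Each point is assigned to its nearest centroid, each centroid is then moved to the mean of its points, and the two steps repeat until the assignments stop changing. This is a minimal sketch rather than a library implementation. The function name `kmeans`, the random-sample initialization and the `seed` parameter are assumptions made for illustration.

```python
import math
import random

Point = tuple[float, float]


def kmeans(points: list[Point], k: int, iters: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    # Assumed initialization: k distinct input points chosen at random.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point gets the ID of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return labels, centroids
```

For example, `kmeans([(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)], k=2)` places the first three points in one cluster and the last three in the other. Which group receives cluster ID 0 depends on the random initialization.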

Density-Based Clustering
The density-based clustering method connects highly dense areas into clusters, so clusters of arbitrary
shape can be formed as long as the dense regions are connected. The algorithm identifies the different
clusters in the dataset by joining areas of high density. The dense areas in the data space are separated
from each other by sparser areas.

These algorithms can have difficulty clustering the data points when the dataset has varying densities or
high dimensionality.
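
DBSCAN is a common density-based algorithm, and a minimal sketch of it shows how dense regions get connected. A point with at least `min_pts` neighbors within radius `eps` is a core point. Clusters grow outward from core points, and points that cannot be reached from any core point are labelled as noise. The function name and the brute-force neighbor search are assumptions for illustration; practical implementations use a spatial index to find neighbors.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Label each point with a cluster ID, or NOISE (-1) for low-density points."""
    labels = [None] * len(points)  # None means "not visited yet"

    def neighbors(i):
        # Indices of all points within distance eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        # i is a core point: start a new cluster and grow it through dense regions.
        labels[i] = cluster_id
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point: reachable, but not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:
                queue.extend(j_neighbors)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels
```

The number of clusters is not specified in advance; it follows from `eps` and `min_pts`. An isolated point far from both dense groups receives the noise label instead of being forced into a cluster.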
Distribution Model-Based Clustering
In the distribution model-based clustering method, the data is divided based on the probability that each
data point belongs to a particular distribution. The grouping is done by assuming some distribution, most
commonly the Gaussian distribution. An example of this type is the Expectation-Maximization clustering
algorithm, which uses Gaussian Mixture Models (GMM).
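
The Expectation-Maximization procedure for a Gaussian mixture can be sketched on one-dimensional data. The E-step computes each point's probability (responsibility) of belonging to each Gaussian component. The M-step then re-estimates each component's weight, mean and variance from those probabilities. This simplified sketch rests on several assumptions: 1-D data, caller-supplied starting means, unit starting variances and a small variance floor. The function names are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a 1-D normal distribution N(mean, var) at x."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def em_gmm_1d(xs, means, iters=50):
    """Fit a 1-D Gaussian mixture with EM, starting from the given means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k    # assumed starting variances
    weights = [1.0 / k] * k  # equal starting mixing weights
    for _ in range(iters):
        # E-step: resp[i][c] = P(component c | x_i) under the current parameters.
        resp = []
        for x in xs:
            p = [weights[c] * gaussian_pdf(x, means[c], variances[c]) for c in range(k)]
            total = sum(p)
            resp.append([pc / total for pc in p])
        # M-step: re-estimate weight, mean and variance of each component.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / len(xs)
            means[c] = sum(r[c] * x for r, x in zip(resp, xs)) / nc
            variances[c] = max(
                sum(r[c] * (x - means[c]) ** 2 for r, x in zip(resp, xs)) / nc,
                1e-6,  # floor to stop a component collapsing to zero variance
            )
    return weights, means, variances, resp
```

Each row of `resp` gives a point's membership probabilities, so this is a soft assignment. Taking the component with the largest probability turns it into a hard cluster label.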

Hierarchical Clustering
Hierarchical clustering can be used as an alternative to partitioning clustering, since the number of
clusters does not need to be specified in advance. In this technique, the dataset is divided into clusters
to create a tree-like structure called a dendrogram. Any desired number of clusters can be obtained by
cutting the tree at the appropriate level. The most common example of this method is the Agglomerative
Hierarchical algorithm.
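
The bottom-up merging performed by the Agglomerative Hierarchical algorithm can be sketched as follows. Every point starts as its own cluster, and the two closest clusters are merged repeatedly, with each merge recorded. The list of merges forms the dendrogram, and stopping once `n_clusters` clusters remain corresponds to cutting the tree at that level. The use of single linkage (the distance between the closest pair of members) and the function name are assumptions. Other linkage criteria, such as average or complete linkage, are also common.

```python
import math
from itertools import combinations


def agglomerative(points, n_clusters):
    """Bottom-up (agglomerative) clustering with single linkage."""
    clusters = [[i] for i in range(len(points))]  # each point starts as its own cluster
    merges = []  # (cluster_a, cluster_b, distance) records, bottom of the tree first

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        ia, ib = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        a, b = clusters[ia], clusters[ib]
        merges.append((a, b, linkage(a, b)))
        # Replace the two merged clusters with their union.
        clusters = [c for idx, c in enumerate(clusters) if idx not in (ia, ib)]
        clusters.append(a + b)
    return clusters, merges
```

Running it with `n_clusters=1` records the full tree (n - 1 merges for n points). A larger `n_clusters` corresponds to cutting that tree lower down.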

Fuzzy Clustering
Fuzzy clustering is a soft clustering method in which a data object may belong to more than one group or
cluster. Each data point has a set of membership coefficients that express its degree of membership in
each cluster. The Fuzzy C-means algorithm is an example of this type of clustering; it is sometimes also
known as the Fuzzy k-means algorithm.
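
The membership coefficients described above can be illustrated with a minimal Fuzzy C-means sketch on one-dimensional data. Each point receives a membership row that sums to 1, which is a soft assignment. Each center is then re-estimated as the membership-weighted mean of the data. The fuzzifier `m = 2`, the 1-D distance, the fixed iteration count and the function name are assumptions chosen for illustration.

```python
def fuzzy_c_means(xs, centers, m=2.0, iters=100):
    """Fuzzy C-means on 1-D data: returns a membership row per point and the centers."""
    centers = list(centers)
    for _ in range(iters):
        # Membership update for point i and cluster c:
        # u[i][c] = 1 / sum over clusters j of (d(x_i, v_c) / d(x_i, v_j)) ** (2 / (m - 1))
        u = []
        for x in xs:
            d = [abs(x - v) for v in centers]
            hits = d.count(0.0)
            if hits:
                # x lies exactly on a center: share full membership among such centers.
                row = [1.0 / hits if dc == 0.0 else 0.0 for dc in d]
            else:
                row = [1.0 / sum((dc / dj) ** (2 / (m - 1)) for dj in d) for dc in d]
            u.append(row)
        # Center update: mean of the data weighted by membership ** m.
        centers = [
            sum(row[c] ** m * x for row, x in zip(u, xs)) / sum(row[c] ** m for row in u)
            for c in range(len(centers))
        ]
    return u, centers
```

Take the data `[1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 3.0]` with starting centers `[0.0, 6.0]`. The point 3.0 lies midway between the two groups and ends up with a membership of about 0.5 in each cluster, whereas a hard method would have to assign it to just one.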

Clustering Algorithms
Clustering algorithms can be divided according to the models explained above. Many clustering algorithms
have been published, but only a few are commonly used. The choice of algorithm depends on the kind of data
being used. For example, some algorithms require the number of clusters in the dataset to be guessed in
advance, whereas others work by finding the minimum distance between the observations of the dataset.

Here we discuss the popular clustering algorithms that are widely used in machine learning:

1. K-Means algorithm: The k-means algorithm is one of the most popular clustering algorithms. It
divides the samples into clusters of equal variance. The number of clusters must be specified in
advance. It is fast because it needs relatively few computations: the cost of each iteration grows
linearly with the number of samples, i.e. O(n) for a fixed number of clusters and features.
2. Mean-shift algorithm: The mean-shift algorithm tries to find dense areas in a smoothed density
estimate of the data points. It is a centroid-based algorithm that works by repeatedly updating
candidate centroids to be the mean (center) of the points within a given region.
3. DBSCAN Algorithm: It stands for Density-Based Spatial Clustering of Applications with Noise.
It is a density-based model similar to mean-shift, but with some notable advantages. In this
algorithm, areas of high density are separated by areas of low density, so clusters of arbitrary
shape can be found.
4. Expectation-Maximization Clustering using GMM: This algorithm can be used as an alternative
to the k-means algorithm, or in cases where k-means may fail. GMM assumes that the data points are
generated from a mixture of Gaussian distributions.
5. Agglomerative Hierarchical algorithm: The agglomerative hierarchical algorithm performs
bottom-up hierarchical clustering. Each data point is treated as a single cluster at the outset, and
the closest clusters are then successively merged. The cluster hierarchy can be represented as a tree
structure (dendrogram).
6. Affinity Propagation: It differs from other clustering algorithms in that it does not require the
number of clusters to be specified. Pairs of data points repeatedly exchange messages until the
algorithm converges. Its time complexity is O(N²T), where N is the number of samples and T is the
number of iterations until convergence; this is the main drawback of the algorithm.
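
The mean-shift update described in the list above can be sketched on one-dimensional data with a flat kernel. Every point starts as a candidate centroid and is repeatedly moved to the mean of the data points within `bandwidth` of it, so candidates climb toward dense areas. Candidates that end up close together are then merged into one cluster, so the number of clusters is not specified in advance. The flat kernel, the merge threshold of half the bandwidth and the function name are assumptions for illustration.

```python
def mean_shift(xs, bandwidth, iters=100, tol=1e-6):
    """Flat-kernel mean shift on 1-D data; returns cluster labels and cluster centers."""
    modes = []
    for start in xs:
        center = start  # every data point starts as a candidate centroid
        for _ in range(iters):
            # Shift the candidate to the mean of the points inside its window.
            window = [x for x in xs if abs(x - center) <= bandwidth]
            new_center = sum(window) / len(window)
            if abs(new_center - center) < tol:
                break  # the candidate has stopped moving: a density peak was reached
            center = new_center
        modes.append(center)
    # Merge candidates that converged to (nearly) the same peak into one cluster.
    centers, labels = [], []
    for mode in modes:
        for c, existing in enumerate(centers):
            if abs(mode - existing) <= bandwidth / 2:
                labels.append(c)
                break
        else:
            centers.append(mode)
            labels.append(len(centers) - 1)
    return labels, centers
```

The `bandwidth` plays the role of the "given region" in the description. A larger value merges nearby dense areas into fewer clusters, and a smaller value produces more clusters.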

Applications of Clustering
Below are some commonly known applications of clustering in machine learning:

o Identification of Cancer Cells: Clustering algorithms are widely used to identify cancerous
cells by dividing cancerous and non-cancerous data into different groups.
o Search Engines: Search engines also use clustering. Results are returned based on the objects
closest to the search query, which is achieved by grouping similar data objects together, far
from dissimilar objects. The accuracy of a query's results depends on the quality of the
clustering algorithm used.
o Customer Segmentation: It is used in market research to segment customers based on their
choices and preferences.
o Biology: It is used in biology to classify different species of plants and animals using image
recognition techniques.
o Land Use: Clustering is used to identify areas of similar land use in a GIS database. This
helps determine the purpose for which a particular piece of land is most suitable.
