Objectives of Clustering


Uploaded by

Abu Sufian


1. Getting Data

Objective: Gather the raw data needed for analysis.

• Data Sources:
  o Internal Sources: Databases, transaction logs, customer records.
  o External Sources: APIs, web scraping, third-party datasets.
  o Generated Data: Surveys, experiments.
• Data Types:
  o Structured Data: Tabular data, such as CSV files or databases.
  o Unstructured Data: Text, images, audio.
  o Semi-structured Data: JSON, XML.

Example: If you're clustering customers, you might gather data on purchase history, website behavior, and demographic information from your company's CRM system.
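
A minimal sketch of combining a structured source with a semi-structured one, using only the Python standard library. The CRM CSV export, the web-analytics JSON feed, and field names such as customer_id and pages_viewed are hypothetical stand-ins for your own sources. A real pipeline would more likely read from a database or an API.

```python
import csv
import io
import json

# Hypothetical CRM export (structured, CSV).
CRM_CSV = """customer_id,age,annual_income
1,34,52000
2,45,87000
3,29,31000
"""

# Hypothetical web-behaviour feed (semi-structured, nested JSON).
WEB_JSON = """[
  {"customer_id": 1, "sessions": 12, "pages": {"viewed": 40}},
  {"customer_id": 2, "sessions": 3, "pages": {"viewed": 9}},
  {"customer_id": 3, "sessions": 20, "pages": {"viewed": 75}}
]"""


def load_customers(crm_csv, web_json):
    """Join CRM rows with web-behaviour records on customer_id."""
    crm = {int(r["customer_id"]): r for r in csv.DictReader(io.StringIO(crm_csv))}
    merged = []
    for event in json.loads(web_json):
        cid = event["customer_id"]
        if cid not in crm:
            continue  # skip behaviour records with no matching CRM row
        row = crm[cid]
        merged.append({
            "customer_id": cid,
            "age": int(row["age"]),
            "annual_income": float(row["annual_income"]),
            "sessions": event["sessions"],
            # Flatten the nested JSON field into a single column.
            "pages_viewed": event["pages"]["viewed"],
        })
    return merged


customers = load_customers(CRM_CSV, WEB_JSON)
print(customers[0])
```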

2. Cleaning Data

Objective: Ensure the data is free from errors and inconsistencies.

• Common Issues:
  o Missing Data: Handle missing values by filling them with the mean or median, by estimating them with an algorithm such as K-Nearest Neighbors (KNN) imputation, or by removing the affected rows or columns.
  o Duplicate Data: Remove duplicate entries so that repeated records do not skew the clustering results.
  o Outliers: Detect and handle outliers that could distort cluster formation, either by removing them or by treating them separately.
• Data Cleaning Techniques:
  o Imputation: Filling in missing data with appropriate values.
  o Normalization (format standardization): Making values consistent, such as converting all dates to a single format. This is not the same as the numeric normalization described under Data Preprocessing.
  o Filtering: Removing unnecessary columns or rows.

Example: Some customer profiles might be missing age data. These gaps could be filled with the median age of the customers, or the customers with no recorded age could be treated as a separate group.

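
The cleaning steps above might be sketched as follows. This is a plain-Python illustration that assumes hypothetical records with id, age, and spend fields. It drops exact duplicates, fills missing ages with the median, and flags outliers with the common 1.5 * IQR rule, which is only one of several possible outlier tests. Libraries such as scikit-learn offer ready-made imputers (SimpleImputer, KNNImputer) for real datasets.

```python
import statistics

# Hypothetical customer records: one exact duplicate, one missing age, one extreme spend.
records = [
    {"id": 1, "age": 34, "spend": 120.0},
    {"id": 2, "age": None, "spend": 95.0},
    {"id": 3, "age": 52, "spend": 110.0},
    {"id": 3, "age": 52, "spend": 110.0},   # exact duplicate of the previous row
    {"id": 4, "age": 41, "spend": 130.0},
    {"id": 5, "age": 29, "spend": 5000.0},  # suspiciously large spend
]


def drop_duplicates(rows):
    """Keep the first occurrence of each identical record, preserving order."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(rows, field):
    """Replace None in `field` with the median of the observed values."""
    observed = [r[field] for r in rows if r[field] is not None]
    median = statistics.median(observed)
    return [{**r, field: median if r[field] is None else r[field]} for r in rows]


def iqr_outliers(rows, field, k=1.5):
    """Return rows whose `field` lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles([r[field] for r in rows], n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [r for r in rows if not low <= r[field] <= high]


clean = impute_median(drop_duplicates(records), "age")
print(len(clean), clean[1]["age"])                 # 5 37.5
print([r["id"] for r in iqr_outliers(clean, "spend")])  # [5]
```

Whether a flagged row is removed or kept and handled separately remains a judgment call. The function only identifies candidates.
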
3. Data Preprocessing

Objective: Prepare the data for the clustering algorithm by transforming it into a
suitable format.

• Normalization and Scaling:
  o Normalization: Rescale values to a common range, typically [0, 1] with min-max scaling, while preserving the relative differences between them.
  o Standardization: Scale data to have a mean of 0 and a standard deviation of 1. This is useful for distance-based algorithms like K-Means.
• Dimensionality Reduction:
  o Techniques like Principal Component Analysis (PCA) reduce the number of variables while retaining most of the variance, that is, the most important information in the data.
• Feature Engineering:
  o Creating new features from existing data to enhance the clustering process.
  o Encoding Categorical Variables: Convert categories into numerical values using methods like one-hot encoding.
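
One-hot encoding and a simple engineered feature can be sketched in plain Python as follows. The region column and the average_order_value feature are hypothetical examples. scikit-learn's OneHotEncoder is the usual library equivalent for real data.

```python
def one_hot(values):
    """Encode categorical values as 0/1 indicator vectors, one column per category."""
    categories = sorted(set(values))  # fixed, reproducible column order
    index = {c: i for i, c in enumerate(categories)}
    vectors = []
    for v in values:
        vec = [0] * len(categories)
        vec[index[v]] = 1
        vectors.append(vec)
    return categories, vectors


def average_order_value(total_spend, n_orders):
    """Hypothetical engineered feature: spend per order (0.0 when there are no orders)."""
    return total_spend / n_orders if n_orders else 0.0


regions = ["north", "south", "north", "west"]  # hypothetical categorical column
cats, encoded = one_hot(regions)
print(cats)     # ['north', 'south', 'west']
print(encoded)  # [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(average_order_value(250.0, 5))  # 50.0
```

One-hot columns avoid implying an order between categories, which a single integer code such as north=0, south=1, west=2 would do.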

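Example: If your customer data includes income, its values may span a much wider numeric range than features such as age. Scaling the income data keeps that feature from dominating the distance calculations, so customers with high incomes don't disproportionately influence the clustering.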
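
The effect of scaling can be illustrated with a small sketch that uses only the standard library. The (age, income) values are made up for illustration. In practice, scikit-learn's MinMaxScaler and StandardScaler perform the same transformations column by column.

```python
import math
import statistics


def min_max_scale(values):
    """Normalization: rescale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant feature carries no information
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Standardization: z = (x - mean) / std, using the population std."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


# Hypothetical (age, income) pairs for two customers.
a, b = (25, 30_000), (55, 45_000)
raw_gap = math.dist(a, b)
print(round(raw_gap, 2))  # 15000.03: the 30-year age gap barely registers

incomes = [30_000, 45_000, 60_000, 250_000]
ages = [25, 35, 45, 55]
print(min_max_scale(incomes))  # first value 0.0, last value 1.0
print(standardize(ages))       # mean 0, standard deviation 1
```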

4. Data Visualization

Objective: Visualize the data to understand its structure and the results of the
clustering.

• Exploratory Visualization:
  o Histograms and Box Plots: Used to understand the distribution of individual features.
  o Scatter Plots: Visualize relationships between two variables, helping to identify potential clusters before applying any algorithm.
• Visualizing Clusters:
  o 2D/3D Scatter Plots: After clustering, plot the clusters to visualize how the data points have been grouped.
  o Cluster Centroids: In K-Means, visualize the centroids to understand the center of each cluster.
  o Heatmaps: Show the correlation between features, which can help in understanding the structure of the clusters.

Example: After clustering customers based on their purchasing behavior, you could use a 2D scatter plot to visualize the clusters, where each point represents a customer and colors distinguish different clusters.
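
The numbers behind a correlation heatmap can be computed with a short sketch like the one below, which prints the Pearson correlation matrix as text. The feature names and values are hypothetical. A plotting library (for example, matplotlib's imshow) would then color each cell of this matrix.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (neither may be constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_matrix(features):
    """features: mapping of column name -> values. Returns names and an n x n matrix."""
    names = list(features)
    matrix = [[pearson(features[a], features[b]) for b in names] for a in names]
    return names, matrix


# Hypothetical per-customer features.
features = {
    "visits": [2, 4, 6, 8, 10],
    "spend": [20, 41, 59, 80, 101],
    "returns": [5, 4, 3, 2, 1],
}
names, m = correlation_matrix(features)
for name, row in zip(names, m):
    # Each printed value is one cell a heatmap would color.
    print(f"{name:>8}", " ".join(f"{v:+.2f}" for v in row))
```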

5. Clustering Process

Objective: Apply a clustering algorithm to group the data into meaningful clusters.

• Choosing the Right Algorithm:
  o K-Means: Simple and widely used for partitioning data into K clusters, where K must be chosen in advance.
  o Hierarchical Clustering: Builds a tree of clusters (a dendrogram), useful when the number of clusters is not known beforehand.
  o DBSCAN: Useful for finding arbitrarily shaped clusters and handling noise/outliers.
• Running the Algorithm:
  o Initialize the clustering process by choosing the appropriate parameters (e.g., the number of clusters for K-Means).
  o Fit the model to the data, allowing it to group similar data points together.
• Evaluating the Clusters:
  o Silhouette Score: Measures how similar a data point is to its own cluster compared to other clusters. It ranges from -1 to 1, and higher values indicate better-separated clusters.
  o Elbow Method: Used in K-Means to determine the optimal number of clusters. Plot the within-cluster sum of squared distances against K and look for an "elbow" point, where adding more clusters stops reducing that sum substantially.
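
Running K-Means and applying the elbow method can be sketched in plain Python as below. This is an illustrative version of Lloyd's algorithm, not a library implementation. For reproducibility it uses a deterministic farthest-point seeding, whereas scikit-learn's KMeans defaults to k-means++. The 2-D points are hypothetical.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding: start at the first point, then repeatedly add the
    point farthest from its nearest already-chosen centroid."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate an assignment step and an update step."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members) for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


def inertia(points, labels, centroids):
    """Within-cluster sum of squared distances: the y-axis of an elbow plot."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Hypothetical 2-D data with three well-separated groups.
points = [(1, 1), (1.5, 2), (2, 1.5),
          (8, 8), (8.5, 9), (9, 8.5),
          (1, 8), (1.5, 9), (2, 8.5)]

for k in range(1, 6):
    labels, centroids = kmeans(points, k)
    print(k, round(inertia(points, labels, centroids), 2))
```

On this data, the printed inertia drops sharply up to k = 3 and only slightly afterwards. That bend is the "elbow" the method looks for.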

Example: After using K-Means to group customers into 3 clusters based on their shopping habits, you could evaluate the clustering quality with a silhouette score.
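
The silhouette score can be sketched in plain Python as follows, using the definition s(i) = (b - a) / max(a, b). The points and both labelings are hypothetical. Following the usual convention, points in single-member clusters score 0. For real data, scikit-learn's sklearn.metrics.silhouette_score computes the same quantity.

```python
import math


def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (needs at least two clusters).

    For point i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s(i) = (b - a) / max(a, b).
    """
    clusters = set(labels)
    scores = []
    for i, p in enumerate(points):
        own = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention for single-member clusters
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return sum(scores) / len(scores)


points = [(1, 1), (1.5, 2), (2, 1.5),
          (8, 8), (8.5, 9), (9, 8.5),
          (1, 8), (1.5, 9), (2, 8.5)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1, 2, 2, 2])  # labels follow the groups
bad = silhouette_score(points, [0, 1, 2, 0, 1, 2, 0, 1, 2])   # labels ignore the groups
print(round(good, 2), round(bad, 2))
```

A labeling that matches the visible groups scores close to 1, while one that mixes them scores much lower.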
