Data Mining Unit-4

Cluster analysis is the process of partitioning data objects into subsets called clusters, where objects in a cluster are similar to each other but dissimilar to those in other clusters. Effective clustering requires scalability, the ability to handle various data types and shapes, robustness to noise, and interpretability. Applications of clustering include business intelligence, image recognition, web search, biology, and security.

Uploaded by

anitha

UNIT-IV

CLUSTERING AND APPLICATIONS

Cluster Analysis:
• Cluster analysis, or simply clustering, is the process of partitioning a set of data objects (or observations) into subsets.
• Each subset is a cluster, such that objects in a cluster are similar to one another, yet dissimilar to objects in other
clusters. The set of clusters resulting from a cluster analysis can be referred to as a clustering.
• Clustering is also called data segmentation in some applications because clustering partitions large data sets into
groups according to their similarity.
Cluster Analysis Requirements
Scalability: Many clustering algorithms work well on small data sets containing fewer than several hundred data
objects; however, a large database may contain millions or even billions of objects, particularly in Web search scenarios.
Clustering on only a sample of a given large data set may lead to biased results. Therefore, highly scalable clustering
algorithms are needed.
Ability to deal with different types of attributes: Many algorithms are designed to cluster numeric (interval-based)
data. However, applications may require clustering other data types, such as binary, nominal (categorical), and ordinal
data, or mixtures of these data types. Recently, more and more applications need clustering techniques for complex data
types such as graphs, sequences, images, and documents.
Discovery of clusters with arbitrary shape: Many clustering algorithms determine clusters based on Euclidean or
Manhattan distance measures. Algorithms based on such distance measures tend to find spherical clusters with similar
size and density. However, a cluster could be of any shape. Consider sensors, for example, which are often deployed for
environment surveillance. Cluster analysis on sensor readings can detect interesting phenomena. We may want to use
clustering to find the frontier of a running forest fire, which is often not spherical. It is important to develop algorithms
that can detect clusters of arbitrary shape.
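As a minimal sketch of the two distance measures named above, the Python functions below compute the Euclidean and Manhattan distance between two numeric points. The function names and the sample points are illustrative assumptions, not part of the original notes.

```python
import math

def euclidean(p, q):
    """Straight-line (L2) distance between two equal-length numeric points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """City-block (L1) distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(p, q))

# Hypothetical sample points chosen so the results are easy to verify by hand.
p, q = (1.0, 2.0), (4.0, 6.0)
print(euclidean(p, q))  # 5.0
print(manhattan(p, q))  # 7.0
```

Both measures depend only on coordinate differences between two points, which is one reason methods built on them tend to favor compact, roughly spherical groups.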
Requirements for domain knowledge to determine input parameters: Many clustering algorithms require users
to provide domain knowledge in the form of input parameters such as the desired number of clusters. Consequently, the
clustering results may be sensitive to such parameters. Parameters are often hard to determine, especially for
high-dimensionality data sets and where users have yet to gain a deep understanding of their data. Requiring the
specification of domain knowledge not only burdens users, but also makes the quality of clustering difficult to control.
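To illustrate how sensitive a result can be to the "number of clusters" parameter, here is a rough sketch using a tiny one-dimensional k-means. The notes do not name an algorithm at this point, so k-means, the function name kmeans_1d, the evenly spaced seeding rule, and the sample data are all assumptions made for illustration.

```python
def kmeans_1d(values, k, iterations=20):
    """Minimal 1-D k-means (Lloyd's algorithm) with deterministic seeding."""
    data = sorted(values)
    # Assumption: seed the centers at evenly spaced positions in the sorted data.
    centers = [data[(i * (len(data) - 1)) // max(k - 1, 1)] for i in range(k)]
    groups = [[] for _ in range(k)]
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for v in data:
            # Assign each value to the nearest current center.
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Recompute each center as the mean of its group; keep the old
        # center if a group ends up empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return groups

data = [1, 2, 3, 10, 11, 12, 20, 21, 22]
print(kmeans_1d(data, 2))  # [[1, 2, 3, 10, 11, 12], [20, 21, 22]]
print(kmeans_1d(data, 3))  # [[1, 2, 3], [10, 11, 12], [20, 21, 22]]
```

With k = 2 the two lower groups are merged, while k = 3 recovers the three visible groups. The user has to supply k without the algorithm indicating which value suits the data.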
Ability to deal with noisy data: Most real-world data sets contain outliers and/or missing, unknown, or erroneous
data. Sensor readings, for example, are often noisy—some readings may be inaccurate due to the sensing mechanisms,
and some readings may be erroneous due to interference from surrounding transient objects. Clustering algorithms can
be sensitive to such noise and may produce poor-quality clusters. Therefore, we need clustering methods that are robust
to noise.
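A small sketch of why noise matters: a cluster center computed as a mean is pulled strongly toward a single erroneous reading, while a median barely moves. The sensor values below are made up for illustration, and comparing mean with median is an assumed way to show robustness, not a method taken from the notes.

```python
from statistics import mean, median

# Hypothetical temperature readings from one sensor cluster.
readings = [20.1, 19.8, 20.3, 20.0, 19.9]
# The same readings plus one erroneous value (an outlier).
noisy = readings + [95.0]

mean_shift = abs(mean(noisy) - mean(readings))
median_shift = abs(median(noisy) - median(readings))

print("shift of mean-based center:  ", round(mean_shift, 2))
print("shift of median-based center:", round(median_shift, 2))
```

The mean-based center moves by more than ten units, the median-based one by a small fraction of a unit. The outcome is similar for any method that summarizes a cluster by an average.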
Incremental clustering and insensitivity to input order: In many applications, incremental updates (representing
newer data) may arrive at any time. Some clustering algorithms cannot incorporate incremental updates into existing
clustering structures and, instead, have to recompute a new clustering from scratch. Clustering algorithms may also be
sensitive to the input data order. That is, given a set of data objects, clustering algorithms may return dramatically
different clusterings depending on the order in which the objects are presented. Incremental clustering algorithms and
algorithms that are insensitive to the input order are needed.
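The following sketch shows order sensitivity with a simple one-pass "leader" style procedure, which is also incremental because each new object is placed without recomputing earlier assignments. The procedure, the function name leader_clustering, and the threshold parameter are assumptions chosen for illustration; the notes do not prescribe a particular algorithm.

```python
def leader_clustering(values, threshold):
    """One-pass clustering: each value joins the first cluster whose leader
    (its first member) is within `threshold`; otherwise it starts a new cluster."""
    clusters = []
    for v in values:
        for cluster in clusters:
            if abs(v - cluster[0]) <= threshold:
                cluster.append(v)
                break
        else:
            # No existing leader is close enough, so v becomes a new leader.
            clusters.append([v])
    return clusters

# Same four objects, presented in two different orders.
print(leader_clustering([1, 3, 5, 7], threshold=2))  # [[1, 3], [5, 7]]
print(leader_clustering([3, 1, 5, 7], threshold=2))  # [[3, 1, 5], [7]]
```

Swapping only the first two objects changes which object becomes the leader, and with it the final grouping.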
Capability of clustering high-dimensionality data: A data set can contain numerous dimensions or attributes. When
clustering documents, for example, each keyword can be regarded as a dimension, and there are often thousands of
keywords. Most clustering algorithms are good at handling low-dimensional data such as data sets involving only two or
three dimensions. Finding clusters of data objects in a high dimensional space is challenging, especially considering that
such data can be very sparse and highly skewed.
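As a rough illustration of the document example, the sketch below turns three short texts into keyword-count vectors with one dimension per distinct word. The sample sentences and helper names are hypothetical. Real collections have thousands of keywords, but the sparsity pattern is already visible at this scale.

```python
# Hypothetical mini-corpus; each distinct word becomes one dimension.
docs = [
    "clustering groups similar objects",
    "web search returns ranked documents",
    "sensor readings can be noisy",
]

vocabulary = sorted({word for doc in docs for word in doc.split()})

def to_vector(doc):
    """Count how often each vocabulary term occurs in the document."""
    words = doc.split()
    return [words.count(term) for term in vocabulary]

vectors = [to_vector(d) for d in docs]
nonzero = sum(1 for vec in vectors for x in vec if x)
total = len(vectors) * len(vocabulary)

print("dimensions:", len(vocabulary))
print("fraction of nonzero entries:", nonzero / total)
```

Most entries in each vector are zero because a document uses only a small part of the vocabulary. The zero fraction grows as the vocabulary grows.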
Constraint-based clustering: Real-world applications may need to perform clustering under various kinds of
constraints. Suppose that your job is to choose the locations for a given number of new automatic teller machines (ATMs)
in a city. To decide upon this, you may cluster households while considering constraints such as the city’s rivers and
highway networks and the types and number of customers per cluster. A challenging task is to find data groups with
good clustering behavior that satisfy specified constraints.
Interpretability and usability: Users want clustering results to be interpretable, comprehensible, and usable. That is,
clustering may need to be tied in with specific semantic interpretations and applications. It is important to study how an
application goal may influence the selection of clustering features and clustering methods.
Cluster Analysis Applications
1. Business Intelligence
2. Image Recognition
3. Web Search
4. Biology
5. Security
Types of Data in Cluster Analysis
• Data Structures
• Interval-Valued (Numeric) Variables
• Binary Variables
• Categorical Variables
• Ordinal Variables
• Variables of Mixed Types
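The notes list these variable types without giving formulas. As a hedged sketch, the functions below implement commonly used textbook dissimilarity measures for binary, categorical, and ordinal variables. The function names and sample objects are assumptions, and other measures are possible for each type.

```python
def binary_dissimilarity(a, b, symmetric=True):
    """Dissimilarity of two 0/1 vectors from their contingency counts.
    Symmetric: (r + s) / (q + r + s + t); asymmetric ignores 0/0 matches."""
    q = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)  # both 1
    r = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)  # 1 in a only
    s = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)  # 1 in b only
    t = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)  # both 0
    denom = q + r + s + (t if symmetric else 0)
    return (r + s) / denom if denom else 0.0

def categorical_dissimilarity(a, b):
    """Fraction of attributes on which two objects disagree: (p - m) / p."""
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return (len(a) - matches) / len(a)

def ordinal_to_interval(rank, num_states):
    """Map a rank in 1..M onto [0, 1] so it can be treated as numeric."""
    return (rank - 1) / (num_states - 1)

# Hypothetical objects for illustration.
a = [1, 0, 1, 0, 0, 0]
b = [1, 0, 0, 0, 1, 0]
print(binary_dissimilarity(a, b))                   # symmetric: 2/6
print(binary_dissimilarity(a, b, symmetric=False))  # asymmetric: 2/3
print(categorical_dissimilarity(("red", "small", "round"),
                                ("red", "large", "round")))  # 1/3
print(ordinal_to_interval(2, 3))                    # 0.5
```

For variables of mixed types, one common approach is to bring each per-attribute dissimilarity onto the [0, 1] scale, as these functions do, and then average across attributes.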
Basic Clustering Methods
