Assignment No: 2: Aim: Objective
Objective: To study:
1. Concept of Clustering.
2. Single-pass clustering Algorithm.
3. Measure of Association.
Theory:
Clustering:-
Sometimes several classifications may naturally be associated with each other, having
many concepts in common. These classifications form a cluster. The clustering may be based
on physical location.
The goal of clustering is to determine the intrinsic grouping in a set of unlabeled data.
But how do we decide what constitutes a good clustering? It can be shown that there is no
absolute “best” criterion that is independent of the final aim of the clustering.
Consequently, it is the user who must supply this criterion, in such a way that the result of
the clustering suits their needs. For instance, a user could be interested in finding
representatives for homogeneous groups (data reduction), in finding “natural clusters” and
describing their unknown properties (“natural” data types), in finding useful and suitable
groupings (“useful” data classes), or in finding unusual data objects (outlier detection).
Clustering Requirements:-
The main requirements that a clustering algorithm should satisfy are:
• Scalability;
• Dealing with different types of attributes;
• Discovering clusters with arbitrary shape;
• Minimal requirements for domain knowledge to determine input parameters;
• Ability to deal with noise and outliers;
• Insensitivity to order of input records;
• Ability to handle high dimensionality;
• Interpretability and usability.
Single-pass methods:-
Single-pass Algorithm:-
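The idea can be illustrated with a minimal Python sketch of one common textbook formulation: the first object starts the first cluster, and each later object is compared with every existing cluster; it joins the most similar cluster if that similarity reaches the threshold, otherwise it starts a new cluster. The function names (dice, single_pass), the use of the Dice coefficient, the choice of average similarity to a cluster's members as the object-to-cluster score, and the sample attribute sets are all assumptions made for illustration; they are not taken from this assignment.

```python
def dice(x, y):
    """Dice coefficient between two attribute sets (assumed measure)."""
    total = len(x) + len(y)
    return 2 * len(x & y) / total if total else 0.0


def single_pass(objects, threshold, similarity=dice):
    """Cluster objects in one pass over the input.

    Returns a list of clusters, each a list of indices into `objects`.
    Object-to-cluster similarity is the average similarity to the
    cluster's current members (an illustrative assumption).
    """
    clusters = []
    for i, obj in enumerate(objects):
        best_cluster, best_score = None, -1.0
        for cluster in clusters:
            score = sum(similarity(obj, objects[j]) for j in cluster) / len(cluster)
            if score > best_score:
                best_cluster, best_score = cluster, score
        if best_cluster is not None and best_score >= threshold:
            best_cluster.append(i)   # similar enough: join the best cluster
        else:
            clusters.append([i])     # otherwise start a new cluster
    return clusters


# Hypothetical attribute sets, used only to exercise the sketch.
sample = [{"a", "b"}, {"a", "b", "c"}, {"x", "y"}, {"x", "y", "z"}, {"a", "b", "d"}]
print(single_pass(sample, threshold=0.59))  # [[0, 1, 4], [2, 3]]
```

Because each object is examined only once, the result of a single-pass method generally depends on the order in which the objects are presented, so it does not meet the "insensitivity to order of input records" requirement listed above.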
Measures of Association:-
Association is the similarity between objects characterized by discrete-state
attributes. A measure of similarity or association is designed to quantify the likeness between
objects, so that an object in a group is more like the other members of the group than it is
like any object outside the group. When this holds, a cluster method enables such a group
structure to be discovered.
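As an illustration, the sketch below computes a few widely used association measures for objects described by sets of discrete attributes: simple matching, the Dice coefficient, the Jaccard coefficient, and the cosine coefficient. The assignment does not say which measure it uses, so the choice of measures, the function names, and the sample sets here are assumptions.

```python
import math


def simple_matching(x, y):
    """Number of attributes the two objects share."""
    return len(x & y)


def dice(x, y):
    """Dice coefficient: 2|X n Y| / (|X| + |Y|)."""
    total = len(x) + len(y)
    return 2 * len(x & y) / total if total else 0.0


def jaccard(x, y):
    """Jaccard coefficient: |X n Y| / |X u Y|."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def cosine(x, y):
    """Cosine coefficient for sets: |X n Y| / sqrt(|X| * |Y|)."""
    if not x or not y:
        return 0.0
    return len(x & y) / math.sqrt(len(x) * len(y))


# Hypothetical objects, each described by a set of attributes.
p = {"a", "b", "c"}
q = {"b", "c", "d"}
print(simple_matching(p, q))  # 2
print(jaccard(p, q))          # 0.5
```

All except simple matching are normalized to the range 0 to 1, which makes them convenient to compare against a fixed threshold.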
Example:-
Objects {1, 2, 3, 4, 5, 6}
Threshold: 0.59
Clusters are:-
Conclusion:- Thus, we have implemented the single-pass algorithm for clustering.
FAQs:-
1. What is clustering?
Ans:-
Sometimes several classifications may naturally be associated with each other, having
many concepts in common. These classifications form a cluster. The clustering may be based
on physical location.
In functional clustering, classifications are centered around functions. In data
clustering, classifications are centered around data, while in object-based clustering,
classifications are centered around objects. The object-based paradigm uses the concept of
class hierarchies to naturally express clustering.