Clustering
Presented by
Nakib Aman
Roll no: 110131
Computer Science and Engineering Department, PUST
Clustering
Clustering is the process of grouping a
set of data objects into multiple groups
or clusters so that objects within a cluster
have high similarity but are very
dissimilar to objects in other clusters.
Much of the history of cluster analysis is
concerned with developing algorithms
that were not too computationally intensive,
since early computers were not nearly as
powerful as they are today.
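Here is a minimal sketch of this definition in R. It uses base R's kmeans() on the built-in iris measurements. The data and the choice k = 3 are illustrative assumptions and do not come from the slides. k-means splits the total sum of squares into two parts. The within-cluster part is kept small, so objects in a cluster are similar. The between-cluster part is large when clusters differ from each other.

```r
# Illustrative only: iris and k = 3 are assumptions, not part of the original slides.
set.seed(42)                      # k-means starts from random centres
x  <- as.matrix(iris[, 1:4])      # four numeric measurements per flower
km <- kmeans(x, centers = 3, nstart = 10)

# Within-cluster spread: small means objects in a cluster are similar
km$tot.withinss
# Between-cluster spread: large means clusters are dissimilar to each other
km$betweenss
# The two parts add up to the total sum of squares
all.equal(km$totss, km$tot.withinss + km$betweenss)

table(km$cluster)                 # cluster sizes
```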
Uncertain Data
Data that carry a specified uncertainty, for example
a probability distribution over possible values,
are called uncertain data. Uncertainty in data
arises naturally from a variety of real-world
phenomena, such as implicit randomness in the
process of data generation or acquisition.
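One common way to make uncertain data concrete is to replace each measured value with a set of possible instances. These instances are drawn from an assumed error model. The sketch below is hypothetical: the three objects, the Gaussian noise with standard deviation 0.3, and the 100 samples per object are all illustrative assumptions.

```r
# Hypothetical example: the true values are unknown; we only observe
# measurements disturbed by acquisition noise (Gaussian, sd = 0.3, assumed).
set.seed(1)
observed <- matrix(c(1.0, 1.2,
                     4.8, 5.1,
                     5.2, 4.9), ncol = 2, byrow = TRUE,
                   dimnames = list(c("O1", "O2", "O3"), c("x", "y")))
noise_sd  <- 0.3
n_samples <- 100

# Represent each uncertain object by n_samples possible instances
uncertain <- lapply(seq_len(nrow(observed)), function(i) {
  cbind(x = rnorm(n_samples, observed[i, "x"], noise_sd),
        y = rnorm(n_samples, observed[i, "y"], noise_sd))
})
names(uncertain) <- rownames(observed)

# Sample means stay close to the measurements, but each object now has a spread
sapply(uncertain, colMeans)
sapply(uncertain, function(s) apply(s, 2, sd))
```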
Uncertain Database
An uncertain database DB is defined by a set of
uncertain objects $DB = \{O_1, \ldots, O_{|DB|}\}$,
spanning a (potentially infinite) set of possible
worlds $W$, together with a constructive generation
rule $G$ that draws possible worlds from $W$ in an
unbiased way.
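The possible-worlds semantics can be sketched as follows. As a simplification, assume each uncertain object is given as a small set of equally likely alternative instances. The hypothetical function draw_world() plays the role of the generation rule G. It picks one instance per object, independently and uniformly, so possible worlds are drawn without bias. Each sampled world is an ordinary certain dataset and can be clustered as usual. Here we count how often two objects end up in the same cluster. The database, the number of worlds and the use of hclust() with k = 2 are all illustrative choices.

```r
# Hypothetical uncertain database: each object O_i has equally likely alternatives
# (rows = alternative instances, columns = attributes).
DB <- list(
  O1 = rbind(c(0.0, 0.0), c(0.5, 0.2)),
  O2 = rbind(c(0.3, 0.1), c(3.0, 3.0)),
  O3 = rbind(c(3.2, 2.9), c(3.1, 3.3)),
  O4 = rbind(c(2.8, 3.1), c(0.2, 0.4))
)

# Illustrative generation rule G: pick one alternative per object uniformly at random
draw_world <- function(db) {
  t(sapply(db, function(alts) alts[sample.int(nrow(alts), 1), ]))
}

set.seed(7)
n_worlds <- 200
same_cluster <- 0
for (w in seq_len(n_worlds)) {
  world <- draw_world(DB)                    # one certain dataset (4 x 2 matrix)
  cl <- cutree(hclust(dist(world)), k = 2)   # cluster this possible world
  same_cluster <- same_cluster + (cl["O1"] == cl["O2"])
}
same_cluster / n_worlds   # estimated probability that O1 and O2 share a cluster
```

In this toy database, O1 and O2 share a cluster exactly when O2's first alternative is drawn, so the estimate should come out near 0.5.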
R Programming Language
R is highly extensible through the use
of user-submitted packages for specific
functions or specific areas of study.
Due to its S heritage, R has stronger
object-oriented programming facilities
than most statistical computing
languages.
* R version 3.1.3 ("Smooth Sidewalk") was
released on 2015-03-09.
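The sketch below illustrates the S-style object orientation mentioned above using R's S3 system. In S3, a class is just an attribute, and a generic function dispatches on it via UseMethod(). The class name clusterResult, its fields and the generic nClusters() are hypothetical and were chosen only for this example.

```r
# Constructor for a hypothetical S3 class "clusterResult"
clusterResult <- function(labels, method) {
  structure(list(labels = labels, method = method), class = "clusterResult")
}

# A new generic function that dispatches on the class of its argument
nClusters <- function(x, ...) UseMethod("nClusters")
nClusters.clusterResult <- function(x, ...) length(unique(x$labels))

# A method for the existing print() generic
print.clusterResult <- function(x, ...) {
  cat("Clustering by", x$method, "with", nClusters(x), "clusters\n")
  invisible(x)
}

res <- clusterResult(labels = c(1, 1, 2, 3, 3), method = "pam")
res             # auto-printing dispatches to print.clusterResult
nClusters(res)  # 3
```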
RStudio
RStudio is an integrated development
environment (IDE) for R. It includes a
console, a syntax-highlighting editor that
supports direct code execution, and tools
for plotting, history, debugging and
workspace management.
Version 0.98.1102 will be used for the
demonstration.
R Cluster library
The R cluster package provides PAM
(Partitioning Around Medoids), an
alternative to k-means clustering that
represents each cluster by an actual data
object (a medoid) rather than by a mean.
The cluster package was last updated to
version 2.0.1 on February 19, 2015.
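Here is a minimal PAM example using the cluster package's pam() function. It uses the built-in iris measurements as stand-in data; the data and k = 3 are our choices, not the slides'. Each PAM cluster is represented by a medoid, which is an actual observation. Because of this, PAM only needs pairwise dissimilarities, and pam() also accepts a dist object directly.

```r
library(cluster)                    # recommended package, shipped with standard R builds

x   <- iris[, 1:4]                  # illustrative data, not from the slides
fit <- pam(x, k = 3)                # Partitioning Around Medoids, Euclidean metric by default

fit$id.med                          # row indices of the three medoids
fit$medoids                         # the medoids themselves are real observations
table(fit$clustering, iris$Species) # compare clusters with the known species

# pam() also accepts a precomputed dissimilarity object, e.g. Manhattan distances
fit_d <- pam(dist(x, method = "manhattan"), k = 3)
fit_d$id.med
```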
General Characteristics of Clustering Methods
Partitioning methods
Hierarchical methods
- Clustering is a hierarchical decomposition.
- May incorporate other techniques like microclustering.
Density-based methods
Grid-based methods
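To make "hierarchical decomposition" concrete: a single hclust() run produces a whole nested family of partitions, and cutree() reads off any level of it. The sketch uses the built-in USArrests data as an assumed example. The final check confirms that each 4-cluster level only splits the 2-cluster groups and never mixes them.

```r
d  <- dist(scale(USArrests))           # Euclidean distances on standardised data
hc <- hclust(d, method = "complete")   # one agglomerative run builds the whole hierarchy

# Cut the same tree at several levels (k = 2, 3, 4 clusters), one column per k
parts <- cutree(hc, k = 2:4)
head(parts)

# Nesting check: every cluster at k = 4 lies inside a single cluster at k = 2
nested <- all(tapply(parts[, 1], parts[, 3], function(g) length(unique(g)) == 1))
nested
```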
Cluster Dendrogram
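"Cluster Dendrogram" is also the default title that plot() gives an hclust object in R. The europe data used in the calls on the next slide is not available here, so this sketch substitutes the built-in USArrests data. The tree's merge history is stored in the returned object. The drawing step is guarded so the script also runs non-interactively; rect.hclust() with k = 4 is an illustrative extra.

```r
hc <- hclust(dist(scale(USArrests)), method = "complete")  # USArrests stands in for europe

# Each of the n - 1 rows of hc$merge records one agglomeration step:
# negative entries are single observations, positive ones refer to earlier merges.
head(hc$merge)
head(hc$height)    # dissimilarity at which each merge happened

# Draw the dendrogram only in an interactive session
if (interactive()) {
  plot(hc, main = "Cluster Dendrogram", xlab = "", sub = "")
  rect.hclust(hc, k = 4, border = "red")   # outline a 4-cluster cut
}
```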
Comparison of Different Hierarchical Techniques
Single: hclust(dist(europe), method = "single")
Complete: hclust(dist(europe), method = "complete")
Average: hclust(dist(europe), method = "average")
Centroid: hclust(dist(europe), method = "centroid")
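The four linkage methods can be compared side by side in one loop. Because europe is not available here, the sketch again uses scaled USArrests as a stand-in. The examples on R's hclust help page pair the centroid method with squared Euclidean distances, dist(x)^2, and the loop does the same for "centroid". This deviates from the slide's call. For a rough numeric comparison, the loop computes two things. The cophenetic correlation measures how well the tree's merge heights reproduce the input dissimilarities. The cluster sizes come from cutting each tree into 4 groups.

```r
x <- scale(USArrests)                         # stand-in for the slides' europe data
d <- dist(x)

linkages <- c("single", "complete", "average", "centroid")
trees <- lapply(linkages, function(m) {
  # centroid linkage is applied to squared Euclidean distances
  dm <- if (m == "centroid") d^2 else d
  hclust(dm, method = m)
})
names(trees) <- linkages

# Cophenetic correlation: agreement between tree heights and input dissimilarities
coph <- sapply(linkages, function(m) {
  dm <- if (m == "centroid") d^2 else d
  cor(cophenetic(trees[[m]]), dm)
})
round(coph, 3)

# Cluster sizes when each tree is cut into 4 groups (one column per linkage)
sizes <- sapply(trees, function(h) table(factor(cutree(h, k = 4), levels = 1:4)))
sizes

# Optional: four dendrograms in a 2 x 2 grid, as on the slide
if (interactive()) {
  op <- par(mfrow = c(2, 2))
  for (m in linkages) plot(trees[[m]], main = m, xlab = "", sub = "")
  par(op)
}
```

Single linkage tends to produce one large cluster plus a few tiny ones (chaining), which the size table makes easy to spot.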