
Clustering

This document summarizes and compares different clustering methods for uncertain data using R. It begins by defining clustering and uncertain data. It then discusses clustering algorithms like partitioning, hierarchical, density-based, and grid-based methods. The document focuses on hierarchical clustering in R, providing code to generate dendrograms and comparing single, complete, average, and centroid linkage methods using sample European country data. It aims to evaluate these techniques for clustering uncertain data.

Uploaded by

Nakib Aman Turzo

Relative Comparison of Different Clustering Methods for Uncertain Data Using R

Presented By
Nakib Aman
Roll no: 110131
Computer Science and Engineering Department, PUST

Clustering
Clustering is the process of grouping a set of data objects into multiple groups, or clusters, so that objects within a cluster are highly similar to one another but very dissimilar to objects in other clusters.
Much of the history of cluster analysis is concerned with developing algorithms that were not too computationally intensive, since early computers were not nearly as powerful as they are today.

Uncertain Data
Data whose values carry some specific uncertainty is called uncertain data. Uncertainty in data arises naturally from a variety of real-world phenomena, such as implicit randomness in the process of data generation or acquisition.

Uncertain Database
An uncertain database DB is defined by a set of uncertain objects DB = {O_1, ..., O_|DB|} spanning a (potentially infinite) set of possible worlds W, together with a constructive generation rule G for drawing possible worlds from W in an unbiased way.
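
One way to picture this definition is to represent each uncertain object by a few alternative values and to draw a possible world by picking one alternative per object. The sketch below assumes exactly that model: objects are independent, alternatives are equally likely, and the names `uncertain_db` and `draw_world` as well as all numeric values are made up for illustration and are not part of the original slides.

```r
# Each uncertain object is a matrix of alternative 2-D positions
# (rows = equally likely alternatives). Values are illustrative only.
uncertain_db <- list(
  O1 = rbind(c(1.0, 1.0), c(1.2, 0.9), c(0.8, 1.1)),
  O2 = rbind(c(5.0, 5.0), c(5.3, 4.8)),
  O3 = rbind(c(1.1, 4.9), c(0.9, 5.2), c(1.0, 5.0), c(1.3, 5.1))
)

# Draw one possible world: pick one alternative per object, uniformly at
# random and independently of the other objects (an assumed generation rule G).
draw_world <- function(db) {
  t(sapply(db, function(obj) obj[sample.int(nrow(obj), 1), ]))
}

set.seed(1)
world <- draw_world(uncertain_db)
print(world)

# Under this model the number of possible worlds is the product of the
# number of alternatives per object.
n_worlds <- prod(sapply(uncertain_db, nrow))
print(n_worlds)
```

Any ordinary clustering algorithm can then be run on a single drawn world, since a world is just a regular data matrix with one row per object.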

R Programming Language
R is highly extensible through user-submitted packages for specific functions or specific areas of study.
Due to its S heritage, R has stronger object-oriented programming facilities than most statistical computing languages.
* R version 3.1.3 (Smooth Sidewalk) was released on 2015-03-09.

RStudio
RStudio is an integrated development environment (IDE) for R. It includes a console and a syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.
Version 0.98.1102 will be used for demonstration.

R Cluster library
The R cluster package provides a modern alternative to k-means clustering known as PAM, an acronym for Partitioning Around Medoids. The cluster package was last updated to version 2.0.1 on February 19, 2015.
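
PAM is available as `pam()` in the cluster package. The sketch below is a minimal illustration on synthetic points, not code from the original slides; the data, the seed and the choice of k = 2 are assumptions. Unlike k-means centers, the medoids returned by PAM are always actual observations from the data.

```r
library(cluster)  # provides pam(); a recommended package in standard R installs

# Two groups of synthetic 2-D points plus one extreme point.
set.seed(42)
x <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  c(30, 30)
)

# Partition around k = 2 medoids.
fit <- pam(x, k = 2)

print(fit$medoids)            # coordinates of the two medoids
print(fit$id.med)             # row indices of the medoids in x
print(table(fit$clustering))  # cluster sizes
```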

Overview of Clustering Methods


Method: General Characteristics

Partitioning methods
-Find mutually exclusive clusters of spherical shape.
-Distance based.
-Effective for small-to-medium size data sets.

Hierarchical methods
-Clustering is a hierarchical decomposition.
-May incorporate other techniques like microclustering.

Density based methods
-Can find arbitrarily shaped clusters.
-May filter out outliers.
-Clusters are dense regions of objects in space that are separated by low-density regions.

Grid based methods
-Use a multiresolution grid data structure.

Hierarchical Methods of Data Clustering
A hierarchical method creates a hierarchical decomposition of the given set of data objects.
Hierarchical methods can be distance based, density based or continuity based. Various extensions of hierarchical methods also consider clustering in subspaces.
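
The idea of a hierarchical decomposition can be seen on a tiny example. The sketch below uses five hand-picked points on a line (illustrative values, not from the slides) and single linkage; it prints the sequence of merges and then cuts the same tree at two levels to show that the resulting partitions are nested.

```r
# Five points on a line; values chosen by hand for illustration.
pts <- matrix(c(1, 2, 6, 7, 15), ncol = 1,
              dimnames = list(c("a", "b", "c", "d", "e"), NULL))

# Agglomerative clustering: start from singletons and repeatedly merge
# the two closest clusters (single linkage = nearest-neighbour distance).
hc <- hclust(dist(pts), method = "single")

print(hc$merge)   # one row per merge; negative entries are single points
print(hc$height)  # distance at which each merge happened

# Cutting the same tree at different levels yields nested partitions.
k2 <- cutree(hc, k = 2)
k3 <- cutree(hc, k = 3)
print(rbind(k2, k3))
```

With these values the merges happen at heights 1, 1, 4 and 8: first a with b and c with d, then the two pairs, and finally the isolated point e.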

Cluster Dendrogram

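R code for generating Cluster Dendrogram
# Load the data set; the first column holds the country names.
europe <- read.csv("G:/Thesis/R codes/europe.csv")
europe
# Cluster on the numeric columns only (Euclidean distance, complete linkage by default).
euroclust <- hclust(dist(europe[-1]))
plot(euroclust, labels = europe$Country)
# Outline the five clusters obtained by cutting the tree.
rect.hclust(euroclust, k = 5)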

Sample Data Used for Clustering
Europe.csv
Data file used for clustering

The data is taken from the CIA World Factbook and gives some information about 28 European countries.

Comparison of Different Hierarchical Techniques

Single
hclust(dist(europe[-1]), method="single")

Complete
hclust(dist(europe[-1]), method="complete")

Average
hclust(dist(europe[-1]), method="average")

Centroid
hclust(dist(europe[-1]), method="centroid")
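
One possible way to put the four linkage methods side by side numerically is the cophenetic correlation, which measures how well the merge heights of a tree reproduce the original pairwise distances. The sketch below is an assumption about how such a comparison could be done and is not taken from the slides. Because europe.csv is not available here, it uses the built-in `USArrests` data set as a stand-in. It follows the slides in passing plain Euclidean distances to every method, although centroid linkage is usually applied to squared Euclidean distances.

```r
# USArrests is a built-in R data set, used here only as a stand-in
# for the europe.csv file from the slides.
d <- dist(scale(USArrests))

methods <- c("single", "complete", "average", "centroid")

# Cophenetic correlation per linkage method: correlation between the
# original distances and the tree-implied (cophenetic) distances.
coph <- sapply(methods, function(m) {
  hc <- hclust(d, method = m)
  cor(d, cophenetic(hc))
})
print(round(coph, 3))
```

A higher value suggests that the dendrogram preserves the original distances more faithfully, but it is only one criterion and does not by itself say which clustering is more useful.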
