Unit-4 Notes

Clustering is the process of partitioning data objects into subsets or clusters based on their similarities. Various clustering methods include partitioning, hierarchical, density-based, and grid-based approaches, each with distinct characteristics and applications. Outliers are data points that deviate significantly from others, and algorithms like K-Means and PAM (K-Medoids) are used for clustering, with hierarchical methods providing a structured approach to grouping data.

Unit-4

1a) Define Clustering?


Ans: Clustering is the process of partitioning a set of data objects (or observations) into
subsets. Each subset is a cluster, such that objects within a cluster are similar to one another
and dissimilar to objects in other clusters.
1b) Explain the applications of cluster analysis.

Ans: Applications of Cluster Analysis:

• It is widely used in image processing, data analysis, and pattern recognition.
• It helps marketers find distinct groups in their customer base and characterize those
groups by their purchasing patterns.
• It is used in biology to derive plant and animal taxonomies and to identify genes with
similar capabilities.
• It supports information discovery by grouping similar documents on the web.

1c) Explain the different clustering methods.


Ans: Clustering Methods:
i) Partitioning methods:
• Given a set of n objects, a partitioning method constructs k partitions of the data, where
each partition represents a cluster and k ≤ n.
• It divides the data into k groups such that each group contains at least one object and
each object belongs to exactly one group.
• Most partitioning methods are distance-based. Given k, the number of partitions to
construct, a partitioning method creates an initial partitioning.
• It then uses an iterative relocation technique that attempts to improve the partitioning
by moving objects from one group to another.

ii) Hierarchical methods:

• A hierarchical method creates a hierarchical decomposition of the given set of data
objects.
• A hierarchical method can be classified as either agglomerative or divisive.
• The agglomerative approach, also called the bottom-up approach, starts with
each object forming a separate group. It successively merges the objects or groups
close to one another, until all the groups are merged into one.
• The divisive approach, also called the top-down approach, starts with all the
objects in the same cluster. In each successive iteration, a cluster is split into smaller
clusters, until eventually each object is in its own cluster.

iii) Density-based methods:

• This method is based on the notion of density.
• The general idea is to continue growing a given cluster as long as the density (number
of objects or data points) in the "neighborhood" exceeds some threshold.
• For example, for each data point within a given cluster, the neighborhood of a given
radius has to contain at least a minimum number of points.

iv) Grid-based methods:

• In this method, the object space is quantized into a finite number of cells that form a
grid structure, and the clustering operations are performed on the cells.
• The main advantage of this approach is its fast processing time, which is typically
independent of the number of data objects and dependent only on the number of cells
in each dimension of the quantized space.

2a) Define outlier?


Ans: An outlier is a data object that deviates significantly from the rest of the data objects and
behaves in a different manner.
2b) Describe K-Means Additional issues?
Ans: Additional issues with K-Means:
• The k-means algorithm is sensitive to outliers, because an object with an extremely
large value may substantially distort the distribution of the data.
• This effect is particularly exacerbated by the use of the squared-error function.

2c) Illustrate K-mean algorithm with an example.


Ans: K-Means Algorithm:
Input:
K: the number of clusters into which the dataset is to be divided
D: a dataset containing N objects
Output:
A set of K clusters
Method:
1. Randomly select K objects from the dataset D as the initial cluster centres C.
2. (Re)assign each object to the cluster whose mean it is most similar to.
3. Update the cluster means, i.e., recalculate the mean of each cluster with the updated
assignments.
4. Repeat steps 2 and 3 until no change occurs.
Example: Suppose we want to group the visitors to a website using just their age as
follows: 16, 16, 17, 20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66
Initial clusters:
K = 2
Centroid C1 = 16
Centroid C2 = 22
Iteration 1:
C1: [16, 16, 17], mean = 16.33, so C1 = 16.33
C2: [20, 20, 21, 21, 22, 23, 29, 36, 41, 42, 43, 44, 45, 61, 62, 66], mean = 37.25, so C2 = 37.25
Iteration 2:
C1: [16, 16, 17, 20, 20, 21, 21, 22, 23], mean = 19.55, so C1 = 19.55
C2: [29, 36, 41, 42, 43, 44, 45, 61, 62, 66], mean = 46.90, so C2 = 46.90
Iteration 3:
C1: [16, 16, 17, 20, 20, 21, 21, 22, 23, 29], mean = 20.50, so C1 = 20.50
C2: [36, 41, 42, 43, 44, 45, 61, 62, 66], mean = 48.89, so C2 = 48.89
Iteration 4:
C1: [16, 16, 17, 20, 20, 21, 21, 22, 23, 29], mean = 20.50, so C1 = 20.50
C2: [36, 41, 42, 43, 44, 45, 61, 62, 66], mean = 48.89, so C2 = 48.89
• There is no change between iterations 3 and 4, so we stop.
Therefore the K-Means algorithm yields the two clusters (16-29) and (36-66).
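
This computation can be reproduced with a short Python sketch (a hand-written one-dimensional k-means; the variable names are my own):

```python
ages = [16, 16, 17, 20, 20, 21, 21, 22, 23, 29,
        36, 41, 42, 43, 44, 45, 61, 62, 66]
centroids = [16.0, 22.0]  # the initial centroids C1 and C2 chosen above

while True:
    # Assignment step: each age goes to the nearest centroid.
    clusters = [[], []]
    for age in ages:
        nearest = min(range(2), key=lambda i: abs(age - centroids[i]))
        clusters[nearest].append(age)
    # Update step: recompute each centroid as the mean of its cluster.
    new_centroids = [sum(c) / len(c) for c in clusters]
    if new_centroids == centroids:  # no change -> converged, stop
        break
    centroids = new_centroids

print(centroids)  # approximately [20.5, 48.89], as in iterations 3 and 4
print(clusters)   # the clusters (16-29) and (36-66)
```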

3a) Explain the drawbacks of single-linkage clustering?


Ans: Drawbacks of single-linkage clustering:
Single linkage often suffers from chaining: only a single pair of points needs to be
close in order to merge two clusters. As a result, clusters can become too spread out and
not compact enough.
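
The chaining effect is easy to demonstrate (a sketch using SciPy; the bridge-shaped data is invented for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Two tight blobs joined by a sparse "bridge" of points.
blob_a = [(x, y) for x in (0, 1) for y in (0, 1)]
blob_b = [(x + 10, y) for (x, y) in blob_a]
bridge = [(2.5 + 1.5 * i, 0.5) for i in range(5)]
X = np.array(blob_a + bridge + blob_b)

# Single linkage chains the blobs together through the bridge:
# everything collapses into one spread-out cluster.
single = fcluster(linkage(X, "single"), t=2.0, criterion="distance")
print(len(set(single)))    # 1 cluster -> chaining

# Complete linkage uses the farthest pair of points, so the bridge
# cannot pull the two distant blobs into one cluster.
complete = fcluster(linkage(X, "complete"), t=2.0, criterion="distance")
print(len(set(complete)))  # several clusters; the blobs stay separate
```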

3b) List out all partitioning methods for clustering data.


Ans: There are many algorithms that come under the partitioning method; some of the
popular ones are K-Means, PAM (K-Medoids), and CLARA (Clustering LARge
Applications).
3c) Give a brief note on PAM(K-Medoids) Algorithm with example.
Ans: PAM (K-Medoids) Algorithm:

1. Initialize: select k random points out of the n data points as the medoids.
2. Associate each data point with the closest medoid, using any common distance
metric.
3. While the cost decreases: for each medoid m and for each data point o which is not
a medoid:
• Swap m and o, associate each data point with the closest medoid, and
recompute the cost.
• If the total cost is more than that in the previous step, undo the swap.

Example:
Step 1: Select k = 2, and let the two randomly chosen medoids be C1 = (4, 5)
and C2 = (8, 5).

Step 2: Calculate the cost. The dissimilarity of each non-medoid point from the
medoids is calculated using the Manhattan distance and tabulated:

Distance = |X1 − X2| + |Y1 − Y2|


Point   X   Y   Dissimilarity from C1   Dissimilarity from C2   Assign
0       8   7   6                       2                       C2
1       3   7   3                       7                       C1
2       4   9   4                       8                       C1
3       9   6   6                       2                       C2
4       8   5   -                       -                       (medoid C2)
5       5   8   4                       6                       C1
6       7   3   5                       3                       C2
7       8   4   5                       1                       C2
8       7   5   3                       1                       C2
9       4   5   -                       -                       (medoid C1)

• Each point is assigned to the cluster of the medoid from which its dissimilarity is
smaller.
• Points 1, 2, and 5 go to cluster C1; points 0, 3, 6, 7, and 8 go to cluster C2.
• Cost = (3 + 4 + 4) + (3 + 1 + 1 + 2 + 2) = 20
Step 3:
• Randomly select one non-medoid point and recalculate the cost.
• Let the randomly selected point be (8, 4).
• The dissimilarity of each non-medoid point from the medoids C1 = (4, 5)
and C2 = (8, 4) is calculated and tabulated.
• Points 1, 2, and 5 again go to cluster C1, and points 0, 3, 6, 7, and 8 go to cluster C2.
• New cost = (3 + 4 + 4) + (2 + 2 + 1 + 3 + 3) = 22
• Swap cost = new cost − previous cost = 22 − 20 = 2, and 2 > 0.
• Since the swap cost is not less than zero, we undo the swap.
• Hence (4, 5) and (8, 5) remain the final medoids.
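
The cost comparison above can be verified with a short Python sketch (the points are taken from the table):

```python
# Points indexed 0-9, exactly as in the table above.
points = [(8, 7), (3, 7), (4, 9), (9, 6), (8, 5),
          (5, 8), (7, 3), (8, 4), (7, 5), (4, 5)]

def manhattan(p, q):
    # Distance = |X1 - X2| + |Y1 - Y2|
    return abs(p[0] - q[0]) + abs(p[1] - q[1])

def total_cost(medoids):
    # Each non-medoid point contributes its distance to its closest medoid.
    return sum(min(manhattan(p, m) for m in medoids)
               for p in points if p not in medoids)

print(total_cost([(4, 5), (8, 5)]))  # 20 -> the original medoids
print(total_cost([(4, 5), (8, 4)]))  # 22 -> after the candidate swap
# Swap cost = 22 - 20 = 2 > 0, so the swap is undone.
```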

4a) Discuss the two approaches used to improve the quality of hierarchical clustering.

Ans: The two approaches that are used to improve the quality of hierarchical clustering:

• Perform careful analysis of object linkages at each hierarchical partitioning.
• Integrate hierarchical agglomeration with other clustering techniques: first use a
hierarchical agglomerative algorithm to group objects into micro-clusters, and then
perform macro-clustering on the micro-clusters.

4b) Explain Hierarchical clustering.


Ans: Hierarchical clustering:
• A hierarchical method creates a hierarchical decomposition of the given set of data
objects.
• A hierarchical method can be classified as either agglomerative or divisive.
• The agglomerative approach, also called the bottom-up approach, starts with
each object forming a separate group. It successively merges the objects or groups
close to one another, until all the groups are merged into one.
• The divisive approach, also called the top-down approach, starts with all the
objects in the same cluster. In each successive iteration, a cluster is split into smaller
clusters, until eventually each object is in its own cluster.

4c) Discuss hierarchical methods for clustering and contrast agglomerative and divisive
approaches.
Ans: Hierarchical clustering:
• A hierarchical method creates a hierarchical decomposition of the given set of data
objects.
• A hierarchical method can be classified as either agglomerative or divisive.

i) Agglomerative hierarchical clustering (AGNES):

The agglomerative approach, also called the bottom-up approach, starts with
each object forming a separate group. It successively merges the objects or
groups close to one another, until all the groups are merged into one.

Algorithm for agglomerative hierarchical clustering:

1. Consider every data point as an individual cluster.
2. Calculate the similarity of each cluster with all the other clusters (i.e., compute
the proximity matrix).
3. Merge the clusters that are most similar or closest to each other.
4. Recalculate the proximity matrix for the new set of clusters.
5. Repeat steps 3 and 4 until only a single cluster remains.

Let's say we have five data points a, b, c, d, e.

• A tree structure called a dendrogram is commonly used to represent the process of
hierarchical clustering. It shows how objects are grouped together step by step.
• In the dendrogram for the five objects (figure not shown), level l = 0 shows the five
objects as singleton clusters.
• At l = 1, clusters a and b are grouped together to form the new cluster [ab].
• At l = 2, clusters d and e are grouped together to form the new cluster [de].
• At l = 3, clusters [de] and c are grouped together to form the new cluster [cde].
• At l = 4, clusters [ab] and [cde] are grouped together to form the new cluster [abcde].
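
This merge sequence can be reproduced with SciPy (a sketch; the coordinates for a-e are invented so that the merge order matches the dendrogram described above):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

labels = ["a", "b", "c", "d", "e"]
# Hypothetical 1-D coordinates chosen so that the merges happen in the
# order ab -> de -> cde -> abcde.
X = np.array([[0.0], [0.5], [5.0], [8.0], [9.0]])

Z = linkage(X, method="single")  # agglomerative (bottom-up) clustering
dendrogram(Z, labels=labels)     # the tree of step-by-step merges
plt.show()
```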

ii) Divisive hierarchical clustering (DIANA):

The divisive approach, also called the top-down approach, starts with all the
objects in the same cluster. In each successive iteration, a cluster is split into
smaller clusters, until eventually each object is in its own cluster.

For the same five data points a, b, c, d, e, the divisive dendrogram mirrors the
agglomerative one read top-down: [abcde] is first split, and the splitting
continues until each object stands alone.


4d) Discuss the merits and demerits of hierarchical approaches for clustering
Ans: Merits of hierarchical approaches for clustering:
• It is simple to implement and gives the best output in some cases.
• It is easy to use and results in a hierarchy, a structure that contains more information
than a flat partition.
• It does not require the number of clusters to be specified in advance.

Demerits of hierarchical approaches for clustering:

• It tends to break large clusters.
• It has difficulty handling clusters of different sizes and convex shapes.
• It is sensitive to noise and outliers.
• Once a merge or split step has been performed, it can never be undone.

5a) Define categorical variable?

Ans: A categorical variable is a generalization of the binary variable in that it can take on more
than two states. For example, map_color is a categorical variable that may have, say, five states:
red, yellow, green, pink, and blue.
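
Dissimilarity between objects described by categorical variables is commonly computed by simple matching, d(i, j) = (p − m) / p, where p is the number of variables and m is the number of matches. A minimal sketch (the objects are invented):

```python
def categorical_dissimilarity(obj_i, obj_j):
    # d(i, j) = (p - m) / p, where p is the number of categorical
    # variables and m is the number of variables on which i and j match.
    p = len(obj_i)
    m = sum(a == b for a, b in zip(obj_i, obj_j))
    return (p - m) / p

# Two objects described by (map_color, shape, size) -- hypothetical values.
print(categorical_dissimilarity(("red", "round", "small"),
                                ("red", "square", "small")))  # 1/3
```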

5b) Compare agglomerative and divisive methods.

Ans: The agglomerative approach, also called the bottom-up approach, starts with each
object forming a separate group. It successively merges the objects or groups close to one
another, until all the groups are merged into one.
The divisive approach, also called the top-down approach, starts with all the objects in the
same cluster. In each successive iteration, a cluster is split into smaller clusters, until eventually
each object is in its own cluster.

5c) Explain about Density Based(DBSCAN) method is used for clustering?


Ans: Density-based (DBSCAN) method:
• DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a density-
based clustering algorithm.
• The algorithm grows regions with sufficiently high density into clusters.
• The basic ideas of density-based clustering involve a number of new definitions:
• The neighborhood within a radius ε of a given object is called the ε-neighborhood of
the object.
• If the ε-neighborhood of an object contains at least a minimum number of
objects (MinPts), then the object is called a core object.
• Given a set of objects D, we say that an object p is directly density-reachable from
object q if p is within the ε-neighborhood of q, and q is a core object.

DBSCAN method:
• DBSCAN searches for clusters by checking the ε-neighborhood of each point in the
database.
• If the ε-neighborhood of a point p contains more than MinPts points, a new cluster with
p as a core object is created.
• DBSCAN then iteratively collects directly density-reachable objects from these core
objects, which may involve merging a few density-reachable clusters.
• The process terminates when no new point can be added to any cluster.

Example:

Consider a set of labeled points m, p, o, q, r, s (figure not shown), a given ε represented
by the radius of the circles, and, say, MinPts = 3. Based on the above definitions:
• Of the labeled points, m, p, and o are core objects because each is in an ε-neighborhood
containing at least three points.
• q is directly density-reachable from m; m is directly density-reachable from p and vice
versa.
• q is (indirectly) density-reachable from p, because q is directly density-reachable from
m and m is directly density-reachable from p. However, p is not density-reachable from
q, because q is not a core object. Similarly, r and s are density-reachable from o.
• o, r, and s are all density-connected.
• A density-based cluster is a set of density-connected objects that is maximal with
respect to density-reachability. Every object not contained in any cluster is considered
to be noise.
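
A minimal sketch using scikit-learn's DBSCAN (the points and parameter values are invented for illustration):

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Hypothetical 2-D points: two dense regions plus one isolated point.
X = np.array([[1, 1], [1, 2], [2, 1], [2, 2],   # dense region 1
              [8, 8], [8, 9], [9, 8], [9, 9],   # dense region 2
              [5, 15]])                          # sparse -> noise

# eps is the neighborhood radius; min_samples plays the role of MinPts.
db = DBSCAN(eps=2.0, min_samples=3).fit(X)
print(db.labels_)  # [0 0 0 0 1 1 1 1 -1]; the label -1 marks noise
```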

5d) Discuss the drawbacks of the k-means algorithm. How can we modify the
algorithm to diminish these problems?
Ans: The drawbacks of the k-means algorithm:

• It is difficult to choose the number of clusters, k.
• It cannot be used with arbitrary distance measures.
• It is sensitive to scaling and requires careful preprocessing.
• It does not produce the same result every time.
• It is sensitive to outliers (squared errors emphasize outliers).
• Cluster sizes can be quite unbalanced (e.g., one-element outlier clusters).

Modifying the algorithm to diminish these problems:

• The k-means clustering algorithm can be significantly improved by using a better
initialization technique and by repeating (re-starting) the algorithm.
• When the data has overlapping clusters, k-means can improve the results of the
initialization technique.
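
For example, scikit-learn's KMeans combines both ideas: k-means++ initialization plus repeated restarts (a sketch reusing the age data from question 2c):

```python
import numpy as np
from sklearn.cluster import KMeans

# The age data from question 2c.
ages = np.array([16, 16, 17, 20, 20, 21, 21, 22, 23, 29,
                 36, 41, 42, 43, 44, 45, 61, 62, 66]).reshape(-1, 1)

# init="k-means++" spreads the initial centroids apart, and n_init=10
# restarts the algorithm 10 times, keeping the lowest-error result.
km = KMeans(n_clusters=2, init="k-means++", n_init=10).fit(ages)
print(km.cluster_centers_.ravel())  # ~20.5 and ~48.89 (order may vary)
```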

6a) Define interval-scaled variables?

Ans: Interval-scaled variables are continuous measurements on a roughly linear scale.
Typical examples include weight and height, latitude and longitude coordinates, and
weather temperature.
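
Before computing distances, interval-scaled variables are usually standardized so that the choice of unit does not dominate, e.g. z_if = (x_if − m_f) / s_f, where s_f is the mean absolute deviation. A minimal sketch (the heights are invented):

```python
import numpy as np

heights_cm = np.array([150.0, 160.0, 170.0, 180.0, 190.0])

m = heights_cm.mean()                # the mean value m_f
s = np.mean(np.abs(heights_cm - m))  # the mean absolute deviation s_f
z = (heights_cm - m) / s             # standardized (unit-free) measurements
print(z)  # the same z values result whether heights are in cm or inches
```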

6b) Differentiate between clustering and classification

Ans:

Classification | Clustering
A specific label is provided to the machine to classify new observations; the machine needs proper training and testing for label verification. | Grouping is done on the basis of similarities; no labels are provided.
Supervised learning approach. | Unsupervised learning approach.
It uses a training dataset. | It does not use a training dataset.
It uses algorithms to categorize the new data as per the observations of the training set. | It uses statistical concepts in which the data set is divided into subsets with the same features.
It is more complex as compared to clustering. | It is less complex as compared to classification.

6c) Explain the grid-based methods.

Ans:
• In this method, the object space is quantized into a finite number of cells that form a
grid structure, and the clustering operations are performed on the cells.
• The main advantage of this approach is its fast processing time, which is typically
independent of the number of data objects and dependent only on the number of cells
in each dimension of the quantized space.

A grid-based clustering algorithm consists of the following five basic steps (a code
sketch follows the list):

• Creating the grid structure, i.e., partitioning the data space into a finite number of
cells.
• Calculating the cell density for each cell.
• Sorting the cells according to their densities.
• Identifying cluster centers.
• Traversing neighbor cells.
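
A hedged sketch of the first steps (grid creation, cell density, sorting) using NumPy; the data and grid size are invented:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-D data: a dense patch near (2, 2) plus uniform noise.
X = np.vstack([rng.normal(2.0, 0.3, size=(50, 2)),
               rng.uniform(0.0, 10.0, size=(50, 2))])

# Steps 1-2: quantize the space into a 5x5 grid and count points per cell.
counts, xedges, yedges = np.histogram2d(X[:, 0], X[:, 1], bins=5,
                                        range=[[0, 10], [0, 10]])

# Steps 3-4: sort the cells by density; the densest cells act as
# candidate cluster centers.
order = np.argsort(counts, axis=None)[::-1]
rows, cols = np.unravel_index(order, counts.shape)
print(list(zip(rows[:3], cols[:3])))  # indices of the three densest cells
```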

Grid-based methods:

• STING (a STatistical INformation Grid approach)
• CLIQUE (Clustering In Quest)
• WaveCluster

1) STING (a STatistical INformation Grid approach):

• STING is used to cluster spatial databases and can facilitate several kinds of spatial
queries.
• The spatial area is divided into rectangular cells, which are represented by a hierarchical
structure.
• Let the root of the hierarchy be at level 1, its children at level 2, and so on.
• The number of layers can be varied by changing the number of cells that form a
higher-level cell.
• A cell at level i corresponds to the union of the areas of its children at level i + 1.
• In the STING algorithm, each cell has 4 children and each child corresponds to one
quadrant of the parent cell.
2) CLIQUE (Clustering In Quest):

• The CLIQUE algorithm first divides the data space into grids by dividing each
dimension into equal intervals called units.
• It then identifies dense units; a unit is dense if the number of data points in it exceeds
a threshold value.
• Once the algorithm finds dense cells along one dimension, it tries to find dense cells
along two dimensions, and it continues until all dense cells across all dimensions are
found.
• After finding all dense cells in all dimensions, the algorithm proceeds to find the largest
set ("cluster") of connected dense cells.
3) WaveCluster:

• A wavelet transform is a signal-processing technique that decomposes a signal into
multiple frequency subbands.
• The wavelet model can be applied to n-dimensional signals by applying a one-
dimensional wavelet transform n times.
• In applying a wavelet transform, the data are transformed so as to preserve the relative
distances among objects at different levels of resolution.
• This enables the natural clusters in the data to become more distinguishable.
• Clusters can then be recognized by searching for dense regions in the transformed
domain.

(Figure: sample of a two-dimensional feature space; not shown.)

6d) Compare the performance of various outlier detection approaches
Ans: An outlier is a data object that deviates significantly from the rest of the data objects and
behaves in a different manner.

The analysis of outlier data is referred to as outlier analysis or outlier mining.

Outliers are of three types:
1. Global (or Point) Outliers
2. Collective Outliers
3. Contextual (or Conditional) Outliers

1) Global (or Point) Outliers:
• A data point is considered a global outlier if its value is far outside the entirety of the
data set in which it is found.
• A global outlier is a measured sample point that has a very high or a very low value
relative to all the values in the dataset.
• For example, if 9 out of 10 points have values between 20 and 30, but the 10th point
has a value of 85, the 10th point may be a global outlier.
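
A minimal sketch of flagging such a point with a z-score test (the values are invented to match the 20-30 example):

```python
import numpy as np

values = np.array([20, 22, 23, 24, 25, 26, 27, 28, 30, 85.0])

# A point whose value lies more than 2 standard deviations from the
# mean is flagged as a global outlier.
z = (values - values.mean()) / values.std()
print(values[np.abs(z) > 2])  # [85.] -> the global outlier
```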

2) Contextual (or Conditional) Outliers:

• If an individual data point is anomalous in a specific context or condition (but not
otherwise), it is termed a contextual outlier. The attributes of the data objects are
divided into two groups:
1. Contextual attributes: define the context, e.g., time and location.
2. Behavioral attributes: characteristics of the object used in outlier evaluation, e.g.,
temperature.
• Contextual outliers are hard to spot without background information. If you had no
idea that the values were temperatures recorded in summer, an unusual reading might
be considered a valid data point.
Example: a temperature of 30 °C is normal in summer but would be an outlier in winter;
the date (a contextual attribute) determines whether the reading is anomalous.

3) Collective Outliers:

• If a collection of data points is anomalous with respect to the entire data set, it is
termed a collective outlier.
• A subset of data points is a collective outlier if the values, as a collection, deviate
remarkably from the entire data set, even though the individual data points may not
be outliers in either a contextual or a global sense.
Example: a single network request is unremarkable, but a sudden burst of identical
requests from many machines at once may form a collective outlier, e.g., signaling a
denial-of-service attack.
