
Data Mining

Topic Four - Part III

Data mining techniques
Eyob N. (PhD)
Topics
• Fundamental concepts and the need for business intelligence,
  data mining and its flavors, big data analysis
• BI/DM/BDA applications, DA models and frameworks
• Data and data warehousing
• Data mining techniques: association rule mining, classification,
  and cluster analysis
• Including web/text and opinion mining, big data and BI technologies,
  applications, and case studies
• Current trends in (big) data analytics and BI
What is Cluster Analysis?
• Cluster: a collection of data objects
  – Similar to one another within the same cluster
  – Dissimilar to the objects in other clusters
• Cluster analysis
  – Grouping a set of data objects into clusters
  – Clustering is unsupervised classification: no predefined classes
• Typical applications
  – As a stand-alone tool to get insight into data distribution
  – As a preprocessing step for other algorithms
Example: clustering
• The example below demonstrates the clustering of padlocks of the
  same kind. There are a total of 10 padlocks, which vary in color,
  size, shape, etc.
• How many possible clusters of padlocks can be identified?
  – There are three different kinds of padlocks, which can be
    grouped into three different clusters.
  – The padlocks of the same kind are clustered into a group as
    shown below.
Clustering

• Given a set of data points, each having a set of attributes, and a
  similarity measure among them, find clusters such that
  – Data points in one cluster are more similar to one another.
  – Data points in separate clusters are less similar to one another.
• Similarity/distance measures:
  – Euclidean distance if attributes are continuous.
  – Other problem-specific measures.
Clustering: Application 1
• Market segmentation:
  – Goal: subdivide a market into distinct subsets of customers, where
    any subset may conceivably be selected as a market target to be
    reached with a distinct marketing mix.
  – Approach:
    • Collect different attributes of customers based on their
      geographical and lifestyle-related information.
    • Find clusters of similar customers.
    • Measure the clustering quality by observing buying patterns of
      customers in the same cluster vs. those from different clusters.
Clustering: Application 2
• Document clustering:
  – Goal: to find groups of documents that are similar to each other
    based on the important terms appearing in them.
  – Approach: identify frequently occurring terms in each document,
    form a similarity measure based on the frequencies of different
    terms, and use it to cluster.
Clustering: Application 3
• Outlier detection
  – Clustering can also be used for outlier detection, where outliers
    (values that are "far away" from any cluster) may be more
    interesting than common cases.
  – Applications of outlier detection include the detection of credit
    card fraud.
What Is Good Clustering?
Quality
• A good clustering method will produce high-quality clusters with
  – high intra-class similarity
  – low inter-class similarity
• The quality of a clustering result depends on both the similarity
  measure used by the method and its implementation
• Key requirement of clustering: a good measure of similarity between
  instances is needed
• Implementation: the quality of a clustering method is also measured
  by its ability to discover some or all of the hidden patterns in the
  given datasets
Data formats in Cluster Analysis
Types of Data in Cluster Analysis
• Two formats: data matrix and dissimilarity matrix
• Data matrix (or object-by-variable structure)
  – Rows represent objects and columns represent attributes (two modes)
  – x_if is the value of the i-th object on the f-th attribute

        | x_11 ... x_1f ... x_1p |
        | ...  ... ...  ... ...  |
        | x_i1 ... x_if ... x_ip |
        | ...  ... ...  ... ...  |
        | x_n1 ... x_nf ... x_np |

• Dissimilarity matrix (or object-by-object structure)
  – Rows and columns both represent objects (one mode)
  – d(i, j) is the distance between the i-th object and the j-th object

        |   0                         |
        | d(2,1)   0                  |
        | d(3,1) d(3,2)   0           |
        |   :      :      :           |
        | d(n,1) d(n,2)  ...  ...   0 |
Type of data in clustering analysis
• The variables describing objects can have different data types
• These differences require appropriate distance computation logic for
  cluster analysis
• Some of the common types of variables are:
  – Interval-scaled variables
  – Binary variables
  – Nominal and ordinal variables
  – Mixed types
Interval-valued variables- and distance functions
• These are variables measured on a continuous scale, such as height,
  weight, and age
• Because the measurement unit affects the computed distances, we need
  preprocessing that removes the effect of the unit of measurement
• This is called standardization
• To standardize data, you may calculate a standardized measurement
  (z-score or min-max normalization), as sketched below
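
The sketch below illustrates both standardization options in Python. The column names and sample values are made up for demonstration, and the z-score here divides by the standard deviation (some texts use the mean absolute deviation instead).

  import numpy as np

  def z_score(x):
      """Standardize interval-scaled values to zero mean and unit spread."""
      x = np.asarray(x, dtype=float)
      return (x - x.mean()) / x.std()

  def min_max(x):
      """Rescale interval-scaled values to the [0, 1] range."""
      x = np.asarray(x, dtype=float)
      return (x - x.min()) / (x.max() - x.min())

  # Hypothetical height (cm) and weight (kg) columns with very different units.
  height = [150, 160, 170, 180, 190]
  weight = [50, 60, 70, 80, 90]
  print(z_score(height))   # both columns become comparable in scale
  print(min_max(weight))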
Similarity and Dissimilarity Between Objects
• Each clustering problem is based on some kind of "distance" or
  "nearness" measurement between data points.
• Distances are normally used to measure the similarity or dissimilarity
  between two data objects
• One popular family is the Minkowski distance:

  d(i, j) = ( |x_i1 - x_j1|^q + |x_i2 - x_j2|^q + ... + |x_ip - x_jp|^q )^(1/q)

  where i = (x_i1, x_i2, ..., x_ip) and j = (x_j1, x_j2, ..., x_jp) are two
  p-dimensional data objects, and q is a positive integer
• If q = 1, d is the Manhattan distance:

  d(i, j) = |x_i1 - x_j1| + |x_i2 - x_j2| + ... + |x_ip - x_jp|
Similarity and Dissimilarity Between Objects (Cont.)

• If q = 2, d is the Euclidean distance:

  d(i, j) = sqrt( |x_i1 - x_j1|^2 + |x_i2 - x_j2|^2 + ... + |x_ip - x_jp|^2 )

• Basic properties
  – d(i, j) >= 0
  – d(i, i) = 0
  – d(i, j) = d(j, i)
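
A minimal sketch of the Minkowski family above, assuming numpy is available; setting q = 1 gives the Manhattan distance and q = 2 the Euclidean distance. The sample points are made up.

  import numpy as np

  def minkowski(x, y, q=2):
      """Minkowski distance between two p-dimensional points;
      q = 1 is Manhattan, q = 2 is Euclidean."""
      x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
      return (np.abs(x - y) ** q).sum() ** (1.0 / q)

  i = [1, 2, 3]
  j = [4, 6, 3]
  print(minkowski(i, j, q=1))  # Manhattan: 3 + 4 + 0 = 7
  print(minkowski(i, j, q=2))  # Euclidean: sqrt(9 + 16 + 0) = 5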
Cosine similarity
• Measures the similarity between two vectors, say a document d_j and a
  query q (or another document), as the cosine of the angle between them:

  sim(d_j, q) = (d_j . q) / (|d_j| * |q|)
              = sum_{i=1..n} (w_ij * w_iq) /
                ( sqrt(sum_{i=1..n} w_ij^2) * sqrt(sum_{i=1..n} w_iq^2) )

• The denominator involves the lengths of the vectors
• So the cosine measure is also known as the normalized inner product
• Length of a vector: |d_j| = sqrt(sum_{i=1..n} w_ij^2)
Example: Computing Cosine Similarity
• Say we have a query vector Q = (0.4, 0.8) and a document vector
  D1 = (0.2, 0.7). Compute their similarity using the cosine measure.

  sim(Q, D1) = ( (0.4 * 0.2) + (0.8 * 0.7) ) /
               ( sqrt(0.4^2 + 0.8^2) * sqrt(0.2^2 + 0.7^2) )
             = 0.64 / ( sqrt(0.80) * sqrt(0.53) )
             = 0.64 / 0.65
             ≈ 0.98
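
A minimal sketch of the cosine measure, assuming numpy; it reproduces the Q/D1 example above.

  import numpy as np

  def cosine_similarity(a, b):
      """Cosine of the angle between two vectors: dot product over product of lengths."""
      a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
      return a.dot(b) / (np.linalg.norm(a) * np.linalg.norm(b))

  # The query/document pair from the slide above.
  Q  = [0.4, 0.8]
  D1 = [0.2, 0.7]
  print(round(cosine_similarity(Q, D1), 2))  # ~0.98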
Binary Variables
• A binary variable is a variable that has only two possible values
  (1 or 0, yes or no, etc.), for example smoker, educated, Ethiopian,
  IsFemale
• If all attributes of the objects are binary valued, we can construct a
  dissimilarity matrix from the given binary data
• If all the binary-valued attributes have the same weight, we can
  construct a 2-by-2 contingency table for any two objects i and j, as
  shown below
Binary Variables
• A contingency table for binary data comparing object i and object j:

                    Object j
                   1      0     sum
  Object i    1    a      b     a+b
              0    c      d     c+d
            sum   a+c    b+d     p

  where
  – a is the number of attributes with value 1 in both objects
  – b is the number of attributes with value 1 in object i and 0 in object j
  – c is the number of attributes with value 0 in object i and 1 in object j
  – d is the number of attributes with value 0 in both objects
  – a+b is the number of attributes with value 1 in object i
  – c+d is the number of attributes with value 0 in object i
  – a+c is the number of attributes with value 1 in object j
  – b+d is the number of attributes with value 0 in object j
  – p = a + b + c + d is the total number of attributes
Binary Variables – distance functions

• Hence, the distance between the two objects can be measured as follows:
• Simple matching coefficient: used when the two values of a binary
  attribute are equally relevant (symmetric), e.g., sex as female or male:

  d(i, j) = (b + c) / (a + b + c + d)

• Jaccard coefficient: used when the two values are not equally important
  (asymmetric), i.e., one outcome (coded 1) is considered more informative
  than the other (coded 0); the matches on the less informative value (d)
  are ignored:

  d(i, j) = (b + c) / (a + b + c)
Dissimilarity between Binary Variables
• Example

  Name  Gender  Fever  Cough  Test-1  Test-2  Test-3  Test-4
  Jack    M       Y      N      P       N       N       N
  Mary    F       Y      N      P       N       P       N
  Jim     M       Y      P      N       N       N       N

• M = Male (coded as 0), F = Female (coded as 1)
• Y = Yes (coded as 0), N = No (coded as 1)
• P = Positive, i.e., undesirable (coded as 0); N = Negative, i.e.,
  desirable (coded as 1)
• Gender is a symmetric attribute
• The remaining attributes are asymmetric binary
Dissimilarity between Binary Variables

• Contingency table between Jack and Mary (using the coded values):

  Name  Gender  Fever  Cough  Test-1  Test-2  Test-3  Test-4
  Jack    0       0      1      0       1       1       1
  Mary    1       0      1      0       1       0       1

                 Mary
                1     0    sum
  Jack    1     3     1     4
          0     1     2     3
        sum     4     3     7
Dissimilarity between Binary Variables

• Contingency table between Jack and Jim (using the coded values):

  Name  Gender  Fever  Cough  Test-1  Test-2  Test-3  Test-4
  Jack    0       0      1      0       1       1       1
  Jim     0       0      0      1       1       1       1

                  Jim
                1     0    sum
  Jack    1     3     1     4
          0     1     2     3
        sum     4     3     7
Dissimilarity between Binary Variables

• Contingency table between Jim and Mary (using the coded values):

  Name  Gender  Fever  Cough  Test-1  Test-2  Test-3  Test-4
  Mary    1       0      1      0       1       0       1
  Jim     0       0      0      1       1       1       1

                  Jim
                1     0    sum
  Mary    1     2     2     4
          0     2     1     3
        sum     4     3     7
Dissimilarity between Binary Variables

• Summary: contingency tables for each pair of objects

  Jack vs. Jim           Jack vs. Mary          Jim vs. Mary
  (rows: Jack)           (rows: Jack)           (rows: Mary)
       1  0  sum              1  0  sum              1  0  sum
   1   3  1   4           1   3  1   4           1   2  2   4
   0   1  2   3           0   1  2   3           0   2  1   3
  sum  4  3   7          sum  4  3   7          sum  4  3   7
Dissimilarity between Binary Variables

11
d ( jack , mary )   0.4
3 11
11
d ( jack , jim)   0.4
3 11
22
d ( jim, mary )   0.66
222
Nominal Variables
• A generalization of the binary variable in that it can take more than
  two states, e.g., red, yellow, blue, green
• Method 1: simple matching
  – m: number of matches, p: total number of variables

  d(i, j) = (p - m) / p

• Method 2: use a large number of binary variables
  – Create a new binary variable for each of the M nominal states
Variables of Mixed Types
• A database may contain different types of variables:
  – symmetric binary, asymmetric binary, nominal, ordinal, interval
• One may use a weighted formula to combine their effects, as sketched
  below
• Or preprocess the data so that it fits the requirements of the chosen
  technique
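
A minimal sketch of one way to combine per-attribute dissimilarities with weights, in the spirit of Gower's approach; the attribute list, weights, and helper name are illustrative, not a prescribed API.

  def mixed_dissimilarity(x, y, types, weights=None):
      """Weighted average of per-attribute dissimilarities for mixed-type records.
      types[k] is 'interval', 'binary', or 'nominal'; interval values are
      assumed to be already normalized to the [0, 1] range."""
      weights = weights or [1.0] * len(x)
      num, den = 0.0, 0.0
      for xk, yk, t, w in zip(x, y, types, weights):
          if t == 'interval':
              d = abs(xk - yk)            # normalized range assumed
          else:                           # binary or nominal: simple mismatch
              d = 0.0 if xk == yk else 1.0
          num += w * d
          den += w
      return num / den

  # Illustrative record: (normalized age, smoker flag, eye color).
  a = (0.2, 1, 'brown')
  b = (0.5, 0, 'brown')
  print(mixed_dissimilarity(a, b, ['interval', 'binary', 'nominal']))  # (0.3 + 1 + 0) / 3 ≈ 0.43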
Major Clustering approaches and
Algorithms
Major Clustering Approaches
• Partitioning clustering approach:
  – Construct various partitions and then evaluate them by some
    criterion, e.g., minimizing the sum of squared errors
  – Typical methods:
    • distance-based: k-means clustering
    • model-based: expectation maximization (EM) clustering
• Hierarchical clustering approach:
  – Create a hierarchical decomposition of the set of data (or objects)
    using some criterion
  – Typical methods:
    • agglomerative vs. divisive
    • single link vs. complete link
Partitioning clustering approach
• Partitioning method: construct a partition of a database D of n
  objects into a set of k clusters such that the sum of squared
  distances is minimized
• Given k, find a partition of k clusters that optimizes the chosen
  partitioning criterion
• Heuristic methods: k-means and k-medoids algorithms
  – k-means: each cluster is represented by the center of the cluster
  – k-medoids or PAM (Partitioning Around Medoids): each cluster is
    represented by one of the objects in the cluster
K-means Clustering
• One of the most common clustering methods, and it can be tailored
• Partitional clustering approach
• Each cluster is associated with a centroid (center point)
• Each point is assigned to the cluster with the closest centroid
• The number of clusters, K, must be specified
• The basic algorithm is simple
The K-Means Clustering Method
• Algorithm (given k):
  1. Select k points as the initial centroids (chosen randomly)
  2. Repeat
     – Assign each object to the cluster with the nearest centroid
       (seed point)
     – Recompute the centroid of each of the k clusters of the current
       partition (the centroid is the center, i.e., mean point, of the
       cluster)
  3. Until the centroids don't change
Cont…

• Initial centroids are often chosen randomly.
  – Clusters produced vary from one run to another.
• The centroid is (typically) the mean of the points in the cluster.
• 'Closeness' is measured by Euclidean distance, cosine similarity, etc.
• K-means will converge for the common similarity measures mentioned
  above.
  – Most of the convergence happens in the first few iterations.
  – Often the stopping condition is relaxed to 'until relatively few
    points change clusters'.
• Complexity is O(n * K * I * d), where
  n = number of points, K = number of clusters,
  I = number of iterations, d = number of attributes
Variations of the K-Means Method
• There are a few variants of k-means, which differ in
  – selection of the initial k means
  – dissimilarity calculations
  – strategies to calculate cluster means
• k-modes (Huang, 1998) handles categorical data by:
  – replacing means of clusters with modes
  – using suitable dissimilarity measures to deal with categorical
    objects
  – using a frequency-based method to update the modes of clusters
Cont…
• K-medoids clustering method
  – Find representative objects, called medoids, in clusters
  – PAM (Partitioning Around Medoids, 1987)
    • starts from an initial set of medoids and iteratively replaces one
      of the medoids by one of the non-medoids if it improves the total
      distance of the resulting clustering
    • PAM works effectively for small data sets, but does not scale well
      to large data sets (see the sketch below)
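
A naive, PAM-style sketch under the assumptions that distances are Euclidean and the data fits in memory; the function and variable names are illustrative, and no efficiency tricks from the original PAM are included.

  import numpy as np

  def pam(X, k, max_iter=100, seed=0):
      """Naive k-medoids: greedily swap a medoid with a non-medoid whenever
      the swap reduces the total distance of points to their nearest medoid.
      Suitable only for small data sets."""
      rng = np.random.default_rng(seed)
      X = np.asarray(X, dtype=float)
      n = len(X)
      D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
      medoids = list(rng.choice(n, size=k, replace=False))

      def total_cost(meds):
          return D[:, meds].min(axis=1).sum()

      cost = total_cost(medoids)
      for _ in range(max_iter):
          improved = False
          for mi in range(k):
              for h in range(n):
                  if h in medoids:
                      continue
                  candidate = medoids.copy()
                  candidate[mi] = h
                  c = total_cost(candidate)
                  if c < cost:
                      medoids, cost, improved = candidate, c, True
          if not improved:
              break
      labels = D[:, medoids].argmin(axis=1)
      return medoids, labels

  # Tiny illustrative run on made-up 2-D points.
  X = [[1, 1], [1.5, 2], [8, 8], [9, 8.5], [0.5, 1.2], [8.5, 9]]
  print(pam(X, k=2))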
Example: k-means clustering
• Cluster the following eight points (with (x, y) representing locations)
  into three clusters:
  A1(2, 10), A2(2, 5), A3(8, 4), A4(5, 8), A5(7, 5), A6(6, 4), A7(1, 2), A8(4, 9)
• Assume the initial cluster centers are A1(2, 10), A4(5, 8) and A7(1, 2).
• The distance function between two points a = (x1, y1) and b = (x2, y2)
  is defined as:
  dis(a, b) = |x2 - x1| + |y2 - y1|
• Use the k-means algorithm to find the centroids that group the given
  data into three clusters.
Iteration 1
First we list all points in the first column of the table below. The
initial cluster centers (centroids) are (2, 10), (5, 8) and (1, 2), as
chosen above.

              (2, 10)   (5, 8)   (1, 2)
  Point       Mean 1    Mean 2   Mean 3   Cluster
  A1 (2, 10)    0         5        9         1
  A2 (2, 5)     5         6        4         3
  A3 (8, 4)    12         7        9         2
  A4 (5, 8)     5         0       10         2
  A5 (7, 5)    10         5        9         2
  A6 (6, 4)    10         5        7         2
  A7 (1, 2)     9        10        0         3
  A8 (4, 9)     3         2       10         2

Next, we calculate the distance from each point to each of the three
centroids, using the distance function:
dis(point i, mean j) = |x2 - x1| + |y2 - y1|
Iteration 1
• Starting from point A1, calculate the distance to each of the three
  means using the distance function:
  dis(A1, mean1) = |2 - 2| + |10 - 10| = 0 + 0 = 0
  dis(A1, mean2) = |5 - 2| + |8 - 10| = 3 + 2 = 5
  dis(A1, mean3) = |1 - 2| + |2 - 10| = 1 + 8 = 9
• Fill these values in the table and decide which cluster the point
  (2, 10) should be placed in: the one where the point has the shortest
  distance to the mean, i.e., mean 1 (cluster 1), since the distance is 0.
• Next go to the second point A2 and calculate the distance:
  dis(A2, mean1) = |2 - 2| + |10 - 5| = 0 + 5 = 5
  dis(A2, mean2) = |5 - 2| + |8 - 5| = 3 + 3 = 6
  dis(A2, mean3) = |1 - 2| + |2 - 5| = 1 + 3 = 4
• So we fill these values in the table and assign the point (2, 5) to
  cluster 3, since mean 3 is at the shortest distance from A2.
• Analogously, we fill in the rest of the table and place each point in
  one of the clusters.
Iteration 1
• Next, we re-compute the new cluster centers (means) by taking the mean
  of all points in each cluster.
• For Cluster 1, we only have one point, A1(2, 10), which was the old
  mean, so the cluster center remains the same.
• For Cluster 2, we have five points and take their average as the new
  centroid, i.e.
  ( (8+5+7+6+4)/5, (4+8+5+4+9)/5 ) = (6, 6)
• For Cluster 3, we have two points. The new centroid is:
  ( (2+1)/2, (5+2)/2 ) = (1.5, 3.5)
• That was Iteration 1 (epoch 1).
• Next, we go to Iteration 2 (epoch 2), Iteration 3, and so on until the
  centroids do not change anymore.
• In Iteration 2, we basically repeat the process from Iteration 1, this
  time using the new means we computed.
Second epoch

• Using the new centroids, we recompute the cluster membership of every
  point.

              (2, 10)   (6, 6)   (1.5, 3.5)
  Point       Mean 1    Mean 2    Mean 3    Cluster
  A1 (2, 10)    0         8         7          1
  A2 (2, 5)     5         5         2          3
  A3 (8, 4)    12         4         7          2
  A4 (5, 8)     5         3         8          2
  A5 (7, 5)    ...       ...       ...         2
  A6 (6, 4)                                    2
  A7 (1, 2)                                    3
  A8 (4, 9)                                    1

• After the 2nd epoch the results are:
  cluster 1: {A1, A8} with new centroid (3, 9.5);
  cluster 2: {A3, A4, A5, A6} with new centroid (6.5, 5.25);
  cluster 3: {A2, A7} with new centroid (1.5, 3.5)
Third epoch
• Using the new centroids, we recompute the cluster membership of every
  point.

              (3, 9.5)  (6.5, 5.25)  (1.5, 3.5)
  Point       Mean 1     Mean 2       Mean 3    Cluster
  A1 (2, 10)   1.5        9.25          7          1
  A2 (2, 5)    5.5        4.75          2          3
  A3 (8, 4)                                        2
  A4 (5, 8)                                        1
  A5 (7, 5)                                        2
  A6 (6, 4)                                        2
  A7 (1, 2)                                        3
  A8 (4, 9)                                        1

• After the 3rd epoch the results are:
  cluster 1: {A1, A4, A8} with new centroid (3.66, 9);
  cluster 2: {A3, A5, A6} with new centroid (7, 4.33);
  cluster 3: {A2, A7} with new centroid (1.5, 3.5)
Fourth epoch
• Using the new centroids, we recompute the cluster membership of every
  point.

              (3.66, 9)  (7, 4.33)  (1.5, 3.5)
  Point       Mean 1      Mean 2     Mean 3    Cluster
  A1 (2, 10)   2.66       10.67        7          1
  A2 (2, 5)                                       3
  A3 (8, 4)                                       2
  A4 (5, 8)                                       1
  A5 (7, 5)                                       2
  A6 (6, 4)                                       2
  A7 (1, 2)                                       3
  A8 (4, 9)                                       1
Final results
• In the 4th epoch there is no change in cluster membership or centroids,
  so the algorithm stops.
• The final clusters are therefore cluster 1: {A1, A4, A8}, cluster 2:
  {A3, A5, A6}, and cluster 3: {A2, A7}, which can be verified with the
  sketch below.
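
A minimal k-means sketch that reproduces the worked example, using the Manhattan distance defined above and the coordinate-wise mean as the recomputed centroid; the function names are illustrative, and the sketch assumes no cluster ever becomes empty (which holds for this example).

  def manhattan(a, b):
      """Manhattan (city-block) distance used in the worked example."""
      return abs(a[0] - b[0]) + abs(a[1] - b[1])

  def kmeans(points, centroids, max_epochs=10):
      """Plain k-means: assign each point to the nearest centroid, then
      recompute the centroids as cluster means, until no change."""
      for _ in range(max_epochs):
          clusters = [[] for _ in centroids]
          for p in points:
              nearest = min(range(len(centroids)), key=lambda i: manhattan(p, centroids[i]))
              clusters[nearest].append(p)
          new_centroids = [
              (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
              for c in clusters
          ]
          if new_centroids == centroids:   # centroids unchanged: converged
              break
          centroids = new_centroids
      return centroids, clusters

  A = [(2, 10), (2, 5), (8, 4), (5, 8), (7, 5), (6, 4), (1, 2), (4, 9)]
  centroids, clusters = kmeans(A, [(2, 10), (5, 8), (1, 2)])
  print(centroids)   # approximately (3.67, 9), (7, 4.33), (1.5, 3.5)
  print(clusters)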
Comments on the K-Means Method
• Applicable only when the mean is defined; what about categorical data?
  – Use hierarchical clustering or other variations of k-means (e.g.,
    k-modes)
• Need to specify k, the number of clusters, in advance
• Unable to handle noisy data and outliers, since an object with an
  extremely large value may substantially distort the distribution of
  the data
Solutions to Initial Centroids Problem
o Multiple runs
o Helps, but probability may not be on your side
o Sample and use hierarchical clustering to determine initial centroids
o Select more than k initial centroids and then select among these initial
centroids
o Select most widely separated
o Updating centers incrementally
o Preprocessing and Postprocessing
o Bisecting K-means
o Not as susceptible to initialization issues
Updating Centers Incrementally
• In the basic k-means algorithm, centroids are updated after all points
  are assigned to a centroid
• An alternative is to update the centroids after each assignment
  (incremental approach)
  – More expensive
  – Introduces an order dependency
  – Never produces an empty cluster
  – Can use "weights" to change the impact
Pre-processing and Post-processing
• Pre-processing
  – Normalize the data
  – Eliminate outliers
• Post-processing
  – Eliminate small clusters that may represent outliers
  – Split 'loose' clusters, i.e., clusters with relatively high SSE
  – Merge clusters that are 'close' and that have relatively low SSE
Bisecting K-means
• Bisecting (dividing) k-means algorithm
  – A variant of k-means that can produce a partitional and/or a
    hierarchical clustering
  – Bisecting k-means is like a combination of k-means and hierarchical
    clustering
Cont…
• Basic bisecting k-means algorithm for finding K clusters:
  1. Pick a cluster to split.
  2. Find 2 sub-clusters using the basic k-means algorithm (bisecting
     step).
  3. Repeat step 2, the bisecting step, ITER times and take the split
     that produces the clustering with the highest overall similarity.
  4. Repeat steps 1, 2 and 3 until the desired number of clusters is
     reached.
• The critical part is which cluster to choose for splitting. There are
  different ways to proceed; for example, you can choose the biggest
  cluster, the cluster with the worst quality, or a combination of both,
  as in the sketch below.
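
A minimal sketch of bisecting k-means, assuming numpy and scikit-learn are available; it chooses the cluster with the largest SSE as the one to split, which is one of the options mentioned above. The function name and parameters are illustrative.

  import numpy as np
  from sklearn.cluster import KMeans

  def bisecting_kmeans(X, k, n_trials=5, seed=0):
      """Repeatedly split the cluster with the largest SSE using 2-means
      until k clusters are obtained. Returns a label for every point."""
      X = np.asarray(X, dtype=float)
      clusters = [np.arange(len(X))]            # start with all points in one cluster
      while len(clusters) < k:
          sse = [((X[idx] - X[idx].mean(axis=0)) ** 2).sum() for idx in clusters]
          worst = clusters.pop(int(np.argmax(sse)))   # cluster to bisect
          best_labels, best_inertia = None, np.inf
          for t in range(n_trials):             # keep the best of several 2-way splits
              km = KMeans(n_clusters=2, n_init=1, random_state=seed + t).fit(X[worst])
              if km.inertia_ < best_inertia:
                  best_labels, best_inertia = km.labels_, km.inertia_
          clusters.append(worst[best_labels == 0])
          clusters.append(worst[best_labels == 1])
      labels = np.empty(len(X), dtype=int)
      for c_id, idx in enumerate(clusters):
          labels[idx] = c_id
      return labels

  # Illustrative run on made-up 2-D points.
  X = [[1, 1], [1.2, 1.1], [5, 5], [5.1, 4.9], [9, 1], [9.2, 1.1]]
  print(bisecting_kmeans(X, k=3))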
Hierarchical clustering approach
• Produces a set of nested clusters organized as a hierarchical tree
• Can be visualized as a dendrogram: a tree-like diagram that records the
  sequences of merges or splits

  [Figure: a dendrogram over six points (heights roughly 0 to 0.2), and a
  diagram of five objects a, b, c, d, e merged step by step
  (a, b -> ab; d, e -> de; c, de -> cde; ab, cde -> abcde) in the
  agglomerative direction (step 0 to step 4) and split in the reverse
  order in the divisive direction (step 4 to step 0)]
Two main types of hierarchical clustering

• Two main types: agglomerative and divisive
• Agglomerative: a bottom-up clustering technique
  – Start with all sample units in n clusters of size 1.
  – Then, at each step of the algorithm, the pair of clusters with the
    shortest distance is combined into a single cluster.
  – The algorithm stops when all sample units are combined into a single
    cluster of size n.
Agglomerative Clustering Algorithm
• More popular hierarchical clustering technique
• Basic algorithm is straightforward
1. Let each data point be a cluster
2. Compute the proximity matrix
3. Repeat
4. Merge the two closest clusters
5. Update the proximity matrix
6. Until only a single cluster remains
Key operation is the computation of the proximity of two clusters
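
A minimal sketch of agglomerative clustering, assuming scipy is available; single link (shortest distance between clusters) is used here as one choice of proximity, and the sample points are made up.

  import numpy as np
  from scipy.cluster.hierarchy import linkage, fcluster

  # Made-up 2-D points for illustration.
  X = np.array([[1, 1], [1.5, 1.2], [5, 5], [5.2, 4.8], [9, 9]])

  # Agglomerative clustering: 'single' uses the shortest distance between
  # clusters as the proximity (single link); 'complete' would use the largest.
  Z = linkage(X, method='single', metric='euclidean')

  # Z records the sequence of merges (a dendrogram in matrix form); cutting it
  # at a chosen number of clusters gives flat cluster labels.
  labels = fcluster(Z, t=3, criterion='maxclust')
  print(labels)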
DIANA (Divisive Analysis)
• Divisive: a top-down clustering technique
  – Start with all sample units in a single cluster of size n.
  – Then, at each step of the algorithm, a cluster is partitioned into a
    pair of daughter clusters, selected to maximize the distance between
    the daughters.
  – The algorithm stops when the sample units are partitioned into n
    clusters of size 1.
• Introduced in Kaufmann and Rousseeuw (1990)
• Thus it works in the inverse order of AGNES
Strengths of Hierarchical Clustering

• You do not have to assume any particular number of clusters
  – Any desired number of clusters can be obtained by 'cutting' the
    dendrogram at the proper level
• The clusters may correspond to meaningful taxonomies
  – Examples in the biological sciences (e.g., animal kingdom, phylogeny
    reconstruction, ...)
Cluster Validity/Evaluation
Cluster Validity
o For supervised classification we have a variety of measures to evaluate
  how good our model is
  o Accuracy, precision, recall
o For cluster analysis (unsupervised), the analogous question is how to
  evaluate the "goodness" of the resulting clusters
o But "clusters are in the eye of the beholder"!
o Then why do we want to evaluate them?
  o To avoid finding patterns in noise
  o To compare clustering algorithms
  o To compare two sets of clusters
  o To compare two clusters
Measures of Cluster Validity
o Numerical measures that are applied to judge various aspects of cluster
  validity are classified into the following three types:
  o External index: used to measure the extent to which cluster labels
    match externally supplied class labels.
    o Entropy
  o Internal index: used to measure the goodness of a clustering
    structure without respect to external information.
    o Sum of Squared Error (SSE)
  o Relative index: used to compare two different clusterings or
    clusters.
    o Often an external or internal index is used for this purpose,
      e.g., SSE or entropy
o Sometimes these are referred to as criteria instead of indices
  o However, sometimes "criterion" is the general strategy and "index" is
    the numerical measure that implements the criterion.
Internal Index - SSE
o The most common measure is the Sum of Squared Error (SSE)
  o For each point, the error is the distance to the representative point
    of its cluster (the nearest cluster center)
  o To get the SSE, we square these errors and sum them:

    SSE = sum_{i=1..K} sum_{x in C_i} dist^2(m_i, x)

  o x is a data point in cluster C_i and m_i is the representative point
    for cluster C_i
    o One can show that m_i corresponds to the center (mean) of the
      cluster
o Given two clusterings, we can choose the one with the smallest error
o One easy way to reduce SSE is to increase K, the number of clusters
  o But do not forget that a good clustering with smaller K can have a
    lower SSE than a poor clustering with higher K
Internal Measures: SSE
o Internal index: used to measure the goodness of a clustering structure
  without respect to external information
  o SSE
o SSE is good for comparing two clusterings or two clusters (average SSE)
o SSE can also be used to estimate the number of clusters, as sketched
  below

  [Figure: plot of SSE versus the number of clusters K (K from 2 to 30)]
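
A minimal sketch of using SSE to estimate the number of clusters, assuming numpy and scikit-learn; KMeans.inertia_ is the sum of squared distances of points to their closest cluster center, i.e., the SSE above. The data set is made up, with three well-separated blobs.

  import numpy as np
  from sklearn.cluster import KMeans

  # Made-up data: three well-separated blobs in 2-D.
  rng = np.random.default_rng(0)
  X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2)) for c in [(0, 0), (5, 5), (0, 5)]])

  # Compute the SSE for a range of K, then look for the point where the
  # curve flattens out (the "elbow") as a hint for a suitable K.
  for k in range(1, 8):
      km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
      print(k, round(km.inertia_, 1))   # SSE drops sharply until K reaches the true number of blobs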
Review questions
o What makes clustering challenging?
o What is good clustering?
o Explain SSE.
o Describe the basic agglomerative clustering algorithm.
o Explain the key concepts in clustering.
Review questions
o What is the key issue in clustering and what makes it challenging?
o How do you know that a given clustering activity is good?
o How does SSE work?
o What does unsupervised learning mean?
o Describe the basic agglomerative clustering algorithm.
o Explain the data formats used in clustering.
Thank you
