CLARA CLARANS Example

A presentation on CLARA and CLARANS

Uploaded by

tripathbikram

CLARA and CLARANS in Data Mining

Problem Setup:
We have a dataset of 10 points in 2D space, and we need to partition them into 2 clusters.

Here is the dataset:


| Point | X | Y |
|-------|----|----|
| P1 | 2 | 10 |
| P2 | 2 | 5 |
| P3 | 8 | 4 |
| P4 | 5 | 8 |
| P5 | 7 | 5 |
| P6 | 6 | 4 |
| P7 | 1 | 2 |
| P8 | 4 | 9 |
| P9 | 6 | 2 |
| P10 | 3 | 6 |
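The dataset and the Manhattan distance used throughout this example can be set up in a few lines of Python (a sketch; the names `points` and `manhattan` are our own, chosen to match the table above):

```python
# The 10-point dataset from the table above.
points = {
    "P1": (2, 10), "P2": (2, 5), "P3": (8, 4), "P4": (5, 8), "P5": (7, 5),
    "P6": (6, 4),  "P7": (1, 2), "P8": (4, 9), "P9": (6, 2), "P10": (3, 6),
}

def manhattan(a, b):
    """Manhattan (L1) distance between two 2D points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

print(manhattan(points["P1"], points["P4"]))  # 5
```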

CLARA (Clustering Large Applications) Example

1. Step 1: Subset Sampling


CLARA works by drawing multiple random samples (subsets) from the dataset, each of
size s, and then applying PAM (Partitioning Around Medoids) to each sample.

For simplicity, we take a single sample of s = 5 points:


- P1 (2, 10)
- P4 (5, 8)
- P6 (6, 4)
- P7 (1, 2)
- P9 (6, 2)
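Step 1's random sampling can be sketched with Python's `random.sample` (the seed is an arbitrary assumption, used only for reproducibility; the hand-picked subset above is just one such draw):

```python
import random

# All 10 point labels from the dataset table.
labels = ["P1", "P2", "P3", "P4", "P5", "P6", "P7", "P8", "P9", "P10"]

random.seed(42)                    # arbitrary seed (assumption)
subset = random.sample(labels, 5)  # one CLARA sample of size s = 5
print(subset)
```

A real CLARA run would repeat this draw several times, as Step 3 describes.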

2. Step 2: Apply PAM to Subset


We calculate the distance matrix between the points using the Manhattan distance:

|        | P1  | P4  | P6  | P7  | P9  |
|--------|-----|-----|-----|-----|-----|
| **P1** | 0   | 5   | 10  | 9   | 12  |
| **P4** | 5   | 0   | 5   | 10  | 7   |
| **P6** | 10  | 5   | 0   | 7   | 2   |
| **P7** | 9   | 10  | 7   | 0   | 5   |
| **P9** | 12  | 7   | 2   | 5   | 0   |
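Hand-computed distance tables are easy to get wrong, so it is worth recomputing the matrix programmatically (a sketch; `points` and `manhattan` are our own names for the subset and metric):

```python
# The 5-point subset chosen in Step 1.
points = {"P1": (2, 10), "P4": (5, 8), "P6": (6, 4), "P7": (1, 2), "P9": (6, 2)}

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

ids = list(points)
# Full pairwise Manhattan distance matrix, keyed by (row, column).
matrix = {(i, j): manhattan(points[i], points[j]) for i in ids for j in ids}

print(matrix[("P1", "P9")])  # 12
print(matrix[("P6", "P7")])  # 7
```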

Using PAM on the subset, we identify the medoids. Suppose we pick P4 and P9 as the
initial medoids and assign each remaining point to its closest medoid:
- P1 → P4
- P6 → P9
- P7 → P9

The clusters are:


- Cluster 1: P1, P4
- Cluster 2: P6, P7, P9
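The assignment step above can be sketched as follows (a sketch of PAM's assignment phase only, not its full swap search; the medoids are fixed to P4 and P9 as in the example):

```python
# The 5-point subset and the medoids chosen in the example.
points = {"P1": (2, 10), "P4": (5, 8), "P6": (6, 4), "P7": (1, 2), "P9": (6, 2)}
medoids = ["P4", "P9"]

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

clusters = {m: [m] for m in medoids}  # each medoid belongs to its own cluster
cost = 0
for p, xy in points.items():
    if p in medoids:
        continue
    nearest = min(medoids, key=lambda m: manhattan(xy, points[m]))
    clusters[nearest].append(p)
    cost += manhattan(xy, points[nearest])

print(clusters)  # {'P4': ['P4', 'P1'], 'P9': ['P9', 'P6', 'P7']}
print(cost)      # 12  (= 5 + 2 + 5)
```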

3. Step 3: Repeat with Multiple Subsets


CLARA repeats the sampling and clustering several times. The final clustering uses the
medoid set with the lowest overall cost (the sum of distances from each point to its
nearest medoid), evaluated on the full dataset rather than only on the sample.
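Steps 1–3 together can be sketched as a small end-to-end CLARA loop (a sketch: the function names, seed, and parameters are our own assumptions, and exhaustive search over a 5-point sample stands in for PAM's swap heuristic):

```python
import random
from itertools import combinations

points = {
    "P1": (2, 10), "P2": (2, 5), "P3": (8, 4), "P4": (5, 8), "P5": (7, 5),
    "P6": (6, 4),  "P7": (1, 2), "P8": (4, 9), "P9": (6, 2), "P10": (3, 6),
}

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def cost(medoids, ids):
    # Sum of distances from each point in `ids` to its nearest medoid.
    return sum(min(manhattan(points[p], points[m]) for m in medoids) for p in ids)

def pam(ids, k):
    # Exhaustive best k-medoid set; cheap on a 5-point sample, and it stands
    # in for PAM's iterative swap heuristic (an assumption of this sketch).
    return min(combinations(ids, k), key=lambda ms: cost(ms, ids))

def clara(k=2, sample_size=5, n_samples=5, seed=0):
    random.seed(seed)  # arbitrary seed (assumption)
    best, best_cost = None, float("inf")
    for _ in range(n_samples):
        sample = random.sample(list(points), sample_size)
        medoids = pam(sample, k)
        c = cost(medoids, points)  # score each candidate on the FULL dataset
        if c < best_cost:
            best, best_cost = medoids, c
    return best, best_cost
```

Scoring each candidate medoid set on the full dataset, not just its sample, is what lets CLARA compare draws fairly.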

CLARANS (Clustering Large Applications based on Randomized Search) Example

1. Step 1: Initial Medoids


CLARANS starts with k randomly chosen medoids (here k = 2). Suppose we choose:
- Medoid 1: P1 (2, 10)
- Medoid 2: P6 (6, 4)

2. Step 2: Assign Points to Clusters


Assign each point to the closest medoid using Manhattan distance. Note that P2, P4,
and P10 are each equidistant from the two medoids, so their ties are broken arbitrarily; here:
- P1 → P1 (Medoid 1)
- P2 → P6 (Medoid 2)
- P3 → P6 (Medoid 2)
- P4 → P1 (Medoid 1)
- P5 → P6 (Medoid 2)
- P7 → P6 (Medoid 2)
- P8 → P1 (Medoid 1)
- P9 → P6 (Medoid 2)
- P10 → P1 (Medoid 1)
Clusters are:
- Cluster 1: P1, P4, P8, P10
- Cluster 2: P2, P3, P5, P6, P7, P9
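Step 2's assignment over the full dataset can be sketched in the same way as before (a sketch; note that `min()` breaks ties toward the first medoid in the list, so the tied points P2, P4, and P10 all land with P1 here, whereas the worked example sends P2 to P6 — both are valid tie-breaks):

```python
points = {
    "P1": (2, 10), "P2": (2, 5), "P3": (8, 4), "P4": (5, 8), "P5": (7, 5),
    "P6": (6, 4),  "P7": (1, 2), "P8": (4, 9), "P9": (6, 2), "P10": (3, 6),
}
medoids = ["P1", "P6"]  # the randomly chosen initial medoids from Step 1

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

clusters = {m: [] for m in medoids}
for p, xy in points.items():
    # min() keeps the first medoid on ties (P2, P4, P10 are equidistant).
    nearest = min(medoids, key=lambda m: manhattan(xy, points[m]))
    clusters[nearest].append(p)

print(clusters["P1"])  # ['P1', 'P2', 'P4', 'P8', 'P10']
print(clusters["P6"])  # ['P3', 'P5', 'P6', 'P7', 'P9']
```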

3. Step 3: Random Search for Better Medoids


CLARANS randomly selects a point that is not a medoid and considers swapping it with
one of the current medoids, then checks whether the overall cost (sum of distances)
decreases. If it does, the swap is kept and the search continues from the new medoid
set; otherwise another random swap is tried, up to a fixed maximum number of neighbors.

4. Step 4: Final Clustering


After several iterations, CLARANS finalizes the clustering when no further improvements
are found. The resulting clusters will be based on the medoids that minimize the clustering
cost.

Conclusion:

- CLARA optimizes by sampling and using PAM, but it can miss the global optimum because
it only evaluates a small subset of data.
- CLARANS uses a randomized search over medoid swaps on the full dataset, allowing it
to explore more candidate medoid sets and typically find a better clustering.
