Lab 08 Solutions
Exercises on Clustering
1. Use the k-means algorithm and Euclidean distance to cluster the following 8 examples into 3 clusters:
A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2), A8=(4,9). Suppose that the
initial seeds (centers of each cluster) are A1, A4 and A7. Run the k-means algorithm for 1 epoch. At the
end of this epoch show:
a. The new clusters (i.e. the examples belonging to each cluster);
b. The centers of the new clusters;
c. Draw a 10 by 10 space with all the 8 points and show the clusters after the first epoch and the
new centroids.
d. How many more iterations are needed to converge? Draw the result for each epoch.
Solution
The Euclidean distances between the given points are collected in an 8-by-8 distance matrix; each point is then assigned to the closest of the three seeds A1, A4 and A7.
a. After the first epoch the clusters are: cluster 1 = {A1}, cluster 2 = {A3, A4, A5, A6, A8}, cluster 3 = {A2, A7}.
b. The new centers are the means of the clusters: (2, 10), (6, 6) and (1.5, 3.5).
c. (Plot of the 10 by 10 space with the 8 points, the clusters after the first epoch and the new centroids.)
d. Two more epochs change the assignments (A8 moves to cluster 1 in the second epoch, A4 moves to cluster 1 in the third), after which the clusters {A1, A4, A8}, {A3, A5, A6}, {A2, A7} no longer change and the algorithm has converged.
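For reference, here is a minimal NumPy sketch (not part of the original lab material; the variable names are my own) that prints the pairwise Euclidean distance matrix and runs one k-means epoch from the seeds A1, A4, A7:

import numpy as np

points = {"A1": (2, 10), "A2": (2, 5), "A3": (8, 4), "A4": (5, 8),
          "A5": (7, 5), "A6": (6, 4), "A7": (1, 2), "A8": (4, 9)}
names = list(points)
X = np.array(list(points.values()), dtype=float)

# Pairwise Euclidean distance matrix between the 8 examples.
D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
print(np.round(D, 2))

# One k-means epoch: assign each point to the nearest seed, then recompute centers.
centers = np.array([points["A1"], points["A4"], points["A7"]], dtype=float)
assign = np.argmin(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1), axis=1)
for k in range(3):
    members = [names[i] for i in range(len(names)) if assign[i] == k]
    centers[k] = X[assign == k].mean(axis=0)
    print("cluster", k + 1, members, "new center", centers[k])

Running the sketch reproduces the clusters and centers listed in parts a and b.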
2. Use single and complete link agglomerative clustering to group the data described by the following
distance matrix. Show the dendrograms.
    A   B   C   D
A   0   1   4   5
B   1   0   2   6
C   4   2   0   3
D   5   6   3   0
Solution
1. Single link: distance between two clusters is the shortest distance between a pair of elements from
the two clusters.
At the beginning, each point A, B, C, and D is a cluster → c1 = {A}, c2 = {B}, c3 = {C}, c4 = {D}
Iteration 1
The shortest distance is d(c1,c2) = 1 → c1 and c2 are merged → the clusters are c3 = {C}, c4 = {D}, c5 = {A,B}
The distances from the new cluster to the others are: d(c5,c3) = 2, d(c5,c4) = 5
Iteration 2
The shortest distance is d(c5,c3) = 2 → c5 and c3 are merged → the clusters are c6 = {A,B,C}, c4 = {D}
The distances from the new cluster to the others are: d(c6,c4)=3
Iteration 3
c6 and c4 are merged → the final cluster is c7 = {A,B,C,D}
The dendrogram merges A and B at height 1, adds C at height 2, and adds D at height 3.
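A short sketch, assuming SciPy is available (this is not the lecture's algorithm listing), that reproduces the single-link merge order for this distance matrix:

import numpy as np
from scipy.spatial.distance import squareform
from scipy.cluster.hierarchy import linkage

# Full symmetric distance matrix for A, B, C, D.
D = np.array([[0, 1, 4, 5],
              [1, 0, 2, 6],
              [4, 2, 0, 3],
              [5, 6, 3, 0]], dtype=float)

Z = linkage(squareform(D), method="single")
print(Z)  # each row: merged cluster ids, merge distance, new cluster size

The merge distances come out as 1, 2 and 3, matching the three iterations above.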
2. Complete link: the distance between two clusters is the distance between the two furthest data points in the two clusters.
We apply the algorithm presented in lecture 10 (ml_2012_lecture_10.pdf) page 4.
At the beginning, each point A, B, C, and D is a cluster → c1 = {A}, c2 = {B}, c3 = {C}, c4 = {D}
Iteration 1
The shortest distance is d(c1,c2) = 1 → c1 and c2 are merged → the clusters are c3 = {C}, c4 = {D}, c5 = {A,B}
The distances from the new cluster to the others are: d(c5,c3) = 4, d(c5,c4)=6
Iteration 2
The shortest distance is d(c3,c4) = 3 → c3 and c4 are merged → the clusters are c6 = {C,D}, c5 = {A,B}
The distances from the new cluster to the others are: d(c6,c5)=6
Iteration 3
c6 and c5 are merged → the final cluster is c7 = {A,B,C,D}
The dendrogram merges A and B at height 1, C and D at height 3, and the two resulting clusters at height 6.
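To make the "furthest pair" rule explicit, here is a rough hand-rolled sketch (my own helper code, not the lecture's listing) of complete-link agglomeration on the same matrix:

D = {("A", "B"): 1, ("A", "C"): 4, ("A", "D"): 5,
     ("B", "C"): 2, ("B", "D"): 6, ("C", "D"): 3}

def point_dist(p, q):
    return D.get((p, q)) or D.get((q, p), 0)

def complete_link(c1, c2):
    # Distance between clusters = distance of the two furthest points.
    return max(point_dist(p, q) for p in c1 for q in c2)

clusters = [{"A"}, {"B"}, {"C"}, {"D"}]
while len(clusters) > 1:
    # Pick the pair of clusters with the smallest complete-link distance.
    i, j = min(((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
               key=lambda ij: complete_link(clusters[ij[0]], clusters[ij[1]]))
    d = complete_link(clusters[i], clusters[j])
    print("merge", sorted(clusters[i]), "+", sorted(clusters[j]), "at distance", d)
    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [clusters[i] | clusters[j]]

It prints merges at distances 1, 3 and 6, matching the iterations above.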
3. Use single-link, complete-link, average-link, and centroid agglomerative clustering to cluster the
following 8 examples: A1=(2,10), A2=(2,5), A3=(8,4), A4=(5,8), A5=(7,5), A6=(6,4), A7=(1,2),
A8=(4,9). Show the dendrograms.
Solution
The solutions for single-link and complete-link are analogous to the previous exercise. The solutions for average-link and centroid are also similar; what changes is how the distance between clusters is computed (see the sketch after the list).
• For average link the distance is the average of all the distances between points belonging to the two
clusters. For instance, if c1 = {A,B} and c2 = {C,D},
dist(c1, c2) = (dist(A,C) + dist(A,D) + dist(B,C) + dist(B,D)) / 4
• For centroid the distance between two clusters is the distance between their centroids.
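As a small illustration (using the first four points of Exercise 3 as two hypothetical clusters; this pairing is mine, not the exercise's), the two distances can be computed as follows:

import numpy as np

c1 = np.array([[2.0, 10.0], [2.0, 5.0]])   # A1, A2
c2 = np.array([[8.0, 4.0], [5.0, 8.0]])    # A3, A4

# Average link: mean of all pairwise distances between the two clusters.
pairwise = np.linalg.norm(c1[:, None, :] - c2[None, :, :], axis=-1)
avg_link = pairwise.mean()

# Centroid distance: distance between the two cluster centroids.
centroid_dist = np.linalg.norm(c1.mean(axis=0) - c2.mean(axis=0))

print(round(avg_link, 2), round(centroid_dist, 2))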
4. Consider a data set in two dimensions with five data points at: {(1, 0), (−1, 0), (0, 1), (3, 0), (3, 1)}. Run
two iterations of k-means by hand with initial points at (−1, 0) and (3, 1). What are the assignments at
each iteration and what are the centroids? Has the algorithm converged?
Solution
The solution follows the same assign/update steps as Exercise 1. After the first iteration the clusters are {(1, 0), (−1, 0), (0, 1)} and {(3, 0), (3, 1)}, with centroids (0, 1/3) and (3, 1/2). The second iteration leaves the assignments unchanged, so the centroids stay the same and the algorithm has converged.
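A minimal sketch of the two iterations (same NumPy style as the Exercise 1 snippet; not from the original handout):

import numpy as np

X = np.array([(1, 0), (-1, 0), (0, 1), (3, 0), (3, 1)], dtype=float)
centers = np.array([(-1, 0), (3, 1)], dtype=float)

for it in range(2):
    # Assignment step: each point goes to its nearest centroid.
    assign = np.argmin(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1), axis=1)
    # Update step: recompute each centroid as the mean of its assigned points.
    centers = np.array([X[assign == k].mean(axis=0) for k in range(2)])
    print("iteration", it + 1, "assignments", assign, "centers", centers)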
5. How can we make k-means robust to outliers? Explain the two methods we have seen.
Solution
Refer to lecture 9 (ml_2012_lecture_09.pdf), pages 15-16.
6. Explain the main similarities and differences between k-means and hierarchical clustering.
Solution
Refer to lecture 9 (ml_2012_lecture_09.pdf) and lecture 10 (ml_2012_lecture_10.pdf).
9. Is the result of k-means clustering sensitive to the choice of the initial seeds? How? Give an example.
Solution
Refer to lecture 9 (ml_2012_lecture_09.pdf), page 17.
10. Which is a good algorithm for finding clusters of arbitrary shape? Is finding these clusters always a good
idea? When is it not?
Solution
Refer to lecture 9 (ml_2012_lecture_09.pdf), page 21 and to lecture 10 (ml_2012_lecture_10.pdf), page 5.
12. Explain the single-link and the complete-link methods for hierarchical clustering.
Solution
Refer to lecture 10 (ml_2012_lecture_10.pdf), pages 5-6.
13. Give two examples of distance functions that can be used for numeric attributes.
Solution
Refer to lecture 10 (ml_2012_lecture_10.pdf), pages 8-9.
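For reference (the lecture slides may list different examples), two standard distance functions for numeric attributes are the Euclidean and the Manhattan distance; a small sketch:

import math

def euclidean(x, y):
    # d(x, y) = sqrt(sum_i (x_i - y_i)^2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # d(x, y) = sum_i |x_i - y_i|
    return sum(abs(a - b) for a, b in zip(x, y))

print(euclidean((2, 10), (5, 8)), manhattan((2, 10), (5, 8)))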