
5/6/24, 12:16 AM  2020-06-06-02-Hierarchical-Clustering.ipynb - Colab

Hierarchical Clustering


A summary of the lecture "Cluster Analysis in Python", via DataCamp

toc: true
badges: true
comments: true
author: Chanseok Kang
categories: [Python, Datacamp, Machine Learning]
image: images/fifa_cluster.png

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Basics of hierarchical clustering

Creating a distance matrix using linkage

method: how to calculate the proximity of clusters
metric: the distance metric to use
optimal_ordering: whether to reorder the linkage matrix so that successive leaves are as close as possible (slower)

Types of methods

single: based on the two closest objects
complete: based on the two farthest objects
average: based on the arithmetic mean of all objects
centroid: based on the geometric mean (centroid) of all objects
median: based on the median of all objects
ward: based on the sum of squared differences within clusters
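As a quick sanity check (the toy data here is my own, not from the lecture), each of these names can be passed to linkage() via its method parameter; on two well-separated blobs they all recover the same two clusters:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two tight, well-separated blobs of five 2-D points each
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.1, size=(5, 2)),
                  rng.normal(5, 0.1, size=(5, 2))])

for method in ['single', 'complete', 'average', 'centroid', 'median', 'ward']:
    # centroid, median, and ward require raw observations with Euclidean distance
    Z = linkage(data, method=method)
    labels = fcluster(Z, 2, criterion='maxclust')
    print(method, labels)
```

Note that SciPy's method name is centroid (singular), not centroids.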

Hierarchical clustering: ward method


It is time for Comic-Con! Comic-Con is an annual comic-based convention held in major cities around the world. You have data on last year's footfall, the number of people at the convention ground at a given time. You would like to decide the location of your stall to maximize sales. Using the ward method, apply hierarchical clustering to find the two points of attraction in the area.

Preprocess

comic_con = pd.read_csv('./dataset/comic_con.csv', index_col=0)


comic_con.head()

   x_coordinate  y_coordinate
0            17             4
1            20             6
2            35             0
3            14             0
4            37             4

from scipy.cluster.vq import whiten

comic_con['x_scaled'] = whiten(comic_con['x_coordinate'])
comic_con['y_scaled'] = whiten(comic_con['y_coordinate'])
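For reference, a minimal sketch of what whiten() does (the array values are my own toy example): it divides each feature by its standard deviation (population std, ddof=0), so the scaled feature has unit variance.

```python
import numpy as np
from scipy.cluster.vq import whiten

x = np.array([1.0, 2.0, 3.0, 4.0])
scaled = whiten(x)

# whiten() divides by the population standard deviation (ddof=0)
print(np.allclose(scaled, x / x.std()))  # True
print(round(scaled.std(), 6))            # 1.0
```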

from scipy.cluster.hierarchy import linkage, fcluster

# Use the linkage() method
distance_matrix = linkage(comic_con[['x_scaled', 'y_scaled']], method='ward', metric='euclidean')

# Assign cluster labels
comic_con['cluster_labels'] = fcluster(distance_matrix, 2, criterion='maxclust')

# Plot clusters
sns.scatterplot(x='x_scaled', y='y_scaled', hue='cluster_labels', data=comic_con);

Hierarchical clustering: single method


Let us use the same footfall dataset and check whether the clusters change when we use a different method for clustering.

# Use the linkage() method
distance_matrix = linkage(comic_con[['x_scaled', 'y_scaled']], method='single', metric='euclidean')

# Assign cluster labels
comic_con['cluster_labels'] = fcluster(distance_matrix, 2, criterion='maxclust')

# Plot clusters
sns.scatterplot(x='x_scaled', y='y_scaled', hue='cluster_labels', data=comic_con);

Hierarchical clustering: complete method


For the third and final time, let us use the same footfall dataset and check whether the clusters change with yet another method for clustering.

# Use the linkage() method
distance_matrix = linkage(comic_con[['x_scaled', 'y_scaled']], method='complete', metric='euclidean')

# Assign cluster labels
comic_con['cluster_labels'] = fcluster(distance_matrix, 2, criterion='maxclust')

# Plot clusters
sns.scatterplot(x='x_scaled', y='y_scaled', hue='cluster_labels', data=comic_con);

Visualize clusters


Why visualize clusters?

Try to make sense of the clusters formed
An additional step in the validation of clusters
Spot trends in the data

Visualize clusters with matplotlib


We have discussed that visualizations are necessary to assess the clusters that are formed and spot trends in your data. Let us now focus on
visualizing the footfall dataset from Comic-Con using the matplotlib module.

# Define a colors dictionary for clusters
colors = {1: 'red', 2: 'blue'}

# Plot the scatter plot
comic_con.plot.scatter(x='x_scaled', y='y_scaled', c=comic_con['cluster_labels'].apply(lambda x: colors[x]));

Visualize clusters with seaborn


Let us now visualize the footfall dataset from Comic-Con using the seaborn module. Visualizing clusters with seaborn is easier thanks to the built-in hue parameter for cluster labels.

# Plot a scatter plot using seaborn
sns.scatterplot(x='x_scaled', y='y_scaled', hue='cluster_labels', data=comic_con)

<matplotlib.axes._subplots.AxesSubplot at 0x27d3a82d1c8>

How many clusters?


Introduction to dendrograms

Strategy till now: decide the number of clusters by visual inspection
Dendrograms help show the progression as clusters are merged
A dendrogram is a branching diagram that demonstrates how each cluster is composed, branching out into its child nodes

Create a dendrogram


Dendrograms are branching diagrams that show the merging of clusters as we move through the distance matrix. Let us use the Comic Con
footfall data to create a dendrogram.

from scipy.cluster.hierarchy import dendrogram

# Create a dendrogram
dn = dendrogram(distance_matrix)

Limitations of hierarchical clustering

Comparing the runtime of the linkage method:

Runtime increases with the number of data points
The increase is roughly quadratic
Not feasible for large datasets
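A rough sketch of that growth (the sizes and seed are my own choices; absolute timings are machine-dependent):

```python
import time
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
for n in (500, 1000, 2000):
    points = rng.normal(size=(n, 2))
    start = time.perf_counter()
    # linkage returns an (n - 1) x 4 matrix describing the successive merges
    Z = linkage(points, method='ward')
    elapsed = time.perf_counter() - start
    print(f'n={n}: {elapsed:.3f}s, linkage matrix shape {Z.shape}')
```

Doubling n should increase the elapsed time by noticeably more than a factor of two.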

Timing a run of hierarchical clustering


In earlier exercises of this chapter, you used the Comic-Con footfall data to create clusters. In this exercise, you will time how long the algorithm takes to run on DataCamp's system.

Remember that you can time the execution of small code snippets with:

%timeit sum([1, 3, 2])

%timeit linkage(comic_con[['x_scaled', 'y_scaled']], method='ward', metric='euclidean')

459 µs ± 377 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)

FIFA 18: exploring defenders


In the FIFA 18 dataset, various attributes of players are present. Two such attributes are:

sliding tackle: a number between 0-99 signifying how accurately a player can perform sliding tackles
aggression: a number between 0-99 signifying a player's commitment and will

These attributes are typically high in defense-minded players. In this exercise, you will perform clustering based on these attributes.

This data consists of 5000 rows, and is considerably larger than earlier datasets. Running hierarchical clustering on this data can take up to 10
seconds.

Preprocess

fifa = pd.read_csv('./dataset/fifa_18_dataset.csv')
fifa.head()

   sliding_tackle  aggression
0              23          63
1              26          48
2              33          56
3              38          78
4              11          29

fifa['scaled_sliding_tackle'] = whiten(fifa['sliding_tackle'])
fifa['scaled_aggression'] = whiten(fifa['aggression'])

# Fit the data into a hierarchical cluster
distance_matrix = linkage(fifa[['scaled_sliding_tackle', 'scaled_aggression']], method='ward')

# Assign cluster labels to each row of data
fifa['cluster_labels'] = fcluster(distance_matrix, 3, criterion='maxclust')

# Display the cluster centers of each cluster
print(fifa[['scaled_sliding_tackle', 'scaled_aggression', 'cluster_labels']].groupby('cluster_labels').mean())

# Create a scatter plot through seaborn
sns.scatterplot(x='scaled_sliding_tackle', y='scaled_aggression', hue='cluster_labels', data=fifa)
plt.savefig('../images/fifa_cluster.png')

scaled_sliding_tackle scaled_aggression
cluster_labels
1 0.987373 1.849142
2 3.013487 4.063492
3 1.934455 3.210802

