2020 06-06-02 Hierarchical Clustering - Ipynb Colab
toc: true
badges: true
comments: true
author: Chanseok Kang
categories: [Python, Datacamp, Machine Learning]
image: images/fifa_cluster.png
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.cluster.vq import whiten
from scipy.cluster.hierarchy import linkage, fcluster
Type of Methods
Preprocess
   x_coordinate  y_coordinate
0            17             4
1            20             6
2            35             0
3            14             0
4            37             4
comic_con['x_scaled'] = whiten(comic_con['x_coordinate'])
comic_con['y_scaled'] = whiten(comic_con['y_coordinate'])
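The scatter plot in this section colors points by `cluster_labels`, but the extracted notebook does not show that column being computed. The sketch below shows one way to produce such labels with SciPy's `linkage` and `fcluster`, and compares the `ward`, `single` and `complete` methods. The toy coordinates and the choice of 2 clusters are assumptions made for illustration. They are not the notebook's data.

```python
import pandas as pd
from scipy.cluster.vq import whiten
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical toy data: two well-separated groups of four points each
toy = pd.DataFrame({
    'x_coordinate': [1.0, 2.0, 1.0, 2.0, 20.0, 21.0, 20.0, 21.0],
    'y_coordinate': [1.0, 1.0, 2.0, 2.0, 20.0, 20.0, 21.0, 21.0],
})
toy['x_scaled'] = whiten(toy['x_coordinate'].to_numpy())
toy['y_scaled'] = whiten(toy['y_coordinate'].to_numpy())

labels_by_method = {}
for method in ['ward', 'single', 'complete']:
    # Build the merge tree with the given linkage method
    Z = linkage(toy[['x_scaled', 'y_scaled']].to_numpy(),
                method=method, metric='euclidean')
    # Cut the tree so that at most 2 flat clusters remain (labels start at 1)
    labels_by_method[method] = fcluster(Z, 2, criterion='maxclust')

# Keep one method's labels for plotting, e.g. with sns.scatterplot(hue=...)
toy['cluster_labels'] = labels_by_method['ward']
print(labels_by_method)
```

On well-separated data like this, all three methods agree. On real data they can disagree. `single` tends to chain nearby points together, while `ward` and `complete` favor compact clusters.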
https://colab.research.google.com/github/goodboychan/chans_jupyter/blob/master/_notebooks/2020-06-06-02-Hierarchical-Clustering.ipynb#scro… 1/5
5/6/24, 12:16 AM 2020-06-06-02-Hierarchical-Clustering.ipynb - Colab
# Plot clusters
sns.scatterplot(x='x_scaled', y='y_scaled', hue='cluster_labels', data=comic_con);
from scipy.cluster.hierarchy import dendrogram
# Create a dendrogram
dn = dendrogram(distance_matrix)
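`dendrogram` also returns a dictionary that describes the tree it draws. The sketch below builds a linkage matrix from a tiny hypothetical array of points. It then calls `dendrogram` with `no_plot=True`, so the leaf order can be inspected without rendering a figure.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram

# Hypothetical observations (shape 5 x 1): two tight groups
# A 2-D array is treated as observations, a 1-D array as condensed distances
points = np.array([[0.0], [0.1], [0.2], [5.0], [5.1]])
Z = linkage(points, method='ward')

# no_plot=True computes the layout without drawing anything
dn = dendrogram(Z, no_plot=True)
print(dn['leaves'])  # order of the original observations along the x-axis
print(dn['ivl'])     # the same leaves as string labels
```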
Remember that you can time the execution of small code snippets with:
459 µs ± 377 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
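The output above is in the format of IPython's `%timeit` magic. Outside IPython, the standard-library `timeit` module gives a comparable measurement. The sketch below times a single `linkage` call. It assumes 500 random 2-D points as hypothetical input.

```python
import timeit
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 2))  # hypothetical stand-in for scaled features

# 3 trials of 5 calls each; the fastest trial is least affected by noise
times = timeit.repeat(lambda: linkage(data, method='ward'), number=5, repeat=3)
best_per_call = min(times) / 5
print(f"{best_per_call * 1e3:.2f} ms per linkage call")
```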
sliding tackle: a number between 0 and 99 that signifies how accurately a player performs sliding tackles
aggression: a number between 0 and 99 that signifies the commitment and will of a player
These attributes are typically high in defense-minded players. In this exercise, you will perform clustering based on these attributes in the data.
This data consists of 5,000 rows and is considerably larger than the earlier datasets. Running hierarchical clustering on it can take up to 10 seconds.
Preprocess
fifa = pd.read_csv('./dataset/fifa_18_dataset.csv')
fifa.head()
   sliding_tackle  aggression
0              23          63
1              26          48
2              33          56
3              38          78
4              11          29
fifa['scaled_sliding_tackle'] = whiten(fifa['sliding_tackle'])
fifa['scaled_aggression'] = whiten(fifa['aggression'])
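`whiten` rescales a feature by dividing it by its standard deviation, so that a feature with a larger spread does not dominate the distance computation. The small check below confirms that behavior. It uses the five `sliding_tackle` values from the `fifa.head()` output as sample input.

```python
import numpy as np
from scipy.cluster.vq import whiten

# The first five sliding_tackle values from fifa.head()
sliding_tackle = np.array([23.0, 26.0, 33.0, 38.0, 11.0])

scaled = whiten(sliding_tackle)
# whiten divides by the population standard deviation (ddof=0)
manual = sliding_tackle / sliding_tackle.std()
print(np.allclose(scaled, manual))  # True
print(scaled.std())                 # approximately 1.0
```

Note that `whiten` does not subtract the mean. The scaled values keep their sign and their relative position; only the unit changes.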
# Fit the data into a hierarchical cluster
distance_matrix = linkage(fifa[['scaled_sliding_tackle', 'scaled_aggression']], method='ward')
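Despite its name, `distance_matrix` here holds the linkage matrix returned by `linkage`. For n observations this matrix has n-1 rows and 4 columns. Each row records one merge: the indices of the two clusters joined, the distance between them, and the number of original observations in the new cluster. The sketch below checks this structure on hypothetical random data.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(42)
X = rng.normal(size=(10, 2))  # hypothetical: 10 observations, 2 features

Z = linkage(X, method='ward')
print(Z.shape)  # (9, 4): one row per merge
# Columns: cluster index 1, cluster index 2, merge distance, cluster size
first_a, first_b, first_dist, first_size = Z[0]
print(int(first_size))  # 2: the first merge always joins two single points
print(int(Z[-1, 3]))    # 10: the last merge contains every observation
```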
                scaled_sliding_tackle  scaled_aggression
cluster_labels
1                            0.987373           1.849142
2                            3.013487           4.063492
3                            1.934455           3.210802
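The table above lists the mean scaled values per cluster, but the extracted notebook omits the cell that produced it. Given the three labels, one plausible approach is sketched below. It cuts the Ward tree into 3 flat clusters with `fcluster`, then averages each feature per cluster with `groupby`. The small DataFrame is hypothetical stand-in data, not the FIFA dataset.

```python
import pandas as pd
from scipy.cluster.vq import whiten
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical stand-in for the FIFA columns: three loose groups of players
df = pd.DataFrame({
    'sliding_tackle': [10.0, 12.0, 11.0, 50.0, 52.0, 51.0, 85.0, 88.0, 86.0],
    'aggression':     [20.0, 22.0, 21.0, 55.0, 57.0, 56.0, 80.0, 82.0, 81.0],
})
df['scaled_sliding_tackle'] = whiten(df['sliding_tackle'].to_numpy())
df['scaled_aggression'] = whiten(df['aggression'].to_numpy())

cols = ['scaled_sliding_tackle', 'scaled_aggression']
distance_matrix = linkage(df[cols].to_numpy(), method='ward')
# Cut the tree into at most 3 flat clusters; labels start at 1
df['cluster_labels'] = fcluster(distance_matrix, 3, criterion='maxclust')

# Mean of each scaled feature within each cluster
summary = df.groupby('cluster_labels')[cols].mean()
print(summary)
```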