DSBA Master Codebook - Unsupervised Learning
[email protected]
18XHT46RCY
Codebook
Data Science is the art and science of solving real-world problems and making data-driven decisions. It involves an amalgamation of three aspects, and a good data scientist has expertise in all three of them.
A lack of coding expertise should not become an impediment on your Data Science journey. With consistent effort, you can become fairly proficient in coding over time. This Codebook is intended to help you become comfortable with the finer nuances of Python, and it can serve as a handy reference for anything related to data science code throughout the program journey and beyond.
Please keep in mind that there is no one right way to write code to achieve an intended outcome. There can be multiple ways of doing things in Python; the examples presented in this document use just one of the possible approaches. Please explore different ways of performing the same task on your own.
[email protected]
18XHT46RCY
1
This file is meant for personal use by [email protected] only.
Sharing or publishing the contents in part or full is liable for legal action.
Contents
PREFACE
Clustering
Partition Clustering: K-Means
Hierarchical Clustering: Agglomerative
Dimensionality Reduction Techniques
[email protected]
18XHT46RCY
Table of Figures
Figure 1: A Dendrogram
Unsupervised Learning
Clustering
Partition Clustering: K-Means
The KMeans algorithm clusters data by trying to separate samples in n groups of equal variance, minimizing a criterion known as
the inertia or within-cluster sum-of-squares (see below). This algorithm requires the number of clusters to be specified. It scales
well to a large number of samples and has been used across a large range of application areas in many different fields.
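The inertia referred to above is the within-cluster sum of squared distances between each sample and the centroid of its cluster:

\[ \sum_{i=0}^{n} \min_{\mu_j \in C} \left( \lVert x_i - \mu_j \rVert^2 \right) \]

where the \mu_j are the centroids of the clusters C.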
from sklearn.cluster import KMeans
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
kmeans.labels_
array([1, 1, 1, 0, 0, 0], dtype=int32)
kmeans.cluster_centers_
array([[10.,  2.],
       [ 1.,  2.]])
Source: scikit-learn
Hierarchical Clustering: Agglomerative
Hierarchical clustering is a general family of clustering algorithms that build nested clusters by merging or splitting them
successively. This hierarchy of clusters is represented as a tree (or dendrogram). The root of the tree is the unique cluster that gathers
all the samples, the leaves being the clusters with only one sample.
The AgglomerativeClustering object performs a hierarchical clustering using a bottom up approach: each observation starts in its
own cluster, and clusters are successively merged together. The linkage criteria determine the metric used for the merge strategy.
from sklearn.cluster import AgglomerativeClustering
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
clustering = AgglomerativeClustering().fit(X)
clustering.labels_
array([1, 1, 1, 0, 0, 0])
Source: scikit-learn
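As a quick illustration of the linkage criterion, here is a minimal sketch using 'complete' linkage on the same data; scikit-learn also supports 'ward' (the default), 'average', and 'single'.
from sklearn.cluster import AgglomerativeClustering
import numpy as np
X = np.array([[1, 2], [1, 4], [1, 0],
              [4, 2], [4, 4], [4, 0]])
# 'complete' linkage merges the two clusters with the smallest maximum pairwise distance
clustering = AgglomerativeClustering(n_clusters=2, linkage='complete').fit(X)
clustering.labels_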
Dendrogram
The hierarchical clustering can be plotted as a dendrogram. The dendrogram illustrates how each cluster is composed by drawing a U-shaped link between a non-singleton cluster and its children. The top of the U-link indicates a cluster merge, and the two legs of the U-link indicate which clusters were merged. The length of the two legs of the U-link represents the distance between the child clusters; it is also the cophenetic distance between original observations in the two children clusters.
from scipy.cluster import hierarchy
import numpy as np
import matplotlib.pyplot as plt
# condensed distance matrix between observations (values from the scipy example)
ytdist = np.array([662., 877., 255., 412., 996., 295., 468., 268., 400.,
                   754., 564., 138., 219., 869., 669.])
Z = hierarchy.linkage(ytdist, 'single')
plt.figure()
dn = hierarchy.dendrogram(Z)
[email protected]
18XHT46RCY
Figure 1: A Dendrogram
Source: scipy
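Since the description above mentions the cophenetic distance, here is a minimal sketch computing it from the same linkage matrix with scipy's cophenet function (Z and ytdist as defined above):
from scipy.cluster.hierarchy import cophenet
# c is the cophenetic correlation coefficient; coph_dists holds the
# cophenetic distances between the original observations in condensed form
c, coph_dists = cophenet(Z, ytdist)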
Dimensionality Reduction Techniques
Principal Component Analysis (PCA)
PCA performs linear dimensionality reduction using Singular Value Decomposition of the data to project it to a lower-dimensional space. The input data is centered but not scaled for each feature before applying the SVD.
Method 1: (using the scikit-learn library)
import numpy as np
from sklearn.decomposition import PCA
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
pca = PCA(n_components=2)
pca.fit(X)
PCA(n_components=2)
print(pca.explained_variance_ratio_)
[0.99244289 0.00755711]
print(pca.singular_values_)
[email protected]
[6.30061232 0.54980396]
18XHT46RCY
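The explained_variance_ratio_ output above is often used to decide how many components to retain. scikit-learn can also do this automatically: passing a float between 0 and 1 as n_components keeps just enough components to explain that fraction of the variance. A minimal sketch on the same X:
from sklearn.decomposition import PCA
# a float n_components asks for enough components to explain 95% of the
# variance; this option requires the 'full' SVD solver
pca = PCA(n_components=0.95, svd_solver='full')
pca.fit(X)
pca.n_components_   # 1 here, since the first component already explains ~99%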
Method 2: (using the statsmodels library)
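The code for this method does not appear in this section; a minimal sketch of the same decomposition, assuming statsmodels' statsmodels.multivariate.pca.PCA class, could look like this:
import numpy as np
from statsmodels.multivariate.pca import PCA
X = np.array([[-1., -1.], [-2., -1.], [-3., -2.], [1., 1.], [2., 1.], [3., 2.]])
# demean-only preprocessing, assumed here to mirror scikit-learn's centering behaviour
pc = PCA(X, ncomp=2, standardize=False, demean=True)
pc.factors      # projected scores of the observations on the components
pc.eigenvals    # eigenvalues, proportional to the variance explained by each component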
Linear Discriminant Analysis (LDA)
LDA can be used to perform supervised dimensionality reduction, by projecting the input data to a linear subspace consisting of the
directions which maximize the separation between classes. The dimension of the output is necessarily less than the number of
classes, so this is, in general, a rather strong dimensionality reduction, and only makes sense in a multiclass setting.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
X = np.array([[-1, -1, 2], [-2, -1, -1], [-3, -2, -3],
              [1, 1, -2], [2, 1, -3], [3, 2, -2]])
y = np.array([1, 1, 1, 2, 2, 2])
lda = LinearDiscriminantAnalysis(n_components=1)
reduced = lda.fit_transform(X, y)
reduced
array([[-3.98646358],
[-2.84747399],
[-3.70171618],
[ 2.27797919],
[ 3.41696878],
[ 4.84070578]])
Source: scikit-learn
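Because LinearDiscriminantAnalysis is also a classifier, the same fitted object can label new observations directly; a brief sketch continuing the example above:
# predict the class of a new observation; a point close to the first
# group's region should come back labelled 1
lda.predict([[-1, -1, 1]])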
[email protected]
18XHT46RCY
6
This file is meant for personal use by [email protected] only.
Sharing or publishing the contents in part or full is liable for legal action.