
Machine Learning

Samatrix Consulting Pvt Ltd


Unsupervised Learning
Unsupervised Learning - Challenges

Unsupervised Learning – Use Cases
• Unsupervised learning techniques have been gaining importance in a number of fields.
• Online shopping sites use recommender systems that identify groups of customers with similar browsing and purchase histories.
• The recommender system also identifies the items that are of particular interest to the shoppers within each group.
• Based on the purchase histories of the customers in a particular group, the recommender system can suggest items to an individual customer.
• A search engine can show different search results to a particular person based on the click histories of other people who have similar search patterns.
Principal Component Analysis
• We have already studied principal component analysis in the context of principal component regression.
• When our data set contains a large number of correlated variables, we use principal components to summarize the data set with a smaller number of representative variables that collectively explain most of the variability in the original set.
• The principal components are the directions in the feature space along which the variability in the data is high.
Principal Components
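As a brief recap in standard notation, the first principal component of a set of features X1, X2, …, Xp is the normalized linear combination of the features that has the largest variance:

$$Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \cdots + \phi_{p1} X_p, \qquad \sum_{j=1}^{p} \phi_{j1}^2 = 1$$

The elements $\phi_{11}, \ldots, \phi_{p1}$ are the loadings of the first principal component; the normalization constraint prevents the variance from being inflated arbitrarily.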

Steps for PCA
Step 1 – Standardization
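As a brief recap, standardization rescales each variable to zero mean and unit variance so that variables measured on larger scales do not dominate the principal components:

$$z = \frac{x - \mu}{\sigma}$$

where $\mu$ and $\sigma$ are the mean and the standard deviation of the variable. In the worked example below the raw marks are used directly, since all three subjects are scored on the same scale.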

Step 2 – Covariance Matrix Computation
Student   Math   English   Art
   1       90      60       90
   2       90      90       30
   3       60      60       60
   4       60      60       90
   5       30      30       30
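NumPy's np.cov with bias=True (used in the next step) computes the population covariance between each pair of subjects:

$$\operatorname{cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right)$$

with the variance of each individual subject appearing on the diagonal of the resulting matrix.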
Step 2 – Covariance Matrix Computation
In [1]: import numpy as np

In [2]: marks = np.array([[90,90,60,60,30],[60,90,60,60,30],[90,30,60,90,30]])

• Each row of marks holds one subject (Math, English, Art) and each column holds one student. The mean vector would be

In [3]: mean_marks = np.mean(marks, axis=1)

In [4]: mean_marks
Out[4]: array([66., 60., 60.])
Step 2 – Covariance Matrix Computation
• The covariance matrix would be
In [5]: CovMat=np.cov(marks, bias=True)

In [6]: CovMat
Out[6]:
array([[504., 360., 180.],
[360., 360., 0.],
[180., 0., 720.]])
Step 2 – Covariance Matrix Computation
• The variance of each test appears along the diagonal.
• The Art test has the highest variance (720) whereas English has the smallest (360), so the Art scores show more variability than the English scores.
• The covariance between Math and English is positive (360), and the covariance between Math and Art is also positive (180).
• The covariance between English and Art is zero, which indicates no linear relationship between the English and Art scores.
Step 3 – Compute Eigenvalues and Eigenvectors
• To determine the principal components of the data, we compute the eigenvalues and eigenvectors of the covariance matrix.
In [7]: eig_val, eig_vec = np.linalg.eig(CovMat)

In [8]: eig_val
Out[8]: array([ 44.81966028, 910.06995304, 629.11038668])

In [9]: eig_vec
Out[9]:
array([[ 0.6487899 , -0.65580225, -0.3859988 ],
[-0.74104991, -0.4291978 , -0.51636642],
[-0.17296443, -0.62105769, 0.7644414 ]])
Step 4 – Sort Eigenvalues and Choose k Eigenvectors
• The eigenvectors form the basis of the new feature space, but they only define directions and all of them have unit length.
• To decide which eigenvector(s) we can drop to obtain a lower-dimensional subspace, we review the corresponding eigenvalues.
• The eigenvector corresponding to the lowest eigenvalue carries the least information about the distribution of the data, so we can drop it.
Step 4 – Sort Eigenvalues and Choose k Eigenvectors
• We rank the eigenvalues from the highest to the lowest and choose the top k eigenvalues and their eigenvectors.

In [10]: eig_pairs = [(np.abs(eig_val[i]), eig_vec[:,i]) for i in range(len(eig_val))]

In [11]: eig_pairs.sort(key=lambda x: x[0], reverse=True)

In [12]: for i in eig_pairs:
    ...:     print(i[0])
910.0699530410367
629.1103866763253
44.81966028263878
Step 4 – Sort Eigenvalues and Choose k Eigenvectors
• The corresponding eigenvectors are used as the weights of the projection matrix

In [13]: matrix_w = np.hstack((eig_pairs[0][1].reshape(3,1), eig_pairs[1][1].reshape(3,1)))

In [14]: print('Matrix W:\n', matrix_w)
Matrix W:
 [[-0.65580225 -0.3859988 ]
 [-0.4291978  -0.51636642]
 [-0.62105769  0.7644414 ]]
Step 5 – Transform the Values into the New Subspace
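The transformation itself is a matrix multiplication: project the centred data onto the chosen eigenvectors. A minimal sketch, continuing with the marks, mean_marks and matrix_w computed above (the sign of the second column may differ from the sklearn output shown later):

# Sketch of Step 5: project the centred data onto the top-2 eigenvectors.
# Assumes marks, mean_marks and matrix_w from the previous steps.
centered = marks - mean_marks.reshape(3, 1)   # centre each subject (row)
transformed = matrix_w.T.dot(centered).T      # shape (5, 2): one row per student
print(transformed)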

Calculate using sklearn
Alternatively, we can directly use sklearn to calculate the values

In [17]: from sklearn.decomposition import PCA as sklearnPCA

In [18]: sklearn_pca = sklearnPCA(n_components=2)

In [19]: sklearn_transf = sklearn_pca.fit_transform(marks.T)


Calculate using sklearn
In [20]: sklearn_transf
Out[20]:
array([[-34.37098481, -13.66927088],
[ -9.98345733, 47.68820559],
[ 3.93481353, -2.31599277],
[-14.69691716, -25.24923474],
[ 55.11654576, -6.45370719]])
Uniqueness of Principal Components
• If we compare the eigenvectors obtained from the two approaches, the first eigenvector is the same in both value and sign, but the second eigenvector is the same in value with its sign flipped.
• Each principal component is unique only up to a sign flip, so both approaches describe the same principal components.
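A quick way to confirm this numerically, assuming the transformed array from the manual projection sketch above and sklearn_transf from the sklearn run, is to compare absolute values:

# The manual and sklearn scores should agree up to a per-column sign flip
print(np.allclose(np.abs(transformed), np.abs(sklearn_transf)))   # expected: True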
Proportion of Variance Explained
• In the previous section, we performed PCA on a three-dimensional data set and projected the data onto the first two principal component vectors to obtain a two-dimensional view of the data.
• This two-dimensional representation of the three-dimensional data captures the major patterns in the data.
• The question arises: how much of the information in a given data set is lost by projecting the observations onto the first few principal components?
• In other words, how much information is not contained in the first few principal components?
• To answer this, we look at the proportion of variance explained (PVE) by each principal component.
Proportion of Variance Explained
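In terms of the eigenvalues $\lambda_1, \ldots, \lambda_p$ of the covariance matrix, the proportion of variance explained by the m-th principal component is

$$\text{PVE}_m = \frac{\lambda_m}{\sum_{j=1}^{p} \lambda_j}$$

which is exactly the ratio eig_val / eig_val.sum() computed below.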

Proportion of Variance Explained
We can find the variance explained using the eigenvalues computed earlier (In [8])

In [22]: eig_val[::-1].sort()

In [23]: eig_val
Out[23]: array([910.06995304, 629.11038668, 44.81966028])

In [24]: eig_val/eig_val.sum()
Out[24]: array([0.57453911, 0.39716565, 0.02829524])

We can also find the variance explained using sklearn for the top 2 principal components

In [25]: sklearn_pca.explained_variance_ratio_
Out[25]: array([0.57453911, 0.39716565])
Proportion of Variance Explained
Cumulative variance explained is

In [26]: sklearn_pca.explained_variance_ratio_.cumsum()
Out[26]: array([0.57453911, 0.97170476])

We can see that in this case, the first 2 components are able to explain
97.17% of the variance.
Deciding the number of components
In [27]: marks1 = np.array([[90,90,60,60,30],[60,90,60,60,30],
    ...:                    [90,30,60,90,30],[90,60,30,60,90]])

In [28]: sklearn_pca = sklearnPCA(n_components=4)

In [29]: sklearn_transf = sklearn_pca.fit_transform(marks1.T)


Deciding the number of components
In [30]: from matplotlib import pyplot as plt

In [32]: plt.figure(figsize=(7,5))
    ...: plt.plot([1,2,3,4], sklearn_pca.explained_variance_ratio_, '-o',
    ...:          label='Individual component')
    ...: plt.plot([1,2,3,4], np.cumsum(sklearn_pca.explained_variance_ratio_),
    ...:          '-s', label='Cumulative')
    ...: plt.ylabel('Proportion of Variance Explained')
    ...: plt.xlabel('Principal Component')
    ...: plt.xlim(0.75, 4.25)
    ...: plt.ylim(0, 1.05)
    ...: plt.xticks([1,2,3,4])
    ...: plt.legend(loc=2)
    ...: plt.show()
Deciding the number of components
• We generally decide on the number of principal components required by examining a scree plot such as the one illustrated in Figure – 4.
• We want the smallest number of principal components that explains a sizable amount of the variation in the data.
• We can do so by eyeballing the scree plot and looking for a point at which the proportion of variance explained by each subsequent principal component drops off.
• This point is often referred to as an elbow in the scree plot.
• By inspection of Figure – 4, one might conclude that a fair amount of the variance has been explained by the first three principal components and that there is an elbow after the third component.
• The fourth principal component explains very little additional variance, so it can be dropped.
Independent Component
Analysis
Independent Component Analysis
• Independent Component Analysis is based on information theory. It is also a dimensionality reduction technique.
• The difference between Principal Component Analysis and Independent Component Analysis is that PCA looks for uncorrelated factors whereas ICA looks for independent factors.
• Two variables are uncorrelated if there is no linear relationship between them.
• Two variables are independent if they do not depend on each other, that is, knowing one tells us nothing about the other.
• For example, the age of an individual is independent of his food preferences.
Independent Component Analysis
• On various occasions, it is useful to process the data in order to extract uncorrelated and independent components.
• For example, suppose we record two people while they sing different songs; each recording is a noisy mixture of both voices.
• Our goal is to separate one source from the other.
• This problem cannot be solved using PCA because PCA places no constraint on the independence of the components.
• We can solve it using ICA, as sketched below.
• In layman's terms, PCA helps to compress the data and ICA helps to separate the data.
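A minimal sketch of this idea with scikit-learn's FastICA; the source signals and mixing matrix below are made-up illustrations standing in for the two recordings:

import numpy as np
from sklearn.decomposition import FastICA

# Two made-up source signals (stand-ins for the two singers)
rng = np.random.RandomState(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]
S += 0.1 * rng.standard_normal(S.shape)        # add a little noise

# Mix the sources with an assumed mixing matrix (two "microphones")
A = np.array([[1.0, 0.5], [0.5, 1.0]])
X = S.dot(A.T)

# Recover statistically independent components (up to order and scale)
ica = FastICA(n_components=2, random_state=0)
S_estimated = ica.fit_transform(X)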
Clustering Methods
Clustering
• The techniques used for finding subgroups, or clusters, in a data set are known as clustering.
• By using clustering techniques, we can group the observations so that the observations within each group are quite similar to each other, whereas the observations in different groups are different from each other.
• However, we need to define what it means for two or more observations to be similar or different.
• This requires domain-specific knowledge and knowledge of the data being studied.
Clustering
• The objective of both clustering and PCA is to simplify the data via a small number of summaries, even though their mechanisms are different.
• The objective of PCA is to find a low-dimensional representation of the observations that explains a good fraction of the variance.
• The objective of clustering is to find homogeneous subgroups among the observations.
K-Means Clustering
K-Means Clustering Python Example
from sklearn.cluster import KMeans
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Simulate 100 two-dimensional observations with two true groups:
# shift the first 50 points so that they form a separate cluster
np.random.seed(2)
X = np.random.standard_normal((100,2))
X[:50,0] = X[:50,0]+3
X[:50,1] = X[:50,1]-4

# K-means with K=2; n_init=20 runs the algorithm from 20 random starts
km1 = KMeans(n_clusters=2, n_init=20)
km1.fit(X)

# K-means with K=3
np.random.seed(4)
km2 = KMeans(n_clusters=3, n_init=20)
km2.fit(X)
K-Means Clustering Python Example
# K-means with K=4
np.random.seed(6)
km3 = KMeans(n_clusters=4, n_init=20)
km3.fit(X)

# Plot the three clustering results side by side, marking centroids with '+'
fig, (ax1, ax2, ax3) = plt.subplots(1,3, figsize=(18,5))

ax1.scatter(X[:,0], X[:,1], s=40, c=km1.labels_)
ax1.set_title('K-Means Clustering Results K=2')
ax1.scatter(km1.cluster_centers_[:,0], km1.cluster_centers_[:,1], marker='+', s=100, c='k', linewidth=2)

ax2.scatter(X[:,0], X[:,1], s=40, c=km2.labels_)
ax2.set_title('K-Means Clustering Results K=3')
ax2.scatter(km2.cluster_centers_[:,0], km2.cluster_centers_[:,1], marker='+', s=100, c='k', linewidth=2)

ax3.scatter(X[:,0], X[:,1], s=40, c=km3.labels_)
ax3.set_title('K-Means Clustering Results K=4')
ax3.scatter(km3.cluster_centers_[:,0], km3.cluster_centers_[:,1], marker='+', s=100, c='k', linewidth=2);
K-Means Clustering Algorithm
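The standard K-means iteration can be sketched in a few lines of NumPy: start from a random assignment of observations to clusters, then alternate between computing cluster centroids and reassigning each observation to its nearest centroid until the assignments stop changing. This is an illustrative implementation (the function name kmeans_sketch is ours), not sklearn's optimized version:

import numpy as np

def kmeans_sketch(X, k, n_iter=100, seed=0):
    """Minimal K-means: alternate centroid computation and reassignment."""
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=len(X))            # random initial assignment
    for _ in range(n_iter):
        # centroid of each cluster (ignores the empty-cluster edge case)
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # distance from every point to every centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)                # reassign to nearest centroid
        if np.array_equal(new_labels, labels):           # stop when assignments are stable
            break
        labels = new_labels
    return labels, centroids

# e.g. labels, centers = kmeans_sketch(X, k=2) mirrors the idea behind km1 above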
Hierarchical Clustering
Interpreting a Dendrogram
• We have plotted the hierarchical clustering of 45 observations in a two-dimensional space.
• The results are shown in Figure – 6. In the left-hand panel of Figure – 6, each leaf of the dendrogram represents one of the 45 observations.
• Following a bottom-up approach, as we move up the tree, leaves that are similar to each other start to fuse into branches.
• As we go higher up the tree, branches fuse with other branches or with leaves.
• During the fusion process, similar groups of observations fuse with each other at an early stage (lower in the tree).
Interpreting a Dendrogram
• The observations that fuse at a later stage (near the top of the tree) can be quite different from each other.
• To judge how similar two observations are, we look for the point in the tree where the branches containing those two observations are first fused.
• We can measure the height of that fusion on the vertical axis.
• The observations that fuse with each other at the bottom of the tree are very similar to each other.
• On the other hand, the observations that fuse towards the top of the tree will be quite different from each other.
Linkage
• An important concept is the dissimilarity between pairs of observations and between pairs of groups of observations. The term linkage defines the dissimilarity between two groups of observations. The three most common types of linkage are as follows:
• Complete – maximal inter-cluster dissimilarity
• Single – minimal inter-cluster dissimilarity
• Average – mean inter-cluster dissimilarity
• The Python code for plotting dendrograms based on the three linkages is given below.
Linkage
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster import hierarchy

np.random.seed(2)
X = np.random.standard_normal((100,2))

fig, (ax1, ax2, ax3) = plt.subplots(3, 1, figsize=(15,18))

# Build the three linkage trees and draw a dendrogram for each on its own axis
for linkage, cluster, ax in zip([hierarchy.complete(X), hierarchy.average(X), hierarchy.single(X)],
                                ['c1', 'c2', 'c3'],
                                [ax1, ax2, ax3]):
    cluster = hierarchy.dendrogram(linkage, ax=ax, color_threshold=0)

ax1.set_title('Complete Linkage')
ax2.set_title('Average Linkage')
ax3.set_title('Single Linkage');
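To go from a dendrogram to actual cluster labels, the tree can be cut at a chosen height. A minimal sketch using scipy's fcluster on the complete-linkage tree for the same data; the cut height of 4 is an arbitrary illustrative choice:

from scipy.cluster.hierarchy import fcluster

Z = hierarchy.complete(X)                        # complete-linkage tree for the same data
labels = fcluster(Z, t=4, criterion='distance')  # cut the dendrogram at height 4
print(np.unique(labels))                         # cluster ids assigned to the 100 points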
Latent Semantic Indexing
• Latent Semantic Analysis (LSA), also known as Latent Semantic Indexing (LSI), is an application of unsupervised dimensionality reduction techniques to textual data.
• The problems that LSA tries to solve are:
• Synonymy: multiple words having the same meaning
• Polysemy: one word having multiple meanings
• For example, consider the following two sentences:
• I liked his last novel quite a lot.
• We would like to go for a novel marketing campaign.
• In the first sentence, the word 'novel' refers to a book, and in the second sentence it means new or fresh.
Latent Semantic Indexing
• We can easily distinguish between these words because we are able to
understand the context behind these words.
• However, a machine would not be able to capture this concept as it cannot
understand the context in which the words have been used.
• This is where Latent Semantic Analysis (LSA) comes into play as it attempts
to leverage the context around the words to capture the hidden concepts,
also known as topics.
• So, simply mapping words to documents won’t really help.
• What we really need is to figure out the hidden concepts or topics behind
the words.
• LSA is one such technique that can find these hidden topics; a short sketch follows.
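A minimal sketch of LSA with scikit-learn: build a TF-IDF term-document matrix and apply a truncated SVD to it. The documents below are made-up examples around the two senses of 'novel', and n_components=2 is an illustrative choice:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

docs = [
    "I liked his last novel quite a lot",
    "We would like to go for a novel marketing campaign",
    "The novel tells the story of a young painter",
    "A novel idea for the advertising campaign",
]

# Term-document matrix weighted by TF-IDF
tfidf = TfidfVectorizer(stop_words='english')
X = tfidf.fit_transform(docs)

# LSA / LSI: truncated SVD of the term-document matrix uncovers latent topics
lsa = TruncatedSVD(n_components=2, random_state=0)
doc_topics = lsa.fit_transform(X)   # one row per document, one column per topic
print(doc_topics)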
Thanks
Samatrix Consulting Pvt Ltd
