Module 12 - Unsupervised Learning

Unsupervised Learning
Reference Books

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning (Vol. 112). New York: Springer.

Hastie, T., Tibshirani, R., & Friedman, J. H. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). New York: Springer.

Johnson, R. A., & Wichern, D. W. (2002). Applied Multivariate Statistical Analysis.
Unsupervised Learning

• Unsupervised learning refers to algorithms that learn patterns from unlabeled data.
• Unsupervised learning is more subjective than supervised learning, as there is no simple goal for the analysis, such as prediction of a response.
• We will discuss two unsupervised learning methods:
1. Principal components analysis
2. Clustering
Principal Components Analysis

• PCA produces a low-dimensional representation of a dataset.


• It finds a sequence of linear combinations of the variables
that have maximal variance, and are mutually uncorrelated.
• Apart from producing derived variables for use in supervised
learning problems, PCA also serves as a tool for data
visualization.
Principal Components Analysis: details

• The first principal component of a set of features $X_1, X_2, \ldots, X_p$ is the normalized linear combination of the features

$$Z_1 = \phi_{11} X_1 + \phi_{21} X_2 + \cdots + \phi_{p1} X_p$$

that has the largest variance. By normalized, we mean that

$$\sum_{j=1}^{p} \phi_{j1}^2 = 1$$

• The elements $\phi_{11}, \ldots, \phi_{p1}$ are the loadings of the first principal component; together, the loadings make up the principal component loading vector

$$\phi_1 = (\phi_{11}\ \phi_{21}\ \cdots\ \phi_{p1})^T$$
PCA: example

[Figure: scatter plot of Ad Spending versus Population for 100 cities, with the two principal component directions overlaid.]

The population size (pop) and ad spending (ad) for 100 different cities are shown as purple circles. The green solid line indicates the first principal component direction, and the blue dashed line indicates the second principal component direction.
Computation of Principal Components

• Suppose we have an n × p data set X.
• Assume the variables in X have been centered to have mean zero.
• We look for the linear combination of the sample feature values of the form

$$z_{i1} = \phi_{11} x_{i1} + \phi_{21} x_{i2} + \cdots + \phi_{p1} x_{ip} \qquad (1)$$

for i = 1, . . . , n that has the largest sample variance, subject to the constraint that

$$\sum_{j=1}^{p} \phi_{j1}^2 = 1$$

• Since each of the $x_{ij}$ has mean zero, so does $z_{i1}$. Hence the sample variance of the $z_{i1}$ can be written as

$$\frac{1}{n} \sum_{i=1}^{n} z_{i1}^2$$

• Plugging in (1), the first principal component loading vector solves the optimization problem

$$\underset{\phi_{11}, \ldots, \phi_{p1}}{\text{maximize}}\; \frac{1}{n} \sum_{i=1}^{n} \Big( \sum_{j=1}^{p} \phi_{j1} x_{ij} \Big)^2 \quad \text{subject to} \quad \sum_{j=1}^{p} \phi_{j1}^2 = 1$$

• This problem can be solved via a singular value decomposition of the matrix X.
• We refer to $Z_1$ as the first principal component, with realized values $z_{11}, \ldots, z_{n1}$.
• The second principal component is the linear combination of $X_1, X_2, \ldots, X_p$ that has maximal variance among all linear combinations that are uncorrelated with $Z_1$.
• The second principal component scores $z_{12}, z_{22}, \ldots, z_{n2}$ take the form

$$z_{i2} = \phi_{12} x_{i1} + \phi_{22} x_{i2} + \cdots + \phi_{p2} x_{ip}$$

where $\phi_2$ is the second principal component loading vector, with elements $\phi_{12}, \phi_{22}, \ldots, \phi_{p2}$.
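A minimal NumPy sketch of this computation via the SVD (illustrative only; the toy matrix X below is made up and is centered by hand):

```python
import numpy as np

# Toy centered data matrix X (n = 6 observations, p = 3 features).
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 3))
X = X - X.mean(axis=0)              # center each column to have mean zero

# Singular value decomposition: X = U diag(d) V^T.
U, d, Vt = np.linalg.svd(X, full_matrices=False)

phi = Vt.T                          # columns of phi are the loading vectors phi_1, phi_2, ...
Z = X @ phi                         # principal component scores z_im

print("first loading vector:", phi[:, 0].round(3))               # satisfies sum_j phi_j1^2 = 1
print("sample variance of Z1:", (Z[:, 0] ** 2).mean().round(3))   # equals d[0]**2 / n
```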
Illustration

• USArrests data: For each of the fifty states in the United States, the data set contains the number of arrests per 100,000 residents for each of three crimes: Assault, Murder, and Other (labelled Rape in the original data). We also record UrbanPop (the percent of the population in each state living in urban areas).
• The principal component score vectors have length n = 50, and the principal component loading vectors have length p = 4.
• PCA was performed after standardizing each variable to have mean zero and standard deviation one.
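A hedged sketch of this analysis in Python. Fetching USArrests via statsmodels' Rdatasets helper is an assumption (it requires an internet connection); any DataFrame containing the four variables would work the same way:

```python
import numpy as np
import pandas as pd
from statsmodels.datasets import get_rdataset   # assumption: used only to fetch USArrests

# Fetch the USArrests data (Murder, Assault, UrbanPop, Rape for the 50 states).
data = get_rdataset("USArrests").data
num = data.select_dtypes("number")              # keep the four numeric columns

# Standardize each variable to mean zero and standard deviation one.
X = (num - num.mean()) / num.std()

# PCA via SVD of the standardized data matrix.
U, d, Vt = np.linalg.svd(X.values, full_matrices=False)
loadings = pd.DataFrame(Vt.T, index=num.columns,
                        columns=[f"PC{m + 1}" for m in range(num.shape[1])])
scores = pd.DataFrame(X.values @ Vt.T, index=X.index, columns=loadings.columns)

print(loadings.round(2))        # loading vectors (length p = 4)
print(scores.head().round(2))   # score vectors (length n = 50)
```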
USArrests data: PCA plot

[Figure: biplot of the first two principal components of the USArrests data. The 50 states are plotted at their first and second principal component scores, and arrows show the loadings of Murder, Assault, Other, and UrbanPop (loading axes on the top and right).]


Figure details

The first two principal components for the USArrests data.

• The blue state names represent the scores for the first two principal components.
• The orange arrows indicate the first two principal component loading vectors (with axes on the top and right). For example, the loading for Other (Rape in the original data) on the first component is 0.54, and its loading on the second principal component is 0.17; the corresponding label is centered at the point (0.54, 0.17).
• This figure is known as a biplot, because it displays both the principal component scores and the principal component loadings.
Figure details

• The first loading vector places approximately equal weights on Assault, Murder, and Other.
• This indicates that the first principal component represents overall crime.
• The second loading vector places most of its weight on UrbanPop.
• The second principal component therefore represents the level of urbanization.
• The crime-related variables are correlated with each other (a high murder rate is associated with a high assault rate).
• UrbanPop is less correlated with the other three variables.
How to Determine Principal Components

Let $\Sigma$ be the covariance matrix of the random vector $X^T = (X_1, X_2, \ldots, X_p)$.

Let $\Sigma$ have eigenvalue–eigenvector pairs $(\lambda_1, e_1), (\lambda_2, e_2), \ldots, (\lambda_p, e_p)$, where $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_p \ge 0$.

The $i$th principal component is given by

$$Z_i = e_{i1} X_1 + e_{i2} X_2 + \cdots + e_{ip} X_p, \qquad i = 1, 2, \ldots, p$$

with the following properties:

$$\mathrm{Var}(Z_i) = e_i^T \Sigma e_i = \lambda_i, \qquad i = 1, 2, \ldots, p$$
$$\mathrm{Cov}(Z_i, Z_k) = 0 \quad \text{for } i \ne k$$
Another Interpretation of Principal Components

• The first principal component loading vector has a very special property: it defines the line in p-dimensional space that is closest to the n observations (using average squared Euclidean distance as a measure of closeness).
• The notion of principal components as the dimensions that are closest to the n observations extends beyond just the first principal component.
Scaling of the variables

• If the variables are in different units, scaling each to have standard deviation equal to one is recommended.
• The variances of Murder, Other, Assault, and UrbanPop are 18.97, 87.73, 6945.16, and 209.5, respectively.
• If the variables are in the same units, scaling is not mandatory.
[Figure: two biplots of the USArrests data. Left (Scaled): each variable standardized to unit standard deviation. Right (Unscaled): the raw variables.]
Proportion of Variance Explained

• To understand the strength of each component, we measure the proportion of variance explained (PVE) by each one.
• The total variance present in a data set (assuming that the variables have been centered to have mean zero) is defined as

$$\sum_{j=1}^{p} \mathrm{Var}(X_j) = \sum_{j=1}^{p} \frac{1}{n} \sum_{i=1}^{n} x_{ij}^2$$

and the variance explained by the mth principal component is

$$\frac{1}{n} \sum_{i=1}^{n} z_{im}^2 = \frac{1}{n} \sum_{i=1}^{n} \Big( \sum_{j=1}^{p} \phi_{jm} x_{ij} \Big)^2$$

• Therefore, the PVE of the mth principal component is the quantity (between 0 and 1)

$$\mathrm{PVE}_m = \frac{\sum_{i=1}^{n} \big( \sum_{j=1}^{p} \phi_{jm} x_{ij} \big)^2}{\sum_{j=1}^{p} \sum_{i=1}^{n} x_{ij}^2}$$
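A short NumPy sketch of the PVE calculation on a made-up centered matrix (the PVE can equivalently be read off the squared singular values):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
X = X - X.mean(axis=0)                       # centered data matrix

U, d, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt.T                            # z_im for each component m

total_ss = (X ** 2).sum()                    # n times the total variance
pve = (scores ** 2).sum(axis=0) / total_ss   # PVE of each principal component

print("PVE by component:", pve.round(3))
print("cumulative PVE:  ", pve.cumsum().round(3))   # reaches 1.0 at the last component
```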
Scree Plots

Left: the proportion of variance explained by each of the four principal components in the USArrests data.
Right: the cumulative proportion of variance explained by the four principal components in the USArrests data.
Example

Suppose the random variables $X_1, X_2, X_3$ have the covariance matrix

$$\Sigma = \begin{pmatrix} 1 & -2 & 0 \\ -2 & 5 & 0 \\ 0 & 0 & 2 \end{pmatrix}$$

Determine the principal components.
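A quick numerical check of this example, sketched with NumPy's eigendecomposition (the eigenvalues work out to approximately 5.83, 2, and 0.17):

```python
import numpy as np

Sigma = np.array([[ 1., -2., 0.],
                  [-2.,  5., 0.],
                  [ 0.,  0., 2.]])

# Eigendecomposition of the covariance matrix (eigh assumes a symmetric matrix).
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]            # sort so that lambda_1 >= lambda_2 >= lambda_3
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

for i in range(3):
    print(f"lambda_{i + 1} = {eigvals[i]:.3f},  e_{i + 1} = {np.round(eigvecs[:, i], 3)}")
# The i-th principal component is Z_i = e_i1 X1 + e_i2 X2 + e_i3 X3, with Var(Z_i) = lambda_i.
```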


PCA for Missing Values and Matrix Completion

• Often datasets have missing values, which can be a nuisance.
• Removing the rows that contain missing observations and performing the analysis on the complete rows is wasteful and, depending on the fraction missing, could be unrealistic.
• Alternatively, if $x_{ij}$ is missing, it can be replaced by the mean of the jth column (using the non-missing entries to compute the mean).
• Although this is a common and convenient strategy, it does not exploit the correlation between the variables.
• Principal components can be used to impute missing values through a process known as matrix completion.
• Sometimes data is missing by necessity, as in a matrix of movie ratings where each customer rates only some of the movies.


PCA for Missing Values and Matrix Completion

[Figure: excerpt of the Netflix movie rating data, with many missing entries.]


Another Interpretation of Principal Components

[Figure: simulated observations plotted against their first and second principal component scores.]
PCA for Missing Values and Matrix Completion

• Principal components provide low-dimensional linear surfaces that are closest to the observations.
• The first two principal components of a data set span the plane that is closest to the n observations.
• The first three principal components span the three-dimensional hyperplane that is closest to the n observations, and so forth.
• Using this interpretation, the first M principal component score vectors and the first M principal component loading vectors together provide the best M-dimensional approximation (in terms of Euclidean distance) to each observation $x_{ij}$:

$$x_{ij} \approx \sum_{m=1}^{M} z_{im} \phi_{jm}$$
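A brief NumPy sketch of this approximation property on made-up data: the rank-M reconstruction from the first M scores and loadings leaves a residual sum of squares equal to the discarded squared singular values.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 5))
X = X - X.mean(axis=0)                     # column-centered data matrix

U, d, Vt = np.linalg.svd(X, full_matrices=False)
Z = X @ Vt.T                               # principal component scores z_im
Phi = Vt.T                                 # loading vectors phi_jm (as columns)

M = 2
X_approx = Z[:, :M] @ Phi[:, :M].T         # x_ij ≈ sum_{m=1}^{M} z_im * phi_jm

rss = ((X - X_approx) ** 2).sum()          # residual sum of squares of the rank-M fit
print(f"RSS of the rank-{M} PCA approximation: {rss:.3f}")
print("matches discarded squared singular values:", np.isclose(rss, (d[M:] ** 2).sum()))
```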
PCA for Missing Values and Matrix Completion

• More formally, this can be expressed as an optimization problem.
• Suppose the data matrix X is column-centered.
• Out of all approximations of the form $x_{ij} \approx \sum_{m=1}^{M} a_{im} b_{jm}$, the one with the smallest RSS is given by the solution of

$$\underset{A \in \mathbb{R}^{n \times M},\, B \in \mathbb{R}^{p \times M}}{\text{minimize}}\; \sum_{j=1}^{p} \sum_{i=1}^{n} \Big( x_{ij} - \sum_{m=1}^{M} a_{im} b_{jm} \Big)^2$$

• It can be shown that, for any value of M, the columns of the matrices A and B that solve the above problem are the first M principal component score and loading vectors.
• The smallest possible value of the above objective is

$$\sum_{j=1}^{p} \sum_{i=1}^{n} \Big( x_{ij} - \sum_{m=1}^{M} z_{im} \phi_{jm} \Big)^2$$

• This property can be used for missing value imputation.
PCA for Missing Values and Matrix Completion

• A modified optimization problem:

$$\underset{A \in \mathbb{R}^{n \times M},\, B \in \mathbb{R}^{p \times M}}{\text{minimize}}\; \sum_{(i,j) \in \mathcal{O}} \Big( x_{ij} - \sum_{m=1}^{M} a_{im} b_{jm} \Big)^2$$

• $\mathcal{O}$ is the set of all observed pairs (i, j).
• A missing observation can be estimated by

$$\hat{x}_{ij} = \sum_{m=1}^{M} \hat{a}_{im} \hat{b}_{jm}$$

• An iterative approach can be used to solve the above optimization problem.
Iterative Algorithm for Matrix Completion

1. Initialize: create a complete data matrix $\tilde{X}$ by imputing each missing value with its column mean.
2. Repeat steps (a) to (c) until the objective no longer decreases:
   a. Solve
      $$\underset{A \in \mathbb{R}^{n \times M},\, B \in \mathbb{R}^{p \times M}}{\text{minimize}}\; \sum_{(i,j) \in \mathcal{O}} \Big( \tilde{x}_{ij} - \sum_{m=1}^{M} a_{im} b_{jm} \Big)^2$$
      by computing the principal components of $\tilde{X}$.
   b. For each $(i, j) \notin \mathcal{O}$, set $\tilde{x}_{ij} \leftarrow \sum_{m=1}^{M} \hat{a}_{im} \hat{b}_{jm}$.
   c. Compute the objective
      $$\sum_{(i,j) \in \mathcal{O}} \Big( x_{ij} - \sum_{m=1}^{M} \hat{a}_{im} \hat{b}_{jm} \Big)^2$$
3. Return the estimated missing entries $\tilde{x}_{ij}$ for $(i, j) \notin \mathcal{O}$.
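A minimal NumPy sketch of this algorithm. The rank M, tolerance, and toy ratings matrix are arbitrary choices, and for simplicity the SVD is applied to the completed matrix directly (the slides assume a column-centered X):

```python
import numpy as np

def matrix_complete(X, M=1, max_iter=100, tol=1e-7):
    """Impute NaN entries of X with the iterative rank-M PCA algorithm."""
    observed = ~np.isnan(X)
    Xtilde = X.copy()

    # Step 1: initialize missing entries with the column means of the observed values.
    col_means = np.nanmean(X, axis=0)
    Xtilde[~observed] = col_means[np.where(~observed)[1]]

    prev_obj = np.inf
    for _ in range(max_iter):
        # Step 2a: best rank-M approximation of the completed matrix via SVD
        # (for a column-centered X these are exactly the first M principal components).
        U, d, Vt = np.linalg.svd(Xtilde, full_matrices=False)
        approx = (U[:, :M] * d[:M]) @ Vt[:M]

        # Step 2b: overwrite only the missing entries with their approximations.
        Xtilde[~observed] = approx[~observed]

        # Step 2c: objective measured over the observed entries only.
        obj = ((X[observed] - approx[observed]) ** 2).sum()
        if prev_obj - obj < tol:
            break
        prev_obj = obj
    return Xtilde

# Tiny illustration: a 4 x 3 ratings-style matrix with two missing entries.
X = np.array([[5., 4., np.nan],
              [4., np.nan, 2.],
              [1., 2., 5.],
              [2., 1., 4.]])
print(matrix_complete(X, M=1).round(2))
```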


PCA for Missing Values and Matrix Completion

The ith customer's rating for movie j can be approximated by

$$\hat{x}_{ij} \approx \sum_{m=1}^{M} \hat{a}_{im} \hat{b}_{jm}$$

• $\hat{a}_{im}$ represents the strength with which the ith user belongs to the mth clique, a group of customers who enjoy movies of the mth genre.
• $\hat{b}_{jm}$ represents the strength with which the jth movie belongs to the mth genre.
Clustering
Clustering

• Clustering refers to a very broad set of techniques for


finding subgroups, or clusters, in a data set.
• We seek a partition of the data into distinct groups so that
the observations within each group are quite similar to
each other.
PCA vs Clustering

• PCA looks for a low-dimensional representation of the observations that explains a good fraction of the variance.
• Clustering looks for homogeneous subgroups among the observations.
Two clustering methods

• In K-means clustering, observations are partitioned into a pre-specified number of clusters.
• In hierarchical clustering, the number of clusters is not known beforehand.
• A tree-like visual representation of the observations, called a dendrogram, allows us to view at once the clusterings obtained for each possible number of clusters, from 1 to n.
K-means clustering

[Figure: three panels, K = 2, K = 3, K = 4.]

A simulated data set with 150 observations in 2-dimensional space. The panels show the results of applying K-means clustering with different values of K, the number of clusters. The color of each observation indicates the cluster to which it was assigned using the K-means clustering algorithm. Note that there is no ordering of the clusters, so the cluster coloring is arbitrary.
These cluster labels were not used in clustering; instead, they are the outputs of the clustering procedure.
Details of K-means clustering

Let $C_1, \ldots, C_K$ denote sets containing the indices of the observations in each cluster. These sets satisfy two properties:

1. $C_1 \cup C_2 \cup \cdots \cup C_K = \{1, \ldots, n\}$. In other words, each observation belongs to at least one of the K clusters.
2. $C_k \cap C_{k'} = \emptyset$ for all $k \ne k'$. In other words, the clusters are non-overlapping: no observation belongs to more than one cluster.

For instance, if the ith observation is in the kth cluster, then $i \in C_k$.

• The within-cluster variation for cluster $C_k$ is a measure $W(C_k)$ of the amount by which the observations within a cluster differ from each other.
• Hence K-means clustering solves the optimization problem

$$\underset{C_1, \ldots, C_K}{\text{minimize}}\; \sum_{k=1}^{K} W(C_k) \qquad (2)$$

• In words, this formula says: partition the observations into K clusters such that the total within-cluster variation, summed over all K clusters, is as small as possible.
How to define within-cluster variation?

• Typically, squared Euclidean distance is used:

$$W(C_k) = \frac{1}{|C_k|} \sum_{i, i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2 \qquad (3)$$

where $|C_k|$ denotes the number of observations in the kth cluster.

• Combining (2) and (3) gives the optimization problem that defines K-means clustering:

$$\underset{C_1, \ldots, C_K}{\text{minimize}}\; \sum_{k=1}^{K} \frac{1}{|C_k|} \sum_{i, i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2 \qquad (4)$$
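A small sketch of how the objective in (3)–(4) could be evaluated for a given cluster assignment (the toy data and the arbitrary labeling are for illustration only):

```python
import numpy as np

def within_cluster_variation(X, labels):
    """Objective (4): sum over clusters of the average pairwise squared distance."""
    total = 0.0
    for k in np.unique(labels):
        Xk = X[labels == k]
        # Pairwise squared Euclidean distances within cluster k (each ordered pair counted).
        diffs = Xk[:, None, :] - Xk[None, :, :]
        total += (diffs ** 2).sum() / len(Xk)
    return total

rng = np.random.default_rng(3)
X = rng.normal(size=(10, 2))
labels = rng.integers(0, 2, size=10)       # an arbitrary assignment into K = 2 clusters
print(within_cluster_variation(X, labels))
```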
K-Means Clustering Algorithm

1. Randomly assign a number, from 1 to K, to each of the observations. These serve as initial cluster assignments for the observations.
2. Iterate until the cluster assignments stop changing:
   a. For each of the K clusters, compute the cluster centroid. The kth cluster centroid is the vector of the p feature means for the observations in the kth cluster.
   b. Assign each observation to the cluster whose centroid is closest (where closest is defined using Euclidean distance).
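A compact NumPy sketch of this algorithm on made-up data (the value of K, the toy data, and the stopping rule are illustrative choices; in practice a library routine such as scikit-learn's KMeans would normally be used):

```python
import numpy as np

def kmeans(X, K, seed=0, max_iter=100):
    """Basic K-means: random initial assignments, then alternate centroid and assignment steps."""
    rng = np.random.default_rng(seed)
    # Step 1: randomly assign one of the K clusters to each observation.
    labels = rng.integers(0, K, size=len(X))
    for _ in range(max_iter):
        # Step 2a: centroid of each cluster (for simplicity we assume no cluster becomes empty).
        centroids = np.array([X[labels == k].mean(axis=0) for k in range(K)])
        # Step 2b: reassign each observation to the cluster with the closest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = dists.argmin(axis=1)
        if np.array_equal(new_labels, labels):   # assignments stopped changing
            break
        labels = new_labels
    return labels, centroids

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (25, 2)), rng.normal(4, 1, (25, 2))])
labels, centroids = kmeans(X, K=2)
print(np.bincount(labels), centroids.round(2))
```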
Properties of the Algorithm

• This algorithm is guaranteed to decrease the value of the objective (4) at each step. Why? Note that

$$\frac{1}{|C_k|} \sum_{i, i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2 = 2 \sum_{i \in C_k} \sum_{j=1}^{p} (x_{ij} - \bar{x}_{kj})^2$$

where $\bar{x}_{kj} = \frac{1}{|C_k|} \sum_{i \in C_k} x_{ij}$ is the mean for feature j in cluster $C_k$.

• However, it is not guaranteed to give the global minimum.
• This is why the clustering should be run from a number of different initial assignments.
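A quick numerical check of this identity on random data (both sides should agree up to floating-point error):

```python
import numpy as np

rng = np.random.default_rng(5)
Xk = rng.normal(size=(8, 3))               # observations in one cluster C_k

# Left-hand side: average pairwise squared distance within the cluster.
diffs = Xk[:, None, :] - Xk[None, :, :]
lhs = (diffs ** 2).sum() / len(Xk)

# Right-hand side: twice the sum of squared deviations from the cluster centroid.
centroid = Xk.mean(axis=0)
rhs = 2 * ((Xk - centroid) ** 2).sum()

print(np.isclose(lhs, rhs))                # True: the identity holds
```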
Hierarchical Clustering

• K-means clustering requires pre-specification of the number


of clusters K .
• Hierarchical clustering is an alternative approach which
does not require that we commit to a particular choice of
K.
• HC also provides a tree-like visualization
Hierarchical Clustering: the idea

Builds a hierarchy in a "bottom-up" fashion...

[Figure: a sequence of panels in which the points A, B, C, D are merged successively, starting with the closest pair, until all points belong to one cluster.]
Hierarchical Clustering Algorithm
The approach in words:
• Start with each point in its own cluster.
• Identify the closest two clusters and merge them.
• Repeat.
• Ends when all points are in a single cluster.
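These steps are implemented, for example, in SciPy's hierarchical clustering routines. The sketch below uses complete linkage (one of the linkage types described later) on made-up data:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(6)
X = np.vstack([rng.normal(0, 1, (15, 2)),
               rng.normal(5, 1, (15, 2)),
               rng.normal([0, 5], 1, (15, 2))])

# Agglomerative clustering with complete linkage and Euclidean distance.
Z = linkage(X, method="complete", metric="euclidean")

# Cut the dendrogram to obtain a chosen number of clusters, e.g. 3.
labels = fcluster(Z, t=3, criterion="maxclust")
print(np.bincount(labels))        # cluster sizes

# dendrogram(Z) would draw the tree (requires matplotlib).
```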

Dendrogram

[Figure: a small data set of five points A–E and the corresponding dendrogram; the height at which two clusters are merged reflects their dissimilarity.]
Types of Linkage

• Complete: Maximal inter-cluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the largest of these dissimilarities.
• Single: Minimal inter-cluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the smallest of these dissimilarities.
• Average: Mean inter-cluster dissimilarity. Compute all pairwise dissimilarities between the observations in cluster A and the observations in cluster B, and record the average of these dissimilarities.
• Centroid: Dissimilarity between the centroid for cluster A (a mean vector of length p) and the centroid for cluster B. Centroid linkage can result in undesirable inversions.
An Example

[Figure: scatter plot of 45 observations in two dimensions (X1, X2).]

45 observations generated in 2-dimensional space. In reality there are three distinct classes, shown in separate colors. However, we will treat these class labels as unknown and will seek to cluster the observations in order to discover the classes from the data.
Application of hierarchical clustering

[Figure: three dendrograms of the 45 observations. Left: all observations in one cluster. Center: the dendrogram cut at a height of 9, giving 2 clusters. Right: the dendrogram cut at a height of 5, giving 3 clusters.]
Choice of Dissimilarity Measure

• So far we have used Euclidean distance.
• An alternative is correlation-based distance, which considers two observations to be similar if their features are highly correlated.
• Here correlation is computed between the observation profiles for each pair of observations.
• Correlation-based distance cares more about the shape of the profiles than about their levels.

[Figure: profiles of three observations plotted against the variable index, illustrating the difference between Euclidean and correlation-based distance.]
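A short sketch of how a correlation-based dissimilarity could be computed between observation profiles and passed to hierarchical clustering (1 − correlation is one common choice of dissimilarity):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import squareform

rng = np.random.default_rng(7)
X = rng.normal(size=(6, 20))                 # 6 observations, 20 variables

# Correlation between observation profiles (rows), turned into a dissimilarity.
corr = np.corrcoef(X)                        # 6 x 6 correlation matrix between observations
D = 1.0 - corr                               # correlation-based distance
np.fill_diagonal(D, 0.0)                     # guard against tiny numerical offsets

# linkage accepts a condensed distance vector for a precomputed dissimilarity.
Z = linkage(squareform(D, checks=False), method="average")
print(Z.round(2))
```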
Practical Issues for Clustering

1. Scaling of the variables matters.
2. In some cases, standardization may be useful.
3. What dissimilarity measure and what linkage should be used (for hierarchical clustering)?
4. How should K be chosen for K-means clustering?
5. Which features should be used to drive the clustering?
Example

• Gene expression measurements for 8,000 genes, from samples collected from 88 women with breast cancer.
• Hierarchical clustering was applied with average linkage and a correlation-based metric.
• A subset of 500 "intrinsic" genes was studied: genes selected according to how much they varied between women relative to within the same woman, measured before and after chemotherapy.
Heatmap

• Based on the gene expression profiles, the samples were clustered into groups.
• Survival curves were then compared for the different groups.

[Figure: heatmap of the clustered gene expression data and survival curves for the resulting groups.]
