Principal Components Analysis (Part I) : Data Science
Data Science
UTB
CC BY-SA 4.0
Introduction
2 / 77
NBA Team Stats
3 / 77
# import the NBA team statistics
dat <- read.csv('data/nba-teams-2017.csv')

# 30 teams (rows) and 27 variables (columns)
dim(dat)
[1] 30 27

# variable names
names(dat)
5 / 77
Exploratory Data Analysis
- losses
- points
- field goals
- assists
- turnovers
- steals
- blocks
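A minimal sketch of how these statistics could be pulled out for exploration; the column names below (including wins) are assumptions based on the figures shown later, so check them against names(dat):

# assumed column names (verify with names(dat))
vars <- c("wins", "losses", "points", "field_goals",
          "assists", "turnovers", "steals", "blocks")

# subset of team statistics used in the exploration
team_stats <- dat[ , vars]

# quick univariate summaries
summary(team_stats)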
6 / 77
EDA: Objects and Variables Perspectives
[Diagram: the centered data matrix X with n rows (objects) and p columns (variables); the entry x_ij is the value of variable j for object i]
7 / 77
EDA: Objects and Variables Perspectives
Data Perspectives
We are interested in analyzing a data set from both perspectives: objects and variables.

At its simplest, we have two fundamental purposes:

- Study the resemblance among individuals (resemblance among NBA teams)
- Study the relationships among variables (relationships among team statistics)
8 / 77
EDA
Exploration
Likewise, we can explore the variables at different levels:

- Univariate: one variable at a time
- Bivariate: two variables simultaneously
- Multivariate: multiple variables at once

Let's see a Shiny app demo (see the apps/ folder in the GitHub repo)
9 / 77
[Figure: graphical overview of the selected variables: points, field_goals, losses, assists, wins, turnovers, blocks, steals]
10 / 77
Correlation heatmap
[Figure: heatmap of the Pearson correlations (scale from -1.0 to 1.0) among wins, losses, points, field_goals, assists, turnovers, steals, and blocks]
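A sketch of how a heatmap like this could be produced (not necessarily the code behind the original figure); it assumes the team_stats data frame defined earlier and uses the reshape2 and ggplot2 packages:

library(reshape2)   # for melt()
library(ggplot2)

# correlation matrix of the selected team statistics
cor_mat <- cor(team_stats)

# reshape to long format: one row per (Var1, Var2, correlation)
cor_long <- melt(cor_mat)

# tile plot colored by Pearson correlation, from -1 to 1
ggplot(cor_long, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "blue", mid = "white", high = "red",
                       limits = c(-1, 1), name = "Pearson\nCorrelation")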
11 / 77
[Figure: scatterplot matrix of wins, losses, points, field_goals, assists, turnovers, steals, and blocks]
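The scatterplot matrix itself can be sketched with base R's pairs(), again assuming the team_stats columns defined earlier:

# all pairwise scatterplots of the selected team statistics
pairs(team_stats, pch = 19, col = "steelblue")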
12 / 77
What if we could get a better
low-dimensional summary of the data?
13 / 77
[Figure: map of the 30 NBA teams on the first two principal components, Dim 1 (46.01%) and Dim 2 (20.22%)]
14 / 77
[Figure: circle of correlations showing wins, losses, points, field_goals, assists, turnovers, steals, and blocks against Dim 1 (46.01%) and Dim 2 (20.22%)]
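A hedged sketch of how maps of this kind can be obtained with base R's prcomp(); the exact percentages and layout depend on the data and scaling choices, and the original figures may come from a dedicated package (the axis labels suggest FactoMineR). The column `team` used for labels is an assumption:

# PCA on the standardized team statistics
pca <- prcomp(team_stats, scale. = TRUE)

# percentage of total inertia (variance) captured by each dimension
pct <- 100 * pca$sdev^2 / sum(pca$sdev^2)

# map of the teams (individuals) on the first two components
plot(pca$x[, 1], pca$x[, 2],
     xlab = sprintf("Dim 1 (%.2f%%)", pct[1]),
     ylab = sprintf("Dim 2 (%.2f%%)", pct[2]))
text(pca$x[, 1], pca$x[, 2], labels = dat$team, pos = 3, cex = 0.7)  # 'team' column assumed

# correlations of each variable with the first two components
# (the basis of the circle of correlations)
cor(team_stats, pca$x[, 1:2])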
15 / 77
About PCA
16 / 77
Data Structure
17 / 77
Landmarks
18 / 77
Core Idea
19 / 77
PCA: Overall Goals
20 / 77
Applications
21 / 77
About PCA
Approaches:
PCA can be presented using several different but equivalent approaches. Each approach corresponds to a particular perspective and way of thinking about data.

- Data dispersion from the standpoint of the individuals
- Data variability from the standpoint of the variables
- Data that follows a decomposition model

I will present PCA by mixing and connecting all of these approaches.
22 / 77
Geometric Approach
23 / 77
Geometric mindset
24 / 77
Imagine a data set in a "high-dimensional space"
25 / 77
We are looking for Candidate Subspaces
[Figure: candidate subspaces H_A, H_B, H_C for the cloud of points]
26 / 77
with the best low-dimensional representation
27 / 77
Best low-dimensional projection
28 / 77
Geometric Idea
29 / 77
Objects in a high-dimensional space
[Figure: cloud of objects in R^p]
30 / 77
We look for a subspace such that
31 / 77
the projection of points on it
32 / 77
is the best low-dimensional representation
34 / 77
Focus on distances between objects
[Figure: objects i and h in the cloud, their squared distance d^2(i, h), the centroid g, and a subspace H]
35 / 77
We want projected dists to preserve original dists
[Figure: the distance d^2(i, h) between objects i and h in the original space, and the distance d_H^2(i, h) between their projections onto the subspace H]
36 / 77
Focus on projected distances
37 / 77
Distances and Dispersion
Dispersion of Data
Focusing on the distances among all pairs of objects implicitly takes into account the dispersion or spread (i.e., the variation) of the data.
Data Configuration
The reason to pay attention to distances and dispersion is to
summarize in a quantitative way the original configuration of
the data points.
38 / 77
How to measure dispersion?
The concept of Inertia
39 / 77
Sum of Squared Distances
40 / 77
Imagine 3 points and their centroid

[Figure: three points (LAL, GSW, UTA) and their centroid in the space of variables X1, X2, ..., Xp]
41 / 77
Dispersion: Sum of all squared distances

$$\mathrm{SSD} = 2\,d^2(\mathrm{LAL}, \mathrm{GSW}) + 2\,d^2(\mathrm{LAL}, \mathrm{UTA}) + 2\,d^2(\mathrm{GSW}, \mathrm{UTA})$$
42 / 77
SSD = 2n × (sum of squared distances w.r.t. the centroid)
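A quick numerical check of this identity, using three made-up points as stand-ins for LAL, GSW, and UTA (the values are hypothetical):

# three hypothetical points in two dimensions
pts <- rbind(LAL = c(100, 21), GSW = c(116, 31), UTA = c(101, 24))
n   <- nrow(pts)

# sum of squared distances over all ordered pairs (each pair counted twice)
ssd <- sum(as.matrix(dist(pts))^2)

# identity: SSD = 2n * (sum of squared distances to the centroid)
g <- colMeans(pts)
all.equal(ssd, 2 * n * sum(sweep(pts, 2, g)^2))  # TRUE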
43 / 77
Inertia
One way to take into account the dispersion of the data is with the concept of Inertia.

- Inertia is a term borrowed from the moment of inertia in mechanics (physics).
- This involves thinking about the data as a rigid body (i.e., a system of particles).
- We use the term Inertia to convey the idea of dispersion in the data.
- In multivariate methods, the term Inertia generalizes the notion of variance.
- Think of Inertia as a "multidimensional variance".
44 / 77
Cloud of teams in p-dimensional space
[Figure: the cloud of 30 teams in the p-dimensional space of variables X1, ..., Xp]
45 / 77
Centroid (i.e. the average team)
46 / 77
Formula of Total Inertia
47 / 77
Overall variation/spread (around centroid)
48 / 77
Formula of Total Inertia
49 / 77
Centered data: centroid is the origin
50 / 77
Computing Inertia
$$\mathrm{Inertia} = \sum_{i=1}^{n} m_i \, d^2(\mathbf{x}_i, \mathbf{g}) = \sum_{i=1}^{n} \frac{1}{n} (\mathbf{x}_i - \mathbf{g})^{\mathsf{T}} (\mathbf{x}_i - \mathbf{g}) = \frac{1}{n} \mathrm{tr}(\mathbf{X}^{\mathsf{T}} \mathbf{X}) = \frac{1}{n} \mathrm{tr}(\mathbf{X} \mathbf{X}^{\mathsf{T}})$$

where m_i is the mass (i.e., weight) of individual i, usually 1/n; the trace expressions use the centered data matrix X, whose centroid g is the origin.
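A sketch of this computation in R with equal masses 1/n, assuming the team_stats data defined earlier; centering the data first puts the centroid at the origin, which is what the trace formulas rely on:

# center the data so that the centroid is the origin
X <- scale(team_stats, center = TRUE, scale = FALSE)
n <- nrow(X)

# inertia with masses 1/n: (1/n) * trace(X'X)
inertia <- sum(diag(crossprod(X))) / n

# equivalently, the average squared distance of the points to the centroid
all.equal(inertia, mean(rowSums(X^2)))  # TRUE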
51 / 77
Finding Principal Components
52 / 77
Inertia Concept
Criterion
The criterion used for dimensionality reduction is that the inertia of the cloud of points projected onto the optimal subspace is maximal (though no larger than the inertia in the original space).
53 / 77
Criterion
Axis of Inertia
To find the subspace H, we can look for each of its axes ∆_1, ∆_2, ..., ∆_k and their corresponding direction vectors v_1, v_2, ..., v_k (with k < p).
54 / 77
Looking for a first axis
55 / 77
1st axis
[Figure: the first axis through the cloud of points]
56 / 77
First Axis and Principal Component
$$\mathbf{X} \mathbf{v}_1 = \mathbf{z}_1$$
57 / 77
First Axis and Principal Component
58 / 77
2nd axis
[Figure: the first and second axes through the cloud of points]
59 / 77
Second Axis and Principal Component
$$\mathbf{X} \mathbf{v}_2 = \mathbf{z}_2$$
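A minimal sketch of how the axes and components could be computed by hand under the (1/n) X'X convention used earlier, assuming standardized data; the results agree with prcomp() up to the signs of the axes and a constant rescaling:

# standardized data matrix (centered and scaled)
X <- scale(team_stats)

# eigendecomposition of (1/n) X'X
eig <- eigen(crossprod(X) / nrow(X))
v1  <- eig$vectors[, 1]  # first axis
v2  <- eig$vectors[, 2]  # second axis

# principal components: projections of the rows of X onto each axis
z1 <- X %*% v1
z2 <- X %*% v2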
60 / 77
Second Axis and Principal Component
61 / 77
Computational note
62 / 77
Looking at Variables
63 / 77
Looking at the cloud of standardized variables
[Figure: standardized variables X_j and X_l as vectors in R^n, with the angle θ_jl between them]
64 / 77
Looking at the cloud of standardized variables
Notice that:

$$\cos(\theta_{jl}) = \frac{\mathbf{x}_j^{\mathsf{T}} \mathbf{x}_l}{\|\mathbf{x}_j\| \, \|\mathbf{x}_l\|} = \mathrm{cor}(X_j, X_l)$$
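A quick check of this identity; `points` and `assists` are hypothetical column names standing in for any two variables, and centering alone is enough for the cosine to equal the correlation:

# center two (hypothetical) variables
xj <- dat$points  - mean(dat$points)
xl <- dat$assists - mean(dat$assists)

# cosine of the angle between the centered vectors
cos_theta <- sum(xj * xl) / (sqrt(sum(xj^2)) * sqrt(sum(xl^2)))

all.equal(cos_theta, cor(dat$points, dat$assists))  # TRUE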
65 / 77
Projecting the cloud of standardized variables
66 / 77
Projection of best subspace
[Figure: candidate subspaces H_A, H_B, H_C, H_D and the projections of points A, B, C, D onto them]
67 / 77
Projecting the cloud of standardized variables
68 / 77
Finding subspace for variables
$$\mathbf{X}^{\mathsf{T}} \mathbf{u}_1 = \mathbf{q}_1$$
69 / 77
Finding subspace for variables
$$\mathbf{Q} = \mathbf{X}^{\mathsf{T}} \mathbf{U}$$
70 / 77
Finding subspace for variables
where:

- $\Lambda$ is the diagonal matrix of eigenvalues of $\frac{1}{n}\mathbf{X}\mathbf{X}^{\mathsf{T}}$
- $\mathbf{U}$ is the matrix of eigenvectors of $\frac{1}{n}\mathbf{X}\mathbf{X}^{\mathsf{T}}$

But keep in mind that PCs can be rescaled.
71 / 77
Relationship between the
representations
of Individuals and Variables
72 / 77
Link between representations
SVD of: $\frac{1}{\sqrt{n-1}} \mathbf{X} = \mathbf{U} \mathbf{D} \mathbf{V}^{\mathsf{T}}$

$$\mathbf{Z} = \mathbf{X}\mathbf{V} = \frac{1}{\sqrt{n-1}} \mathbf{X}\mathbf{Q}\mathbf{D}^{-1} \quad\Rightarrow\quad \mathbf{V} = \frac{1}{\sqrt{n-1}} \mathbf{Q}\mathbf{D}^{-1}$$

$$\mathbf{Q} = \mathbf{X}^{\mathsf{T}}\mathbf{U} = \frac{1}{\sqrt{n-1}} \mathbf{X}^{\mathsf{T}}\mathbf{Z}\mathbf{D}^{-1} \quad\Rightarrow\quad \mathbf{U} = \frac{1}{\sqrt{n-1}} \mathbf{Z}\mathbf{D}^{-1}$$
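A sketch of this link in R under the 1/√(n − 1) scaling shown above, assuming the team_stats data from earlier; Z matches the scores of prcomp(..., scale. = TRUE) up to the signs of the columns:

X <- scale(team_stats)  # standardized data matrix
n <- nrow(X)

# SVD of X / sqrt(n - 1)
sv <- svd(X / sqrt(n - 1))
U <- sv$u
D <- diag(sv$d)
V <- sv$v

# components for individuals (Z = XV) and for variables (Q = X'U)
Z <- X %*% V
Q <- t(X) %*% U

# check the stated relations V = Q D^{-1} / sqrt(n-1) and U = Z D^{-1} / sqrt(n-1)
all.equal(V, unname(Q %*% solve(D)) / sqrt(n - 1))  # TRUE
all.equal(U, unname(Z %*% solve(D)) / sqrt(n - 1))  # TRUE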
73 / 77
Link between representations
74 / 77
Principal Components?
Meaning of Principal
The term Principal, as used in PCA, has to do with the notion of a principal axis from geometry and linear algebra.

Principal Axis
A principal axis is a certain line in a Euclidean space associated with an ellipsoid or hyperboloid, generalizing the major and minor axes of an ellipse.
75 / 77
References
76 / 77
References (French Literature)
77 / 77