Chapter 5: Dimensionality Reduction Methods

Learning outcome: at the end of this chapter, students should be able to understand dimensionality reduction methods.
OVERVIEW

Unsupervised Learning – Dimensionality Reduction
Datasets in the form of matrices
We are given n objects and p features describing them.

Dataset: an n-by-p matrix A, with n rows representing the n objects; each object is described by p numeric values.

Goals:
1. Understand the structure of the data, e.g., the underlying process generating it.
2. Reduce the number of features representing the data.
Example: market-basket matrices
A is an n-by-p matrix in which A_ij is the quantity of the j-th product purchased by the i-th customer, for n customers and p products.
Aim: find a subset of the products that characterizes customer behavior.
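As a sketch of this setup (with made-up quantities, since the slide shows no concrete data), the basket matrix and one simple way of spotting products that distinguish customers might look like:

```python
import numpy as np

# Hypothetical market-basket matrix: 4 customers (rows) x 5 products
# (columns); A[i, j] = quantity of product j purchased by customer i.
A = np.array([
    [2, 0, 1, 0, 3],
    [0, 1, 0, 0, 4],
    [1, 0, 2, 0, 2],
    [0, 2, 0, 1, 5],
])

n, p = A.shape  # n customers, p products

# A simple heuristic for products that characterize behavior:
# rank products by how much their quantities vary across customers.
variances = A.var(axis=0)
most_informative = np.argsort(variances)[::-1]
print(most_informative)  # product indices, most variable first
```

This is only a toy heuristic; the SVD-based methods later in the chapter give a principled way to find such structure.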
Unsupervised Learning
Dimensionality reduction methods:
• Singular Value Decomposition (SVD)
• Principal Components Analysis (PCA)
• Canonical Correlation Analysis (CCA)
• Multi-dimensional scaling (MDS)
• Independent component analysis (ICA)
SVD – general overview
The singular value decomposition (SVD) factorizes a matrix into singular vectors and singular values. It is widely used both in computing other matrix operations, such as the matrix inverse, and as a data-reduction method in machine learning. Here, data matrices have n rows (one for each object) and p columns (one for each feature).
For an m x n data matrix A, the SVD factorizes A = U S V^T, where:
• U is an m x m matrix whose columns are the left singular vectors: orthonormal eigenvectors of A A^T.
• S is an m x n diagonal matrix whose nonzero diagonal entries are the singular values: the square roots of the shared eigenvalues of A A^T and A^T A, arranged in descending order.
• V^T is an n x n matrix whose rows are the right singular vectors; the columns of V are orthonormal eigenvectors of A^T A.
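The factorization can be checked numerically; this sketch uses NumPy on a small made-up data matrix:

```python
import numpy as np

# A hypothetical 4 x 3 data matrix (m = 4 objects, n = 3 features).
A = np.array([
    [1.0, 0.0, 2.0],
    [0.0, 3.0, 1.0],
    [2.0, 1.0, 0.0],
    [1.0, 1.0, 1.0],
])

# Full SVD: U is m x m, Vt is n x n, and s holds the singular values
# in descending order. np.diag(s) alone would be square, so s is
# embedded in an m x n matrix S to match A = U S V^T.
U, s, Vt = np.linalg.svd(A, full_matrices=True)
S = np.zeros(A.shape)
S[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ S @ Vt))       # the factors reproduce A: True
print(np.allclose(U.T @ U, np.eye(4)))  # U is orthonormal: True
print(np.all(np.diff(s) <= 0))          # singular values descend: True
```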
© 2021 UNIVERSITI TEKNOLOGI PETRONAS. All rights reserved.
No part of this document may be reproduced, stored in a retrieval system or transmitted in any form or by any means (electronic, mechanical, photocopying, recording or otherwise) without the permission of the copyright owner.
SVD – what are the singular values of a matrix?
Note on the rating scale used in the example: 5 – most preferred, 0 – not preferred.
SVD – Example
[Figure: a ratings matrix for Users 1–7 decomposed by SVD. The first component represents 53.44% of the dataset, the second 40.95%, and the third 5.61%; together, the first two components represent over 90% of the dataset.]
Conclusion: the percentage of variance explained (the explained variance ratio) differs across components: 53.44% of the variance is explained by the 1st component, 40.95% by the 2nd, and 5.61% by the 3rd.
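Those percentages are explained-variance ratios, which can be computed from the singular values as s_i^2 / Σ s_j^2. A sketch on a hypothetical ratings matrix (the slide's actual Users 1–7 data are not reproduced here):

```python
import numpy as np

# Hypothetical 7-user x 4-item ratings matrix on a 0-5 scale,
# standing in for the slide's Users 1-7 data.
R = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [5, 5, 0, 0],
    [0, 1, 5, 4],
    [1, 0, 4, 5],
    [0, 0, 5, 5],
    [2, 3, 2, 3],
], dtype=float)

U, s, Vt = np.linalg.svd(R, full_matrices=False)

# Each component's share of the dataset: s_i^2 / sum of all s_j^2.
ratio = s**2 / np.sum(s**2)
print(ratio)  # descending shares that sum to 1

# Keep the fewest components whose cumulative share reaches 90%.
k = int(np.searchsorted(np.cumsum(ratio), 0.90) + 1)
R_k = U[:, :k] * s[:k] @ Vt[:k, :]  # rank-k approximation of R
```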
SVD – Example (Users to Movies)
Principal Component Analysis (PCA)
Principal Component Analysis (PCA) is a dimensionality-reduction method. It reduces a set of mutually correlated variables to a smaller number of uncorrelated variables without losing the essence of the original variables, and it provides an overview of the linear relationships among the inputs.
Now you may compare the scores. Any ideas?
SVD vs PCA
Covariance indicates the direction of the linear relationship between variables.
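The connection between the two methods can be shown directly: PCA on a data matrix X amounts to the SVD of the centered X, and the eigenvalues of the covariance matrix equal s_i^2 / (n - 1). A sketch on random data:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))  # hypothetical 100 x 3 data matrix

# PCA route: eigendecomposition of the sample covariance matrix.
Xc = X - X.mean(axis=0)              # center each variable
cov = (Xc.T @ Xc) / (len(X) - 1)     # same as np.cov(X.T)
eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]

# SVD route: singular values of the *centered* data matrix.
s = np.linalg.svd(Xc, compute_uv=False)
pca_vars = s**2 / (len(X) - 1)

# The two routes agree: principal-component variances are the
# squared singular values divided by (n - 1).
print(np.allclose(eigvals, pca_vars))  # True
```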
This data set contains arrests per 100,000 residents for assault, murder, and rape in each of the 50 US states in 1973, together with the percent of the population living in urban areas.
A data frame with 50 observations on 4 variables:
• Murder (numeric): murder arrests per 100,000
• Assault (numeric): assault arrests per 100,000
• UrbanPop (numeric): percent urban population
• Rape (numeric): rape arrests per 100,000
The matrix is 50 states x 4 variables; because the four variables are correlated and on different scales, the data are hard to interpret directly unless they are reduced.
Link: https://rstudio-pubs-static.s3.amazonaws.com/377338_75ed92a8463d482a80045abcae0e395d.html
PCA - Example
Data structure and data summary: the variables are not on the same scale; their means, medians and variances fall in very different ranges, so the data should be standardized before PCA.
Link: https://rstudio-pubs-static.s3.amazonaws.com/377338_75ed92a8463d482a80045abcae0e395d.html
We create principal components for the four variables (PC1, PC2, PC3 and PC4) to explain the variance in the dataset with components that are uncorrelated with one another. The rotation matrix provides the principal-component loading vectors.
[Scree plot: the elbow point suggests how many components to retain.]
•The first loading vector places approximately equal weight on Assault, Murder, and
Rape, with much less weight on UrbanPop. Hence this component roughly
corresponds to a measure of overall rates of serious crimes.
•The second loading vector places most of its weight on UrbanPop and much less
weight on the other three features. Hence, this component roughly corresponds to
the level of urbanization of the state.
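The standardize-then-extract-loadings step can be sketched as follows; the 50 x 4 matrix here is random stand-in data, not the actual USArrests values:

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical 50 x 4 matrix standing in for USArrests
# (Murder, Assault, UrbanPop, Rape); scales differ per column.
X = rng.normal(size=(50, 4)) * np.array([4.0, 80.0, 15.0, 9.0])

# Standardize: the variables are on very different scales.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# The rotation (loadings) matrix is V from the SVD of the
# standardized data: each column of V is one loading vector,
# analogous to prcomp()$rotation in R.
U, s, Vt = np.linalg.svd(Z, full_matrices=False)
rotation = Vt.T
scores = Z @ rotation  # principal-component scores per state

print(rotation.shape)  # (4, 4)
# Loading vectors are unit length and mutually orthogonal.
print(np.allclose(rotation.T @ rotation, np.eye(4)))  # True
```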
PCA - Example
The biplot shows the 50 states mapped onto the first two principal components; the loading vectors of the four variables are also plotted.
(A second biplot example, on European food-consumption data:) The distance from the origin also conveys information: the further from the plot origin a variable lies, the stronger its impact on the model. This means, for instance, that the
variables crisp bread (Crisp_br), frozen fish (Fro_Fish), frozen vegetables (Fro_Veg) and garlic
(Garlic) separate the four Nordic countries from the others. The four Nordic countries are
characterized as having high values (high consumption) of the former three provisions, and low
consumption of garlic. Moreover, the model interpretation suggests that countries like Italy, Portugal,
Spain and to some extent, Austria have high consumption of garlic, and low consumption of
sweetener, tinned soup (Ti_soup) and tinned fruit (Ti_Fruit).
Example PCA – simple demonstration
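A minimal end-to-end demonstration, assuming scikit-learn is available and using synthetic correlated data in place of a real dataset:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# Hypothetical data: 50 observations of 4 variables, built so that
# columns (0, 1) and (2, 3) are strongly correlated pairs.
base = rng.normal(size=(50, 2))
X = np.column_stack([
    base[:, 0], base[:, 0] + 0.1 * rng.normal(size=50),
    base[:, 1], base[:, 1] + 0.1 * rng.normal(size=50),
])

Z = StandardScaler().fit_transform(X)  # standardize first
pca = PCA().fit(Z)

print(pca.explained_variance_ratio_)  # share of variance per component
print(pca.components_)                # loading vectors (one per row)

scores = pca.transform(Z)             # observations in PC space
```

Because only two underlying factors generated the four variables, the first two components carry nearly all of the variance, mirroring how PCA compresses correlated variables.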
PCA - Applications
1. Image compression: images can be resized as required and patterns can be detected.
2. Customer profiling based on demographics as well as purchasing behavior.
3. Widely used by researchers in the food-science field.
4. Banking, in areas such as loan and credit-card applications.
5. Customer perception of brands.
6. Finance, to analyze stocks quantitatively and forecast portfolio returns, and in interest-rate modeling.
7. Healthcare, in areas such as patient insurance data, where there are multiple data sources and a huge number of variables.