Lecture 9 - Data Reduction


Data Preprocessing
- Data Reduction
Data Preprocessing

• Data Preprocessing: An Overview
• Data Quality
• Major Tasks in Data Preprocessing
• Data Cleaning
• Data Integration
• Data Reduction
• Data Transformation and Data Discretization

Data Reduction Strategies

• Data reduction: obtain a reduced representation of the data set that is much smaller in volume yet produces the same (or almost the same) analytical results.

• Why data reduction? A database or data warehouse may store terabytes of data; complex data analysis may take a very long time to run on the complete data set.

• Data reduction strategies:
  • Dimensionality reduction, e.g., remove unimportant attributes
  • Numerosity reduction (some simply call it: data reduction)
  • Data compression

Data Reduction Strategies

• Data reduction strategies:
  • Dimensionality reduction, e.g., remove unimportant attributes
    • Wavelet transforms
    • Principal Components Analysis (PCA)
    • Feature subset selection, feature creation
  • Numerosity reduction (some simply call it: data reduction)
    • Regression and log-linear models
    • Histograms, clustering, sampling
    • Data cube aggregation
  • Data compression

Data Reduction: Dimensionality Reduction

• Curse of dimensionality
  • When dimensionality increases, data becomes increasingly sparse
  • Density and the distances between points, which are critical to clustering and outlier analysis, become less meaningful
• Dimensionality reduction
  • Avoids the curse of dimensionality
  • Helps eliminate irrelevant features and reduce noise
  • Reduces the time and space required in data mining
  • Allows easier visualization
• Dimensionality reduction techniques
  • Wavelet transforms
  • Principal Component Analysis
  • Supervised and nonlinear techniques (e.g., feature selection)

Visualization Problem
• It is not easy to visualize multivariate data:
  • 1D: dot plot
  • 2D: bivariate plot (i.e., the X-Y plane)
  • 3D: X-Y-Z plot
  • 4D: ternary plot with a color code, or a tetrahedron
  • 5D, 6D, etc.: ???
Motivation

• Given data points in d dimensions
• Convert them to data points in r < d dimensions
• With minimal loss of information
Basics of PCA
• PCA is useful when we need to extract the essential information from multivariate data sets.

• The technique works by reducing the dimensionality of the data while retaining as much of its variation as possible.


What Is a Principal Component?

• A principal component can be defined as a linear combination of optimally weighted observed variables.
What are the new axes?

[Figure: the principal component axes PC1 and PC2 drawn over the original variables A and B]

• Orthogonal directions of greatest variance in the data
• Projections along PC1 discriminate the data most along any one axis
Principal Component Analysis

PCA: an orthogonal projection of the data onto a lower-dimensional linear space that...
• maximizes the variance of the projected data (purple line)
• minimizes the mean squared distance between each data point and its projection (sum of the blue lines)
The Principal Components
• Vectors originating from the center of mass
• Principal component #1 points in the direction of the largest variance.
• Each subsequent principal component...
  • is orthogonal to the previous ones, and
  • points in the direction of the largest variance of the residual subspace
[Figures: a 2D Gaussian dataset, shown first with its 1st PCA axis and then with its 2nd PCA axis]
Principal Component Analysis
• Principal component analysis (PCA) is a procedure that uses the correlations between the variables to identify which combinations of variables capture the most information about the dataset.

• Mathematically, it determines the eigenvectors of the covariance matrix and sorts them by importance according to their corresponding eigenvalues.
Basics for Principal Component Analysis

• Orthogonal/Orthonormal

• Standard deviation, Variance, Covariance

• The Covariance matrix

• Eigenvalues and Eigenvectors


Covariance

• Standard deviation and variance are one-dimensional measures.

• How much do the dimensions vary from the mean with respect to each other?

• Covariance measures this joint variability between two dimensions:

  cov(X, Y) = E[(X − E[X])(Y − E[Y])]

• It is easy to see that if X = Y we end up with the variance.


Covariance Matrix

• Let X be a random vector with mean vector μ.

• Then the covariance matrix of X, denoted by Cov(X), is

  Cov(X) = E[(X − μ)(X − μ)^T]

• The diagonal entries of Cov(X) are the variances of the individual components; the off-diagonal entries are the pairwise covariances.

• The covariance matrix is symmetric.
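As an illustration, here is a minimal sketch of computing a sample covariance matrix, assuming NumPy and a small made-up 2D dataset (the slides do not prescribe a language or a normalization convention; this uses the 1/(n−1) convention that np.cov defaults to):

```python
import numpy as np

# Small made-up dataset: rows are observations, columns are features.
X = np.array([[1.0, 2.0],
              [3.0, 3.0],
              [5.0, 6.0],
              [8.0, 7.0]])

# Center each feature at zero.
X_centered = X - X.mean(axis=0)

# Sample covariance matrix (normalized by n - 1, as np.cov does by default).
n = X.shape[0]
cov_manual = X_centered.T @ X_centered / (n - 1)

# Cross-check against NumPy's built-in estimator (rowvar=False: columns are variables).
cov_numpy = np.cov(X, rowvar=False)
print(cov_manual)
print(np.allclose(cov_manual, cov_numpy))  # True
```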


Orthogonality/Orthonormality

[Figure: the unit vectors (1, 0) and (0, 1) plotted in the plane]

  ⟨v1, v2⟩ = ⟨(1, 0), (0, 1)⟩ = 0

• Two vectors v1 and v2 for which ⟨v1, v2⟩ = 0 holds are said to be orthogonal.

• Unit vectors which are orthogonal are said to be orthonormal.


Eigenvalues/Eigenvectors

• Let A be an n×n square matrix and x an n×1 column vector. Then a (right) eigenvector of A is a nonzero vector x such that

  A x = λ x      (λ: eigenvalue, x: eigenvector)

Procedure:
• Find the eigenvalues: solve det(A − λI) = 0 for the λ's.
• Find the corresponding eigenvectors: for each λ, solve (A − λI) x = 0.
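A minimal sketch of this procedure, assuming NumPy and a small made-up 2×2 matrix; in practice the characteristic polynomial and the linear systems are solved for us by the library:

```python
import numpy as np

# A small example matrix (made up for illustration).
A = np.array([[4.0, 2.0],
              [1.0, 3.0]])

# numpy solves det(A - lambda*I) = 0 and (A - lambda*I) x = 0 internally.
eigvals, eigvecs = np.linalg.eig(A)

# Each column of eigvecs is an eigenvector; check A x = lambda x for each pair.
for lam, x in zip(eigvals, eigvecs.T):
    print(lam, np.allclose(A @ x, lam * x))  # True for every pair
```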


Transformation

• We are looking for a transformation of the data matrix X (p×n) such that

  Y = α^T X = α1·X1 + α2·X2 + ... + αp·Xp


Transformation

• What is a reasonable choice for the α's?

• Remember: we wanted a transformation that maximizes information, i.e., one that captures the variance in the data.

• So: maximize the variance of the projection of the observations onto the Y variables!

• Find α such that Var(α^T X) is maximal.

• The matrix C = Var(X) is the covariance matrix of the Xi variables.


Transformation
Can we intuitively see that in a picture?

[Figure: two candidate projection directions, labeled "Good" and "Better"]

           | v(x1)     c(x1,x2)  ...  c(x1,xp) |
  Cov(X) = | c(x1,x2)  v(x2)     ...  c(x2,xp) |
           | ...       ...       ...  ...      |
           | c(x1,xp)  c(x2,xp)  ...  v(xp)    |
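To make the variance-maximization view concrete, here is a minimal sketch (assuming NumPy and a made-up correlated 2D dataset): the variance of the projection α^T X equals α^T·Cov(X)·α, and among unit vectors it is maximized by the leading eigenvector of the covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up correlated 2D data: rows are observations, columns are variables.
X = rng.multivariate_normal(mean=[0, 0], cov=[[3.0, 1.5], [1.5, 1.0]], size=500)

C = np.cov(X, rowvar=False)           # sample covariance matrix
alpha = np.array([1.0, 0.0])          # an arbitrary unit-length direction

# Var(alpha^T X) computed two ways: empirically and as alpha^T C alpha.
proj = X @ alpha
print(np.var(proj, ddof=1), alpha @ C @ alpha)   # (essentially) equal

# Among all unit vectors, the projection variance is maximized by the
# eigenvector of C with the largest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(C)
best_direction = eigvecs[:, -1]                  # leading eigenvector
print(best_direction, eigvals[-1])               # max achievable projection variance
```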
PCA algorithm
(based on sample covariance matrix)
• Given data {x1, …, xm}, compute the covariance matrix Σ:

  Σ = (1/m) · Σ_{i=1..m} (xi − x̄)(xi − x̄)^T,   where x̄ = (1/m) · Σ_{i=1..m} xi

• PCA basis vectors = the eigenvectors of Σ

• Larger eigenvalue ⇒ more important eigenvector
PCA – zero mean
• Suppose we are given M vectors x1, x2, ..., xM, each of size N×1
  (N: number of features, M: number of data points)

Step 1: compute the sample mean

  x̄ = (1/M) · Σ_{i=1..M} xi

Step 2: subtract the sample mean (i.e., center the data at zero)

  Φi = xi − x̄

Step 3: compute the sample covariance matrix Σx

  Σx = (1/M) · Σ_{i=1..M} (xi − x̄)(xi − x̄)^T = (1/M) · Σ_{i=1..M} Φi Φi^T = (1/M) · A A^T

  where A = [Φ1 Φ2 ... ΦM] is the N×M matrix whose columns are the Φi.
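A minimal sketch of Steps 1–3, assuming NumPy and a made-up data matrix, storing the centered vectors Φi as the columns of A so that Σx = (1/M)·A·A^T as on the slide:

```python
import numpy as np

# Made-up data: M = 5 samples with N = 3 features each, one sample per column.
X = np.array([[2.0, 4.0, 6.0, 8.0, 10.0],
              [1.0, 3.0, 2.0, 5.0,  4.0],
              [0.5, 1.5, 1.0, 2.0,  2.5]])
N, M = X.shape

# Step 1: sample mean (an N x 1 column vector).
x_bar = X.mean(axis=1, keepdims=True)

# Step 2: center the data; the columns of A are Phi_i = x_i - x_bar.
A = X - x_bar

# Step 3: sample covariance matrix (1/M normalization, as on the slide).
Sigma_x = A @ A.T / M
print(Sigma_x.shape)   # (N, N)
```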
PCA - Steps
Step 4: compute the eigenvalues/eigenvectors of Σx

  Σx ui = λi ui,   where we assume λ1 ≥ λ2 ≥ ... ≥ λN

  (Note: most software packages return the eigenvalues, and corresponding eigenvectors,
  in decreasing order – if not, you can explicitly put them in this order.)

Since Σx is symmetric, u1, u2, ..., uN form an orthogonal basis in R^N,
and we can represent any x ∈ R^N as

  x − x̄ = Σ_{i=1..N} yi·ui = y1·u1 + y2·u2 + ... + yN·uN

with coefficients

  yi = (x − x̄)^T ui / (ui^T ui) = (x − x̄)^T ui   if ||ui|| = 1

i.e., this is just a “change” of basis: the vector x = (x1, ..., xN)^T is re-expressed
in the new coordinates y = (y1, ..., yN)^T.

  (Note: most software packages normalize the ui to unit length to simplify calculations; if
  not, you can explicitly normalize them.)
PCA - Steps
Step 5: dimensionality reduction step – approximate x using only the first K eigenvectors (K << N), i.e., those corresponding to the K largest eigenvalues, where K is a parameter:

  x̂ = x̄ + Σ_{i=1..K} yi·ui
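Putting Steps 4 and 5 together, here is a minimal sketch (assuming NumPy; the function name and data are illustrative, not part of the slides) that eigendecomposes Σx, keeps the K leading eigenvectors, and projects and reconstructs the data:

```python
import numpy as np

def pca_reduce(X, K):
    """X: N x M data matrix (one sample per column); K: number of components kept."""
    x_bar = X.mean(axis=1, keepdims=True)
    A = X - x_bar                          # centered data (columns are Phi_i)
    Sigma_x = A @ A.T / X.shape[1]         # sample covariance (1/M normalization)

    # Step 4: eigenvalues/eigenvectors; eigh returns them in ascending order,
    # so flip to get lambda_1 >= lambda_2 >= ... >= lambda_N.
    eigvals, eigvecs = np.linalg.eigh(Sigma_x)
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]

    # Step 5: keep only the K leading eigenvectors and project the centered data.
    U_K = eigvecs[:, :K]                   # N x K
    Y = U_K.T @ A                          # K x M matrix of coefficients y_i
    X_hat = x_bar + U_K @ Y                # reconstruction (approximation of X)
    return Y, X_hat, eigvals

# Usage on a small made-up dataset (3 features, 5 samples), keeping K = 2 components.
X = np.array([[2.0, 4.0, 6.0, 8.0, 10.0],
              [1.0, 3.0, 2.0, 5.0,  4.0],
              [0.5, 1.5, 1.0, 2.0,  2.5]])
Y, X_hat, eigvals = pca_reduce(X, K=2)
print(Y.shape, np.linalg.norm(X - X_hat))  # (2, 5) and a small reconstruction error
```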

Example
• Compute the PCA of the following dataset:

(1,2),(3,3),(3,5),(5,4),(5,6),(6,5),(8,7),(9,8)

• Compute the sample covariance matrix.

• The eigenvalues can be computed by finding the roots of the characteristic polynomial det(Σx − λI) = 0.
Example (cont’d)
• The eigenvectors are the solutions of the systems:

  Σx ui = λi ui

  Note: if ui is a solution, then c·ui is also a solution for any c ≠ 0.

• Eigenvectors can be normalized to unit length using:

  v̂i = vi / ||vi||
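A minimal sketch (assuming NumPy) that carries out the example numerically; the covariance normalization is an assumption, since the slides do not show which convention is used (np.cov defaults to 1/(n−1)):

```python
import numpy as np

# The dataset from the example, one point per row.
points = np.array([[1, 2], [3, 3], [3, 5], [5, 4],
                   [5, 6], [6, 5], [8, 7], [9, 8]], dtype=float)

# Sample covariance matrix (here with the 1/(n-1) convention used by np.cov).
Sigma = np.cov(points, rowvar=False)
print(Sigma)

# Eigenvalues are the roots of det(Sigma - lambda*I) = 0; eigh solves this for us
# and also returns unit-length eigenvectors (as the columns of eigvecs).
eigvals, eigvecs = np.linalg.eigh(Sigma)
print(eigvals)   # in ascending order
print(eigvecs)   # each column already satisfies ||v|| = 1
```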
Choosing the projection dimension K?

• K is typically chosen based on how much information (variance) we want to preserve:

  Choose the smallest K that satisfies

    (Σ_{i=1..K} λi) / (Σ_{i=1..N} λi) ≥ T,   where T is a threshold (e.g., 0.9)

• If T = 0.9, for example, we “preserve” 90% of the information (variance) in the data.

• If K = N, then we “preserve” 100% of the information in the data (i.e., it is just a “change” of basis and x̂ = x).
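A minimal sketch of this rule (assuming NumPy; the function name and eigenvalues are made up for illustration): sort the eigenvalues in decreasing order and pick the smallest K whose cumulative share of the total variance reaches the threshold T:

```python
import numpy as np

def choose_k(eigvals, T=0.9):
    """Return the smallest K whose leading eigenvalues preserve a fraction T of the variance."""
    lam = np.sort(eigvals)[::-1]                 # lambda_1 >= lambda_2 >= ...
    cumulative = np.cumsum(lam) / lam.sum()      # variance fraction kept by the first K
    return int(np.searchsorted(cumulative, T) + 1)

# Usage with made-up eigenvalues.
print(choose_k(np.array([5.0, 2.0, 1.0, 0.5, 0.5]), T=0.9))
# -> 4: the first four eigenvalues preserve about 94% >= 90% of the variance.
```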
Data Normalization

• The principal components depend on the units used to measure the original variables, as well as on the range of values they assume.

• Data should always be normalized prior to using PCA.

• A common normalization method is to transform all the data to have zero mean and unit standard deviation:

  (xi − μ) / σ,   where μ and σ are the mean and standard deviation of the i-th feature xi
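A minimal sketch of this z-score normalization, assuming NumPy and a made-up data matrix whose two features have very different scales; it is applied per feature before running PCA:

```python
import numpy as np

# Made-up data matrix: rows are observations, columns are features with different scales.
X = np.array([[170.0, 65000.0],
              [160.0, 48000.0],
              [180.0, 52000.0],
              [175.0, 80000.0]])

# z-score normalization: subtract each feature's mean and divide by its standard deviation.
mu = X.mean(axis=0)
sigma = X.std(axis=0)
X_normalized = (X - mu) / sigma

print(X_normalized.mean(axis=0))  # approximately [0, 0]
print(X_normalized.std(axis=0))   # [1, 1]
```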
