What Is Dimension Reduction in Data Science - by Farhad Malik - FinTechExplained - Medium

The article discusses dimension reduction in data science, highlighting the challenges posed by large feature sets, such as overfitting and increased storage requirements. It explains key techniques like Linear Discriminant Analysis (LDA) for supervised data and Principal Component Analysis (PCA) for unsupervised data, emphasizing their roles in compressing data while retaining essential information. The benefits of dimension reduction include improved model training efficiency and reduced noise from correlated features.

5/4/2021 What Is Dimension Reduction In Data Science?

| by Farhad Malik | FinTechExplained | Medium

What Is Dimension Reduction In Data Science?
Farhad Malik
Dec 21, 2018 · 7 min read

We now have access to large amounts of data. This can lead to situations
where we take every piece of data that is available to us and feed it into
a forecasting model to predict our target variable. This article explains
the common issues that arise when a large set of features is introduced and
provides solutions which we can use to resolve those problems.

It is crucial for every data scientist and machine learning expert to understand
what dimension reduction techniques are and when to use them.

https://radiant-brushlands-42789.herokuapp.com/medium.com/fintechexplained/what-is-dimension-reduction-in-data-science-2aa5547f4d29 1/16

Photo by Sergi Kabrera on Unsplash

Let’s Understand The Issues Better

Occasionally we gather data for our data science project and end up with a
large set of features. Some of these features (also known as variables) are
not as important as others. Sometimes the features themselves are
correlated with each other. And occasionally we end up over-fitting the
model by introducing too many features. A large number of features also
makes the data set sparse.

Furthermore, it takes a much larger space to store a data set with a large
number of features. Moreover, it can get very difficult to analyse and
visualize a data set with a large number of dimensions.

Dimension reduction can reduce the time that is required to train our
machine learning model and it can also help to reduce over-fitting.

This article outlines the techniques which we can follow to compress our
data set onto a new feature subspace of lower dimensionality. I will also be
providing details of important dimension reduction techniques.

Please read FinTechExplained disclaimer.

How Do I Define Dimension Reduction?

Imagine you want to e-mail a large set of files to your friend. Uploading and
sending the files might take a long time. You can speed up the upload by
zipping the files and e-mailing the zipped file instead. Zipping compresses
a large quantity of data into a smaller equivalent set.

Dimension reduction follows the same principle as zipping the data.

Dimension reduction compresses a large set of features onto a new feature
subspace of lower dimensionality without losing the important information.

The slight difference is that, unlike zipping, dimension reduction
techniques will lose some of the information when the dimensions are reduced.

It is harder to visualise a large set of dimensions. Dimension reduction
techniques can be employed to turn a feature space with 20+ dimensions into
a 2- or 3-dimensional subspace.

What Are Different Dimension Reduction Techniques?
Before we take a deep dive into the key techniques, let’s quickly understand
the two main areas of machine learning:

1. Supervised — when the results of the training set are known

2. Unsupervised — when the final outcome is not known

If you want to get a better understanding of machine learning then have a
look at my article: Machine Learning In 8 Minutes

There are a large number of techniques to reduce the dimensions, such as
forward/backward feature selection or combining dimensions by calculating a
weighted average of the correlated features. However, in this article I will
explore two of the main techniques of dimension reduction:

Linear Discriminant Analysis (LDA):

LDA is used for compressing supervised data.

When we have a large set of features and known class labels, our data is
normally distributed and the features are not correlated with each other,
then we can use LDA to reduce the number of dimensions. LDA is a generalised
version of Fisher’s linear discriminant.

Calculate the z-score to standardise the features before applying LDA. Note
that the z-score rescales a feature but does not remove skew.
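As a minimal sketch of the z-score calculation mentioned above, the snippet below standardises each column to zero mean and unit variance with NumPy. The helper name z_score and the toy array X_demo are assumptions made for illustration and are not part of the original article.

```python
import numpy as np

def z_score(X):
    """Standardise each column of X to zero mean and unit variance."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    # Guard against constant columns, which would otherwise divide by zero.
    std[std == 0] = 1.0
    return (X - mean) / std

# Toy data (made up): two features measured on very different scales.
X_demo = np.array([[1.0, 1000.0],
                   [2.0, 2000.0],
                   [3.0, 6000.0]])
X_scaled = z_score(X_demo)
```

After this step both columns sit on a comparable scale, although the shape of each distribution is unchanged.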

If you want to understand how to enrich features and calculate z-score then
have a look at this article: Processing Data To Improve Machine Learning
Models Accuracy

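Scikit-learn offers easy to use LDA tools:

from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# n_components can be at most min(number of classes - 1, number of features)
my_lda = LinearDiscriminantAnalysis(n_components=3)
lda_components = my_lda.fit_transform(X_train, Y_train)

This code produces three LDA components for the training set, provided the
target has at least four classes.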
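To make the LDA step concrete, here is a small end-to-end sketch. It assumes the wine data set bundled with scikit-learn (load_wine) as a stand-in for X_train and Y_train, and a 70/30 split; neither choice comes from the original article. Because that data set has three classes, at most two LDA components are available.

```python
from sklearn.datasets import load_wine
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Stand-in data (an assumption): 178 samples, 13 features, 3 classes.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Standardise using statistics learned from the training set only.
scaler = StandardScaler().fit(X_train)

# Three classes allow at most 3 - 1 = 2 discriminant components.
lda = LinearDiscriminantAnalysis(n_components=2)
X_train_lda = lda.fit_transform(scaler.transform(X_train), y_train)

# The fitted projection is reused, not refitted, on unseen data.
X_test_lda = lda.transform(scaler.transform(X_test))
```

Fitting on the training split and only transforming the test split keeps information from the test labels out of the projection.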

Photo by NASA on Unsplash

Principal component analysis (PCA):

PCA is mainly used for compressing unsupervised data.

PCA is a very useful technique that can help de-noise and detect patterns in
data. PCA is used in reducing dimensions in images, textual contents and in
speech recognition systems.

The scikit-learn library offers a powerful PCA transformer (note that PCA is
not a classifier). This code snippet illustrates how to create PCA components:

from sklearn.decomposition import PCA

pca_classifier = PCA(n_components=3)
my_pca_components = pca_classifier.fit_transform(X_train)

It is wise to understand how PCA works.

Understanding PCA
This section of the article provides an overview of the process:

The PCA technique analyses the entire data set and then finds the
directions with maximum variance.

It creates new variables that are linear combinations of the original
variables, chosen so that the variance is maximised.

A covariance matrix is then created for the features to understand their
multi-collinearity.

Once the variance-covariance matrix is computed, PCA then uses the gathered
information to reduce the dimensions. It computes orthogonal axes from the
original feature axes. These are the directions of maximum variance.

First, the eigenvectors of the variance-covariance matrix are calculated.
The eigenvectors represent the directions of maximum variance, which are
known as the principal components. The eigenvalues are then computed; they
define the magnitude of the principal components.

The eigenvectors are the PCA components, and each eigenvalue is the amount
of variance captured along its eigenvector.

Therefore, for N dimensions, there will be an N×N variance-covariance
matrix and, as a result, we will have N eigenvectors (each with N values)
and N eigenvalues.

We can use the following NumPy functions to create the components:

Use numpy.cov to compute the variance-covariance matrix

Use numpy.linalg.eig to compute the eigenvalues and eigenvectors
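The steps above can be sketched with NumPy as follows. The data here is synthetic and the choice of k = 2 is arbitrary; both are assumptions for illustration. The sketch uses numpy.linalg.eigh rather than numpy.linalg.eig, because a covariance matrix is symmetric and eigh is the variant intended for symmetric matrices (it returns real eigenvalues in ascending order).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption): 200 samples, 5 features.
X = rng.normal(size=(200, 5))
# Make feature 1 strongly correlated with feature 0.
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]

# Centre the data, then build the 5 x 5 variance-covariance matrix.
X_centred = X - X.mean(axis=0)
cov = np.cov(X_centred, rowvar=False)

# Eigen-decomposition; eigh returns eigenvalues in ascending order,
# so reorder them from largest to smallest.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Keep the top k eigenvectors and project the data onto them.
k = 2
components = eigenvectors[:, :k]        # shape (5, 2)
X_reduced = X_centred @ components      # shape (200, 2)

# Share of the total variance captured by each component.
explained_ratio = eigenvalues / eigenvalues.sum()
```

The variance of each projected column equals the corresponding eigenvalue, which is why the eigenvalues can be read as the variance captured by each component.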

We need to take the eigenvectors that represent our data set best. These
are the eigenvectors with the highest eigenvalues.

Take eigenvectors that capture about 70% of the variance.

Remember that the eigenvectors with the largest eigenvalues are the ones that
capture the most variance, so projecting onto them stays closest to the
original data set. Also, the larger the number of eigenvectors, the slower
the computation.

I normally take the top 2–3 eigenvectors to represent the data set.

If we want scikit-learn to give us all of the PCA components so that we can
assess the variance, then initialise PCA with n_components=None.
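A minimal sketch of keeping every component and then inspecting the variance might look like the following. It assumes the wine data set bundled with scikit-learn as example input and a 70% threshold as mentioned earlier; the variable names are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Example input (an assumption): 13 numeric features.
X, _ = load_wine(return_X_y=True)
X_std = StandardScaler().fit_transform(X)

# n_components=None keeps all components.
pca_all = PCA(n_components=None)
pca_all.fit(X_std)

# Running total of the share of variance explained by the components.
cumulative = np.cumsum(pca_all.explained_variance_ratio_)

# Smallest number of components whose cumulative share reaches the threshold.
threshold = 0.70
n_keep = int(np.searchsorted(cumulative, threshold) + 1)
```

Once n_keep is known, PCA can be initialised again with n_components=n_keep to produce the reduced data set.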

Photo by Chang Qing on Unsplash

It is important to normalise/standardise the data before performing PCA
because PCA is sensitive to the scale of the data in the features.
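The sensitivity to scale can be illustrated with a small experiment. The sketch below assumes two independent synthetic features, one of which is measured in units a thousand times larger than the other, and compares PCA with and without standardisation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Synthetic data (an assumption): two independent features, where the
# second is measured in much larger units than the first.
X = np.column_stack([rng.normal(0, 1, 500), rng.normal(0, 1000, 500)])

# PCA on the raw data: the large-scale feature dominates the variance.
raw_ratio = PCA(n_components=2).fit(X).explained_variance_ratio_

# PCA after standardising: both features contribute on equal terms.
scaled = make_pipeline(StandardScaler(), PCA(n_components=2)).fit(X)
scaled_ratio = scaled.named_steps["pca"].explained_variance_ratio_
```

On the raw data the first component captures almost all of the variance simply because of the units, whereas after standardising the variance is split roughly evenly between the two components.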

Kernel principal component analysis (Kernel PCA):

Kernel PCA is used for non-linear dimensionality reduction.

When the relationships between our features are non-linear, we can project
the data onto a larger feature space in which those relationships become
linear.

Essentially, non-linear data is mapped and transformed onto a
higher-dimensional space. Then PCA is used to reduce the dimensions. However,
one downside of this approach is that it is computationally very expensive.

In standard PCA, we first compute the variance-covariance matrix and then
take the eigenvectors with the highest eigenvalues as the principal
components.

In kernel PCA, the covariance matrix in the higher-dimensional space is not
computed explicitly. Instead we compute a kernel matrix, which is a matrix
of pairwise similarities between the samples. The kernel matrix is then
centred and decomposed into eigenvalues and eigenvectors.
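The kernel matrix steps can be sketched from scratch with NumPy. This is an illustrative implementation under stated assumptions, not the routine scikit-learn uses internally: the function name rbf_kernel_pca, the random demo data and the gamma value are all hypothetical.

```python
import numpy as np

def rbf_kernel_pca(X, gamma, n_components):
    """Project the training samples onto the top kernel principal components."""
    X = np.asarray(X, dtype=float)

    # Pairwise squared Euclidean distances between all samples.
    sq_norms = (X ** 2).sum(axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # clip tiny negative rounding errors

    # RBF similarity (kernel) matrix: k(x, y) = exp(-gamma * ||x - y||^2).
    K = np.exp(-gamma * sq_dists)

    # Centre the kernel matrix in the implicit feature space.
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)
    K_centred = K - one_n @ K - K @ one_n + one_n @ K @ one_n

    # Decompose the symmetric centred matrix and keep the largest eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(K_centred)
    order = np.argsort(eigenvalues)[::-1][:n_components]

    # Projection of each training sample: eigenvector scaled by sqrt(eigenvalue).
    scale = np.sqrt(np.maximum(eigenvalues[order], 0.0))
    return eigenvectors[:, order] * scale

# Demo on random data (an assumption): 30 samples, 3 features.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 3))
Z = rbf_kernel_pca(X_demo, gamma=0.5, n_components=2)
```

Note that the kernel matrix has one row and one column per sample, so its size grows with the square of the number of samples, which is where much of the computational cost comes from.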

Scikit-learn offers a Kernel PCA module. To use Kernel PCA, we can use the
following snippet of code:

from sklearn.decomposition import KernelPCA

kpca = KernelPCA(n_components=2, kernel='rbf', gamma=45)
kpca_components = kpca.fit_transform(X)

Gamma is a tuning parameter of the RBF kernel. Larger values make the
similarity between two points fall off more quickly with distance.
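For a runnable comparison of linear PCA and Kernel PCA, the sketch below uses the two-moons toy data set from scikit-learn. The data set, the sample count and gamma=15 are assumptions chosen for illustration; in practice gamma needs to be tuned for the data at hand.

```python
from sklearn.datasets import make_moons
from sklearn.decomposition import PCA, KernelPCA

# Toy data (an assumption): two interleaving half circles, which a
# straight line cannot separate in the original 2-D space.
X, y = make_moons(n_samples=100, random_state=123)

# Linear PCA only rotates the data, so the two shapes stay interleaved.
X_pca = PCA(n_components=2).fit_transform(X)

# Kernel PCA with an RBF kernel; gamma=15 is an illustrative value.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=15)
X_kpca = kpca.fit_transform(X)
```

Plotting X_pca and X_kpca side by side, coloured by y, is a quick way to judge whether the chosen gamma has unfolded the non-linear structure.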

Photo by Billy Huynh on Unsplash

Benefits Of Dimension Reduction

This section briefly outlines the core benefits of reducing dimensions.

We have access to large amounts of data now. When we build forecasting
models that are trained on images, sound and/or textual content, the input
can end up with a very large set of features. This increases the storage
space required, encourages over-fitting and slows down model training.
Occasionally features are introduced that end up adding more noise than
expected.

One of the key methodologies to improve efficiency in computationally
intensive tasks is to reduce the dimensions after ensuring most of the key
information is maintained. It also eliminates features with strong
correlation between them and reduces over-fitting.

Summary
This article provided an overview of the techniques which we can follow to
compress our data set onto a new feature subspace of lower dimensionality.
It also provided details of important dimension reduction techniques.

Lastly, the benefits of dimension reduction were summarised.

Please let me know if there are any questions.
