HAIMLC501 MathematicsForAIML Lecture 16 Dimensionality Reduction SH2022

The document discusses dimensionality reduction techniques. It begins by defining dimensionality reduction as reducing the number of random variables under consideration to obtain a set of principal variables while maintaining similar information. Some benefits mentioned are compressing data size, speeding up computations, removing redundant features, and allowing visualization of higher-dimensional data. Several techniques are then outlined, including removing features with many missing values, low variance filters, decision trees/random forests, and removing highly correlated features.

The Curse of Dimensions...

"Of all the facts which were presented to us we had to pick just those
which we deemed to be essential, and then piece them together in their
order, so as to reconstruct this very remarkable chain of events."

– Sherlock Holmes
– The Sign of Four

CSDC7013 Mathematics for AIML
Dimensionality Reduction

Amroz K. Siddiqui
Fr. C. Rodrigues Institute of Technology, Vashi.

Amroz K. Siddiqui (Fr. CRIT) Dimensionality Reduction September 27, 2022 1 / 29

Outline

1 Motivation
What is Dimensionality Reduction?
What are the benefits of Dimension Reduction?

2 Dimensionality Reduction Techniques

Motivation

Data explosion
New ways of gathering data
Noise and unnecessary data
More is not always better.
Large amounts of data can sometimes lead to worse performance
in data analytics applications.

Motivation

Examples: Ways of collecting data

Political parties are capturing data by expanding their reach in the field.
Sports: an explosion of records.
Organizations are evaluating their brand value through social media
engagement (comments, likes), followers, and positive and negative
sentiment.
With more variables comes more trouble! To avoid this trouble,
dimension reduction techniques come to the rescue.

Motivation

In machine learning classification problems, there are often too many
factors on the basis of which the final classification is done.
These factors are basically variables called features.
The higher the number of features, the harder it gets to visualize the
training set and then work on it.
Sometimes, most of these features are correlated, and hence
redundant.
This is where dimensionality reduction algorithms come into play.

Motivation

What is Dimensionality Reduction?

Dimension reduction refers to the process of converting a data set
with many dimensions into one with fewer dimensions, while ensuring that
it conveys similar information concisely.
Dimensionality reduction is the process of reducing the number of
random variables under consideration by obtaining a set of principal
variables.

Example

Email Classification
Candidates for recruitment
Environmental variables
Sports variables

Example

A 3-D classification problem can be hard to visualize,
whereas a 2-D one can be mapped to a simple two-dimensional space,
and a 1-D problem to a simple line.

Figure: 2D data converted to 1D data, if both convey the same meaning.

Dimensionality Reduction

In a similar way, we can reduce the n dimensions of a data set to k
dimensions (k < n).
These k dimensions can be directly identified (filtered),
or can be combinations of dimensions (weighted averages of
dimensions),
or can be new dimension(s) that represent multiple existing dimensions well.
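
The three options above can be sketched on a toy data set. This is only an illustration under stated assumptions: the data values, the chosen columns, and the weights are made up, and the helper name pca_reduce is hypothetical. Principal component analysis (computed here via NumPy's SVD) is used as one possible way to build "new dimensions"; the slides do not name a specific method at this point.

```python
import numpy as np

# Toy data set (assumed for illustration): 6 samples, n = 3 dimensions.
# The third column is nearly the sum of the first two, so it adds little information.
X = np.array([
    [1.0, 2.0, 3.1],
    [2.0, 1.0, 2.9],
    [3.0, 4.0, 7.2],
    [4.0, 3.0, 6.8],
    [5.0, 6.0, 11.1],
    [6.0, 5.0, 10.9],
])

# Option 1: identify k dimensions directly (filter): keep columns 0 and 1.
X_filtered = X[:, [0, 1]]

# Option 2: a combination of dimensions: one weighted average of all three.
weights = np.array([0.25, 0.25, 0.5])  # hypothetical weights that sum to 1
X_combined = X @ weights               # one value per sample

# Option 3: new dimensions that represent the existing ones well (PCA via SVD).
def pca_reduce(data, k):
    """Project centered data onto its top-k principal directions."""
    centered = data - data.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, singular_values, vt = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return centered @ vt[:k].T, explained[:k]

X_pca, explained_ratio = pca_reduce(X, k=2)
print(X_filtered.shape, X_combined.shape, X_pca.shape)
print("share of variance kept by 2 components:", explained_ratio.sum())
```

Because the third column is almost a linear function of the other two, two principal components retain nearly all of the variance in this toy example.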

What are the benefits of Dimension Reduction?

It helps with data compression and reduces the storage space required.
It reduces the time required to perform the same computations.
Fewer dimensions mean less computation, and can also
allow the use of algorithms that are unfit for a large number of dimensions.
It removes redundant features. For example, there is no point in
storing a value in two different units (meters and inches).
Reducing the dimensions of data to 2D or 3D may allow us to plot
and visualize it. You can then observe patterns more clearly.
It also helps with noise removal, and as a result we can improve
the performance of models.
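
The meters-and-inches example can be checked numerically. The sketch below assumes made-up height values and uses the Pearson correlation from NumPy as the redundancy test; that choice is an assumption for illustration, not something the slide prescribes.

```python
import numpy as np

# Hypothetical heights of five objects, stored twice in different units.
height_m = np.array([1.50, 1.62, 1.75, 1.80, 1.93])
height_in = height_m / 0.0254  # 1 inch = 0.0254 m

# Pearson correlation between the two columns.
corr = np.corrcoef(height_m, height_in)[0, 1]

# A correlation of (numerically) 1 means one column is a positive linear
# function of the other, so storing both adds no information.
is_redundant = bool(np.isclose(corr, 1.0))
print(corr, is_redundant)
```

Dropping either column loses nothing here, since the other can be recovered by a unit conversion.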

Components of Dimensionality Reduction

There are two components of dimensionality reduction:

Feature selection: Here we try to find a subset of the original set
of variables, or features, that can be used to model the problem.
It is usually done in one of three ways:
1 Filter
2 Wrapper
3 Embedded
Feature extraction: This maps data from a high-dimensional
space to a lower-dimensional space, i.e. a space with fewer
dimensions.
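
As a minimal sketch of filter-style feature selection, the code below applies a low-variance filter, one of the techniques this lecture lists. The function name low_variance_filter, the data, and the threshold of 0.1 are assumptions chosen for illustration; in practice the threshold depends on the scale of each feature.

```python
import numpy as np

def low_variance_filter(data, threshold):
    """Return indices of columns whose variance exceeds threshold, and the reduced data."""
    variances = data.var(axis=0)
    keep = np.flatnonzero(variances > threshold)
    return keep, data[:, keep]

# Toy data (assumed): column 1 is constant and column 2 barely changes.
X = np.array([
    [1.0, 0.0, 10.0],
    [2.0, 0.0, 10.1],
    [3.0, 0.0, 9.9],
    [4.0, 0.0, 10.0],
])

kept, X_selected = low_variance_filter(X, threshold=0.1)
print("kept columns:", kept.tolist())
print("reduced shape:", X_selected.shape)
```

Note that the selected features are a subset of the original columns, unchanged. Feature extraction, by contrast, would produce new columns computed from the originals.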