Dimension Reduction

Dimensionality reduction techniques transform high-dimensional data into a lower-dimensional representation while retaining important information. This is done to reduce overfitting, improve model performance, enable faster computation and require less storage. Common linear dimensionality reduction methods include principal component analysis (PCA) which projects data along axes of maximum variance to extract principal components that capture most information with fewer dimensions.

Uploaded by

apurva

DIMENSION REDUCTION

DEFINITION
• Dimensionality reduction, or dimension
reduction, is the transformation of data from
a high-dimensional space into a low-
dimensional space so that the low-
dimensional representation retains some
meaningful properties of the original data,
ideally close to its intrinsic dimension.
DIMENSIONS
• The number of input variables or features for a dataset is referred
to as its dimensionality.
• Dimensionality reduction refers to techniques that reduce the
number of input variables in a dataset.
• More input features often make a predictive modeling task more
challenging, a problem generally referred to as the curse of
dimensionality.
• High-dimensionality statistics and dimensionality reduction
techniques are often used for data visualization. Nevertheless,
these techniques can also be used in applied machine learning to
simplify a classification or regression dataset so that a predictive
model fits it better.
Problem With Many Input Variables

• The performance of machine learning algorithms can degrade with too
many input variables.
• If your data is represented using rows and columns, such as in a
spreadsheet, then the input variables are the columns that are fed as
input to a model to predict the target variable.
• Input variables are also called features.
• We can consider the columns of data representing dimensions on an n-
dimensional feature space and the rows of data as points in that space.
• This is a useful geometric interpretation of a dataset.
• Having a large number of dimensions in the feature space can mean that
the volume of that space is very large, and in turn, the points that we
have in that space (rows of data) often represent a small and non-
representative sample.
As the number of features increases, the
number of samples needed also increases, and
faster than proportionally. The more features
we have, the more samples we need for all
combinations of feature values to be well
represented in our sample.
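
A rough sketch of why the required sample size grows so quickly. It assumes, purely for illustration, that each feature is split into a fixed number of bins and that we want at least one sample in every combination of bins; the function name and the default of 10 bins are hypothetical.

```python
def cells_to_cover(n_features: int, bins_per_feature: int = 10) -> int:
    """Number of distinct bin combinations in the feature space.

    Assumption: every feature is discretized into the same number of bins.
    """
    return bins_per_feature ** n_features


# Each added feature multiplies the number of combinations to cover.
for n_features in (1, 2, 3, 6):
    print(n_features, "features ->", cells_to_cover(n_features), "combinations")
```

With 10 bins per feature, going from 3 to 6 features raises the number of combinations from one thousand to one million, which is why a fixed-size dataset becomes sparse as features are added.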

As the number of features increases, the
model becomes more complex.
The greater the number of features, the greater
the chance of overfitting.
A machine learning model that is trained on a
large number of features becomes increasingly
dependent on the data it was trained on and, in
turn, overfitted, resulting in poor performance
on real data and defeating the purpose.
WHY ????
• This can dramatically impact the performance
of machine learning algorithms fit on data with
many input features, generally referred to as
the “curse of dimensionality.”
• Therefore, it is often desirable to reduce the
number of input features.
• This reduces the number of dimensions of the
feature space, hence the name “dimensionality
reduction.”
Advantages of Dimension reduction

• Less misleading data means model accuracy
improves.
• Fewer dimensions mean less computing. Less
data means that algorithms train faster.
• Less data means less storage space required.
• Fewer dimensions allow the use of algorithms
that are unfit for a large number of dimensions.
• Removes redundant features and noise.
Dimensionality Reduction

• High-dimensionality might mean hundreds, thousands, or
even millions of input variables.
• Fewer input dimensions often mean correspondingly fewer
parameters or a simpler structure in the machine learning
model, referred to as degrees of freedom. A model with too
many degrees of freedom is likely to overfit the training
dataset and therefore may not perform well on new data.
• It is desirable to have simple models that generalize well, and
in turn, input data with few input variables. This is particularly
true for linear models where the number of inputs and the
degrees of freedom of the model are often closely related.
CURSE OF DIMENSIONALITY
The fundamental reason for the curse of
dimensionality is that high-dimensional
functions have the potential to be much more
complicated than low-dimensional ones, and
that those complications are harder to discern.
The only way to beat the curse is to
incorporate knowledge about the data that is
correct.
When ???
• Dimensionality reduction is a data preparation
technique performed on data prior to modeling.
• It might be performed after data cleaning and data
scaling and before training a predictive model.
• … dimensionality reduction yields a more compact,
more easily interpretable representation of the
target concept, focusing the user’s attention on the
most relevant variables.
Which data to be considered???
Any dimensionality reduction performed on
training data must also be performed on new
data, such as a test dataset, a validation dataset,
and the data used when making a prediction with
the final model.
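
A minimal sketch of this rule, assuming NumPy and a PCA-style projection as the reduction step. The helper names fit_projection and apply_projection and the random data are hypothetical. The point is that the mean and the projection axes are learned from the training rows only and then reused unchanged on new rows.

```python
import numpy as np


def fit_projection(X_train, n_components):
    """Learn the centering vector and projection axes from training data only."""
    mean = X_train.mean(axis=0)
    cov = np.cov(X_train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]  # keep the largest ones
    return mean, eigvecs[:, order]


def apply_projection(X, mean, components):
    """Apply the already-fitted transformation to any dataset."""
    return (X - mean) @ components


rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))        # hypothetical data: 100 rows, 5 features
X_train, X_test = X[:80], X[80:]

mean, components = fit_projection(X_train, n_components=2)
Z_train = apply_projection(X_train, mean, components)
Z_test = apply_projection(X_test, mean, components)   # same mean, same axes
```

Refitting on the test rows would produce different axes, so the reduced test features would no longer mean the same thing to the model.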
Techniques for Dimensionality Reduction
• Feature Selection Methods
– Use scoring or statistical methods to select which features to keep and
which features to delete.
– … perform feature selection, to remove “irrelevant” features that do not
help much with the classification problem.
• Matrix Factorization
– matrix factorization methods can be used to reduce a dataset matrix into
its constituent parts.
– The parts can then be ranked and a subset of those parts can be
selected that best captures the salient structure of the matrix that can
be used to represent the dataset.
– The most common approach to dimensionality reduction is called
principal components analysis or PCA.
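
To make the feature selection idea above concrete, here is a small sketch assuming NumPy. It scores each feature by the absolute Pearson correlation with the target and keeps the highest-scoring ones. The scoring rule, the function name select_top_k_features and the synthetic data are illustrative assumptions; many other scores can be used.

```python
import numpy as np


def select_top_k_features(X, y, k):
    """Return the column indices of the k features most correlated with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    # Absolute Pearson correlation between each column of X and y.
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.sort(np.argsort(scores)[::-1][:k])


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
# Hypothetical target that depends only on features 1 and 3.
y = 3.0 * X[:, 1] - 2.0 * X[:, 3] + 0.1 * rng.normal(size=200)

kept = select_top_k_features(X, y, k=2)
X_reduced = X[:, kept]   # the dataset with the low-scoring features deleted
```

Unlike PCA, this keeps original columns rather than building new ones, so the reduced features stay directly interpretable.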
Techniques for Dimensionality Reduction
• Manifold Learning
– Techniques from high-dimensionality statistics can also be used for
dimensionality reduction.
– These techniques create a low-dimensional projection of the data. In
mathematics, a projection is a kind of function or mapping that transforms
data in some way.
– One example is the Kohonen Self-Organizing Map (SOM).
• Autoencoder Methods
– Deep learning neural networks can be constructed to perform dimensionality
reduction.
– A popular approach is called autoencoders. This involves framing a self-
supervised learning problem where a model must reproduce the input
correctly.
– An auto-encoder is a kind of unsupervised neural network that is used for
dimensionality reduction and feature discovery. More precisely, an auto-
encoder is a feedforward neural network that is trained to predict the input
itself.
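
A heavily simplified sketch of the autoencoder idea, assuming NumPy only. It uses a single linear hidden layer trained by plain gradient descent to reproduce its own input; real autoencoders are usually deeper, nonlinear and built with a deep learning library. The function name, learning rate, epoch count and synthetic data are all illustrative assumptions.

```python
import numpy as np


def train_linear_autoencoder(X, n_hidden, lr=0.1, epochs=2000, seed=0):
    """Train encoder/decoder weights so that X @ W_enc @ W_dec approximates X."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    W_enc = rng.normal(scale=0.1, size=(n_features, n_hidden))
    W_dec = rng.normal(scale=0.1, size=(n_hidden, n_features))
    losses = []
    for _ in range(epochs):
        Z = X @ W_enc                 # low-dimensional codes
        X_hat = Z @ W_dec             # reconstruction of the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        grad_out = 2.0 * err / err.size          # d(loss) / d(X_hat)
        grad_dec = Z.T @ grad_out
        grad_enc = X.T @ (grad_out @ W_dec.T)
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec, losses


rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 1))
# Hypothetical 3-feature data that really varies along one hidden factor.
X = latent @ np.array([[1.0, -0.5, 0.5]]) + 0.05 * rng.normal(size=(200, 3))
X = X - X.mean(axis=0)

W_enc, W_dec, losses = train_linear_autoencoder(X, n_hidden=1)
codes = X @ W_enc   # the reduced, 1-dimensional representation
```

The target of the training is the input itself, so no labels are needed. The hidden layer is narrower than the input, which forces the network to learn a compressed representation.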
Linear Dimensionality Reduction Methods
• The most common and well-known dimensionality reduction methods are the ones
that apply linear transformations, such as PCA.
• PCA (Principal Component Analysis): popularly used for dimensionality reduction
in continuous data.
• PCA rotates the data and projects it onto the directions of maximum variance.
• These directions, which are linear combinations of the original features, are
the principal components.
PCA
• Variables are transformed into a new set of variables, which are linear
combinations of the original variables.
• This new set of variables is known as the principal components.
• They are obtained in such a way that the first principal component accounts
for as much of the variation in the original data as possible, after which each
succeeding component has the highest possible variance while being
orthogonal to the preceding components.
• The second principal component must be orthogonal to the first
principal component.
• In other words, it does its best to capture the variance in the data that
is not captured by the first principal component.
• For a two-dimensional dataset, there can be only two principal
components.
[Figure: a two-dimensional dataset with its first and second principal components]
• In such a plot, the second principal component is orthogonal to the first
principal component.
• STEP 1: STANDARDIZATION
– The aim of this step is to standardize the range of the continuous
initial variables so that each one of them contributes equally to the
analysis.
– If there are large differences between the ranges of the initial variables,
those variables with larger ranges will dominate over those with small
ranges.
– For example, a variable that ranges between 0 and 100 will dominate
over a variable that ranges between 0 and 1, which will lead to
biased results.
– So, transforming the data to comparable scales can prevent this
problem.
– After standardization, all the variables will be on the same scale.
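
A small sketch of standardization, assuming NumPy and the usual z-score form (subtract the column mean, divide by the column standard deviation). The function name and the tiny dataset are hypothetical. The first column ranges up to 100 and the second up to 1, as in the example above.

```python
import numpy as np


def standardize(X):
    """Z-score each column: (value - column mean) / column standard deviation.

    Assumption: no column is constant, otherwise the division would fail.
    """
    mean = X.mean(axis=0)
    std = X.std(axis=0)      # population standard deviation (ddof=0)
    return (X - mean) / std, mean, std


# Hypothetical data: same pattern in both columns, very different ranges.
X = np.array([[10.0, 0.1],
              [50.0, 0.5],
              [90.0, 0.9],
              [30.0, 0.3]])

X_std, mean, std = standardize(X)
```

After this step both columns have mean 0 and standard deviation 1, so neither dominates the covariance computation because of its units.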
STEP 2: COVARIANCE MATRIX COMPUTATION
• The aim of this step is to understand how the variables of the
input data set are varying from the mean with respect to each
other, or in other words, to see if there is any relationship
between them.
• Sometimes variables are highly correlated in such a way that they contain
redundant information. So, in order to identify these
correlations, we compute the covariance matrix.
• The covariance matrix is a p × p symmetric matrix (where p is
the number of dimensions) that has as entries the covariances
associated with all possible pairs of the initial variables.
• For example, for a 3-dimensional data set with 3 variables x, y,
and z, the covariance matrix is a 3×3 matrix of this form:
    Cov(x,x)  Cov(x,y)  Cov(x,z)
    Cov(y,x)  Cov(y,y)  Cov(y,z)
    Cov(z,x)  Cov(z,y)  Cov(z,z)
• The covariance of a variable with itself is its variance
(Cov(a,a) = Var(a)), so the main diagonal holds the variances
of the initial variables.
• The covariance is commutative (Cov(a,b) = Cov(b,a)).
• The entries of the covariance matrix are therefore symmetric
with respect to the main diagonal, which
means that the upper and the lower triangular
portions are equal.
• What do the covariances that we have as entries
of the matrix tell us about the correlations
between the variables?
• It’s actually the sign of the covariance that matters:
• If positive, the two variables increase or
decrease together (correlated).
• If negative, one increases when the other
decreases (inversely correlated).
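
The properties listed in this step can be checked numerically. The sketch below assumes NumPy and three synthetic variables x, y and z, where y is built to rise with x and z to fall as x rises. The data and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=500)
y = 2.0 * x + 0.5 * rng.normal(size=500)    # tends to increase with x
z = -1.0 * x + 0.5 * rng.normal(size=500)   # tends to decrease as x increases
X = np.column_stack([x, y, z])              # rows = observations, columns = variables

# Sample covariance matrix: center each column, then average the products.
Xc = X - X.mean(axis=0)
cov = Xc.T @ Xc / (X.shape[0] - 1)

print(np.round(cov, 2))
```

The result is a 3×3 symmetric matrix with the variances on the diagonal, a positive entry for the (x, y) pair and a negative entry for the (x, z) pair. np.cov(X, rowvar=False) computes the same matrix.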
• STEP 3: COMPUTE THE EIGENVECTORS AND EIGENVALUES OF THE
COVARIANCE MATRIX TO IDENTIFY THE PRINCIPAL COMPONENTS
• Eigenvectors and eigenvalues are the linear algebra concepts that we compute
from the covariance matrix in order to determine the principal components of
the data.
– Principal components are new variables that are constructed as linear
combinations or mixtures of the initial variables.
– These combinations are done in such a way that the new variables (i.e.,
principal components) are uncorrelated and most of the information
within the initial variables is squeezed or compressed into the first
components.
– So, the idea is that 10-dimensional data gives you 10 principal components,
but PCA tries to put the maximum possible information in the first
component, then the maximum remaining information in the second, and so
on.
• Organizing information in principal components this way will allow you to
reduce dimensionality without losing much information, by discarding the
components with low information and considering the remaining components
as your new variables.
• An important thing to realize here is that the principal components
are less interpretable and don’t have any real meaning, since they are
constructed as linear combinations of the initial variables.
• Geometrically speaking, principal components represent the
directions of the data that explain a maximal amount of variance, that
is to say, the lines that capture most information of the data.
• The relationship between variance and information here is that the
larger the variance carried by a line, the larger the dispersion of the
data points along it, and the larger the dispersion along a line, the
more information it has.
• To put all this simply, just think of principal components as new axes
that provide the best angle to see and evaluate the data, so that the
differences between the observations are better visible.
HOW PCA CONSTRUCTS THE PRINCIPAL COMPONENTS?
• There are as many principal components as
there are variables in the data.
• Principal components are constructed in such
a manner that the first principal component
accounts for the largest possible variance in
the data set.
• For example, let’s assume that the scatter plot
of our data set is as shown below. Can we
guess the first principal component?
• Yes, it’s approximately the line that matches
the purple marks because it goes through the
origin and it’s the line in which the projection
of the points (red dots) is the most spread out.
• Or mathematically speaking, it’s the line that
maximizes the variance (the average of the
squared distances from the projected points
(red dots) to the origin).
• The second principal component is calculated
in the same way, with the condition that it is
uncorrelated with (i.e., perpendicular to) the
first principal component and that it accounts
for the next highest variance.
• This continues until a total of p principal
components have been calculated, equal to
the original number of variables.
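
The claim that the first principal component is the line of maximum projected variance can be checked by brute force. This sketch assumes NumPy and a hypothetical 2-dimensional dataset. It compares the variance of the data projected onto the leading eigenvector of the covariance matrix with the variance along 180 evenly spaced candidate directions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-D data, stretched so that one direction clearly dominates.
X = rng.normal(size=(300, 2)) @ np.array([[2.0, 1.5],
                                          [0.0, 0.5]])
Xc = X - X.mean(axis=0)          # center so the lines pass through the origin

cov = np.cov(Xc, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
pc1 = eigvecs[:, -1]                     # direction with the largest eigenvalue
pc2 = eigvecs[:, 0]                      # the remaining, perpendicular direction


def projected_variance(direction):
    """Variance of the data after projecting it onto a unit direction."""
    return float(np.var(Xc @ direction, ddof=1))


# Try many unit directions and record the best variance found.
angles = np.linspace(0.0, np.pi, 180, endpoint=False)
candidates = np.column_stack([np.cos(angles), np.sin(angles)])
best_candidate_variance = max(projected_variance(d) for d in candidates)

print(projected_variance(pc1), best_candidate_variance)
```

No candidate direction beats pc1, the variance along pc1 equals the largest eigenvalue, and pc2 is perpendicular to pc1.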
Eigenvectors and eigenvalues
• Eigenvectors and eigenvalues always come in pairs, so that every
eigenvector has an eigenvalue.
• Their number is equal to the number of
dimensions of the data.
• For example, for a 3-dimensional data set,
there are 3 variables, therefore there are 3
eigenvectors with 3 corresponding
eigenvalues.
Eigenvector and Eigenvalue
• For a square matrix A, an Eigenvector v and
Eigenvalue λ make this equation true (if we can
find them):
Av = λv
• We start by finding the eigenvalue, using the equation above.
• Now let us put in an identity matrix so we are dealing with
matrix-vs-matrix:
Av = λIv
• Bring all to the left-hand side:
Av − λIv = 0
A * Eigenvector − Eigenvalue * Eigenvector = 0
• If v is non-zero then we can solve for λ using just the
determinant:
| A − λI | = 0
Example
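
One way to make the determinant condition concrete, assuming NumPy and a hypothetical 2×2 matrix A that is not taken from the original slides. For a 2×2 matrix, | A − λI | = 0 expands to λ² − trace(A)·λ + det(A) = 0, so the eigenvalues are the roots of that quadratic.

```python
import numpy as np

# Hypothetical symmetric matrix chosen for illustration.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

# |A − λI| = (2 − λ)(2 − λ) − 1 = λ² − 4λ + 3, with roots λ = 1 and λ = 3.
char_poly = [1.0, -np.trace(A), np.linalg.det(A)]
eigenvalues = np.sort(np.roots(char_poly).real)

# Cross-check against NumPy's eigen-solver for symmetric matrices.
library_values, library_vectors = np.linalg.eigh(A)

# Each column v of library_vectors satisfies Av = λv.
for lam, v in zip(library_values, library_vectors.T):
    print(lam, np.allclose(A @ v, lam * v))
```

For larger matrices the characteristic polynomial is rarely expanded by hand, and a numerical routine such as np.linalg.eigh is the usual choice.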
Eigenvectors & Covariance matrix
Relationship unleashed
• The eigenvectors of the covariance matrix are
the directions of the axes where there is the
most variance (most information), and these are what we call
Principal Components.
• The eigenvalues are simply the coefficients attached
to the eigenvectors, which give the amount of variance
carried in each Principal Component.
• By ranking your eigenvectors in order of their
eigenvalues, highest to lowest, you get the principal
components in order of significance.
• Let’s suppose that our data set is 2-dimensional with 2
variables x, y and that the covariance matrix has eigenvectors
v1 and v2 with eigenvalues λ1 and λ2.
• If we rank the eigenvalues in descending order, we get λ1 > λ2, which
means that the eigenvector that corresponds to the first principal
component (PC1) is v1 and the one that corresponds to the second
component (PC2) is v2.
• After having the principal components, to compute the percentage of
variance (information) accounted for by each component, we divide
the eigenvalue of each component by the sum of eigenvalues.
• If we apply this on the example above, we find that PC1 and PC2
carry respectively 96% and 4% of the variance of the data.
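
The percentage calculation can be sketched as follows, assuming NumPy. The two eigenvalues are hypothetical numbers chosen only so that the split comes out at 96% and 4%. They are not the values from the original example.

```python
import numpy as np

# Hypothetical eigenvalues, already sorted from highest to lowest.
eigenvalues = np.array([9.6, 0.4])

# Share of variance per component = eigenvalue / sum of all eigenvalues.
explained_ratio = eigenvalues / eigenvalues.sum()   # 0.96 and 0.04
cumulative = np.cumsum(explained_ratio)             # running total per component kept

for i, (ratio, total) in enumerate(zip(explained_ratio, cumulative), start=1):
    print(f"PC{i}: {ratio:.0%} of the variance, {total:.0%} cumulative")
```

The cumulative column is what is usually inspected when deciding how many components to keep.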
STEP 4: FEATURE VECTOR
• As we saw in the previous step, computing the eigenvectors and
ordering them by their eigenvalues in descending order allows
us to find the principal components in order of significance.
• In this step, we choose whether to keep all these
components or discard those of lesser significance (of low
eigenvalues), and form with the remaining ones a matrix of
vectors that we call the feature vector.
• So, the feature vector is simply a matrix that has as columns the
eigenvectors of the components that we decide to keep.
• This makes it the first step towards dimensionality reduction,
because if we choose to keep only k eigenvectors (components)
out of the p available, the final data set will have only k dimensions.
• Continuing with the example from the previous step, we can either form
a feature vector with both of the eigenvectors v1 and v2, that is, the
matrix [v1 v2] with v1 and v2 as its columns.
• Or discard the eigenvector v2, which is the one of lesser significance, and
form a feature vector with v1 only, that is, the single-column matrix [v1].
• Discarding the eigenvector v2 will reduce dimensionality by 1, and will
consequently cause a loss of information in the final data set. But given
that v2 was carrying only 4% of the information, the loss will not be
important and we will still have the 96% of the information that is carried
by v1.
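
A sketch of building the feature vector, assuming NumPy. The covariance matrix below is hypothetical (its eigenvalues are 1.9 and 0.1, not the values of the example above), and build_feature_vector is an illustrative helper name.

```python
import numpy as np


def build_feature_vector(cov, n_keep):
    """Return the top n_keep eigenvectors as columns, plus all sorted eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder from highest to lowest
    return eigvecs[:, order[:n_keep]], eigvals[order]


# Hypothetical covariance matrix of two strongly correlated variables.
cov = np.array([[1.0, 0.9],
                [0.9, 1.0]])

feature_vector, sorted_eigvals = build_feature_vector(cov, n_keep=1)
print(feature_vector.shape)    # one column: only v1 is kept
print(sorted_eigvals)
```

Setting n_keep=2 would keep both eigenvectors and give a 2×2 feature vector, which rotates the data without reducing its dimensionality.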
LAST STEP: RECAST THE DATA ALONG THE PRINCIPAL COMPONENTS AXES
• The aim is to use the feature vector formed using
the eigenvectors of the covariance matrix to
reorient the data from the original axes to the
ones represented by the principal components
(hence the name Principal Components Analysis).
• This can be done by multiplying the transpose of
the feature vector by the transpose of the
standardized original data set:
FinalDataSet = FeatureVector^T × StandardizedOriginalDataSet^T
Example
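
The original example is not reproduced here, so the following is a stand-in sketch, assuming NumPy and synthetic 3-feature data. It runs the steps end to end (standardize, covariance matrix, eigen-decomposition, feature vector) and then recasts the data by multiplying the transposed feature vector by the transposed standardized data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical raw data: 100 observations of 3 correlated variables.
raw = rng.normal(size=(100, 3)) @ np.array([[3.0, 1.0, 0.5],
                                            [0.0, 1.0, 0.2],
                                            [0.0, 0.0, 0.1]])

# Step 1: standardization.
standardized = (raw - raw.mean(axis=0)) / raw.std(axis=0, ddof=1)

# Steps 2 and 3: covariance matrix, then eigenvectors and eigenvalues.
cov = np.cov(standardized, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]

# Step 4: feature vector with the two most significant components.
feature_vector = eigvecs[:, order[:2]]               # shape (3, 2)

# Last step: recast the data along the principal component axes.
final_data = feature_vector.T @ standardized.T       # shape (2, 100)

# Equivalent form with observations kept in rows.
final_rows = standardized @ feature_vector           # shape (100, 2)
```

The recast variables are uncorrelated, and their variances equal the two largest eigenvalues, which is the property described in Step 3.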
Linear Dimensionality Reduction Methods

• Factor Analysis :
• A technique that is used to reduce a large number of variables into
fewer numbers of factors.
• The values of observed data are expressed as functions of a
number of possible causes in order to find which are the most
important.
• The observations are assumed to be caused by a linear
transformation of lower dimensional latent factors and added
Gaussian noise.
• LDA (Linear Discriminant Analysis):
• projects data in a way that the class separability is maximised.
• Examples from same class are put closely together by the projection.
• Examples from different classes are placed far apart by the projection
• https://builtin.com/data-science/step-step-explanation-principal-component-analysis
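
The LDA bullets above can be illustrated with the two-class Fisher discriminant, assuming NumPy. This is one classical form of the idea, not necessarily the variant the slides have in mind. The function name and the two synthetic classes are hypothetical.

```python
import numpy as np


def fisher_lda_direction(X0, X1):
    """Unit direction that separates two classes relative to their spread."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: how spread out each class is around its own mean.
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    w = np.linalg.solve(Sw, m1 - m0)
    return w / np.linalg.norm(w)


rng = np.random.default_rng(0)
# Hypothetical classes: same spread, different centers.
X0 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
X1 = rng.normal(loc=[4.0, 4.0], scale=1.0, size=(100, 2))

w = fisher_lda_direction(X0, X1)
z0, z1 = X0 @ w, X1 @ w     # each 2-D point reduced to a single number

print(z0.mean(), z1.mean())
```

Unlike PCA, this projection uses the class labels: points from the same class land close together on the line and the two classes land far apart.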
Tips for Dimensionality Reduction
• There is no best technique for dimensionality reduction and no
mapping of techniques to problems.
• Instead, the best approach is to use systematic controlled
experiments to discover what dimensionality reduction
techniques, when paired with your model of choice, result in
the best performance on your dataset.
• Typically, linear algebra and manifold learning methods assume
that all input features have the same scale or distribution.
• This suggests that it is good practice to either normalize or
standardize data prior to using these methods if the input
variables have differing scales or units.
