Dimension Reduction
DEFINITION
• Dimensionality reduction, or dimension reduction, is the transformation of data from a high-dimensional space into a low-dimensional space so that the low-dimensional representation retains some meaningful properties of the original data, ideally close to its intrinsic dimension.
DIMENSIONS
• The number of input variables or features for a dataset is referred
to as its dimensionality.
• Dimensionality reduction refers to techniques that reduce the
number of input variables in a dataset.
• More input features often make a predictive modeling task more challenging; this is generally referred to as the curse of dimensionality.
• High-dimensional statistics and dimensionality reduction techniques are often used for data visualization. Nevertheless, these techniques can be used in applied machine learning to simplify a classification or regression dataset in order to better fit a predictive model.
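As a minimal sketch of the idea above, the example below reduces a 10-feature dataset to 2 dimensions with scikit-learn's PCA; the data sizes and random features are illustrative, not from the slides.

```python
# Sketch: shrink a high-dimensional dataset to a low-dimensional one.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))   # 100 samples, 10 input features

pca = PCA(n_components=2)        # keep only 2 dimensions
X_low = pca.fit_transform(X)

print(X.shape, "->", X_low.shape)  # (100, 10) -> (100, 2)
```

The reduced matrix `X_low` can then be passed to a classifier or regressor in place of the original features.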
Dimensionality Reduction Techniques
• Factor Analysis :
• A technique that is used to reduce a large number of variables into
fewer numbers of factors.
• The values of observed data are expressed as functions of a number of possible causes in order to find which are the most important.
• The observations are assumed to be caused by a linear
transformation of lower dimensional latent factors and added
Gaussian noise.
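The generative assumption above (latent factors, linear transformation, added Gaussian noise) can be sketched with scikit-learn's `FactorAnalysis`; the sizes and noise level here are illustrative assumptions.

```python
# Sketch: observed data = linear transform of latent factors + Gaussian noise.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
latent = rng.normal(size=(200, 3))              # 3 lower-dimensional factors
loading = rng.normal(size=(3, 8))               # linear transformation
X = latent @ loading + 0.1 * rng.normal(size=(200, 8))  # + Gaussian noise

fa = FactorAnalysis(n_components=3)             # recover 3 factors
factors = fa.fit_transform(X)
print(factors.shape)  # (200, 3)
```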
• LDA (Linear Discriminant Analysis):
• Projects the data so that class separability is maximised.
• Examples from the same class are placed close together by the projection.
• Examples from different classes are placed far apart by the projection.
• https://builtin.com/data-science/step-step-explanation-principal-component-analysis
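A minimal sketch of LDA as a supervised projection: with 3 classes, the data can be projected onto at most 2 discriminant axes that maximise class separability. The Iris dataset is used purely for illustration.

```python
# Sketch: LDA projects labeled data to axes that maximise class separability.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)                 # 4 features, 3 classes
lda = LinearDiscriminantAnalysis(n_components=2)  # at most (classes - 1) axes
X_proj = lda.fit_transform(X, y)                  # uses class labels y

print(X_proj.shape)  # (150, 2)
```

Unlike PCA, LDA uses the class labels `y`, which is why it can place same-class examples close together and different-class examples far apart.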
Tips for Dimensionality Reduction
• There is no best technique for dimensionality reduction and no
mapping of techniques to problems.
• Instead, the best approach is to use systematic controlled
experiments to discover what dimensionality reduction
techniques, when paired with your model of choice, result in
the best performance on your dataset.
• Typically, linear algebra and manifold learning methods assume
that all input features have the same scale or distribution.
• This suggests it is good practice to normalize or standardize the data before using these methods if the input variables have differing scales or units.
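The tip above can be sketched with a scikit-learn `Pipeline` that standardizes features to zero mean and unit variance before applying PCA; the feature scales below are invented to illustrate inputs with differing units.

```python
# Sketch: standardize before a linear-algebra method so no single
# large-scale feature dominates the projection.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
# 5 features with wildly differing scales (e.g. different units)
X = rng.normal(size=(50, 5)) * np.array([1, 10, 100, 1000, 10000])

pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
X_low = pipe.fit_transform(X)
print(X_low.shape)  # (50, 2)
```

Bundling the scaler and the reducer in one pipeline also ensures the same scaling is applied to any future data passed through `pipe.transform`.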