Unit-4 Dimensionality Reduction
While working with machine learning models, we often encounter datasets with a large
number of features. These datasets can lead to problems such as increased computation
time and overfitting. To address these issues, we use dimensionality reduction techniques.
Feature Extraction
Feature extraction involves creating new features by combining or transforming the
original features.
As we know, when dealing with a high-dimensional dataset we must apply dimensionality reduction techniques to the data at hand so that we can explore the data and use it for modeling efficiently.
For example, suppose we have two classes that we need to separate efficiently. Each class can have multiple features. Using only a single feature to classify them may result in some overlap between the classes, so we keep increasing the number of features until proper classification becomes possible.
Assumptions of LDA
LDA assumes that the data has a Gaussian distribution and that the covariance matrices of
the different classes are equal. It also assumes that the data is linearly separable, meaning
that a linear decision boundary can accurately classify the different classes.
Suppose we have two sets of data points belonging to two different classes that we want
to classify. As shown in the given 2D graph, when the data points are plotted on the 2D
plane, there’s no straight line that can separate the two classes of data points completely.
Hence, in this case, LDA (Linear Discriminant Analysis) is used which reduces the 2D graph
into a 1D graph in order to maximize the separability between the two classes.
Here, Linear Discriminant Analysis uses both axes (X and Y) to create a new axis and
projects data onto a new axis in a way to maximize the separation of the two categories
and hence, reduces the 2D graph into a 1D graph.
But Linear Discriminant Analysis fails when the means of the distributions are shared, as it becomes impossible for LDA to find a new axis that makes both classes linearly separable. In such cases, we use non-linear discriminant analysis.
Steps:
1. Calculating mean vectors for each class.
2. Computing within-class and between-class scatter matrices to understand the
distribution and separation of classes.
3. Solving for the eigenvalues and eigenvectors that maximize the between-class variance
relative to the within-class variance. This defines the optimal projection space to
distinguish the classes.
4. Projecting the data onto the eigenvectors with the largest eigenvalues to obtain the reduced, lower-dimensional dataset.
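A minimal NumPy sketch of these steps is shown below; the synthetic two-class data, the random seed, and the variable names are illustrative assumptions rather than part of the original notes.

```python
# Sketch of the LDA steps above on a synthetic two-class, two-feature dataset.
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)),      # class 0 samples
               rng.normal(3, 1, (50, 2))])     # class 1 samples
y = np.array([0] * 50 + [1] * 50)

overall_mean = X.mean(axis=0)
S_w = np.zeros((2, 2))   # within-class scatter matrix
S_b = np.zeros((2, 2))   # between-class scatter matrix
for c in np.unique(y):
    X_c = X[y == c]
    mean_c = X_c.mean(axis=0)                  # step 1: class mean vector
    S_w += (X_c - mean_c).T @ (X_c - mean_c)   # step 2: within-class scatter
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_b += len(X_c) * diff @ diff.T            # step 2: between-class scatter

# Step 3: eigenvectors of inv(S_w) @ S_b maximize between-class variance
# relative to within-class variance.
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_w) @ S_b)
w = eigvecs[:, np.argmax(eigvals.real)].real   # best discriminant axis

# Step 4: project the 2D data onto the single discriminant axis.
X_1d = X @ w
print(X_1d[:5])
```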
Factor Analysis
Factor analysis is a statistical method used to analyze the relationships among a set of
observed variables by explaining the correlations or covariances between them in terms
of a smaller number of unobserved variables called factors.
Factor analysis, a method within the realm of statistics and part of the general linear
model (GLM), serves to condense numerous variables into a smaller set of factors. By
doing so, it captures the maximum shared variance among the variables and condenses
them into a unified score, which can subsequently be utilized for further analysis. Factor analysis operates under several assumptions, including linear relationships among the observed variables, the absence of perfect multicollinearity, and no extreme outliers.
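As a hedged illustration, the sketch below fits scikit-learn's FactorAnalysis to synthetic data generated from two hidden factors; the data, the choice of two factors, and all variable names are assumptions made for this example.

```python
# Factor analysis sketch: six correlated observed variables explained by two factors.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))                        # two unobserved factors
loadings = rng.normal(size=(2, 6))                        # factor-to-variable loadings
X = latent @ loadings + 0.1 * rng.normal(size=(200, 6))   # observed variables plus noise

fa = FactorAnalysis(n_components=2, random_state=0)
scores = fa.fit_transform(X)        # factor scores for each observation
print(fa.components_.shape)         # (2, 6): estimated factor loadings
print(scores.shape)                 # (200, 2): condensed representation
```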
Applications of PCA
Dimensionality Reduction – Reduces the number of features while retaining essential information.
Identifying Latent Constructs – Helps uncover hidden patterns and relationships within the data.
Data Summarization – Condenses large datasets into a smaller set of meaningful components.
Hypothesis Testing – Assists in validating assumptions by analyzing underlying data structures.
Variable Selection – Identifies the most significant features, improving model efficiency.
Enhancing Predictive Models – Reduces noise and multicollinearity, leading to better model performance.
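The short scikit-learn sketch below shows the first application, dimensionality reduction with PCA; the digits dataset and the choice of two components are assumptions for illustration.

```python
# PCA sketch: reduce 64-dimensional digit images to 2 principal components.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)         # 64 features per sample
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)                 # condensed to 2 components

print(X.shape, "->", X_2d.shape)            # (1797, 64) -> (1797, 2)
print(pca.explained_variance_ratio_)        # variance retained by each component
```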
Independent Component Analysis (ICA) is a technique that separates a multivariate signal into additive components that are statistically independent of each other.
Assumptions in ICA
1. The first assumption asserts that the source signals (original signals) are statistically independent of each other.
2. The second assumption is that each source signal exhibits a non-Gaussian distribution.
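A minimal FastICA sketch of these assumptions is given below: the two sources (a sine wave and a square wave) are independent and non-Gaussian. The signals and the mixing matrix are assumptions chosen for illustration.

```python
# ICA sketch: recover two independent, non-Gaussian sources from their mixtures.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                           # source 1: sine wave
s2 = np.sign(np.sin(3 * t))                  # source 2: square wave
S = np.c_[s1, s2]

A = np.array([[1.0, 0.5], [0.5, 2.0]])       # mixing matrix (unknown in practice)
X = S @ A.T                                  # observed mixed signals

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)                 # recovered sources (up to scale and order)
print(S_est.shape)                           # (2000, 2)
```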
Advantages of LLE
Locally linear embedding (LLE) is a dimensionality reduction method with several benefits for data processing and visualization. Its main advantages are:
Preservation of Local Structures: LLE excels at preserving local relationships within the data. It captures the intrinsic geometry of nonlinear manifolds by maintaining pairwise distances between nearby data points.
Handling Non-Linearity: Unlike linear techniques such as Principal Component Analysis (PCA), LLE can capture nonlinear patterns and structures in the data. It is especially helpful when working with complicated, curved, or twisted datasets.
Dimensionality Reduction: LLE lowers the dimensionality of the data while preserving its fundamental properties. Particularly for high-dimensional datasets, this reduction makes data presentation, exploration, and analysis simpler.
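As a rough sketch, the example below applies scikit-learn's LocallyLinearEmbedding to a swiss-roll dataset; the dataset and the parameter choices (n_neighbors=12, n_components=2) are assumptions for illustration.

```python
# LLE sketch: unroll a 3D swiss roll into 2D while preserving local structure.
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

X, _ = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)   # 3D nonlinear manifold

lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, random_state=0)
X_2d = lle.fit_transform(X)          # 2D embedding preserving local neighborhoods

print(X.shape, "->", X_2d.shape)     # (1000, 3) -> (1000, 2)
print(lle.reconstruction_error_)     # reconstruction error of the embedding
```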
Disadvantages of LLE
Curse of Dimensionality: Like many other dimensionality reduction approaches, LLE can suffer from the "curse of dimensionality" when used with extremely high-dimensional data. The number of neighbors required to capture local relationships rises with dimensionality, potentially increasing the computational cost of the approach.
Memory and Computational Requirements: For big datasets, creating the weighted adjacency matrix in LLE can be memory-intensive. The eigenvalue decomposition stage can also be computationally taxing for big datasets.
Outliers and Noisy Data: LLE is sensitive to outliers and noisy data points. Outliers can distort the local linear relationships and degrade the quality of the embedding.
Isomap:
Isomap, short for isometric mapping, is a nonlinear dimensionality reduction method used in data analysis and machine learning. It was developed as an alternative to conventional techniques such as Principal Component Analysis (PCA) in order to preserve the intrinsic geometry of high-dimensional data. Isomap creates a low-dimensional representation, usually a two- or three-dimensional map, by focusing on preserving the pairwise (geodesic) distances between data points.
This technique works especially well for extracting the
underlying structure from large, complex datasets, like those
from speech recognition, image analysis, and biological systems.
Isomap's capacity to highlight the fundamental relationships found in data makes it possible to find patterns and insights in a variety of scientific and engineering domains.
Working of Isomap
Calculate the pairwise distances: The algorithm starts by calculating the Euclidean distances between the data points.
Find nearest neighbors according to these distances: For each data point, its k nearest neighbors are determined using these distances.
Create a neighborhood graph: Each point is connected by edges to its nearest neighbors, producing a graph that represents the data's local structure.
Calculate geodesic distances: The Floyd-Warshall algorithm computes the shortest path between every pair of points in the neighborhood graph; these shortest-path lengths approximate the geodesic distances.
Perform dimensionality reduction: Classical Multidimensional Scaling (MDS) is applied to the geodesic distance matrix, which results in a low-dimensional embedding of the data.
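A minimal scikit-learn sketch of this pipeline (nearest neighbors, geodesic distances, MDS embedding) is shown below; the S-curve dataset and the choice of n_neighbors=10 are assumptions for illustration.

```python
# Isomap sketch: embed a 3D S-curve into 2D using geodesic distances.
from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

X, _ = make_s_curve(n_samples=1000, random_state=0)   # 3D curved manifold

iso = Isomap(n_neighbors=10, n_components=2)
X_2d = iso.fit_transform(X)             # low-dimensional embedding

print(X.shape, "->", X_2d.shape)        # (1000, 3) -> (1000, 2)
print(iso.dist_matrix_.shape)           # pairwise geodesic distance matrix
```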
Disadvantages:
Computational cost: For large datasets, computing geodesic distances with the Floyd-Warshall algorithm can be computationally expensive and lead to long run times.
Sensitive to parameter settings: Incorrect selection of the parameters (such as the number of neighbors) may lead to a distorted or misleading embedding.
Manifolds with holes or topological complexity: Isomap does not perform well on manifolds that contain holes or other topological complexity, which may lead to inaccurate representations.
Applications of Isomap
Visualization: High-dimensional data like face images can be visualized in a lower-dimensional space, enabling easier exploration and understanding.
Data exploration: Isomap can help identify clusters and patterns within the data that are not readily apparent in the original high-dimensional space.
Anomaly detection: Outliers that deviate significantly from the underlying manifold can be identified using Isomap.
Machine learning tasks: Isomap can be used as a pre-processing step for other machine learning tasks, such as classification and clustering, by improving the performance and interpretability of the models.
Least Squares Optimization:
Least squares optimization is a mathematical technique
that minimizes the sum of squared residuals to find the
best-fitting curve for a set of data points.
It is a type of regression analysis that is often used by statisticians and traders to identify trends and trading opportunities.
Steps:
1. Determine the equation of the line you believe best fits the
data.
Denote the independent variable values as xi and the
dependent ones as yi.
Calculate the average values of xi and yi as X and Y.
Presume the equation of the line of best fit as y = mx + c, where
m is the slope of the line and c represents the intercept of the
line on the Y-axis.
The slope m and intercept c can be calculated from the following formulas:
m = Σ(xi − X)(yi − Y) / Σ(xi − X)²
c = Y − mX
Thus, we obtain the line of best fit as y = mx + c.
2. Calculate the residuals (differences) between the observed
values and the values
predicted by your model.
3. Square each of these residuals and sum them up.
4. Adjust the model to minimize this sum.
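The NumPy sketch below works through these steps on a handful of sample points; the data values are assumptions chosen only to illustrate the formulas.

```python
# Least squares sketch: fit y = mx + c by minimizing the sum of squared residuals.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.3, 6.2, 7.9, 10.1])

X_bar, Y_bar = x.mean(), y.mean()      # averages of xi and yi

# Closed-form slope and intercept from the formulas above
m = np.sum((x - X_bar) * (y - Y_bar)) / np.sum((x - X_bar) ** 2)
c = Y_bar - m * X_bar

residuals = y - (m * x + c)            # observed minus predicted values
sse = np.sum(residuals ** 2)           # sum of squared residuals being minimized

print(f"y = {m:.3f}x + {c:.3f}, SSE = {sse:.4f}")
```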