3-2 YR-SEM, R22 Regulation, CSM

UNIT 4
Dimensionality Reduction
Dimensionality reduction refers to techniques used to reduce the number of input features or
dimensions in a dataset while preserving as much relevant information as possible. It is essential for
handling high-dimensional data, improving computational efficiency, reducing storage requirements,
and mitigating issues like the curse of dimensionality.

Why Dimensionality Reduction?


1. Improves Model Performance:
o Reduces overfitting by removing irrelevant or redundant features.
o Speeds up training and inference by simplifying the dataset.
2. Visualization:
o Enables visualization of high-dimensional data in 2D or 3D spaces.
3. Mitigates Curse of Dimensionality:
o High-dimensional spaces can lead to sparse data, making distances and patterns less
meaningful.
4. Removes Noise:
o Reduces the impact of irrelevant or noisy features.

Types of Dimensionality Reduction


1. Feature Selection
Selects a subset of relevant features from the original dataset.
 Methods:
o Filter Methods: Use statistical measures like correlation, chi-square, or mutual
information.
o Wrapper Methods: Evaluate subsets of features based on model performance (e.g.,
recursive feature elimination).
o Embedded Methods: Select features as part of the model training process (e.g.,
LASSO, decision trees).
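A minimal sketch of the three families using scikit-learn; the dataset (breast cancer), the number of kept features (10), and the regularisation strength are illustrative assumptions rather than recommendations.
```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)

# Filter: keep the 10 features with the highest mutual information with the label
X_filter = SelectKBest(mutual_info_classif, k=10).fit_transform(X, y)

# Wrapper: recursive feature elimination driven by a model's performance
rfe = RFE(LogisticRegression(max_iter=5000), n_features_to_select=10).fit(X, y)
X_wrapper = X[:, rfe.support_]

# Embedded: an L1-penalised (LASSO-style) model zeroes out coefficients during training
l1 = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
X_embedded = X[:, l1.coef_.ravel() != 0]

print(X_filter.shape, X_wrapper.shape, X_embedded.shape)
```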

2. Feature Extraction
Transforms the data into a lower-dimensional space by creating new features.
 Linear Methods:
o Principal Component Analysis (PCA):

 Projects data onto a new set of orthogonal axes (principal components) that
maximize variance.
 Works by eigenvalue decomposition of the covariance matrix.
 Works best when the dominant structure in the data is linear (i.e., captured by linear correlations among features).
o Linear Discriminant Analysis (LDA):
 Focuses on maximizing class separability in supervised learning.
 Finds linear combinations of features that best separate different classes.
 Non-Linear Methods:
o t-SNE (t-Distributed Stochastic Neighbor Embedding):
 Preserves local structure in data while embedding it into lower dimensions.
 Often used for visualization but not suitable for downstream tasks.
o UMAP (Uniform Manifold Approximation and Projection):
 Similar to t-SNE but faster and preserves both global and local structures.
o Kernel PCA:
 Extends PCA to capture non-linear structures using kernel functions.
o Isomap:
 Uses geodesic distances instead of Euclidean distances to capture manifold
structures.
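As a quick illustration of feature extraction, the sketch below applies one linear and two non-linear methods from scikit-learn to the 64-dimensional digits dataset; the dataset and the choice of 2 output dimensions are assumptions for illustration only (UMAP is omitted because it lives in the separate umap-learn package).
```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA, KernelPCA
from sklearn.manifold import TSNE

X, y = load_digits(return_X_y=True)        # 1797 samples, 64 features

X_pca  = PCA(n_components=2).fit_transform(X)                      # linear projection
X_kpca = KernelPCA(n_components=2, kernel="rbf").fit_transform(X)  # non-linear (kernel) PCA
X_tsne = TSNE(n_components=2, perplexity=30).fit_transform(X)      # preserves local structure

print(X_pca.shape, X_kpca.shape, X_tsne.shape)   # each is (1797, 2)
```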

Key Algorithms for Dimensionality Reduction

Algorithm | Type | Description
PCA | Linear | Identifies principal components that capture maximum variance.
LDA | Linear | Maximizes separability among classes (supervised).
t-SNE | Non-linear | Visualizes high-dimensional data by preserving pairwise similarities in a lower-dimensional space.
UMAP | Non-linear | Similar to t-SNE but faster and scalable.
Autoencoders | Non-linear | Neural networks designed to learn compressed representations of data.
Factor Analysis | Linear | Models data variability as a function of latent variables.
Independent Component Analysis (ICA) | Linear | Separates data into statistically independent components.
Isomap | Non-linear | Preserves the geodesic distances in a lower-dimensional manifold.
LLE (Locally Linear Embedding) | Non-linear | Captures the local neighborhood relationships for dimensionality reduction.

Steps in Dimensionality Reduction


1. Preprocessing:
o Standardize or normalize features to ensure consistent scaling.
o Handle missing data.
2. Choose Technique:
o Decide based on data characteristics (linear or non-linear relationships, need for
interpretability, etc.).
3. Apply Algorithm:
o Reduce dimensions using the chosen technique.
4. Validate Results:
o Check for information loss using metrics like explained variance, reconstruction error,
or downstream model performance.
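A compact end-to-end sketch of these four steps with scikit-learn; the wine dataset and the 95% variance threshold are assumptions made purely for illustration.
```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X, _ = load_wine(return_X_y=True)

# 1. Preprocessing: standardize so every feature has zero mean and unit variance
X_std = StandardScaler().fit_transform(X)

# 2-3. Choose and apply a technique: keep enough components to explain 95% of the variance
pca = PCA(n_components=0.95)
X_red = pca.fit_transform(X_std)

# 4. Validate: explained variance and reconstruction error
print("kept", X_red.shape[1], "of", X.shape[1], "dimensions")
print("explained variance:", pca.explained_variance_ratio_.sum())
X_back = pca.inverse_transform(X_red)
print("reconstruction error:", np.mean((X_std - X_back) ** 2))
```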

Applications
1. Visualization:
o Explore and understand high-dimensional data in 2D or 3D.
2. Preprocessing:
o Simplify datasets before training machine learning models.
3. Noise Reduction:
o Remove redundant or irrelevant features to improve model performance.
4. Genomics:
o Identify patterns in genetic data with thousands of features.
5. Image Compression:
o Reduce pixel data while preserving the image's key characteristics.

Advantages
 Simplifies data, making it easier to analyze and visualize.

 Reduces computational cost and storage requirements.


 Improves model interpretability.
 Mitigates the risk of overfitting.

Disadvantages
 Risk of information loss if not applied carefully.
 Some techniques (e.g., t-SNE) are computationally intensive for large datasets.
 Feature extraction methods create new features, which might lose interpretability.
 Non-linear techniques like t-SNE and UMAP are mainly for visualization and not suitable for all
downstream tasks.

Linear Discriminant Analysis


Linear Discriminant Analysis (LDA) is a supervised machine learning technique used for
dimensionality reduction and classification. It is commonly used when the data are labeled
and the goal is to separate different classes in the feature space. Here’s a breakdown of the
key components of LDA:
Key Concepts:
1. Purpose:
o LDA is primarily used for classification, but it can also be used for dimensionality
reduction when there are more features than necessary.
o The goal is to find a linear combination of features that best separates two or more
classes.
2. Dimensionality Reduction:
o LDA reduces the dimensionality of data while maintaining as much information as
possible to distinguish between the classes.
o Unlike Principal Component Analysis (PCA), which focuses on maximizing variance,
LDA aims to maximize the separation between multiple classes.
3. How it Works:
o LDA works by computing a projection of the data points onto a lower-dimensional
space. This projection maximizes the distance between the means of the classes
while minimizing the variance within each class.
o The main idea is to maximize the ratio of between-class variance to within-class
variance, which is the criterion for the best separation.
4. Steps Involved:
o Compute the mean vectors for each class.
o Compute the scatter matrices:
 Within-class scatter matrix (measures the spread of the points within each class).
 Between-class scatter matrix (measures the separation between the class means).
o Calculate the eigenvalues and eigenvectors of the matrix formed by the inverse of the
within-class scatter matrix multiplied by the between-class scatter matrix.
o Select the top eigenvectors (based on the largest eigenvalues) to form the
transformation matrix that will be used for the projection.
5. Assumptions:
o The data for each class should be normally distributed.
o The classes should have the same covariance (homoscedasticity).
o The classes are assumed to be linearly separable.
6. Application:
o Classification: Once the projection matrix is obtained, it can be used to classify new
data points by projecting them onto the lower-dimensional space and determining
which class they are closest to.
o Dimensionality Reduction: In problems where there are many features, LDA can
reduce the number of dimensions while maintaining the class-separability.
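The following NumPy sketch walks through the scatter-matrix steps above on the Iris dataset; keeping 2 discriminant directions is an assumption for illustration, and scikit-learn's LinearDiscriminantAnalysis packages the same computation.
```python
import numpy as np
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)
n_features = X.shape[1]
overall_mean = X.mean(axis=0)

S_W = np.zeros((n_features, n_features))   # within-class scatter
S_B = np.zeros((n_features, n_features))   # between-class scatter
for c in np.unique(y):
    X_c = X[y == c]
    mean_c = X_c.mean(axis=0)
    S_W += (X_c - mean_c).T @ (X_c - mean_c)
    diff = (mean_c - overall_mean).reshape(-1, 1)
    S_B += X_c.shape[0] * diff @ diff.T

# Eigen-decomposition of inv(S_W) @ S_B; keep the top eigenvectors as the projection
eigvals, eigvecs = np.linalg.eig(np.linalg.inv(S_W) @ S_B)
order = np.argsort(eigvals.real)[::-1]
W = eigvecs.real[:, order[:2]]

X_lda = X @ W          # data projected onto the two linear discriminants
print(X_lda.shape)     # (150, 2)
```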

Principal Component Analysis


Principal Component Analysis (PCA) is an unsupervised machine learning technique primarily
used for dimensionality reduction and feature extraction. It transforms a dataset with
potentially correlated variables into a set of linearly uncorrelated variables called principal
components.
Key Concepts of PCA
1. Purpose:
 Reduce the dimensionality of a dataset while retaining as much of the variability in the data
as possible.
 Identify patterns and highlight strong relationships among features.
 Remove redundant or less informative features.
2. How PCA Works:
 PCA identifies the directions (principal components) along which the variance in the data is
maximized.
 The first principal component (PC1) captures the largest variance in the data. The second
principal component (PC2) captures the next largest variance, and so on, under the constraint
that each subsequent component is orthogonal to the previous ones.
3. Steps of PCA:
1. Standardize the Data:
o Center the data by subtracting the mean from each feature.
o Scale the data to have unit variance if features have different scales.

2. Compute the Covariance Matrix:


o Calculate the covariance matrix to understand how the variables in the dataset are
related.
3. Perform Eigen Decomposition:
o Compute the eigenvalues and eigenvectors of the covariance matrix.
o Eigenvalues represent the amount of variance captured by each principal component.
o Eigenvectors represent the directions (axes) of the principal components.
4. Select Principal Components:
o Rank the eigenvalues in descending order and select the top k components that
capture the majority of the variance (based on a cumulative variance threshold, e.g.,
95%).
5. Project the Data:
o Transform the original dataset by projecting it onto the selected eigenvectors to form
the new dataset in the reduced-dimensional space.
4. Mathematical Formulation:
 The eigenvectors corresponding to the largest eigenvalues form the principal components.
5. Visualization:
 In 2D or 3D data, the principal components can be visualized as axes along which the data's
spread (variance) is maximized.
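A from-scratch NumPy sketch of the five PCA steps above; the synthetic correlated data and the 95% cumulative-variance threshold are assumptions chosen only to keep the example self-contained.
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # correlated toy data

# 1. Standardize: center the data (all features here share a scale)
X_c = X - X.mean(axis=0)

# 2. Covariance matrix
cov = np.cov(X_c, rowvar=False)

# 3. Eigen decomposition (eigh because the covariance matrix is symmetric)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Select the top-k components covering at least 95% of the variance
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.95)) + 1

# 5. Project the data onto the selected eigenvectors
X_pca = X_c @ eigvecs[:, :k]
print(k, X_pca.shape)
```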

Applications of PCA:
1. Data Preprocessing:
o Reduce dimensionality before applying machine learning algorithms to prevent
overfitting and reduce computation time.
2. Noise Reduction:
o Eliminate features that contribute minimally to the overall variance, which often
represent noise.
3. Visualization:
o Reduce high-dimensional data to 2D or 3D for easier visualization and interpretation.
4. Genomics:
o Analyze gene expression data or genetic variation data with thousands of features.
5. Image Compression:
o Reduce image dimensions while retaining important visual information.

PCA vs. LDA:

Aspect | PCA | LDA
Type | Unsupervised | Supervised
Goal | Maximize variance | Maximize class separability
Class Label Use | Does not use labels | Requires labelled data
Output | Principal components | Linear discriminants

Factor Analysis (FA) & Independent Component Analysis (ICA)


Factor Analysis (FA) and Independent Component Analysis (ICA) are two statistical
techniques used for analysing data by reducing dimensions and identifying latent structures.
However, they have distinct goals, methods, and assumptions.

1. Factor Analysis (FA)


Purpose:
 FA is a statistical method used to identify underlying latent variables (factors) that explain the
observed correlations among measured variables.
 It assumes that the observed variables are linear combinations of the latent factors plus some
noise.
Key Concepts:
 Latent Variables: Unobservable variables that are inferred from observed data.
 Explained Variance: FA seeks to explain as much variance as possible in the observed variables
using fewer factors.
 Noise: Each observed variable includes some error or random noise that FA accounts for.
Steps in FA:
1. Model the Data: assume each observed variable is a linear combination of the latent factors plus an error term.
2. Estimate the Factor Loadings: fit the loadings that link factors to observed variables (commonly by maximum likelihood or principal-axis factoring).
3. Extract Factors: choose the number of factors (e.g., from eigenvalues or explained variance) and compute the factor scores.
4. Rotate Factors (optional): apply a rotation such as varimax to make the loadings easier to interpret, as in the sketch below.
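A small illustration using scikit-learn's FactorAnalysis on the digits data; the choice of 10 factors and the varimax rotation are assumptions (the rotation argument needs a reasonably recent scikit-learn).
```python
from sklearn.datasets import load_digits
from sklearn.decomposition import FactorAnalysis

X, _ = load_digits(return_X_y=True)

fa = FactorAnalysis(n_components=10, rotation="varimax")  # step 4: optional rotation
X_factors = fa.fit_transform(X)      # factor scores for each sample

print(fa.components_.shape)          # (10, 64): estimated factor loadings
print(fa.noise_variance_.shape)      # (64,): per-feature noise the model accounts for
```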
Applications:
 Psychology: To identify constructs like intelligence or personality traits.
 Market Research: To group consumer behaviors or preferences.
 Social Sciences: To find common patterns in survey responses.

2. Independent Component Analysis (ICA)

Purpose:
 ICA is a computational method used to separate a multivariate signal into additive
independent components.
 Unlike FA, which focuses on covariance, ICA focuses on statistical independence.
Key Concepts:
 Independence: The goal is to find components that are statistically independent (minimize
mutual information).
 Blind Source Separation: ICA is often used for separating mixed signals, such as in the "cocktail
party problem" (isolating individual voices from overlapping audio signals).
Steps in ICA:
1. Model the Data:
o Assume that the observed data X are linear mixtures of independent components S: X = AS, where A is the mixing matrix and S holds the independent sources.
2. Center and Whiten the Data:
o Preprocess the data by centering (mean subtraction) and whitening (decorrelation).
3. Estimate the Mixing Matrix:
o Use algorithms (e.g., FastICA, InfoMax) to estimate A and recover S such that the statistical independence of S is maximized.
4. Recover Independent Components:
o Solve for S by inverting the mixing process: S = A⁻¹X.
Applications:
 Signal Processing: Separate audio, EEG, or fMRI signals.
 Image Processing: Decompose image data into independent features.
 Finance: Identify independent drivers of financial market movements.
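A minimal "cocktail-party"-style sketch with scikit-learn's FastICA; the two synthetic sources and the mixing matrix are made-up assumptions for the demonstration (in a real problem the mixing matrix is unknown).
```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                        # source 1: sinusoid
s2 = np.sign(np.sin(3 * t))               # source 2: square wave
S = np.c_[s1, s2] + 0.05 * rng.standard_normal((2000, 2))

A = np.array([[1.0, 0.5], [0.5, 2.0]])    # mixing matrix (unknown in real problems)
X = S @ A.T                               # observed mixtures: X = S A^T

ica = FastICA(n_components=2, random_state=0)
S_hat = ica.fit_transform(X)              # recovered sources (up to sign, scale, order)
A_hat = ica.mixing_                       # estimated mixing matrix
print(S_hat.shape, A_hat.shape)
```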

Comparison of FA and ICA

Aspect | Factor Analysis (FA) | Independent Component Analysis (ICA)
Goal | Identify latent factors explaining correlations | Find independent components in data
Independence | Does not assume independence of factors | Assumes components are statistically independent
Model | Includes noise/error term | Assumes no explicit noise
Focus | Correlation/variance | Statistical independence
Applications | Psychology, social sciences, market research | Signal processing, neuroscience, finance
Computation | Often simpler than ICA | Computationally more demanding

Locally Linear Embedding – Isomap – Least Squares Optimization


Locally Linear Embedding (LLE), Isomap, and Least Squares Optimization are techniques
used in machine learning and data analysis, particularly for dimensionality reduction and
optimization tasks. Here’s an overview of each:
1. Locally Linear Embedding (LLE)
Purpose:
 LLE is a nonlinear dimensionality reduction technique that preserves the local structure of
the data in a lower-dimensional space.
Key Concepts:
 Local Neighbourhoods: For each data point, LLE assumes the local neighbourhood is
approximately linear.
 Linear Reconstruction: It tries to represent each data point as a weighted sum of its
neighbours.
 Global Mapping: The algorithm then finds a low-dimensional embedding that preserves
these local relationships.
Steps of LLE:
1. Find Neighbors:
2. Compute Reconstruction Weights:
3. Optimize Low-Dimensional Embedding:
Applications:
 High-dimensional data visualization.
 Manifold learning for complex, nonlinear datasets.
 Applications in image processing and bioinformatics.

2. Isomap
Purpose:
 Isomap extends classical multidimensional scaling (MDS) to nonlinear manifolds by
preserving geodesic distances instead of Euclidean distances.
Key Concepts:
 Geodesic Distance: The shortest path between two points along the manifold.
 Manifold Assumption: Assumes data lie on a low-dimensional manifold embedded in a
higher-dimensional space.
Steps of Isomap:
1. Construct Neighborhood Graph:
o Use k-nearest neighbors or a distance threshold to connect nearby points.
2. Compute Geodesic Distances:
o Approximate the geodesic distances between all pairs of points using shortest paths
in the graph (e.g., Dijkstra's or Floyd-Warshall algorithm).
3. Apply Classical MDS:
o Use multidimensional scaling to find a low-dimensional embedding that preserves
the geodesic distances.
Applications:
 Nonlinear dimensionality reduction for manifold learning.
 Image recognition, motion analysis, and gene expression studies.
Comparison with LLE:
 Isomap preserves global geometric relationships, while LLE focuses on preserving local
structures.
 Isomap uses geodesic distances, while LLE relies on linear reconstructions within
neighborhoods.
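The sketch below runs both methods on the classic Swiss-roll dataset with scikit-learn; the neighbourhood size (12) and the 2-D target dimension are illustrative assumptions.
```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding, Isomap

X, _ = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)  # 3-D manifold data

# LLE: reconstruct each point from its neighbours, then keep those weights in 2-D
lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2)
X_lle = lle.fit_transform(X)

# Isomap: neighbourhood graph -> shortest-path (geodesic) distances -> classical MDS
iso = Isomap(n_neighbors=12, n_components=2)
X_iso = iso.fit_transform(X)

print(X_lle.shape, X_iso.shape)   # both (1500, 2)
```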

3. Least Squares Optimization


Purpose:
 Least squares optimization is a mathematical approach used to minimize the sum of squared
differences between observed data and a model's predictions.
Key Concepts:
 Residuals: Differences between observed values and predicted values.
 Objective Function: Minimize the sum of squared residuals, Σᵢ (yᵢ − f(xᵢ, θ))², over the model parameters θ.
Applications:
 Regression (linear and nonlinear).

 Curve fitting.
 Optimization problems in machine learning, such as finding reconstruction weights in LLE.
Variants:
 Linear Least Squares:
o Solves problems where f(x, θ) is linear in the parameters θ.
o The solution follows from a system of linear equations (the normal equations): θ = (XᵀX)⁻¹Xᵀy.
 Nonlinear Least Squares:
o Used when f(x, θ) is nonlinear in θ.
o Requires iterative optimization algorithms such as gradient descent or Levenberg-Marquardt.
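A worked linear least-squares example in NumPy, fitting a noisy line by solving the normal equations; the true coefficients (3 and 2) and the noise level are made-up assumptions (np.linalg.lstsq is the numerically safer route in practice).
```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 50)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, 50)   # noisy observations of y = 3x + 2

X = np.column_stack([x, np.ones_like(x)])       # design matrix with an intercept column
theta = np.linalg.solve(X.T @ X, X.T @ y)       # normal equations: (X^T X) theta = X^T y

residuals = y - X @ theta
print("estimated slope and intercept:", theta)
print("sum of squared residuals:", np.sum(residuals ** 2))
```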

Summary of Techniques:

Technique | Purpose | Key Approach | Focus
LLE | Nonlinear dimensionality reduction | Local linear relationships | Preserve local structure
Isomap | Nonlinear dimensionality reduction | Global geodesic distances | Preserve global manifold structure
Least Squares | General optimization and regression | Minimize sum of squared errors | Model fitting and parameter estimation

Genetic algorithms
Genetic Algorithms are a class of optimization algorithms inspired by the process of natural
selection and genetics. They are widely used for solving complex optimization and search
problems where traditional methods may struggle. The core idea is to evolve a population of
candidate solutions over successive generations to find the best solution.

Key Concepts of Genetic Algorithms


1. Core Components:
 Population: A group of candidate solutions (individuals) to the problem.
 Chromosomes: Representation of a solution, often as a binary string or other encoding.
 Fitness Function: Evaluates how good a candidate solution is at solving the problem.
2. Evolution Process:

1. Initialization:
o Generate an initial population randomly or based on heuristics.
2. Selection:
o Select the fittest individuals from the population for reproduction. Common
methods include:
 Roulette Wheel Selection: Probabilistic selection based on fitness values.
 Tournament Selection: Randomly pick a subset and select the best.
 Rank Selection: Rank individuals by fitness and select based on rank.
3. Reproduction (Crossover):
o Combine the genetic material of parent solutions to produce offspring.
o Mimics biological reproduction to explore the search space.
4. Mutation:
o Introduce random changes to offspring to maintain diversity and explore new areas.
5. Replacement:
o Replace the least fit individuals with offspring to form the next generation.
6. Termination:
o Repeat the process until a stopping condition is met (e.g., maximum generations or
acceptable fitness).

Genetic Operators
1. Crossover (Recombination):
 Combines genetic material from two parents to create offspring.
 Types of crossover:
o Single-Point Crossover: Swap sections of chromosomes after a random split point.
o Multi-Point Crossover: Swap multiple sections using multiple split points.
o Uniform Crossover: Each gene is independently chosen from one of the parents.
2. Mutation:
 Introduces random changes to the offspring’s genes.
 Ensures diversity in the population and prevents premature convergence.
 Types of mutation:
o Bit Flip: Change a binary gene (0 to 1 or vice versa).
o Swap Mutation: Swap the positions of two genes.
o Gaussian Mutation: Add a random Gaussian noise to numerical genes.

3. Selection:
 Decides which individuals reproduce.
 Strategies:
o Elitism: Always retain a fraction of the best solutions.
o Diversity Preservation: Encourage diversity to avoid local optima.
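A few standalone Python functions sketching these operators; the representation (a list of 0/1 genes), the mutation rate, and the tournament size are illustrative assumptions.
```python
import random

def single_point_crossover(p1, p2):
    # swap the tails of two parent chromosomes after a random cut point
    cut = random.randint(1, len(p1) - 1)
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def uniform_crossover(p1, p2):
    # each gene is taken from either parent with equal probability
    return [a if random.random() < 0.5 else b for a, b in zip(p1, p2)]

def bit_flip_mutation(chrom, rate=0.01):
    # flip each binary gene independently with probability `rate`
    return [1 - g if random.random() < rate else g for g in chrom]

def tournament_selection(population, fitness_values, k=3):
    # pick k individuals at random and return the fittest of them
    contenders = random.sample(range(len(population)), k)
    best = max(contenders, key=lambda i: fitness_values[i])
    return population[best]
```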
Using Genetic Algorithms
1. Problem Encoding:
 Represent the problem as a chromosome. For example:
o Binary Encoding: Use binary strings for combinatorial problems.
o Real-Value Encoding: Use real numbers for continuous problems.
o Permutation Encoding: Use permutations for ordering problems like the Traveling
Salesman Problem (TSP).
2. Fitness Function Design:
 Define a function that quantifies the quality of a solution.
 Example for TSP: Fitness(X) = 1 / Total Distance(X).
3. Algorithm Implementation:
1. Initialize the population.
2. Evaluate fitness of individuals.
3. Select parents based on fitness.
4. Apply Crossover and Mutation to generate offspring.
5. Replace old population with new generation.
6. Repeat until stopping criteria are met.

Applications of Genetic Algorithms


1. Optimization:
o Solve nonlinear, high-dimensional, or discrete optimization problems.
o Examples: Scheduling, resource allocation, design optimization.
2. Machine Learning:
o Feature selection, hyperparameter tuning, neural network design.
3. Engineering:
o Structural design, control systems, and robotics.
4. Biology and Medicine:

o Protein structure prediction, gene sequence alignment.


5. Games and Art:
o Strategy generation, procedural content generation.

Advantages of Genetic Algorithms


 Works well with complex and poorly understood search spaces.
 Does not require gradient information or continuity.
 Can escape local optima due to stochastic operations.
Disadvantages of Genetic Algorithms
 Computationally expensive, especially for large populations.
 Performance depends on parameter tuning (e.g., mutation rate, crossover rate).
 May converge slowly or prematurely.
Example:
To minimize the function f(x) = x² using a binary representation:
1. Encode x as a binary chromosome.
2. Define fitness as −f(x) (since lower values are better).
3. Apply selection, crossover, and mutation iteratively until x ≈ 0, as in the sketch below.
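A compact, self-contained sketch of this example; the 16-bit encoding, the search interval [−10, 10], the population size, and the mutation rate are all assumptions chosen just to make the loop run.
```python
import random

BITS, POP, GENS = 16, 30, 60
LO, HI = -10.0, 10.0                       # assumed search interval for x

def decode(chrom):
    # map a binary chromosome to a real number x in [LO, HI]
    value = int("".join(map(str, chrom)), 2)
    return LO + (HI - LO) * value / (2 ** BITS - 1)

def fitness(chrom):
    x = decode(chrom)
    return -x * x                          # lower f(x) = x^2 means higher fitness

def tournament(pop):
    return max(random.sample(pop, 3), key=fitness)

pop = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
for _ in range(GENS):
    new_pop = [max(pop, key=fitness)]      # elitism: carry the best individual over
    while len(new_pop) < POP:
        p1, p2 = tournament(pop), tournament(pop)
        cut = random.randint(1, BITS - 1)  # single-point crossover
        child = p1[:cut] + p2[cut:]
        child = [1 - g if random.random() < 0.02 else g for g in child]  # bit-flip mutation
        new_pop.append(child)
    pop = new_pop

print("best x ~", decode(max(pop, key=fitness)))   # should end up close to 0
```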

SAKSHI SHUKLA, ASSISTANT PROFESSOR, SIET, HYD
