UNIT 4
Dimensionality Reduction
Dimensionality reduction refers to techniques used to reduce the number of input features or
dimensions in a dataset while preserving as much relevant information as possible. It is essential for
handling high-dimensional data, improving computational efficiency, reducing storage requirements,
and mitigating issues like the curse of dimensionality.
1. Feature Selection
Selects a subset of the existing features without creating new ones.
2. Feature Extraction
Transforms the data into a lower-dimensional space by creating new features.
Linear Methods:
o Principal Component Analysis (PCA):
Projects data onto a new set of orthogonal axes (principal components) that
maximize variance.
Works by eigenvalue decomposition of the covariance matrix.
Best suited to data whose important structure lies along linear correlations among features (a minimal sketch appears after the table below).
o Linear Discriminant Analysis (LDA):
Focuses on maximizing class separability in supervised learning.
Finds linear combinations of features that best separate different classes.
Non-Linear Methods:
o t-SNE (t-Distributed Stochastic Neighbor Embedding):
Preserves local structure in data while embedding it into lower dimensions.
Often used for visualization but not suitable for downstream tasks.
o UMAP (Uniform Manifold Approximation and Projection):
Similar to t-SNE but faster and preserves both global and local structures.
o Kernel PCA:
Extends PCA to capture non-linear structures using kernel functions.
o Isomap:
Uses geodesic distances instead of Euclidean distances to capture manifold
structures.
Method | Linear / Non-Linear | Description
UMAP | Non-Linear | Similar to t-SNE but faster and scalable.
Independent Component Analysis (ICA) | Linear | Separates data into statistically independent components.
LLE (Locally Linear Embedding) | Non-Linear | Captures local neighborhood relationships for dimensionality reduction.
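To make the PCA description above concrete, here is a minimal sketch of PCA via eigenvalue decomposition of the covariance matrix. It assumes NumPy is available; the random matrix X and the choice of two components are purely illustrative.

import numpy as np

# Toy data: 100 samples, 5 features (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))

# 1. Center the data
X_centered = X - X.mean(axis=0)

# 2. Covariance matrix of the features
cov = np.cov(X_centered, rowvar=False)

# 3. Eigenvalue decomposition (eigh returns eigenvalues in ascending order)
eigvals, eigvecs = np.linalg.eigh(cov)

# 4. Keep the eigenvectors with the largest eigenvalues (maximum variance)
order = np.argsort(eigvals)[::-1]
components = eigvecs[:, order[:2]]

# 5. Project the centered data onto the principal components
X_reduced = X_centered @ components
print(X_reduced.shape)  # (100, 2)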
Applications
1. Visualization:
o Explore and understand high-dimensional data in 2D or 3D.
2. Preprocessing:
o Simplify datasets before training machine learning models.
3. Noise Reduction:
o Remove redundant or irrelevant features to improve model performance.
4. Genomics:
o Identify patterns in genetic data with thousands of features.
5. Image Compression:
o Reduce pixel data while preserving the image's key characteristics.
Advantages
Simplifies data, making it easier to analyze and visualize.
Disadvantages
Risk of information loss if not applied carefully.
Some techniques (e.g., t-SNE) are computationally intensive for large datasets.
Feature extraction methods create new features, which might lose interpretability.
Non-linear techniques like t-SNE and UMAP are mainly for visualization and not suitable for all
downstream tasks.
Applications of PCA:
1. Data Preprocessing:
o Reduce dimensionality before applying machine learning algorithms to prevent overfitting and reduce computation time (see the sketch after this list).
2. Noise Reduction:
o Discard components that contribute minimally to the overall variance, as these often represent noise.
3. Visualization:
o Reduce high-dimensional data to 2D or 3D for easier visualization and interpretation.
4. Genomics:
o Analyze gene expression data or genetic variation data with thousands of features.
5. Image Compression:
o Reduce image dimensions while retaining important visual information.
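To illustrate the preprocessing use case above, here is a minimal sketch assuming scikit-learn is available; the digits dataset and the choice of 10 components are illustrative, not prescriptive.

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# 64-dimensional digit images reduced to 10 principal components before classification
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = make_pipeline(PCA(n_components=10), LogisticRegression(max_iter=1000))
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))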
PCA vs. LDA (Class Label Use): PCA does not use class labels, whereas LDA requires labelled data.
Purpose:
ICA is a computational method used to separate a multivariate signal into additive
independent components.
Unlike FA, which focuses on covariance, ICA focuses on statistical independence.
Key Concepts:
Independence: The goal is to find components that are statistically independent (minimize
mutual information).
Blind Source Separation: ICA is often used for separating mixed signals, such as in the "cocktail
party problem" (isolating individual voices from overlapping audio signals).
Steps in ICA:
1. Model the Data:
o Assume that the observed data X are linear mixtures of independent components S: X = AS, where A is the mixing matrix and S are the independent sources.
2. Center and Whiten the Data:
o Preprocess the data by centering (mean subtraction) and whitening (decorrelation).
3. Estimate the Mixing Matrix:
o Use algorithms (e.g., FastICA, InfoMax) to estimate A and recover S such that the statistical independence of S is maximized.
4. Recover Independent Components:
o Solve for S by inverting the mixing process: S = A^{-1} X (see the sketch below).
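As a minimal sketch of these steps, the snippet below uses scikit-learn's FastICA (assumed available, in a reasonably recent version) to unmix two synthetic signals; the sources and mixing matrix are invented purely for illustration.

import numpy as np
from sklearn.decomposition import FastICA

# Two synthetic independent sources S (illustrative only)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)               # sinusoidal source
s2 = np.sign(np.sin(3 * t))      # square-wave source
S = np.c_[s1, s2]

# Mix the sources: each observed column is a linear mixture (X = S A^T)
A = np.array([[1.0, 0.5],
              [0.5, 2.0]])       # mixing matrix
X = S @ A.T

# Center/whiten the data, estimate the mixing matrix, and recover the sources
ica = FastICA(n_components=2, whiten="unit-variance", random_state=0)
S_est = ica.fit_transform(X)     # estimated independent components
A_est = ica.mixing_              # estimated mixing matrix
print(S_est.shape, A_est.shape)  # (2000, 2) (2, 2)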
Applications:
Signal Processing: Separate audio, EEG, or fMRI signals.
Image Processing: Decompose image data into independent features.
Finance: Identify independent drivers of financial market movements.
Aspect | Factor Analysis (FA) | Independent Component Analysis (ICA)
Computation | Often simpler than ICA | Computationally more demanding
2. Isomap
Purpose:
Isomap extends classical multidimensional scaling (MDS) to nonlinear manifolds by
preserving geodesic distances instead of Euclidean distances.
Key Concepts:
Geodesic Distance: The shortest path between two points along the manifold.
Manifold Assumption: Assumes data lie on a low-dimensional manifold embedded in a
higher-dimensional space.
Steps of Isomap:
1. Construct Neighborhood Graph:
o Use k-nearest neighbors or a distance threshold to connect nearby points.
2. Compute Geodesic Distances:
o Approximate the geodesic distances between all pairs of points using shortest paths
in the graph (e.g., Dijkstra's or Floyd-Warshall algorithm).
3. Apply Classical MDS:
o Use multidimensional scaling to find a low-dimensional embedding that preserves
the geodesic distances.
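As a minimal sketch of these steps in practice, assuming scikit-learn is available, the snippet below embeds the classic S-curve dataset (3D points lying on a 2D manifold) into two dimensions; the neighbor count of 10 is an illustrative choice.

from sklearn.datasets import make_s_curve
from sklearn.manifold import Isomap

# 3D points sampled from a 2D manifold (the "S" curve)
X, color = make_s_curve(n_samples=1000, random_state=0)

# Neighborhood graph, geodesic distances, and classical MDS are all handled by Isomap
embedding = Isomap(n_neighbors=10, n_components=2)
X_2d = embedding.fit_transform(X)
print(X_2d.shape)  # (1000, 2)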
Applications:
Nonlinear dimensionality reduction for manifold learning.
Image recognition, motion analysis, and gene expression studies.
Comparison with LLE:
Isomap preserves global geometric relationships, while LLE focuses on preserving local
structures.
Isomap uses geodesic distances, while LLE relies on linear reconstructions within
neighborhoods.
Applications of Least Squares:
Curve fitting.
Optimization problems in machine learning, such as finding reconstruction weights in LLE.
Variants:
Linear Least Squares:
o Solves problems where f(x, θ) is linear in θ.
o The solution comes from solving a system of linear equations (the normal equations): θ = (X^T X)^{-1} X^T y (see the sketch after this list).
Nonlinear Least Squares:
o Used when f(x, θ) is nonlinear in θ.
o Requires iterative optimization algorithms like gradient descent or Levenberg-
Marquardt.
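Here is a minimal sketch of linear least squares with NumPy (assumed available), fitting a line y ≈ θ0 + θ1·x to noisy synthetic data; np.linalg.lstsq is used rather than forming (X^T X)^{-1} explicitly because it is more numerically stable.

import numpy as np

# Synthetic data: y = 2 + 3x plus noise (illustrative only)
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5, size=50)

# Design matrix with a column of ones for the intercept term
X = np.column_stack([np.ones_like(x), x])

# Solve the least-squares problem: minimize ||X theta - y||^2
theta, residuals, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(theta)  # approximately [2, 3]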
Summary of Techniques:
LLE: nonlinear dimensionality reduction that preserves local structure by modelling each point through local linear relationships with its neighbors.
Genetic algorithms
Genetic Algorithms are a class of optimization algorithms inspired by the process of natural
selection and genetics. They are widely used for solving complex optimization and search
problems where traditional methods may struggle. The core idea is to evolve a population of
candidate solutions over successive generations to find the best solution.
Steps of a Genetic Algorithm:
1. Initialization:
o Generate an initial population randomly or based on heuristics.
2. Selection:
o Select the fittest individuals from the population for reproduction. Common
methods include:
Roulette Wheel Selection: Probabilistic selection based on fitness values.
Tournament Selection: Randomly pick a subset and select the best.
Rank Selection: Rank individuals by fitness and select based on rank.
3. Reproduction (Crossover):
o Combine the genetic material of parent solutions to produce offspring.
o Mimics biological reproduction to explore the search space.
4. Mutation:
o Introduce random changes to offspring to maintain diversity and explore new areas.
5. Replacement:
o Replace the least fit individuals with offspring to form the next generation.
6. Termination:
o Repeat the process until a stopping condition is met (e.g., maximum generations or
acceptable fitness).
Genetic Operators
1. Crossover (Recombination):
Combines genetic material from two parents to create offspring.
Types of crossover:
o Single-Point Crossover: Swap sections of chromosomes after a random split point.
o Multi-Point Crossover: Swap multiple sections using multiple split points.
o Uniform Crossover: Each gene is independently chosen from one of the parents.
2. Mutation:
Introduces random changes to the offspring’s genes.
Ensures diversity in the population and prevents premature convergence.
Types of mutation:
o Bit Flip: Change a binary gene (0 to 1 or vice versa).
o Swap Mutation: Swap the positions of two genes.
o Gaussian Mutation: Add random Gaussian noise to numerical genes (single-point crossover and bit-flip mutation are sketched after this section).
3. Selection:
Decides which individuals reproduce.
Strategies:
o Elitism: Always retain a fraction of the best solutions.
o Diversity Preservation: Encourage diversity to avoid local optima.
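To illustrate two of the operators above, here is a minimal sketch in plain Python, assuming binary chromosomes represented as lists of 0s and 1s; the mutation rate is an arbitrary illustrative value.

import random

def single_point_crossover(parent1, parent2):
    # Swap chromosome tails after a random split point
    point = random.randint(1, len(parent1) - 1)
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

def bit_flip_mutation(chromosome, rate=0.01):
    # Flip each binary gene independently with probability `rate`
    return [1 - gene if random.random() < rate else gene for gene in chromosome]

# Example usage with 8-bit chromosomes
p1, p2 = [0] * 8, [1] * 8
c1, c2 = single_point_crossover(p1, p2)
print(c1, bit_flip_mutation(c1, rate=0.2))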
Using Genetic Algorithms
1. Problem Encoding:
Represent the problem as a chromosome. For example:
o Binary Encoding: Use binary strings for combinatorial problems.
o Real-Value Encoding: Use real numbers for continuous problems.
o Permutation Encoding: Use permutations for ordering problems like the Traveling
Salesman Problem (TSP).
2. Fitness Function Design:
Define a function that quantifies the quality of a solution.
Example for TSP: Fitness(X) = 1 / Total Distance(X) (a minimal sketch follows).
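As a minimal sketch of this fitness function, assuming a tour is a permutation of city indices and each city is a 2D point (both invented for illustration):

import math

def tour_fitness(tour, cities):
    # Fitness of a TSP tour: reciprocal of its total round-trip distance
    total = 0.0
    for i in range(len(tour)):
        current_city = cities[tour[i]]
        next_city = cities[tour[(i + 1) % len(tour)]]  # wrap around to the start
        total += math.dist(current_city, next_city)
    return 1.0 / total

cities = [(0, 0), (0, 3), (4, 3), (4, 0)]
print(tour_fitness([0, 1, 2, 3], cities))  # 1/14 for this rectangular tour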
3. Algorithm Implementation:
1. Initialize the population.
2. Evaluate fitness of individuals.
3. Select parents based on fitness.
4. Apply Crossover and Mutation to generate offspring.
5. Replace old population with new generation.
6. Repeat until stopping criteria are met.
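Putting these steps together, below is a minimal sketch of a complete genetic algorithm, assuming binary encoding and a toy fitness function (maximize the number of 1-bits, the classic OneMax problem); the population size, rates, and generation count are illustrative choices, not tuned values.

import random

def fitness(chromosome):
    # Toy objective: count of 1-bits (OneMax)
    return sum(chromosome)

def tournament_select(population, k=3):
    # Randomly pick k individuals and return the fittest
    return max(random.sample(population, k), key=fitness)

def crossover(p1, p2):
    # Single-point crossover producing one child
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:]

def mutate(chromosome, rate=0.05):
    # Bit-flip mutation
    return [1 - g if random.random() < rate else g for g in chromosome]

def genetic_algorithm(n_genes=20, pop_size=30, generations=50):
    # 1. Initialization: random binary population
    population = [[random.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # 2-4. Selection, crossover, and mutation produce the offspring
        offspring = [mutate(crossover(tournament_select(population),
                                      tournament_select(population)))
                     for _ in range(pop_size - 1)]
        # 5. Replacement with elitism: carry over the current best individual
        best = max(population, key=fitness)
        population = offspring + [best]
    # 6. Termination after a fixed number of generations
    return max(population, key=fitness)

best = genetic_algorithm()
print(fitness(best), best)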