Module 2 Lab 3

This document provides a comprehensive guide to Manifold Learning, specifically focusing on ISOMAP, a non-linear dimensionality reduction technique. It explains the steps involved in ISOMAP, compares it with PCA, and discusses practical implementation tips and limitations. The guide is structured for beginners, with clear examples and explanations to facilitate understanding of complex data structures.

Detailed Explanation of Module 2 Lab 3: Manifold Learning Methods (Updated and Structured for Beginners)


This guide explains every concept and step in the Manifold Learning lab, integrating the content
provided and all your follow-up queries. It is organized for clarity, with practical examples and
beginner-friendly language.

Section 1: What is Manifold Learning?


Manifold learning is a set of techniques for reducing the dimensionality of data by finding a lower-dimensional "surface" (manifold) within a higher-dimensional space.
Many real-world datasets, though high-dimensional, actually lie on or near a much lower-dimensional curved surface (manifold).
The goal: Find a new, low-dimensional representation of the data that preserves its essential structure, especially for visualization or further analysis.

Section 2: Why Not Just Use PCA?


PCA (Principal Component Analysis) is a linear method: it works well if the data lies on a flat
(linear) subspace.
Drawbacks of PCA on curved manifolds:
PCA may need many more dimensions than the true manifold to capture the data structure.
PCA can project points that are far apart along the manifold to nearby locations, losing the true relationships [1] [2].
PCA cannot capture curved or non-linear relationships [1].

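The contrast is easy to see in code. Below is a minimal sketch using scikit-learn's `make_s_curve` as stand-in data (an assumption; the lab's own dataset may differ):

```python
from sklearn.datasets import make_s_curve
from sklearn.decomposition import PCA

# 1000 points in 3D that actually lie on a curved 2D sheet
X, color = make_s_curve(n_samples=1000, random_state=0)

# PCA projects along straight lines: points far apart *along* the curve
# can land close together in the 2D projection
X_pca = PCA(n_components=2).fit_transform(X)
print(X.shape, X_pca.shape)  # (1000, 3) (1000, 2)
```

Plotting `X_pca` colored by position along the curve would show distant parts of the S overlapping, which is exactly the failure mode described above.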
Section 3: What is ISOMAP?


ISOMAP stands for Isometric Mapping.
It is a non-linear dimensionality reduction technique based on spectral methods (it finishes with an eigendecomposition, as in classical MDS).
Key idea: Instead of preserving straight-line (Euclidean) distances, ISOMAP preserves geodesic distances, the shortest paths along the manifold rather than through the ambient space [1] [2] [3].
Result: ISOMAP can "unfold" curved data (like an S-curve) into a flat, low-dimensional
space while preserving meaningful relationships.

Section 4: How ISOMAP Works (Step-by-Step with Example)

Step 1: Construct the Neighborhood Graph


Goal: Capture local relationships by connecting each data point to its nearest neighbors [1] [2] [3] [4].

How:
For each data point, find its k nearest neighbors (using Euclidean distance).
Build a graph where each point is a node connected to its neighbors by edges weighted by their distances.
You can use either the k-nearest-neighbors method or an ε-ball (all points within a certain radius) [5].
Example:
Imagine 1000 points in 3D forming an S-curve. For each point, connect it to its 10 closest
points. The graph now represents local relationships.
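This step can be sketched with scikit-learn's `kneighbors_graph`; the S-curve data here is an assumed stand-in for the lab's dataset:

```python
from sklearn.datasets import make_s_curve
from sklearn.neighbors import kneighbors_graph

X, _ = make_s_curve(n_samples=1000, random_state=0)

# Sparse matrix: entry (i, j) holds the Euclidean distance from point i
# to point j when j is among i's 10 nearest neighbors, and is absent otherwise
graph = kneighbors_graph(X, n_neighbors=10, mode="distance")
print(graph.shape, graph.nnz)  # (1000, 1000) 10000
```

With 1000 points and 10 neighbors each, the graph stores exactly 10,000 directed edges.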

Step 2: Compute Geodesic (Shortest Path) Distances


Goal: Estimate the true "manifold" distance between all pairs of points: not the straight-line distance, but the shortest path along the graph [1] [2] [3].
How:
Use Dijkstra's algorithm (or a similar shortest-path algorithm) to find the shortest path between every pair of points in the graph [1] [5].
The sum of the edge weights along the shortest path gives the geodesic distance.
Example:
If points A and D are not directly connected, but A is connected to B, B to C, and C to D, the geodesic distance from A to D is the sum of the distances A–B, B–C, and C–D [5].

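A sketch of this step using SciPy's `shortest_path` (which implements Dijkstra's algorithm); the graph construction repeats Step 1 on assumed S-curve data:

```python
from scipy.sparse.csgraph import shortest_path
from sklearn.datasets import make_s_curve
from sklearn.neighbors import kneighbors_graph

X, _ = make_s_curve(n_samples=1000, random_state=0)
graph = kneighbors_graph(X, n_neighbors=10, mode="distance")

# method="D" selects Dijkstra; directed=False lets each edge be walked
# in both directions, so the geodesic distance matrix is symmetric
D = shortest_path(graph, method="D", directed=False)
print(D.shape)  # (1000, 1000)
```

Each entry D[i, j] is the approximate geodesic distance between points i and j; if the graph were disconnected, unreachable pairs would come out as infinity.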
Step 3: Find the Low-Dimensional Embedding (Using MDS)


Goal: Map the data into a lower-dimensional space (like 2D or 3D) while preserving geodesic distances as much as possible [1] [2].
How:
Square the geodesic distance matrix and double-center it.
Perform eigenvalue decomposition (like in PCA) to find the top eigenvectors (directions
with the most variance).
The top k eigenvectors (with the largest eigenvalues) become the axes of your new,
reduced space.
Project the data onto these axes to get the low-dimensional embedding [2] [3].
Example:
For the S-curve, ISOMAP "unfolds" the curve into a flat 2D space, revealing the underlying
2D structure.
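The double-centering and eigendecomposition described above is classical MDS. Here is a minimal NumPy sketch, checked on the four corners of a unit square, whose pairwise distances are exactly recoverable in 2D:

```python
import numpy as np

def classical_mds(D, n_components=2):
    """Embed points given a pairwise distance matrix D (classical MDS)."""
    n = D.shape[0]
    # Double-center the squared distances: B = -0.5 * J @ D^2 @ J
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    # Top eigenvectors of B, scaled by sqrt(eigenvalue), give the coordinates
    eigvals, eigvecs = np.linalg.eigh(B)          # ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    scale = np.sqrt(np.maximum(eigvals[idx], 0))  # clip tiny negatives
    return eigvecs[:, idx] * scale

# Sanity check: 4 points at the corners of a unit square
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
Y = classical_mds(D, 2)
# The embedding should reproduce the original pairwise distances
D_rec = np.linalg.norm(Y[:, None] - Y[None, :], axis=-1)
print(np.allclose(D, D_rec))  # True
```

ISOMAP simply feeds the geodesic distance matrix from Step 2 into this procedure instead of Euclidean distances.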

Section 5: ISOMAP in Practice

Python Implementation (with scikit-learn)

from sklearn.manifold import Isomap

# X is your high-dimensional data, shape (n_samples, n_features)
embedding = Isomap(n_neighbors=10, n_components=2)
X_transformed = embedding.fit_transform(X)

n_neighbors: number of neighbors used to build the neighborhood graph.
n_components: number of dimensions of the output embedding.

Manual Steps (as in the lab notebook)


1. Compute pairwise Euclidean distances for all points.
2. Keep only the k nearest neighbors for each point to build the graph.
3. Use Dijkstra’s algorithm to compute shortest (geodesic) paths.
4. Center the squared geodesic distance matrix.
5. Perform eigenvalue decomposition and select top components.
6. Project data onto these components for the final embedding.
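Taken together, the six steps above can be sketched as follows (S-curve data is an assumed stand-in; `kneighbors_graph` handles steps 1-2 by keeping only the k nearest neighbors):

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.datasets import make_s_curve
from sklearn.neighbors import kneighbors_graph

# Steps 1-2: pairwise Euclidean distances, pruned to a k-NN graph
X, _ = make_s_curve(n_samples=300, random_state=0)
graph = kneighbors_graph(X, n_neighbors=10, mode="distance")

# Step 3: geodesic distances via Dijkstra over the graph
D = shortest_path(graph, method="D", directed=False)

# Steps 4-6: double-center the squared distances, eigendecompose, project
n = len(D)
J = np.eye(n) - np.ones((n, n)) / n
B = -0.5 * J @ (D ** 2) @ J
eigvals, eigvecs = np.linalg.eigh(B)
top = np.argsort(eigvals)[::-1][:2]
embedding = eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0))
print(embedding.shape)  # (300, 2)
```

The resulting `embedding` should closely match what `sklearn.manifold.Isomap` produces on the same data, up to rotation and sign flips.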

Section 6: ISOMAP vs. PCA


Aspect            PCA (Linear)             ISOMAP (Non-linear)
Preserves         Euclidean distances      Geodesic (manifold) distances
Handles curves?   No                       Yes
Good for          Flat, linear data        Curved, non-linear manifolds
Example           Flat plane               S-curve, Swiss roll

Section 7: Parameters and Practical Tips


Number of Neighbors (k):
Too low: the graph may break into disconnected pieces.
Too high: the graph may connect points that are not true neighbors, distorting the manifold [2] [3].
Tip: try different values and visualize the results.
Connected Graph:
Ensure the neighborhood graph is a single connected component, or results may be incoherent [2] [3].

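One way to check connectivity before running ISOMAP is SciPy's `connected_components`, sketched here on an assumed S-curve dataset:

```python
from scipy.sparse.csgraph import connected_components
from sklearn.datasets import make_s_curve
from sklearn.neighbors import kneighbors_graph

X, _ = make_s_curve(n_samples=1000, random_state=0)

# Count connected components of the neighborhood graph; more than one
# means some geodesic distances are undefined and ISOMAP will misbehave
for k in (2, 10):
    graph = kneighbors_graph(X, n_neighbors=k)
    n_parts, labels = connected_components(graph, directed=False)
    print(f"k={k}: {n_parts} connected component(s)")
```

Increasing k can only merge components, never split them, so raising k until the count reaches 1 is a simple tuning heuristic.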
Section 8: Limitations and Drawbacks


ISOMAP struggles if the manifold is poorly sampled or has holes [2] [3].
Careful parameter tuning (especially of k) is required.
It is computationally intensive for very large datasets (Landmark ISOMAP is a more efficient variant) [2].

Section 9: Other Manifold Learning Methods


LLE (Locally Linear Embedding): Preserves local relationships.
t-SNE: Focuses on preserving local structure for visualization.
UMAP: Similar to t-SNE, often faster and better at maintaining global structure.
Key difference: Each method preserves different aspects of the data’s structure.

Section 10: Example – ISOMAP on an S-Curve


Dataset: 1000 points forming an S-shaped curve in 3D.
Process:
1. Build the neighborhood graph (each point connects to 10 nearest neighbors).
2. Compute geodesic distances using shortest paths.
3. Use MDS/eigenvalue decomposition to embed data in 2D.
Result: The S-curve is “unfolded” into a flat 2D shape, revealing the true underlying
structure.

Section 11: Key Takeaways


Manifold learning (like ISOMAP) helps uncover the true, lower-dimensional structure of
complex data.
ISOMAP is powerful for non-linear dimensionality reduction, especially when data lies on a
curved manifold.
Choosing parameters (like k) and method is crucial for good results.
Visualization after reduction helps interpret and understand high-dimensional data.

If you want a deeper explanation of any step, or want to see code for a particular part, just
ask!

1. https://www.sjsu.edu/faculty/guangliang.chen/Math253S20/lec10ISOmap.pdf
2. https://www.centron.de/en/tutorial/dimension-reduction-isomap/
3. https://www.mililink.com/upload/article/1159096330aams_vol_215_march_2022_a6_p2371-2382_s._gnana_sophia,_k._k._thanammal_and_s._s._sujatha.pdf
4. https://labex.io/tutorials/ml-manifold-learning-with-scikit-learn-71115
5. https://www.youtube.com/watch?v=Xu_3NnkAI9s
