Mapper DR
Abstract—High-dimensional data often presents challenges in visualization and interpretation due to its complex structures and intricate relationships. Traditional dimensionality reduction techniques, such as PCA and t-SNE, often struggle to preserve the topological features of such data, leading to the loss of critical structural information. To address this, we propose a novel dimensionality reduction technique rooted in topological data analysis, which aims to maintain the intrinsic topological structure while mapping the data into a lower-dimensional space. Our approach extracts persistent homology groups and critical points as topological features, ensuring their invariance in the reduced representation, and optimizes the bottleneck distance using a Mapper-based skeleton. We demonstrate the effectiveness of our method on complex real-world datasets, showcasing its ability to uncover meaningful structures that are often overlooked by traditional methods.

Index Terms—Topological Data Analysis, Dimensionality Reduction, Persistent Homology, Mapper Algorithm

I. INTRODUCTION

Dimensionality reduction is crucial for analyzing high-dimensional data, as such data often presents challenges like sparsity and the curse of dimensionality. In high-dimensional Euclidean spaces, data points tend to be sparsely distributed, making it difficult to accurately capture their structure without a massive number of samples. This sparsity leads to issues such as the crowding problem, where mapping high-dimensional data into lower dimensions can distort distances and relationships. Additionally, most points in high-dimensional spaces concentrate near the edges of a unit hypercube, making Euclidean distance metrics less effective, as differences between large and small distances become negligible. These challenges limit the performance of traditional dimensionality reduction methods like PCA and t-SNE [1], which often struggle to preserve the underlying structure of the data during reduction. As a result, there is a growing need for techniques that can retain essential structural information while mapping data to lower dimensions.

Given these challenges, dimensionality reduction techniques [1]–[5] have become a focal point of research, offering several key benefits. First, they increase the density of samples, allowing distance-based algorithms to function effectively. Second, they remove redundancy in high-dimensional data, reducing storage and computational costs, which is especially important for large datasets. Third, by mapping data into two or three dimensions, they enable direct visualization, making it easier to interpret the underlying structure. In dimensionality reduction, it is often desirable to preserve the topological structure of high-dimensional data, maintaining features like connectivity and the presence of loops. However, traditional methods like PCA and t-SNE lack mechanisms to explicitly preserve these topological properties, often leading to information loss during the reduction process.

To address this, Topological Data Analysis (TDA) [6]–[8] has emerged as a powerful framework for understanding the shape and structure of data beyond mere distances. TDA focuses on identifying properties that remain unchanged under continuous transformations, making it well-suited for capturing the intrinsic structure of complex datasets. One of the key tools in TDA is persistent homology [9], which analyzes how topological features such as connected components, loops, and voids persist across different scales. By computing persistent Betti numbers, it provides a robust summary of the topological features present in data, allowing us to understand the structure of high-dimensional datasets.

Another important tool in TDA is the Mapper algorithm [10], which is used to visualize the shape of high-dimensional data [11]. Mapper constructs a graph-like skeleton by clustering data using a filter function, revealing structures like branches and loops that represent significant patterns. It effectively captures topological information that may be overlooked by traditional methods, making it a versatile tool for exploring complex data landscapes.

In this work, we propose a novel dimensionality reduction method that leverages TDA to address the limitations of traditional approaches. Our method focuses on preserving key topological features by maintaining consistent persistent Betti numbers between the original and reduced data. Additionally, we integrate the Mapper skeleton into our approach to better retain the global structure of high-dimensional data, ensuring that important patterns and relationships are preserved throughout the reduction process.

II. RELATED WORK

Numerous dimensionality reduction techniques have been developed to represent high-dimensional datasets [1]–[5]. However, a common limitation of these methods is the distortions they introduce [3], [12], [13], which can vary significantly with changes in hyperparameters.
These distortions often disrupt the global structure of the data, affecting relationships between clusters and pairwise distances. In fields like physics and biology, where accurate data interpretation is essential, such inconsistencies can lead to misleading conclusions. Moreover, methods like UMAP and t-SNE frequently produce non-canonical, inconsistent representations, with results highly sensitive to initialization and hyperparameters [3], [14]. This challenge has led to increasing interest in TDA, which offers tools to better preserve the intrinsic structure of data across scales.

Recent work in TDA has enabled the integration of persistent homology with optimization [15]–[18], making it a powerful tool for assessing topological similarity. Carrière et al. [19] developed methods for differentiating functions based on persistence diagrams, and Leygonie et al. [20] categorized techniques for regularizing these functions, providing a foundation for comparing datasets' topological structures using metrics like the bottleneck and Wasserstein distances. Building on this, Rieck and Leitte [21], [22] introduced the idea of using the Wasserstein distance to compare persistence diagrams of high-dimensional data with those of their lower-dimensional embeddings. This approach has inspired methods that iteratively adjust low-dimensional embeddings to minimize topological differences, effectively preserving structural features during dimensionality reduction. Extensions such as topology-preserving graph autoencoders [23] and differentiable topological layers [24] further broaden TDA's applications in deep learning.

III. METHODS

A. Preliminaries

a) Persistent Homology: Persistent homology is a fundamental concept in TDA that identifies and quantifies topological features such as connected components, loops, and voids in data. Given a point cloud X ⊂ R^n, we construct a simplicial complex K to approximate its shape. Homology groups H_p(K) capture p-dimensional features (e.g., connected components for p = 0, loops for p = 1). The p-th Betti number β_p counts such features, providing a summary of the data's structure. Persistent homology extends this idea by analyzing how these features persist across different scales in a filtration of simplicial complexes {K^i}, where ∅ = K^0 ⊆ K^1 ⊆ · · · ⊆ K^m = K. As the scale increases, features may appear and disappear, which is captured in a persistence diagram: a collection of point pairs (i, j) indicating the birth and death of each feature. The persistence diagram provides a concise summary of the topological changes in the data across scales. To quantify the difference between two diagrams, we use the Wasserstein distance:

    W_{p,q}(D_k^{(1)}, D_k^{(2)}) = ( inf_{π: D_k^{(1)} → D_k^{(2)}} Σ_{x ∈ D_k^{(1)}} ∥x − π(x)∥_q^p )^{1/p},

where ∥·∥_q denotes the q-norm, and π ranges over all bijections between the diagrams D_k^{(1)} and D_k^{(2)}.
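For illustration, the sketch below computes persistence diagrams of a Vietoris–Rips filtration and compares them with the Wasserstein distance above. It assumes the GUDHI library (with its optional POT dependency for Wasserstein distances); the paper does not name its TDA implementation, and the max_edge cutoff is illustrative.

```python
import numpy as np
import gudhi
from gudhi.wasserstein import wasserstein_distance

def persistence_diagram(points, dim=1, max_edge=2.0):
    """Finite (birth, death) pairs of the Vietoris-Rips filtration
    in homology dimension `dim` (1 = loops)."""
    rips = gudhi.RipsComplex(points=points, max_edge_length=max_edge)
    st = rips.create_simplex_tree(max_dimension=dim + 1)
    st.compute_persistence()
    diag = st.persistence_intervals_in_dimension(dim)
    return diag[np.isfinite(diag[:, 1])] if diag.size else diag

# A noisy circle has one prominent loop; a Gaussian blob has none.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
circle = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.standard_normal((200, 2))
blob = rng.standard_normal((200, 2))

# W_{2,2}: order p = 2 with the 2-norm as internal metric q.
d = wasserstein_distance(persistence_diagram(circle), persistence_diagram(blob),
                         order=2.0, internal_p=2.0)
print(f"W_2,2 between diagrams: {d:.3f}")
```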
b) Mapper: Mapper is a TDA tool that visualizes the shape of high-dimensional data by creating a simplified graph representation. Given a point cloud X ⊂ R^n and a filter function f : X → R, Mapper projects the data using f and covers the range of f with overlapping intervals {I_i}_{i=1}^m. For each interval I_i, the subset X_i = f^{−1}(I_i) is clustered. The resulting Mapper graph M has nodes representing clusters, with edges between nodes if the clusters share data points. Formally, let {C_{ij}} be the clusters for interval I_i:

    M = ⋃_{i=1}^{m} { C_{ij} | C_{ij} ≠ ∅ },

with an edge between C_{ij} and C_{kl} if C_{ij} ∩ C_{kl} ≠ ∅. The choice of f, the interval overlaps, and the clustering method shape the Mapper graph, making it effective for detecting features like loops, branches, and connected components in complex data.
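A minimal sketch of this construction, following the definition above; the filter, cover parameters, and the use of DBSCAN as the clusterer are illustrative choices, not the paper's settings.

```python
import numpy as np
from itertools import combinations
from sklearn.cluster import DBSCAN

def mapper_graph(X, f, n_intervals=10, overlap=0.3, eps=0.3, min_samples=3):
    """Cover the range of f with overlapping intervals, cluster each
    preimage, and connect clusters that share data points."""
    fx = f(X)
    lo, hi = fx.min(), fx.max()
    # Interval length so that n_intervals intervals with the given
    # fractional overlap exactly cover [lo, hi].
    length = (hi - lo) / (n_intervals * (1 - overlap) + overlap)
    step = length * (1 - overlap)
    nodes = []  # each node C_ij is a set of point indices
    for i in range(n_intervals):
        idx = np.where((fx >= lo + i * step) & (fx <= lo + i * step + length))[0]
        if len(idx) == 0:
            continue
        labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X[idx])
        nodes += [set(idx[labels == c]) for c in set(labels) - {-1}]
    edges = [(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]]
    return nodes, edges

# e.g., nodes, edges = mapper_graph(X, f=lambda X: X[:, 0])
```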
B. Proposed Algorithm

In this work, we develop a dimensionality reduction algorithm that combines persistent homology with Mapper-based initialization to preserve topological features. Our goal is to find a low-dimensional embedding Y ⊂ R^d (with d ≪ n) that maintains the topological structure of the original high-dimensional data X ⊂ R^n. The process involves the following four steps:

a) Step 1: Constructing the Mapper Graph: We use the Mapper method to encode the structural information of X. Using a suitable filter function f : X → R, we create overlapping intervals and cluster data subsets within each interval, resulting in a simplified Mapper graph M. The nodes of M, which we refer to as critical points, represent clusters of similar data points, capturing essential structures and transitions in X. These critical points play a crucial role in guiding the subsequent dimensionality reduction process.

b) Step 2: Initializing the Low-Dimensional Embedding Y_mapper: We initialize the low-dimensional representation Y_mapper, where the number of points matches the number of nodes (critical points) in the Mapper graph M. This initialization is done by positioning Y_mapper within the same structure as M, aligning it with the topology of X. To refine this alignment, we minimize the following loss function:

    L(Y_mapper, X_mapper) = W_{2,2}(D(Y_mapper), D(X_mapper)),   (1)

where W_{2,2} is the Wasserstein distance with p = q = 2, comparing the persistence diagrams D(X_mapper) and D(Y_mapper).
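A sketch of this refinement step, assuming X_mapper denotes the critical points (node centroids) in the original space. Because Mapper graphs typically contain only tens of nodes, even a generic derivative-free optimizer over Eq. (1) is tractable; persistence_diagram() is the helper from the persistent-homology sketch above.

```python
import numpy as np
from scipy.optimize import minimize
from gudhi.wasserstein import wasserstein_distance

def refine_mapper_embedding(X_mapper, d=2, seed=0):
    """Fit one d-dimensional point per Mapper node by minimizing the
    Wasserstein distance between persistence diagrams, Eq. (1)."""
    m = len(X_mapper)
    D_X = persistence_diagram(X_mapper)

    def loss(y_flat):
        return wasserstein_distance(persistence_diagram(y_flat.reshape(m, d)),
                                    D_X, order=2.0, internal_p=2.0)

    y0 = np.random.default_rng(seed).standard_normal(m * d)  # or a layout of M
    res = minimize(loss, y0, method="Nelder-Mead", options={"maxiter": 2000})
    return res.x.reshape(m, d)
```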
c) Step 3: Transition from Y_mapper to Y: After optimizing Y_mapper, which aligns with the critical points of X, we extend this representation to a full low-dimensional embedding Y that matches the original number of data points in X. This involves assigning each point in X to a corresponding position in Y, guided by its proximity to the nodes in the Mapper graph. This transition allows the topological structure captured in Y_mapper to guide the layout of the full embedding Y.
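The assignment rule is not fully specified above; one plausible reading, sketched here, places each point at the mean optimized position of the Mapper nodes that contain it, with noise points falling back to the nearest node centroid and a small jitter to avoid exact overlaps.

```python
import numpy as np

def expand_embedding(X, nodes, Y_mapper, jitter=0.05, seed=0):
    """Extend the node-level embedding Y_mapper to all points of X.
    `nodes` are the sets of point indices from the Mapper graph."""
    rng = np.random.default_rng(seed)
    centroids = np.array([X[sorted(m)].mean(axis=0) for m in nodes])
    Y = np.empty((len(X), Y_mapper.shape[1]))
    for i in range(len(X)):
        owners = [k for k, m in enumerate(nodes) if i in m]
        if not owners:  # point clustered as noise: use nearest node centroid
            owners = [int(np.argmin(np.linalg.norm(centroids - X[i], axis=1)))]
        Y[i] = Y_mapper[owners].mean(axis=0)
    return Y + jitter * rng.standard_normal(Y.shape)
```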
d) Step 4: Optimizing the Topological Loss between X and Y: To ensure that the final low-dimensional embedding Y maintains the topological features of the original data X, we define a combined loss function:

    L(X, Y) = W_{2,2}(D(X), D(Y)) + λ L_critical(X, Y),   (2)

where W_{2,2} measures the Wasserstein distance between the persistence diagrams D(X) and D(Y), with p = q = 2, and L_critical(X, Y) penalizes discrepancies in the relative Euclidean distances among the critical points. We iteratively update Y to minimize L(X, Y), refining the low-dimensional embedding until it closely matches the topological structure of X.

Our algorithm thus integrates Mapper-based initialization with a topology-preserving optimization framework, providing a robust solution for dimensionality reduction that maintains both local and global data structures.
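A sketch of the combined objective in Eq. (2). The exact form of L_critical is assumed here to compare the normalized pairwise distances among the critical points in X and Y; persistence_diagram() is the helper defined earlier.

```python
import numpy as np
from scipy.spatial.distance import pdist
from gudhi.wasserstein import wasserstein_distance

def combined_loss(X, Y, crit_idx, lam=1.0):
    """Eq. (2): topological term plus critical-point distance term.
    crit_idx: indices of the points chosen as critical points."""
    topo = wasserstein_distance(persistence_diagram(X), persistence_diagram(Y),
                                order=2.0, internal_p=2.0)
    dX, dY = pdist(X[crit_idx]), pdist(Y[crit_idx])
    crit = np.mean((dX / dX.max() - dY / dY.max()) ** 2)
    return topo + lam * crit
```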
IV. DATASET

In this study, we employ two datasets to evaluate the effectiveness of our proposed dimensionality reduction algorithm. The MNIST dataset serves as a familiar baseline for comparison, while the Multiple-Activity Dataset represents a more complex and domain-specific challenge that forms the core of our analysis.

A. MNIST Dataset

The MNIST dataset is a well-known benchmark in the field of machine learning, widely used for evaluating image processing and dimensionality reduction techniques. It contains a total of 70,000 grayscale images of handwritten digits (0-9), each represented as a 28×28 pixel image, resulting in a 784-dimensional vector for each sample. In our study, we select a subset of 2,500 samples from the training set for the purpose of visual assessment, as our primary goal is to evaluate the visualization capabilities of different dimensionality reduction methods rather than performing classification tasks. This subset allows us to examine how well our algorithm preserves the inherent structure of the high-dimensional data when mapping it into a lower-dimensional space. The MNIST dataset provides a straightforward benchmark to verify the generalizability of our algorithm to typical image data.

B. Multiple-Activity Dataset

While MNIST serves as a benchmark, the Multiple-Activity Dataset is the central focus of our study, presenting a more challenging and nuanced problem space. This dataset was collected using Xsens Dot sensors, which are advanced wearable inertial measurement units (IMUs) capable of high-precision tracking of human movement. The data captures nine different indoor activities: Ascending Stairs, Descending Stairs, Jogging, Jumping, Lying, Pushing an Object, Pushups, Sitting, and Walking. Sensors were strategically placed on four body parts (left hand, right hand, left arm, and right arm), and each sensor recorded nine features: Euler_x, Euler_y, Euler_z (Euler angles), Acc_x, Acc_y, Acc_z (acceleration), and Gyr_x, Gyr_y, Gyr_z (gyroscope readings). The data was sampled at an interval of 16.7 milliseconds, resulting in a dense and high-dimensional time series representation.

Each activity comprises approximately 6000 sensor records per body part, resulting in a dataset of over 200,000 samples across all activities. This dataset is inherently more complex than MNIST due to several factors:

• High Dimensionality and Temporal Nature: Unlike the static image data in MNIST, the Multiple-Activity Dataset consists of time-series data where each sample encapsulates a sequence of sensor readings. This requires the dimensionality reduction method to preserve both spatial and temporal dependencies within the data.

• Sensor Correlation: The data from different body parts are interrelated, as movements are often synchronized or coordinated. Capturing these relationships is crucial for understanding the underlying dynamics of each activity.

• Activity Variability: The nine activities span a range of motion intensities and patterns, from simple activities like sitting to more dynamic ones like jogging or jumping. A successful dimensionality reduction technique must distinguish these subtle variations while maintaining the overall structure of the dataset.

The complexity and richness of this dataset make it a valuable test case for our dimensionality reduction algorithm, especially in scenarios where preserving the underlying structure and temporal dynamics is critical. By applying our method to this dataset, we aim to demonstrate not only the algorithm's capacity to simplify high-dimensional data but also its ability to uncover meaningful patterns in human movement data that would be challenging for traditional techniques.

V. EXPERIMENT

Our experiments evaluate the effectiveness of our proposed dimensionality reduction method in preserving topological features and structure across different types of high-dimensional data, with a primary focus on its application to the Multiple-Activity Dataset and using the MNIST dataset as a baseline for comparison.

A. Multiple-Activity Dataset Experiment

The primary set of experiments focuses on the Multiple-Activity Dataset. We begin by preprocessing the data, removing outliers to reduce noise and normalizing the data to ensure a zero-mean distribution. To capture the temporal dynamics of human movements, we organize the sensor readings into sliding windows of varying sizes. Each window contains a continuous segment of readings, and the window size determines the temporal resolution of each sample. For example, a window size of 50 represents a time sequence of 50 consecutive readings per sensor, resulting in a 450-dimensional vector (50 readings × 9 features).
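A sketch of this windowing step for a single sensor; the stride between consecutive windows is an assumption, as the offset is not stated above.

```python
import numpy as np

def sliding_windows(readings, window=50, stride=25):
    """Flatten consecutive readings of one sensor into fixed-size samples.
    readings: array of shape (T, 9) holding Euler, Acc, and Gyr channels.
    With window=50 each sample is a 450-dimensional vector."""
    starts = range(0, len(readings) - window + 1, stride)
    return np.stack([readings[s:s + window].reshape(-1) for s in starts])
```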
Our method projects these high-dimensional windows into a two-dimensional space, enabling visualization of the activities. We compare the results with classical dimensionality reduction techniques like PCA, Manifold Intrinsic Dimensionality Search (MIDS), and ISOMAP. Additionally, we test a hybrid approach, where PCA is used to reduce the dimensionality to 10 before applying our method to further reduce it to two dimensions (see Fig. 1).

B. MNIST Dataset Experiment

To evaluate the generalizability of our method, we also apply it to the MNIST dataset, a common benchmark in dimensionality reduction and visualization. Using a subset of 2,500 samples, we initially reduce the dimensionality from 784 to 10 using t-SNE, retaining the primary structural features. Our method is then applied to further reduce the data to two dimensions, allowing direct comparison with other methods. We assess two initialization strategies: PCA-based initialization (see Fig. 3), which provides a structured starting point, and random initialization (see Fig. 4), which tests the robustness of our approach. The results show that both strategies produce similar visual patterns, suggesting that the method can adapt to different initial conditions, though PCA initialization tends to offer slightly more refined results.
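The two-stage pipelines can be sketched as follows; topo_reduce is a hypothetical stand-in for the Mapper-initialized method of Section III, and the random matrix is a placeholder for the real data.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

X = np.random.randn(500, 784)  # placeholder for MNIST vectors

# MNIST: t-SNE 784 -> 10 (the exact method is required for >3 components),
# then our method 10 -> 2.
X10 = TSNE(n_components=10, method="exact").fit_transform(X)
# Y2 = topo_reduce(X10, d=2)   # hypothetical; see Section III

# Activity data: PCA handles the first stage instead.
# X10 = PCA(n_components=10).fit_transform(X_windows)
```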
To assess whether topological dimensionality reduction preserves the distributional differences among high-dimensional data from different activities, we employed a random forest classifier to evaluate the separability of the data after reduction. The classification accuracy provides insight into how well the reduced representations distinguish between the various activities.
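This evaluation amounts to the following sketch; the classifier hyperparameters and cross-validation protocol are assumptions, as they are not specified above.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def separability(Y, labels, seed=0):
    """Cross-validated random forest accuracy on the reduced data Y,
    used as a proxy for how separable the activities remain."""
    clf = RandomForestClassifier(n_estimators=200, random_state=seed)
    return cross_val_score(clf, Y, labels, cv=5).mean()
```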
To further explore the reasons behind these results, we introduced a feature extractor before applying traditional dimensionality reduction methods. This extractor generates 36 features (see Table I), including 27 time-domain features and 9 frequency-domain features. As shown in Table II, these commonly used signal processing features significantly improve the classification performance on raw data. However, the performance often deteriorates after dimensionality reduction, suggesting that the feature extraction process may not align well with the reduced representation.
TABLE I: Details of the Feature Extractor

Feature ID | Feature            | Feature Detail
1          | Mean               | μ = (1/K) Σ_{i=1}^{K} S_i
2          | STD                | σ = sqrt((1/K) Σ_{i=1}^{K} (S_i − μ)²)
3          | Min                | Min(S)
4          | Max                | Max(S)
5          | Mode               | The most frequent value in S
6          | Value Range        | Max(S) − Min(S)
7          | Mean Crossing      | Count of signal crossings of the mean value
8-17       | Freq. Peak Values  | Top 10 frequency-spectrum peak values
18-27      | Top Frequencies    | Frequencies of the top 10 spectrum peaks
28         | Signal Energy      | Σ_{i=1}^{K} (S_i)²
29         | Shape Mean         | μ_shape = (1/e) Σ_{i=1}^{N} i · P_i
30         | Shape STD          | σ_shape = sqrt((1/e) Σ_{i=1}^{N} (i − μ_shape)² P_i)
31         | Shape Skewness     | γ_shape = (1/e) Σ_{i=1}^{N} ((i − μ_shape)/σ_shape)³ P_i
32         | Shape Kurtosis     | ζ_shape = (1/e) Σ_{i=1}^{N} ((i − μ_shape)/σ_shape)⁴ P_i − 3
33         | Amplitude Mean     | μ_amp = (1/N) Σ_{i=1}^{N} P_i
34         | Amplitude STD      | σ_amp = sqrt((1/N) Σ_{i=1}^{N} (P_i − μ_amp)²)
35         | Amplitude Skewness | γ_amp = (1/N) Σ_{i=1}^{N} ((P_i − μ_amp)/σ_amp)³
36         | Amplitude Kurtosis | ζ_amp = (1/N) Σ_{i=1}^{N} ((P_i − μ_amp)/σ_amp)⁴ − 3

TABLE II: Results of Random Forest Classifier with and without Feature Extraction

Method                           | PH    | PCA+PH | PCA   | MDS   | ISOMAP
Acc (without Feature Extraction) | 0.377 | 0.294  | 0.433 | 0.392 | 0.197
Acc (with Feature Extraction)    | 0.252 | 0.252  | 0.294 | 0.141 | 0.281
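For concreteness, a sketch computing a representative subset of the Table I features for one windowed signal S; taking P as the magnitude of the real FFT is an assumption about the spectral estimator.

```python
import numpy as np

def extract_features(S):
    """Subset of the Table I features for a 1-D window S."""
    mu, sigma = S.mean(), S.std()
    P = np.abs(np.fft.rfft(S))                       # magnitude spectrum
    top = np.argsort(P)[-10:][::-1]                  # top-10 spectral peaks
    i, e = np.arange(len(P)), P.sum()
    mu_s = (i * P).sum() / e                         # shape mean (feature 29)
    feats = [mu, sigma, S.min(), S.max(),            # features 1-4
             S.max() - S.min(),                      # value range (6)
             np.sum(np.diff(np.sign(S - mu)) != 0),  # mean crossings (7)
             (S ** 2).sum(),                         # signal energy (28)
             mu_s,
             np.sqrt((((i - mu_s) ** 2) * P).sum() / e)]  # shape STD (30)
    return np.concatenate([feats, P[top], top.astype(float)])  # + 8-27
```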
VI. CONCLUSION

We proposed a novel dimensionality reduction method that leverages topological data analysis, combining persistent homology and Mapper-based initialization to preserve topological features. Experiments on the MNIST and Multiple-Activity datasets demonstrated that our method effectively retains critical structural information, providing improved visualization and interpretability over traditional techniques. These results highlight the potential of integrating topological insights into dimensionality reduction for analyzing complex high-dimensional data.

REFERENCES

[1] L. Van der Maaten and G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, no. 11, 2008.
[2] L. McInnes, J. Healy, and J. Melville, "UMAP: Uniform manifold approximation and projection for dimension reduction," arXiv preprint arXiv:1802.03426, 2018.
[3] Y. Wang, H. Huang, C. Rudin, and Y. Shaposhnik, "Understanding how dimension reduction tools work: an empirical approach to deciphering t-SNE, UMAP, TriMAP, and PaCMAP for data visualization," Journal of Machine Learning Research, vol. 22, pp. 1–73, 2021.
[4] M. Belkin and P. Niyogi, "Laplacian eigenmaps and spectral techniques for embedding and clustering," Advances in Neural Information Processing Systems, vol. 14, 2001.
[5] J. B. Tenenbaum, V. De Silva, and J. C. Langford, "A global geometric framework for nonlinear dimensionality reduction," Science, vol. 290, no. 5500, pp. 2319–2323, 2000.
[6] S. Barannikov, "The framed Morse complex and its invariants," Advances in Soviet Mathematics, vol. 21, pp. 93–115, 1994.
[7] A. J. Zomorodian, "Computing and comprehending topology: Persistence and hierarchical Morse complexes," Ph.D. thesis, University of Illinois at Urbana-Champaign, 2001.
[8] F. Chazal and B. Michel, "An introduction to topological data analysis: fundamental and practical aspects for data scientists," arXiv preprint arXiv:1710.04019, 2017.
[9] A. Zomorodian and G. Carlsson, "Computing persistent homology," in Proceedings of the Twentieth Annual Symposium on Computational Geometry, 2004, pp. 347–356.
[10] R. Kraft, "Illustrations of data analysis using the mapper algorithm and persistent homology," 2016.
[11] G. Singh, F. Mémoli, G. E. Carlsson et al., "Topological methods for the analysis of high dimensional data sets and 3D object recognition," PBG@Eurographics, vol. 2, 2007.
[12] T. Chari, J. Banerjee, and L. Pachter, "The specious art of single-cell genomics," bioRxiv, 2021.
[13] J. Batson, C. G. Haaf, Y. Kahn, and D. A. Roberts, "Topological obstructions to autoencoding," Journal of High Energy Physics, vol. 2021, no. 4, pp. 1–43, 2021.
[14] D. Kobak and G. C. Linderman, "Initialization is critical for preserving global data structure in both t-SNE and UMAP," Nature Biotechnology, vol. 39, no. 2, pp. 156–157, 2021.
[15] O. Kachan, "Persistent homology-based projection pursuit," in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 856–857.
[16] A. Wagner, E. Solomon, and P. Bendich, "Improving metric dimensionality reduction with distributed topology," arXiv preprint arXiv:2106.07613, 2021.
[17] M. Moor, M. Horn, B. Rieck, and K. Borgwardt, "Topological autoencoders," in International Conference on Machine Learning. PMLR, 2020, pp. 7045–7054.
[18] B. J. Nelson and Y. Luo, "Topology-preserving dimensionality reduction via interleaving optimization," arXiv preprint arXiv:2201.13012, 2022.
[19] M. Carrière, F. Chazal, M. Glisse, Y. Ike, H. Kannan, and Y. Umeda, "Optimizing persistent homology based functions," in International Conference on Machine Learning. PMLR, 2021, pp. 1294–1303.
[20] J. Leygonie, S. Oudot, and U. Tillmann, "A framework for differential calculus on persistence barcodes," Foundations of Computational Mathematics, pp. 1–63, 2021.
[21] B. Rieck and H. Leitte, "Persistent homology for the evaluation of dimensionality reduction schemes," in Computer Graphics Forum, vol. 34, no. 3. Wiley Online Library, 2015, pp. 431–440.
[22] B. Rieck and H. Leitte, "Agreement analysis of quality measures for dimensionality reduction," in Topological Methods in Data Analysis and Visualization. Springer, 2015, pp. 103–117.
[23] Z. Luo, C. Xu, Z. Zhang, and W. Jin, "A topology-preserving dimensionality reduction method for single-cell RNA-seq data using graph autoencoder," Scientific Reports, vol. 11, no. 1, pp. 1–8, 2021.
[24] K. Kim, J. Kim, M. Zaheer, J. Kim, F. Chazal, and L. Wasserman, "PLLay: efficient topological layer based on persistent landscapes," Advances in Neural Information Processing Systems, vol. 33, pp. 15965–15977, 2020.