Manifold Learning For LLM Compression
2025
1 Questions
1. What parts of an LLM can be interpreted as points on a manifold? More precisely, where do we apply the
manifold hypothesis in the context of LLMs?
2. What is the most suitable mathematical framework for modeling an LLM (or parts of it)?
3. How do we choose experimental settings for validating the theoretical concepts that we formulate?
4. Knowing that the token space is a stratified manifold ([TokSpStr]), what further properties of it are worth investigating?
2 Literature review
2.1 Ashkboos et al 2024 - SliceGPT: Compress Large Language Models by Deleting Rows and Columns
• Studies LLM compression by reducing the embedding dimension
• LLM compression using smaller weight matrices
• Mathematical tools: orthogonal projections onto principal components, applied to weight matrices (see the sketch after this list)
• Introduces a new sparsity method for LLM compression
• Uses PCA
• Theoretical study + empirical validation
• LLMs: OPT-1.3B, OPT-2.7B, OPT-6.7B, OPT-13B, OPT-30B, OPT-66B, Llama2-13B, Llama2-70B, Phi-2
• Datasets: WikiText-2
• Comparison: [SparseGPT] 2:4
• Code: https://github.com/microsoft/TransformerCompression
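A minimal NumPy sketch of the generic idea behind PCA-based slicing, not necessarily SliceGPT's exact procedure: compute principal directions of a layer's calibration activations, rotate into that basis, and keep only the leading directions so that the adjacent weight matrix can drop rows. All names (pca_basis, slice_layer, X_calib, W_out, d_keep) are illustrative.

    # Minimal sketch (assumption: generic PCA-based slicing, not SliceGPT's exact algorithm).
    # X_calib: (n_samples, d_model) calibration activations entering a layer
    # W_out:   (d_model, d_ff) weight matrix consuming those activations
    import numpy as np

    def pca_basis(X, d_keep):
        # Principal directions of the centered activations, largest variance first.
        Xc = X - X.mean(axis=0, keepdims=True)
        _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
        return Vt[:d_keep].T                      # (d_model, d_keep), orthonormal columns

    def slice_layer(X_calib, W_out, d_keep):
        Q = pca_basis(X_calib, d_keep)
        X_small = X_calib @ Q                     # activations in the reduced basis
        W_small = Q.T @ W_out                     # (d_keep, d_ff): rows of W_out are "deleted"
        return X_small, W_small

    # Reconstruction check: X_small @ W_small ≈ X_calib @ W_out when d_keep is close
    # to the effective rank of the calibration activations.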
2.3 Datta et al 2025 - Topology of Out-of-Distribution Examples in Deep Neural Networks
• Studies the characterization of out-of-distribution examples using latent layer embeddings from DNNs
• Mathematical tools: persistent homology, following [T-DNN] (see the sketch after this list)
• Analyzes the "trivialization" of data in the case of classification problems with multiple classes
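A minimal sketch of how persistent homology could be applied to latent-layer embeddings, assuming the ripser Python package; this illustrates the tool, not the paper's exact pipeline, and persistence_summary / min_lifetime are made-up names and thresholds.

    # Minimal sketch (assumption: the ripser package; not the paper's exact pipeline).
    # Computes persistence diagrams of a point cloud of latent-layer embeddings and a
    # crude "topological complexity" score: the number of long-lived H0/H1 features.
    import numpy as np
    from ripser import ripser

    def persistence_summary(embeddings, maxdim=1, min_lifetime=0.5):
        # embeddings: (n_points, d) array of latent activations for one layer
        dgms = ripser(np.asarray(embeddings), maxdim=maxdim)["dgms"]
        summary = {}
        for k, dgm in enumerate(dgms):
            lifetimes = dgm[:, 1] - dgm[:, 0]          # death - birth
            lifetimes = lifetimes[np.isfinite(lifetimes)]
            summary[f"H{k}_long_features"] = int((lifetimes > min_lifetime).sum())
        return summary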
2.4 Ma et al 2023 - LLM-Pruner: On the Structural Pruning of Large Language Models
• LLM compression via removing non-critical coupled structures based on gradient information
• Mathematical tools: gradients, structural dependency between coupled parameters (see the sketch after this list)
• Introduces structural pruning at the level of coupled groups of parameters
• Code: https://github.com/horseee/LLM-Pruner
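A minimal PyTorch-style sketch of first-order (gradient times weight) importance scoring for coupled parameter groups, in the spirit of gradient-based structural pruning; the dependency detection and the exact importance formula used by LLM-Pruner are not reproduced here, and group_importance / coupled_groups are illustrative names.

    # Minimal sketch (assumption: first-order Taylor importance |w * dL/dw|, summed per
    # coupled group; LLM-Pruner's dependency detection is not reproduced here).
    import torch

    def group_importance(model, loss, coupled_groups):
        # coupled_groups: list of lists of parameter tensors that must be pruned together
        flat_params = [p for group in coupled_groups for p in group]
        grads = torch.autograd.grad(loss, flat_params, retain_graph=True)
        grad_iter = iter(grads)
        scores = []
        for group in coupled_groups:
            score = sum((p.detach() * next(grad_iter)).abs().sum() for p in group)
            scores.append(score.item())
        return scores  # prune the groups with the smallest scores first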
2.5 Men et al 2024 - ShortGPT: Layers in Large Language Models are More Redundant Than You Expect
• Studies LLM compression via pruning of the least important layers (those with the smallest "importance" score)
• Finds that the last layer is important and that reducing depth (the number of layers) works better than reducing width (the hidden dimension)
• Mathematical tools: Block Influence (BI), which measures layer importance via the angle (cosine similarity) between a layer's input and output hidden states
• Introduces the BI metric (a simple idea; see the sketch after this list)
• Datasets: Reasoning: CMNLI, HellaSwag, PIQA; Language: CHID, WSC; Knowledge: CommonSenseQA,
BoolQ; Examination: MMLU, CMMLU; Understanding: Race-High/Middle, XSum, C3, PG19.
• Comparisons: [LLMPru], [SliceGPT], [LaCo]
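A minimal NumPy sketch of the Block Influence idea as summarized above: a layer's importance is one minus the average cosine similarity (a function of the angle) between its input and output hidden states. The function and variable names are illustrative.

    # Minimal sketch (assumption: BI = 1 - mean cosine similarity between a layer's
    # input and output hidden states, averaged over tokens).
    import numpy as np

    def block_influence(h_in, h_out, eps=1e-8):
        # h_in, h_out: (n_tokens, d_model) hidden states before and after one layer
        cos = np.sum(h_in * h_out, axis=1) / (
            np.linalg.norm(h_in, axis=1) * np.linalg.norm(h_out, axis=1) + eps)
        return 1.0 - cos.mean()

    # Layers with the smallest block_influence change their input the least and are
    # candidates for removal.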
2.6 Thrash et al 2025 - MCNC: Manifold Constrained Network Compression
• LLM compression via reparametrization
• Mathematical tools: reparametrization mapping the model's original parameters onto a (hyper)sphere (see the sketch after this list)
• Comparisons: NOLA
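A minimal PyTorch sketch of the generic idea of sphere-constrained reparametrization: the trainable degrees of freedom are a few coordinates constrained to a unit sphere plus a scale, and a fixed random map expands them to a full weight matrix. This is only an illustration of the concept under these assumptions, not the authors' construction; SphereReparamLinear, k, and seed are made up.

    # Minimal sketch (assumption: generic sphere-constrained reparametrization through a
    # fixed, seed-generated random map; NOT the MCNC construction, only an illustration).
    import torch

    class SphereReparamLinear(torch.nn.Module):
        def __init__(self, in_features, out_features, k=64, seed=0):
            super().__init__()
            g = torch.Generator().manual_seed(seed)
            # Fixed (non-trainable) random map from k sphere coordinates to the full weight;
            # it can be regenerated from the seed, so only theta, alpha, and the seed need storing.
            self.register_buffer("P", torch.randn(k, in_features * out_features, generator=g) / k ** 0.5)
            self.theta = torch.nn.Parameter(torch.randn(k, generator=g))   # k trainable numbers
            self.alpha = torch.nn.Parameter(torch.ones(()))                # trainable scale
            self.out_features, self.in_features = out_features, in_features

        def forward(self, x):
            u = self.theta / self.theta.norm()          # constrain the coordinates to the unit sphere
            W = (self.alpha * (u @ self.P)).view(self.out_features, self.in_features)
            return torch.nn.functional.linear(x, W)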
2.7 Yang et al 2024 - LaCo: Large Language Model Pruning via Layer Collapse
• LLM compression via collapsing (merging) runs of consecutive layers into a single layer (see the sketch below)
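A minimal PyTorch sketch of a generic layer-collapse step, assuming the merge rule "add the later layers' parameter differences onto the first layer of the run"; the criteria LaCo uses to choose which layers to merge and when to stop are omitted, and collapse_layers is an illustrative name.

    # Minimal sketch (assumption: collapse layers start..start+num_merged into one layer by
    # adding the later layers' parameter differences to the first; selection criteria omitted).
    import copy
    import torch

    @torch.no_grad()
    def collapse_layers(layers, start, num_merged):
        # layers: list of identically structured transformer blocks (nn.Module)
        merged = copy.deepcopy(layers[start])
        ref = dict(layers[start].named_parameters())
        tgt = dict(merged.named_parameters())
        for k in range(1, num_merged + 1):
            for name, p in layers[start + k].named_parameters():
                tgt[name].add_(p - ref[name])   # theta* = theta_l + sum_k (theta_{l+k} - theta_l)
        return layers[:start] + [merged] + layers[start + num_merged + 1:]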
References
[SliceGPT] Ashkboos, S., Croci, M.L., do Nascimento, M.G., Hoefler, T., Hensman, J., SliceGPT: Compress Large Language Models by Deleting Rows and Columns, 2024. https://arxiv.org/pdf/2401.15024
[PS-LLM] Persistent Topological Features in Large Language Models, 2024. https://arxiv.org/abs/2410.11042
[TOOD-DNN] Datta, E., Hennig, J., Domschot, E., Mattes, C., Smith, M.R., Topology of Out-of-Distribution Examples in Deep Neural Networks, 2025. https://arxiv.org/abs/2501.12522
[SparseGPT] Frantar, E., Alistarh, D., SparseGPT: Massive Language Models Can be Accurately Pruned in One-Shot, Proceedings of the 40th International Conference on Machine Learning, PMLR 202:10323-10337, 2023. https://proceedings.mlr.press/v202/frantar23a/frantar23a.pdf
[PrDL] Gromov, A., Tirumala, K., Shapourian, H., Glorioso, P., Roberts, D.A., The Unreasonable Ineffectiveness of the Deeper Layers, (poster) ICLR 2025. https://arxiv.org/abs/2403.17887
[ML-LLM] Liu, D., Qin, Z., Wang, H., Yang, Z., Wang, Z., Rong, F., Liu, Q., Hao, Y., Chen, X., Fan, C., Lv, Z., Tu, Z., Chu, D., Li, B., Sui, D., Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17817–17829, 2024. https://aclanthology.org/2024.emnlp-main.987.pdf
[LLMPru] Ma, X., Fang, G., Wang, X., LLM-Pruner: On the Structural Pruning of Large Language Models, Proceedings of the 37th Int. Conf. on Neural Information Processing Systems, 950, pages 21702-21720, 2023. https://proceedings.neurips.cc/paper_files/paper/2023/file/44956951349095f74492a5471128a7e0-Paper-Conference.pdf
[ShortGPT] Men, X., Xu, M., Zhang, Q., Wang, B., Lin, H., Lu, Y., Han, X., Chen, W., ShortGPT: Layers in Large Language Models are More Redundant Than You Expect, 2024. https://arxiv.org/pdf/2403.03853
[T-DNN] Naitzat, G., Zhitnikov, A., Lim, L.-H., Topology of Deep Neural Networks, Journal of Machine Learning Research 21, 1-40, 2020. https://jmlr.csail.mit.edu/papers/volume21/20-345/20-345.pdf
[TokSpStr] Robinson, M., Dey, S., Sweet, S., The Structure of the Token Space for Large Language Models, 2024. https://arxiv.org/abs/2410.08993
[TokManHyp] Robinson, M., Dey, S., Chiang, T., Token Embeddings Violate the Manifold Hypothesis, 2025. https://arxiv.org/abs/2504.01002
[MCNC] Thrash, C., Abbasi, A., Nooralinejad, P., Koohpayegani, S.A., Andreas, R., Pirsiavash, H., Kolouri, S., MCNC: Manifold Constrained Network Compression, (poster) ICLR 2025. https://arxiv.org/abs/2406.19301
[LaCo] Yang, Y., Cao, Z., Zhao, H., LaCo: Large Language Model Pruning via Layer Collapse, Findings of the Association for Computational Linguistics: EMNLP 2024, pages 6401–6417, Miami, Florida, USA, 2024. https://aclanthology.org/2024.findings-emnlp.372/