
Manifold Learning for LLM Compression

2025

1 Questions
1. What parts of an LLM can be interpreted as points on a manifold? More precisely, where do we apply the
manifold hypothesis in the context of LLMs?
2. What is the most suitable mathematical framework for modeling (parts of) an LLM?
3. How do we design experimental settings to test the theoretical concepts we formulate?
4. Given that the token space is a stratified manifold ([TokSpStr]), is there anything further worth investigating
about it?

2 Literature review
2.1 Ashkboos et al 2024 - SliceGPT: Compress Large Language Models by Deleting
Rows and Columns
• Studies LLM compression by reducing the embedding dimension
• LLM compression using smaller weight matrices
• Mathematical tools: orthogonal projections onto principal components for weight matrices
• Introduces a new sparsity method for LLM compression
• Uses PCA on calibration activations (a minimal sketch of the idea follows this list)
• Theoretical study + empirical validation
• LLMs: OPT-1.3B, OPT-2.7B, OPT-6.7B, OPT-13B, OPT-30B, OPT-66B, Llama2-13B, Llama2-70B, Phi-2
• Datasets: WikiText-2
• Comparison: [SparseGPT] 2:4
• Code: https://github.com/microsoft/TransformerCompression
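For intuition, the following is a minimal NumPy sketch of the underlying idea: compute the principal components of a layer's input activations and shrink the weight matrix by keeping only the top components. The toy dimensions, random calibration data, and reconstruction check are illustrative assumptions; this is not SliceGPT's exact rotation-and-slice pipeline.

    # Minimal sketch: compress a linear layer y = x @ W by projecting onto the
    # top-k principal components of its input activations (toy illustration of
    # the PCA-based projection idea; not SliceGPT's exact procedure).
    import numpy as np

    rng = np.random.default_rng(0)
    n, d_in, d_out, k = 1024, 64, 64, 32    # samples, dims, retained components

    X = rng.normal(size=(n, d_in))          # calibration activations
    W = rng.normal(size=(d_in, d_out))      # original weight matrix

    # PCA of the (centered) activations: eigenvectors of the covariance matrix.
    Xc = X - X.mean(axis=0)
    cov = Xc.T @ Xc / (n - 1)
    eigvals, eigvecs = np.linalg.eigh(cov)  # ascending order
    Q = eigvecs[:, ::-1][:, :k]             # top-k principal directions, (d_in, k)

    # Slice: inputs are projected to k dims, and the weight shrinks accordingly.
    W_small = Q.T @ W                       # (k, d_out)

    # Reconstruction check on fresh activations.
    X_test = rng.normal(size=(8, d_in))
    y_full = X_test @ W
    y_small = (X_test @ Q) @ W_small        # uses only the k-dim representation
    print("relative error:", np.linalg.norm(y_full - y_small) / np.linalg.norm(y_full))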

2.2 Gardinazzi et al 2025 - Persistent Topological Features in Large Language Models


• Studies the transformation of data passing through the layers of an LLM
• LLM compression via pruning of layers with smallest persistence similarity
• Mathematical tools: TDA - zigzag persistence
• Introduces persistence similarity for analyzing p-cycles
• Uses a kNN graph and Vietoris-Rips complexes (see the sketch after this list)
• Theoretical study + empirical validation
• LLMs: Llama2-7B, Llama2-13B, Llama3-8B, Llama3-70B, Mistral-7B, Pythia-6.9B
• Datasets: MMLU, HellaSwag, WinoGrande; for analyzing similarity-based pruning: SST, Math-12K, Code-10K
• Comparisons: [PrDL], [ShortGPT]
• Code: https://anonymous.4open.science/r/conferenceProject-019A/
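As a rough illustration of comparing the topology of activations across layers (not the paper's zigzag-persistence construction), one can compute Vietoris-Rips persistence diagrams for two layers' activation clouds and compare them, e.g. by bottleneck distance. The sketch below assumes the gudhi package and uses synthetic point clouds in place of real hidden states.

    # Hedged sketch: compare H1 (loop) persistence of two synthetic "layers".
    # Stand-in for the paper's zigzag persistence similarity; assumes gudhi.
    import numpy as np
    import gudhi

    def h1_diagram(points, max_edge=2.0):
        # Dimension-1 persistence diagram of a Vietoris-Rips filtration.
        rips = gudhi.RipsComplex(points=points, max_edge_length=max_edge)
        st = rips.create_simplex_tree(max_dimension=2)
        st.compute_persistence()
        diag = st.persistence_intervals_in_dimension(1)
        return [(b, d) for b, d in diag if d != float("inf")]  # drop unbounded bars

    rng = np.random.default_rng(0)
    theta = rng.uniform(0, 2 * np.pi, size=200)

    # Two noisy circles standing in for activation clouds of consecutive layers.
    layer_a = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(200, 2))
    layer_b = np.c_[np.cos(theta), np.sin(theta)] + 0.15 * rng.normal(size=(200, 2))

    d_a, d_b = h1_diagram(layer_a), h1_diagram(layer_b)
    print("bottleneck distance between layers:", gudhi.bottleneck_distance(d_a, d_b))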

2.3 Datta et al 2025 - Topology of Out-of-Distribution Examples in Deep Neural Networks
• Studies the characterization of out-of-distribution examples using latent layer embeddings from DNNs
• Mathematical tools: persistent homology from [T-DNN]
• Analyzes the “trivialization” of data in classification problems with multiple classes

• Uses Vietoris-Rips complexes (see the sketch after this list)


• Empirical study
• DNNs: ResNet18 architecture for image classification

• Datasets: CIFAR-10, CIFAR-100, MNIST, EMNIST
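A crude, assumed proxy for the "trivialization" observation (not the paper's actual methodology): track how much 1-dimensional persistence a point cloud of latent embeddings retains; topologically simple clouds score near zero. The sketch assumes gudhi and uses synthetic clouds in place of real embeddings.

    # Hedged sketch: total H1 persistence as a crude "topological complexity" score.
    import numpy as np
    import gudhi

    def total_h1_persistence(points, max_edge=2.0):
        # Sum of finite H1 bar lengths of a Vietoris-Rips filtration.
        rips = gudhi.RipsComplex(points=points, max_edge_length=max_edge)
        st = rips.create_simplex_tree(max_dimension=2)
        st.compute_persistence()
        bars = st.persistence_intervals_in_dimension(1)
        return float(sum(d - b for b, d in bars if d != float("inf")))

    rng = np.random.default_rng(1)
    theta = rng.uniform(0, 2 * np.pi, size=300)
    circle_cloud = np.c_[np.cos(theta), np.sin(theta)] + 0.05 * rng.normal(size=(300, 2))
    blob_cloud = 0.3 * rng.normal(size=(300, 2))   # topologically "trivial" cluster

    print("circle-like cloud:", total_h1_persistence(circle_cloud))
    print("blob-like cloud:  ", total_h1_persistence(blob_cloud))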

2.4 Ma et al 2023 - LLM-Pruner: On the Structural Pruning of Large Language Models


• Studies task-agnostic LLM compression by structured pruning

• LLM compression via removing non-critical coupled structures based on gradient information
• Mathematical tools: gradients, structure dependency
• Introduces pruning of coupled structural groups (rather than individual weights)

• Uses importance estimation for coupled structures (see the sketch after this list)


• Theoretical study + empirical validation; many experiments
• LLMs: Llama2-7B, Vicuna-7B, ChatGLM-6B
• Datasets: BoolQ, PIQA, HellaSwag, WinoGrande, ARC-easy, ARC-challenge, OpenbookQA

• Code: https://github.com/horseee/LLM-Pruner
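To make the gradient-based importance idea concrete, here is a hedged PyTorch sketch that scores the hidden channels of a toy two-layer network by a first-order Taylor criterion, |w · ∂L/∂w| summed over the weights that produce and consume each channel. The toy model, data, and grouping are illustrative assumptions, not LLM-Pruner's dependency-graph construction.

    # Hedged sketch: first-order (Taylor) importance of hidden channels,
    # in the spirit of gradient-based structural pruning. Toy model and data;
    # not LLM-Pruner's coupled-structure dependency graph.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
    x = torch.randn(64, 16)
    y = torch.randint(0, 4, (64,))

    # One calibration pass to populate gradients.
    loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()

    # Importance of each hidden channel: |w * dL/dw| summed over the weights
    # that produce it (rows of the first layer) and consume it (columns of the
    # second layer), mimicking a coupled group.
    w1, w2 = model[0].weight, model[2].weight
    score = (w1 * w1.grad).abs().sum(dim=1) + (w2 * w2.grad).abs().sum(dim=0)

    keep = torch.topk(score, k=16).indices.sort().values  # keep half the channels
    print("channels kept:", keep.tolist())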

2.5 Men et al 2024 - ShortGPT: Layers in Large Language Models are More Redundant
Than You Expect
• Studies LLM compression via pruning of least important layers
• LLM compression via pruning of layers with the smallest “importance”

• Finds that the last layer is important and that reducing the number of layers (depth) works better than reducing the hidden dimension (width)
• Mathematical tools: Block Influence (BI), which measures a layer's importance via the angle (cosine similarity) between the
layer's input and output hidden states
• Introduces BI metric (simple idea)

• Uses the cosine similarity between the input and output of a layer (see the sketch after this list)


• Theoretical study + empirical validation
• LLMs: Llama2-7B, Llama2-13B, Baichuan2-7B, Baichuan2-13B

• Datasets: Reasoning: CMNLI, HellaSwag, PIQA; Language: CHID, WSC; Knowledge: CommonSenseQA,
BoolQ; Examination: MMLU, CMMLU; Understanding: Race-High/Middle, XSum, C3, PG19.
• Comparisons: [LLMPru], [SliceGPT], [LaCo]
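The BI idea fits in a few lines: a block whose output points in nearly the same direction as its input contributes little and is a pruning candidate. Below is a hedged PyTorch sketch computing BI as one minus the mean cosine similarity between a block's input and output hidden states; the random residual-stream updates stand in for real calibration activations.

    # Hedged sketch: Block Influence (BI) as 1 minus the mean cosine similarity
    # between a block's input and output hidden states; low BI = prune candidate.
    # Random hidden states stand in for real calibration activations.
    import torch

    torch.manual_seed(0)
    num_layers, tokens, dim = 8, 128, 64
    hidden = [torch.randn(tokens, dim)]          # h_0: embeddings entering layer 0
    for _ in range(num_layers):                  # fake residual-stream updates
        hidden.append(hidden[-1] + 0.1 * torch.randn(tokens, dim))

    def block_influence(h_in, h_out):
        cos = torch.nn.functional.cosine_similarity(h_in, h_out, dim=-1)
        return 1.0 - cos.mean().item()

    bi = [block_influence(hidden[i], hidden[i + 1]) for i in range(num_layers)]
    order = sorted(range(num_layers), key=lambda i: bi[i])
    print("layers ranked from least to most influential:", order)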

2.6 Thrash et al 2025 - MCNC: Manifold Constrained Network Compression
• LLM compression via reparametrization
• Mathematical tools: reparametrization of the model's original parameters onto a (hyper)sphere

• Uses stochastic gradient descent on a low-dimensional manifold obtained by reparametrization (see the sketch after this list)


• Theoretical study + empirical validation (good GPU compatibility)
• LLMs: Llama2-7B, Llama2-13B

• Comparisons: NOLA
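As a toy illustration of optimizing through a low-dimensional spherical reparametrization (a simplified, assumed stand-in for MCNC's construction), the sketch below generates a large weight vector from a small parameter vector kept on the unit sphere via a fixed random matrix and runs SGD only on the small parameters.

    # Hedged toy sketch of manifold-constrained reparametrization: a large weight
    # vector is generated from a low-dimensional parameter constrained to the unit
    # sphere through a fixed random matrix; SGD only touches the low-dim parameter.
    # This is a simplified stand-in, not MCNC's actual construction.
    import torch

    torch.manual_seed(0)
    big_dim, low_dim = 4096, 32
    P = torch.randn(big_dim, low_dim) / low_dim ** 0.5   # fixed random "decoder"
    alpha = torch.randn(low_dim, requires_grad=True)     # trainable low-dim params
    target = torch.randn(big_dim)                        # pretend task signal

    opt = torch.optim.SGD([alpha], lr=0.1)
    for step in range(200):
        u = alpha / alpha.norm()           # constrain to the unit (hyper)sphere
        w = P @ u                          # reconstructed full-size weights
        loss = ((w - target) ** 2).mean()  # toy objective
        opt.zero_grad()
        loss.backward()
        opt.step()

    print("final loss:", loss.item(), "| stored params:", low_dim, "vs", big_dim)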

2.7 Yang et al 2024 - LaCo: Large Language Model Pruning via Layer Collapse
• LLM compression via collapsing layers

• Mathematical tools: differences between parameters in consecutive layers, cosine similarity


• Introduces LLM pruning by collapsing consecutive layers into a preceding one
• Uses cosine similarity for evaluating merged layers (see the sketch after this list)

• Theoretical study + empirical validation


• LLMs: Llama2-7B, Llama2-13B, Baichuan2-7B, Baichuan2-13B
• Datasets: Reasoning: CMNLI, HellaSwag, PIQA; Language: CHID, WSC; Knowledge: CommonSenseQA,
BoolQ; Examination: MMLU, CMMLU; Understanding: Race-High/Middle, XSum, C3

• Comparisons: [LLMPru], [SliceGPT]


• Code: https://github.com/yangyifei729/LaCo
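To make the collapse step concrete, here is a hedged sketch in the spirit of LaCo's merge rule on a toy stack of linear layers: add the parameter differences of several consecutive layers onto an earlier layer, then accept the merge only if the collapsed model's outputs remain cosine-similar to the original's. The tiny architecture and the 0.8 threshold are illustrative assumptions.

    # Hedged sketch of a LaCo-style layer collapse on a toy stack of linear layers:
    # theta_merged = theta_l + sum_i (theta_{l+i} - theta_l), then accept the merge
    # only if the collapsed model's outputs stay cosine-similar to the original's.
    import copy
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    layers = nn.ModuleList([nn.Linear(32, 32) for _ in range(6)])
    x = torch.randn(16, 32)

    def forward(mods, x):
        for m in mods:
            x = torch.relu(m(x))
        return x

    l, span = 2, 3                      # collapse layers l .. l+span-1 into layer l
    merged = copy.deepcopy(layers[l])
    with torch.no_grad():
        for i in range(1, span):
            merged.weight += layers[l + i].weight - layers[l].weight
            merged.bias += layers[l + i].bias - layers[l].bias

    collapsed = nn.ModuleList(list(layers[:l]) + [merged] + list(layers[l + span:]))

    with torch.no_grad():
        sim = torch.nn.functional.cosine_similarity(
            forward(layers, x), forward(collapsed, x), dim=-1).mean().item()
    print(f"layers: {len(layers)} -> {len(collapsed)}, output cosine similarity: {sim:.3f}")
    print("accept merge:", bool(sim > 0.8))     # assumed threshold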

References
[SliceGPT] Ashkboos, S., Croci, M.L., do Nascimento, M.G., Hoefler, T., Hensman, J., SliceGPT: Compress Large
Language Models by Deleting Rows and Columns, 2024. https://arxiv.org/pdf/2401.15024
[PS-LLM] Gardinazzi, Y., et al., Persistent Topological Features in Large Language Models, 2024. https://arxiv.org/abs/2410.11042
[TOOD-DNN] Datta, E., Hennig, J., Domschot, E., Mattes, C., Smith, M.R., Topology of Out-of-Distribution Examples
in Deep Neural Networks, 2025. https://arxiv.org/abs/2501.12522
[SparseGPT] Frantar, E., Alistarh, D., SparseGPT: Massive Language Models Can be Accurately Pruned in One-
Shot, Proceedings of the 40th International Conference on Machine Learning, PMLR 202:10323-10337, 2023.
https://proceedings.mlr.press/v202/frantar23a/frantar23a.pdf
[PrDL] Gromov, A., Tirumala, K., Shapourian, H., Glorioso, P., Roberts, D.A., The Unreasonable Ineffectiveness of
the Deeper Layers, (poster) ICLR 2025. https://arxiv.org/abs/2403.17887
[ML-LLM] Liu, D., Qin, Z., Wang, H., Yang, Z., Wang, Z., Rong, F., Liu, Q., Hao, Y., Chen, X., Fan, C., Lv, Z., Tu, Z.,
Chu, D., Li, B., Sui, D., Pruning via Merging: Compressing LLMs via Manifold Alignment Based Layer Merging,
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 17817–17829,
2024. https://aclanthology.org/2024.emnlp-main.987.pdf

[LLMPru] Ma, X., Fang, G., Wang, X., LLM-Pruner: On the Structural Pruning of Large Language Models,
Proceedings of the 37th Int. Conf. on Neural Information Processing Systems, 950, pages 21702-21720, 2023.
https://proceedings.neurips.cc/paper_files/paper/2023/file/44956951349095f74492a5471128a7e0-Paper-Conference.pdf
[ShortGPT] Men, X., Xu, M., Zhang, Q., Wang, B., Lin, H., Lu, Y., Han, X., Chen, W., ShortGPT: Layers in Large
Language Models are More Redundant Than You Expect, 2024. https://arxiv.org/pdf/2403.03853

[T-DNN] Naitzat, G., Zhitnikov, A., Lim, L.-H., Topology of Deep Neural Networks, Journal of Machine Learning
Research 21, 1-40, 2020. https://jmlr.csail.mit.edu/papers/volume21/20-345/20-345.pdf
[TokSpStr] Robinson, M., Dey, S., Sweet, S., The Structure of the Token Space for Large Language Models, 2024.
https://arxiv.org/abs/2410.08993

[TokManHyp] Robinson, M., Dey, S., Chiang, T., Token Embeddings Violate the Manifold Hypothesis, 2025.
https://arxiv.org/abs/2504.01002
[MCNC] Thrash, C., Abbasi, A., Nooralinejad, P., Koohpayegani, S.A., Andreas, R., Pirsiavash, H., Kolouri, S., MCNC:
Manifold Constrained Network Compression, (poster) ICLR 2025. https://arxiv.org/abs/2406.19301

[LaCo] Yang, Y., Cao, Z., Zhao, H., LaCo: Large Language Model Pruning via Layer Collapse, Findings of the
Association for Computational Linguistics: EMNLP 2024, pages 6401–6417, Miami, Florida, USA, 2024.
https://aclanthology.org/2024.findings-emnlp.372/
