FineZip: Pushing the Limits of Large Language Models for Practical Lossless Text Compression

Fazal Mittu¹, Yihuan Bu¹, Akshat Gupta¹, Ashok Devireddy¹, Alp Eren Ozdarendeli¹, Anant Singh², Gopala Anumanchipalli¹
¹UC Berkeley, ²NYU
[email protected]

arXiv:2409.17141v1 [cs.CL] 25 Sep 2024

Abstract

While the language modeling objective has been shown to be deeply connected with compression, it is surprising that modern LLMs are not employed in practical text compression systems. In this paper, we provide an in-depth analysis of neural network and transformer-based compression techniques to answer this question. We compare traditional text compression systems with neural network and LLM-based text compression methods. Although LLM-based systems significantly outperform conventional compression methods, they are highly impractical. Specifically, LLMZip, a recent text compression system using Llama3-8B, requires 9.5 days to compress just 10 MB of text, although with huge improvements in compression ratios. To overcome this, we present FineZip - a novel LLM-based text compression system that combines ideas of online memorization and dynamic context to reduce the compression time immensely. FineZip can compress the above corpus in approximately 4 hours compared to 9.5 days, a 54 times improvement over LLMZip with comparable performance. FineZip outperforms traditional algorithmic compression methods by a large margin, improving compression ratios by approximately 50%. With this work, we take the first step towards making lossless text compression with LLMs a reality. While FineZip presents a significant step in that direction, LLMs are still not a viable solution for large-scale text compression. We hope our work paves the way for future research and innovation to solve this problem.

1 Introduction

While the relationship between language modeling and compression has long been known (Schmidhuber and Heil, 1996; Mahoney, 2000; Goyal et al., 2018; Bellard, 2019), recent works (Delétang et al., 2024; Huang et al., 2024) have reinforced this connection. Delétang et al. (2024) recently showed that large language models (LLMs) can be used to compress data from various modalities. Huang et al. (2024) followed up this work by showing that the increasing compression abilities of LLMs are linearly correlated with downstream task performance.

Previous works have exploited this connection for lossless text compression. Neural network based models have been implemented for text compression (Schmidhuber and Heil, 1996; Mahoney, 2000; Goyal et al., 2018) and have reached better compression performance than traditional algorithmic compressors such as gzip. More recent methods have explored using LSTM and transformer models (Bellard, 2019, 2021). These methods fall under the "online" compressor category, where a randomly initialized model is directly trained on the data being compressed. In this case, the model parameters also become part of the compression. A recent effort, LLMZip (Valmeekam et al., 2023), tested the use of LLMs for lossless compression. Given an LLM's ability to predict the next token provided a fixed-length context window, a tokenized text can be stored as probabilistic ranks produced by an LLM predicting the next token. This is a type of "offline" compression, with a fixed system used for both compression and decompression of all incoming text.

In this paper, we build on prior work and introduce FineZip, which uses LLMs for lossless text compression with both online and offline components. FineZip combines an "online" component, which memorizes the data being compressed, with an "offline" component in the form of pre-trained LLMs for compression. The "online" memorization is done by fine-tuning the model on the data being compressed in a parameter-efficient way (Hu et al., 2021; Dettmers et al., 2023), with an additional constant overhead of the learned embeddings during fine-tuning. The "offline" component of the system is the pre-trained LLM, which remains fixed across different corpora. Figure 1 depicts the system diagram for FineZip.
Figure 1: System diagram for FineZip.

With this approach, we can leverage the benefits of online compression for improved performance without the drawback of requiring additional storage for model parameters.

Additionally, with FineZip we allow for a dynamic context where each token being compressed has a context size equal to its position in a sentence. This allows us to batch compression and decompression steps using LLMs, allowing for significant speed-up. "Online memorization" using PEFT methods also allows the model to compensate for the loss of performance due to a dynamic context, while a dynamic context allows for batching, which allows compression and decompression of many batches of text in parallel within a fixed compute budget. With FineZip, we achieve 54 times faster compression than LLMZip with a minor loss of performance, still outperforming traditional text compression methods by a huge margin. Our work also shows that compression rates of LLM-based methods are still not low enough for practical use cases, and although FineZip pushes the limits of using LLMs for lossless text compression in practice, much work still needs to be done. The code for our work can be found here - https://github.com/fazalmittu/FineZip.

2 Introducing FineZip

The most basic form of compression using LLMs would be to tokenize the input text. Since each character in a word occupies 8 bits (1 byte in UTF-8 encoding), representing the word as a token, essentially converting it into a number, will almost always reduce the number of bytes needed to represent it. This connection was also observed in Delétang et al. (2024). As a next step, we can use the predictive capabilities of LLMs for compression. This idea is used in LLMZip (Valmeekam et al., 2023), where they use a pre-trained LLM for text compression. The connection between language modeling and compression becomes intuitive when we take a deeper look at the language modeling objective, implemented using a cross-entropy loss. It aims to make each token in the training data the most probable token given the context preceding it, thus minimizing the number of bits required to represent the rank of the token in the vocabulary list, when ranked in descending order according to probability. Following this line of thought, we propose an intuitive yet effective way of enhancing this - fine-tuning the model on the data being compressed.

A challenge towards fine-tuning modern LLMs is that they are memory-intensive. Additionally, if we fine-tune the entire model on the text being compressed, then the entire LLM becomes part of the compression, requiring additional space equal to the space required to store the model for decompression. Thus, we propose FineZip, a compression framework that involves parameter-efficient fine-tuning (PEFT) (Mangrulkar et al., 2022) on the input text as an "online" step prior to compression.
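To make the rank-based coding described above concrete, the sketch below encodes a tokenized text into next-token ranks with a causal language model and decodes it back. This is a minimal illustration, not the released FineZip or LLMZip implementation; the small placeholder model and the argsort-based ordering are assumptions for the example.

```python
# Minimal sketch of rank-based coding with a causal LM (illustrative only; not
# the released FineZip/LLMZip code). "gpt2" is a small placeholder model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def text_to_ranks(text: str):
    ids = tok(text, return_tensors="pt").input_ids[0]
    with torch.no_grad():
        logits = model(ids.unsqueeze(0)).logits[0]            # (seq_len, vocab)
    ranks = []
    for pos in range(len(ids) - 1):
        order = torch.argsort(logits[pos], descending=True)   # most probable token first
        ranks.append(int((order == ids[pos + 1]).nonzero()))  # rank of the true next token
    return int(ids[0]), ranks                                 # first token is stored verbatim

def ranks_to_ids(first_id: int, ranks):
    ids = [first_id]
    for r in ranks:                                           # O(n^2) re-encoding; a sketch, not optimized
        with torch.no_grad():
            logits = model(torch.tensor([ids])).logits[0, -1]
        ids.append(int(torch.argsort(logits, descending=True)[r]))
    return ids

first, ranks = text_to_ranks("Language modeling is compression.")
print(tok.decode(ranks_to_ids(first, ranks)))                 # should reproduce the input text
```

Note that, in practice, the encoder and decoder must compute logits in exactly the same way (same precision and batching), so that each rank maps back to the same token.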
We call this fine-tuning step the "online memorization" step, which makes the data being compressed more probable for the LLM. This fine-tuning is implemented using LoRA (Hu et al., 2021) and is much faster than full fine-tuning, requires much less GPU memory, and requires only a very small amount of additional storage for the trained embeddings. The additional embedding storage does not scale with the dataset being compressed and becomes negligible at large sizes of corpora.

Method         Compression Ratio   Time (min)
zlib           0.3251              0.0083
gzip           0.3238              0.0141
bzip2          0.2374              0.0437
NNCP           0.15021             251
LLMZip (AC)    0.0795              13571
LLMZip         0.1163              13651
FineZip (AC)   0.0797              13118
FineZip        0.12799             250
FineZip-4bit   0.1445              67

Table 1: Comparison of compression methods on 10 MB.
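As a rough illustration of this memorization step, a short LoRA fine-tune with the Hugging Face PEFT library could look like the following. The file name, LoRA rank, chunking, and training hyperparameters are illustrative assumptions, not the paper's exact configuration.

```python
# Illustrative sketch of "online memorization": a brief LoRA fine-tune of the
# compressor LLM on the file being compressed. Hyperparameters are placeholders.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "meta-llama/Meta-Llama-3-8B"   # the paper's model; any causal LM works here
tok = AutoTokenizer.from_pretrained(model_name)
tok.pad_token = tok.eos_token
model = get_peft_model(AutoModelForCausalLM.from_pretrained(model_name),
                       LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

# Split the file being compressed into fixed-size training chunks.
text = open("enwik8_10mb.txt").read()
chunks = tok(text, truncation=True, max_length=512, return_overflowing_tokens=True)
ds = Dataset.from_dict({"input_ids": chunks["input_ids"]})

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="memorization", num_train_epochs=1,
                           per_device_train_batch_size=4, learning_rate=2e-4),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
model.save_pretrained("memorization/lora_adapter")  # small adapter = constant storage overhead
```

Only the small LoRA adapter needs to accompany the compressed file, which is the constant storage overhead mentioned above.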
Another key difference between LLMZip and FineZip is that FineZip adopts a dynamic context size approach rather than maintaining a fixed sliding window. LLMZip uses a permanent sliding window approach, where the rank of each token produced has a fixed context window of a preset context size (512, as chosen by the original authors). This by design makes the compression process extremely autoregressive and non-parallelizable, as to produce the rank of a token, you need the previous 512 tokens.

FineZip overcomes this limitation by employing a two-step dynamic context window technique:

1. Divide the corpus into chunks of a pre-decided window length.

2. Produce the ranks of each token within the window such that the rank for the ith token is produced based on the tokens preceding it.

The dynamic context window gives a variable context size to each token in a chunk. For a uniform comparison, we use a chunking size of 512 in FineZip, which is the same as the context window size chosen by LLMZip. In FineZip, the ith token in a chunk has a context size of i − 1, thus only the final token in a chunk has access to the full context length of 512. In contrast, every token in LLMZip has access to the full context length of 512. The dynamic context leads to some loss of performance, which is made up for by online memorization.
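A minimal sketch of this chunked, dynamic-context rank computation is shown below: windows of 512 tokens are stacked into a batch, and a single forward pass yields the rank of every token given only the tokens before it in its window. The window and batch sizes, EOS padding, and placeholder model are illustrative assumptions, not the authors' implementation.

```python
# Sketch of dynamic-context ranking: each window is ranked independently, so many
# windows can be processed in one batched forward pass. Sizes are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

WINDOW, BATCH = 512, 8
tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder for Llama-3 8B
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def ranks_by_window(ids: torch.Tensor):
    pad = (-len(ids)) % WINDOW                         # pad the tail so windows are full
    ids = torch.cat([ids, ids.new_full((pad,), tok.eos_token_id)])
    windows = ids.view(-1, WINDOW)
    out = []
    for start in range(0, len(windows), BATCH):
        batch = windows[start:start + BATCH]
        with torch.no_grad():
            logits = model(batch).logits               # (B, WINDOW, vocab)
        true_next = batch[:, 1:].unsqueeze(-1)         # token i+1 is predicted from tokens <= i
        # rank = number of vocabulary entries scoring strictly higher than the true token
        ranks = (logits[:, :-1] > logits[:, :-1].gather(-1, true_next)).sum(-1)
        out.extend(ranks.tolist())
    # Per-window rank lists; the first token of each window is stored verbatim, and
    # ranks falling in the padded tail are discarded using the stored original length.
    return out

ids = tok(open("enwik8_10mb.txt").read(), return_tensors="pt").input_ids[0]
rank_stream = ranks_by_window(ids)
```

Decompression mirrors this: within a window, tokens are regenerated left to right from the stored ranks, while different windows can again be processed in parallel batches.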
3 Experiments

We begin by comparing FineZip with (i) traditional text compression methods - bzip2 (Julian Seward, 2024), zlib (Jean-loup Gailly, 2024), and gzip (Jean-loup Gailly, 1992), (ii) neural network based text compression methods - NNCP (Bellard, 2021), and (iii) the recent LLM-based text compression method called LLMZip. For both FineZip and LLMZip, we use Llama-3 8B (Dubey et al., 2024).

Modifications to LLMZip: LLMZip originally used Llama-1-7B (Touvron et al., 2023a), while we leverage Llama-3-8B for both LLMZip and FineZip for a uniform comparison. Additionally, LLMZip used two methods for compression - one using arithmetic coding (AC) and the other using a secondary compression method on the generated ranks. LLMZip uses zlib (Jean-loup Gailly, 2024) as a secondary compression method over ranks, whereas our experiments show that bzip2 provides a much better compression ratio (Appendix A.1). Thus, we use bzip2 as our secondary compression method for LLM ranks in both LLMZip and FineZip. We also refer to bzip2 as the baseline for text compression using traditional compression methods (Table 1). To offer a better comparison, we also create a version of FineZip that incorporates arithmetic coding. The process uses the logits that the LLM outputs for each new token as the probability distribution update for the arithmetic coding scheme.

We used the first 10 MB of the enwik8 (Marcus Hutter, 2006) dataset, which is a standard benchmark for compression tasks. Though compression ratio (the ratio of compressed file size to original file size) is the key metric, we are also interested in measuring the time taken by these compression systems to evaluate practicality. The results are shown in Table 1. The first key observation is that neural network and LLM based compression methods have significantly better compression ratios than traditional text compression methods (zlib, gzip, bzip2), thus highlighting the potential impact of these methods for text compression. The second key observation is that neural network and LLM based methods take a long time to compress even small amounts of text, thus preventing their use in practice. This is especially true when using AC for compression in LLM-based methods, which produces exceptional compression ratios but also requires unprecedentedly large amounts of time.
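As a rough sketch of the secondary compression stage described above, the rank stream can be serialized and passed through a standard compressor; the fixed-width serialization below is an assumption for illustration, not necessarily how the released code packs ranks.

```python
# Secondary compression of the LLM rank stream with bzip2, plus the compression
# ratio (compressed size / original size). The uint32 packing is an assumption;
# most ranks are tiny, and bzip2 recovers much of the redundancy this leaves.
import bz2
import numpy as np

def compress_ranks(ranks, original: bytes):
    payload = np.asarray(ranks, dtype=np.uint32).tobytes()
    blob = bz2.compress(payload, compresslevel=9)
    return blob, len(blob) / len(original)

original = open("enwik8_10mb.txt", "rb").read()
ranks = [0, 0, 3, 1, 0, 12]        # placeholder; real ranks come from the LLM ranking step
blob, ratio = compress_ranks(ranks, original)
print(f"compression ratio: {ratio:.4f}")
```

A tighter variable-length encoding of the ranks would reduce the payload further; the sketch only illustrates the two-stage pipeline.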
For LLMZip with AC, the time taken to compress 10 MB of data is approximately 9.5 days. Thus, we do not explore AC-based LLM compression further and strictly compare only rank-based LLM baselines.

Table 1 shows that FineZip is able to achieve comparable or better compression ratios than both NNCP and LLMZip with a much faster compression time. Specifically, we see that FineZip has a much better compression ratio than NNCP with a comparable amount of compression time, while the 4-bit quantized FineZip is approximately 4 times faster than NNCP and still exhibits a better compression ratio. FineZip compresses enwik8 within 4 hours, compared to approximately 227 hours taken by LLMZip. This is a 54x improvement in compression time with a minor drop of 1 percentage point in compression ratio.

3.1 FineZip Ablations

Figure 2: FineZip ablations for different fine-tuning epochs.

FineZip uses an "online memorization" step as shown in Figure 1 before performing compression. This is done using Low-Rank Adaptation (LoRA) (Hu et al., 2021). We compare the effect of fine-tuning on compression using 3 different language models: GPT2-XL 1.3B (Radford et al., 2019), Llama-2 7B (Touvron et al., 2023b), and Llama-3 8B (Dubey et al., 2024). We see that for each model, memorization improves the absolute compression ratio by at least 1 percentage point, a relative improvement of about 8% over its non-fine-tuned baseline, as shown in Figure 2. This is especially significant when dealing with such low compression rates. It should be noted that the time taken for memorization is negligible compared to compression time and can be ignored.

Figure 3: Compressing the 10 MB dataset with Llama-3 8B loaded with 4, 8, 16, and 32-bit precision. The purple bar shows compression ratio; the red line shows the time taken to compress. Each batch size was chosen to max out memory on a 48GB GPU.

Quantization: We saw in Table 1 that dynamic context helps speed up the compression process by significant amounts, while online memorization is able to mitigate the loss in performance. We further push the limits of compression time using quantization. We perform the memorization step using QLoRA (Dettmers et al., 2023) and perform compression using the quantized model. We do this using a fixed compute budget of 48GB of GPU memory on a single NVIDIA A6000 GPU. Lower precision models allow us to increase the batch size and, in turn, decrease the time needed to compress a file by a sizable amount. Figure 3 shows that fine-tuning and compressing with a 4-bit model allows us to fit a batch size of 70 on one A6000 GPU and achieve a compression time of 67 minutes. This 4x speed-up makes FineZip not only a competitive compressor, outperforming traditional text compression systems by a huge margin, but also the fastest neural network/transformer based compressor currently available.
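As an illustration of the quantized setup, loading the base model in 4-bit with bitsandbytes for QLoRA-style memorization and batched ranking might look like the sketch below; the quantization settings are common defaults, not necessarily the exact configuration used in the paper.

```python
# Sketch of the 4-bit variant: load the base model quantized (QLoRA-style) so a
# larger compression batch fits in a fixed GPU memory budget.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",
    quantization_config=bnb_config,
    device_map="auto",
)
# The quantized model is then used both for the QLoRA memorization step and for
# batched rank computation; lower precision frees memory for a larger batch
# (e.g., batch size 70 on a 48GB A6000 in the paper's setup).
```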
4 Conclusion

In this paper we explore the possibility of using LLMs for lossless text compression. We show that while neural network and LLM based text compression systems lead to significantly better compression rates, they also require impractical amounts of compression time. To alleviate this, we introduce FineZip - an LLM-based lossless text compression system which compresses text 54 times faster than LLMZip with a minor loss in compression performance. FineZip also improves on the compression ratio of traditional text compression systems by approximately 50%. We also show that while FineZip presents a significant step towards making practical text compression systems using LLMs, much still needs to be done. We hope our work can serve as a stepping stone in that direction.
5 Limitations

LLM-based text compression systems assume a GPU is available on the host machine for local compression. While this is not true for every personal computer, the landscape is rapidly changing. Many personal laptops are now equipped with GPUs, and as compute becomes cheaper and the power of LLMs grows, we envision a future where every personal computer will be equipped with an LLM running locally and performing various tasks.

References

Fabrice Bellard. 2019. Lossless data compression with neural networks.

Fabrice Bellard. 2021. NNCP v2: Lossless data compression with transformer.

J. Cleary and I. Witten. 1984. Data compression using adaptive coding and partial string matching. IEEE Transactions on Communications, 32(4):396–402.

Grégoire Delétang, Anian Ruoss, Paul-Ambroise Duquenne, Elliot Catt, Tim Genewein, Christopher Mattern, Jordi Grau-Moya, Li Kevin Wenliang, Matthew Aitchison, Laurent Orseau, Marcus Hutter, and Joel Veness. 2024. Language modeling is compression. Preprint, arXiv:2309.10668.

Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. 2023. QLoRA: Efficient finetuning of quantized LLMs. Preprint, arXiv:2305.14314.

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. 2024. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783.

Google. 2024. Brotli compression algorithm. Accessed: 2024-06-01.

Mohit Goyal, Kedar Tatwawadi, Shubham Chandak, and Idoia Ochoa. 2018. DeepZip: Lossless data compression using recurrent neural networks. Preprint, arXiv:1811.08162.

Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. 2021. LoRA: Low-rank adaptation of large language models. Preprint, arXiv:2106.09685.

Yuzhen Huang, Jinghan Zhang, Zifei Shan, and Junxian He. 2024. Compression represents intelligence linearly. Preprint, arXiv:2404.09937.

Jean-loup Gailly. 1992. Gzip. http://www.gzip.org. Accessed: 2024-08-15.

Jean-loup Gailly. 2024. Zlib: A massively spiffy yet delicately unobtrusive compression library. http://www.zlib.net. Accessed: 2024-08-15.

Julian Seward. 2024. bzip2 - a free and open-source file compression program. Accessed: 2024-06-01.

Matthew V. Mahoney. 2000. Fast text compression with neural networks. In Proceedings of the Thirteenth International Florida Artificial Intelligence Research Society Conference, pages 230–234. AAAI Press.

Sourab Mangrulkar, Sylvain Gugger, Lysandre Debut, Younes Belkada, Sayak Paul, and Benjamin Bossan. 2022. PEFT: State-of-the-art parameter-efficient fine-tuning methods. https://github.com/huggingface/peft.

Marcus Hutter. 2006. enwik8. http://prize.hutter1.net/index.htm. Accessed: 2024-08-15.

Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, Ilya Sutskever, et al. 2019. Language models are unsupervised multitask learners. OpenAI blog, 1(8):9.

J. Schmidhuber and S. Heil. 1996. Sequential neural text compression. IEEE Transactions on Neural Networks, 7(1):142–146.

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. 2023a. LLaMA: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971.

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. 2023b. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.

Chandra Shekhara Kaushik Valmeekam, Krishna Narayanan, Dileep Kalathil, Jean-Francois Chamberland, and Srinivas Shakkottai. 2023. LLMZip: Lossless text compression using large language models. Preprint, arXiv:2306.04050.
A Appendix

A.1 Evaluating Traditional Compression Methods

We first experimented with three traditional compression methods - Brotli (Google, 2024), BZ2 (Julian Seward, 2024), and PPM (Cleary and Witten, 1984) - for text compression as a function of increasing dataset size. We find that PPM performs best for text compression, and that its performance remains relatively constant with respect to dataset size. The results can be seen in Figure 4.

Figure 4: Evaluating Baseline Compression Techniques Brotli, BZ2, and PPM on enwik8.

We then use these algorithms to compress the ranks generated by LLMs in FineZip. We see that BZ2 has the best performance, so we chose it as the traditional compression method for FineZip.

Figure 5: Testing Traditional Compression Techniques Brotli, BZ2, and PPM on the ranks produced by compressing enwik8 with Llama-2 7B fine-tuned for 64 epochs with r=16.
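The baseline measurement is straightforward to reproduce in spirit; a minimal sketch is shown below. PPM is omitted because it has no standard Python binding, and zlib is included instead as an extra reference point, so this is an approximation of the benchmark rather than the authors' script.

```python
# Compression ratio of standard compressors on enwik8 prefixes of increasing
# size (an approximation of the A.1 benchmark; PPM is omitted).
import bz2
import zlib
import brotli  # pip install brotli

data = open("enwik8", "rb").read()
for size_mb in (1, 10, 100):
    chunk = data[: size_mb * 10**6]
    for name, compress in (("zlib", zlib.compress),
                           ("bz2", bz2.compress),
                           ("brotli", brotli.compress)):
        ratio = len(compress(chunk)) / len(chunk)
        print(f"{name} on {size_mb} MB: {ratio:.4f}")
```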
A.2 Double Compression Benchmark

Prior to testing FineZip, we compressed the enwik8 (Marcus Hutter, 2006) dataset using traditional compression techniques (Brotli, BZ2, PPM) to create a benchmark for ourselves. Figure 4 shows that Brotli, BZ2, and PPM perform consistently across varying input file sizes and that PPM performs the best on textual data, reaching a compression ratio of approximately 0.25. Figure 6 measures the compression ratio when two compression techniques are stacked and serves as a more accurate benchmark for FineZip, as it also employs two-step compression. Through this set of baseline experiments, we can see that a compression ratio of 0.25 is the value to beat.

Figure 6: Evaluating Stacked Compression with Brotli, BZ2, and PPM on enwik8.

A.3 Context Size

To determine the best context window size to use, we ran experiments with the Llama-2 7B base model (LLMZip) and discovered that a larger context size results in a better compression ratio. The compression ratio began to plateau as the context window reached 512, so we decided to use that for all of our experimentation.

Figure 7: Evaluating the Best Context Window for Compression.
A.4 FineZip and Dataset Size

The previous experiments used a dataset size of only 10 MB; for FineZip to be a viable compression technique, it has to scale well to both smaller and larger file sizes. Figure 8 shows that Llama-3 8B (Dubey et al., 2024) fine-tuned for 256 epochs maintains a consistent compression ratio on dataset sizes of 1, 10, and 100 MB. This verifies that FineZip remains viable regardless of dataset size.

Figure 8: Compressing input files of size 1, 10, and 100 megabytes with Llama-3 8B fine-tuned for 256 epochs.
