
$ XZ - V Data - CSV

The document explains how to use xz for compressing single files and multiple files using tar, highlighting the command syntax and options available for both compression and decompression. It also discusses the benefits of multithreading in xz for faster compression and provides examples of using environment variables to set options. Additionally, it compares the compression effectiveness of xz against gzip and pigz, demonstrating that xz can create smaller archives at the cost of increased compression time.

Uploaded by

Paulo Almeida
Copyright
© All Rights Reserved

3. Using xz for Single Files

Let’s use xz to compress a single file.


Apart from the program name, the usage is identical to that of gzip:
$ xz -v data.csv

This command compresses the file data.csv and replaces it with the file data.csv.xz. The -v option
makes xz display progress information.
xz has compression levels 0-9, one more than the 1-9 of gzip. The default compression level is 6.
However, unlike gzip, that default isn’t usually a good compromise between speed and compression
ratio, since the higher xz levels are much slower.
So, let’s compress a file with the low compression level 1:
$ xz -v1 data.csv

Unlike gzip, there’s no separate program for decompressing a file.


Instead, we use the -d option to decompress a single file:
$ xz -dv data.csv.xz

This decompresses the file data.csv.xz and replaces it with data.csv. Again, the -v option also displays
progress information.
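
To check that compression and decompression really round-trip, a minimal sketch like the following might help. The sample data and file names are made up for illustration, and the -k, -t, -l, and -c options are standard xz options that the text above doesn't cover:

```shell
# Work in a throwaway directory so no real files are touched.
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Generate a small sample CSV (hypothetical data, 10,000 rows).
seq 1 10000 | awk '{ print $1 ",row" $1 }' > "$work/data.csv"
cp "$work/data.csv" "$work/original.csv"

# -k keeps data.csv instead of replacing it with data.csv.xz.
xz -k -1 "$work/data.csv"

# -t checks the integrity of the compressed file without writing any output.
xz -t "$work/data.csv.xz" && echo "archive is intact"

# -l lists the compressed size, uncompressed size, and ratio.
xz -l "$work/data.csv.xz"

# Decompress to stdout (-c) and compare against the untouched copy.
xz -dc "$work/data.csv.xz" | cmp - "$work/original.csv" && echo "round trip matches"
```

Keeping the input with -k is handy while experimenting with compression levels, since the same file can be compressed again without decompressing first.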

4. Using tar With xz for Multiple Files and Directories

Just like gzip, xz compresses each file on its own and can’t bundle several files into one archive.

4.1. Compress Many Filesystem Objects

That’s why we usually leverage the tar archiving utility in combination with xz to compress
multiple files or entire directories:
$ tar cJvf archive.tar.xz *.csv

Let’s break down this command:


• c: create a new archive
• J: sets the compression algorithm to xz
• v: verbosity makes tar show each added and compressed file
• f archive.tar.xz: resulting archive name
• *.csv: add all files with a csv extension in the current directory
Notably, unlike xz and gzip, tar doesn’t delete the input files after it creates the archive.
Which xz compression level does tar pick? It depends on the version of tar, but it’s usually the default
compression level 6.
Still, tar enables setting the compression program through the --use-compress-program option. Since
the option’s value can include command-line arguments, we can use it to set the compression level.
Here, we specify the low compression level 1:
$ tar cvf archive.tar.xz --use-compress-program='xz -1' *.csv

Notably, we remove the J option because --use-compress-program already sets the compression
program.
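
The same result can also be had with an explicit pipeline, which may make it clearer what tar does behind the scenes. This is only a sketch under assumed file names (first.csv and second.csv are made up); it isn't the way tar runs xz internally:

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

# Two small sample files (hypothetical names and contents).
seq 1 5000 | awk '{ print $1 ",a" }' > first.csv
seq 1 5000 | awk '{ print $1 ",b" }' > second.csv

# tar writes the archive to stdout (f -), and xz compresses that stream at level 1.
tar cf - first.csv second.csv | xz -1 > archive.tar.xz

# The input files are still present.
ls first.csv second.csv

# Decompress the stream again and let tar list the archive members.
xz -dc archive.tar.xz | tar tf -
```

The pipeline form works with any tar that can write to stdout, at the cost of a slightly longer command.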

4.2. Decompress Archive

Decompressing a tar archive with xz is also a single step and identical to gzip (except for the
different file extension):
$ tar xvf archive.tar.xz

Again, let’s see what each option does:


• f archive.tar.xz: archive for extraction
• x: extract (decompress)
• v: verbosity makes tar show each extracted file
Again, the archive isn’t deleted after the operation. Notably, we don’t have to tell tar to decompress
with xz as tar does this automatically by inspecting the file and detecting the xz compression.
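
Before extracting an unfamiliar archive, it can be useful to list its members and to unpack into a separate directory. The sketch below assumes a tar with xz support (such as GNU tar) and uses a made-up directory name, restored:

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

# Build a small archive to work with (hypothetical sample data).
seq 1 1000 | awk '{ print $1 ",x" }' > data.csv
tar cJf archive.tar.xz data.csv

# t lists the members without extracting anything.
tar tvf archive.tar.xz

# -C extracts into another directory, which must already exist.
mkdir restored
tar xf archive.tar.xz -C restored

# The archive is still there, and the extracted copy matches the input.
ls archive.tar.xz
cmp data.csv restored/data.csv && echo "extracted copy matches"
```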

5. Faster Compression With Multithreading

Unlike gzip, xz supports multithreading directly, which speeds up compression.


Before version 5.6, xz uses just a single thread by default. We can specify the number of threads
with the -T option. A value of 0 tells xz to use one thread for every available CPU core. That’s
generally a good default value to use:
$ xz -vT0 data.csv

We can also set an explicit thread count, such as 3 in this example:
$ xz -vT3 data.csv

Unlike unpigz, decompression with xz doesn’t benefit from multithreading by default. xz can only
decompress in parallel when the file was compressed in multithreaded mode, as we did above, because
that mode splits the data into independent blocks.
Even then, more than two or three threads don’t usually present much improvement, if any.
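
To see the effect of the thread count on the output, one option is to compress the same input twice and compare the results. The following is a rough sketch with generated sample data; the exact sizes and block counts depend on the xz version and the machine, so none are shown here:

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Several megabytes of sample CSV rows (hypothetical data).
seq 1 1000000 | awk '{ print $1 "," $1 * 2 }' > "$work/data.csv"

# Compress the same input with one thread and with one thread per core.
# -c writes to stdout, so data.csv itself stays in place.
xz -1 -T1 -c "$work/data.csv" > "$work/single.xz"
xz -1 -T0 -c "$work/data.csv" > "$work/multi.xz"

# Multithreaded mode splits the input into blocks, so its output is usually
# slightly larger; xz -l shows the size and block count of each file.
xz -l "$work/single.xz" "$work/multi.xz"

# Both files decompress to identical data.
xz -dc "$work/single.xz" | cmp - "$work/data.csv" && echo "single-threaded file OK"
xz -dc "$work/multi.xz" | cmp - "$work/data.csv" && echo "multithreaded file OK"
```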
6. Using Multithreading With tar

There are two main ways to use multithreading with tar and xz.

6.1. The --use-compress-program Option

Previously, we specified the compression level with the --use-compress-program option. Now, we
enable multithreading through the same --use-compress-program option by setting the number of
threads with the command-line options.
Here, we again use one thread for every CPU core:
$ tar cvf archive.tar.xz --use-compress-program='xz -1T0' *.csv

While decompression with xz doesn’t benefit from multithreading by default, we can still use the same
options:
$ tar xvf archive.tar.xz --use-compress-program='xz -dT3'

Thus, we again use -d with a specific thread count (3).

6.2. Environment Variables

Another way to set the options for xz is to use the XZ_* environment variables. xz reads them
itself, so they also apply when tar runs xz:
• XZ_DEFAULTS: sets the default options for xz globally
• XZ_OPT: passes options to xz when another executable, such as tar, runs it
So, in general, we use XZ_DEFAULTS in a .bashrc or similar initialization script, while XZ_OPT
generally helps in specific sessions or local scripts.
Let’s see the compression example from earlier with XZ_OPT:
$ XZ_OPT='-T0 -1' tar cJvf archive.tar.xz *.csv

Similarly, we can perform a decompression:


$ XZ_OPT='-d -T0' tar xJvf archive.tar.xz

Notably, we shouldn’t expect much improvement in decompression speed from either method due to the
general way the algorithm works when decompressing.
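
The difference between the two variables can be tried out in a scratch directory. This sketch assumes GNU tar, which runs the xz program and so passes the environment on to it; other tar implementations may compress internally and ignore XZ_OPT. The thread limit of 2 is an arbitrary example value:

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

# Hypothetical sample data.
seq 1 5000 | awk '{ print $1 ",x" }' > data.csv

# XZ_OPT set as a prefix applies only to this one tar invocation.
XZ_OPT='-1 -T0' tar cJf fast.tar.xz data.csv

# Without the variable, xz falls back to its default settings.
tar cJf default.tar.xz data.csv

# Compare the resulting sizes in bytes.
wc -c fast.tar.xz default.tar.xz

# XZ_DEFAULTS is read before XZ_OPT and the command line, so it suits
# long-lived settings such as a thread limit in an initialization script.
export XZ_DEFAULTS='-T2'
xz -1 -c data.csv > direct.xz
xz -t direct.xz && echo "direct.xz is intact"
```

Options given later override earlier ones, so a level set on the command line still wins over a level set in either variable.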

6.3. Decompression Considerations


Since the 5.4 series, xz provides support for parallel decompression with -T0. Yet, tar archives
require a sequential read, so the process might need to preread a number of blocks. To do this, xz
expects the archive to be compressed with the multithreading option.
Consequently, if multithreading is a must, we usually turn to algorithms like Zstd.

7. Testing Archive Sizes With xz

As we already noted, xz usually creates smaller archives than gzip.


To test this claim, we used the same 818 MB CSV file, and the same computer with six CPU cores and
hyperthreading. This is the same setup we used to test gzip in Linux.
We compared xz to pigz, a gzip implementation that uses multithreading for faster compression and
decompression:
• both archiving tools saturated the CPU: pigz does this by default, xz because of the -T0 option
• at compression level 7 out of 9, pigz compressed the 818 MB CSV file down to 95 MB in 4
seconds: higher compression levels didn’t produce meaningfully smaller archives
• at compression level 1 out of 9, xz compressed the 818 MB CSV file down to 48 MB in 4
seconds: a 49% smaller result than pigz
With compression level 5, xz produced a 29 MB archive, which is 69% smaller than the pigz result
with the same setup. However, xz took nearly 18 times as long at 70 seconds. Compression levels
six and beyond hugely increased the compression time for a negligible 1% reduction in archive size,
so level 5 gave the smallest archive that was still worth the wait.
So, we’ve demonstrated that xz does indeed create much smaller archives than gzip, sometimes at the
price of time.
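
A comparison along these lines can be repeated on any machine with a short script. The sketch below uses plain gzip instead of pigz and a small generated file, both of which are assumptions for illustration, so the absolute numbers will differ from the ones above:

```shell
work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT

# Sample input (hypothetical data, far smaller than the 818 MB file above).
seq 1 200000 | awk '{ print $1 ",item" $1 % 97 "," $1 * 3 }' > "$work/data.csv"

# Compress to stdout and count the bytes, so no archive files are left behind.
orig=$(wc -c < "$work/data.csv" | tr -d ' ')
gz=$(gzip -7 -c "$work/data.csv" | wc -c | tr -d ' ')
xz1=$(xz -1 -T0 -c "$work/data.csv" | wc -c | tr -d ' ')
xz5=$(xz -5 -T0 -c "$work/data.csv" | wc -c | tr -d ' ')

echo "original: $orig bytes"
echo "gzip -7:  $gz bytes"
echo "xz -1:    $xz1 bytes"
echo "xz -5:    $xz5 bytes"

# Percentage saved relative to gzip: (gzip - xz) * 100 / gzip.
# With the figures from the text: (95 - 48) * 100 / 95 = 49 (integer division).
echo "xz -1 vs gzip -7: $(( (gz - xz1) * 100 / gz ))% smaller"
```

Prefixing the individual compression commands with time would add the speed side of the trade-off.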
