$ xz -v data.csv
This command compresses the file data.csv and replaces it with the file data.csv.xz. The -v option
makes xz display progress information.
xz has the same compression levels 1-9 as gzip. The default compression level is 6. However, unlike
gzip, that default compression level isn’t usually a good compromise between speed and compression
ratio.
So, let’s compress a file with the minimum compression level 1:
$ xz -v1 data.csv
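To get a feel for the trade-off between levels, we can compress the same input at several levels and compare the sizes. This is only a minimal sketch: the generated sample.csv file and the temporary directory are hypothetical stand-ins for real data, and the results depend heavily on the input:

```shell
#!/bin/sh
# Sketch: compare xz output sizes at levels 1, 6 and 9 on generated data.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# Generate a hypothetical sample CSV file (not part of the original example).
awk 'BEGIN { for (i = 0; i < 20000; i++) printf "%d,row-%d,%d\n", i, i, (i * 7) % 1000 }' \
    > "$workdir/sample.csv"

printf 'original: %s bytes\n' "$(wc -c < "$workdir/sample.csv" | tr -d ' ')"

# -c writes to stdout, so the input stays in place and each output gets its own name.
for level in 1 6 9; do
    xz -"$level" -c "$workdir/sample.csv" > "$workdir/sample.$level.xz"
    printf 'level %s: %s bytes\n' "$level" "$(wc -c < "$workdir/sample.$level.xz" | tr -d ' ')"
done
```

On highly repetitive data like this, the difference between levels may be small, so it's worth repeating the comparison on a representative file before settling on a level.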
Conversely, xz -dv data.csv.xz decompresses the file data.csv.xz and replaces it with data.csv. Again, the -v
option displays progress information.
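As a quick sanity check, we can run a full round trip and confirm that the data comes back unchanged. This sketch assumes the -d option of xz for decompression and uses a small, made-up data.csv in a temporary directory:

```shell
#!/bin/sh
# Sketch: compress and decompress a file in place, then verify its checksum.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample content.
printf 'id,name\n1,alpha\n2,beta\n' > data.csv
before=$(cksum < data.csv)

xz -v1 data.csv       # data.csv is replaced by data.csv.xz
xz -dv data.csv.xz    # data.csv.xz is replaced by data.csv

after=$(cksum < data.csv)
if [ "$before" = "$after" ]; then
    echo "round trip OK"
fi
```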
Since xz compresses individual files and doesn’t bundle them into an archive, we usually leverage the
tar archiving utility in combination with xz to compress multiple files or entire directories:
$ tar cJvf archive.tar.xz *.csv
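Notably, when we pass the compressor through --use-compress-program, for example to set the
compression level, we remove the J option because --use-compress-program already sets the
compression program.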
Decompressing a tar archive with xz is also a single step and identical to gzip (except for the
different file extension):
$ tar xvf archive.tar.xz
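The create and extract steps can be sketched end to end as follows. The file names first.csv and second.csv and the restored directory are hypothetical, and the sketch assumes a tar implementation that supports the J option and detects xz compression on extraction, as GNU tar does:

```shell
#!/bin/sh
# Sketch: archive several files with tar and xz, then list and extract them.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample files.
printf 'a,b\n1,2\n' > first.csv
printf 'c,d\n3,4\n' > second.csv

tar cJf archive.tar.xz *.csv          # J selects xz as the compressor
tar tf archive.tar.xz                 # list the archive members

mkdir restored
tar xf archive.tar.xz -C restored     # compression is detected automatically

# cmp exits with a non-zero status if the extracted copy differs.
cmp first.csv restored/first.csv
cmp second.csv restored/second.csv
```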
If we decide to force multithreading, we can use more threads, such as the 3 in this example:
$ xz -vT3 data.csv
Unlike unpigz, decompression with xz doesn’t benefit from multithreading by default. If we want
to employ faster decompression, we’d have to use multithreaded compression as we did above.
Even then, more than two or three threads don’t usually present much improvement, if any.
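To see what multithreaded compression changes in the output, we can compress the same data with one and with three threads and inspect both results with xz --list. This is a rough sketch on generated data (big.csv is hypothetical); the reported block counts depend on the xz version, the compression level, and the input size:

```shell
#!/bin/sh
# Sketch: compare single-threaded and multithreaded xz output for the same input.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Generate a few megabytes of hypothetical CSV data.
awk 'BEGIN { for (i = 0; i < 300000; i++) printf "%d,row-%d,%d\n", i, i, (i * 7) % 1000 }' \
    > big.csv

xz -1 -T1 -c big.csv > single.xz    # one thread
xz -1 -T3 -c big.csv > multi.xz     # three threads

# Show streams, blocks and sizes for both files.
xz --list single.xz multi.xz

# Both files must decompress to exactly the original data.
xz -dc single.xz | cmp - big.csv
xz -dc multi.xz | cmp - big.csv
echo "both files decompress to the original data"
```

Multithreaded compression typically splits the input into several blocks, which is what a threaded decompressor can work on in parallel.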
6. Using Multithreading With tar
There are two main ways to use multithreading with tar and xz.
Previously, we specified the compression level with the --use-compress-program option. Now, we
enable multithreading through the same --use-compress-program option by setting the number of
threads with the command-line options.
Here, we again use one thread for every CPU core:
$ tar cvf archive.tar.xz --use-compress-program='xz -1T0' *.csv
While decompression with xz doesn’t benefit from multithreading by default, we can still use the same
options:
$ tar xvf archive.tar.xz --use-compress-program='xz -dT3'
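Both commands can be tried together in a throwaway directory. The sketch below assumes a tar implementation that accepts arguments inside --use-compress-program, as recent GNU tar versions do; the sample files and the restored directory are hypothetical:

```shell
#!/bin/sh
# Sketch: create and extract an archive via --use-compress-program.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample files.
printf 'a,b\n1,2\n' > first.csv
printf 'c,d\n3,4\n' > second.csv

# No J option here: the compressor comes from --use-compress-program.
tar cf archive.tar.xz --use-compress-program='xz -1T0' *.csv

mkdir restored
tar xf archive.tar.xz --use-compress-program='xz -dT0' -C restored

# cmp exits with a non-zero status if the extracted copy differs.
cmp first.csv restored/first.csv
cmp second.csv restored/second.csv
```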
Another way to set the options for xz is to use the XZ_* environment variables. The xz tool reads them itself,
so they also apply when tar runs xz on our behalf:
• XZ_DEFAULTS: sets the default options for xz globally
• XZ_OPT: passes options to xz when another executable runs it
So, in general, we use XZ_DEFAULTS in a .bashrc or similar initialization script, while XZ_OPT
generally helps in specific sessions or local scripts.
Let’s see the compression example from earlier with XZ_OPT:
$ XZ_OPT='-T0 -1' tar cJvf archive.tar.xz *.csv
Notably, when decompressing, we shouldn’t expect much improvement from either method due to the
general way the algorithm works.
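The difference in scope between the two variables can be sketched like this. The example assumes GNU tar, which runs the external xz program for the J option; other tar implementations may compress internally and ignore these variables. The file names are hypothetical:

```shell
#!/bin/sh
# Sketch: pass xz options through environment variables instead of flags.
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample files.
printf 'a,b\n1,2\n' > first.csv
printf 'c,d\n3,4\n' > second.csv

# XZ_OPT set as a prefix applies only to this single tar invocation.
XZ_OPT='-T0 -1' tar cJf archive.tar.xz *.csv

# XZ_DEFAULTS usually lives in a startup file such as .bashrc;
# here we export it only for the rest of this script.
export XZ_DEFAULTS='-T0'

xz -t archive.tar.xz     # test the integrity of the compressed file
tar tf archive.tar.xz    # list the archive members
```

Since xz parses XZ_DEFAULTS first, then XZ_OPT, and finally its command-line arguments, a later setting overrides an earlier one when they conflict.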