File System Basics File Compression Archiving and Backup
Lossy compression
With lossy compression, some of the original data is discarded, so the file
cannot be reconstructed exactly in its original form. This form of compression
is often used on images and sound files, for example in the JPEG and MP3
formats. For an image file the encoder might reduce the number of colors or
discard fine detail, and for a sound file it might reduce the number of
samples. This type of compression can reduce quality.
Lossless compression
With lossless compression, the original file can be reconstructed exactly
from the compressed file. In a nutshell, these methods recognize repeated
patterns, replace each occurrence with a shorter placeholder, and store the
actual pattern once so it can be substituted back when the file is
decompressed.
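To see this in practice, here is a minimal shell sketch that compresses two files of the same size, one built from a single repeated line and one filled with random bytes, and compares the results. The file names are hypothetical, and the sketch assumes GNU gzip 1.6 or later (for -k) and a system that provides /dev/urandom.

```shell
#!/bin/sh
# Sketch: gzip finds repeated patterns, so repetitive input shrinks far more
# than random input of the same size. File names are hypothetical examples.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# 100,000 bytes made of one short line repeated over and over.
yes 'the same line of text' | head -c 100000 > repetitive.txt
# 100,000 random bytes, which contain almost no repeated patterns.
head -c 100000 /dev/urandom > random.bin

# -k keeps the original files next to the .gz output (GNU gzip 1.6 or later).
gzip -k repetitive.txt random.bin

rep_size=$(wc -c < repetitive.txt.gz)
rand_size=$(wc -c < random.bin.gz)
echo "repetitive.txt: 100000 -> $rep_size bytes"
echo "random.bin:     100000 -> $rand_size bytes"

# Decompressing to stdout (-c) gives back exactly the original bytes.
gzip -dc repetitive.txt.gz | cmp - repetitive.txt && echo "repetitive.txt restored exactly"
```

The repetitive file shrinks to a few hundred bytes, while the random file stays roughly its original size and may even grow slightly.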
The gzip compression tool
The name gzip refers to both the tool that does the compression and the file
format of the resulting file. The tool has been around since 1992 and is still
maintained. The compression algorithm used by gzip is known as DEFLATE, which
combines LZ77 (replacing repeated byte sequences with references to earlier
occurrences) with Huffman coding. The two main advantages of DEFLATE are speed
and memory efficiency during compression. gzip compresses only a single file,
and the resulting file is given a .gz extension.
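Read the man page for gzip:
man gzip
1. Compress the file. By default, gzip replaces the original file with the
compressed one:
gzip Lesmiserables.txt
After compression, the file is named Lesmiserables.txt.gz and its size is now
1290580 bytes, much smaller than before.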
2. View information about the compression with the -l option. The output shows
the compressed size, the uncompressed size, and the compression ratio. Here the
file size was reduced by 61.7 percent.
gzip -l Lesmiserables.txt.gz
3. Decompress the file with the -d option:
gzip -d Lesmiserables.txt.gz
4. List the file again and confirm that it is exactly the same size as it was
before compression:
ls -l Lesmiserables.txt
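The four gzip steps above can be reproduced on any system with the sketch below. It generates a hypothetical sample.txt with seq instead of using Lesmiserables.txt, so the sizes and ratio will differ from the numbers quoted above. The gzip -t integrity check is an extra step not covered in the text.

```shell
#!/bin/sh
# Sketch of the gzip steps on a generated file; sample.txt is a hypothetical
# stand-in for Lesmiserables.txt.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Just under 1 MB of text: the numbers 1 to 150000, one per line.
seq 1 150000 > sample.txt
orig_size=$(wc -c < sample.txt)

# 1. Compress. Without -k, gzip replaces sample.txt with sample.txt.gz.
gzip sample.txt
ls -l sample.txt.gz

# 2. Show the compressed size, uncompressed size and ratio.
gzip -l sample.txt.gz

# Optional: verify the compressed data; gzip -t prints nothing if it is intact.
gzip -t sample.txt.gz

# 3. Decompress; sample.txt.gz is replaced by sample.txt again.
gzip -d sample.txt.gz

# 4. The restored file has exactly the original size.
new_size=$(wc -c < sample.txt)
echo "before: $orig_size bytes, after: $new_size bytes"
```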
The bzip2 compression tool
The bzip2 compression tool was first released in 1996. It is based on the
Burrows–Wheeler transform (BWT) followed by Huffman coding, an approach also
known as block-sorting compression. Compared to gzip, bzip2 usually produces a
smaller compressed file, but because its algorithm is more complex, it takes
longer to complete and requires more memory during compression. bzip2
compresses only a single file, and the resulting file is given a .bz2
extension.
Try bzip2:
bzip2 Lesmiserables.txt
ls -l Lesmiserables.txt.bz2
bzip2 produces a smaller file than gzip, but with a file of this size the
difference in compression time is insignificant. Decompress the file with the
-d option and check its size:
bzip2 -d Lesmiserables.txt.bz2
ls -l Lesmiserables.txt
man bzip2
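To compare bzip2 with gzip on your own data, a sketch like the following compresses the same hypothetical file with both tools. It uses -k (keep the original) so both tools read the same input, and it prints the sizes rather than assuming which tool wins, since the result depends on the data.

```shell
#!/bin/sh
# Sketch: compress one hypothetical file with both gzip and bzip2 and compare
# the results. Which tool wins, and by how much, depends on the data.
set -e
workdir=$(mktemp -d)
cd "$workdir"

seq 1 150000 > sample.txt

# -k keeps sample.txt so that both tools compress the same input.
gzip -k sample.txt
bzip2 -k sample.txt

orig_size=$(wc -c < sample.txt)
gz_size=$(wc -c < sample.txt.gz)
bz2_size=$(wc -c < sample.txt.bz2)
echo "original: $orig_size bytes"
echo "gzip:     $gz_size bytes"
echo "bzip2:    $bz2_size bytes"

# -t tests the integrity of the .bz2 file without writing any output file.
bzip2 -t sample.txt.bz2

# -dc decompresses to standard output; cmp confirms the data is unchanged.
bzip2 -dc sample.txt.bz2 | cmp - sample.txt && echo "bzip2 round trip OK"
```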
The xz compression tool
The xz compression tool is the newest of the tools on this page; it was first
released in 2009. The algorithm used by xz is called LZMA2, and it achieves a
higher compression ratio than the two previous tools. As with bzip2, the better
compression comes at the expense of speed: in some cases compression can take
4 to 5 times longer than with bzip2. xz compresses only a single file, and the
resulting file is given a .xz extension.
Compress the file:
xz Lesmiserables.txt
You can view statistics about the compressed file using the -l option:
xz -l Lesmiserables.txt.xz
Decompress the file with the -d option:
xz -d Lesmiserables.txt.xz
You can find more information about xz on its man page:
man xz
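The same workflow with xz can be tried as a sketch on a generated sample.txt (a hypothetical stand-in for Lesmiserables.txt). It assumes the xz utility from XZ Utils is installed; the -t integrity check is an extra step not mentioned above.

```shell
#!/bin/sh
# Sketch of the xz steps on a generated, hypothetical sample.txt.
set -e
workdir=$(mktemp -d)
cd "$workdir"

seq 1 150000 > sample.txt
orig_size=$(wc -c < sample.txt)

# Compress; like gzip and bzip2, xz replaces the original file unless -k is used.
xz sample.txt

# Show statistics such as compressed size, uncompressed size and ratio.
xz -l sample.txt.xz

# Check the integrity of the compressed file without writing decompressed data.
xz -t sample.txt.xz

# Decompress and confirm that the size is unchanged.
xz -d sample.txt.xz
new_size=$(wc -c < sample.txt)
echo "before: $orig_size bytes, after: $new_size bytes"
```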
Archiving
Archiving is the process of combining multiple files or directories into a
single file. Archiving is useful when you are backing up data or sharing it.
For example, if you have several files to send to someone, it is more
convenient to bundle them into a single archive and send that one file; the
recipient can then extract the files after receiving it.
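The tar command
tar is a widely used archiving tool on Linux. Its most common flags are:
Flag Function
-c Create an archive
-f Use the archive file given as the next argument
-r Append files to an archive
-t List the contents of an archive
-v Verbose output
-x Extract the contents of an archive
-z Filter the archive through gzip (compress when creating, decompress when extracting)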
man tar
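The flags in the table can be combined. The sketch below builds a small hypothetical docs/ directory, then creates, appends to, lists, and extracts an archive. It also uses -C, which is not in the table, to extract into a separate directory. It assumes GNU tar; note that -r can only append to an uncompressed archive.

```shell
#!/bin/sh
# Sketch exercising the tar flags from the table on a hypothetical docs/ directory.
set -e
workdir=$(mktemp -d)
cd "$workdir"

mkdir docs
echo "first"  > docs/a.txt
echo "second" > docs/b.txt
echo "third"  > c.txt

# -c create, -v verbose, -f write to the named archive file (f comes last).
tar -cvf docs.tar docs

# -r appends another file; this only works on an uncompressed archive.
tar -rvf docs.tar c.txt

# -t lists the contents of the archive.
tar -tf docs.tar

# -x extracts; -C changes into the target directory before extracting.
mkdir restored
tar -xvf docs.tar -C restored
```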
The images directory contains 53 files. If you wanted to email them to
someone, attaching every file individually would be cumbersome. Instead, you
can use tar to create a single archive file and send that:
tar -cvf images.tar images
You can also create an archive and compress it with gzip in one command by
adding the -z flag:
tar -czvf images.tar.gz images
You can also extract a single file. First rename the images directory,
because by default tar extracts files into the same directory hierarchy they
were archived from:
mv images imagesbak
tar -xvf images.tar images/concat.png
ls -l images
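To extract the compressed archive into a different directory, create the
directory and name it with the -C option:
mkdir extracted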
tar -zxvf images.tar.gz -C extracted
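Besides -z for gzip, GNU tar has -j for bzip2 and -J for xz, so the compression tools from the first part of this page can be applied to a whole directory. This sketch, using a hypothetical photos/ directory, creates all three kinds of archive and then extracts a single member into another directory.

```shell
#!/bin/sh
# Sketch: tar can filter an archive through gzip (-z), bzip2 (-j) or xz (-J).
# The photos/ directory and its files are hypothetical examples.
set -e
workdir=$(mktemp -d)
cd "$workdir"

mkdir photos
seq 1 20000 > photos/list1.txt
seq 20001 40000 > photos/list2.txt

# Create one archive per compression tool.
tar -czf photos.tar.gz photos
tar -cjf photos.tar.bz2 photos
tar -cJf photos.tar.xz photos
ls -l photos.tar.gz photos.tar.bz2 photos.tar.xz

# Extract a single member into another directory; -C changes into it first.
mkdir extracted
tar -xzvf photos.tar.gz -C extracted photos/list2.txt
```

When reading an archive, GNU tar normally detects the compression on its own, so tar -xf photos.tar.xz usually works without -J.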
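The cpio command
cpio is another archiving tool. In copy-out mode (-o), it reads a list of file
names from standard input and writes the archive to standard output, so you can
pipe the output of ls into it. See its man page for details:
man cpio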
cd images
ls | cpio -o > imagedir.cpio
ls -l imagedir.cpio
cd ..
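The zip command
The zip command creates an archive and compresses the files in it. The -r
option adds a directory and everything in it recursively:
zip -r images.zip imagesbak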
ls -l
You’ll notice that the .zip file is smaller than the .tar file. That’s because
zip compresses the files by default, whereas tar only combines them.
To extract the archive, use unzip:
unzip images.zip
Use man zip and man unzip for more information about these commands.
Backup
The dd command
The dd command can be used to convert and copy files. You might use it to
create a bootable USB drive for your Linux operating system or to back up
files. It uses a different command-line syntax from most other Linux commands:
instead of options in the form -option, it takes operands in the form
option=value, such as if=input_file and of=output_file.
man dd
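To illustrate the option=value syntax, here is a sketch that uses dd only on ordinary files in a temporary directory. if=, of=, bs=, count= and conv= are standard dd operands; the file names are hypothetical, and the bs=1M size suffix assumes GNU dd. Be careful when pointing of= at a device such as a USB drive: dd overwrites it without asking.

```shell
#!/bin/sh
# Sketch of dd's operand=value syntax using ordinary files only.
set -e
workdir=$(mktemp -d)
cd "$workdir"

# Create a 4 MiB file of zeros: if= input, of= output, bs= block size,
# count= number of blocks to copy.
dd if=/dev/zero of=disk.img bs=1M count=4

# Copy it byte for byte, as you might when backing up a disk image.
dd if=disk.img of=disk-backup.img bs=1M

# conv= applies a conversion while copying; ucase turns lowercase into uppercase.
printf 'hello dd\n' > lower.txt
dd if=lower.txt of=upper.txt conv=ucase
cat upper.txt
```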