Data Compression
The Derby University
Schools of Art, Design & Technology
Division of Electronics and Sound
9/12/2010
CONTENTS
LOSSLESS COMPRESSION
LOSSY COMPRESSION
COMPRESSION TECHNIQUES
HUFFMAN CODING
LEMPEL-ZIV-WELCH COMPRESSION
CONCLUSION

TABLES AND FIGURES
Table 1
Table 2
Table 3
Table 4
Table 5
Figure 1
Connection Speed   Actual Speed   Transfer Time
Modem (56Kb)       7KB/s          4 hours
ADSL (128Kb)       16KB/s         1.7 hours
ADSL-1 (8Mb)       1MB/s          1.7 minutes
ADSL-2 (12Mb)      1.5MB/s        1.1 minutes
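As a rough check, the times in the table are consistent with transferring a payload of about 100 MB; the file size is an assumption here, since the report does not state it. A quick sketch of the arithmetic:

```python
# Transfer-time arithmetic behind the table above.
# Assumption: the file being moved is 100 MB (100,000 KB); the report does
# not state the size, but this value reproduces the listed times.

def transfer_seconds(file_kb: float, speed_kb_per_s: float) -> float:
    """Time in seconds to move file_kb kilobytes at the given throughput."""
    return file_kb / speed_kb_per_s

FILE_KB = 100_000  # assumed 100 MB payload

# Effective throughputs taken from the table (KB/s).
modem = transfer_seconds(FILE_KB, 7)      # ~4 hours
adsl = transfer_seconds(FILE_KB, 16)      # ~1.7 hours
adsl1 = transfer_seconds(FILE_KB, 1000)   # ~1.7 minutes
adsl2 = transfer_seconds(FILE_KB, 1500)   # ~1.1 minutes

print(round(modem / 3600, 1), "hours")  # 4.0 hours
```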
When it comes to data storage for personal or business use, data compression can make a
huge difference: when archiving data, compression reduces both cost and space.
There are two types of data compression, lossless and lossy, and both can be utilised in
many different applications.
Lossless compression
Lossless compression typically achieves around a 2:1 ratio. It is important when dealing
with critical information such as documents, applications and executable programs; for
these, the file after decompression must be identical to the original.
A disadvantage is that compression may make little difference in size, or the output can
even approach twice the size of the original file, so the compression ratio is comparatively
low.
Lossy compression
Lossy compression is used when some of the data can be discarded without a detrimental
effect on the final result. It is mainly used for audio, video and pictures, where the loss
of some information is insignificant. The advantages of this method are a higher compression
ratio and smaller file sizes; the disadvantage is that the discarded data cannot be
recovered, so the method is only suitable for particular applications.
Compression techniques
Huffman coding
Understanding what information is, and how it can be quantified, underpins information and
communication networks. In 1952, David A. Huffman was at the forefront of the development
of data compression; he understood the importance of information and how it could be
exploited.
In Huffman's concept, the probabilities of all the symbols sum to one; from this he devised
the method now referred to as Huffman coding. The method works on frequency of occurrence,
assigning a value to each ASCII symbol according to how often it repeats, see Fig (1).
Huffman coding is a lossless data compression method; for a given set of symbol frequencies
it generates the least possible amount of code.
Huffman coding is widely incorporated into popular software capable of running on
multi-platform systems, and is still in use today.
An example is given to better understand this coding method.
Example:
This example works on a very small scale to show how the method operates. The word
allarit is selected for Huffman coding.
A standard ASCII table, Table-1, is used to show the value of each letter with its
corresponding binary equivalent.
TABLE 1
Character   ASCII   Binary
a           097     01100001
l           108     01101100
r           114     01110010
i           105     01101001
t           116     01110100
The standard ASCII value of each letter is eight bits, which totals 56 bits to make up
the word allarit.
The word allarit in terms of bits, without data compression, is:
(01100001011011000110110001100001011100100110100101110100)
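This 56-bit string can be reproduced with a short sketch (plain Python, just to check the arithmetic):

```python
# Encode each character of "allarit" as its 8-bit ASCII value and
# concatenate, reproducing the 56-bit uncompressed string above.
word = "allarit"
bits = "".join(format(ord(ch), "08b") for ch in word)

print(len(bits))  # 56
print(bits)
```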
TABLE 2
Character   Frequency   Probability    Value in bits
a           2           2/7 = 0.285    8*2 = 16 bits
l           2           2/7 = 0.285    8*2 = 16 bits
r           1           1/7 = 0.142    8 bits
i           1           1/7 = 0.142    8 bits
t           1           1/7 = 0.142    8 bits
The third step is to draw the Huffman tree. To do this, the letters are arranged in
descending order of probability: (a, l, i, r, t).
First, the two letters with the lowest assigned values are branched into a new node whose
value is the sum of the two previous nodes, and this is repeated until a single root node
remains.
The fourth step is to assign the binary values (0) and (1) to each branch: for every left
branch put a zero (0) and for every right branch put a one (1), following the branches
until each letter is reached, see Fig (1).
FIGURE 1: Huffman tree for the word allarit (figure not reproduced)
TABLE 3
Character   Huffman code
a           00
l           01
i           10
r           110
t           111
(Code values reconstructed; any valid Huffman tree for these frequencies produces codes of these lengths.)
The maximum bit length of the Huffman binary codes in Table-3 is three bits. The total
comes to 16 bits, compared with 56 bits originally, which makes the encoded word roughly
29 percent of its uncompressed size, a reduction of about 71 percent.
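The construction above can be sketched with a generic heap-based Huffman coder (an illustration, not the report's own code; tie-breaking can produce different codes of the same lengths, so only the total bit count is fixed):

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict:
    """Build a Huffman code table from symbol frequencies in `text`."""
    freq = Counter(text)
    # Heap entries: (frequency, tie-breaker, {symbol: code-so-far}).
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # lowest-probability node -> 0
        f2, _, right = heapq.heappop(heap)  # next lowest             -> 1
        merged = {ch: "0" + c for ch, c in left.items()}
        merged.update({ch: "1" + c for ch, c in right.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

codes = huffman_codes("allarit")
encoded = "".join(codes[ch] for ch in "allarit")
print(len(encoded))  # 16 bits, down from 56
```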
Lempel-Ziv-Welch compression
Abraham Lempel and Jacob Ziv were the first to introduce Lempel-Ziv data compression (LZ77
and LZ78). In 1984, Terry A. Welch published an improved version, Lempel-Ziv-Welch (LZW),
a derivative of LZ78. Lempel-Ziv-Welch (LZW) is a lossless data compression method and is
used for various data formats, e.g. GIF and TIFF.
Terry Welch designed the algorithm so that the encoder and decoder produce identical
dictionaries. The LZW algorithm starts with a preset symbol dictionary of the 256
individual byte values (entries 0-255). As the data stream is processed, the preset
dictionary is extended with entries 256-4095 for newly seen redundant strings, referred to
as substrings; each dictionary code is 12 bits.
LZW creates a table of strings of highly redundant data and assigns each string an index
ID. If a string of data repeats, it is replaced by its substring index ID, reducing the
size of the initial data.
Example:
For demonstration purposes, suppose a file contains the string (ALLARIT), which has been
assigned a substring index ID, see Table-4 below. Replacing every occurrence of the string
(ALLARIT) with its substring ID would reduce the file size by a great deal; the higher the
substring redundancy, the higher the resulting compression ratio.
TABLE 4
String      String ID
ALLARIT     275
Another example, Table-5, illustrates step by step how data is encoded using the LZW
method:
TABLE 5 (table not reproduced)
This process could generate a huge dictionary if the data were highly repetitive, but
because of the 4 Kbyte dictionary size limit implemented by Terry Welch, once the
dictionary is full no further entries can be added. LZW lossless compression is of great
benefit for encoding English text, where file size can be reduced by more than fifty
percent.
Decoding takes place in much the same way as encoding. The LZW algorithm does not need the
string table to be transmitted for decompression; instead, the output stream of the
compression algorithm is fed in as input data and the table is rebuilt, producing a
dictionary identical to the compression table. This method is simple and increases the
speed of both the encoding and decoding processes.
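The encode/decode round trip can be sketched with a minimal, textbook-style LZW pair (an illustration, not Welch's exact implementation; for simplicity it works on strings and lets the dictionary grow without the 12-bit limit):

```python
def lzw_encode(data: str) -> list:
    """LZW compress: emit dictionary indices; new entries start at 256."""
    table = {chr(i): i for i in range(256)}  # preset single-byte dictionary
    next_code = 256
    out, current = [], ""
    for ch in data:
        if current + ch in table:
            current += ch                    # grow the matched substring
        else:
            out.append(table[current])       # emit longest known string
            table[current + ch] = next_code  # learn the new substring
            next_code += 1
            current = ch
    if current:
        out.append(table[current])
    return out

def lzw_decode(codes: list) -> str:
    """Rebuild the same dictionary from the code stream; no table is sent."""
    table = {i: chr(i) for i in range(256)}
    next_code = 256
    prev = table[codes[0]]
    out = [prev]
    for code in codes[1:]:
        # The one tricky case: a code not yet in the table (cScSc pattern).
        entry = table.get(code, prev + prev[0])
        out.append(entry)
        table[next_code] = prev + entry[0]   # learn as the encoder did
        next_code += 1
        prev = entry
    return "".join(out)

encoded = lzw_encode("ALLARITALLARIT")
print(len(encoded))                                # 11 codes for 14 bytes
print(lzw_decode(encoded) == "ALLARITALLARIT")     # True
```

The second occurrence of ALLARIT is covered by substrings learned during the first, which is why the code stream is shorter than the input.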
Conclusion
The evolution of information processing has resulted in great discoveries and inventions.
At present there is an information overload, due to technological advances in multimedia
sharing, online gaming, virtual reality and the interaction of people around the world,
regardless of time.
Data compression has made a big difference when it comes to storing or transferring
information. There is no absolute preference between lossless and lossy compression; both
do their job well.
Lossless compression is important when archiving critical information, but its compression
ratio is limited. Lossy compression has a greater compression ratio but sacrifices
fidelity.
The Huffman method is a very efficient compression for documents and program files, with a
typical reduction in file size of 20 to 30 percent. It produces the least amount of code
for the given symbol frequencies, uses little memory, and compression and decompression
are fast. Its disadvantage is that little about it can be changed, because data integrity
must be preserved.
The Lempel-Ziv-Welch compression is a lossless method that is ideal for text and graphical
information where a higher compression ratio is required. It works best with files that
have lots of repetition, such as text and monochrome images, and LZW compression is fast.
Its disadvantage is that a file's size can grow, even nearly double, if there is no
repetition.
The information revolution has changed the way we share information, but current methods
still leave something to be desired. Perhaps soon we will witness a more spectacular
discovery in data compression, one which overcomes all the issues we face with current
methods; only time will tell what the future holds.