Compression Techniques
What-
The process of reducing the size of data files without significantly affecting their quality or integrity.
Why-
1. Reduced Storage requirements
2. Faster data transmission
3. Cost efficiency
4. Reduced traffic in the network
5. Faster processing
Types-
● Lossy
● Lossless
Lossless Algorithms-
1. Run-Length Encoding (RLE)-
Working-
RLE works by examining the input data and identifying consecutive occurrences of the
same symbol.
It then replaces these sequences with a single symbol followed by the count of how
many times it appears.
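A minimal sketch of this idea in Python (the function names rle_encode and rle_decode are illustrative, not from any particular library):

def rle_encode(data):
    """Collapse runs of the same symbol into (symbol, count) pairs."""
    encoded = []
    i = 0
    while i < len(data):
        count = 1
        while i + count < len(data) and data[i + count] == data[i]:
            count += 1
        encoded.append((data[i], count))
        i += count
    return encoded

def rle_decode(pairs):
    """Expand (symbol, count) pairs back into the original string."""
    return "".join(symbol * count for symbol, count in pairs)

print(rle_encode("WWWWBBBW"))   # [('W', 4), ('B', 3), ('W', 1)]

Note how a string with long runs compresses well, while a string with no repetition ("ABCD") would produce one pair per symbol and grow rather than shrink.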
Suitability-
It is best suited to simple images and animations that contain many runs of identical pixels, and is particularly useful for black-and-white images. However, it is less effective for data with little repetition, since the encoding can end up longer than the original data.
Advantages-
● Lossless compression
● Easy to implement, minimal computational resources
● Effective for large redundant data
Drawbacks-
● Ineffective for complex data
2. Huffman Encoding-
Working-
Assigns variable-length codes to input characters, with more frequent characters
having shorter codes and less frequent characters having longer codes.
These codes are called prefix codes (bit sequences) and are assigned in such a way that the code assigned to one character is never a prefix of the code assigned to any other character. This is how Huffman coding ensures there is no ambiguity when decoding the generated bitstream.
Step 1- Counting Frequencies: The frequency of each character in the input is determined. For example:

Character        Frequency
a                5
b                9
c                12
d                13
e                16
f                45
Step 2- Building the Huffman Tree: A Huffman tree is constructed using a priority queue or a min-heap data structure. At each step, the two nodes with the lowest frequencies are removed and merged into a new internal node whose frequency is their sum; this repeats until a single root node remains.

Merge a (5) and b (9) into an internal node of frequency 14:

Character        Frequency
c                12
d                13
Internal Node    14
e                16
f                45

Merge c (12) and d (13) into an internal node of frequency 25:

Character        Frequency
Internal Node    14
e                16
Internal Node    25
f                45

Merge the internal node (14) and e (16) into an internal node of frequency 30:

Character        Frequency
Internal Node    25
Internal Node    30
f                45

Merge the internal nodes (25) and (30) into an internal node of frequency 55:

Character        Frequency
f                45
Internal Node    55

Finally, merge f (45) and the internal node (55) into the root of frequency 100:

Character        Frequency
Internal Node    100

Step 3- Assigning Codes: Traversing the tree from the root, appending 0 for each left edge and 1 for each right edge, gives every character its prefix code:

Character        Code
f                0
c                100
d                101
a                1100
b                1101
e                111
Time Complexity- O(n log n), where n is the number of unique characters.
Space Complexity- O(n)
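The construction can be sketched in Python with the standard heapq module (huffman_codes is an illustrative name, not a library function):

import heapq

def huffman_codes(freqs):
    # Heap entries are (frequency, tie_breaker, tree); a tree is either a
    # single character (a leaf) or a (left, right) pair (an internal node).
    heap = [(f, i, ch) for i, (ch, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)    # two least-frequent nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, str):            # leaf: record the accumulated bits
            codes[node] = prefix or "0"
        else:
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
    walk(heap[0][2], "")
    return codes

print(huffman_codes({'a': 5, 'b': 9, 'c': 12, 'd': 13, 'e': 16, 'f': 45}))
# {'f': '0', 'c': '100', 'd': '101', 'a': '1100', 'b': '1101', 'e': '111'}

With this particular tie-breaking the frequencies above reproduce the codes in the table; other valid Huffman trees may assign different (but equally optimal) bit patterns.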
Advantages-
● Lossless compression
● No ambiguity in decoding
● Efficient for data with varying frequencies of characters
Drawbacks-
● Encoding overhead
3. Lempel–Ziv–Welch (LZW)-
Working-
As the input data is processed, a dictionary maintains a correspondence between the longest words encountered so far and a list of code values. Words are replaced by their corresponding codes, and so the input file is compressed. The efficiency of the algorithm therefore increases as the number of long, repetitive words in the input data increases.
Space Complexity- O(n)
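A minimal Python sketch of the encoder, assuming byte-oriented string input (lzw_encode is an illustrative name):

def lzw_encode(data):
    # Start with a dictionary containing every single character (codes 0-255).
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    current = ""
    output = []
    for ch in data:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                # keep extending the match
        else:
            output.append(dictionary[current]) # emit code for longest match
            dictionary[candidate] = next_code  # learn the new word
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

print(lzw_encode("ABABABA"))   # [65, 66, 256, 258]

Notice that the dictionary is built on the fly from the input itself, which is why no prior information about the data stream is needed and a single pass suffices.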
Advantages-
● Lossless compression
● LZW requires no prior information about the input data stream.
● LZW can compress the input stream in one single pass.
● LZW can achieve high compression ratios
Drawbacks-
● Slower compression
Lossy Algorithms-
1. Quantization-
Working-
1. Defining Levels:
The wide range of possible input values is divided into a smaller, fixed set of discrete levels.
2. Rounding Values:
Each original value is rounded or mapped to the nearest value in the reduced set of
levels. This rounding process reduces the precision of the data, potentially leading to a
loss of detail or accuracy.
3. Loss of Information:
Since the original data is approximated or simplified during quantization, there is
typically a loss of some information or fine details. This loss can affect the quality of
the reconstructed data, especially in the case of highly detailed or complex signals.
For example, consider the quantization of a grayscale image. Suppose pixel intensities range from 0 to 255, representing shades of gray. To reduce the precision and represent this range with only 8 levels, we quantize as follows:
Divide the range (0 to 255) into 8 intervals: [0-31], [32-63], [64-95], [96-127], [128-159], [160-191], [192-223], [224-255].
Round each pixel intensity value to the center of the corresponding interval. For
example, a pixel value of 100 would be quantized to 111, as it falls within the interval
[96-127].
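This mapping can be sketched in Python (quantize is an illustrative name; the defaults match the 8-level grayscale example above):

def quantize(value, levels=8, max_value=255):
    """Map a value onto the integer midpoint of its interval when the
    0..max_value range is split into `levels` equal-width intervals."""
    width = (max_value + 1) // levels          # 256 / 8 = 32 values per interval
    index = min(value // width, levels - 1)    # which interval the value falls in
    return index * width + (width - 1) // 2    # integer midpoint of that interval

print(quantize(100))   # 111, since 100 falls in [96-127] with midpoint 111

The loss of information is visible here: every value in [96-127] maps to 111, so the 32 distinct original shades can no longer be told apart after quantization.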