
Compression Techniques

What-
The process of reducing the size of data files without significantly affecting their quality or integrity.

Why-
1. Reduced Storage requirements
2. Faster data transmission
3. Cost efficiency
4. Reduced traffic in the network
5. Faster processing

Types-

Lossy vs Lossless

Data integrity:
Lossy- Some data is permanently removed from the original data to achieve higher compression ratios.
Lossless- The original data can be completely reconstructed from the compressed data without any loss of information.

Suitability:
Lossy- For multimedia data (images, audio, video), where a certain degree of imperceptible loss is acceptable in exchange for significant file size reduction.
Lossless- For text files, program files and databases, where preserving every bit of the original information is crucial.

Advantages:
Lossy-
● Relatively quick
● Reduces file size dramatically
● User can select the compression level
Lossless-
● Maintains the original data
● Offers lower compression ratios

Algorithms:
Lossy-
● Discrete Cosine Transform
● Wavelet Transform
● Quantization
Lossless-
● Run Length Encoding
● Huffman Encoding
● Lempel–Ziv–Welch

Lossless Algorithms-

1. Run Length Encoding-

Working-
RLE works by examining the input data and identifying consecutive occurrences of the
same symbol.
It then replaces these sequences with a single symbol followed by the count of how
many times it appears.
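
A minimal Python sketch of this idea (the function names are illustrative; practical RLE formats pack the symbol/count pairs into bytes):

def rle_encode(data):
    """Collapse runs of the same symbol into (symbol, count) pairs."""
    encoded = []
    i = 0
    while i < len(data):
        count = 1
        while i + count < len(data) and data[i + count] == data[i]:
            count += 1                      # extend the current run
        encoded.append((data[i], count))
        i += count
    return encoded

def rle_decode(pairs):
    """Reverse the encoding: expand each (symbol, count) pair back into a run."""
    return "".join(symbol * count for symbol, count in pairs)

print(rle_encode("WWWWBBBW"))               # [('W', 4), ('B', 3), ('W', 1)]
print(rle_decode(rle_encode("WWWWBBBW")))   # 'WWWWBBBW'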
Suitability-
It is best suited for simple images and animations with many redundant pixels.
It's useful for black and white images in particular. However, it may not be as effective
for data with minimal repetition, as the encoding could potentially result in a longer
string than the original data.

Advantages-
● Lossless compression
● Easy to implement, minimal computational resources
● Effective for large redundant data

Drawbacks-
● Ineffective for complex data

2. Huffman Encoding-

Working-
Assigns variable-length codes to input characters, with more frequent characters
having shorter codes and less frequent characters having longer codes.
These are called prefix codes (bit sequences) and are assigned in such a way that the
code assigned to one character is not the prefix of the code assigned to any other
character. This is how Huffman coding ensures there is no ambiguity while decoding
the generated bitstream.

Step 1- Frequency Analysis:

character   Frequency
a           5
b           9
c           12
d           13
e           16
f           45

Step 2- Building the Huffman Tree: A Huffman tree is constructed using a priority
queue or a min-heap data structure. At each step the two nodes with the lowest
frequencies are removed and merged under a new internal node whose frequency is their
sum, and that node is pushed back into the queue. The successive states of the queue are:

After merging a (5) and b (9) into Internal Node (14):
character       Frequency
c               12
d               13
Internal Node   14
e               16
f               45

After merging c (12) and d (13) into Internal Node (25):
character       Frequency
Internal Node   14
e               16
Internal Node   25
f               45

After merging Internal Node (14) and e (16) into Internal Node (30):
character       Frequency
Internal Node   25
Internal Node   30
f               45

After merging Internal Node (25) and Internal Node (30) into Internal Node (55):
character       Frequency
f               45
Internal Node   55

After merging f (45) and Internal Node (55) into the root, Internal Node (100):
character       Frequency
Internal Node   100

Step 3- Assigning Codes: Traversing the tree from the root, a 0 is appended for every
left edge and a 1 for every right edge until a leaf (character) is reached.

character   code-word
f           0
c           100
d           101
a           1100
b           1101
e           111
Time Complexity- O(n log n), where n is the number of unique characters.

Space Complexity- O(n)
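
A short Python sketch of Steps 1-3, using the standard-library heapq module as the min-heap (the function name and the 0/1 edge convention are illustrative; with this particular tie-breaking the output matches the code words shown in Step 3):

import heapq

def huffman_codes(freqs):
    """Build Huffman code words for a {symbol: frequency} mapping."""
    # Heap entries: (frequency, tie-breaker, subtree); a subtree is either
    # a symbol (leaf) or a (left, right) pair (internal node).
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)        # two least frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):              # internal node: recurse
            walk(node[0], prefix + "0")          # left edge -> 0
            walk(node[1], prefix + "1")          # right edge -> 1
        else:
            codes[node] = prefix or "0"          # leaf: record its code word
    walk(heap[0][2], "")
    return codes

print(huffman_codes({"a": 5, "b": 9, "c": 12, "d": 13, "e": 16, "f": 45}))
# {'f': '0', 'c': '100', 'd': '101', 'a': '1100', 'b': '1101', 'e': '111'}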

Advantages-
● Lossless compression
● No ambiguity in decoding
● Efficient for data with varying frequencies of characters

Drawbacks-
● Encoding overhead

3. Lempel–Ziv–Welch-
Working-
As the input data is processed, a dictionary keeps a correspondence between the longest
words encountered so far and their code values. The words are replaced by their
corresponding codes, and so the input file is compressed. The efficiency of the algorithm
therefore increases as the number of long, repetitive words in the input data increases.

Time Complexity- O(n), where n is the length of the input data.

Space Complexity- O(n)
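
A compact Python sketch of the compression side, assuming byte-oriented text input (the dictionary starts with the 256 single-character entries; the decoder, which rebuilds the same dictionary from the codes, is omitted):

def lzw_compress(text):
    """Grow a dictionary of previously seen words and emit their codes."""
    dictionary = {chr(i): i for i in range(256)}   # single-character entries
    next_code = 256
    current = ""
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                    # keep extending the match
        else:
            output.append(dictionary[current])     # emit code for longest match
            dictionary[candidate] = next_code      # add the new word
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

print(lzw_compress("ABABABA"))   # [65, 66, 256, 258] -- 7 symbols become 4 codes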

Advantages-
● Lossless compression
● LZW requires no prior information about the input data stream.
● LZW can compress the input stream in one single pass.
● LZW can achieve high compression ratios
Drawbacks-
● Slower compression

Lossy Algorithms-

1. Discrete Cosine Transform-


Breaking the Signal into Blocks:
The input signal (e.g., an image) is divided into small, square blocks of pixels.
Each block is treated as a 2D matrix of pixel values.

Transforming the Blocks:


The DCT expresses each block as a weighted sum of cosine functions of varying
frequencies that oscillate across the block. These functions capture the changes in
intensity across the block and represent the image data in terms of its frequency
components.

Separating High and Low Frequencies:


The resulting DCT coefficients represent the contributions of different
frequencies to the original signal. Lower-frequency components tend to capture
the overall structure of the image, while the higher-frequency components
capture the details and edges.
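
A small NumPy sketch of the 2D DCT-II on a single 8x8 block (the function name and the sample block are illustrative):

import numpy as np

def dct2(block):
    """2D DCT-II of an N x N block: a weighted sum of cosines of varying frequency."""
    n = block.shape[0]
    k = np.arange(n)
    # basis[u, x] = alpha(u) * cos((2x + 1) * u * pi / (2N))
    basis = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    alpha = np.full(n, np.sqrt(2.0 / n))
    alpha[0] = np.sqrt(1.0 / n)
    basis *= alpha[:, None]
    # Apply the 1D transform along the rows, then along the columns.
    return basis @ block @ basis.T

# An 8x8 block whose intensity rises smoothly from left to right.
block = np.tile(np.arange(0, 256, 32, dtype=float), (8, 1))
coeffs = dct2(block)
print(np.round(coeffs, 1))   # energy concentrates in the low-frequency (top-left) corner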
2. Quantization-

1. Breaking Values into Intervals:


Quantization involves dividing the range of continuous values into distinct intervals or
levels.

2. Rounding Values:
Each original value is rounded or mapped to the nearest value in the reduced set of
levels. This rounding process reduces the precision of the data, potentially leading to a
loss of detail or accuracy.

3. Loss of Information:
Since the original data is approximated or simplified during quantization, there is
typically a loss of some information or fine details. This loss can affect the quality of
the reconstructed data, especially in the case of highly detailed or complex signals.
For example, consider the quantization of a grayscale image. Suppose pixel intensities
range from 0 to 255, representing shades of gray. To reduce the precision and represent
this range with only 8 levels, we perform quantization as follows:
Divide the range (0 to 255) into 8 intervals: [0-31], [32-63], [64-95], [96-127],
[128-159], [160-191], [192-223], [224-255].
Round each pixel intensity value to the center of its interval. For example, a pixel
value of 100 would be quantized to 111, as it falls within the interval [96-127].
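
A minimal Python sketch of this 8-level example; the (step - 1) // 2 offset approximates the interval center and reproduces the 100 -> 111 mapping above:

def quantize(pixel, step=32):
    """Map a 0-255 intensity to the approximate center of its step-wide interval."""
    level = pixel // step                     # which of the 256 // step intervals
    return level * step + (step - 1) // 2     # e.g. 100 -> 96 + 15 = 111

print([quantize(v) for v in (0, 37, 100, 200, 255)])   # [15, 47, 111, 207, 239]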
