TE EXTC Data Compression & Cryptography Sem - V
Module - 1
Introduction to Data Compression
==================================================================
1.1 Data Compression, Modelling and Coding
What is Data Compression?
Data compression is the process of encoding information using fewer bits than the original representation.
The goal is to reduce redundancy in data representation to save storage or transmission resources.
Types of Data Compression
1. Lossless Compression:
o No loss of information.
o Original data can be perfectly reconstructed.
o Used in text, executable files, medical images, etc.
o Examples: Huffman Coding, Arithmetic Coding, LZ family (LZ-77, LZW).
2. Lossy Compression:
o Some loss of data is acceptable.
o Better compression ratios.
o Used in audio, video, and image compression.
o Examples: JPEG, MP3, MPEG
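The difference between the two types can be seen in a small sketch. The lossless half uses Python's standard-library zlib module (DEFLATE); the lossy half is a toy uniform quantizer chosen purely for illustration, not a real codec. The sample data and the step size are assumptions.

```python
import zlib

# Lossless: the decompressed bytes are identical to the input.
text = b"ABABABABAB" * 100            # highly redundant sample input (assumed)
packed = zlib.compress(text)
restored = zlib.decompress(packed)
lossless_ok = restored == text
ratio = len(text) / len(packed)       # compression ratio, original / compressed

# Lossy: a toy uniform quantizer (illustrative only).
samples = [12, 13, 15, 200, 203, 198]  # assumed sample values
step = 8                               # assumed quantizer step size
quantized = [round(s / step) for s in samples]
approx = [q * step for q in quantized]
max_error = max(abs(a - s) for a, s in zip(approx, samples))

print(lossless_ok, round(ratio, 1), approx, max_error)
```

The lossless path reproduces every byte, while the quantizer only guarantees that each value is within half a step of the original.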
Modelling and Coding
Modelling: Predicts the probability of the next symbol.
Coding: Assigns binary codes based on the predicted probability.
For example, more frequent symbols are assigned shorter codes in Huffman coding.
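As a minimal sketch of this split, the code below uses plain symbol counts as the model and a Huffman construction as the coder. The function name huffman_code and the sample string are hypothetical, chosen only for illustration.

```python
import heapq
from collections import Counter

def huffman_code(text):
    """Return {symbol: bitstring} built from the symbol counts in text."""
    freq = Counter(text)                       # the model: symbol frequencies
    if len(freq) == 1:                         # degenerate single-symbol input
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far})
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        w1, _, c1 = heapq.heappop(heap)        # two least frequent subtrees
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

message = "AAAAABBBCCD"                        # assumed sample input
codes = huffman_code(message)
total_bits = sum(len(codes[s]) for s in message)
print(codes, total_bits)
```

Here A, the most frequent symbol, receives a 1-bit code and the rarest symbols receive 3-bit codes, so the 11-symbol message needs 20 bits instead of the 22 bits a fixed 2-bit code would use.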
Arithmetic Coding
Unlike Huffman coding, which assigns each symbol its own variable-length code with a whole number of bits,
arithmetic coding represents an entire message as a single fractional number in the interval [0, 1).
The range is divided based on symbol probabilities, and as symbols are processed, the range narrows.
Steps:
1. Calculate cumulative probability ranges.
2. Narrow down the range for each symbol.
3. Final range represents the message.
4. Encode: Select any number within this range.
5. Decode: Use the number to backtrack the original symbols.
Advantages:
Efficient for sources with skewed probabilities.
Code lengths need not be rounded up to a whole number of bits per symbol.
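The steps listed above can be sketched as follows. This is a teaching sketch under stated assumptions: it uses exact rational arithmetic (fractions.Fraction) instead of the fixed-precision integer arithmetic practical coders use, the symbol probabilities are made up, and the decoder is told the message length rather than relying on an end-of-message symbol. The function names are hypothetical.

```python
from fractions import Fraction

def build_ranges(probs):
    """Step 1: map each symbol to its cumulative interval [low, high)."""
    ranges, low = {}, Fraction(0)
    for sym, p in probs.items():
        ranges[sym] = (low, low + p)
        low += p
    return ranges

def encode(message, probs):
    ranges = build_ranges(probs)
    low, high = Fraction(0), Fraction(1)
    for sym in message:                      # step 2: narrow the range
        width = high - low
        s_low, s_high = ranges[sym]
        low, high = low + width * s_low, low + width * s_high
    return (low + high) / 2                  # step 4: any number in the final range

def decode(value, length, probs):
    ranges = build_ranges(probs)
    out = []
    for _ in range(length):                  # step 5: retrace the symbols
        for sym, (s_low, s_high) in ranges.items():
            if s_low <= value < s_high:
                out.append(sym)
                value = (value - s_low) / (s_high - s_low)
                break
    return "".join(out)

# Assumed probabilities for a three-symbol source.
probs = {"A": Fraction(3, 5), "B": Fraction(3, 10), "C": Fraction(1, 10)}
tag = encode("AABAC", probs)
print(tag, decode(tag, 5, probs))
```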
Dictionary-Based Compression
LZ-77 (Sliding Window Compression)
Uses a sliding window over the input data to find matches to past data.
Replaces repeated occurrences with a reference (distance, length, next symbol).
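A minimal sketch of this triple-based scheme is given below. It uses a brute-force search over the window and an assumed window size of 16 symbols; practical implementations use much larger windows and faster match search. The function names are hypothetical.

```python
def lz77_encode(data, window=16):
    """Return a list of (distance, length, next_symbol) triples."""
    i, out = 0, []
    while i < len(data):
        best_len, best_dist = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Always leave one symbol over to serve as next_symbol.
            while (i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        out.append((best_dist, best_len, data[i + best_len]))
        i += best_len + 1
    return out

def lz77_decode(triples):
    out = []
    for dist, length, sym in triples:
        for _ in range(length):
            out.append(out[-dist])   # symbol-by-symbol copy, so a match may overlap itself
        out.append(sym)
    return "".join(out)

triples = lz77_encode("ABABABA")
print(triples)                       # [(0, 0, 'A'), (0, 0, 'B'), (2, 4, 'A')]
print(lz77_decode(triples))          # ABABABA
```

The third triple copies four symbols from two positions back, which shows that a match is allowed to run into the data it is itself producing.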
LZ-78
Builds a dictionary of phrases seen in the input.
Each new phrase is added with an index and a character.
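The dictionary growth can be sketched as below, assuming 1-based phrase indices with index 0 standing for the empty phrase, and an empty character to flush a trailing phrase that is already in the dictionary. Both conventions and the function names are choices made for this sketch.

```python
def lz78_encode(data):
    """Return a list of (index, char) pairs; index 0 means the empty phrase."""
    dictionary = {}                  # phrase -> index (1-based)
    out, phrase = [], ""
    for ch in data:
        if phrase + ch in dictionary:
            phrase += ch             # keep extending the current match
        else:
            out.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:                       # input ended inside a known phrase
        out.append((dictionary[phrase], ""))
    return out

def lz78_decode(pairs):
    phrases, out = [""], []          # phrases[0] is the empty phrase
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)

pairs = lz78_encode("ABABABA")
print(pairs)                         # [(0, 'A'), (0, 'B'), (1, 'B'), (3, 'A')]
print(lz78_decode(pairs))            # ABABABA
```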
LZW (Lempel-Ziv-Welch)
Popular variant of LZ-78.
Starts with an initial dictionary of single characters.
Adds new sequences dynamically.
Used in formats like GIF, TIFF.
By Gauri Joshi VPM’s MPCOE, Velneshwar Page 1
Example (LZW):
Input: ABABABA
Output (codes): 65 66 256 258…
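The example can be reproduced with a short sketch. It assumes an initial dictionary of the 256 single-byte characters (so new entries start at code 256) and an unbounded dictionary with no code-width handling; real formats such as GIF add those details. The function names are hypothetical.

```python
def lzw_encode(text):
    dictionary = {chr(i): i for i in range(256)}   # initial single characters
    next_code = 256
    w, out = "", []
    for ch in text:
        if w + ch in dictionary:
            w += ch
        else:
            out.append(dictionary[w])
            dictionary[w + ch] = next_code          # add the new sequence
            next_code += 1
            w = ch
    if w:
        out.append(dictionary[w])
    return out

def lzw_decode(codes):
    if not codes:
        return ""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:                     # code not yet defined by the decoder
            entry = w + w[0]
        else:
            raise ValueError("invalid LZW code")
        out.append(entry)
        dictionary[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(out)

codes = lzw_encode("ABABABA")
print(codes)                 # [65, 66, 256, 258]
print(lzw_decode(codes))     # ABABABA
```

Code 258 (ABA) is emitted before the decoder has built that entry, which is why the decoder needs the special case for a code equal to the next free index.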
1.2 Image Compression Techniques
DCT (Discrete Cosine Transform)
Converts spatial data into frequency components.
Used in JPEG.
Energy compaction property: Most of the image energy is concentrated in a few low-frequency components.
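Energy compaction can be checked numerically with a small sketch. It applies a one-dimensional orthonormal DCT-II, written out directly from its definition, to a single row of pixel values. The row values are assumed sample data, and the function name is hypothetical.

```python
import math

def dct_1d(x):
    """Orthonormal DCT-II of a list of numbers."""
    n = len(x)
    out = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * sum(
            x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
            for i in range(n)))
    return out

row = [52, 55, 61, 66, 70, 61, 64, 73]     # one row of pixel values (assumed)
coeffs = dct_1d(row)

total_energy = sum(c * c for c in coeffs)
low_share = sum(c * c for c in coeffs[:2]) / total_energy
print([round(c, 1) for c in coeffs])
print(round(low_share, 3))                 # share of energy in the two lowest frequencies
```

Because the transform is orthonormal, the total energy of the coefficients equals that of the pixels, and for this row almost all of it sits in the first two coefficients.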
JPEG (Joint Photographic Experts Group)
Steps:
1. Divide image into 8×8 blocks.
2. Apply DCT to each block.
3. Quantize the coefficients (lossy step).
4. Scan the quantized coefficients in zigzag order, then encode with run-length and entropy coding (Huffman).
Balances compression and quality.
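Steps 2 and 3 can be sketched for one 8×8 block as below. The sketch makes simplifying assumptions that are not part of the standard: a synthetic smooth block, and a single uniform step size in place of JPEG's 8×8 quantization table. Entropy coding is left out, and the helper names are hypothetical.

```python
import math

N = 8
# Orthonormal DCT-II basis matrix, C[k][i]
C = [[(math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N))
      * math.cos(math.pi * (2 * i + 1) * k / (2 * N))
      for i in range(N)] for k in range(N)]

def matmul(a, b):
    return [[sum(a[r][k] * b[k][c] for k in range(N)) for c in range(N)]
            for r in range(N)]

def transpose(m):
    return [list(row) for row in zip(*m)]

def dct2(block):                 # 2-D DCT: C * block * C^T
    return matmul(matmul(C, block), transpose(C))

def idct2(coeffs):               # inverse: C^T * coeffs * C
    return matmul(matmul(transpose(C), coeffs), C)

# Smooth synthetic block (assumed data), level-shifted by 128.
block = [[100 + 2 * r + 3 * c - 128 for c in range(N)] for r in range(N)]

STEP = 16                        # assumed uniform step; JPEG uses a per-frequency table
quantized = [[round(v / STEP) for v in row] for row in dct2(block)]
zeros = sum(q == 0 for row in quantized for q in row)

restored = idct2([[q * STEP for q in row] for row in quantized])
max_err = max(abs(restored[r][c] - block[r][c])
              for r in range(N) for c in range(N))
print(zeros, round(max_err, 2))
```

For a smooth block like this one, most quantized coefficients are zero, which is what the run-length stage exploits; the reconstruction error is the price paid in the quantization step.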
JPEG-LS (Lossless)
Predictive technique followed by context modeling.
No quantization step in lossless mode.
Suitable for medical and archival images.
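The predictive stage can be illustrated with the median edge detector (MED) predictor used by JPEG-LS. The sketch below covers only prediction and residual formation; context modelling and the entropy coder are omitted, and treating missing neighbours at the image border as 0 is a simplifying assumption that differs from the standard's edge handling. The function names and sample image are hypothetical.

```python
def med_predict(a, b, c):
    """Median edge detector: a = left, b = above, c = above-left."""
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    return a + b - c

def neighbours(image, r, col):
    a = image[r][col - 1] if col > 0 else 0              # border assumption: 0
    b = image[r - 1][col] if r > 0 else 0
    c = image[r - 1][col - 1] if r > 0 and col > 0 else 0
    return a, b, c

def residuals(image):
    return [[image[r][col] - med_predict(*neighbours(image, r, col))
             for col in range(len(image[0]))] for r in range(len(image))]

def reconstruct(res):
    image = [[0] * len(res[0]) for _ in res]
    for r in range(len(res)):
        for col in range(len(res[0])):
            image[r][col] = res[r][col] + med_predict(*neighbours(image, r, col))
    return image

image = [[100, 101, 103],
         [100, 102, 104],
         [ 99, 101, 104]]                                 # assumed sample pixels
res = residuals(image)
print(res)
print(reconstruct(res) == image)
```

Since nothing is quantized, the decoder rebuilds the image exactly from the residuals.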
Differential Lossless Compression – DPCM
DPCM (Differential Pulse Code Modulation)
Predicts current pixel based on neighbors.
Stores the difference (residual) between actual and predicted value.
Works well when pixel values are correlated.
Equation:
e(n) = x(n) - x̂(n)
Where:
x(n): actual sample
x̂(n): predicted sample
e(n): prediction error
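The equation can be tried out with the simplest possible predictor, x̂(n) = x(n-1), with the first prediction taken as 0. The predictor choice, the function names and the sample values are assumptions made for this sketch.

```python
def dpcm_encode(samples):
    """Residuals e(n) = x(n) - x̂(n), using x̂(n) = x(n-1) and x̂(0) = 0."""
    prev, out = 0, []
    for x in samples:
        out.append(x - prev)
        prev = x
    return out

def dpcm_decode(residuals):
    """Rebuild x(n) = x̂(n) + e(n) with the same predictor."""
    prev, out = 0, []
    for e in residuals:
        prev = prev + e
        out.append(prev)
    return out

pixels = [100, 102, 103, 103, 105, 110]     # assumed correlated samples
errors = dpcm_encode(pixels)
print(errors)                               # [100, 2, 1, 0, 2, 5]
print(dpcm_decode(errors) == pixels)        # True
```

After the first sample, the residuals are small numbers clustered near zero, which an entropy coder can represent with fewer bits than the raw pixel values.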
JPEG-2000
Wavelet-based image compression standard.
Supports both lossless and lossy compression.
Features:
o Progressive transmission
o Region-of-interest coding
o Better compression efficiency than JPEG
Uses DWT (Discrete Wavelet Transform) instead of DCT.
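A one-level, one-dimensional sketch of a wavelet transform is given below, using the integer 5/3 lifting scheme that JPEG-2000 employs for its lossless path. It is restricted to even-length integer signals and uses a simple mirrored-edge rule; a real codec applies the transform along rows and columns over several levels. The function names and the sample signal are hypothetical.

```python
def dwt53_forward(x):
    """One level of the reversible 5/3 lifting transform (even-length input).
    Returns (approximation s, detail d)."""
    half = len(x) // 2
    even, odd = x[0::2], x[1::2]
    d = []
    for n in range(half):        # predict: odd sample minus mean of even neighbours
        right = even[n + 1] if n + 1 < half else even[n]    # mirror at the edge
        d.append(odd[n] - (even[n] + right) // 2)
    s = []
    for n in range(half):        # update: smooth even samples with the details
        left = d[n - 1] if n > 0 else d[0]                  # mirror at the edge
        s.append(even[n] + (left + d[n] + 2) // 4)
    return s, d

def dwt53_inverse(s, d):
    half = len(s)
    even = []
    for n in range(half):        # undo the update step
        left = d[n - 1] if n > 0 else d[0]
        even.append(s[n] - (left + d[n] + 2) // 4)
    x = []
    for n in range(half):        # undo the predict step
        right = even[n + 1] if n + 1 < half else even[n]
        x.extend([even[n], d[n] + (even[n] + right) // 2])
    return x

signal = [10, 12, 14, 16, 18, 20, 22, 24]   # assumed smooth sample signal
s, d = dwt53_forward(signal)
print(s, d)
print(dwt53_inverse(s, d) == signal)
```

The detail values are zero or near zero for a smooth signal, and because every lifting step uses integer arithmetic that the inverse undoes exactly, the transform is perfectly reversible.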
Summary
Technique            Type        Key Feature
Huffman              Lossless    Variable-length prefix codes
Arithmetic           Lossless    Encodes entire message into a number
LZ-77, LZ-78, LZW    Lossless    Dictionary-based encoding
DCT                  Transform   Converts to frequency domain; lossy only once coefficients are quantized
JPEG                 Lossy       Block DCT + quantization
JPEG-LS              Lossless    Predictive + entropy coding
DPCM                 Lossless    Predicts and encodes residual
JPEG-2000            Both        Wavelet transform-based