HTCS501 Unit 4
Ansh Kasaudhan
2024-2025
Cyber Security
Syllabus
Unit 4: Data Compression
1. Introduction to Data Compression
Data compression is a technique used to reduce the size of digital data, making it easier to store and transmit
efficiently. It involves transforming data into a compact format while maintaining its essential information.
Compression techniques are widely used in multimedia (images, audio, and video), networking (file
transfers, web pages), and storage systems (databases, cloud storage).
Key Aspects:
a) Storage Optimization
b) Bandwidth Efficiency
c) Cost Reduction
d) Improved Performance
• Compression improves backup efficiency, making it easier to store and retrieve data.
• Some encryption techniques use compressed data to improve security.
Data compression is based on the principle of representing information in a more efficient way by
eliminating redundancy. The core fundamentals include redundancy removal, entropy, lossless and lossy
compression, and compression ratio.
1. Redundancy in Data
Redundancy refers to repetitive or unnecessary information present in data. Removing this redundancy helps
in compressing data without losing essential information.
Types of Redundancy
a) Spatial Redundancy
• Neighbouring pixels in an image often have similar or identical values.
• Exploited by image compression methods such as run-length and transform coding.
b) Temporal Redundancy
• Consecutive frames of a video (or successive audio samples) are highly similar.
• Exploited by video codecs, which store only the differences between frames.
c) Statistical Redundancy
• Occurs when some symbols in a dataset appear more frequently than others.
• Used in Huffman Coding and Arithmetic Coding to assign shorter codes to more frequent
symbols.
• Example: In English text, letters like "e" and "t" appear more often, so they get shorter codes in
Huffman compression.
d) Coding Redundancy
• Occurs when symbols are encoded with more bits than necessary, e.g., fixed-length codes for symbols with very unequal frequencies.
• Removed by variable-length codes such as Huffman coding.
e) Psychovisual (Perceptual) Redundancy
• Some details in images, audio, and video are imperceptible to the human eye or ear.
• Lossy compression removes these unnoticeable details.
• Example: MP3 removes frequencies that humans cannot hear, and JPEG removes high-frequency details not easily noticed.
2. Entropy
Entropy measures the average information content of a data source.
• It represents the minimum number of bits needed to encode a message without loss.
• A dataset with high entropy has less redundancy and is harder to compress.
• A dataset with low entropy has more predictable patterns, making it easier to compress.
H = − ∑ pᵢ log₂ pᵢ
where pᵢ is the probability of the i-th symbol and the sum runs over all symbols of the source.
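The formula can be checked with a short Python sketch (illustrative only):

```python
import math
from collections import Counter

def entropy(data: str) -> float:
    """Shannon entropy H = -sum(p_i * log2(p_i)), in bits per symbol."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# "AAAB": p(A) = 0.75, p(B) = 0.25
# H = -(0.75*log2(0.75) + 0.25*log2(0.25)) ≈ 0.811 bits/symbol
print(round(entropy("AAAB"), 3))  # → 0.811
```

A string of identical symbols has entropy 0, matching the intuition that fully predictable data compresses best.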
4. Compression Ratio
Compression ratio measures how effectively data is compressed.
a) Formula
Compression Ratio = Original Size / Compressed Size
b) Example Calculation
A 10 MB file compressed to 2 MB gives: Compression Ratio = 10 / 2 = 5:1
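The same calculation in a tiny Python sketch (the 10 MB / 2 MB sizes are taken from the example above):

```python
def compression_ratio(original_mb: float, compressed_mb: float) -> float:
    """Ratio of original size to compressed size."""
    return original_mb / compressed_mb

ratio = compression_ratio(10, 2)       # 5.0, i.e. a 5:1 ratio
savings = (1 - 2 / 10) * 100           # 80% reduction in size
print(f"{ratio:.0f}:1 ratio, {savings:.0f}% saved")
```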
Coding Techniques
a) Variable-Length Coding
• Assigns shorter codes to more frequent symbols and longer codes to less frequent ones.
• Reduces average code length, leading to better compression.
• Used in Huffman Coding and Shannon-Fano Coding.
Example: a frequent letter such as "e" might receive a 2-bit code while a rare letter such as "z" receives a 6-bit code. Here, more frequent symbols get shorter codes, reducing overall file size.
b) Entropy Coding
• Encodes symbols according to their probabilities, approaching the entropy limit.
• Arithmetic Coding represents an entire message as a single number in the interval [0, 1).
Example: with P(A) = 0.6 and P(B) = 0.4, the symbols are assigned the sub-intervals
• A = [0, 0.6)
• B = [0.6, 1.0)
Then, encoding "AB" narrows the interval to [0.36, 0.6); any single number in that range represents the message.
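The interval-narrowing step can be sketched in Python (this shows only the narrowing, not the final bit-level output of a real arithmetic coder; the A/B intervals are the ones from the example):

```python
# Symbol intervals from the example: A = [0.0, 0.6), B = [0.6, 1.0)
INTERVALS = {"A": (0.0, 0.6), "B": (0.6, 1.0)}

def encode_interval(message: str):
    """Narrow [0, 1) once per symbol; any number in the result encodes the message."""
    low, high = 0.0, 1.0
    for sym in message:
        s_low, s_high = INTERVALS[sym]
        span = high - low
        low, high = low + span * s_low, low + span * s_high
    return low, high

print(encode_interval("AB"))  # → (0.36, 0.6)
```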
c) Dictionary-Based Coding
• Instead of encoding individual symbols, common sequences (patterns) are stored in a dictionary
and referenced by shorter codes.
• Efficient for large text-based data with repeated patterns.
• Used in Lempel-Ziv (LZ77, LZW), DEFLATE (ZIP, PNG compression).
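A minimal sketch of the LZW dictionary-building idea (simplified for illustration: output codes are kept in a Python list rather than packed into bits):

```python
def lzw_compress(text: str) -> list[int]:
    """LZW: grow a dictionary of seen sequences, emit dictionary indices."""
    dictionary = {chr(i): i for i in range(256)}  # single-character entries
    next_code = 256
    current = ""
    output = []
    for ch in text:
        candidate = current + ch
        if candidate in dictionary:
            current = candidate                   # extend the current match
        else:
            output.append(dictionary[current])    # emit code for the match
            dictionary[candidate] = next_code     # learn the new pattern
            next_code += 1
            current = ch
    if current:
        output.append(dictionary[current])
    return output

# Repeated patterns collapse into single dictionary codes:
print(lzw_compress("ABABABAB"))  # → [65, 66, 256, 258, 66]
```

Note how the 8-character input becomes 5 codes because "AB" and "ABA" are learned and reused.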
d) Run-Length Encoding (RLE)
• Compresses sequences of repeated symbols by storing the symbol and its count.
• Best suited for images with large uniform regions (e.g., black-and-white images, simple text).
Application: fax transmission, BMP and TIFF images, and other simple bitmap graphics.
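A minimal run-length encoder/decoder sketch:

```python
def rle_encode(data: str) -> list[tuple[str, int]]:
    """Store each run of repeated symbols as a (symbol, count) pair."""
    runs = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((ch, 1))              # start a new run
    return runs

def rle_decode(runs: list[tuple[str, int]]) -> str:
    return "".join(ch * count for ch, count in runs)

encoded = rle_encode("WWWWBBBW")
print(encoded)  # → [('W', 4), ('B', 3), ('W', 1)]
assert rle_decode(encoded) == "WWWWBBBW"
```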
e) Transform Coding
• Converts data into another domain (e.g., the Discrete Cosine Transform in JPEG) where most of the signal energy concentrates in a few coefficients.
• The remaining small coefficients can be coarsely quantized or discarded.
Properties of Codes
a) Uniquely Decodable Codes
• Every valid bit stream decodes to exactly one sequence of symbols.
b) Prefix Codes
• No code word is a prefix of another code word, preventing ambiguity during decoding.
• Huffman Coding is a prefix code.
Example:
Symbol Code
A 0
B 10
C 110
D 111
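The prefix property can be demonstrated by decoding a bit stream with the table above; because no code is a prefix of another, the first match is always the right one (a short Python sketch):

```python
# Prefix code from the table: A=0, B=10, C=110, D=111
CODES = {"A": "0", "B": "10", "C": "110", "D": "111"}
DECODE = {code: sym for sym, code in CODES.items()}

def decode(bits: str) -> str:
    """Read bits until the buffer matches a code word, then emit the symbol."""
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in DECODE:          # unambiguous: no code word is a
            symbols.append(DECODE[buffer])  # prefix of another code word
            buffer = ""
    return "".join(symbols)

print(decode("010110111"))  # → ABCD
```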
c) Code Efficiency
• Compression efficiency depends on how closely the code length matches the entropy of the source.
• Formula: Efficiency = Entropy / Average Code Length
o Higher efficiency (closer to 1) means better compression.
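A worked check of the efficiency formula, using the prefix code above with assumed symbol probabilities (the probabilities are illustrative, not from the source; they are chosen so that the code is optimal):

```python
import math

# Assumed probabilities; code lengths match the table A=0, B=10, C=110, D=111
probs   = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
lengths = {"A": 1, "B": 2, "C": 3, "D": 3}

entropy = -sum(p * math.log2(p) for p in probs.values())     # 1.75 bits
avg_len = sum(probs[s] * lengths[s] for s in probs)          # 1.75 bits
efficiency = entropy / avg_len

print(entropy, avg_len, efficiency)  # → 1.75 1.75 1.0
```

Efficiency of exactly 1.0 means every bit of the code carries a full bit of information, the best a lossless code can do.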
Applications of Compression
a) Text Compression
• Lossless methods (ZIP, GZIP) shrink documents and source code without losing data.
b) Image Compression
• JPEG (lossy) and PNG (lossless) reduce image file sizes.
c) Audio Compression
• MP3 and AAC discard frequencies the human ear cannot perceive.
d) Video Compression
• H.264 and H.265 exploit similarity between consecutive frames.
Communication Models
a) Shannon-Weaver Model
• One of the most technical communication models, focusing on how messages are transmitted over channels and affected by noise.
• Components:
Sender → Encoder → Channel (with noise) → Decoder → Receiver
• Introduces "noise", which refers to any distortion or interference that disrupts the communication
process.
• Example: A phone conversation with poor signal quality, where words get distorted due to
background noise.
b) Osgood-Schramm Circular Model
• Describes communication as a circular process where the sender and receiver continuously switch roles.
• Key Concept: Both sender and receiver encode, decode, and interpret messages dynamically.
• Example: A WhatsApp conversation where two friends are alternatively sending and receiving
messages, making communication more interactive.
c) Westley-Maclean Model
• Introduces the concept of "Gatekeeping", where a third party (like a journalist or editor) filters, modifies, or controls the message before it reaches the audience.
• This model is particularly relevant to mass media, news reporting, and social media algorithms.
• Example: A news editor selecting which political stories to publish and which to leave out, shaping
public perception.
d) Barnlund's Transactional Model
• Emphasizes that both sender and receiver participate actively, sending and receiving messages at the same time.
• Introduces verbal and non-verbal communication (gestures, facial expressions, tone, etc.).
• Example: A live debate where speakers interrupt, respond, and react in real time, shaping the
conversation dynamically.
e) Dance's Helical Model
• Describes communication as an evolving and continuous process, much like a spiral (helix).
• Messages build on past experiences and previous interactions, influencing how future
communication occurs.
• Example: A teacher gradually improving their communication with students over a semester based
on past feedback and interactions.
Example Calculation
A 10 MB file compressed to 2 MB gives a ratio of 10 / 2 = 5:1. This means the compressed file is one fifth of the original size: 80% of the original storage is saved while all useful information is retained.
Requirements of Data Compression
Data compression is essential for optimizing storage, transmission, and processing efficiency. To ensure effective compression, several key requirements must be met. These requirements vary depending on whether the compression method is lossless or lossy and on the specific application.
1. Efficiency in Compression
• The compression algorithm should significantly reduce file size while maintaining acceptable
quality.
• The compression ratio should be high, especially for applications where storage or bandwidth is
limited.
• Example: ZIP compression reduces text file size effectively while maintaining data integrity.
Data compression is the process of reducing the size of a file or data set to optimize storage and transmission. It is broadly classified into two main categories:
1. Lossless Compression
2. Lossy Compression
Each category has different techniques suited for specific applications, such as text, images, audio, and video.
1. Lossless Compression
Lossless compression preserves all the original data, ensuring that the decompressed data is identical to
the original. This is essential for applications where data integrity is crucial, such as text files, software, and
medical imaging.
Key Features
✔ No loss of information.
✔ Lower compression ratio (typically 2:1 to 5:1).
✔ Used in text, executable files, and high-precision data.
b) Huffman Coding
• Assigns shorter binary codes to frequently used symbols and longer codes to rare symbols.
• Used in ZIP files and PNG images.
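A compact Huffman-coding sketch (simplified for illustration: codes are built by merging dictionaries instead of an explicit tree, and tie-breaking is arbitrary, so the exact bit patterns may vary, but code lengths always follow symbol frequencies):

```python
import heapq
from collections import Counter

def huffman_codes(text: str) -> dict[str, str]:
    """Build a Huffman code: frequent symbols get shorter bit strings."""
    freq = Counter(text)
    # Heap entries: (frequency, tiebreak, {symbol: code_so_far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    if len(heap) == 1:                 # degenerate single-symbol input
        (_, _, leaf), = heap
        return {s: "0" for s in leaf}
    while len(heap) > 1:               # repeatedly merge the two rarest
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

codes = huffman_codes("AAAABBC")
print(codes)  # 'A' (most frequent) gets the shortest code
assert len(codes["A"]) < len(codes["C"])
```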
d) Arithmetic Coding
• Encodes an entire message as a single fractional number based on symbol probabilities.
• Can compress closer to the entropy limit than Huffman coding.

2. Lossy Compression
Lossy compression removes some data to achieve much higher compression ratios, often at the cost of quality. It is commonly used in multimedia applications, such as images, audio, and video, where slight data loss is acceptable.
Key Features
✔ Much higher compression ratio than lossless methods.
✔ Some of the original data is permanently discarded.
✔ Used in images (JPEG), audio (MP3, AAC), and video (H.264).
c) Predictive Coding
• Predicts the next data value and stores only the difference.
• Used in video compression (H.264, H.265).
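The store-only-the-difference idea can be illustrated with simple delta coding, a basic form of predictive coding where each value is "predicted" to equal the previous one (real codecs such as H.264 use far more sophisticated predictors):

```python
def delta_encode(samples: list[int]) -> list[int]:
    """Predict each sample as the previous one; store only the difference."""
    deltas, prev = [], 0
    for s in samples:
        deltas.append(s - prev)
        prev = s
    return deltas

def delta_decode(deltas: list[int]) -> list[int]:
    """Rebuild the signal by accumulating the differences."""
    samples, prev = [], 0
    for d in deltas:
        prev += d
        samples.append(prev)
    return samples

signal = [100, 102, 104, 103, 105]
print(delta_encode(signal))  # → [100, 2, 2, -1, 2]
assert delta_decode(delta_encode(signal)) == signal
```

The small, repetitive differences are then much easier for an entropy coder to compress than the raw values.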
d) Perceptual Coding
• Discards components the human ear or eye cannot perceive (e.g., masked audio frequencies).
• Basis of MP3 and AAC audio compression.
e) Fractal Compression
• Represents an image by self-similar transformations of its own parts.
• Can achieve high compression for natural images, but encoding is computationally expensive.
3. Hybrid Compression
Some modern techniques combine lossless and lossy methods for better efficiency.
• JPEG uses lossy compression but applies lossless Huffman coding for final encoding.
• H.264 video compression uses lossy quantization but lossless entropy coding (CABAC).