
NUST School of Electrical Engineering & Computer Sciences (SEECS)
Department of Communication Systems Engineering

CSE-434: Systems Engineering

Source Coding and Data Compression


Information Theory
• The mathematical theory of communication
involving the quantification of information
• Deals with the transmission of information over
a noisy channel
– Source coding theorem
– Noisy channel coding theorem
• Does not concern itself with the importance or
meaning of a message
Shannon’s Generic Communication System

Generic Communication System, from Chapter 2 of K.V. Prasad.


Components of a Communication System
• Information source produces the symbols
• Source encoder converts the symbols into a data
stream
– Source encoding reduces the redundancy
– Can be divided into lossless encoding techniques and
lossy encoding techniques
• Channel encoder introduces redundancy for error
detection or error correction at receiver
• Modulator transforms the signal so that it can be
transmitted through the medium
Entropy
• Shannon’s formula to measure information of
the source, known as Entropy
H = log₂ N bits/symbol    (for N equally likely symbols)

H = −Σ P(i) log₂ P(i) bits/symbol    (if the ith symbol has probability P(i))
Example 1
• Entropy of a source producing English alphabet
with each symbol being equally likely:
H = log₂ 26 ≈ 4.7 bits/symbol
• Entropy of a source producing 4 symbols with
probability {0.5, 0.25, 0.125, 0.125} respectively

H = −Σ P(i) log₂ P(i) = 1.75 bits/symbol
Exercise 1
• Calculate the entropy of a source that produces
4 symbols with probability 1/8 and 2 symbols
with probability 1/4

• Answer: 2.5 bits/symbol
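A minimal Python sketch of the entropy formula above (the function name and layout are illustrative, not from the slides):

import math

def entropy(probabilities):
    """Shannon entropy H = -sum(P(i) * log2 P(i)) in bits/symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Example 1: four symbols with probabilities 0.5, 0.25, 0.125, 0.125
print(entropy([0.5, 0.25, 0.125, 0.125]))   # 1.75

# Exercise 1: four symbols at 1/8 and two symbols at 1/4
print(entropy([1/8] * 4 + [1/4] * 2))       # 2.5

# 26 equally likely English letters
print(entropy([1/26] * 26))                 # ~4.7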


Data Compression
• The process of encoding information using
fewer bits than the original message
• Lossless data compression
– e.g. RLE, Huffman, Shannon-Fano, LZW, LZ77
• Lossy data compression
– e.g. JPEG, MPEG, AMR, AC3
Run Length Encoding
• A sequence of repeating characters is replaced
by its count
• Useful when the input text has long repeating
sequences
• A special character is inserted to mark each
compressed run
Example 2:
Input stream: WHOOOOOODUNNNNNIT!!!
Special char: \
Output stream: WH\6ODU\5NIT\3!
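A minimal Python sketch of the scheme used in Example 2 (the threshold of three repeats and the single-digit counts are assumptions inferred from the example):

def rle_encode(text, esc="\\"):
    """Run-length encode: runs of 3+ identical characters become <esc><count><char>."""
    out, i = [], 0
    while i < len(text):
        j = i
        while j < len(text) and text[j] == text[i]:
            j += 1
        run = j - i
        if run >= 3:                      # long run: emit \<count><char>
            out.append(f"{esc}{run}{text[i]}")
        else:                             # short run: copy characters as-is
            out.append(text[i] * run)
        i = j
    return "".join(out)

print(rle_encode("WHOOOOOODUNNNNNIT!!!"))  # WH\6ODU\5NIT\3!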
Huffman Compression
• Variable length lossless data compression
technique
• More frequently occurring characters are given
shorter codes
Example 3:
• Input stream: WHOOOOOODUNNNNNIT!!!
Character W H O D U N I T !
Frequency 1 1 6 1 1 5 1 1 3

Sorted
Character W H D U I T ! N O
Frequency 1 1 1 1 1 1 3 5 6
Huffman Compression
Character  D  U  I  T  (W+H)  !  N  O
Frequency  1  1  1  1    2    3  5  6

Character  I  T  (D+U)  (W+H)  !  N  O
Frequency  1  1    2      2    3  5  6

Character  (I+T)  (D+U)  (W+H)  !  N  O
Frequency    2      2      2    3  5  6

Character  (W+H)  !  ((I+T)+(D+U))  N  O
Frequency    2    3        4        5  6

Character  ((I+T)+(D+U))  ((W+H)+!)  N  O
Frequency        4            5      5  6
Huffman Compression
Character  (((I+T)+(D+U))+((W+H)+!))  (N+O)
Frequency              9               11

Huffman Codes Huffman Tree


Char Code
O 11
N 10
! 011
H 0101
W 0100
U 0011
D 0010
T 0001
I 0000
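A compact Python sketch of the greedy Huffman construction (names are illustrative; ties between equal frequencies are broken arbitrarily, so the exact code words may differ from the table above even though the code lengths agree):

import heapq
from collections import Counter

def huffman_codes(text):
    # One heap entry per symbol: (frequency, tie-breaker, {symbol: partial code})
    freq = Counter(text)
    heap = [(f, i, {ch: ""}) for i, (ch, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        f1, _, low = heapq.heappop(heap)     # least frequent subtree -> prefix 0
        f2, _, high = heapq.heappop(heap)    # next least frequent    -> prefix 1
        merged = {s: "0" + c for s, c in low.items()}
        merged.update({s: "1" + c for s, c in high.items()})
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

text = "WHOOOOOODUNNNNNIT!!!"
codes = huffman_codes(text)
# Average code length L = sum of L(i) * P(i), as in the Exercise 2 hint
avg_len = sum(len(codes[ch]) * n for ch, n in Counter(text).items()) / len(text)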
Exercise 2
• Find the entropy of the source in the previous
example
• Find the average number of bits/symbol for the
Huffman code derived in the previous example
– Hint: Use L = Σ L(i) P(i), where L(i) is the length of the code assigned to symbol ‘i’
• Discuss why the Huffman code is better than a
fixed-length code
Shannon-Fano
Algorithm:
1. Determine the probability of each symbol in the source
text
2. Sort the symbols in decreasing probability order
3. Divide the set of symbols into two parts such that each
part has an approximately equal probability
4. The symbols in the first part are coded with the bit zero
and the symbols in the second part with the bit one
5. Repeat steps 3 and 4 until each sub-division contains
exactly one symbol
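A recursive Python sketch of the algorithm above (the split point simply minimises the probability difference between the two halves; with the probabilities of Example 4 below it reproduces the code table shown on the following slides):

def shannon_fano(symbols, prefix=""):
    # symbols: [(symbol, probability), ...] sorted by decreasing probability
    if len(symbols) == 1:
        return {symbols[0][0]: prefix or "0"}
    total = sum(p for _, p in symbols)
    best_diff, split, running = float("inf"), 1, 0.0
    for i in range(1, len(symbols)):        # find the most balanced split point
        running += symbols[i - 1][1]
        diff = abs((total - running) - running)
        if diff < best_diff:
            best_diff, split = diff, i
    codes = shannon_fano(symbols[:split], prefix + "0")         # first part  -> 0
    codes.update(shannon_fano(symbols[split:], prefix + "1"))   # second part -> 1
    return codes

probs = [("O", 6/20), ("N", 5/20), ("!", 3/20), ("W", 1/20), ("H", 1/20),
         ("D", 1/20), ("U", 1/20), ("I", 1/20), ("T", 1/20)]
print(shannon_fano(probs))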
Shannon-Fano Example
Example 4:
• Input stream: WHOOOOOODUNNNNNIT!!!
Character W H O D U N I T !
Probability 1/20 1/20 6/20 1/20 1/20 5/20 1/20 1/20 3/20
Shannon-Fano Codes Shannon-Fano Tree

Char Code
O 00
N 01
! 100
W 101
H 1100
D 1101
U 1110
I 11110
T 11111

Exercise 3: Find the average number of bits/symbol for this code


LZW Coding Example
• Lossless data compression algorithm
• Created by Abraham Lempel, Jacob Ziv and Terry
Welch
• Does not need to know the probabilities of
symbol occurrence in advance
Example 5:
• Input Stream: TOBEORNOTTOBEORTOBEORNOT#
• Symbols: A-Z, #
• 5 bits required for fixed length code
• Length of message: 25 x 5 = 125 bits
LZW Coding Example

Compressed Message
= 97 bits

Ref: http://en.wikipedia.org/wiki/LZW
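A short Python sketch of the LZW encoder for Example 5 (the initial symbol-to-code numbering is an assumption, so the emitted code values may differ from the table in the Wikipedia reference, although the dictionary grows in the same way):

def lzw_encode(message):
    # Dictionary initially holds the 27 single symbols A-Z and '#'
    dictionary = {ch: i for i, ch in enumerate(
        [chr(c) for c in range(ord("A"), ord("Z") + 1)] + ["#"])}
    w, output = "", []
    for ch in message:
        if w + ch in dictionary:
            w += ch                              # keep extending the current match
        else:
            output.append(dictionary[w])         # emit code for the longest known string
            dictionary[w + ch] = len(dictionary)  # add the new string to the dictionary
            w = ch
    output.append(dictionary[w])
    return output

print(lzw_encode("TOBEORNOTTOBEORTOBEORNOT#"))   # sequence of dictionary codes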
JPEG
• "Joint Photographic Experts Group" – became an
international standard in 1992
• JPEG is a commonly used method of
compression for photographic images
• Works with both color and grey-scale images
• A JPEG file can be encoded in several ways, e.g.
JFIF (JPEG File Interchange Format)
JPEG

Loss of information
Coding Techniques
• Text
– ASCII, Extended ASCII, Morse, RLE, Huffman,
Adaptive Huffman, Shannon-Fano, LZ77, LZ78, LZW,
CTW, BWT, DMC
• Audio
– A-law, µ-law, G.7xx (ITU-T suite of standards)
Error Detection and Correction
• Ability to detect transmission errors in the
received data and to reconstruct the original
data
• Error detection techniques
– e.g. Parity, Checksum, CRC, Hamming codes, Hash
functions
• Error correction techniques
– ARQ (Stop-and-Wait, Go-back-N, Selective Repeat)
– FEC (Hamming, Reed-Solomon, Golay)
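As a tiny illustration of the simplest detection scheme in the list, an even-parity check in Python (the function names are made up for this sketch):

def add_even_parity(bits):
    """Append one parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def parity_ok(bits_with_parity):
    """A single flipped bit makes the count of 1s odd and is detected."""
    return sum(bits_with_parity) % 2 == 0

word = add_even_parity([1, 0, 1, 1])   # -> [1, 0, 1, 1, 1]
word[2] ^= 1                           # simulate a one-bit channel error
print(parity_ok(word))                 # False: error detected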
Cryptography and Steganography
• Cryptography is the study of hiding the meaning
of information
– Substitution ciphers
– Transposition ciphers
– One-time pads
– Symmetric and public key algorithms
• Steganography is the study of hiding the
existence of information
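For example, a minimal Python sketch of a substitution cipher (the classic Caesar shift; the sample text is arbitrary):

def caesar(text, shift):
    """Substitute each letter with the one `shift` places later (A-Z only)."""
    out = []
    for ch in text.upper():
        if "A" <= ch <= "Z":
            out.append(chr((ord(ch) - ord("A") + shift) % 26 + ord("A")))
        else:
            out.append(ch)
    return "".join(out)

cipher = caesar("SOURCE CODING", 3)    # "VRXUFH FRGLQJ"
plain  = caesar(cipher, -3)            # decrypt by shifting back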
Modulation
• The addition of information to a signal carrier
– Digital data, digital signal (data encoding)
– Digital data, analog signal (ASK, FSK, PSK)
– Analog data, analog signal (AM, FM, PM)
– Analog data, digital signal (PCM, DM)
• Reasons
– Compatibility of signal with transmission medium
– Frequency division multiplexing
