Data Compression and Source Coding
Example 5
Solution
First, we use the Shannon formula to find the upper limit:
C = B log2(1 + SNR)
This means that the highest bit rate for a telephone line is 34.860 kbps. If we want to send data faster than this, we can either increase the bandwidth of the line or improve the signal-to-noise ratio.
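As a quick arithmetic check, the following Python sketch evaluates the formula. The bandwidth (3000 Hz) and signal-to-noise ratio (3162, about 35 dB) are assumed values typical of a voice-grade telephone line, not taken from the slide; with them the formula gives roughly 34.9 kbps, close to the 34.860 kbps quoted above.

import math

def shannon_capacity(bandwidth_hz, snr_linear):
    # Channel capacity in bits per second: C = B * log2(1 + SNR)
    return bandwidth_hz * math.log2(1 + snr_linear)

B = 3000      # assumed bandwidth in Hz
SNR = 3162    # assumed linear signal-to-noise ratio (about 35 dB)
print(round(shannon_capacity(B, SNR)))   # about 34,880 bps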
Properties of the information measure I(s_k) = log2(1/p_k):
1) I(s_k) = 0 for p_k = 1
2) I(s_k) ≥ 0 for 0 ≤ p_k ≤ 1
3) I(s_k) > I(s_i) for p_k < p_i
4) I(s_k s_i) = I(s_k) + I(s_i), if s_k and s_i are statistically independent
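The four properties can be checked numerically for the self-information measure I(s) = log2(1/p). A minimal Python sketch, with purely illustrative probabilities:

import math

def self_information(p):
    # I(s) = log2(1/p) = -log2(p), in bits
    return -math.log2(p)

assert self_information(1.0) == 0.0                     # property 1
assert self_information(0.25) >= 0.0                    # property 2
assert self_information(0.1) > self_information(0.5)    # property 3
p_k, p_i = 0.5, 0.25                                    # assumed, statistically independent symbols
assert math.isclose(self_information(p_k * p_i),
                    self_information(p_k) + self_information(p_i))   # property 4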
The entropy of the source S is
H(S) = Σ_{i=1}^{N} P(S_i) · log2(1 / P(S_i))
where P(S_i) is the probability of symbol S_i. For a three-symbol source {a, b, c}:
H(S) = P(a)·log2(1/P(a)) + P(b)·log2(1/P(b)) + P(c)·log2(1/P(c)) = 1.5 bits/symbol
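The 1.5-bit value is reproduced below under the assumption that the three symbols have probabilities 1/2, 1/4 and 1/4; the slide does not state the distribution, and a different one would give a different entropy.

import math

def entropy(probabilities):
    # H(S) = sum of P(S_i) * log2(1 / P(S_i)), in bits per symbol
    return sum(p * math.log2(1 / p) for p in probabilities if p > 0)

P = {"a": 0.5, "b": 0.25, "c": 0.25}   # assumed distribution
print(entropy(P.values()))             # 1.5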
Lower bound: H(S) ≤ la(C)
Theorem (upper bound): For any probability distribution p(S) with an associated optimal prefix code C,
la(C) ≤ H(S) + 1
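Both bounds can be checked on a small example. The sketch below assumes a dyadic distribution (1/2, 1/8, 1/8, 1/4) with optimal prefix-code lengths 1, 3, 3, 2; for dyadic probabilities the lower bound is met with equality.

import math

probs   = [1/2, 1/8, 1/8, 1/4]   # assumed dyadic distribution
lengths = [1, 3, 3, 2]           # optimal prefix-code lengths for it

H  = sum(p * math.log2(1 / p) for p in probs)      # entropy H(S)
la = sum(p * l for p, l in zip(probs, lengths))    # average length la(C)

assert H <= la <= H + 1
print(H, la)   # 1.75 1.75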
Coding Efficiency
• η = Lmin / La, where La is the average code-word length
• From Shannon's theorem, La ≥ H(S), so Lmin = H(S)
• Thus η = H(S) / La (a numeric sketch follows below)
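A numeric sketch with an assumed three-symbol source: the probabilities (0.4, 0.3, 0.3) and Huffman-style code-word lengths (1, 2, 2) are illustrative, not taken from the slides.

import math

probs   = [0.4, 0.3, 0.3]   # assumed symbol probabilities
lengths = [1, 2, 2]         # assumed code-word lengths

H  = sum(p * math.log2(1 / p) for p in probs)      # entropy = minimum average length
La = sum(p * l for p, l in zip(probs, lengths))    # actual average code-word length
print(H, La, H / La)   # about 1.571, 1.600, efficiency about 0.982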
Data compression methods
Shannon–Fano Encoding:
First, the messages are ranked in a table in descending order of probability. The table is then progressively divided into subsections with total probabilities as nearly equal as possible. Binary 0 is assigned to the upper subsection and binary 1 to the lower subsection. This process continues until it is impossible to divide any further. The following steps show the algorithmic procedure of Shannon–Fano encoding (a code sketch follows the example below):
1- Rank the messages in descending order of probability.
2- Divide the table into two sections with total probabilities as nearly equal as possible.
3- Allocate binary 0 to the upper section and binary 1 to the lower section.
4- Divide both the upper section and the lower section into two.
5- Allocate binary 0 to the top half of each section and binary 1 to the lower half.
6- Repeat steps (4) and (5) until it is not possible to go any further.
Resulting code table:
A: 0
B: 100
C: 101
D: 11
[Code-tree figure: branch bits 0/1 at each split, leaves A, D, B, C]
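A minimal recursive sketch of this procedure, assuming the input is already sorted in descending order of probability; the split point is chosen so the two halves' total probabilities are as nearly equal as possible. The example distribution at the end is an assumption and differs from the table above.

def shannon_fano(symbols):
    # symbols: list of (symbol, probability) in descending order of probability
    codes = {sym: "" for sym, _ in symbols}

    def split(group):
        if len(group) < 2:
            return
        total = sum(p for _, p in group)
        running, best_i, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):          # find the most balanced split point
            running += group[i - 1][1]
            diff = abs(2 * running - total)
            if diff < best_diff:
                best_diff, best_i = diff, i
        upper, lower = group[:best_i], group[best_i:]
        for sym, _ in upper:
            codes[sym] += "0"                   # binary 0 for the upper section
        for sym, _ in lower:
            codes[sym] += "1"                   # binary 1 for the lower section
        split(upper)
        split(lower)

    split(symbols)
    return codes

print(shannon_fano([("A", 0.5), ("D", 0.25), ("B", 0.125), ("C", 0.125)]))
# {'A': '0', 'D': '10', 'B': '110', 'C': '111'} for this assumed distribution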
Huffman Codes:
Example code: a: 1, b: 01, c: 00
[Huffman-tree figure HT: branch bits 0/1 at each node, leaves a, b, c]
Expected code length of a Huffman tree HT:
C(HT) = 1·(1/2) + 3·(1/8) + 3·(1/8) + 2·(1/4) = 1.75
[Huffman-tree figure: branch bits 0/1 at each node, leaves A, B, C, D, E]
Construction of Huffman Trees
[Figure: the two least-probable nodes are merged step by step; intermediate node probabilities of 0.3, 0.4 and 0.6 appear during the merges, and each merge labels its branches 0 and 1]
Resulting code words:
A = 0
B = 100
C = 11
D = 1010
E = 1011
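With the code table just derived, encoding and decoding are straightforward because the code is prefix-free. The sample message below is an arbitrary assumption.

codes = {"A": "0", "B": "100", "C": "11", "D": "1010", "E": "1011"}

def encode(message):
    return "".join(codes[s] for s in message)

def decode(bits):
    inverse = {c: s for s, c in codes.items()}   # unambiguous: no code word is a prefix of another
    out, buffer = [], ""
    for b in bits:
        buffer += b
        if buffer in inverse:
            out.append(inverse[buffer])
            buffer = ""
    return "".join(out)

msg = "ABACADAE"                 # assumed sample message
bits = encode(msg)
print(bits)                      # 01000110101001011
assert decode(bits) == msg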
Huffman Codes
• Theorem: For any source S, the Huffman code can be computed efficiently in time O(n·log n), where n is the number of symbols in the source S.
Proof sketch: the running time of the Huffman coding algorithm is dominated by the priority-queue operations (see the sketch below).
• One can also prove that Huffman coding creates the most efficient set of prefix codes for a given text.
• It is also one of the most efficient entropy coders.
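A sketch of the priority-queue construction referenced in the proof, using Python's heapq module as the min-heap; the frequency table at the bottom is an illustrative assumption. Each iteration pops the two least-probable nodes and pushes their merged parent, which is what gives the O(n·log n) bound.

import heapq

def huffman_codes(freqs):
    # Each heap entry: (probability, tie-breaker, {symbol: partial code string})
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, left = heapq.heappop(heap)      # least probable node
        p1, _, right = heapq.heappop(heap)     # second least probable node
        merged = {s: "0" + c for s, c in left.items()}          # branch bit 0
        merged.update({s: "1" + c for s, c in right.items()})   # branch bit 1
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

# Assumed illustrative probabilities (not the ones on the slides):
print(huffman_codes({"A": 0.4, "B": 0.1, "C": 0.3, "D": 0.1, "E": 0.1}))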
Best Case:
Input: 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
Output: 0,16,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1…
Worst Case:
Input: 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
Output: 0,1,1,1,2,1,3,1,4,1,5,1,6,1,7,1,8,1,9,1,10,1,11,1,12,1,13,1,14,1,15,1