Data Compression - Unit 2

The document explains Huffman coding, a lossless data compression technique that assigns variable-length codes based on symbol frequencies to minimize transmission time. It details the steps to create a Huffman tree, calculate bits required for encoding, and the differences between standard Huffman and minimum variance Huffman coding. Additionally, it covers adaptive Huffman coding for real-time data compression, including the updating process and encoding procedures.


UNIT-2

Before jumping into this: what do encode and decode mean?


Huffman coding (Greedy technique)
- Huffman coding is used to compress data, i.e. to encode it.
- It is a lossless process.
- We use the Huffman technique because we have to reduce the transmission time.
- The size of a message is measured in bits.
- To transfer any data we have to encode it. A familiar encoding technique is ASCII coding, which covers the range 0-127 (so we have 128 characters in total) and is commonly stored as 8 bits per character.
How to calculate the number of bits required to represent a character/number:
- We follow the decimal-to-binary conversion process.
- Example:
  127₁₀ = ( )₂
  Divide by 2 repeatedly, writing down the quotient and remainder:
  • 127 ÷ 2 = 63, remainder 1
  • 63 ÷ 2 = 31, remainder 1
  • 31 ÷ 2 = 15, remainder 1
  • 15 ÷ 2 = 7, remainder 1
  • 7 ÷ 2 = 3, remainder 1
  • 3 ÷ 2 = 1, remainder 1
  • 1 ÷ 2 = 0, remainder 1 (stop when the quotient is 0)
  Reading the remainders from bottom to top: 127₁₀ = 1111111₂ (7 bits).
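The same repeated-division method, sketched in Python (the helper name to_binary is just for illustration):

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n == 0:
        return "0"
    bits = []
    while n > 0:
        bits.append(str(n % 2))  # each remainder is the next bit
        n //= 2                  # the quotient feeds the next division
    return "".join(reversed(bits))  # remainders are read bottom to top

print(to_binary(127))  # 1111111
```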

Steps involved in the Huffman coding process:

Example: AAABBCCCCDDDD

1. First, create a frequency table that records how many times each character occurs:
   A = 3, B = 2, C = 4, D = 4
2. Create a min-heap: B:2, A:3, C:4, D:4
3. Start building the tree by repeatedly selecting the two lowest-frequency nodes and merging them (keep the higher value on the right and the lower on the left).
4. Assign 0 to the left branch and 1 to the right branch.
5. Now calculate the total bits required to represent the data.
6. Start from the top:
   D = 0   (4 × 1 = 4) (1 bit is required to represent D, and D occurs 4 times)
   C = 10  (4 × 2 = 8)
   B = 110 (2 × 3 = 6)
   A = 111 (3 × 3 = 9)
   Add them all up and you get 27, meaning 27 bits are required.
7. Now replace AAABBCCCCDDDD with the binary codes:
8. 111111111110110101010100000
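The whole procedure can be sketched with Python's heapq module (a minimal illustration; huffman_codes is a hypothetical helper). One caveat: a strict two-smallest merge pairs C:4 with D:4 before touching the combined B+A node (weight 5), so this sketch yields a 2-bit code for every symbol and a 26-bit total, while the hand-drawn 27-bit tree above merges in a different order; both are valid prefix codes.

```python
import heapq
from collections import Counter

def huffman_codes(message: str) -> dict:
    """Steps 1-4 above: frequency table, min-heap, repeated merging."""
    freq = Counter(message)                      # step 1: frequency table
    # step 2: min-heap of (frequency, tie-break id, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:                         # step 3: merge two smallest
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}        # step 4: 0 = left
        merged.update({s: "1" + c for s, c in c2.items()})  #         1 = right
        heapq.heappush(heap, (f1 + f2, next_id, merged))
        next_id += 1
    return heap[0][2]

msg = "AAABBCCCCDDDD"
codes = huffman_codes(msg)
print(codes)
print(sum(len(codes[s]) for s in msg))  # total bits (step 5)
print("".join(codes[s] for s in msg))   # encoded message (steps 7-8)
```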

• To transmit the data:

Transmission time = Message size / Bandwidth
Example: 700 bits / BW

If the message size is larger, transmission takes longer, and vice versa.
• 3 bits → 2³ = 8 combinations
• 8 bits → 2⁸ = 256 combinations
• 1 byte = 8 bits
• 1 byte can hold a number between 0 and 255
• Decoding cost:
Encoded bits + (number of unique symbols × 8) + total of all frequencies (the message length)
Example: 27 + 4 × 8 + 13
Questions:
1. Build a Huffman Tree:
Given the following characters and their frequencies, construct a Huffman tree and
determine the Huffman codes for each character:
A - 5, B - 9, C - 12, D - 13, E - 16, F – 45
2. Decode a Huffman Encoded String:
Given the Huffman tree you built in Exercise 1, decode the following binary sequence:
1100110011111010 (using the codes generated in Question 1)
3. Find the Huffman Code for a Given String:
Consider the string "MISSISSIPPI". Compute the frequency of each character and
generate the Huffman encoding.
More questions:
1. ABBCDBCCDAABBEEEBEAB (20 characters; at 8 bits per character, 8 × 20 = 160 bits uncompressed)
2. A:15, B:6, C:7, D:12, E:25, F:4, G:6, H:10, I:15
Example: ABCDE – 5 different characters are available, and they can fit into 3 bits. How do we find these 3 bits? By using 2ⁿ, where n is the number of bits: to fit 5 characters we need 3 bits, since 2³ = 8 ≥ 5.
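The same check in Python (a small illustration of the 2ⁿ rule):

```python
import math

def bits_needed(num_symbols: int) -> int:
    """Smallest n such that 2**n >= num_symbols."""
    return math.ceil(math.log2(num_symbols))

print(bits_needed(5))  # 3, since 2**3 = 8 >= 5
```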

Now we get 5, 8, 7; arrange them into a min-heap again (we keep re-arranging the numbers to maintain the heap):

5, 7, 8
• Create a chart, write the codewords, and calculate the required bits.
• Now perform the encoding process.

How do we calculate the bits required for decoding?

Encoded bits + (number of unique symbols × 8) + total of all frequencies
TOPIC:
Minimum variance Huffman code:
• It achieves the same average length as standard Huffman coding but with lower variance in codeword lengths, which makes it better suited to transmission.
• The main difference between standard Huffman coding and minimum variance coding is the way of sorting; the rest is almost the same.
• We combine symbols and arrange them in a list:
  sort the symbols by their given probabilities in ascending or descending order, then start combining the symbols whose probabilities are lowest.
  Repeat this until we get the result.
Difference between standard Huffman and minimum variance Huffman

Example:
Letter  Probability  Codeword
A1      0.2
A2      0.4
A3      0.2
A4      0.1
A5      0.1

Steps:

1. Arrange the symbols in descending order of probability.
2. When we combine two symbols, we write the new symbol with a dash/prime (e.g., A1').
3. Note: when the new probability is equal to an old probability, we place the new probability above the old one.
4. Now do the binary coding: assign 0 to the upper branch and 1 to the lower branch.
5. Now find the codeword.
   Suppose we have to find the code for A1.
   In the second column we have A1'; follow the flow:
   A1 → A1' (which is 0)
   Then find where we went from A1' (which is 1).
   We get the code 01, but it must be written in reverse order, i.e. 10.
   So the code for A1 is 10.

A1  10
A2  00
A3  11
A4  010
A5  011

Now we find the average length (Σ probability × code length):

Avg length = 0.2×2 + 0.4×2 + 0.2×2 + 0.1×3 + 0.1×3 = 2.2 bits/letter

Questions:
Huffman Coding:
a) Sometimes, we can obtain more than one Huffman code due to different tie breaks during
the Huffman code construction. Construct two Huffman codes, H and H', for the following
data:
Symbol A B C D E

Probability 0.1 0.1 0.2 0.2 0.4


For data transmission purposes, it is often desirable to have a code with minimum variance in
codeword length (among codes of the same average length).
b) Compute the average length and variance of the codeword lengths for H and H'.
c) Determine which code (H or H') is preferable for transmission purposes.
Answer:
b) Average length: E[L] = Σ(Pᵢ × Lᵢ)
where:
• Pᵢ = probability (or relative frequency) of symbol i
• Lᵢ = length of the Huffman codeword for symbol i (the number of bits for that symbol)
c) Variance formula (σ is the standard deviation):
σ² = E[L²] − (E[L])²
For E[L], we already know the formula for the average length:
E[L] = Σ(Pᵢ × Lᵢ)
For E[L²], we have:
E[L²] = Σ(Pᵢ × Lᵢ²)
Now compute the variance:
σ² = E[L²] − (E[L])²
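A short sketch of both formulas in Python. The probabilities are those of the question; the two sets of code lengths below are the classic tie-break outcomes for this distribution (verify them against your own trees H and H'):

```python
def avg_length(probs, lengths):
    """E[L] = sum(P_i * L_i)"""
    return sum(p * l for p, l in zip(probs, lengths))

def variance(probs, lengths):
    """sigma^2 = E[L^2] - (E[L])^2"""
    e_l = avg_length(probs, lengths)
    e_l2 = sum(p * l * l for p, l in zip(probs, lengths))
    return e_l2 - e_l ** 2

probs  = [0.1, 0.1, 0.2, 0.2, 0.4]   # A, B, C, D, E
h_len  = [4, 4, 3, 2, 1]             # one possible Huffman code H
h2_len = [3, 3, 2, 2, 2]             # a minimum variance alternative H'
print(avg_length(probs, h_len),  variance(probs, h_len))    # ~2.2, ~1.36
print(avg_length(probs, h2_len), variance(probs, h2_len))   # ~2.2, ~0.16
```

Both codes average 2.2 bits/symbol, but H' has far lower variance, which is exactly why the minimum variance code is preferred for transmission.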

Here are the AKTU-style Huffman coding questions along with their answers:

Short Answer Questions


1. What is Huffman coding, and why is it used in data compression?
Answer:
Huffman coding is a lossless data compression algorithm that assigns variable-length
binary codes to input symbols based on their frequencies. Symbols with higher frequencies
get shorter codes, while those with lower frequencies get longer codes. This minimizes the
average code length and reduces the overall data size.

2. Explain the process of constructing a Huffman tree with an example.


Answer:
To construct a Huffman tree:
1. List all symbols with their probabilities (or frequencies).
2. Create leaf nodes for each symbol and arrange them in ascending order.
3. Merge the two nodes with the smallest probabilities into a new node, whose
probability is their sum.
4. Repeat the merging process until a single root node is formed.
5. Assign binary codes (left: 0, right: 1) to each branch to get Huffman codes.
Example:
For symbols {A: 0.1, B: 0.2, C: 0.3, D: 0.4}, the Huffman codes might be:
• A → 110
• B → 111
• C → 10
• D → 0

3. Can Huffman coding generate a prefix-free code? Justify your answer.


Answer:
Yes, Huffman coding always generates a prefix-free code because no code is a prefix of
another. This ensures that each symbol can be uniquely decoded without ambiguity.
4. What is the time complexity of constructing a Huffman tree?
Answer:
The time complexity of Huffman coding is O(n log n), where n is the number of symbols. This
is because:
• Building a priority queue takes O(n) time.
• Extracting and merging nodes takes O(log n) time for each of the n symbols.

5. Why does Huffman coding lead to variable-length encoding instead of fixed-length encoding?
Answer:
Huffman coding assigns shorter codes to more frequent symbols and longer codes to less
frequent symbols, leading to variable-length encoding. This optimizes the overall code
length and reduces storage space, unlike fixed-length encoding, which assigns equal-length
codes to all symbols.

Numerical/Problem-Solving Questions
6. Construct a Huffman tree for the given symbols and probabilities.
Symbol A B C D E F
Probability 0.05 0.1 0.15 0.2 0.25 0.25
Solution:
• Build the binary tree using the steps we learnt in class: merge A+B (0.15), merge that node with C (0.30), merge D+E (0.45), merge F with the 0.30 node (0.55), then join the last two nodes.
Final Huffman codes (the exact bit patterns may vary with tie-breaks, but the code lengths do not):
• A → 0110
• B → 0111
• C → 010
• D → 10
• E → 11
• F → 00

7. Decode the following Huffman encoded message:

Encoded message: "11001100100100"
Assume the given Huffman table:
• A → 00
• B → 01
• C → 100
• D → 101
• E → 11
Solution:
• 11 → E
• 00 → A
• 11 → E
• 00 → A
• 100 → C
• 100 → C
Decoded message: "EAEACC"

8. Two Huffman codes H1 and H2 have the same average length, but H1 has higher variance.
Which one is better for transmission and why?
Answer:
H2 is better because lower variance means that the codeword lengths are more uniform,
leading to less fluctuation in transmission times and buffering. High variance can cause
jitter and inefficient resource allocation in real-time applications.

9. If all symbols have equal probabilities, what type of encoding is optimal? Is Huffman
coding still beneficial?
Answer:
When all symbols have equal probabilities, fixed-length encoding (e.g., plain ASCII) is
optimal. Huffman coding would assign essentially equal-length codes in this case, making it no
more efficient than fixed-length encoding.

10. Modify Huffman coding to handle probabilities given as frequencies.


Answer:
If probabilities are given as frequencies:
1. Convert frequencies into probabilities by dividing each frequency by the total count.
2. Follow standard Huffman coding steps to construct the tree.
3. Assign binary codes as usual.
Example:
• Frequencies: {A: 5, B: 10, C: 15, D: 20}
• Probabilities: {A: 5/50, B: 10/50, C: 15/50, D: 20/50}
• Then apply the Huffman algorithm normally.

Summary Table

What is Huffman coding? → Lossless compression algorithm that assigns shorter codes to frequent symbols.
Steps to build a Huffman tree? → Merge smallest probabilities, assign 0/1 recursively.
Prefix-free codes? → Yes, Huffman codes are always prefix-free.
Time complexity? → O(n log n) using priority queues.
Variable-length encoding? → Huffman codes adapt to symbol frequency, unlike fixed-length codes.
Decoding example? → Decode using the given Huffman table.
High variance vs. low variance? → Low variance is preferable for smoother transmission.
Equal probability symbols? → Fixed-length encoding is optimal.
Using frequencies instead of probabilities? → Convert to probabilities before applying Huffman coding.

To convert frequencies into probabilities:

Probability = Frequency / Total frequency
Total frequency = 5 + 10 + 15 + 20 + 50 = 100

Symbol       A     B     C     D     E
Frequency    5     10    15    20    50
Probability  0.05  0.10  0.15  0.20  0.50

Questions:
Find the probability, Huffman code, and code length.

Probability = Frequency / Total frequency

Average length:
E[L] = Σ(Pᵢ × Lᵢ)  (Σ means the sum of a set of terms)
TOPIC:
ADAPTIVE HUFFMAN CODING:
1. It is used when the symbol probabilities are not known in advance.
2. It is used for real-time data compression.
3. Adaptive Huffman coding updates the tree dynamically as symbols are processed.
4. Examples: audio and video streaming, and real-time communication.
5. Adaptive Huffman coding involves 3 phases: updating the tree, encoding, and decoding.

Update process:
- NYT (Not Yet Transmitted) node
- Encoding is based on the tree we build during the update process.

Steps:
1. At first there is no node, so by default we start with the NYT node.
2. The initial weight of NYT is 0.
3. NYT always remains the leftmost leaf of the tree, even as the tree grows in the following steps.
4. The initial node number of NYT is 51.
5. The node number is found as:
   Node number = (2 × n) − 1
   // here n is the total number of letters in the alphabet, i.e. 26
   So, node number = (2 × 26) − 1 = 51
6. Whenever a symbol is encountered for the first time, its weight is 1 (the value we write inside the box).
7. If the same symbol keeps occurring, its weight keeps increasing.
8. Nodes are either internal or external:
   External nodes are on the outer side (the leaves, which hold symbols).
   Internal nodes are on the inner side (in short, the nodes that have children).
Encoding process:
Message: a a r d v a r k
Algorithm:
1. Read an input symbol.
2. Is it the symbol's first appearance?
   - Yes: send the NYT code followed by the symbol's fixed code.
   - No: send the code for that symbol.
3. Call the update procedure.

NYT code / send code: walk from the root down to the NYT node (or the symbol's node) and read off the branch bits, e.g. 0101.

1. Fixed code:
   Two parameters (e and r):

   M = 2^e + r, where 0 ≤ r < 2^e
   (to reach a value equivalent to 26 we assign e = 4 and r = 10)

   • M is the total number of letters in the alphabet, i.e. 26.
   • We must choose e and r so that the sum equals M:
     26 = 2⁴ + 10, i.e. e = 4 and r = 10.

2. The letter aₖ is then encoded as follows (with e = 4 and r = 10):

   # Case 1: 1 ≤ k ≤ 2r
   Send the (e + 1)-bit binary representation of (k − 1).
   Explanation: if k lies between 1 and 2r, we represent the symbol in e + 1 = 5 bits (that is the number of bits for the symbol), and the value represented is k − 1.
   Here k is the position of the symbol in the alphabet series (the k value of B is 2).

   # Case 2: k > 2r
   Send the e-bit (here 4-bit) binary representation of (k − r − 1).

Example 1:
B: 1 ≤ 2 ≤ 20 (the value of k is 2, since B is the 2nd letter of the alphabet)
* The case 1 condition is true, so we need 5 bits in total to represent the symbol B.
* We represent k − 1 in 5 bits: k − 1 = 2 − 1 = 1.
So the binary code for B is 00001.

Example 2:
Z (the value of k becomes 26)
▪ The second condition applies: 26 > 20.
Now, k − r − 1 (k = 26, r = 10):
26 − 10 − 1 = 15
(we represent 15 in e = 4 bits by converting 15 into binary).
To convert 15 into binary, perform the usual decimal-to-binary conversion (repeated division by 2).
So the binary code for Z is 1111.
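Both cases fit in a short Python helper (a sketch under the M = 26, e = 4, r = 10 setup above; fixed_code is an illustrative name):

```python
def fixed_code(letter: str, e: int = 4, r: int = 10) -> str:
    """Fixed (first-appearance) code for a letter a_k:
    case 1: k <= 2r -> (e+1)-bit binary of k-1
    case 2: k >  2r ->  e-bit    binary of k-r-1"""
    k = ord(letter.lower()) - ord('a') + 1   # position in the alphabet
    if k <= 2 * r:
        return format(k - 1, f"0{e + 1}b")   # case 1: e+1 bits
    return format(k - r - 1, f"0{e}b")       # case 2: e bits

print(fixed_code('b'))  # 00001
print(fixed_code('z'))  # 1111
```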

DECODING:
1. Is it the NYT code?
   - No: decode the element directly from the tree.
   - Yes: read e bits after the NYT code (here e = 4 and r = 10); call this value p.
2. Is p < r?
   - Yes: read one more bit and decode the symbol numbered (current value + 1).
   - No: add r to p; the symbol number is p + r + 1.

Question: decode the given string:
‘0000010100010000011000101110110001010’

Step 1 (following the flow chart):
• Is it the NYT code? Yes.
• So we follow the 2nd step, "read e bits after NYT". Since e = 4, we take the first 4 bits of the given string, i.e. 0000 (this is the initialization phase, which is why we take the first four 0s).
• Is 0000 < r? (Here the value of r is 10.)
• The condition is true, so we read one more bit, giving 00000, and 00000 is 0.
• The last step: 0 + 1 = 1, i.e. 'a' (in the alphabet, the value of a is 1).
• Draw the diagram along with the code.
• So from the first five 0s we get 'a'.

Step 2: ‘0000010100010000011000101110110001010’
• The next bit is 1.
• According to the binary tree diagram, we go from the parent node (with value 1) to 'a', and the bit used to reach 'a' is also 1.
• This means the node is not NYT.
• Simply decode that element with the symbol given by the binary tree.
• The second value is also 'a'.

Step 3: ‘0000010100010000011000101110110001010’
• The next bit is 0.
• 0 leads to the NYT node (we reach NYT from the parent node using 0, according to the diagram).
• Read e bits after NYT, i.e. take 4 bits: 1000.
• Is 1000 < 10 (i.e. 8 < 10)? Yes, so read one more bit.
• The bits become 10001, i.e. 17.
• According to the flow chart, add 1 to 17: 17 + 1 = 18 = 'r'.

Step 4: ‘0000010100010000011000101110110001010’
• Using the next two 0s we reach NYT; after two 0s there is no further child node, which is why we do not take a 3rd 0.
• Now read e bits after NYT (i.e. after the two 0s): 0001.
• 0001 is also less than 10 (1 < 10), so read one more bit.
• The bits become 00011, i.e. 3.
• According to the flow chart, add 1: 3 + 1 = 4 = 'd'.
• Now update the tree.

Continue this until you get the desired output.
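The fixed-code part of each decoding step (read e bits, compare with r, possibly read one more bit, add 1) can be sketched as below; tracking NYT and updating the tree between symbols is omitted, so this only handles first-appearance codes. decode_fixed is a hypothetical helper:

```python
def decode_fixed(bits: str, pos: int, e: int = 4, r: int = 10):
    """Decode one first-appearance symbol starting at bits[pos].
    Returns (letter, next position). Tree updates are omitted."""
    p = int(bits[pos:pos + e], 2)            # read e bits
    pos += e
    if p < r:                                # read one more bit
        p = int(bits[pos - e:pos + 1], 2)    # value of all e+1 bits
        pos += 1
        k = p + 1
    else:
        k = p + r + 1
    return chr(ord('a') + k - 1), pos

bits = "0000010100010000011000101110110001010"
print(decode_fixed(bits, 0))  # ('a', 5): the first five 0s decode to 'a'
```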

TOPIC
Golomb Code:
• Denoted by Gₘ(n).
• Used for lossless data compression.
• Golomb coding is a lossless data compression method that is particularly efficient for encoding sequences of integers with a geometric distribution (i.e., when smaller numbers occur more frequently than larger ones).

Golomb coding represents a number using two parts:

1. Quotient part: the number is divided by a divisor (a parameter m), and the quotient is encoded using unary coding.
2. Remainder part: the remainder is encoded in a fixed- or variable-length binary representation.

Algorithm:
Step 1:
Find the quotient:
• q = ⌊n/m⌋ (floor of n/m), where m = divisor and n = the integer value; convert q to unary code.
• Ways of writing unary code (suppose we got q = 3; write the bit that many times):
  ▪ 111 followed by one 0, or
  ▪ 000 followed by one 1.
Find the remainder:
• r = n mod m
Step 2 (binary code of r):
• k = ⌈log₂ m⌉ (ceiling of the base-2 logarithm of m)
• c = 2^k − m
Now the conditions: how many bits are required, and what do we represent in those bits?
a. 0 ≤ r < c
  ▪ What to represent: r
  ▪ In how many bits: k − 1
b. r ≥ c
  ▪ What to represent: r + c
  ▪ In how many bits: k

Step 3:
• Concatenate the results of step 1 and step 2.

Example:
Question: design the Golomb code for 9 with divisor 4.
Answer:
n = 9, m = 4, G₄(9) = ?
Let's start:
Step 1:
Quotient:
q = ⌊n/m⌋ = ⌊9/4⌋; we take only the integer part after dividing, so q = 2.
Unary code (q ones followed by one 0): q = 110.

Remainder:
r = 9 mod 4 (we take the remainder part) = 1.

Step 2:
k = ⌈log₂ m⌉ = ⌈log₂ 4⌉ = 2
c = 2^k − m = 2² − 4 = 0

So we are in the case r ≥ c:
• What to represent: r + c = 1 + 0 = 1
• In how many bits: k = 2
So we represent 1 in 2 bits, i.e. 01.

Step 3:
• Now we simply concatenate the results of step 1 and step 2:
• Step 1 = 110, step 2 = 01
• The result is 11001.
So the Golomb code G₄(9) = 11001.
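The three steps translate directly into a short Python function (a sketch; golomb is an illustrative name):

```python
import math

def golomb(n: int, m: int) -> str:
    """Golomb code G_m(n): unary quotient + truncated-binary remainder."""
    q, r = divmod(n, m)               # step 1: quotient and remainder
    unary = "1" * q + "0"             # q ones followed by a single 0
    k = math.ceil(math.log2(m))       # step 2: k = ceil(log2 m)
    c = 2 ** k - m                    #         c = 2^k - m
    if r < c:
        tail = format(r, f"0{k - 1}b")    # case a: r in k-1 bits
    else:
        tail = format(r + c, f"0{k}b")    # case b: r+c in k bits
    return unary + tail               # step 3: concatenate

print(golomb(9, 4))  # 11001
```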

More questions:
1. Find the Golomb code for n = 0, 1, 2, 3, …, 15, where m = 5.

Here k = ⌈log₂ 5⌉ = 3 (and c = 2³ − 5 = 3).

TOPIC

Tunstall Code
• Tunstall code is a type of variable-to-fixed-length coding used in data compression.
• Tunstall coding replaces variable-length sequences of input symbols with fixed-length codewords.

N + k(N − 1) ≤ 2^n
(this determines the number of iterations used to build the Tunstall code)
• n → bits required for each Tunstall codeword
• N → size of the source alphabet (number of letters)
• k → number of iterations
Example:
Letters Probability
a₁ 0.7
a₂ 0.2
a₃ 0.1
Given N = 3, we want to generate a 3-bit Tunstall code.
N + k(N − 1) ≤ 2^n
3 + k(3 − 1) ≤ 2³
k ≤ 2.5
Since k must be a whole number, we take k = 2.

1st iteration:
Find the letter with the highest probability (a₁), remove it from the base table, and concatenate (multiply) it with all 3 letters:
Letters Probability
a₂ 0.2
a₃ 0.1
a₁a₁ 0.7 × 0.7 = 0.49
a₁a₂ 0.7 × 0.2 = 0.14
a₁a₃ 0.7 × 0.1 = 0.07

2nd iteration (this is our last iteration, as k = 2):
- Remove the entry with the highest probability (a₁a₁) and concatenate it with the letters given in the question, i.e. multiply a₁a₁ with a₁, a₂, a₃:
Letters Codeword
a₂ 000 [0]
a₃ 001 [1]
a₁a₂ 010 [2]
a₁a₃ 011 [3]
a₁a₁a₁ 100 [4]
a₁a₁a₂ 101 [5]
a₁a₁a₃ 110 [6]

Question: Design a 3-bit Tunstall code for the following alphabet.


Letter Probability
A 0.6
B 0.3
C 0.1

→ Find the highest probability from Table-1 & remove the entry & concatenate with others.

Letters Probability
B 0.3
C 0.1
AA 0.6 × 0.6 = 0.36
AB 0.6 × 0.3 = 0.18
AC 0.6 × 0.1 = 0.06

→ Now again, find the highest probability & perform the same:

Letters Probability Codeword


B 0.3 000 [0]
C 0.1 001 [1]
AB 0.18 010 [2]
AC 0.06 011 [3]
AA A 0.36 x 0.6 = 0.216 100 [4]
AA B 0.36 × 0.3 = 0.108 101 [5]
AA C 0.36 × 0.1 = 0.036 110 [6]

We have 3-bit codewords for all 7 entries, so we stop the iteration at this point.
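The iteration can be sketched in Python (a minimal illustration; codewords here are assigned in alphabetical order of the sequences, so the numbering may differ from the tables above):

```python
def tunstall(probs: dict, n: int) -> dict:
    """Build an n-bit Tunstall table: repeatedly expand the most
    probable entry while the result still fits in 2**n codewords."""
    table = dict(probs)                        # start with single letters
    # each expansion removes 1 entry and adds N, a net gain of N-1
    while len(table) + (len(probs) - 1) <= 2 ** n:
        best = max(table, key=table.get)       # highest-probability entry
        p = table.pop(best)
        for sym, ps in probs.items():          # concatenate with each letter
            table[best + sym] = p * ps
    # assign fixed-length n-bit codewords
    return {seq: format(i, f"0{n}b") for i, seq in enumerate(sorted(table))}

print(tunstall({"A": 0.6, "B": 0.3, "C": 0.1}, 3))  # 7 entries, 3-bit codes
```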


Now find the average number of bits:
Step 1:
(Diagram showing probability distribution)
• A → 0.6
• B → 0.3
• C → 0.1
TOPIC
Rice Code:
• Rice code is a simple and efficient method for compressing numbers, especially when small values are more common.
Q1: Encode the following sequence of 16 values using the Rice code with J = 8 and the one split sample option.
Given sequence:
32, 33, 35, 39, 37, 38, 39, 40, 40, 40, 40, 39, 40, 40, 41, 40
Use the previous value as the prediction value, and 0 as the prediction for the first value.
Answer:

yᵢ  32  33  35  39  37  38  39  40  40  40  40  39  40  40  41  40
ŷᵢ   0  32  33  35  39  37  38  39  40  40  40  40  39  40  40  41
dᵢ  32   1   2   4  -2   1   1   1   0   0   0  -1   1   0   1  -1
Tᵢ   0   9   8   6   2   4   3   2   1   1   1   1   2   1   1   0
xᵢ  32   2   4   8   3   2   2   2   0   0   0   1   2   0   2   1

What is the one split sample option?

Example:
Consider the number 3. Represented in 8 bits, 3 = 00000011.
With one split sample we split off the least significant bit (the final 1, the bit marked in red in the class notes) and write it first; the remaining bits 0000001 are converted to decimal and written in unary code.
0000001 = 1, and the unary code of 1 is 10.
So the answer is the split bit followed by the unary code: 1 + 10 = 110.
Now also find the xᵢ values….
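For reference, the common textbook form of the Rice code (a Golomb code with divisor m = 2^k: unary code of the quotient, then the k low-order bits) can be sketched as follows. This is the standard variant, not necessarily the exact split-sample convention used in class:

```python
def rice(n: int, k: int) -> str:
    """Standard Rice code: unary of n >> k, then the k low bits of n."""
    q = n >> k                                  # quotient by 2**k
    low = format(n & ((1 << k) - 1), f"0{k}b")  # remainder in k bits
    return "1" * q + "0" + low

print(rice(18, 3))  # 110010: quotient 2 -> 110, remainder 2 -> 010
```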
AKTU QUESTIONS BASED ON UNIT-2

1. Describe the Huffman Coding Algorithm and explain its optimality.


Answer:
Huffman coding is a greedy algorithm used for lossless data compression. It assigns
variable-length codes to input characters based on their frequencies.
Steps of Huffman Coding:
1. Create a priority queue of all characters, where the priority is their frequency.
2. Extract the two nodes with the smallest frequency and combine them into a new
node with a frequency equal to their sum.
3. Repeat this process until only one node remains (the root of the Huffman tree).
4. Assign ‘0’ to left edges and ‘1’ to right edges to generate Huffman codes.
Optimality:
• Huffman coding produces the most efficient prefix-free encoding for a given set of
character frequencies.
• It minimizes the average code length, and because no codeword is a prefix of another,
the encoding is uniquely decodable.

2. Given a set of symbols with their respective probabilities, construct the Huffman code
and calculate the average code length.
Answer:
Example: Given symbols A, B, C, D with frequencies (0.4, 0.3, 0.2, 0.1), construct Huffman
codes.
Symbol Probability Huffman Code
A 0.4 0
B 0.3 10
C 0.2 110
D 0.1 111
Average Code Length:
L = (0.4×1) + (0.3×2) + (0.2×3) + (0.1×3) = 1.9 bits per symbol

3. Explain the concept of Extended Huffman Codes. How do they differ from standard
Huffman Codes?
Answer:
• Extended Huffman Codes use blocks of symbols instead of single symbols for
encoding.
• If the original Huffman coding doesn't give sufficient compression, combining multiple
symbols into a single unit helps in further reducing the code length.
• Difference:
o Standard Huffman coding assigns a code to each individual character.
o Extended Huffman coding assigns a code to groups of characters (pairs,
triplets, etc.).

4. Discuss Adaptive Huffman Coding. How does it adjust to changing data characteristics?
Answer:
• Adaptive Huffman Coding dynamically updates the Huffman tree as new symbols are
encountered.
• Unlike static Huffman coding (where frequencies are predetermined), adaptive
coding does not require a frequency table beforehand.

5. What are Rice Codes and Golomb Codes? Provide examples of each.

6. Define Tunstall Codes and explain their application in data compression.

7. Compare and contrast Huffman Coding with Arithmetic Coding.


Answer:
Feature       Huffman Coding                           Arithmetic Coding
Type          Prefix code                              Fractional (interval) encoding
Efficiency    Slightly less efficient for small data   More efficient
Code length   Integer number of bits per symbol        Not tied to whole bits per symbol

8. Determine whether the following code set is uniquely decodable: {0, 01, 10, 110}.
Answer:
• If no codeword is a prefix of another, the code is a prefix code and is certainly uniquely decodable.
• Here, 0 is a prefix of 01, so the code is not prefix-free. Applying the Sardinas-Patterson test: the pair 0/01 leaves the dangling suffix 1, which against the codeword 10 leaves the suffix 0; since 0 is itself a codeword, the set is not uniquely decodable.
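A quick prefix check in Python (this tests only the prefix condition, which is sufficient for unique decodability; failing it calls for the full Sardinas-Patterson test, as above):

```python
def is_prefix_free(codes) -> bool:
    """True if no codeword is a proper prefix of another."""
    for a in codes:
        for b in codes:
            if a != b and b.startswith(a):
                return False
    return True

print(is_prefix_free({"00", "01", "10", "110"}))  # True: a prefix code
print(is_prefix_free({"0", "01", "10", "110"}))   # False: 0 prefixes 01
```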

10. Explain the concept of coding redundancy and its impact on compression efficiency.
Answer:
• Coding redundancy occurs when more bits than necessary are used to represent
data.
• Impact:
o Reduces compression efficiency.
o Huffman coding reduces redundancy by assigning shorter codes to frequent
symbols.
NUMERICAL QUESTIONS:

1. Huffman Coding Numerical


Given the following set of symbols and their probabilities, construct the Huffman tree and
determine the average code length.
Symbol Probability
A 0.3
B 0.25
C 0.2
D 0.15
E 0.1

2. Huffman Code Generation


Construct the Huffman coding for the following symbols and their frequencies:
Symbol Frequency
X 5
Y 9
Z 12
W 13
V 16
U 45
Compute the total number of bits required before and after compression if the original
symbols used a fixed-length 3-bit code.

3. Prefix Code Verification


Given the following set of codewords, determine whether they form a prefix code or not:
1. {00, 01, 10, 110}
2. {0, 01, 10, 110}
3. {1, 10, 101, 1011}
Justify your answer.

4. Adaptive Huffman Coding


A file contains a stream of characters with the following frequency distribution:
Character Frequency
A 20
B 5
C 10
D 15
Construct an Adaptive Huffman tree and encode the string "ABCDAB" using the dynamic
tree.

5. Arithmetic Coding vs Huffman Coding


Given a message M = "ABCA", with the probability distribution:
Symbol Probability
A 0.4
B 0.2
C 0.4
1. Construct a Huffman tree and determine the encoded message.
2. Perform Arithmetic Coding for the same message and obtain the final code.
Compare the lengths of Huffman and Arithmetic encoding.

6. Golomb Coding Numerical


A number X = 25 is to be encoded using Golomb Coding with divisor m = 4. Determine the
encoded bit sequence.

7. Rice Coding Numerical


Encode the integer N = 18 using Rice Coding with parameter k = 3.

8. Tunstall Coding
Construct a Tunstall Code for the following source probabilities using block size n = 3:
Symbol Probability
A 0.5
B 0.3
C 0.2
Find the generated codewords.

9. Entropy Calculation
For a source generating symbols X, Y, and Z with probabilities 0.5, 0.3, and 0.2, calculate:
1. Entropy of the source.
2. Minimum average bits required per symbol.
3. Efficiency of Huffman coding if the average code length obtained is 1.6 bits per
symbol.
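All three parts follow from H = -Σ pᵢ log₂ pᵢ; here is a small Python sketch using the question's probabilities:

```python
import math

probs = [0.5, 0.3, 0.2]                          # X, Y, Z
entropy = -sum(p * math.log2(p) for p in probs)  # 1. entropy of the source
print(entropy)        # ~1.485 bits/symbol: also the minimum average (part 2)
print(entropy / 1.6)  # 3. efficiency ~0.928 for an average length of 1.6
```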

10. Huffman Coding with Unequal Probabilities


Given a set of six symbols with the following probabilities:
Symbol Probability
S1 0.35
S2 0.10
S3 0.20
S4 0.15
S5 0.10
S6 0.10
Construct a Huffman tree and find the encoded bit sequence for the input string
"S1S3S2S6S4".
