Data Compression - Unit 2
C = 10 (4 × 2 = 8 bits)
B = 110 (3 × 3 = 9 bits)
A = 111 (3 × 3 = 9 bits)
This means that if the message size is greater, transmission takes a longer time, and vice versa.
• 3 bits = 2³ = 8 combinations
• 8 bits = 2⁸ = 256 combinations
• 1 byte = 8 bits
• 1 byte can hold a number between 0 and 255
• Decoding process (total transmitted size):
Encoded bits + (number of unique alphabets × 8) + total of all frequencies
Example:
27 + 4×8 + 15 = 74 bits
Questions:
1. Build a Huffman Tree:
Given the following characters and their frequencies, construct a Huffman tree and
determine the Huffman codes for each character:
A - 5, B - 9, C - 12, D - 13, E - 16, F - 45
2. Decode a Huffman Encoded String:
Given the Huffman tree you built in Exercise 1, decode the following binary sequence:
1100110011111010 (Using the above question generated code)
3. Find the Huffman Code for a Given String:
Consider the string "MISSISSIPPI". Compute the frequency of each character and
generate the Huffman encoding.
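The tree-building in questions 1–3 can be sketched in Python. This is a minimal illustration, not course code: the `huffman_codes` helper and its heap-of-dicts representation are my own, and tie-breaks may give different (equally optimal) codes.

```python
import heapq
from collections import Counter

def huffman_codes(freqs):
    """Build Huffman codes from a {symbol: frequency} map."""
    # Heap entries: (frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)  # two smallest frequencies
        f2, _, c2 = heapq.heappop(heap)
        # Prefix 0 to the left subtree's codes and 1 to the right subtree's.
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, counter, merged))
        counter += 1
    return heap[0][2]

freqs = Counter("MISSISSIPPI")  # {'M': 1, 'I': 4, 'S': 4, 'P': 2}
codes = huffman_codes(freqs)
bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes, bits)  # total encoded length: 21 bits
```

For MISSISSIPPI the code lengths come out as 1, 2, 3, 3 for the four symbols, so the encoded message needs 4×1 + 4×2 + 2×3 + 1×3 = 21 bits regardless of tie-breaks.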
More questions:
1. ABBCDBCCDAABBEEEBEAB (20 characters; 8 bits × 20 = 160 bits uncompressed)
For 1 character, we require 8 bits.
2. A:15, B:6, C:7, D:12, E:25, F:4, G:6, H:10, I:15
Example: ABCDE – 5 different characters available; they can fit into 3 bits. (How do we find these 3 bits? By computing 2ⁿ, where n represents the number of bits. So to fit 5 characters we need 3 bits, since 2³ = 8 ≥ 5.)
Now we get 5, 8, 7; again arrange them as a min-heap (we have to keep re-arranging the numbers to maintain the heap after each merge):
5, 7, 8
• Create the chart, write the codewords, and calculate the required bits.
• Now perform the encoding process.
Example:
Letter  Prob  Codeword
A1      0.2
A2      0.4
A3      0.2
A4      0.1
A5      0.1
Steps:
Questions:
Huffman Coding:
a) Sometimes, we can obtain more than one Huffman code due to different tie breaks during
the Huffman code construction. Construct two Huffman codes, H and H', for the following
data:
Symbol A B C D E
Here are the AKTU-style Huffman coding questions along with their answers:
Numerical/Problem-Solving Questions
6. Construct a Huffman tree for the given symbols and probabilities.
Symbol A B C D E F
Probability 0.05 0.1 0.15 0.2 0.25 0.25
Solution:
• Build the binary tree using the steps we learnt in class.
Final Huffman Codes (It may Vary):
• A → 000
• B → 001
• C → 010
• D → 011
• E → 10
• F → 11
8. Two Huffman codes H1 and H2 have the same average length, but H1 has higher variance.
Which one is better for transmission and why?
Answer:
H2 is better because lower variance means that the codeword lengths are more uniform,
leading to less fluctuation in transmission times and buffering. High variance can cause
jitter and inefficient resource allocation in real-time applications.
9. If all symbols have equal probabilities, what type of encoding is optimal? Is Huffman
coding still beneficial?
Answer:
When all symbols have equal probabilities, fixed-length encoding (e.g., plain ASCII) is
optimal. Huffman coding would assign codes of (nearly) the same length in this case,
making it no more efficient than fixed-length encoding.
Summary Table
Question                          Key Takeaway
What is Huffman coding?           Lossless compression algorithm that assigns shorter codes to frequent symbols.
Steps to build a Huffman tree?    Merge smallest probabilities, assign 0/1 recursively.
Prefix-free codes?                Yes, Huffman codes are always prefix-free.
Time complexity?                  O(n log n) using priority queues.
Variable-length encoding?         Huffman codes adapt to symbol frequency, unlike fixed-length codes.
High variance vs. low variance?   Low variance is preferable for smoother transmission.
Symbol A B C D E
Frequency 5 10 15 20 50
Questions:
Find Probability, Huffman code, Code- length
Avg length:
E[L] = ∑(Pᵢ × Lᵢ)  (∑ means the sum of a set of terms; Pᵢ is the probability of symbol i and Lᵢ its code length)
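As a sketch of this formula applied to the frequency table above: the code assignment below is one valid Huffman result for those frequencies (tie-breaks may produce other, equally optimal codes with the same average length).

```python
# Frequencies from the table above; probabilities are freq / 100.
freqs = {"A": 5, "B": 10, "C": 15, "D": 20, "E": 50}
total = sum(freqs.values())
probs = {s: f / total for s, f in freqs.items()}

# One possible Huffman assignment (an assumption for illustration;
# any optimal code has lengths 4, 4, 3, 2, 1 for A, B, C, D, E).
codes = {"E": "0", "D": "10", "C": "110", "B": "1110", "A": "1111"}

# E[L] = sum(P_i * L_i)
avg_len = sum(probs[s] * len(codes[s]) for s in probs)
print(avg_len)  # ≈ 1.95 bits per symbol
```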
TOPIC:
ADAPTIVE HUFFMAN CODING:
1. It is used when the probabilities of the symbols are not known in advance.
2. It is used for real-time data compression.
3. Adaptive Huffman coding updates the tree dynamically as symbols are processed.
4. Examples include audio and video streaming and real-time communication.
5. Adaptive Huffman coding includes 3 phases: updating the tree, encoding, and
decoding.
Update process:
- NYT (Not yet transmitted)
- Encoding is based on the tree which we are going to create in the updating process.
Steps:
1. At first there will be no node, so by default we start with only the NYT node.
2. The default weight of NYT is 0.
3. NYT always remains the leftmost leaf of the tree, even when the tree starts
growing in the next steps.
4. The initial node number of NYT is 51.
5. The node number is found as:
Node number = (2*n) - 1
// here n is the total number of alphabets, i.e. 26
So,
Node number = (2*26) - 1
= 51
6. Whenever a symbol is encountered for the first time, its weight will be 1
(this is the value we write inside the box).
7. If the same symbol occurs again and again, the weight of that symbol keeps
increasing.
8. Nodes are either internal or external:
External nodes: the leaves on the outer side (the symbols and NYT).
Internal nodes: the nodes on the inner side (in short, those having children).
Encoding process:
Message: a a r d v a r k
Algorithm:
Read the input symbol. Is this its first occurrence?
(Yes) Send the NYT code followed by the symbol's fixed code.
(No) Send the code for that symbol.
NYT code / send code: reach from the root node to NYT (or to the symbol's node) and read off the binary path, e.g. 0101.
1. Fixed Code:
Two parameters, e and r, chosen so that the alphabet size m satisfies m = 2ᵉ + r with 0 <= r < 2ᵉ (for m = 26: e = 4, r = 10).
- If 1 <= k <= 2r, the symbol is represented by the (e+1)-bit binary code of k-1.
- Otherwise, it is represented by the e-bit binary code of k-r-1.
Example 1:
B ( 1 <= 2 <= 20 ), [value of k is 2, as the position of B in the alphabet series is 2]
* The condition is true, so we need e+1 = 5 bits in total to represent the symbol B.
* Now we represent the binary code for the symbol B using 5 bits; this is case 1, i.e. k-1 (value of k is 2).
Now,
k-1 = 2-1 = 1.
So the binary code for B is 00001 (1 in 5 bits).
Example 2:
Z (value of k becomes 26)
▪ The second condition applies:
26 > 20
Now,
k-r-1 (k = 26, r = 10)
26-10-1 = 16-1 = 15
(we have to represent 15 in e bits, i.e. 4 bits, by converting 15 into
binary code)
To convert 15 into binary, perform the usual decimal-to-binary
conversion by repeatedly dividing by 2.
So the binary code for Z is 1111.
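The fixed-code rule above can be sketched as a small helper (the function name and the defaults e = 4, r = 10 for the 26-letter alphabet are my own illustration):

```python
def fixed_code(k, e=4, r=10):
    """Fixed code for the k-th letter of an m-letter alphabet, m = 2**e + r."""
    if 1 <= k <= 2 * r:
        return format(k - 1, f"0{e + 1}b")  # (e+1)-bit binary code of k-1
    return format(k - r - 1, f"0{e}b")      # e-bit binary code of k-r-1

print(fixed_code(2))   # B -> 00001
print(fixed_code(26))  # Z -> 1111
```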
DECODING:
Is it the NYT code?
NO: decode the element directly from the tree.
YES: read 'e' bits after NYT (the value of e is 4 and the value of r is 10); call their value 'p'.
Is the bit value 'p' < r?
YES: read one more bit and update p (p = 2*p + next bit).
NO: add 'r' to 'p' (add r to that element which we got).
Then add 1 to p to get the symbol's position in the alphabet.
YES
Step 2: ‘0000010100010000011000101110110001010’
• The next value is 1.
• According to the binary tree diagram, we move from the parent node, which has
value 1, to 'a'; the bit we use to reach 'a' is also 1.
• This means the node is not NYT.
• We simply decode that element with the symbol given by the binary tree.
• So for the second value we also get 'a'.
Step 3: ‘0000010100010000011000101110110001010’
• The next value is 0.
• So 0 leads to the NYT node, as we reach NYT from the parent node using 0 according to
the diagram.
• Read 'e' bits after NYT, i.e. take the next 4 bits: 1000.
How to convert?
Step 4: ‘0000010100010000011000101110110001010’
• Using the next two 0's we reach NYT; after the two 0's there is no further
child node, which is why we do not take the 3rd 0.
• Now we read e bits after NYT (i.e. after the two 0's).
• 0001 is 1, which is less than 10 (i.e. 1 < 10), so we read one more bit.
• The bits become 00011, i.e. 3.
• According to the flow chart we add +1 to 3, i.e. 4 = d.
• Now update your tree.
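The decoding flow chart can be sketched the same way (the `decode_fixed` helper is my own illustration; it returns the alphabet position and the number of bits consumed):

```python
def decode_fixed(bits, e=4, r=10):
    """Decode one fixed code from the front of `bits`.

    Returns (alphabet position starting at 1, number of bits consumed)."""
    p = int(bits[:e], 2)          # read e bits
    if p < r:
        p = 2 * p + int(bits[e])  # p < r: read one more bit
        used = e + 1
    else:
        p = p + r                 # p >= r: add r
        used = e
    return p + 1, used            # +1 gives the position in the alphabet

print(decode_fixed("00011"))  # (4, 5): 4th letter, i.e. 'd'
print(decode_fixed("1111"))   # (26, 4): 26th letter, i.e. 'z'
```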
TOPIC
Golomb Code:
• Denoted by Gm(n)
• Used for lossless data compression
• Golomb coding is a lossless data compression method that is particularly efficient for
encoding sequences of integers with a geometric distribution (i.e., when smaller
numbers occur more frequently than larger ones).
Steps:
Step 1: Compute the quotient q = ⌊n/m⌋ and write it in unary code.
Step 2: Compute the remainder r = n mod m and write it in (truncated) binary.
Step 3: Concatenate the results of step 1 and step 2.
Example:
Question: Design the Golomb code for 9 with divisor 4
Answer:
n = 9, m = 4, G4(9) = ?
Let's start:
Step 1:
Quotient:
Unary code of q = ⌊n/m⌋
q = ⌊9/4⌋ (we take only the integer part after dividing, not the decimal part), so
q = 2 (now we write q in unary: q 1's followed by one 0)
q = 110
Remainder:
r = 9 mod 4 (i.e. we take the remainder part) = 1
Step 2:
k = ⌈log₂ m⌉
k = ⌈log₂ 4⌉ = 2
• c = 2ᵏ - m
c = 2² - 4 = 0
r + c = 1 + 0 = 1
k = 2
So we represent 1 in 2 bits, i.e. 01.
Step 3:
• Now we simply concatenate the results of step 1 and step 2.
• Step 1 = 110, step 2 = 01
• 11001 is the result.
So the Golomb code G4(9) = 11001.
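The three steps can be combined into one sketch (the `golomb` helper is my own illustration; the r < c branch implements the short truncated-binary codes for divisors that are not powers of two, a case the worked example with m = 4 never hits):

```python
import math

def golomb(n, m):
    """Golomb code G_m(n): unary quotient + truncated binary remainder."""
    q, r = divmod(n, m)
    unary = "1" * q + "0"               # step 1: q ones followed by a zero
    k = math.ceil(math.log2(m))
    c = 2 ** k - m
    if r < c:                           # step 2: short (k-1)-bit remainder codes
        rem = format(r, f"0{k - 1}b") if k > 1 else ""
    else:
        rem = format(r + c, f"0{k}b")   # k-bit codes, offset by c
    return unary + rem                  # step 3: concatenate

print(golomb(9, 4))  # 11001
print(golomb(7, 5))  # 1010
```

Running it over n = 0..15 with m = 5 answers the question below.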
More questions:
1. Find the Golomb code for n = 0, 1, 2, 3, ..., 15, where m = 5
Tunstall Code
• Tunstall Code is a type of variable-to-fixed-length coding used in data compression.
• Tunstall coding replaces variable-length sequences of input symbols with fixed-
length codewords.
N + k(N-1) <= 2ⁿ
This inequality gives k, the number of iterations used to build the Tunstall code.
• n → number of bits per Tunstall codeword
• N → size of the source alphabet (number of letters)
• k → number of iterations
Example:
Letters Probability
a₁ 0.7
a₂ 0.2
a₃ 0.1
Given N = 3, we want to generate a 3-bit Tunstall code.
N + k(N-1) <= 2ⁿ
3 + k(3-1) <= 2³
2k <= 5
k <= 2.5
Since k is a natural number, we take k = 2.
1st Iteration:
Find the letter with the highest probability, which is a₁, and remove it from the base
table. Then extend it with each of the 3 letters, multiplying the probabilities:
Letters Probability
a₂ 0.2
a₃ 0.1
a₁a₁ 0.7×0.7=0.49
a₁a₂ 0.7×0.2=0.14
a₁a₃ 0.7×0.1=0.07
2nd Iteration: This is our last iteration (k = 2); after it the entries fit exactly into 3-bit codewords.
- Remove the entry with the highest probability (a₁a₁) and extend it with the letters
given in the question, i.e. a₁a₁ with a₁, a₂, a₃.
Letters Codeword
a₂ 000 [0]
a₃ 001 [1]
a₁a₂ 010 [2]
a₁a₃ 011 [3]
a₁ a₁ a₁ 100 [4]
a₁ a₁a₂ 101 [5]
a₁ a₁a₃ 110 [6]
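The two iterations above can be sketched as follows (the `tunstall` helper is my own illustration; the letters a, b, c stand for a₁, a₂, a₃):

```python
def tunstall(probs, n):
    """Build a Tunstall dictionary for n-bit codewords.

    probs: {letter: probability}. Returns {sequence: codeword}."""
    N = len(probs)
    k = (2 ** n - N) // (N - 1)  # number of iterations: N + k(N-1) <= 2^n
    entries = dict(probs)
    for _ in range(k):
        best = max(entries, key=entries.get)  # highest-probability entry
        p = entries.pop(best)                 # remove it from the table
        for letter, pl in probs.items():      # extend it with every letter
            entries[best + letter] = p * pl
    # Assign fixed-length n-bit codewords to the final entries.
    return {seq: format(i, f"0{n}b") for i, seq in enumerate(sorted(entries))}

code = tunstall({"a": 0.7, "b": 0.2, "c": 0.1}, 3)
print(code)  # 7 entries: b, c, ab, ac, aaa, aab, aac
```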
→ Find the highest probability from Table-1 & remove the entry & concatenate with others.
Letters Probability
B 0.3
C 0.1
AA 0.6 × 0.6 = 0.36
AB 0.6 × 0.3 = 0.18
AC 0.6 × 0.1 = 0.06
→ Now again, find the highest probability & perform the same:
Worked table (predictive coding: ŷᵢ is the previous sample used as the prediction, dᵢ = yᵢ − ŷᵢ is the residual, and xᵢ the mapped non-negative value):
yᵢ 32 33 35 39 37 38 39 40 40 40 40 39 40 40 41 40
ŷᵢ  0 32 33 35 39 37 38 39 40 40 40 40 39 40 40 41
dᵢ 32  1  2  4 -2  1  1  1  0  0  0 -1  1  0  1 -1
Tᵢ  0  9  8  6  2  4  3  2  1  1  1  1  2  1  1  0
xᵢ 32  2  4  8  3  2  2  2  0  0  0  1  2  0  2  1
2. Given a set of symbols with their respective probabilities, construct the Huffman code
and calculate the average code length.
Answer:
Example: Given symbols A, B, C, D with frequencies (0.4, 0.3, 0.2, 0.1), construct Huffman
codes.
Symbol Probability Huffman Code
A 0.4 0
B 0.3 10
C 0.2 110
D 0.1 111
Average Code Length:
L = (0.4×1) + (0.3×2) + (0.2×3) + (0.1×3) = 1.9 bits per symbol
3. Explain the concept of Extended Huffman Codes. How do they differ from standard
Huffman Codes?
Answer:
• Extended Huffman Codes use blocks of symbols instead of single symbols for
encoding.
• If the original Huffman coding doesn't give sufficient compression, combining multiple
symbols into a single unit helps in further reducing the code length.
• Difference:
o Standard Huffman coding assigns a code to each individual character.
o Extended Huffman coding assigns a code to groups of characters (pairs,
triplets, etc.).
4. Discuss Adaptive Huffman Coding. How does it adjust to changing data characteristics?
Answer:
• Adaptive Huffman Coding dynamically updates the Huffman tree as new symbols are
encountered.
• Unlike static Huffman coding (where frequencies are predetermined), adaptive
coding does not require a frequency table beforehand.
5. What are Rice Codes and Golomb Codes? Provide examples of each.
8. Determine whether the following code set is uniquely decodable: {0, 01, 10, 110}.
Answer:
• If no codeword is a prefix of another, the code is a prefix code and hence uniquely
decodable.
• Here, 0 is a prefix of 01, so the code is not a prefix code; the Sardinas–Patterson
test (the dangling suffix 1 leads to the suffix 0, which is itself a codeword)
confirms it is not uniquely decodable.
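The prefix check can be sketched as a one-liner (the helper name is mine; note it tests the prefix condition, which is sufficient but not necessary for unique decodability):

```python
def is_prefix_free(codewords):
    """True if no codeword is a prefix of another (instantaneous code)."""
    return not any(a != b and b.startswith(a)
                   for a in codewords for b in codewords)

print(is_prefix_free({"0", "01", "10", "110"}))   # False: 0 is a prefix of 01
print(is_prefix_free({"0", "10", "110", "111"}))  # True
```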
10. Explain the concept of coding redundancy and its impact on compression efficiency.
Answer:
• Coding redundancy occurs when more bits than necessary are used to represent
data.
• Impact:
o Reduces compression efficiency.
o Huffman coding reduces redundancy by assigning shorter codes to frequent
symbols.
NUMERICAL QUESTIONS:
8. Tunstall Coding
Construct a Tunstall Code for the following source probabilities using n = 3 bits per codeword:
Symbol Probability
A 0.5
B 0.3
C 0.2
Find the generated codewords.
9. Entropy Calculation
For a source generating symbols X, Y, and Z with probabilities 0.5, 0.3, and 0.2, calculate:
1. Entropy of the source.
2. Minimum average bits required per symbol.
3. Efficiency of Huffman coding if the average code length obtained is 1.6 bits per
symbol.
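The three calculations can be sketched directly, assuming the stated average length of 1.6 bits per symbol:

```python
import math

probs = [0.5, 0.3, 0.2]

# 1. Entropy H = -sum(p * log2(p))
H = -sum(p * math.log2(p) for p in probs)

# 2. By the source coding theorem, H is the minimum average bits per symbol.
# 3. Efficiency = H / average code length
efficiency = H / 1.6

print(round(H, 4), round(efficiency, 4))  # ≈ 1.4855 bits/symbol, ≈ 92.8%
```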