String Compression
Consider a simple message (or file) which is stored as a string of characters using the ASCII code. Each
character in the message requires eight bits of storage. For example, character A (ASCII code 65) would
be stored as 01000001, character B (ASCII 66) would be stored as 01000010, etc. In fact, this very
paragraph would require 3472 bits of storage (434 characters and eight bits for each character).
Data compression is a technique which reduces the size of a file or data collection without affecting the
information contained in the file. A compressed file, obviously, requires less storage space. Moreover,
electronic transfer of a compressed file is faster than the transfer of an uncompressed file, and a smaller
file is consequently less susceptible to transmission errors.
One of the simplest but most efficient methods of data compression was introduced by David Huffman in his
paper ``A method for the construction of minimum redundancy codes.'' Huffman's method is very intuitive.
Unlike the ASCII code, which uses exactly eight bits to encode each character, a Huffman code uses a
variable number of bits for each character. With Huffman's scheme, a common character such as a or e
might be encoded with perhaps just two bits, and less frequently used characters like q or # would be
encoded with a longer bit string. Consequently, the most frequently occurring characters are encoded with a
minimal number of bits. (Morse code uses the same technique: E is encoded as a single dot, T as just one
dash, but Q is dash-dash-dot-dash.) Because ASCII encodes every character with eight bits, we say that
the ASCII code is a fixed length code. Huffman's code is a variable length code.
We'll begin with a trivial example.
Suppose we wish to encode the string
SHE-SELLS-SEA-SHELLS
The ASCII code would require 160 bits (20 characters each requiring 8 bits).
Notice, however, that the characters which appear in the string have the following frequencies (and relative
frequencies):

Character   Frequency   Relative Frequency
A           1           .05
H           2           .10
-           3           .15
E           4           .20
L           4           .20
S           6           .30
A possible variable length code which uses just 2 bits to encode frequently used characters like S or L
might be:

Character   Codeword   Frequency of Character
A           0000       1
H           0001       2
-           001        3
E           10         4
L           11         4
S           01         6
We will refer to this code as Code 1. We will also use the term codeword to denote an individual
character code. Thus, 0000 and 001 are codewords.
Notice that S, which occurs 6 times (or 30% of the time), is encoded using just 2 bits, while A, which occurs
only once, is encoded with 4 bits.
With this code, the string, SHE-SELLS-SEA-SHELLS, can be encoded with just 49 bits:
0100011000101101111010010110000000101000110111101
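Producing such an encoding is a straightforward table lookup. Here is a minimal C++ sketch (an illustration only, with Code 1 hard-coded):

    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        // Code 1, keyed by character.
        std::map<char, std::string> code1 = {
            {'A', "0000"}, {'H', "0001"}, {'-', "001"},
            {'E', "10"}, {'L', "11"}, {'S', "01"}
        };
        std::string text = "SHE-SELLS-SEA-SHELLS";
        std::string bits;
        for (char c : text)
            bits += code1[c];            // concatenate the codewords
        std::cout << bits << '\n'        // the 49-bit string above
                  << bits.size() << " bits\n";
    }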
You may wonder why we didn't use a more efficient code like:
Character   Codeword
S           0
L           1
E           00
-           11
H           01
A           10

(Code 2)
Using Code 2 we can encode the string SHE-SELLS-SEA-SHELLS as
001001100011011000101100100110
which uses a mere 30 bits. Certainly, compressing the string to 30 bits is more efficient than storing it as 49
bits. So, why not use Code 2? It does the job with fewer bits.
To answer this question, consider just the simple substring, SHE. Using the ASCII coding scheme,
SHE is encoded as
010100110100100001000101
Now, if we wish to decode (or translate this bit string back to ordinary, readable text), we simply look at
each successive group of exactly 8 bits:
01010011 - (83) S
01001000 - (72) H
01000101 - (69) E
That's all there is to it.
Now, let's encode SHE using our more efficient Code 2:
00100.
Wow! Just 5 bits!
Now, given this five bit string, how do we decode it? Do we decode 00100 as EAS (00 10 0), as
SSLE (0 0 1 00), or perhaps as SSLSS (0 0 1 0 0)? You really cannot tell without putting some
marker between the codewords for each letter.
On the other hand, using Code 1 we encode SHE as
01000110.
To decode this bit string, simply scan the string from left to right until you find a complete codeword. You
have just decoded the first character. Continue the process until you have scanned the entire bit string. You
will see there is no ambiguity:

01     0001     10
S      H        E
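This left-to-right scan is easy to mechanize. Here is a minimal C++ sketch (an illustration only, with the table of Code 1 hard-coded; as in the programming exercises later on, bits are represented by the characters 0 and 1):

    #include <iostream>
    #include <map>
    #include <string>

    int main() {
        // Code 1, keyed by codeword.
        std::map<std::string, char> code1 = {
            {"0000", 'A'}, {"0001", 'H'}, {"001", '-'},
            {"10", 'E'}, {"11", 'L'}, {"01", 'S'}
        };
        std::string bits = "01000110";   // the Code 1 encoding of SHE
        std::string current, decoded;
        for (char b : bits) {
            current += b;                // accumulate bits left to right
            auto it = code1.find(current);
            if (it != code1.end()) {     // a complete codeword: emit its character
                decoded += it->second;
                current.clear();
            }
        }
        std::cout << decoded << '\n';    // prints SHE
    }

Notice that the scan emits a character as soon as the accumulated bits form a codeword; with Code 1 this is always safe, for the reason explored in Exercise 1c.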
Exercise 1.
a. Decode each of the following bit strings which are encoded with Code 1:
01000011001011011110100110101101
00010000110010001000001001000110101101
b. Now, using the following bit string, show that Code 2 is ambiguous. In other words, find two possible
translations of the following bit string:
01011100011011001000
c. Explain why Code 1 is not ambiguous. What property does Code 1 have which ensures that each bit string
has a unique translation?
Prefix Codes
For arbitrary variable length codes, we may not be able to determine the end of one codeword. For example,
using Code 2, do we interpret the bit string 0101 as 0.10.1 (SAL) or 01.01 (HH) or 0.1.0.1 (SLSL)? Is the
initial letter encoded as 0 (S) or 01 (H)? Without some kind of marker between codewords, it is impossible
to determine where each codeword ends. Code 1, however, does not have this drawback. The bit string
00010000001 can have only one interpretation:
0001.0000.001
Let's look at this string. As we scan from left to right we see:
0 is not a valid code. Continue scanning
00 is not a valid code. Continue
000 is not a valid code. Continue
0001 is the code for H. Done.
OK, we have unearthed an H. But how do we know that there is no codeword that starts with 0001? How do
we know that 00011 is not the code for some other letter?
If you look closely at the codewords of Code 1, you will notice that no codeword is a prefix of another
codeword. The codeword for H is 0001. No other codeword starts with 0001. The codeword for S is
01. Nothing else begins with 01. Thus, using a left-right scan, once we read 0001 we recognize the
codeword for H. Further, we know that 0001 is not the start of the codeword for some other character,
since no other character code begins with 0001.
Definition.
A code is a prefix code (or has the prefix property) if no codeword is the prefix of another
codeword.
Again, Code 1 has the prefix property but Code 2 does not. The codes which we will develop will have the
prefix property.
A Huffman code turns out to be an optimal prefix code. We will be more precise about the meaning of
optimal a bit later. First, let's see how the Huffman code is constructed.
Once we look at the mechanics, we will take a closer look under the hood.
Codes and Binary Trees:
Let's look at Code 1 again:
Character   Codeword   Frequency   Relative Frequency
A           0000       1           .05
H           0001       2           .10
-           001        3           .15
E           10         4           .20
L           11         4           .20
S           01         6           .30
Total                  20          1.00
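The codewords of Code 1 correspond to the following binary tree, in which each left branch is labeled 0, each right branch is labeled 1, and each node shows the total frequency of the leaves beneath it:

                        (20)
                    0 /      \ 1
                 (12)          (8)
                0 /  \ 1     0 /  \ 1
              (6)   S (6)   E (4)  L (4)
             0 /  \ 1
           (3)    - (3)
          0 /  \ 1
        A (1)  H (2)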
Notice that the concatenation of the branch labels on the path from the root to any leaf gives the codeword
for the symbol stored in the leaf. For example, the path from the root to the leaf labeled H is
left-left-left-right, and the codeword for H is 0001. The path to L is right-right, so the codeword for L is 11.
Huffman's algorithm constructs such a binary tree in a bottom-up manner. The Huffman tree yields an
optimal prefix code for the symbols contained in the leaves.
Building the Huffman Tree.
Given the frequencies of each character in a file, the Huffman tree can be constructed as follows:
1. For each symbol to be encoded, make a binary tree consisting of a single root. Each of these root nodes
should hold the frequency (or relative frequency) of one symbol. (Ultimately, these nodes will be the leaves
of our Huffman tree.)
2. Choose the two trees with the smallest root values and merge them into a new tree. In our example, the
trees labeled A and H are chosen first. The root of the new tree holds the value 3, obtained as the sum of the
values stored in the roots of its left and right subtrees.
3. Continue the process, always forming a new tree from the two trees with the lowest root values. Here we
merge the two trees which each have 3 stored in the root. Four trees remain, with root values 6, 4, 4, and 6.
4. Continue this process until only a single tree remains; this final tree is a representation of the Huffman
code. Remember: always merge the two subtrees which have the smallest root values. Here the trees with
root values 4 and 4 are merged next. Now merge the two trees with root values 6. Two trees remain, with
root values 12 and 8, and merging them yields the finished tree.
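As a concrete illustration, here is a minimal C++ sketch of this construction (the type Node and the function buildHuffman are our own names; the priority queue keyed on root value anticipates the implementation suggestions in Exercise 4):

    #include <queue>
    #include <utility>
    #include <vector>

    // One node of the Huffman tree: a leaf holds a character, and every
    // node holds the total frequency of the leaves beneath it.
    struct Node {
        int freq;
        char symbol;                            // meaningful only for leaves
        Node *left = nullptr, *right = nullptr;
    };

    // Order the queue so the tree with the smallest root value is on top.
    struct ByFreq {
        bool operator()(const Node* a, const Node* b) const {
            return a->freq > b->freq;
        }
    };

    // Step 1: make a single-node tree per symbol. Then repeatedly merge
    // the two trees with the smallest root values until one tree remains.
    Node* buildHuffman(const std::vector<std::pair<char, int>>& freqs) {
        std::priority_queue<Node*, std::vector<Node*>, ByFreq> pq;
        for (auto [c, f] : freqs)
            pq.push(new Node{f, c});
        while (pq.size() > 1) {
            Node* a = pq.top(); pq.pop();       // smallest root value
            Node* b = pq.top(); pq.pop();       // next smallest
            pq.push(new Node{a->freq + b->freq, 0, a, b});
        }
        return pq.top();                        // the finished Huffman tree
    }

For our example, buildHuffman({{'A',1}, {'H',2}, {'-',3}, {'E',4}, {'L',4}, {'S',6}}) builds the tree shown earlier (up to the left/right order of children).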
Reading the branch labels along the path from the root to each leaf of the finished tree yields exactly the
codewords of Code 1 (A = 0000, H = 0001, - = 001, E = 10, L = 11, S = 01). The tree also drives decoding:
to translate the compressed string 010001100010110111101... back into SHE-SELLS..., start at the root and
scan the bits from the left, moving to the left child on 0 and to the right child on 1; on reaching a leaf,
output its character and return to the root.
Exercise 2:
a. Construct the Huffman tree for the six characters A, B, C, D, E, F with relative frequencies
.22, .13, .33, .10, .20, .02 respectively.
b. Use the tree to determine the codeword for each character of this short alphabet.
c. Encode the string AECBCAF.
Exercise 3:
In terms of n, the number of characters to be encoded, determine the running time of Huffman's algorithm:
a. when the set of trees is stored as a linked list;
b. when the set of trees is implemented as a priority queue keyed on the frequency of a character. Assume the
priority queue is implemented as a heap.
Exercise 4. Encoding a text file.
Write a program which will encode a text file using the Huffman algorithm.
The user interface should be very simple. Just provide the user with two prompts:
Input file:
Output file:
so that the user may supply the name of the input (text) file as well as a name for the compressed file. Your
output should actually be two files: the compressed file, and also a file with the frequency table. This
second file should have the same name as the compressed file and an extension of feq. For example, if the
compressed file is called something.zip then the second file should be something.feq
Note:
Although a "real" Huffman tree deals with bits, for convenience, you will use the characters 0 and 1.
Thus, the number of 0s and 1s in your encoded file will represent the number of bits necessary for
compression.
To encode text using a Huffman code you must:
1. Build a frequency chart for all characters in your text.
2. Build a Huffman tree.
3. Make a list of all codewords.
4. Using the list of codewords, encode your text.
Here are a few suggestions to help you with the implementation.
Since ASCII values fall in the range 0..127,
use a simple array (int freq[128]) to store frequencies.
For example, if A has frequency 13, then freq[65] will have the value 13. Of course, not all ASCII
characters will appear in your text, so many entries in the frequency array will be 0.
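A frequency count along these lines might look as follows (a sketch only; it reads the text from standard input and prints the nonzero counts):

    #include <iostream>

    int main() {
        int freq[128] = {0};                  // one counter per ASCII code
        char ch;
        while (std::cin.get(ch)) {            // scan the text character by character
            unsigned char u = static_cast<unsigned char>(ch);
            if (u < 128)                      // plain ASCII only
                ++freq[u];
        }
        // freq[65], for example, now holds the number of A's in the text.
        for (int c = 0; c < 128; ++c)
            if (freq[c] > 0)
                std::cout << c << ' ' << freq[c] << '\n';
    }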
When you build the Huffman tree, you must make a number of design decisions.
As you know, the algorithm builds the tree incrementally. At each step, the subtrees with minimum roots
must be chosen. The most efficient method for determining these two minimum-valued trees is to
use a heap-based priority queue. Unless you are really stuck, use a priority queue.
If you plan to build your table of codewords in a bottom-up fashion, you will need to maintain a table which
holds the address of each leaf of the tree. Also, each node must contain the address of its parent.
Simply keep the codewords in an array indexed from 0 to 127 so that you may access a character's
codeword directly, i.e. without a search. To output your frequency table to a file, simply list the 128
frequencies in the array.
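Alternatively, the whole table of codewords can be generated top-down with a single traversal of the finished tree. A sketch, reusing the Node type from the tree-building sketch above:

    #include <string>

    // Walk the tree, appending '0' for each left branch and '1' for each
    // right branch. At a leaf the accumulated path is that character's
    // codeword; store it in the table indexed by ASCII value.
    void makeCodewords(const Node* t, const std::string& path,
                       std::string table[128]) {
        if (t->left == nullptr && t->right == nullptr) {
            table[static_cast<int>(t->symbol)] = path;   // leaf reached
            return;
        }
        makeCodewords(t->left,  path + '0', table);
        makeCodewords(t->right, path + '1', table);
    }

Calling makeCodewords(root, "", table) fills in one codeword per character (the exact bit patterns depend on the left/right order of children in your tree).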
For the encoding step itself, just use the table of codewords from step three.
Exercise 5: Decoding a compressed file.
Write a program which will decode a compressed file. Your program should prompt the user for two
filenames:
Compressed file:
Text file:
If the name of the compressed file is something.zip, your program should use that file and also the file
something.feq as input.
Exercise 6:
1. Use the programs in the previous exercises to compress (and decode)
a. The Declaration of Independence
b. The US Constitution
c. Shakespeare's Hamlet
(The text of these files can be downloaded at .....)
2. Use Winzip to compress each of the files in part 1.
3. Complete the following chart:
                              SIZE IN BYTES
                              Text file    Winzip    Huffman
Declaration of Independence
Constitution
Hamlet
Remember: you are using the characters 0 and 1 in your program. These are not single bits; make sure
you take that into account. You do not need to write any programs to find these values. You can get all this
information via Windows and Word.
Huffman Codes: mathematical questions.
Now that we have looked at the Huffman algorithm, let's take a closer look at some of the mathematical
properties of a Huffman tree.
Prefix property:
Exercise 7: We have stated that Huffman's algorithm produces a code with the prefix property. Explain
why this is the case. You do not have to give a rigorous mathematical proof; just think about how the tree is
constructed and give a plausible explanation. A few sentences will suffice.
Optimal codes.
We have stated that the Huffman code is optimal, so perhaps it is now time to give a more precise definition
of this term.
We first begin by defining the cost of the code or, equivalently, of the tree.
Definition.
Suppose T is a tree representing a prefix code for a file. The cost of T, C(T), is the number of bits necessary
to encode the file. Thus
C(T) = Σ frequency(c) * length(c), summed over every character c,
where length(c) is the length of the path from the root of T to the leaf representing c.
Note: length(c) is also the number of bits in the codeword for character c.
Example:
Consider the Huffman tree, T, which we built to encode the message SHE-SELLS-SEA-SHELLS. Here
A contributes 1*4 (frequency 1, codeword length 4), H contributes 2*4, the dash 3*3, E and L each 4*2,
and S 6*2, so
C(T) = 1*4 + 2*4 + 3*3 + 4*2 + 4*2 + 6*2 = 49,
exactly the 49 bits we counted earlier.
Exercise 8:
Show by means of a counterexample that a full binary tree may represent a prefix code which is not optimal,
even if the letters with lower frequencies have the longer codewords.
Suggestion: build another full tree for SHE-SELLS-SEA-SHELLS which has a higher cost than the tree of
our previous examples.
Exercise 9: Let x and y be two characters with the lowest frequencies. (Arbitrarily, assume frequency(x) <=
frequency(y).) Prove that there is an optimal tree where x and y are siblings at the maximum depth of the
tree.
Hint:
Assume that you have an optimal prefix tree, T, and that a and b are siblings at the maximum depth, with
frequency(a) <= frequency(b).
Now, because x and y have the two smallest frequencies, frequency(x) <= frequency(a) and frequency(y) <=
frequency(b). Also, length(a) >= length(x) and length(b) >= length(y).
Now switch a and x, resulting in a new tree T'. Show C(T') <= C(T). Next, in T', switch y and b, resulting
in a tree T''. Show C(T'') <= C(T').
Exercise 10:
Prove that Huffman's algorithm produces an optimal prefix code tree.
Hint: Use mathematical induction on n, the number of characters/leaves.
For n = 1, Huffman's algorithm produces a tree with one node, obviously optimal.
Now assume that for n leaves, Huffman's algorithm produces an optimal tree.
Show that Huffman's algorithm produces an optimal tree for n+1 leaves.
Let T be an optimal tree with n+1 leaves. We may assume that in an optimal tree the two characters, say x
and y, with the lowest frequencies are siblings at the lowest level of the tree. (Why may we assume that?)
Now remove x and y and replace them with a new character z with frequency(z) = frequency(x) +
frequency(y). Call the new tree T'. Compare T' with the Huffman tree constructed from these n characters;
by the induction hypothesis, that Huffman tree is optimal. Now finish the proof.
Exercise 11:
Suppose that the frequencies for the characters a, b, c, d, e, f, g, and h are the first 8 Fibonacci numbers:
1, 1, 2, 3, 5, 8, 13, and 21.
a. Construct the Huffman tree.
b. What is the cost of the tree?
c. Now, determine the cost of the tree for n characters when the frequencies are the first n Fibonacci numbers.