Data Compression Project - Huffman Algorithm
Mini Project Report Submitted to DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING By Samir Sheri and Satvik N In partial fulfilment of the requirements for the award of the degree
BACHELOR OF ENGINEERING
IN COMPUTER SCIENCE AND ENGINEERING
R V College of Engineering
(Autonomous Institute, Affiliated to VTU) BANGALORE - 560059
May 2012
DECLARATION
We, Samir Sheri and Satvik N, bearing USN numbers 1RV09CS093 and 1RV09CS095 respectively, hereby declare that the dissertation entitled Data Compression Project, completed and written by us, has not previously formed the basis for the award of any degree, diploma or certificate of any other University.
Bangalore
R V COLLEGE OF ENGINEERING
(Autonomous Institute Affiliated to VTU) DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
CERTIFICATE This is to certify that the dissertation entitled Data Compression Project, which is being submitted herewith for the award of B.E., is the result of the work completed by Samir Sheri and Satvik N under my supervision and guidance.
ACKNOWLEDGEMENT
The euphoria and satisfaction of the completion of the project will be incomplete without thanking the persons responsible for this venture. We acknowledge RVCE (Autonomous under VTU) for providing an opportunity to create a mini-project in the 5th semester. We express our gratitude towards Prof. B.S. Satyanarayana, Principal, R.V.C.E., for the constant encouragement and facilities extended in the completion of this project. We would like to thank Prof. N.K. Srinath, HOD, CSE Dept., for providing excellent lab facilities for the completion of the project. We would personally like to thank our project guides Chaitra B.H. and Suma B., and also the lab in-charge, for providing timely assistance and guidance. We are indebted to the co-operation given by the lab administrators and lab assistants, who have played a major role in bringing out the mini-project in its present form. Bangalore Samir Sheri, 6th semester, CSE, USN: 1RV09CS093 Satvik N, 6th semester, CSE, USN: 1RV09CS095
ABSTRACT
The project Data Compression Techniques is aimed at developing programs that transform a string of characters in some representation (such as ASCII) into a new string (of bits, for example) which contains the same information but whose length is as small as possible. Compression is useful because it helps reduce the consumption of resources such as storage space or transmission capacity. The design of data compression schemes involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (e.g., when using lossy data compression), and the computational resources required to compress and uncompress the data. Many data processing applications require storage of large volumes of data, and the number of such applications is constantly increasing as the use of computers extends to new disciplines. Compressing data to be stored or transmitted reduces storage and/or communication costs. When the amount of data to be transmitted is reduced, the effect is that of increasing the capacity of the communication channel. Similarly, compressing a file to half of its original size is equivalent to doubling the capacity of the storage medium. It may then become feasible to store the data at a higher, thus faster, level of the storage hierarchy and reduce the load on the input/output channels of the computer system.
Contents
ACKNOWLEDGEMENT
ABSTRACT
CONTENTS
1 INTRODUCTION
  1.1 SCOPE
2 REQUIREMENT SPECIFICATION
3 Compression
  3.1 A Naive Approach
  3.2 The Basic Idea
  3.3 Building the Huffman Tree
  3.4 An Example
    3.4.1 An Example: go go gophers
    3.4.2 Example Encoding Table
    3.4.3 Encoded String
4 Decompression
  4.1 Storing the Huffman Tree
  4.2 Creating the Huffman Table
  4.3 Storing Sizes
Conclusion
APPENDICES
Chapter 1 INTRODUCTION
The project Data Compression Techniques is aimed at developing programs that transform a string of characters in some representation (such as ASCII) into a new string (of bits, for example) which contains the same information but whose length is as small as possible. Compression is useful because it helps reduce the consumption of resources such as storage space or transmission capacity. The design of data compression schemes involves trade-offs among various factors, including the degree of compression, the amount of distortion introduced (e.g., when using lossy data compression), and the computational resources required to compress and uncompress the data.
1.1 SCOPE
Data compression techniques find applications in almost all fields. To list a few:

Audio: Audio data compression reduces the transmission bandwidth and storage requirements of audio data. Audio compression algorithms are implemented in software as audio codecs. Lossy audio compression algorithms provide higher compression at the cost of fidelity and are used in numerous audio applications. These algorithms almost all rely on psychoacoustics to eliminate less audible or meaningful sounds, thereby reducing the space required to store or transmit them.
Video: Video compression uses modern coding techniques to reduce redundancy in video data. Most video compression algorithms and codecs combine spatial image compression and temporal motion compensation. Video compression is a practical implementation of source coding in information theory. In practice, most video codecs also use audio compression techniques in parallel to compress the separate, but combined, data streams.

Grammar-based codes: These can compress highly repetitive text extremely well, for instance biological data collections of the same or related species, huge versioned document collections, internet archives, etc. The basic task of grammar-based codes is constructing a context-free grammar deriving a single string. Sequitur and Re-Pair are practical grammar compression algorithms for which public implementations are available.
Chapter 3 Compression
We'll look at how the string "go go gophers" is encoded in ASCII, how we might save bits using a simpler coding scheme, and how Huffman coding is used to compress the data, resulting in still more savings.
3.1 A Naive Approach
With an ASCII encoding (8 bits per character) the 13-character string "go go gophers" requires 104 bits. The table below lists the ASCII value of each character involved, together with a possible 3-bit encoding.

character   ASCII value   3-bit code
g           103           000
o           111           001
p           112           010
h           104           011
e           101           100
r           114           101
s           115           110
space        32           111
The string "go go gophers" would be written (coded numerically) as 103 111 32 103 111 32 103 111 112 104 101 114 115. Although not easily readable by humans, this would be written as the following stream of bits (the spaces would not be written, just the 0s and 1s):

01100111 01101111 00100000 01100111 01101111 00100000 01100111 01101111 01110000 01101000 01100101 01110010 01110011

Since there are only eight different characters in "go go gophers", it is possible to use only 3 bits to encode the different characters. We might, for example, use the 3-bit encoding in the table above, though other 3-bit encodings are possible. Now the string "go go gophers" would be encoded as 0 1 7 0 1 7 0 1 2 3 4 5 6 or, as bits:

000 001 111 000 001 111 000 001 010 011 100 101 110

By using three bits per character, the string "go go gophers" uses a total of 39 bits instead of 104 bits. More bits can be saved if we use fewer than three bits to encode characters like g, o, and space that occur frequently, and more than three bits to encode characters like e, p, h, r, and s that occur less frequently in "go go gophers".
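To make the arithmetic above concrete, the following small standalone C++ sketch (not part of the project code; the names are our own) counts the distinct characters in the string and compares the ASCII cost with the cost of the smallest fixed-length code:

#include <cmath>
#include <iostream>
#include <set>
#include <string>

int main() {
    std::string text = "go go gophers";
    std::set<char> alphabet(text.begin(), text.end());

    // Smallest fixed-length code: ceil(log2(number of distinct characters)) bits per character.
    int fixedBits = (int)std::ceil(std::log2((double)alphabet.size()));

    std::cout << "distinct characters: " << alphabet.size() << "\n";               // 8
    std::cout << "ASCII cost         : " << text.size() * 8 << " bits\n";          // 104
    std::cout << "fixed-length cost  : " << text.size() * fixedBits << " bits\n";  // 39
    return 0;
}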
3.2 The Basic Idea

This is the basic idea behind Huffman coding: use fewer bits for more frequently occurring characters. We'll see how this is done using a tree that stores characters at the leaves, and whose root-to-leaf paths provide the bit sequences used to encode the characters. We'll use Huffman's algorithm to construct a tree that is used for data compression. We'll assume that each character has an associated weight equal to the number of times the character occurs in a file, for example. In the "go go gophers" example, the characters g and o have weight 3, the space has weight 2, and the other characters have weight 1. When compressing a file we'll need to calculate these weights; we'll ignore this step for now and assume that all character weights have been calculated.
3.3 Building the Huffman Tree

Huffman's algorithm assumes that we're building a single tree from a group (or forest) of trees. Initially, all the trees have a single node containing a character and the character's weight. Trees are combined by picking two trees and making a new tree from them. This decreases the number of trees by one at each step, since two trees are combined into one tree. The algorithm is as follows:

1. Begin with a forest of trees. All trees have one node, with the weight of the tree equal to the weight of the character in the node. Characters that occur most frequently have the highest weights; characters that occur least frequently have the smallest weights.

2. Repeat this step until there is only one tree: choose the two trees with the smallest weights, call these trees T1 and T2. Create a new tree whose root has a weight equal to the sum of the weights T1 + T2, whose left subtree is T1 and whose right subtree is T2.

3. The single tree left after the previous step is an optimal encoding tree.
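As an illustration of this procedure, the sketch below builds the tree with a std::priority_queue of node pointers. It uses our own Node type and is not the project's implementation; the project uses its Charnode class and a hand-written min-heap (see the appendix listings).

#include <iostream>
#include <map>
#include <queue>
#include <string>
#include <vector>

struct Node {
    char ch;       // meaningful only at leaves
    int weight;    // character count, or sum of the children's weights
    Node *left, *right;
    Node(char c, int w, Node *l = nullptr, Node *r = nullptr)
        : ch(c), weight(w), left(l), right(r) {}
};

// Order the queue so that the tree with the smallest weight comes out first.
struct ByWeight {
    bool operator()(const Node *a, const Node *b) const { return a->weight > b->weight; }
};

Node *buildHuffmanTree(const std::map<char, int> &freq) {
    std::priority_queue<Node *, std::vector<Node *>, ByWeight> forest;
    for (auto &p : freq)
        forest.push(new Node(p.first, p.second));        // one-node trees

    while (forest.size() > 1) {                          // repeat until one tree remains
        Node *t1 = forest.top(); forest.pop();           // the two smallest weights
        Node *t2 = forest.top(); forest.pop();
        forest.push(new Node('\0', t1->weight + t2->weight, t1, t2));
    }
    return forest.top();                                 // the optimal encoding tree
}

int main() {
    std::string text = "go go gophers";
    std::map<char, int> freq;
    for (char c : text) freq[c]++;                       // character weights
    std::cout << "root weight: " << buildHuffmanTree(freq)->weight << "\n";   // 13
    return 0;
}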
3.4 An Example

3.4.1 An Example: go go gophers
We'll use the string "go go gophers" as an example. Initially we have a forest of one-node trees; each node carries a weight/count that represents the number of times the node's character occurs: g and o with count 3, space with count 2, and e, h, p, r, s with count 1 each.
3.4.2 Example Encoding Table

The character encoding induced by the final tree is shown below, where again 0 is used for left edges and 1 for right edges (this assignment is consistent with the encoded string in the next section):

g = 00, o = 01, space = 100, e = 101, s = 1100, h = 1101, p = 1110, r = 1111
3.4.3 Encoded String
The string "go go gophers" would be encoded as shown (with spaces used for easier reading; the spaces wouldn't appear in the real encoding):

00 01 100 00 01 100 00 01 1110 1101 101 1111 1100

In total, 37 bits are used to encode "go go gophers". There are several trees that yield an optimal 37-bit encoding of "go go gophers". The tree that actually results from a programmed implementation of Huffman's algorithm will be the same each time the program is run for the same weights (assuming no randomness is used in creating the tree).
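Given such a table, encoding is just a per-character lookup and concatenation. The sketch below hard-codes one optimal table consistent with the encoded string above; it is illustrative only and not the project's CompressionWriting class.

#include <iostream>
#include <map>
#include <string>

int main() {
    // One optimal code assignment for "go go gophers" (matches the encoding shown above).
    std::map<char, std::string> code = {
        {'g', "00"}, {'o', "01"}, {' ', "100"}, {'e', "101"},
        {'s', "1100"}, {'h', "1101"}, {'p', "1110"}, {'r', "1111"}
    };

    std::string text = "go go gophers", bits;
    for (char c : text)
        bits += code[c];                    // look up each character's code

    std::cout << bits << "\n";              // the 37-bit encoding
    std::cout << bits.size() << " bits\n";  // 37
    return 0;
}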
Chapter 4 Decompression
Generally speaking, the process of decompression is simply a matter of translating the stream of prefix codes into individual byte values, usually by traversing the Huffman tree node by node as each bit is read from the input stream (reaching a leaf node necessarily terminates the search for that particular byte value). Before this can take place, however, the Huffman tree must somehow be reconstructed.
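A minimal sketch of that bit-by-bit walk is shown below. It assumes a simple Node type of our own rather than the project's Charnode/Decompressor classes, and it reads the "bits" from a string of '0'/'1' characters instead of a real bit stream.

#include <iostream>
#include <string>

struct Node {
    char ch;
    Node *left, *right;
    Node(char c, Node *l = nullptr, Node *r = nullptr) : ch(c), left(l), right(r) {}
    bool isLeaf() const { return left == nullptr && right == nullptr; }
};

// Translate a stream of prefix codes back into characters by walking the tree:
// 0 means go left, 1 means go right, and reaching a leaf emits that leaf's character.
std::string decode(const Node *root, const std::string &bits) {
    std::string out;
    const Node *cur = root;
    for (char b : bits) {
        cur = (b == '0') ? cur->left : cur->right;
        if (cur->isLeaf()) {
            out += cur->ch;
            cur = root;              // restart at the root for the next code word
        }
    }
    return out;
}

int main() {
    // A tiny hand-built tree: 'g' = 0, 'o' = 10, ' ' = 11.
    Node *root = new Node('\0', new Node('g'),
                          new Node('\0', new Node('o'), new Node(' ')));
    std::cout << decode(root, "01011010") << "\n";   // prints "go go"
    return 0;
}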
4.1 Storing the Huffman Tree

In the simplest case, where character frequencies are fairly predictable, the tree can be preconstructed (and even statistically adjusted on each compression cycle) and thus reused every time, at the expense of at least some measure of compression efficiency. Otherwise, the information to reconstruct the tree must be sent a priori. A naive approach might be to prepend the frequency count of each character to the compression stream. Unfortunately, the overhead in such a case could amount to several kilobytes, so this method has little practical use. Another method is to simply prepend the Huffman tree, bit by bit, to the output stream. For example, assuming that the value 0 represents a parent node and 1 a leaf node, whenever the latter is encountered the tree-building routine simply reads
the next 8 bits to determine the character value of that particular leaf. The process continues recursively until the last leaf node is reached; at that point, the Huffman tree will thus be faithfully reconstructed. The overhead using such a method ranges from roughly 2 to 320 bytes (assuming an 8-bit alphabet).
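The sketch below illustrates that header format. It uses our own Node type and writes the "bits" into a string of '0' and '1' characters for readability, rather than the project's obstream/ibstream helpers; it is a sketch of the idea, not the project's code.

#include <bitset>
#include <iostream>
#include <string>

struct Node {
    char ch;
    Node *left, *right;
    Node(char c, Node *l = nullptr, Node *r = nullptr) : ch(c), left(l), right(r) {}
    bool isLeaf() const { return left == nullptr && right == nullptr; }
};

// Preorder walk: '0' marks a parent node, '1' marks a leaf followed by the
// leaf's character written as 8 bits.
void writeTree(const Node *n, std::string &out) {
    if (n->isLeaf()) {
        out += '1';
        out += std::bitset<8>((unsigned char)n->ch).to_string();
        return;
    }
    out += '0';
    writeTree(n->left, out);
    writeTree(n->right, out);
}

// Rebuild the tree by reading the same stream back; pos advances as bits are consumed.
Node *readTree(const std::string &in, size_t &pos) {
    if (in[pos++] == '1') {
        char c = (char)std::bitset<8>(in.substr(pos, 8)).to_ulong();
        pos += 8;
        return new Node(c);
    }
    Node *left = readTree(in, pos);
    Node *right = readTree(in, pos);
    return new Node('\0', left, right);
}

int main() {
    Node *root = new Node('\0', new Node('g'),
                          new Node('\0', new Node('o'), new Node(' ')));
    std::string header;
    writeTree(root, header);
    std::cout << header << "\n";   // 0 1 01100111 0 1 01101111 1 00100000 (spaces added here for readability)

    size_t pos = 0;
    Node *copy = readTree(header, pos);
    std::cout << copy->right->left->ch << "\n";   // prints 'o'
    return 0;
}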
Many other techniques are possible as well. In any case, since the compressed data can include unused trailing bits, the decompressor must be able to determine when to stop producing output. This can be accomplished either by transmitting the length of the decompressed data along with the compression model or by defining a special code symbol to signify the end of input (the latter method can adversely affect code length optimality, however).
4.2 Creating the Huffman Table

To create a table or map of coded bit values for each character you'll need to traverse the Huffman tree (e.g., inorder, preorder, etc.), making an entry in the table each time you reach a leaf. For example, if you reach a leaf that stores the character C by following the path left-left-right-right-left, then the entry in the C-th location of the map should be set to 00110. You'll need to make a decision about how to store the bit patterns in the map. At least two methods are possible for implementing what could be a class/struct BitPattern: Use a string. This makes it easy to add a character (using +) to a string during tree traversal and makes it possible to use string as BitPattern. Your program may be slow because appending characters to a string (in creating the bit pattern) and accessing characters in a string (in writing 0s or 1s when compressing) is slower than the next approach.
Alternatively you can store an integer for the bitwise coding of a character. You need to store the length of the code too, to differentiate between 01001 and 00101. However, using an int restricts root-to-leaf paths to be at most 32 edges long, since an int holds 32 bits. In a pathological file, a Huffman tree could have a root-to-leaf path of over 100 edges. Because of this problem, you should use strings to store paths rather than ints. A slow correct program is better than a fast incorrect program.
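A minimal sketch of the recommended string-based approach follows; the Node type and function names are our own (the project's createHuffmanTable in the appendix accumulates an int code instead).

#include <iostream>
#include <map>
#include <string>

struct Node {
    char ch;
    Node *left, *right;
    Node(char c, Node *l = nullptr, Node *r = nullptr) : ch(c), left(l), right(r) {}
    bool isLeaf() const { return left == nullptr && right == nullptr; }
};

// Depth-first traversal: append '0' when going left and '1' when going right,
// and record the accumulated path whenever a leaf is reached.
void buildTable(const Node *n, const std::string &path, std::map<char, std::string> &table) {
    if (n->isLeaf()) {
        table[n->ch] = path;
        return;
    }
    buildTable(n->left, path + "0", table);
    buildTable(n->right, path + "1", table);
}

int main() {
    Node *root = new Node('\0', new Node('g'),
                          new Node('\0', new Node('o'), new Node(' ')));
    std::map<char, std::string> table;
    buildTable(root, "", table);
    for (auto &p : table)
        std::cout << "'" << p.first << "' -> " << p.second << "\n";   // ' ' -> 11, 'g' -> 0, 'o' -> 10
    return 0;
}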
4.3 Storing Sizes
The operating system will buffer output, i.e., output to disk actually occurs when some internal buffer is full. In particular, it is not possible to write just one single bit to a file; all output is actually done in chunks, e.g., it might be done in eight-bit chunks. In any case, when you write 3 bits, then 2 bits, then 10 bits, all the bits are eventually written, but you cannot be sure precisely when they're written during the execution of your program. Also, because of buffering, if all output is done in eight-bit chunks and your program writes exactly 61 bits explicitly, then 3 extra bits will be written so that the number of bits written is a multiple of eight. Because of the potential for these extra bits, when reading one bit at a time you cannot simply read bits until there are no more left, since your program might then read the extra bits written due to buffering. This means that when reading a compressed file, you CANNOT use code like this:

int bits;
while (input.readbits(1, bits)) {
    // process bits
}

To avoid this problem, you can write the size of a data structure before writing the data structure to the file.
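One way to apply that advice, sketched here with plain std::fstream rather than the project's bit-stream helpers, is to write a fixed-size bit count before the payload so the reader knows exactly how many bits to consume. The file name example.bin is purely illustrative.

#include <algorithm>
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>

// Write a 32-bit count of payload bits first, then the bits packed into bytes;
// the final byte may contain padding that the reader will know to ignore.
void writeWithSize(const std::string &bits, const std::string &path) {
    std::ofstream out(path, std::ios::binary);
    uint32_t nbits = bits.size();
    out.write(reinterpret_cast<const char *>(&nbits), sizeof nbits);   // size header
    for (size_t i = 0; i < bits.size(); i += 8) {
        unsigned char byte = 0;
        size_t chunk = std::min<size_t>(8, bits.size() - i);
        for (size_t j = 0; j < chunk; ++j)
            byte = (byte << 1) | (bits[i + j] - '0');
        byte <<= 8 - chunk;                                            // pad the last byte
        out.put(byte);
    }
}

uint32_t readSize(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    uint32_t nbits = 0;
    in.read(reinterpret_cast<char *>(&nbits), sizeof nbits);
    return nbits;   // stop after exactly this many bits when decoding
}

int main() {
    writeWithSize("0101101", "example.bin");            // 7 payload bits, 1 padding bit
    std::cout << readSize("example.bin") << " bits\n";  // prints 7
    return 0;
}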
Conclusion
Several variations of the basic Huffman algorithm exist:

1. n-ary Huffman coding: The n-ary Huffman algorithm uses an alphabet of n symbols to encode messages and builds an n-ary tree. Not every set of source words can properly form an n-ary tree for Huffman coding; in this case, additional 0-probability placeholders must be added. If the number of source words is congruent to 1 modulo n-1, then the set of source words will form a proper Huffman tree.

2. Adaptive Huffman coding: A variation called adaptive Huffman coding calculates the probabilities dynamically based on recent actual frequencies in the source string. This is somewhat related to the LZ family of algorithms.

3. Huffman template algorithm: Most often, the weights used in implementations of Huffman coding represent numeric probabilities, but the algorithm given above does not require this; it requires only a way to order weights and to add them. The Huffman template algorithm enables one to use any kind of weights (costs, frequencies, etc.).

4. Length-limited Huffman coding: Length-limited Huffman coding is a variant where the goal is still to achieve a minimum weighted path length, but there is an additional restriction that the length of each codeword must be less than a given constant. The package-merge algorithm solves this problem with a simple greedy approach very similar to that used by Huffman's algorithm. Its time complexity is O(nL), where L is the maximum length of a codeword. No algorithm is known to solve this problem in linear or linear-logarithmic time, unlike the presorted and unsorted conventional Huffman problems, respectively.
APPENDICES

Listing 5.1: The definition of the class Charnode (a node of the Huffman tree).

using namespace std;
template <class TYPE>
class Charnode {
    TYPE ch;
    int count;
    Charnode *left;
    Charnode *right;

public:
    Charnode(TYPE ch, int count = 0);
    Charnode(const Charnode *New);
    int GetCount();
    int Value();
    void SetLeft(Charnode *left);
    void SetRight(Charnode *right);
    Charnode *GetLeft(void);
    Charnode *GetRight(void);
    TYPE GetChar(void);
    void show();
    bool operator<(Charnode &obj2);
};

template <class TYPE>
Charnode<TYPE>::Charnode(TYPE ch, int count)
{
    LOG("new Charnode " << count << " requested");
    this->ch = ch;
    this->count = count;
    this->left = this->right = NULL;
}

template <class TYPE>
Charnode<TYPE>::Charnode(const Charnode *New)
{
    LOG("new Charnode " << New->count << " requested");
{
    return left;
}

template <class TYPE>
Charnode<TYPE> *Charnode<TYPE>::GetRight(void)
{
    return right;
}

template <class TYPE>
bool Charnode<TYPE>::operator<(Charnode &obj2)
{
    return (count < obj2.GetCount());
}
{
    this->ch = ch;
}
#endif

Listing 5.2: The definition of the class Huffman; this class helps in building the Huffman tree for an input file.

#include <iostream>
#include "Charnode.h"
#include "globals.h"
#include "bitops.h"
#include <vector>
#include <map>
#include <fstream>
#ifndef HuffmanCode_h
#define HuffmanCode_h

using namespace std;
template <class TYPE>
class Huffman {
private:
    vector<Charnode<TYPE> *> charactermap;
    Charnode<TYPE> *huffmanTreeRoot;
    map<TYPE, string> table;
    map<TYPE, int> freqtab;

    void buildHuffmanTree();
public:
    Huffman(const char *filename);
    ~Huffman();
    void displayCharactermap();
    void displayHuffmanTable();
    int getCharVecSize();
};
template <class TYPE>
void Huffman<TYPE>::processfile(const char *filename, map<TYPE, int> &charmap)
{
    ibstream infile(filename);

    int inbits;
    while (infile.readbits(BITS_PER_WORD, inbits) != false) {
        // cout << (TYPE) inbits;
        charmap[(TYPE) inbits]++;
    }
    LOG("\n\n\nEND\n");
}

template <class TYPE>
vector<Charnode<TYPE> *> Huffman<TYPE>::convertToVector(map<TYPE, int> &chamap)
{
    vector<Charnode<TYPE> *> charactermap;

    for (typename map<TYPE, int>::iterator ii = chamap.begin(); ii != chamap.end(); ii++) {
        // cout << (*ii).first << " : " << (*ii).second << endl;

        Charnode<TYPE> *ch = new Charnode<TYPE>((*ii).first, (*ii).second);
        charactermap.push_back(ch);
#if DEBUG
        // ch->show();
        if (ch->GetLeft() == NULL && ch->GetRight() == NULL)
            LOG("Leaf Node initialized properly");
#endif
    }

    return charactermap;
}

template <class TYPE>
bool Huffman<TYPE>::compare(Charnode<TYPE> *i, Charnode<TYPE> *j)
{
    return (*i < *j);
}
template <class TYPE>
void Huffman<TYPE>::MinHeapify(vector<Charnode<TYPE> *> &charactermap, int i, int n)
{
    int left = 2 * i + 1;
    int right = left + 1;
    int smallest = -1;

    if (left < n && charactermap[left]->Value() < charactermap[i]->Value())
        smallest = left;
    else
        smallest = i;

        MinHeapify(charactermap, smallest, n);
    }
}

template <class TYPE>
void Huffman<TYPE>::BuildMinHeap(vector<Charnode<TYPE> *> &charactermap)
{
    int n = charactermap.size();
    for (int i = n / 2; i >= 0; i--)
        MinHeapify(charactermap, i, n);
    /* HUFFMAN(C): refer CLRS (non-unicode characters). */
    int n = charactermap.size();
    LOG("Size of the char map = " << n);
    for (int i = 1; i < n; i++) {
        LOG(i << "th iteration");
        BuildMinHeap(charactermap);

        Charnode<TYPE> *left = new Charnode<TYPE>(charactermap[0]);
        LOG(left->GetCount());
        charactermap.erase(charactermap.begin() + 0);
        BuildMinHeap(charactermap);
        z->SetRight(right);

    huffmanTreeRoot = charactermap[0];

    // Initialize

template <class TYPE>
Huffman<TYPE>::Huffman(const char *filename)
{
    map<TYPE, int> charmap;
    processfile(filename, charmap);
    charactermap = convertToVector(charmap);
    freqtab = charmap;
template <class TYPE>
void Huffman<TYPE>::delNode(Charnode<TYPE> *node)
{
    if (node == NULL)
        return;
    delNode(node->GetLeft());
    delNode(node->GetRight());

    delete node;
}

template <class TYPE>
Huffman<TYPE>::~Huffman()
{
    delNode(huffmanTreeRoot);
    huffmanTreeRoot = NULL;
}
template <class TYPE>
void Huffman<TYPE>::createHuffmanTable(Charnode<TYPE> *tree, int code, int height)
{
    LOG(__func__);
    // This condition never occurs!
    if (tree == NULL)
        return;

    string codeString = "";
    for (int j = height - 1; j >= 0; j--) {
        if (code & (1 << j)) {
            // cout << 1;
            codeString += '1';
        } else {
            // cout << 0;
            codeString += '0';
        }
    }
    // cout << endl;

        table[tree->GetChar()] = codeString;
        return;
    }

    code = code << 1;
    createHuffmanTable(tree->GetLeft(), code, height + 1);
    createHuffmanTable(tree->GetRight(), code | 1, height + 1);
}

    int n = charactermap.size();
template <class TYPE>
Charnode<TYPE> *Huffman<TYPE>::getRoot()
{
    return huffmanTreeRoot;
}

template <class TYPE>
void Huffman<TYPE>::displayHuffmanTable()
{
    LOG("HUFFMAN TABLE");
    for (typename map<TYPE, string>::iterator ii = table.begin(); ii != table.end(); ii++) {
        cout << endl << (*ii).first << "\t" << (*ii).second;
    }
    cout << endl;
}
#endif

Listing 5.3: The definition of the class CompressionWriting; this class helps in writing the bits to the compressed file.

#ifndef COMP_H
#define COMP_H

#include <iostream>
#include <vector>
#include <map>
#include <string>
#include <fstream>

using namespace std;
public:
    CompressionWriting() { }

    void writeCompressedDataToFile();

    void displayOutputFile();
    freqMap = freMap;
}

    // cout << bitPattern << endl;
        outfile.writebits(huffmanTable[(TYPE) inbits].length(), bitPattern);
    }

template <class TYPE>
int CompressionWriting<TYPE>::totalNumOfBits()
{
    int count = 0;
    int n = freqMap.size();
    return bitPattern;
}
        return;

    if (node->GetLeft() == NULL && node->GetRight() == NULL) {
        outfile.writebits(1, 1);
        outfile.writebits(BITS_PER_WORD, node->GetChar());
    }
#endif

Listing 5.4: The main program of the Huffman compression algorithm.

#include <fstream>
#include <cstdio>
#include <algorithm>
#include <iostream>
#include <cstring>
#include <map>
#include <vector>
#include <cstdlib>

#include "Charnode.h"
using namespace std;

    Huffman<char> huff(argv[1]);
    // huff.displayCharactermap();

    // test();
    // cin.get();
}

Listing 5.5: The definition of the class Decompressor; this class helps in decompressing the compressed file using the Huffman algorithm.

#ifndef DECOMP_H
#define DECOMP_H

#include <iostream>
#include <vector>
#include <map>
#include <string>

using namespace std;

template <class TYPE>
class Decompressor {
    Charnode<TYPE> *huffmanTreeRoot;
    string outputFilename;
    string compressedFilename;
    int numChars;
    Decompressor();

    void decompress();

template <class TYPE>
Decompressor<TYPE>::Decompressor(string cname, string oname)
{
    outputFilename = oname;
    compressedFilename = cname;
}

template <class TYPE>
void Decompressor<TYPE>::delNode(Charnode<TYPE> *node)
{
    if (node == NULL)
        return;
    delete node;
}

template <class TYPE>
int Decompressor<TYPE>::readCount(ibstream &ibs)
{
    int count = 0;
    ibs.readbits(BITS_PER_INT, count);
    return count;
}

template <class TYPE>
void Decompressor<TYPE>::preorder(Charnode<TYPE> *node)
{
    if (node == NULL) {
        return;
    }

template <class TYPE>
void Decompressor<TYPE>::constructTree(Charnode<TYPE> *&node, int n, ibstream &ibs)
{
    if (n == 0)
        return;

    int bitread;
    ibs.readbits(1, bitread);
    if (bitread == 1) {
        ibs.readbits(BITS_PER_WORD, bitread);
        node = new Charnode<TYPE>((char) bitread);
        n--;
    } else {
        node = new Charnode<TYPE>('\0');
        Charnode<TYPE> *leftnode = node->GetLeft();
        Charnode<TYPE> *rightnode = node->GetRight();
        constructTree(leftnode, n, ibs);
        constructTree(rightnode, n, ibs);

        node->SetLeft(leftnode);
        node->SetRight(rightnode);
    }
}

    // is encountered it's a leaf and the next 8 bits represent that

    // Step 1
    // Step 2
    // Step 4
    int i = readCount(compressedFile);
    Charnode<TYPE> *traverser = huffmanTreeRoot;
    while (i) {
        int bitread;
        compressedFile.readbits(1, bitread);

        traverser = (bitread) ? traverser->GetRight() : traverser->GetLeft();

            traverser = huffmanTreeRoot;
        }
        i--;
#endif

Listing 5.6: The main program of the Huffman decompression algorithm.

#include <fstream>
#include <cstdio>
#include <algorithm>
#include <iostream>
#include <cstring>
#include <map>
#include <vector>
#include <cstdlib>

using namespace std;

    if (argc != 3) {
        cout << "Usage " << argv[0] << " <Input file> <Output file>\n";
        exit(0);
    }
    Decompressor<char> compressedfile(argv[1], argv[2]);
    compressedfile.decompress();

    // cin.get();
    // cin.get();
}