TOPIC: HUFFMAN CODES
ACKNOWLEDGEMENT
First and foremost, I, KARANBIR SINGH, am very thankful to Lect. VIJAY GARG, who assigned me this term paper on HUFFMAN CODES. I am heartily thankful to the college library for providing the books, and to my roommates and classmates for helping me assemble the notes related to this topic. Last but not least, I am very thankful to my parents, who gave me the financial support to complete this term paper.
KARANBIR SINGH
Contents
1) Introduction
2) Types of Huffman coding
   a) n-ary Huffman coding
   b) Adaptive Huffman coding
   c) Huffman template algorithm
   d) Length-limited Huffman coding
   e) Huffman coding with unequal letter costs
   f) Hu-Tucker coding
   g) Canonical Huffman code
3) Properties
4) Advantages
5) Disadvantages
6) Applications
7) References
INTRODUCTION
Huffman coding is an entropy encoding algorithm used for lossless data compression. It was developed by David A. Huffman while he was a Ph.D. student at MIT, and published in the 1952 paper "A Method for the Construction of Minimum-Redundancy Codes". Huffman coding is based on the frequency of occurrence of a data item (e.g., a pixel in an image). The principle is to use fewer bits to encode the data that occur more frequently. Codes are stored in a code book, which may be constructed for each image or for a set of images; in either case, the code book plus the encoded data must be transmitted to enable decoding.

Huffman coding uses a specific method for choosing the representation for each symbol, resulting in a prefix code (sometimes called a "prefix-free code"): the bit string representing one symbol is never a prefix of the bit string representing any other symbol. The most common source symbols are expressed using shorter bit strings than the less common ones. Huffman was able to design the most efficient compression method of this type: no other mapping of individual source symbols to unique bit strings produces a smaller average output size when the actual symbol frequencies agree with those used to create the code. A method was later found to design a Huffman code in linear time if the input probabilities (also known as weights) are sorted.

For a set of symbols with a uniform probability distribution and a number of members which is a power of two, Huffman coding is equivalent to simple binary block encoding, e.g., ASCII coding. Huffman coding is such a widespread method for creating prefix codes that the term "Huffman code" is widely used as a synonym for "prefix code", even when the code is not produced by Huffman's algorithm.
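The construction described above can be sketched in a few lines: repeatedly merge the two least frequent subtrees, prefixing 0 to the codes on one side and 1 to the other. This is a minimal illustration, not a production encoder; the symbol frequencies are the classic textbook example, not data from this paper.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a prefix code from a {symbol: frequency} map."""
    tick = count()  # tie-breaker so the heap never compares dicts
    heap = [(f, next(tick), {s: ""}) for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; prepend 0/1 to their codes.
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, next(tick), merged))
    return heap[0][2]

codes = huffman_codes({"a": 45, "b": 13, "c": 12, "d": 16, "e": 9, "f": 5})
```

With these frequencies, the most frequent symbol "a" receives a 1-bit code while the rarest symbols "e" and "f" receive 4-bit codes, and no code is a prefix of any other.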
Variations of the basic algorithm exist. The Huffman template algorithm, for example, can solve other minimization problems, such as minimizing max_i [w_i + length(c_i)], a problem first applied to circuit design. Length-limited Huffman coding, on the other hand, is not solvable in the same manner or with the same efficiency as conventional Huffman coding.
PROPERTIES
1. Unique prefix property: no code is a prefix of any other code (all symbols sit at the leaf nodes of the code tree). This makes decoding unambiguous.
2. If prior statistics are available and accurate, Huffman coding compresses very well.
3. The frequencies used can be generic ones for the application domain, based on average experience, or they can be the actual frequencies found in the text being compressed.
4. Huffman coding is optimal when the probability of each input symbol is a negative power of two.
5. The worst case for Huffman coding can happen when the probability of a symbol exceeds 2^-1 = 0.5, making the upper limit of inefficiency unbounded. Such situations often respond well to a form of blocking called run-length encoding.
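Property 4 can be checked with a small calculation: when every probability is a negative power of two, an optimal prefix code gives each symbol a length of exactly -log2(p), so the average code length equals the source entropy. The probability distribution below is chosen purely for illustration.

```python
import math

# Every probability is a negative power of two: 1/2, 1/4, 1/8, 1/8.
probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Optimal code lengths are exactly -log2(p): 1, 2, 3, 3 bits.
lengths = {s: int(-math.log2(p)) for s, p in probs.items()}

avg_len = sum(probs[s] * lengths[s] for s in probs)          # expected bits/symbol
entropy = -sum(p * math.log2(p) for p in probs.values())     # Shannon entropy

# avg_len == entropy == 1.75 bits/symbol: no prefix code can do better.
```

When a probability is not a negative power of two, code lengths must round to whole bits, and the average length exceeds the entropy; that gap is the inefficiency referred to in property 5.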
ADVANTAGES
1. The algorithm is easy to implement.
2. It produces a lossless compression of images.
DISADVANTAGES
1. Efficiency depends on the accuracy of the statistical model used and on the type of image.
2. Compression ratios vary with the format, but few images compress better than about 8:1.
3. Compression of image files that contain long runs of identical pixels is not as efficient with Huffman coding as with RLE.
4. The Huffman encoding process is usually done in two passes: during the first pass a statistical model is built, and in the second pass the image data is encoded based on the generated model. Huffman encoding is therefore a relatively slow process, as time is required to build the statistical model in order to achieve an efficient compression ratio.
5. All codes in the encoded data are of different sizes (not of fixed length). It is therefore difficult for the decoder to know when it has reached the last bit of a code; the only way to know is to follow the paths of the upside-down code tree until reaching the end of a branch (a leaf). If the encoded data is corrupted, with additional bits added or bits missing, everything decoded after the error will be wrong, and the final image displayed will be garbage.
6. The Huffman table must be sent at the beginning of the compressed file; otherwise the decompressor will not be able to decode it. This causes overhead.
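The decoding walk and the error propagation described above can be sketched as follows. The code table here is a small hypothetical example, standing in for the table that must be transmitted with the compressed data.

```python
# Hypothetical transmitted code table (prefix-free by construction).
code_table = {"a": "0", "b": "10", "c": "11"}

def decode(bits, table):
    """Walk the bitstream; emit a symbol whenever a complete code is matched.

    Matching a growing buffer against the inverted table is equivalent to
    walking the code tree from the root to a leaf, one bit at a time.
    """
    inverse = {code: sym for sym, code in table.items()}
    out, buf = [], ""
    for bit in bits:
        buf += bit
        if buf in inverse:          # reached a leaf: a full code was read
            out.append(inverse[buf])
            buf = ""
    if buf:                         # leftover bits: stream truncated/corrupted
        raise ValueError("truncated or corrupted bitstream")
    return "".join(out)

decode("010110", code_table)   # → "abca"
```

Flipping just the first bit ("110110" instead of "010110") decodes to "caca": because the codes have no fixed length, a single bit error shifts every code boundary that follows, which is the failure mode noted in disadvantage 5.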
APPLICATIONS
1. Arithmetic coding can be viewed as a generalization of Huffman coding; indeed, in practice arithmetic coding is often preceded by Huffman coding, as it is easier to find an arithmetic code for a binary input than for a non-binary input.
2. Huffman coding is in wide use because of its simplicity, high speed, and lack of encumbrance by patents.
3. Huffman coding today is often used as a "back-end" to some other compression method. DEFLATE (PKZIP's algorithm) and multimedia codecs such as JPEG and MP3 have a front-end model and quantization step followed by Huffman coding.
REFERENCES
1. www.google.com/Huffman
2. http://en.wikipedia.org/wiki/Huffman_coding
3. A. V. Aho, J. E. Hopcroft and J. D. Ullman, The Design and Analysis of Computer Algorithms, Pearson Education Asia, 2007.
4. T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein, Introduction to Algorithms, PHI Pvt. Ltd., 2007.