At present there is an insatiable demand for ever-greater bandwidth in communication networks and ever-greater storage capacity in computer systems. This has led to the need for efficient compression techniques. Compression is the process required either to reduce the volume of information to be transmitted (text, fax and images) or to reduce the bandwidth required for its transmission (speech, audio and video). The compression technique is applied to the source information prior to its transmission. Data compression is the process of converting an input data stream (the source stream, or the original raw data) into another data stream that has a smaller size. Data compression is popular for two reasons: 1) People like to accumulate data and hate to throw anything away. No matter how large a storage device may be, sooner or later it is going to overflow; data compression seems useful because it delays this inevitability. 2) People hate to wait a long time for data transfers.

There are many known methods of data compression. They are based on different ideas, are suitable for different types of data, and produce different results, but they are all based on the same basic principle: they compress data by removing the redundancy from the original data in the source file. The idea of compression by reducing redundancy suggests the general law of data compression, which is to assign short codes to common events and long codes to rare events. Compression thus converts the data representation from an inefficient to an efficient form.

The main aim of the field of data compression is, of course, to develop methods for better and better compression. Experience shows that fine-tuning an algorithm to squeeze out the last remaining bits of redundancy from the data gives diminishing returns. Data compression has become so important that some researchers have proposed the "simplicity and power" theory. Specifically, it says that data compression may be interpreted as a process of removing unnecessary complexity in information, thus maximizing simplicity while preserving as much as possible of its non-redundant descriptive power. The main problem is that this great idea requires too many bits. In fact, there exist many coding techniques that will effectively reduce the total number of bits needed to represent information. Lossless data compression algorithms compress the data while introducing no distortion of the original signal once it is decompressed or reconstituted. So much data exists, in archives and elsewhere, that it has become critical to compress this information. Lossless compression is one way to proceed.

Generally, compression means encoding the data, i.e., reducing it. Compression is performed by an encoder and decompression is performed by a decoder. We call the output of the encoder codes or codewords. The intermediate medium can be either data storage or a communication/computer network. If the compression and decompression procedure induces no information loss, the compression scheme is lossless; otherwise it is lossy.

Here the data will be compressed using three coding techniques: run-length coding, variable-length coding, and fixed-length coding. Run-length coding is the simplest form of data compression. In variable-length coding we use the Shannon-Fano algorithm and the Huffman coding algorithm: the Shannon-Fano algorithm is a top-down approach, while Huffman coding is a bottom-up approach. In fixed-length coding we use a dictionary-based coding algorithm. The emergence of multimedia technologies has made digital libraries a reality. Nowadays, libraries, museums, film studios and governments are converting more and more data archives into digital form, and some of this data indeed needs to be stored without any loss.
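As a first illustration, the run-length idea can be sketched in a few lines of Python (a minimal sketch; the function names are illustrative and not from any particular library):

```python
def rle_encode(data):
    """Collapse each run of repeated symbols into a (symbol, count) pair."""
    runs = []
    for symbol in data:
        if runs and runs[-1][0] == symbol:
            runs[-1][1] += 1          # extend the current run
        else:
            runs.append([symbol, 1])  # start a new run
    return [(symbol, count) for symbol, count in runs]

def rle_decode(runs):
    """Expand (symbol, count) pairs back into the original sequence."""
    return "".join(symbol * count for symbol, count in runs)

text = "WWWWWBBBWWWW"
encoded = rle_encode(text)
print(encoded)                      # [('W', 5), ('B', 3), ('W', 4)]
assert rle_decode(encoded) == text  # lossless: the round trip is exact
```

The round-trip assertion makes the lossless property concrete: the decoder reconstructs the source exactly.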
In such applications, it is normally imperative that no part of the source information is lost during either the compression or decompression operations. Examples include file storage systems (tapes, hard disk drives, solid-state storage, file servers) and communication networks (LAN, WAN, wireless).
color of the run. Because the code word output is shorter than the input, pixel data compression is achieved. The run-length code words are taken from a predefined table of values representing runs of black or white pixels. This table is part of the T.4 specification and is used to encode and decode all Group 3 data. The size of the code words was originally determined by the CCITT, based statistically on the average frequency of black-and-white runs occurring in typical typed and handwritten documents. The documents included line art and were written in several different languages. Run lengths that occur more frequently are assigned smaller code words, while run lengths that occur less frequently are assigned larger code words. In printed and handwritten documents, short runs occur more frequently than long runs; two- to four-pixel black runs are the most frequent. The maximum size of a run length is bounded by the maximum width of a Group 3 scan line. Run lengths are represented by two types of code words: makeup and terminating. An encoded pixel run is made up of zero or more makeup code words and a terminating code word. Terminating code words represent shorter runs, and makeup codes represent longer runs. There are separate terminating and makeup code words for both black and white runs. Pixel runs with a length of 0 to 63 are encoded using a single terminating code. Runs of 64 to 2623 pixels are encoded by a single makeup code and a terminating code. Run lengths greater than 2623 pixels are encoded using one or more makeup codes and a terminating code. The run length is the sum of the length values represented by each code word.
probability is the sum of the two merged nodes. 2. Arbitrarily assign 1 and 0 to each pair of branches merging into a node. 3. Read sequentially from root to leaf node where the symbol is located. The probability of occurrence of a bit stream of length Rn is P(n). As a result, shorter codes were assigned to the more frequent run lengths. For example, from the table below, the run-length code for 16 white pixels is 101010, while the run-length code for 16 black pixels is 0000010111, since a run of 16 white pixels occurs more often than a run of 16 black pixels. Codes for run lengths greater than 1792 are the same for both pixel colors. CCITT Group 3 1D utilizes Huffman coding to generate the set of terminating codes and make-up codes for a given bit stream. E.g., a run length of 132 white pixels is encoded using the make-up code for 128 white pixels (10010) followed by the terminating code for 4 white pixels (1011).
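The 132-pixel example can be checked with a tiny sketch. Only the codewords quoted above (plus the zero-run terminating code) are included in the table fragment; a real T.4 encoder needs the complete white and black tables:

```python
# Fragment of the CCITT T.4 white-run tables; only the entries needed for
# the example above are included here.
WHITE_MAKEUP = {128: "10010"}                    # makeup codes: multiples of 64
WHITE_TERMINATING = {0: "00110101", 4: "1011"}   # terminating codes: runs 0..63

def encode_white_run(length):
    """Encode one white run as an optional makeup code plus a terminating code.
    Runs above 2623 pixels (several makeup codes) are not handled in this sketch."""
    makeup = (length // 64) * 64                 # largest multiple of 64 in the run
    bits = WHITE_MAKEUP[makeup] if makeup else ""
    return bits + WHITE_TERMINATING[length - makeup]

print(encode_white_run(132))  # 10010 + 1011 -> "100101011"
```

A run that is not a multiple of 64 always ends with a terminating code, which is why the sum of the two codeword values (128 + 4) reproduces the original run length.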
The data occurring before and after each run length is not important to the encoding step; only the data occurring in the present run is needed. With Group 3 Two-Dimensional (G3 2D) encoding, on the other hand, the way a scan line is encoded may depend on the immediately preceding scan line's data. Many images have a high degree of vertical coherence (redundancy); by describing the differences between two scan lines, rather than describing the scan line contents, 2D encoding achieves better compression. The first pixel of each run length is called a changing element. Each changing element marks a color transition within a scan line (the point where a run of one color ends and a run of the next color begins). The position of each changing element in a scan line is described as being a certain number of pixels from a changing element in the current coding line (horizontal coding) or in the preceding reference line (vertical coding). The output codes used to describe the actual positional information are called Relative Element Address Designate (READ) codes. Shorter code words are used to describe color transitions that are less than four pixels away from each other on the coding line or the reference line; longer code words are used to describe color transitions lying a greater distance from the current changing element. 2D encoding is more efficient than 1-dimensional encoding because the usual data that is compressed (typed or handwritten documents) contains a high amount of 2D coherence.
Because a G3 2D-encoded scan line is dependent on the correctness of the preceding scan line, an error, such as a burst of line noise, can affect multiple 2-dimensionally encoded scan lines. If a transmission error corrupts a segment of encoded scan line data, that line cannot be decoded. But, worse still, all scan lines occurring after it will also decode improperly. To minimize the damage created by noise, G3 2D uses a variable called a K factor and 2-dimensionally encodes K-1 lines following each 1-dimensionally encoded line. If corruption of the data transmission occurs, only K-1 scan lines of data will be lost; the decoder will be able to resynchronize decoding at the next available EOL code. The typical value for K is 2 or 4. G3 2D data that is encoded with a K value of 4 appears as a sequence of blocks of data; each block contains three lines of 2D scan-line data together with a scan line of 1-dimensionally encoded data.
and a K variable set to infinity. Group 4 was designed specifically to encode data residing on disk drives and data networks. The built-in transmission error detection/correction found in Group 3 is therefore not needed by Group 4 data. The first reference line in Group 4 encoding is an imaginary scan line containing all white pixels, whereas in G3 2D encoding the first reference line is the first scan line of the image. In Group 4 encoding, the RTC code word is replaced by an end-of-facsimile-block (EOFB) code, which consists of two consecutive Group 3 EOL code words. Like the Group 3 RTC, the EOFB is part of the transmission protocol and not actually part of the image data.
Also, Group 4-encoded image data may be padded out with fill bits after the EOFB to end on a byte boundary. Group 4 encoding will usually produce an image roughly half the size of the same image compressed with G3 1D encoding. The main tradeoff is that Group 4 encoding is more complex and requires more time to perform. When implemented in hardware, however, the difference in execution speed between the Group 3 and Group 4 algorithms is not significant, which usually makes Group 4 a better choice in most imaging system implementations.
order of 16 colors). For such a reduced alphabet, the full 12-bit codes yielded poor compression unless the image was large, so the idea of a variable-width code was introduced: codes typically start one bit wider than the symbols being encoded, and as each code size is used up, the code width increases by 1 bit, up to some prescribed maximum (typically 12 bits). Encoding: A high-level view of the encoding algorithm is shown here: 1. Initialize the dictionary to contain all strings of length one. 2. Find the longest string W in the dictionary that matches the current input. 3. Emit the dictionary index for W to the output and remove W from the input. 4. Add W followed by the next symbol in the input to the dictionary. 5. Go to step 2. A dictionary is initialized to contain the single-character strings corresponding to all the possible input characters (and nothing else except the clear and stop codes, if they are being used). The algorithm works by scanning through the input string for successively longer substrings until it finds one that is not in the dictionary. When such a string is found, the index for the string less the last character (i.e., the longest substring that is in the dictionary) is retrieved from the dictionary and sent to the output, and the new string (including the last character) is added to the dictionary with the next available code. The last input character is then used as the next starting point to scan for substrings.
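The five steps above can be turned into a short encoder (a sketch assuming an 8-bit input alphabet and no clear/stop codes; the emitted values are raw dictionary indices, before any variable-width bit packing):

```python
def lzw_encode(data):
    """LZW encoding: emit the index of the longest dictionary match."""
    dictionary = {chr(i): i for i in range(256)}  # step 1: all 1-char strings
    w = ""
    output = []
    for c in data:
        if w + c in dictionary:                   # step 2: extend the match
            w += c
        else:
            output.append(dictionary[w])          # step 3: emit index for W
            dictionary[w + c] = len(dictionary)   # step 4: register W + next symbol
            w = c                                 # step 5: continue from c
    if w:
        output.append(dictionary[w])              # flush the final match
    return output

print(lzw_encode("ABABABA"))  # [65, 66, 256, 258]
```

Note that indices 256 and 258 refer to strings ("AB" and "ABA") that the encoder created on the fly while scanning the input.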
In this way, successively longer strings are registered in the dictionary and made available for subsequent encoding as single output values. The algorithm works best on data with repeated patterns, so the initial parts of a message will see little compression; as the message grows, however, the compression ratio tends asymptotically to the maximum [2]. Decoding: The decoding algorithm works by reading a value from the encoded input and outputting the corresponding string from the initialized dictionary. At the same time it obtains the next value from the input, and adds to the dictionary the concatenation of the string just output and the first character of the string obtained by decoding the next input value. The decoder then proceeds to the next input value (which was already read in as the "next value" in the previous pass) and repeats the process until there is no more input, at which point the final input value is decoded without any more additions to the dictionary. In this way the decoder builds up a dictionary which is identical to that used by the encoder, and uses it to decode subsequent input values. Thus the full dictionary does not need to be sent with the encoded data; just the initial dictionary containing the single-character strings is sufficient (and it is typically defined beforehand within the encoder and decoder rather than being explicitly sent with the encoded data).
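A matching decoder can be sketched under the same assumptions (8-bit alphabet, no clear/stop codes). The `else` branch handles the one special case where a code arrives before the corresponding string has been added to the decoder's dictionary:

```python
def lzw_decode(codes):
    """LZW decoding: rebuild the encoder's dictionary while reading codes."""
    dictionary = {i: chr(i) for i in range(256)}
    w = dictionary[codes[0]]
    result = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                       # code not yet defined: it must be w + w[0]
            entry = w + w[0]
        result.append(entry)
        dictionary[len(dictionary)] = w + entry[0]  # just-output string + 1st char
        w = entry
    return "".join(result)

print(lzw_decode([65, 66, 256, 258]))  # ABABABA
```

Feeding the decoder the output of the encoder sketch reproduces the original input exactly, with only the implicit single-character dictionary shared between the two sides.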
A commonly used method for data compression is Huffman coding. The method starts by building a list of all the alphabet symbols in descending order of their probabilities. It then constructs a tree, with a symbol at every leaf, from the bottom up. This is done in steps, where at each step the two symbols with the smallest probabilities are selected, added to the top of the partial tree, deleted from the list, and replaced with an auxiliary symbol representing both of them. When the list is reduced to just one auxiliary symbol, the tree is complete. The tree is then traversed to determine the codes of the symbols. The Huffman method is somewhat similar to the Shannon-Fano method. The main difference between the two is that Shannon-Fano constructs its codes from the top down, while Huffman constructs a code tree from the bottom up. This is best illustrated by an example. Given five symbols with probabilities as shown in the figure, they are paired in the following order: 1. a4 is combined with a5 and both are replaced by the combined symbol a45, whose probability is 0.2. 2. There are now four symbols left: a1, with probability 0.4, and a2, a3, and a45, with probabilities 0.2 each. We arbitrarily select a3 and a45, combine them, and replace them with the auxiliary symbol a345, whose probability is 0.4. 3. Three symbols are now left: a1, a2, and a345, with probabilities 0.4, 0.2, and 0.4, respectively. We arbitrarily select a2 and a345, combine them, and replace them with the auxiliary symbol a2345, whose probability is 0.6. 4. Finally, we combine the two remaining symbols, a1 and a2345, and replace them with a12345, whose probability is 1.
The tree is now complete, "lying on its side" with the root on the right and the five leaves on the left. To assign the codes, we arbitrarily assign a bit of 1 to the top edge and a bit of 0 to the bottom edge of every pair of edges. This results in the codes 0, 10, 111, 1101, and 1100. The assignment of bits to the edges is arbitrary. The average size of this code is 0.4 × 1 + 0.2 × 2 + 0.2 × 3 + 0.1 × 4 + 0.1 × 4 = 2.2 bits/symbol but, even more importantly, the Huffman code is not unique.
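The example can be verified with a small heap-based Huffman implementation (a sketch; because tie-breaking among equal probabilities is arbitrary, the individual codes may differ from those above, which is exactly the non-uniqueness just mentioned, but the average length always comes out to the optimal 2.2 bits/symbol):

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Build a Huffman code for a {symbol: probability} map."""
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), symbol) for symbol, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:            # repeatedly merge the two smallest nodes
        p1, _, left = heapq.heappop(heap)
        p2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, next(tiebreak), (left, right)))
    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):           # internal node: two subtrees
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"       # leaf: record its code
    walk(heap[0][2], "")
    return codes

probs = {"a1": 0.4, "a2": 0.2, "a3": 0.2, "a4": 0.1, "a5": 0.1}
codes = huffman_code(probs)
average = sum(probs[s] * len(codes[s]) for s in probs)
print(f"{average:.1f} bits/symbol")  # 2.2 bits/symbol
```

Because Huffman coding is optimal for a given probability distribution, every valid merge order yields the same 2.2 bits/symbol average even when the code tables differ.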
CHAPTER I
DCT CALCULATIONS:
The numbers in the above table represent pixel amplitudes in 8×8 blocks. Table 2 shows the 8×8 matrix obtained after applying the DCT, DCT(i,j), with a scale factor of 1/2; the result is called the DCT matrix. In Table 2, row 0, column 0 holds the DC component; the other entries are called AC components.
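As an illustration of the calculation (a sketch using the common 2D DCT-II normalization with C(u) = 1/√2 for u = 0; the flat test block is made up for illustration), the following computes an 8×8 DCT and shows that for a constant block only the (0,0) entry, the DC component, is nonzero:

```python
import math

def dct2(block):
    """Naive 2D DCT-II of an n-by-n block (O(n^4), fine for n = 8)."""
    n = len(block)
    c = lambda u: 1 / math.sqrt(2) if u == 0 else 1.0
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            out[u][v] = (2.0 / n) * c(u) * c(v) * s
    return out

flat = [[128] * 8 for _ in range(8)]   # a constant (flat) 8x8 block
coeffs = dct2(flat)
print(round(coeffs[0][0]))             # 1024: the DC component
assert all(abs(coeffs[u][v]) < 1e-9    # every AC coefficient vanishes
           for u in range(8) for v in range(8) if (u, v) != (0, 0))
```

A real JPEG codec uses a fast factored DCT rather than this quadruple loop, but the coefficient values are the same.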
After quantization, JPEG elects to compress the zero values by utilizing a run-length scheme. To find the number of zeros, JPEG scans the block in a zigzag manner.
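The zigzag order itself can be generated by sorting block positions by anti-diagonal and alternating direction on each diagonal (a sketch; the 4×4 block below is made up, but the traversal rule is the standard JPEG one):

```python
def zigzag(block):
    """Read a square block in zigzag order: anti-diagonals, alternating direction."""
    n = len(block)
    order = sorted(((r, c) for r in range(n) for c in range(n)),
                   key=lambda rc: (rc[0] + rc[1],               # which diagonal
                                   rc[0] if (rc[0] + rc[1]) % 2 # odd: top-to-bottom
                                   else rc[1]))                 # even: bottom-to-top
    return [block[r][c] for r, c in order]

# Toy quantized block: nonzero coefficients cluster in the top-left corner,
# so the zigzag scan groups all the zeros into one long run at the end.
block = [[52, 3, 0, 0],
         [ 6, 0, 0, 0],
         [ 0, 0, 0, 0],
         [ 0, 0, 0, 0]]
print(zigzag(block))  # [52, 3, 6, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Grouping the zeros into one trailing run is what makes the subsequent run-length step effective.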
It provides two basic schemes: discrete cosine transform (DCT) based compression for the reduction of spatial redundancy, and block-based motion compensation for the reduction of temporal redundancy. During the initial stages of DCT compression, the MPEG and JPEG algorithms are essentially the same. For full-motion video (MPEG-1 and 2), motion compensation is used. Motion compensation: It assumes the current picture is a translation of some previous picture. Motion compensation attempts to compensate for the movement of objects in the compression phase. To make it easier to compare frames, a frame is split into blocks, and the blocks are encoded. For each block in the frame to be encoded, the best matching block in the reference frame is searched for among a number of candidate blocks, and a motion vector is generated. This vector is viewed as an analytical indication of the new location, in the frame being encoded, of an existing block from the reference frame. This is an attempt to match up the new location of a moved object. The matching process is based on prediction or interpolation. Prediction: Prediction requires the current frame and a reference frame. Based on the motion vector values generated, prediction attempts to find the new position of an object and confirm it by comparing some blocks. Interpolation: Motion vectors are generated in relation to two reference frames, one from the past and one from the predicted frame. The best matching blocks in both frames are searched for, and the average is taken as the position of the block in the current frame.
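A minimal block-matching search makes the motion-vector idea concrete (a sketch with illustrative names; real encoders use much faster search strategies than this exhaustive scan, and the sum of absolute differences is only one possible matching cost):

```python
def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))

def best_motion_vector(ref, cur, bx, by, bsize, search):
    """Exhaustively search ref for the block best matching the cur block
    at (bx, by); return the motion vector (dx, dy)."""
    def block(frame, x, y):
        return [row[x:x + bsize] for row in frame[y:y + bsize]]
    target = block(cur, bx, by)
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= len(ref[0]) - bsize and 0 <= y <= len(ref) - bsize:
                cost = sad(block(ref, x, y), target)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2]

# A 2x2 bright square moves one pixel to the right between frames, so the
# block at (3, 2) in the current frame is found at (2, 2) in the reference.
ref = [[0] * 8 for _ in range(8)]
cur = [[0] * 8 for _ in range(8)]
for y in (2, 3):
    for x in (2, 3):
        ref[y][x] = 9
    for x in (3, 4):
        cur[y][x] = 9
print(best_motion_vector(ref, cur, 3, 2, 2, 2))  # (-1, 0)
```

The returned vector says "this block came from one pixel to the left in the reference frame," which is exactly the information the encoder transmits instead of the block's pixels.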
Advanced technologies:
Intel's Indeo technology
Apple's QuickTime
Microsoft AVI
Intel's AVI