
CHAPTER 1 INTRODUCTION

At present there is an insatiable demand for ever-greater bandwidth in communication networks and for ever-greater storage capacity in computer systems. This has led to the need for efficient compression techniques. Compression is the process required either to reduce the volume of information to be transmitted (text, fax and images) or to reduce the bandwidth required for its transmission (speech, audio and video). The compression technique is applied to the source information prior to its transmission. Data compression is the process of converting an input data stream (the source stream, or the original raw data) into another data stream that has a smaller size. Data compression is popular for two reasons:

1. People like to accumulate data and hate to throw anything away. No matter how large a storage device may be, sooner or later it is going to overflow. Data compression seems useful because it delays this inevitability.
2. People hate to wait a long time for data transfers.

There are many known methods of data compression. They are based on different ideas and are suitable for different types of data. They produce different results, but they are all based on the same basic principle: they compress data by removing the redundancy from the original data in the source file. The idea of compression by reducing redundancy suggests the general law of data compression, which is to "assign short codes to common events and long codes to rare events". Data compression is done by changing the representation of the data from an inefficient form to an efficient one. The main aim of the field of data compression is, of course, to develop methods for better and better compression. Experience shows that fine-tuning an algorithm to squeeze out the last remaining bits of redundancy from the data gives diminishing returns. Data compression has become so important that some researchers have proposed the "simplicity and power theory". Specifically, it says that data compression may be interpreted as a process of removing unnecessary complexity in information, thus maximizing simplicity while preserving as much as possible of its non-redundant descriptive power. The main problem is that a straightforward representation of information requires too many bits; in fact, there exist many coding techniques that will effectively reduce the total number of bits needed to represent it. Lossless data compression algorithms perform compression of the data in such a way that there is no distortion of the original signal once it is decompressed or reconstituted. So much data exists, in archives and elsewhere, that it has become critical to compress this information, and lossless compression is one way to proceed. Compression generally means encoding the data, i.e., reducing the data. Compression is performed by an encoder and decompression is performed by a decoder. We call the output of the encoder codes or codewords. The intermediate medium could be either data storage or a communication/computer network. If the compression and decompression procedure induces no information loss, the compression scheme is lossless; otherwise it is lossy.

Here the data will be compressed using three coding techniques: run-length coding, variable-length coding and fixed-length coding. Run-length coding is the simplest form of data compression. In variable-length coding we use the Shannon-Fano algorithm and the Huffman coding algorithm; the Shannon-Fano algorithm is a top-down approach, while the Huffman coding algorithm is a bottom-up approach. In fixed-length coding we use a dictionary-based coding algorithm. The emergence of multimedia technologies has made digital libraries a reality. Nowadays, libraries, museums, film studios and governments are converting more and more data archives into digital form, and some of this data needs to be stored without any loss.

CHAPTER 2 COMPRESSION TECHNIQUES


• Lossless Compression
• Lossy Compression

2.1 LOSSLESS COMPRESSION:
In this type of compression algorithm, the aim is to reduce the amount of source information to be transmitted in such a way that, when the compressed information is decompressed, there is no loss of information. Lossless compression is therefore said to be reversible; i.e., data is not altered or lost in the process of compression or decompression, and decompression generates an exact replica of the original object. Example applications of lossless compression are the transfer of a text file over a network (since, in such applications, it is normally imperative that no part of the source information is lost during either the compression or decompression operations), file storage systems (tapes, hard disk drives, solid state storage, file servers) and communication networks (LAN, WAN, wireless).

The various lossless compression techniques are:

• Packbits encoding
• CCITT Group 3 1D
• CCITT Group 3 2D
• CCITT Group 4 2D
• Lempel-Ziv and Welch (LZW) algorithm
• Huffman coding

2.2 LOSSY COMPRESSION:


The aim of lossy compression algorithms is normally not to reproduce an exact copy of the source information after decompression, but rather a version of it that is perceived by the recipient as a true copy. The lossy compression algorithms are:

• JPEG (Joint Photographic Experts Group)
• MPEG (Moving Picture Experts Group)

CHAPTER 3 LOSSLESS COMPRESSION TECHNIQUES


3.1 PACKBITS ENCODING:
In this scheme, a consecutive repeated string of characters is replaced by two bytes: the first byte gives the number of times the character is repeated, and the second byte is the character itself. For example, AAAAAA111111AAAA is represented as the pairs (A, 0x06), (1, 0x06), (A, 0x04). In some cases, one byte is used to represent both the value of the character and the number of repetitions: one bit out of 8 is used to represent the pixel value, and the remaining 7 bits hold the run length. Typical compression efficiency ranges from about 1/2 to 1/5.
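As a rough sketch of the idea (not the exact PackBits byte format, which distinguishes literal runs from repeated runs), a minimal run-length encoder and decoder might look like this; the function names are illustrative:

```python
def rle_encode(data: str) -> list:
    """Encode a string as (character, run_length) pairs."""
    runs = []
    i = 0
    while i < len(data):
        j = i
        while j < len(data) and data[j] == data[i]:
            j += 1                      # extend the current run
        runs.append((data[i], j - i))   # (character, repeat count)
        i = j
    return runs

def rle_decode(runs: list) -> str:
    """Expand (character, run_length) pairs back into the string."""
    return "".join(ch * n for ch, n in runs)

encoded = rle_encode("AAAAAA111111AAAA")
print(encoded)                # [('A', 6), ('1', 6), ('A', 4)]
print(rle_decode(encoded))    # AAAAAA111111AAAA
```

Note that run-length coding only pays off when runs are long; on data without repeats, each single character would cost two output values.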

3.2 CCITT GROUP 3 1D:


Sometimes CCITT encoding is referred to, not entirely accurately, as Huffman encoding. Huffman encoding is a simple compression algorithm introduced by David Huffman in 1952. CCITT 1-dimensional encoding, described in a subsection below, is a specific type of Huffman encoding. Group 3 One-Dimensional encoding (G31D) is a variation of the Huffman keyed compression scheme. A bi-level image is composed of a series of black-and-white 1-bit pixel runs of various lengths (1 = black and 0 = white). A Group 3 encoder determines the length of a pixel run in a scan line and outputs a variable-length binary code word representing the length and color of the run. Because the code word output is shorter than the input pixel data, compression is achieved.

The run-length code words are taken from a predefined table of values representing runs of black or white pixels. This table is part of the T.4 specification and is used to encode and decode all Group 3 data. The size of the code words was originally determined by the CCITT, based statistically on the average frequency of black-and-white runs occurring in typical typed and handwritten documents. The documents included line art and were written in several different languages. Run lengths that occur more frequently are assigned smaller code words, while run lengths that occur less frequently are assigned larger code words. In printed and handwritten documents, short runs occur more frequently than long runs; two- to four-pixel black runs are the most frequent. The maximum size of a run length is bounded by the maximum width of a Group 3 scan line.

Run lengths are represented by two types of code words: makeup and terminating. An encoded pixel run is made up of zero or more makeup code words and a terminating code word. Terminating code words represent shorter runs, and makeup codes represent longer runs. There are separate terminating and makeup code words for both black and white runs. Pixel runs with a length of 0 to 63 are encoded using a single terminating code. Runs of 64 to 2623 pixels are encoded by a single makeup code and a terminating code. Run lengths greater than 2623 pixels are encoded using one or more makeup codes and a terminating code. The run length is the sum of the length values represented by each code word.

Algorithm for 2uffman coding


". Arrange the symbols probabilities pi in decreasing orderE consider them as leaf nodes of a tree. &. while there is more than one node1 &.". >erge the two nodes with smallest probability to form a new node whose

probability is the sum of two merged nodes. &.&. Arbitrarily assign " E A each pair of branches merging into a node.
3. 3ead sequentially from root to leaf node where symbol is located. $robability of occurrence of a bit stream of length 3n is $ ;n#. As a result, shorter codes were developed for less frequent runlength. ?or e.g., from below table , runlength code for "D white pixels is "A"A"A, while runlength code for "D black pixels is AAAAA"A""",since occurrence of "D white pixels is more than that of "D black pixels. .odes greater than "GJ& runlength is same for both pixels. ..+TT -roup 8 " utili!es 2uffman coding to generate a set of terminating codes E make-up codes for given bit stream. (.g., runlength of "8& where pixels are encoded by using, >ake-up code for "&F white pixels-"AA"A. Terminating code for 9white pixels-"A""
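The makeup-plus-terminating composition described above can be sketched as follows. Only the handful of T.4 white-run code words cited in this section are included; a real implementation would carry the complete T.4 tables for both white and black runs, and the table and function names here are illustrative:

```python
# Partial T.4 white-run code tables (only the entries cited above).
WHITE_TERMINATING = {0: "00110101", 4: "1011", 16: "101010"}
WHITE_MAKEUP = {64: "11011", 128: "10010"}

def encode_white_run(length: int) -> str:
    """Encode one white pixel run as makeup code(s) + a terminating code."""
    bits = ""
    # Emit the largest makeup code(s) not exceeding the remaining length.
    while length > 63:
        best = max(m for m in WHITE_MAKEUP if m <= length)
        bits += WHITE_MAKEUP[best]
        length -= best
    # The remainder (0..63) is covered by a single terminating code.
    # (With these partial tables, only lengths present in the dict work.)
    return bits + WHITE_TERMINATING[length]

print(encode_white_run(132))  # '10010' + '1011' -> '100101011'
```

This reproduces the 132-pixel example from the text: the 128-pixel makeup code followed by the 4-pixel terminating code.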

Generating the coding tree:

3.3 CCITT GROUP 3 2D COMPRESSION:


Group 3 One-Dimensional (G31D) encoding, which we've discussed above, encodes each scan line independently of the other scan lines. Only one run length at a time is considered during the encoding and decoding process. The data occurring before and after each run length is not important to the encoding step; only the data occurring in the present run is needed. With Group 3 Two-Dimensional (G32D) encoding, on the other hand, the way a scan line is encoded may depend on the immediately preceding scan line's data. Many images have a high degree of vertical coherence (redundancy). By describing the differences between two scan lines, rather than describing the scan line contents, 2D encoding achieves better compression.

The first pixel of each run length is called a changing element. Each changing element marks a color transition within a scan line (the point where a run of one color ends and a run of the next color begins). The position of each changing element in a scan line is described as being a certain number of pixels from a changing element either in the current, coding line (horizontal coding) or in the preceding, reference line (vertical coding). The output codes used to describe the actual positional information are called Relative Element Address Designate (READ) codes. Shorter code words are used to describe color transitions that are less than four pixels away from each other on the coding line or the reference line. Longer code words are used to describe color transitions lying a greater distance from the current changing element. 2D encoding is more efficient than 1-dimensional encoding because the usual data that is compressed (typed or handwritten documents) contains a high amount of 2D coherence.

Because a G32D-encoded scan line is dependent on the correctness of the preceding scan line, an error such as a burst of line noise can affect multiple 2-dimensionally encoded scan lines. If a transmission error corrupts a segment of encoded scan-line data, that line cannot be decoded. But, worse still, all scan lines occurring after it also decode improperly. To minimize the damage created by noise, G32D uses a variable called the K factor and 2-dimensionally encodes K-1 lines following each 1-dimensionally encoded line. If corruption of the data transmission occurs, only K-1 scan lines of data will be lost. The decoder will be able to resynchronize the decoding at the next available EOL code. The typical value for K is 2 or 4. G32D data that is encoded with a K value of 4 appears as blocks of data, each block containing a scan line of 1-dimensionally encoded data followed by three lines of 2D scan-line data.

Data format for the 2D scheme:


The 2D scheme uses a combination of pass codes, vertical codes and horizontal codes. The pass code is 0001 and the horizontal code is 001. The vertical code depends on the difference between the changing-pixel positions in the reference line and the coding line:

Difference (delta):   3        2       1    0   -1   -2      -3
Vertical code:        0000010  000010  010  1   011  000011  0000011

Steps to code the coding line:


". $arse the coding line and for change in pixel value.The pixel value change if found at the location. &. $arse the reference line and look for change in pixel value. The change is found at the b"-location. 8. ?ind the difference in locations b" E a"< elta@b"-a". +f delta is in b0w -8 to M8, apply vertical codes.

3.4 CCITT GROUP 4 2D COMPRESSION:


Group 4 Two-Dimensional (G42D) encoding was developed from the G32D algorithm as a better 2D compression scheme--so much better, in fact, that Group 4 has almost completely replaced G32D in commercial use. Group 4 encoding is basically the G32D algorithm with a few modifications: there are no EOL codes, and the K variable is set to infinity. Group 4 was designed specifically to encode data residing on disk drives and data networks. The built-in transmission error detection/correction found in Group 3 is therefore not needed by Group 4 data. The first reference line in Group 4 encoding is an imaginary scan line containing all white pixels, whereas in G32D encoding the first reference line is the first scan line of the image. In Group 4 encoding, the RTC code word is replaced by an end-of-facsimile block (EOFB) code, which consists of two consecutive Group 3 EOL code words. Like the Group 3 RTC, the EOFB is part of the transmission protocol and not actually part of the image data.

Also, Group 4-encoded image data may be padded out with fill bits after the EOFB to end on a byte boundary. Group 4 encoding will usually produce an image compressed to half the size achieved by G31D encoding. The main tradeoff is that Group 4 encoding is more complex and requires more time to perform. When implemented in hardware, however, the difference in execution speed between the Group 3 and Group 4 algorithms is not significant, which usually makes Group 4 the better choice in most imaging system implementations.

3.5 LEMPEL-ZIV-WELCH (LZW) ALGORITHM:


The scenario described in Welch's 1984 paper [1] encodes sequences of 8-bit data as fixed-length 12-bit codes. The codes from 0 to 255 represent 1-character sequences consisting of the corresponding 8-bit character, and the codes 256 through 4095 are created in a dictionary for sequences encountered in the data as it is encoded. At each stage in compression, input bytes are gathered into a sequence until the next character would make a sequence for which there is no code yet in the dictionary. The code for the sequence (without that character) is emitted, and a new code (for the sequence with that character) is added to the dictionary.

The idea was quickly adapted to other situations. In an image based on a color table, for example, the natural character alphabet is the set of color table indexes, and in the 1980s many images had small color tables (on the order of 16 colors). For such a reduced alphabet, the full 12-bit codes yielded poor compression unless the image was large, so the idea of a variable-width code was introduced: codes typically start one bit wider than the symbols being encoded, and as each code size is used up, the code width increases by 1 bit, up to some prescribed maximum (typically 12 bits).

Encoding

A high-level view of the encoding algorithm is shown here:
1. Initialize the dictionary to contain all strings of length one.
2. Find the longest string W in the dictionary that matches the current input.
3. Emit the dictionary index for W to output and remove W from the input.
4. Add W followed by the next symbol in the input to the dictionary.
5. Go to Step 2.

A dictionary is initialized to contain the single-character strings corresponding to all the possible input characters (and nothing else, except the clear and stop codes if they're being used). The algorithm works by scanning through the input string for successively longer substrings until it finds one that is not in the dictionary. When such a string is found, the index for the string less the last character (i.e., the longest substring that is in the dictionary) is retrieved from the dictionary and sent to output, and the new string (including the last character) is added to the dictionary with the next available code. The last input character is then used as the next starting point to scan for substrings.

In this way, successively longer strings are registered in the dictionary and made available for subsequent encoding as single output values. The algorithm works best on data with repeated patterns, so the initial parts of a message will see little compression. As the message grows, however, the compression ratio tends asymptotically to the maximum. [2]

Decoding

The decoding algorithm works by reading a value from the encoded input and outputting the corresponding string from the initialized dictionary. At the same time it obtains the next value from the input, and adds to the dictionary the concatenation of the string just output and the first character of the string obtained by decoding the next input value. The decoder then proceeds to the next input value (which was already read in as the "next value" in the previous pass) and repeats the process until there is no more input, at which point the final input value is decoded without any more additions to the dictionary. In this way the decoder builds up a dictionary which is identical to that used by the encoder, and uses it to decode subsequent input values. Thus the full dictionary does not need to be sent with the encoded data; just the initial dictionary containing the single-character strings is sufficient (and is typically defined beforehand within the encoder and decoder rather than being explicitly sent with the encoded data).
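The encoder and decoder described above can be sketched as follows. This version emits integer codes rather than packing them into fixed- or variable-width bit fields, and omits the optional clear and stop codes:

```python
def lzw_encode(data: str) -> list:
    """LZW encoder over the 256-symbol byte alphabet; codes from 256
    upward are created as new sequences are encountered."""
    dictionary = {chr(i): i for i in range(256)}
    next_code = 256
    w = ""
    out = []
    for ch in data:
        if w + ch in dictionary:
            w += ch                      # keep extending the match
        else:
            out.append(dictionary[w])    # emit code for longest match
            dictionary[w + ch] = next_code
            next_code += 1
            w = ch                       # restart scan at this character
    if w:
        out.append(dictionary[w])
    return out

def lzw_decode(codes: list) -> str:
    """Rebuild the encoder's dictionary on the fly while decoding."""
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            entry = w + w[0]             # code not yet defined: cScSc case
        out.append(entry)
        dictionary[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(out)

msg = "TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_encode(msg)
assert lzw_decode(codes) == msg
print(len(msg), "characters ->", len(codes), "codes")
```

As the text notes, the repeated patterns in the message let the encoder emit fewer codes than there are input characters, and the decoder reconstructs the identical dictionary without it ever being transmitted.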

3.6 HUFFMAN CODING:

A commonly used method for data compression is Huffman coding. The method starts by building a list of all the alphabet symbols in descending order of their probabilities. It then constructs a tree, with a symbol at every leaf, from the bottom up. This is done in steps, where at each step the two symbols with the smallest probabilities are selected, added to the top of the partial tree, deleted from the list, and replaced with an auxiliary symbol representing both of them. When the list is reduced to just one auxiliary symbol, the tree is complete. The tree is then traversed to determine the codes of the symbols. The Huffman method is somewhat similar to the Shannon-Fano method. The main difference between the two methods is that Shannon-Fano constructs its codes from top to bottom, while Huffman constructs a code tree from the bottom up. This is best illustrated by an example. Given five symbols with probabilities as shown in the figure, they are paired in the following order:

1. a4 is combined with a5 and both are replaced by the combined symbol a45, whose probability is 0.2.
2. There are now four symbols left: a1, with probability 0.4, and a2, a3 and a45, with probabilities 0.2 each. We arbitrarily select a3 and a45, combine them, and replace them with the auxiliary symbol a345, whose probability is 0.4.
3. Three symbols are now left: a1, a2 and a345, with probabilities 0.4, 0.2 and 0.4, respectively. We arbitrarily select a2 and a345, combine them, and replace them with the auxiliary symbol a2345, whose probability is 0.6.
4. Finally, we combine the two remaining symbols, a1 and a2345, and replace them with a12345, with probability 1.

The tree is now complete, "lying on its side" with the root on the right and the five leaves on the left. To assign the codes, we arbitrarily assign a bit of 1 to the top edge and a bit of 0 to the bottom edge of every pair of edges. This results in the codes 0, 10, 111, 1101 and 1100. The assignment of bits to the edges is arbitrary. The average size of this code is 0.4 x 1 + 0.2 x 2 + 0.2 x 3 + 0.1 x 4 + 0.1 x 4 = 2.2 bits/symbol, but even more importantly, the Huffman code is not unique.
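The bottom-up merging procedure can be sketched with a priority queue. Because the ties in the example are broken arbitrarily, the code lengths produced here may differ from the 1, 2, 3, 4, 4 of the worked example, but, as the text says, the Huffman code is not unique, and the average length still comes out to 2.2 bits/symbol:

```python
import heapq

def huffman_code_lengths(probs):
    """Build a Huffman tree bottom-up (repeatedly merging the two
    lowest-probability nodes) and return the code length per symbol."""
    # Each heap entry: (probability, tiebreak id, symbol indices below it)
    heap = [(p, i, [i]) for i, p in enumerate(probs)]
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    tiebreak = len(probs)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for s in s1 + s2:        # every symbol under the merged node
            lengths[s] += 1      # gains one more bit on its path
        heapq.heappush(heap, (p1 + p2, tiebreak, s1 + s2))
        tiebreak += 1
    return lengths

probs = [0.4, 0.2, 0.2, 0.1, 0.1]   # a1 .. a5 from the example above
lengths = huffman_code_lengths(probs)
avg = sum(p * l for p, l in zip(probs, lengths))
print(lengths, round(avg, 2))       # average is 2.2 bits/symbol
```

Both the 1, 2, 3, 4, 4 assignment from the text and the assignment this sketch happens to produce are valid Huffman codes with the same average length.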

CHAPTER 4

LOSSY COMPRESSION TECHNIQUES


4.1 JOINT PHOTOGRAPHIC EXPERTS GROUP (JPEG):

DCT (DISCRETE COSINE TRANSFORM):


In the time domain, a signal requires many data points to represent it, with time on the x-axis and amplitude on the y-axis. Once the signal is converted into the frequency domain, only a few data points are required to represent the same signal. A color image is composed of pixels; these pixels have RGB values, each with its x and y coordinates, and the image is processed in 8x8 matrices. To compress a grayscale image in JPEG, each pixel is translated to luminance; to compress a color image, the work is three times as much.

DCT CALCULATIONS:

The numbers in the first table represent the amplitudes of the pixels in an 8x8 block. Table 2 shows the 8x8 matrix after applying the DCT. In Table 2, the value at row 0, column 0 (in this example, 172) is the DC component; the others are called AC components, and the DC component has a larger value than the others.
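The forward transform itself can be sketched as follows. This is a naive O(N^4) reference implementation of the 2D DCT-II, not the fast factorization real JPEG codecs use; a flat block illustrates how the energy concentrates in the DC coefficient:

```python
import math

def dct2(block):
    """Naive 2-D DCT-II of an N x N block (N = 8 here). The (0, 0)
    output is the DC coefficient; the rest are AC coefficients."""
    n = len(block)
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            s = sum(block[x][y]
                    * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                    * math.cos((2 * y + 1) * v * math.pi / (2 * n))
                    for x in range(n) for y in range(n))
            # Orthonormal scaling factors.
            cu = math.sqrt(1 / n) if u == 0 else math.sqrt(2 / n)
            cv = math.sqrt(1 / n) if v == 0 else math.sqrt(2 / n)
            out[u][v] = cu * cv * s
    return out

# A flat (constant) block concentrates all its energy in the DC term.
flat = [[100] * 8 for _ in range(8)]
coeffs = dct2(flat)
print(round(coeffs[0][0]))  # 800; every AC coefficient is ~0
```

This is exactly the property JPEG exploits: smooth image regions produce one large DC value and many near-zero AC values, which compress well after quantization.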

Quantization:


The baseline JPEG algorithm supports 4 quantization tables and 2 Huffman tables for the DC and AC DCT coefficients. The DCT coefficients after quantization are given by: quantized coeff(i, j) = DCT(i, j) / quantum(i, j).

After quantization, JPEG elected to compress the runs of 0 values by utilizing a run-length scheme. To gather the 0s into runs, JPEG scans the coefficients in a zigzag manner.
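The quantization formula and the zigzag scan can be sketched together. A 4x4 block is used instead of JPEG's 8x8 to keep the example short, and the uniform quantization table here is made up for illustration:

```python
def quantize(dct, quantum):
    """quantized coeff(i, j) = round(DCT(i, j) / quantum(i, j))."""
    n = len(dct)
    return [[round(dct[i][j] / quantum[i][j]) for j in range(n)]
            for i in range(n)]

def zigzag(block):
    """Read an N x N block in zigzag order, so the near-zero
    high-frequency coefficients group together at the end,
    ready for run-length coding."""
    n = len(block)
    # Walk anti-diagonals (constant i + j), alternating direction.
    order = sorted(((i, j) for i in range(n) for j in range(n)),
                   key=lambda p: (p[0] + p[1],
                                  p[0] if (p[0] + p[1]) % 2 else p[1]))
    return [block[i][j] for i, j in order]

block = [[16, 8, 0, 0],
         [4, 4, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
q = [[4] * 4 for _ in range(4)]
print(zigzag(quantize(block, q)))  # DC first, then a long tail of zeros
```

The long run of trailing zeros at the end of the zigzag sequence is what makes the subsequent run-length stage effective.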

4.2 MOVING PICTURE EXPERTS GROUP (MPEG):

Architecture:

It provides two basic schemes: discrete cosine transform-based compression for the reduction of spatial redundancy, and block-based motion compensation for the reduction of temporal redundancy. In terms of DCT compression, the MPEG and JPEG algorithms are essentially the same. For full-motion video (MPEG-1 and 2), motion compensation is used during the initial stages of compression.

Motion compensation: It assumes the current picture is a translation of some previous picture. Motion compensation attempts to compensate for the movement of objects in the compression phase. To make it easier to compare frames, a frame is split into blocks, and the blocks are encoded. For each block in the frame to be encoded, the best matching block in the reference frame is searched for among a number of candidate blocks. For each block, a motion vector is generated. This vector is viewed as an analytical indication of the new location, in the frame being encoded, of an existing block from the reference frame. This is an attempt to match up the new location of a moved object. The process of matching up is based on prediction or interpolation.

Prediction: Prediction requires the current frame and a reference frame. Based on the motion vector values generated, prediction attempts to find the new position of an object and confirm it by comparing some blocks.

Interpolation: Motion vectors are generated in relation to two reference frames, one from the past and the predicted frame. The best matching blocks in both frames are searched for, and the average is taken as the position of the block in the current frame.
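The block-matching step described above can be sketched with an exhaustive search over a small window, scored by the common sum-of-absolute-differences (SAD) cost; the toy frames, search radius and function names are illustrative (real encoders use larger blocks and faster search strategies):

```python
def sad(block_a, block_b):
    """Sum of absolute differences between two equal-sized blocks."""
    return sum(abs(a - b) for row_a, row_b in zip(block_a, block_b)
               for a, b in zip(row_a, row_b))

def best_motion_vector(ref, cur, bx, by, n, search=2):
    """Exhaustively search a small window of the reference frame for
    the block best matching the n x n current-frame block at (bx, by).
    Returns the displacement (dx, dy) into the reference frame."""
    cur_block = [row[bx:bx + n] for row in cur[by:by + n]]
    best_mv, best_cost = None, float("inf")
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if 0 <= x <= len(ref[0]) - n and 0 <= y <= len(ref) - n:
                cand = [row[x:x + n] for row in ref[y:y + n]]
                cost = sad(cur_block, cand)
                if cost < best_cost:
                    best_mv, best_cost = (dx, dy), cost
    return best_mv

# Toy frames: a 2x2 bright patch moves one pixel right between frames.
ref = [[0] * 6 for _ in range(6)]
ref[2][2] = ref[2][3] = ref[3][2] = ref[3][3] = 9
cur = [[0] * 6 for _ in range(6)]
cur[2][3] = cur[2][4] = cur[3][3] = cur[3][4] = 9
print(best_motion_vector(ref, cur, 3, 2, 2))  # (-1, 0)
```

The returned vector says the current block's content sits one pixel to the left in the reference frame, so the encoder can transmit the vector (plus a small residual) instead of the block's pixels.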

Advanced technologies:

• Intel's Indeo technology
• Apple's QuickTime
• Microsoft AVI
• Intel's AVI
