Data Reduction Tech For Graphics - REPORT
Year: 2006
Prepared By: Radadia Archana G. (Roll No. 46)
Guided By: Mrs. Vaikhari Deodhar, Mrs. Mayuri Mehta
Computer Department
CERTIFICATE
Place: Surat
Date:
I would also like to offer my gratitude to the DIC, Mr. Keyur Rana of the Computer Engg. Department, who helped me with his valuable suggestions and encouragement, which not only helped me in preparing this report but also gave me a better insight into this field.
One of the reasons that the World Wide Web has become so popular is that it allows for the easy display of graphics. While most of the information published on the Web is still textual in nature, graphics make pages more visually appealing, supplement textual information, and supply information that can only be transmitted visually. There must be some way to make the transmission of images on the Web faster.
This article therefore covers some basic and universal techniques for reducing graphics file size and, with it, traffic on the net. The two universal graphics file formats for the Web, GIF and JPEG, are discussed, and algorithms for these reduction techniques are given. Some features of the GIF and JPEG file formats are also discussed.
NO. CHAPTER
1 Introduction
2 Data Compression
3 Types of Compression
4 Lossless Algorithm
5 Lossy Algorithm
Conclusion
Bibliography
1. INTRODUCTION
If complaints about slow-loading Web pages sound all too familiar, you are not alone. The network's backbone isn't the
problem; it's what happens at each end that frustrates users. The increasing size of digital
media and lack of server bandwidth are the main culprits. More bandwidth won't
necessarily solve this problem. What will help is minimizing the amount of data that
travels through this bandwidth.
The secret of shrinking graphics file size is the reduction of bit depth, resolution, and dimensions while preserving image quality. This classic size-versus-quality tradeoff
is the key to the art and science of graphics compression.
Color Palettes
The two ways to store color raster images are indexed and RGB. In indexed formats, pixel values are mapped to a color lookup table (CLUT), or color palette, of 256 colors or fewer. RGB formats, also known as true color, use 8 bits (0 to 255) for each Red, Green, and Blue value to form a 24-bit pixel (8+8+8=24), which can hold one of 16.7 million colors (2^24 = 16,777,216). Some formats support even
higher bit depths, useful for medical imaging or smooth transparencies.
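As a rough illustration (a sketch with assumed image dimensions, not taken from this report), the uncompressed size of a raster image follows directly from its dimensions and bit depth:

# Sketch: estimated uncompressed size of a raster image.
# The 640 x 480 dimensions are hypothetical values chosen for illustration.
def raw_size_bytes(width, height, bits_per_pixel):
    return width * height * bits_per_pixel // 8

w, h = 640, 480
print("Indexed, 256-color CLUT (8 bpp):", raw_size_bytes(w, h, 8), "bytes")
print("True color RGB (24 bpp):", raw_size_bytes(w, h, 24), "bytes")
print("Distinct 24-bit colors:", 2 ** 24)   # 16,777,216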
2. DATA COMPRESSION
A code is a mapping of source messages (words from the source alphabet alpha) into codewords (words from the code alphabet beta). The source messages are the basic units into which the string to be represented is partitioned. These basic units may be single symbols from the source alphabet, or they may be strings of symbols. For example, alpha = {a, b, c, d, e, f, g, space}. For purposes of explanation, beta will be taken to be {0, 1}. Codes can be categorized as block-block, block-variable, variable-block, or variable-variable, where block-block indicates that the source messages and codewords are of fixed length, while variable-variable codes map variable-length source messages into variable-length codewords.
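As a small illustration (the particular groupings below are assumptions, not definitions from this report), a block-block code assigns fixed-length codewords to single symbols, while a variable-variable code maps variable-length source messages to variable-length codewords:

# Sketch: two codes over alpha = {a, b, c, d, e, f, g, space}.
# Block-block: each single source symbol maps to a fixed-length (3-bit) codeword.
block_block = {
    'a': '000', 'b': '001', 'c': '010', 'd': '011',
    'e': '100', 'f': '101', 'g': '110', ' ': '111',
}
# Variable-variable: variable-length source messages map to variable-length
# codewords (these groupings are hypothetical).
variable_variable = {
    'aa': '0', 'bbb': '1', 'cccc': '10', 'ddddd': '11',
    'eeeeee': '100', 'fffffff': '101', 'gggggggg': '110', ' ': '111',
}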
Dynamic Method
A code is dynamic if the mapping from the set of messages to the set of codewords changes over time. For example, dynamic Huffman coding involves computing an approximation to the probabilities of occurrence “on the fly”, as the ensemble is being transmitted. The assignment of codewords to messages is based on the values of the relative frequencies of occurrence at each point in time. A message x may be represented by a short codeword early in the transmission because it occurs frequently at the beginning of the ensemble, even though its probability of occurrence over the total ensemble is low. Later, when more probable messages begin to occur with higher frequency, the short codeword will be mapped to one of the higher-probability messages and x will be mapped to a longer codeword.
[Figure: a dynamic Huffman code table built up for an example string.]
Dynamic codes are also referred to in the literature as adaptive, in that they adapt to changes in ensemble characteristics over time.
The essential figure of merit for data compression is the “compression ratio”, which relates the size of the compressed file to the size of the original uncompressed file.
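For instance (the file sizes here are assumed for illustration), an image that shrinks from 100 KB to 25 KB has a 4:1 compression ratio, i.e. the compressed file is 25% of the original:

# Sketch: computing a compression ratio from hypothetical file sizes.
original_size = 100 * 1024      # bytes, uncompressed
compressed_size = 25 * 1024     # bytes, compressed
ratio = original_size / compressed_size
percent = 100.0 * compressed_size / original_size
print("Compression ratio: %.0f:1 (%.0f%% of original size)" % (ratio, percent))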
3. TYPES OF COMPRESSION
Lossy compression techniques involve some loss of information, and data that have been compressed using lossy techniques generally cannot be recovered or reconstructed exactly. When the compressed message is decoded it does not give back the original message; data has been lost. In return for accepting this distortion in the reconstruction, we can generally obtain much higher compression ratios than are possible with lossless compression. Lossy compression produces a much smaller compressed file than lossless compression. Because lossy compression cannot be decoded to yield the exact original message, it is not a good method of compression for critical data, such as textual data.
It is most useful for Digitally Sampled Analog Data (DSAD). DSAD consists
mostly of sound, video, graphics, or picture files. Algorithms for Lossy compression of
DSAD vary, but many use a threshold level truncation. This means that a level is chosen
past which all data is truncated. In a sound file, for example, the very high and low
frequencies, which the human ear cannot hear, may be truncated from the file.
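A minimal sketch of this idea, assuming a simple amplitude threshold on sampled values (the samples and the threshold below are made up for illustration):

# Sketch: threshold-level truncation. Values whose magnitude falls below the
# chosen level are discarded (set to zero) and cannot be recovered afterwards.
samples = [0.02, 0.85, -0.01, 0.40, -0.90, 0.005, 0.33]   # hypothetical data
threshold = 0.05
truncated = [s if abs(s) >= threshold else 0.0 for s in samples]
print(truncated)    # [0.0, 0.85, 0.0, 0.4, -0.9, 0.0, 0.33]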
The following algorithms are examples of lossy compression:
· JPEG Compression
· MPEG Compression
4. LOSSLESS ALGORITHM
Huffman Coding

Step 1
List the source symbols in decreasing probability order. (The symbols A through G for
the example are already labeled in decreasing probability order).
Step 2
Combine the two least-probable symbols into a single entity (call it "fg" in Stage 2 of the example), associating with it the sum of the probabilities of the individual symbols in the combination. Assign to each of the symbols that were combined one of the binary digits, "0" for the upper symbol and "1" for the lower one (F → "0", G → "1" in the example).
Step 3
Re-order, if necessary, the resulting symbol list in decreasing probability order, treating any combined symbols from the previous list as a single symbol. If the list now has only 2 symbols, assign to each of them one of the binary digits (upper symbol → "0", lower one → "1") and go to Step 4. Otherwise go to Step 2.
Step 4
Read off the binary codewords for the source symbols, from right to left. (Note: Assignment of the code letters "0" and "1" to the upper and lower symbols may be done in any way as long as it is consistent throughout. For example, some texts use "1" for the upper symbol and "0" for the lower one. If this were done for the example, all the codewords would simply have their "0"s and "1"s interchanged.)
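The steps above can be sketched in code. The following is a minimal heap-based construction of a static Huffman code (not a listing from this report); the probabilities assigned to the symbols A through G are placeholders chosen for illustration:

import heapq

# Sketch: static Huffman code construction (symbol probabilities are hypothetical).
def huffman_code(probabilities):
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {sym: ''}) for i, (sym, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        # Steps 2 and 3: combine the two least-probable entries and re-order.
        p1, _, codes1 = heapq.heappop(heap)
        p2, _, codes2 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in codes1.items()}
        merged.update({s: '1' + c for s, c in codes2.items()})
        # Any consistent 0/1 assignment works, per the note in Step 4.
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]       # Step 4: read off the finished codewords

probs = {'A': 0.30, 'B': 0.20, 'C': 0.15, 'D': 0.15, 'E': 0.10, 'F': 0.06, 'G': 0.04}
for symbol, codeword in sorted(huffman_code(probs).items()):
    print(symbol, codeword)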
· Adaptive Huffman coding dynamically changes the codewords according to changes in the probabilities of the symbols.
· Extended Huffman compression can encode groups of symbols rather than single symbols.
Huffman compression is mainly used in compression programs like pkZIP, lha, gz, zoo and arj. It is also used within JPEG and MPEG compression.
Arithmetic coding is a coding technique that allows the information from the messages in a message sequence to be combined to share the same bits. The technique allows the total number of bits sent to asymptotically approach the sum of the self-information of the individual messages. The drawback of Huffman coding is that we assign an integer number of bits as the code for each symbol; the code is thus optimal only when each symbol has an occurrence probability that is an integral power of 1/2. The following is a description of the arithmetic coding algorithm in pseudo code.
Set Low to 0
Set High to 1
While there are input symbols do
    Take a symbol
    CodeRange = High - Low
    High = Low + CodeRange * HighRange(symbol)
    Low = Low + CodeRange * LowRange(symbol)
End of While
Output Low
EXAMPLE:
The message to be encoded is “ARITHMETIC”. There are ten symbols in the message. The probability distribution is given below.
Symbol Probability
A 1/10
C 1/10
E 1/10
H 1/10
I 2/10
M 1/10
R 1/10
T 2/10
Each character is assigned a portion of the starting interval [0, 1). The size of each subinterval corresponds to the symbol’s probability of appearance.
Symbol Subinterval
A 0.00-0.10
C 0.10-0.20
E 0.20-0.30
H 0.30-0.40
I 0.40-0.60
M 0.60-0.70
R 0.70-0.80
T 0.80-1.00
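Following the pseudo code and the subinterval table above, the encoder can be sketched with floating-point arithmetic (a simplified illustration; practical coders use integer arithmetic with renormalization to avoid precision limits):

# Sketch: arithmetic encoding of "ARITHMETIC" with the subintervals above.
ranges = {
    'A': (0.00, 0.10), 'C': (0.10, 0.20), 'E': (0.20, 0.30), 'H': (0.30, 0.40),
    'I': (0.40, 0.60), 'M': (0.60, 0.70), 'R': (0.70, 0.80), 'T': (0.80, 1.00),
}

def encode(message):
    low, high = 0.0, 1.0
    for symbol in message:
        code_range = high - low
        low_range, high_range = ranges[symbol]
        high = low + code_range * high_range   # uses the old value of low
        low = low + code_range * low_range
    return low      # any number in [low, high) identifies the message

print(encode("ARITHMETIC"))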
One other problem is the fact that the binary fraction that is output by the
arithmetic coder is of indefinite length, and the decoder has no idea of where the string
ends if it's not told. In practice, a length header can be sent to indicate how long the
fraction is, or an end-of-transmission symbol of some sort can be used to tell the decoder
where the end of the fraction is.
The neat thing about arithmetic coding is that by amassing a complete message
into a single probability interval value, individual characters can be encoded with the
equivalent of fractional numbers of bits. Huffman coding requires an integer number of bits for each character, which is one of the reasons that arithmetic coding is in general more efficient than Huffman coding.
Run-Length Encoding (RLE)

This algorithm is very easy to implement and does not require much CPU horsepower. RLE compression is only efficient with files that contain lots of repetitive data. These can be text files if they contain lots of spaces for indenting, but line-art images that contain large white or black areas are far more suitable. Computer-generated color images (e.g. architectural drawings) can also give fair compression ratios. The algorithm is easy to implement (in hardware, if necessary) and runs very quickly; a small code sketch follows the notes below.
• If part of the data is lost or corrupted, all or nearly all of the rest of the data can be
reconstructed. Not true of many compression techniques!
• If the data is not suitable to run-length encoding (and it’s easy to construct data that
isn’t), then this encoding can be larger than the original.
4.2.1 Introduction
4.2.2 LZW
-Decompression Algorithm
[Step listing: the decoder reads each code from the compressed stream, looks it up in the dictionary, and emits the corresponding symbols through a temporary buffer, looping while any symbols remain in the buffer.]
-Advantages of LZW
· LZW compression works best for files containing lots of repetitive data. This is often the case with text and monochrome images. Files that are already compressed, or that do not contain any repetitive information at all, can even grow bigger!
· LZW compression is fast.
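As a compact sketch of the technique (a generic textbook formulation, not necessarily the exact procedure the original report listed; the sample string is assumed), LZW builds its dictionary on the fly during both compression and decompression:

# Sketch: dictionary-based LZW compression and decompression over byte strings.
def lzw_compress(data):
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b''
    output = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc                        # keep extending the current phrase
        else:
            output.append(dictionary[w])  # emit code for the longest known phrase
            dictionary[wc] = next_code    # add the new phrase to the dictionary
            next_code += 1
            w = bytes([byte])
    if w:
        output.append(dictionary[w])
    return output

def lzw_decompress(codes):
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    result = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                             # the "code not yet in dictionary" case
            entry = w + w[:1]
        result.append(entry)
        dictionary[next_code] = w + entry[:1]
        next_code += 1
        w = entry
    return b''.join(result)

sample = b"TOBEORNOTTOBEORTOBEORNOT"      # hypothetical repetitive input
codes = lzw_compress(sample)
assert lzw_decompress(codes) == sample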
5. LOSSY ALGORITHM
JPEG is a lossy compression scheme for color and gray-scale images. It works
on full 24-bit color, and was designed to be used with photographic material and
naturalistic artwork. It is not the ideal format for line-drawings, textual images, or other
images with large areas of solid color or a very limited number of distinct colors.
JPEG is designed so that the loss factor can be tuned by the user to tradeoff image
size and image quality, and is designed so that the loss has the least effect on human
perception. It does, however, have some anomalies when the compression ratio gets high, such as odd effects across the boundaries of 8x8 blocks. For high compression ratios,
other techniques such as wavelet compression appear to give more satisfactory results.
JPEG is designed for compressing full-color or gray-scale images of natural, real-world
scenes. It works well on photographs, naturalistic artwork, and similar material; not so
well on lettering, simple cartoons, or line drawings. JPEG handles only still images, but
there is a related standard called MPEG for motion pictures.
JPEG is used to make your image files smaller, and to store 24-bit-per-pixel color data instead of 8-bit-per-pixel data. Making image files smaller is a win for transmitting files across networks and for archiving libraries of images.
There are four steps in JPEG compression algorithm. The first step is to extract
an 8x8 pixel block from the picture. The second step is to calculate the discrete cosine
transform for each element in the block. Third, quantization rounds off the discrete cosine transform (DCT) coefficients according to the specified image quality (this phase is where most of the original image information is lost; thus it is dubbed the lossy phase of the JPEG algorithm). Fourth, the coefficients are compressed using an encoding scheme
such as Huffman Coding or Arithmetic Coding. The final compressed code is then
written to the output file.
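The transform and quantization steps can be sketched as follows (a simplified illustration: the gradient test block and the flat quantization step are assumptions, and the real standard uses full quantization tables plus zig-zag ordering before entropy coding):

import math

# Sketch: 2-D DCT-II of one 8x8 block followed by quantization.
def dct_8x8(block):
    def c(k):
        return math.sqrt(0.5) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = 0.0
            for x in range(8):
                for y in range(8):
                    s += (block[x][y]
                          * math.cos((2 * x + 1) * u * math.pi / 16)
                          * math.cos((2 * y + 1) * v * math.pi / 16))
            out[u][v] = 0.25 * c(u) * c(v) * s
    return out

def quantize(coefficients, step):
    # The rounding here is the lossy phase: small high-frequency coefficients
    # collapse to zero and cannot be recovered exactly.
    return [[round(coefficients[u][v] / step) for v in range(8)] for u in range(8)]

# A hypothetical level-shifted 8x8 block (a simple gradient).
block = [[x * 16 + y * 2 - 128 for y in range(8)] for x in range(8)]
for row in quantize(dct_8x8(block), step=16):
    print(row)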
In JPEG compression, a black-and-white drawing with hard edges is often used for demonstration because we can clearly see how the compression technique affects the image. But the demonstration isn't really fair, because JPEG wasn't designed for this type of image. With a true photo-realistic image, the changes are hardly visible, at least at low compression levels.
The examples compare a 24-bit uncompressed image with the same image JPEG-compressed at a very low level. If you can see any differences, you have better eyes than I do, and yet the JPEG file is about one twenty-fifth the size of the uncompressed image. The lightly compressed image is virtually identical to the uncompressed one; differences only become apparent at higher compression levels, where JPEG gets better compression. JPEG typically gets about 10:1 at the lowest
compression levels and up to 200:1 and more at the highest levels of compression. At a
medium compression level, where the quality loss is only slightly apparent, 30:1 is a
typical ratio.
In addition to reducing the data rate, MPEG has several important features. The
movie can be played forward or in reverse, and at either normal or fast speed. The
encoded information is random access, that is, any individual frame in the sequence can
be easily displayed as a still picture. This goes along with making the movie editable,
meaning that short segments from the movie can be encoded only with reference to
themselves, not the entire sequence. The main distortion associated with MPEG occurs
when large sections of image change quickly. In effect, a burst of information is needed
to keep up with the rapidly changing scenes. If the data rate is fixed, the viewer notices
“blocky” patterns when changing from one scene to the next. This can be minimized in
networks that transmit multiple video channels simultaneously, such as cable television.
CONCLUSION
BIBLIOGRAPHY
Websites:
Books: