Jpeg PDF
1. Introduction
The purpose of this project is to create a decoder program in C that can interpret bit streams from .JPG files and
display the result to the screen using the Windows API. Since there are multiple steps in this process, modular
programming will be practiced. That is, each stage of the decoding process will be a component that has well-defined
input and output. As such, the project will serve as a good review both for ELE 201 and COS 217. Someone who has
taken both of these courses should be able to understand my code and report.
NOTE: During incremental development and debugging, the GCC C compiler was used under Cygwin, and the
open-source hex editor “Hack” was used to check that I was interpreting JPG files correctly. The final executable
program was compiled with Dev C++ for Windows, linking with the Windows API libraries.
NOTE ALSO: In this discussion, I will precede hex values with the notation 0x (e.g. 0x18 is 24 in base 10).
This notation is also convenient since each hex digit corresponds to one nibble (4 bits), so two digits make a byte.
2. Background Information
An uncompressed color bitmap image usually uses 24 bits for each pixel: 8 for the red component, 8 for the
blue component, and 8 for the green component (RGB format). Although bitmap files are very straightforward to
interpret, they store much unneeded data. For instance, there is a certain level of “redundancy” in most images. Parts
of the image may not change very quickly, and there is often a very strong correlation between pixels in a region. Also,
bitmap does not fully take advantage of all the properties of the human visual system, and thus stores more information
than we can even detect under some circumstances. JPEG is a (usually) lossy method of compression that attempts to
save space by taking advantage of these properties.
2.1: RGB to YCbCr and chroma subsampling
As it turns out, humans are much less sensitive to changes in chrominance, or color, than they are to changes in
luminance, or brightness, in an image (Hass, Calvin). To take advantage of this, one of the first steps of JPEG is to
make a “color space transformation.” This maps each pixel, which is ordinarily stored in RGB format, to a new format:
YCbCr (Y is Luminance, Cb is chroma blue, and Cr is chroma red). The mappings are as follows (Cuturicu, Cristi):
RGB to YCbCr (the transform an encoder uses):
Y = 0.299*R + 0.587*G + 0.114*B (this weighting shows that the eye is more sensitive to green than it is to red and
blue when it comes to perceived brightness)
Cb = -0.1687*R - 0.3313*G + 0.5*B + 128
Cr = 0.5*R - 0.4187*G - 0.0813*B + 128
YCbCr to RGB (the transform the decoder must use):
R = Y + 1.402*(Cr - 128)
G = Y - 0.34414*(Cb - 128) - 0.71414*(Cr - 128)
B = Y + 1.772*(Cb - 128)
It should be noted that, to convert a color image to a grayscale image, one merely has to calculate the Y component
from the original RGB values for each pixel, and then do the inverse mapping back to RGB with Cb and Cr both set to
128 (the neutral value, i.e. zero chroma once the offset of 128 is removed). This yields R = G = B = Y.
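The two mappings and the grayscale shortcut can be sketched in C as follows. This is a minimal illustration using the coefficients quoted above; the type and function names (RgbPixel, YccPixel, rgb_to_ycc, ycc_to_rgb, to_gray) are assumptions made for this example and are not taken from the project's source. Values are kept as doubles, so results may land slightly outside 0..255.

```c
/* Illustrative types; not the project's own data structures. */
typedef struct { double r, g, b; } RgbPixel;
typedef struct { double y, cb, cr; } YccPixel;

/* Encoder direction: RGB -> YCbCr. */
YccPixel rgb_to_ycc(RgbPixel p)
{
    YccPixel out;
    out.y  =  0.299  * p.r + 0.587  * p.g + 0.114  * p.b;
    out.cb = -0.1687 * p.r - 0.3313 * p.g + 0.5    * p.b + 128.0;
    out.cr =  0.5    * p.r - 0.4187 * p.g - 0.0813 * p.b + 128.0;
    return out;
}

/* Decoder direction: YCbCr -> RGB (no clamping is done here). */
RgbPixel ycc_to_rgb(YccPixel p)
{
    RgbPixel out;
    out.r = p.y + 1.402   * (p.cr - 128.0);
    out.g = p.y - 0.34414 * (p.cb - 128.0) - 0.71414 * (p.cr - 128.0);
    out.b = p.y + 1.772   * (p.cb - 128.0);
    return out;
}

/* Grayscale: keep Y and force neutral chroma (128 with the offset applied). */
RgbPixel to_gray(RgbPixel p)
{
    YccPixel ycc = rgb_to_ycc(p);
    ycc.cb = 128.0;
    ycc.cr = 128.0;
    return ycc_to_rgb(ycc);
}
```

Because the published coefficients are rounded, a round trip through both functions reproduces the input only approximately (to within a small fraction of one intensity level).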
The Cb and Cr components are less vital to the perceived quality of the image than the Y component, so the
encoder can choose to quantize these values more harshly (more about quantization later), or it can “subsample.” This
means that instead of taking the Cb and Cr values at every pixel like an encoder would with the Y values, the encoder
can choose to take them every other pixel, or perhaps only once every 2x2 block. The most common Y Cb Cr sampling
schemes are as follows (Hass):
*1x1 chroma subsampling (Cb and Cr values are taken at every pixel)
*2x1 chroma subsampling (For every 2x2 block of pixels, the Cb and Cr values are taken from one 2x1 column)
*2x2 chroma subsampling (For every 2x2 block of pixels, the Cb and Cr values are only taken from one pixel). 2x2 is
the most common for JPEG, and it is the one that seems to be used by MS Paint.
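On the decoder side, subsampling means that several pixels share one Cb/Cr sample. A minimal sketch of that lookup is shown below, assuming the chroma samples are stored row by row in their own array and that each sample is simply repeated over the pixels it covers (nearest-neighbour upsampling). The ChromaPlane type and its field names are hypothetical.

```c
/* Hypothetical container for one subsampled chroma component. */
typedef struct {
    int width;                 /* image width in pixels (luma resolution)  */
    int height;                /* image height in pixels (luma resolution) */
    int h_factor;              /* horizontal subsampling factor: 1 or 2    */
    int v_factor;              /* vertical subsampling factor: 1 or 2      */
    const unsigned char *data; /* chroma samples, stored row by row        */
} ChromaPlane;

/* Number of chroma samples per row, rounded up for odd image widths. */
int chroma_width(const ChromaPlane *p)
{
    return (p->width + p->h_factor - 1) / p->h_factor;
}

/* Chroma sample shared by the pixel at (x, y). */
unsigned char chroma_at(const ChromaPlane *p, int x, int y)
{
    int cx = x / p->h_factor;
    int cy = y / p->v_factor;
    return p->data[cy * chroma_width(p) + cx];
}
```

With both factors set to 2, the four pixels of each 2x2 block all read the same sample; with both set to 1, every pixel has its own.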
2.2 The Discrete Cosine Transform on 8x8 blocks
Instead of representing a JPEG image as a 2-dimensional spatial set of pixels, it is more efficient to transform it
somehow into the frequency domain of separable, 2-dimensional cosine functions. This addresses part of the issue of
the original image having redundancy; in an image with little detail and little change, the higher frequency components
will be very small and can thus be compressed more (this option is not available in the spatial domain). A clever
method is to perform the transform on 8x8 sub-blocks of the image instead of the whole image at once. This way, if
there is an 8x8 block that is nearly constant and has little detail, it can be systematically compressed, while an 8x8
block that has more detail in the higher frequency components can retain more detail.
The Fourier analysis method of choice for JPEG is the “Discrete Cosine Transform” (DCT). The DCT assumes
an even extension of the function on which it operates, so it only has cosine components for each frequency. This
means that for an 8x8 block of 64 pixels, 64 DCT coefficients are needed. On the other hand, the complex Discrete
Fourier Transform would need 128 frequency coefficients (64 for real components and 64 for imaginary components),
so it would not be a wise choice. The formulas for the DCT and the Inverse DCT (the one my decoder must use) are
listed below: (Cuturicu)
*NOTE: Since Y, Cb, and Cr components are stored in an unsigned byte (range from 0 to 255), a DC offset of 128 must
be subtracted from the components before performing the transform
Forward 8x8 DCT
All of the header structs and the functions that fill them in are defined in JPGObj.c.
3.2 Decoding the Data Section
Once the header has been parsed properly, decoding the data section is relatively straightforward. The program
decodes one “minimum coded unit” (MCU) at a time (Hass). An MCU includes one 8x8 block of the lowest sampled
component and however many 8x8 blocks are needed of the other components to accommodate those pixels. The most
common MCU occurs with 2x2 subsampling, in which four 8x8 blocks of Y components are decoded (upper left, upper right, lower
left, lower right), followed by one 8x8 block of subsampled Cb components, followed by one 8x8 block of subsampled
Cr components. Hence, each MCU in this scheme takes up a 16x16 pixel block. For each of the 8x8 component
blocks that make up the MCU, the program locates the appropriate DC and AC Huffman tables and quantization table,
and it does the Huffman and run-length coding in reverse.
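The MCU geometry described above can be sketched as a small helper. It assumes a three-component image in which Cb and Cr each contribute one block per MCU, and it takes the Y sampling factors as parameters (2 and 2 for 2x2 subsampling, 1 and 1 for none). The McuLayout type and both function names are hypothetical.

```c
/* Hypothetical description of how an image is tiled into MCUs. */
typedef struct {
    int h_max, v_max;      /* Y sampling factors, e.g. 2 and 2     */
    int mcu_width;         /* pixels covered by one MCU            */
    int mcu_height;
    int mcus_per_row;      /* MCUs needed to cover the image width */
    int mcus_per_col;
    int blocks_per_mcu;    /* Y blocks plus one Cb and one Cr      */
} McuLayout;

McuLayout mcu_layout(int width, int height, int h_max, int v_max)
{
    McuLayout m;
    m.h_max = h_max;
    m.v_max = v_max;
    m.mcu_width  = 8 * h_max;
    m.mcu_height = 8 * v_max;
    /* Round up: partial MCUs at the right and bottom edges are still coded. */
    m.mcus_per_row = (width  + m.mcu_width  - 1) / m.mcu_width;
    m.mcus_per_col = (height + m.mcu_height - 1) / m.mcu_height;
    m.blocks_per_mcu = h_max * v_max + 2;
    return m;
}

/* Top-left pixel of Y block number `block` inside MCU number `mcu_index`.
   Blocks are numbered left to right, then top to bottom. */
void y_block_origin(const McuLayout *m, int mcu_index, int block,
                    int *px, int *py)
{
    int mcu_x = mcu_index % m->mcus_per_row;
    int mcu_y = mcu_index / m->mcus_per_row;
    *px = mcu_x * m->mcu_width  + 8 * (block % m->h_max);
    *py = mcu_y * m->mcu_height + 8 * (block / m->h_max);
}
```

For an 800x600 image with 2x2 subsampling this gives 50 x 38 MCUs of 16x16 pixels each, with six 8x8 blocks decoded per MCU.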
The trickiest algorithmic part of the decoding process was designing a function to fill in binary trees based on
the “define Huffman table” portion of the header. To do this, I designed a recursive function in the file HTree.c (all of
the decoding functions reside here) called setValues, which fills in the bytes from left to right at a particular level (the
caller looped through each level from 1 to 16 and filled in the values at that level). Then, whenever I encountered a
zero bit in the file stream, I knew to go to the left child of a node in this tree, and whenever I found a 1 bit, I knew to go
to the right (Hass). I repeated this process until I reached a leaf node. At this point, I retrieved the byte that
corresponded to the bit string I had read so far, and I performed run-length decoding to recover the quantized
coefficients. Finally, I looped through and multiplied the coefficients by the values in the quantization table to
“dequantize” them. This process was repeated until all of the coefficients were decoded, and the program was then
ready to do the inverse DCT on a block and to render an RGB image to the screen.
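The tree construction and the bit-by-bit walk can be sketched as follows. Note that this is not the recursive setValues function from HTree.c: it swaps in the equivalent canonical-code approach, in which codes are assigned in increasing order within each length and shifted left by one bit when moving to the next length, and each code is then inserted into the tree. The HNode type, the function names, and the convention of passing bits as an array of 0/1 bytes are all assumptions made for the example.

```c
#include <stdlib.h>

typedef struct HNode {
    struct HNode *child[2]; /* child[0] = left (bit 0), child[1] = right (bit 1) */
    int symbol;             /* byte value at a leaf, -1 for internal nodes       */
} HNode;

static HNode *new_node(void)
{
    HNode *n = malloc(sizeof *n);
    if (n) {
        n->child[0] = n->child[1] = NULL;
        n->symbol = -1;
    }
    return n;
}

void free_tree(HNode *n)
{
    if (!n) return;
    free_tree(n->child[0]);
    free_tree(n->child[1]);
    free(n);
}

/* Insert one code of `length` bits (most significant bit first). */
static int insert_code(HNode *root, unsigned code, int length, int symbol)
{
    HNode *n = root;
    for (int i = length - 1; i >= 0; i--) {
        int bit = (code >> i) & 1;
        if (!n->child[bit]) {
            n->child[bit] = new_node();
            if (!n->child[bit]) return 0;
        }
        n = n->child[bit];
    }
    n->symbol = symbol;
    return 1;
}

/* counts[i] is the number of codes of length i + 1 (16 entries);
   symbols lists the byte values in order of increasing code length.
   Returns NULL if memory runs out. */
HNode *build_tree(const unsigned char counts[16], const unsigned char *symbols)
{
    HNode *root = new_node();
    unsigned code = 0;
    int next = 0;
    if (!root) return NULL;
    for (int len = 1; len <= 16; len++) {
        for (int i = 0; i < counts[len - 1]; i++) {
            if (!insert_code(root, code++, len, symbols[next++])) {
                free_tree(root);
                return NULL;
            }
        }
        code <<= 1; /* the first code of the next length has one more bit */
    }
    return root;
}

/* Follow bits (each 0 or 1) from *pos until a leaf is reached.
   Returns the leaf's byte, or -1 if the bits run out or no code matches. */
int decode_symbol(const HNode *root, const unsigned char *bits,
                  int nbits, int *pos)
{
    const HNode *n = root;
    while (*pos < nbits) {
        n = n->child[bits[(*pos)++]];
        if (!n) return -1;
        if (n->symbol >= 0) return n->symbol;
    }
    return -1;
}
```

For example, a table with two codes of length 2 (bytes 0x05 and 0x06) and one of length 3 (byte 0x07) yields the codes 00, 01 and 100, so the bit string 01 100 00 decodes to 0x06, 0x07, 0x05.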
4. Results/Conclusion
The program works surprisingly well on most JPEG images. There is only one known bug at present that gives
rise to slight artifacts. When I first got all of the blocks rendered to the screen in order, I noticed that there seemed to
be a high amount of scattered noise throughout the image. I recalled that during the Inverse DCT phase, I had stored
the Y, Cb, and Cr values in floating point format for maximum precision, but I then had to store my RGB values back
on an unsigned byte after conversions. I noticed that before casting RGB values to a byte, some of them were slightly
negative or slightly over 255 (both outside of the range of an unsigned byte). This meant that when they were cast,
the most significant information was usually lost, and they would take on seemingly random values. I tried halving
each of the DCT coefficients to prevent this type of overflow/underflow, but the cost in contrast in my images was too
great (they all became very dim). I found that, perceptually, a better solution was to map all RGB values over 255 to
255 before conversion, and to map all RGB values below 0 to 0. I could not pin down the cause at the time. Floating
point error is an unlikely culprit; the more probable explanation is the lossy quantization itself, since the dequantized
coefficients only approximate the originals and the reconstructed samples can therefore overshoot the valid range
slightly. Clamping of this kind is standard practice in JPEG decoders.
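The clamping step can be written as a small helper like the one below (the name clamp_to_byte is illustrative). It also shows why the unclamped values looked random: in C, converting an out-of-range double directly to an unsigned byte is undefined behavior, and going through int keeps only the low 8 bits, so 260 becomes 4 and -3 becomes 253.

```c
/* Saturating conversion from a floating-point channel value to a byte. */
unsigned char clamp_to_byte(double v)
{
    if (!(v > 0.0))          /* negative values, zero, and NaN map to 0 */
        return 0;
    if (v >= 255.0)          /* anything at or above the maximum maps to 255 */
        return 255;
    return (unsigned char)(v + 0.5);   /* in range: round to nearest */
}
```

Applied to each of R, G and B after the color conversion, this keeps slightly overshooting values at the nearest representable intensity instead of letting them wrap around.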
In spite of this minor problem, I am very pleased with how well the program works. Although debugging was
quite painful at times, this project met the design specifications and was an overall success. My program can display
JPEG images of resolution 800x800 or less on screen in full color. This was an excellent learning experience for me
as well. Not only did it solidify my understanding of coding methods, but I had never dealt with a binary file directly
before. Additionally, this project can be easily extended because of the code and data types that have already been
mapped out. One natural avenue for further work is a JPEG encoder. If I decide to proceed to the encoder phase, I will
probably use my program for JPEG steganography.
Example executions of the “verbose” command are shown on the next page. First, I typed verbose lowquality1.jpg at
the prompt, and then I typed verbose highquality1.jpg at the prompt. These two images were generated with the help
of “The Gimp,” an open-source Photoshop equivalent. The Gimp had a slider that allowed me to specify percent
quality of the images. All of my low quality images were saved with 10% quality, while all of my high quality images
were saved with 100% quality. Notice how the quantization table for the high quality image is all ones (no
quantization), while the low quality table is much more coarse (all but the low frequencies have quantization divisors of
255, the max possible).
The “display” command is much more straightforward. Simply type display followed by the name of the image to
display (e.g. display photo1.jpg). It is also possible to display your own images if you provide an absolute path to your
image (e.g. display C:\images\image1.jpg). NOTE: It may take a few seconds to display images because of the
computationally intense inverse DCT. Also, to view my source code, double click on the file source.bat.
Bibliography
Cuturicu, Cristi. “CRYX's note about the JPEG decoding algorithm.” 4 January, 2008
<http://www.opennet.ru/docs/formats/jpeg.txt>.
Hass, Calvin. “JPEG Snoop – JPEG File Decoding Utility.” 4 January, 2008
<http://www.impulseadventure.com/photo/jpeg-snoop.html>.
King, K.N. C Programming: A Modern Approach. New York: Norton & Company, 1996.
<http://docs.freebsd.org/info/gcc/gcc.info.Variable_Attributes.html>.
<http://www.obrador.com/essentialjpeg/headerinfo.htm>.
<http://www.toymaker.info/Games/html/windows_api.html>.