Data Compression: by Dilip Jha Assistant Prof. GBPEC, Pauri

The document discusses various techniques for compressing data including images and video. It describes run length encoding which replaces long runs of repeating bits with the number of repeats. Frequency dependent codes like Huffman coding assign shorter codes to more frequent characters/symbols. JPEG compression for images is lossy and works by dividing an image into blocks, applying DCT, quantizing coefficients, and encoding. P and B frames in video compression rely on differences between current and previous/future frames.

Uploaded by

arvind985
Copyright
© Attribution Non-Commercial (BY-NC)

Data Compression

By Dilip Jha, Assistant Prof., GBPEC, Pauri

Introduction
Fax machine: 40,000 dots per square inch => about 4 million dots per page
- With a 56 Kbps modem, time to transmit = ?
Video: 30 pictures per second
- Each picture = 200,000 dots (pixels)
- 8 bits to represent each primary color
- Bits required for one picture = ?
- A two-hour movie requires = ?
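The questions on this slide are plain arithmetic, and a short sketch can work them out. It assumes one bit per fax dot, three primary colors per pixel, and K = 1000 for the modem rate; the constant names are illustrative and not from the slides.

```python
# Figures from the slide; names and the assumptions below are illustrative.
FAX_DOTS_PER_PAGE = 4_000_000     # assumed: one bit per dot (black or white)
MODEM_BITS_PER_SECOND = 56_000    # 56 Kbps, assuming K = 1000

fax_seconds = FAX_DOTS_PER_PAGE / MODEM_BITS_PER_SECOND

PIXELS_PER_PICTURE = 200_000
BITS_PER_PIXEL = 3 * 8            # assumed: three primaries, 8 bits each
PICTURES_PER_SECOND = 30
MOVIE_SECONDS = 2 * 60 * 60       # two hours

bits_per_picture = PIXELS_PER_PICTURE * BITS_PER_PIXEL
bits_per_movie = bits_per_picture * PICTURES_PER_SECOND * MOVIE_SECONDS

print(f"Fax page over the modem: {fax_seconds:.1f} s")
print(f"One picture: {bits_per_picture:,} bits")
print(f"Two-hour movie: {bits_per_movie:,} bits")
```

Under these assumptions an uncompressed fax page takes a little over a minute to send, and the uncompressed movie runs to roughly a trillion bits, which is the motivation for everything that follows.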

Introduction
Compression is a way to reduce the number of bits in a frame while retaining its meaning.
- It decreases storage space, transmission time, and cost.
- The technique is to identify redundancy and eliminate it.
- If a file contains only capital letters, we may encode all 26 letters using 5-bit numbers instead of 8-bit codes.

Introduction
If the file has n characters, the savings are (8n - 5n)/8n = 37.5%.
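A minimal sketch of the 5-bit idea, assuming the file holds only the letters A to Z and that bits are kept as a string of '0'/'1' characters for readability. The function names and the sample message are hypothetical.

```python
def encode_5bit(text: str) -> str:
    """Map 'A'..'Z' to the 5-bit numbers 0..25 and concatenate them."""
    if not all("A" <= ch <= "Z" for ch in text):
        raise ValueError("only capital letters A-Z are supported")
    return "".join(format(ord(ch) - ord("A"), "05b") for ch in text)


def decode_5bit(bits: str) -> str:
    """Split the bit string into 5-bit groups and map each back to a letter."""
    return "".join(
        chr(int(bits[i:i + 5], 2) + ord("A")) for i in range(0, len(bits), 5)
    )


message = "COMPRESSION"                 # sample input, not from the slides
packed = encode_5bit(message)
original_bits = 8 * len(message)
savings = (original_bits - len(packed)) / original_bits

print(len(packed), "bits instead of", original_bits)
print(f"savings: {savings:.1%}")
```

The savings ratio does not depend on the message length, matching the (8n - 5n)/8n formula.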

Frequency Dependent Codes


- Not all characters appear with the same frequency; some are more prevalent than others.
- Frequently appearing characters can be assigned shorter codes than the others => fewer bits overall.
- Such codes are examples of frequency dependent codes.

Frequency Dependent Codes


Huffman code (illustrated with a manageable example):

Letter   Frequency (%)
A        25
B        15
C        10
D        20
E        30

Frequency Dependent Codes


Huffman code: code formation
- Assign a weight to each character.
- Merge the two lightest weights into one root node whose weight is their sum (if several weights tie, the choice is arbitrary, so the code is not unique).
- Repeat until one tree is left.
- Traverse the tree from the root to each leaf (at each node, assign 0 to the left branch and 1 to the right branch).
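The code-formation steps can be sketched with a priority queue. This is one possible implementation, not necessarily the one behind the slides: the function name is hypothetical, and ties are broken by insertion order, so the exact bit patterns may differ from another valid Huffman code for the same frequencies.

```python
import heapq
from itertools import count


def huffman_codes(weights: dict[str, int]) -> dict[str, str]:
    """Build a Huffman code by repeatedly merging the two lightest nodes."""
    tiebreak = count()  # keeps heap entries comparable when weights tie
    heap = [(w, next(tiebreak), sym) for sym, w in weights.items()]
    heapq.heapify(heap)

    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # An internal node is a (left, right) tuple weighted by the sum.
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    codes: dict[str, str] = {}

    def walk(node, prefix: str) -> None:
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")   # 0 to the left
            walk(node[1], prefix + "1")   # 1 to the right
        else:
            codes[node] = prefix or "0"   # single-symbol edge case

    walk(heap[0][2], "")
    return codes


frequencies = {"A": 25, "B": 15, "C": 10, "D": 20, "E": 30}
codes = huffman_codes(frequencies)
weighted_bits = sum(frequencies[s] * len(codes[s]) for s in frequencies)
print(codes)
print("average bits per letter:", weighted_bits / 100)
```

For the frequencies in the example table, the less frequent letters B and C end up with 3-bit codes and the rest with 2-bit codes, giving an average of 2.25 bits per letter.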

Frequency Dependent Codes


Huffman code: code interpretation
- No-prefix property: the code for any character never appears as the prefix of another code (verify).
- The receiver continues to receive bits until they match a code, then emits that character.
- Exercise: extract the string from 01110001110110110111
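The receiver's procedure can be sketched as follows. The code table from the slides did not survive in this text, so the table below is an assumption: it is one valid Huffman code for the example frequencies, chosen because it decodes the exercise string without leftover bits. The function names are hypothetical.

```python
# ASSUMED code table (not present in the surviving slide text).
ASSUMED_CODES = {"A": "01", "B": "110", "C": "111", "D": "10", "E": "00"}


def is_prefix_free(codes: dict[str, str]) -> bool:
    """Check the no-prefix property: no code is a prefix of another."""
    values = list(codes.values())
    return all(
        not other.startswith(code)
        for code in values
        for other in values
        if other != code
    )


def huffman_decode(bits: str, codes: dict[str, str]) -> str:
    """Accumulate bits until they match a code, emit the symbol, repeat."""
    lookup = {code: symbol for symbol, code in codes.items()}
    decoded = []
    buffer = ""
    for bit in bits:
        buffer += bit
        if buffer in lookup:
            decoded.append(lookup[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a complete code")
    return "".join(decoded)


print(is_prefix_free(ASSUMED_CODES))
print(huffman_decode("01110001110110110111", ASSUMED_CODES))
```

With the assumed table, the exercise string decodes to ABECADBC. Matching greedily is safe only because of the no-prefix property: once the buffer equals a code, no longer code can start with the same bits.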

Run Length Encoding


Huffman code requires:
- frequency values
- bits grouped into characters or units
Many items do not fall into this category:
- machine code files
- facsimile data (bits corresponding to light or dark areas of a page)

Run Length Encoding


For such files, run length encoding (RLE) is used.
- Instead of sending long runs of 0s or 1s, it sends only the number of bits in each run.
- About 70%-80% of a typed page is white space, so RLE is useful.

Run Length Encoding


Runs of the same bit
- In facsimile data there are many 0s (white spots) => transmit the run length as a fixed-size binary integer.
- The receiver generates the proper number of bits in the run and inserts the other bit in between.
- Example: 14 zeros, 1, 9 zeros, 11, 20 zeros, 1, 30 zeros, 11, 11 zeros (number of zeros encoded in 4 bits)

Run Length Encoding


Runs of the same bit
- Code: 1110 1001 0000 1111 0101 1111 1111 0000 0000 1011 (the value following a 1111 group is added to the run)
- Savings in bits: ?
- What if the stream started with 1 instead?
- Works best when there are many long runs of zeros; as 1s become more frequent, it becomes less efficient.
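The 4-bit scheme can be sketched as below. The conventions are inferred from the slide's example rather than stated by it: a group of 1111 means the run continues into the next group, every completed run is followed by an implied 1, and the last run has no 1 after it. The function names are hypothetical.

```python
def rle_encode_zero_runs(bits: str, width: int = 4) -> list[str]:
    """Encode the lengths of the zero runs between 1s as fixed-width groups."""
    max_count = (1 << width) - 1            # 15 for 4-bit groups
    groups = []
    for run in bits.split("1"):             # zero runs separated by single 1s
        remaining = len(run)
        while remaining >= max_count:       # all-ones group: run continues
            groups.append(format(max_count, f"0{width}b"))
            remaining -= max_count
        groups.append(format(remaining, f"0{width}b"))
    return groups


def rle_decode_zero_runs(groups: list[str], width: int = 4) -> str:
    """Rebuild the bit stream, inserting a 1 after every completed run."""
    max_count = (1 << width) - 1
    pieces = []
    run = 0
    for group in groups:
        value = int(group, 2)
        run += value
        if value < max_count:               # run is complete
            pieces.append("0" * run + "1")
            run = 0
    return "".join(pieces)[:-1]             # no 1 follows the final run


stream = ("0" * 14 + "1" + "0" * 9 + "11" + "0" * 20 + "1"
          + "0" * 30 + "11" + "0" * 11)
encoded = rle_encode_zero_runs(stream)
print(" ".join(encoded))
print(len(stream), "bits ->", 4 * len(encoded), "bits")
```

For the example stream this reproduces the ten groups on the slide: 90 bits shrink to 40, a saving of 50 bits. A stream that starts with 1 simply begins with a 0000 group (a zero-length run). Two adjacent 1s also cost a full 0000 group, which is why the scheme degrades as 1s become frequent.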

Run Length Encoding


Runs with different characters
- Send the actual character together with the run length.
- Example: HHHHHHHUFFFFFFFFFYYYYYYYYYYYDGGGGG
- Code = 7, H, 1, U, 9, F, 11, Y, 1, D, 5, G
- Savings in bits (considering ASCII): ?
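Character runs can be sketched with `itertools.groupby`. The bit count assumes 8 bits per ASCII character and, as a further assumption not stated on the slide, 8 bits for each run length. The function names are hypothetical.

```python
from itertools import groupby


def rle_encode_chars(text: str) -> list[tuple[int, str]]:
    """Collapse each run of identical characters into (length, character)."""
    return [(len(list(group)), ch) for ch, group in groupby(text)]


def rle_decode_chars(pairs: list[tuple[int, str]]) -> str:
    """Expand (length, character) pairs back into the original text."""
    return "".join(ch * length for length, ch in pairs)


# The slide's example, built from its run lengths to avoid miscounting.
text = "H" * 7 + "U" + "F" * 9 + "Y" * 11 + "D" + "G" * 5
pairs = rle_encode_chars(text)

original_bits = 8 * len(text)        # 8-bit ASCII
encoded_bits = (8 + 8) * len(pairs)  # ASSUMED: 8-bit count + 8-bit character

print(pairs)
print(original_bits, "bits ->", encoded_bits, "bits")
```

Under these assumptions the 34 characters (272 bits) become 6 pairs (96 bits). The isolated U and D each cost two bytes instead of one, so text without long runs would grow rather than shrink.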

Run Length Encoding


Run lengths may vary from 0 to 1728 -> many possibilities, so a fixed-size code is inefficient.
Some runs occur more frequently than others, e.g. most typed pages contain 80% white pixels and the spacing between letters is fairly consistent => the probabilities of certain runs are predictable.

Relative Encoding
Some applications may not benefit from the above methods: a video image has little repetition within a single image, but much repetition from one image to the next.
Differential (relative) encoding is based on coding only the difference from one image to the next.

Relative Encoding
1st Frame   2nd Frame   Difference
1 2 3 4     1 3 3 4      0  1  0  0
2 5 3 7     2 5 3 7      0  0  0  0
3 6 4 8     3 6 4 7      0  0  0 -1
4 7 5 9     3 7 5 9     -1  0  0  0

The resulting difference can then be run-length encoded.
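A minimal sketch of the frame example, treating each frame as a 4 x 4 grid of single-digit values and taking the difference as second frame minus first frame, as the example's numbers imply. The function names are hypothetical.

```python
Frame = list[list[int]]

first_frame: Frame = [
    [1, 2, 3, 4],
    [2, 5, 3, 7],
    [3, 6, 4, 8],
    [4, 7, 5, 9],
]
second_frame: Frame = [
    [1, 3, 3, 4],
    [2, 5, 3, 7],
    [3, 6, 4, 7],
    [3, 7, 5, 9],
]


def frame_difference(previous: Frame, current: Frame) -> Frame:
    """Element-wise difference: current minus previous."""
    return [
        [c - p for p, c in zip(prev_row, cur_row)]
        for prev_row, cur_row in zip(previous, current)
    ]


def apply_difference(previous: Frame, difference: Frame) -> Frame:
    """Receiver side: rebuild the current frame from the previous one."""
    return [
        [p + d for p, d in zip(prev_row, diff_row)]
        for prev_row, diff_row in zip(previous, difference)
    ]


difference = frame_difference(first_frame, second_frame)
for row in difference:
    print(row)

zeros = sum(value == 0 for row in difference for value in row)
print(zeros, "of 16 entries are zero")
```

Most of the difference is zeros (13 of 16 entries here), which is what makes a follow-up run length pass effective.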

Image Representation
Black-and-white pixels are each represented by an 8-bit level.
Color is composed of R, G, B primaries, each represented by an 8-bit level
-> each color pixel can take one of 2^8 * 2^8 * 2^8 = 2^24 colors.
VGA screen: 640 * 480 pixels -> 640 * 480 * 24 = 7,372,800 bits

Image Compression
JPEG compression works for both grayscale and color images.
The previous compression methods were lossless: it was possible to recover all the information from the compressed code.
JPEG is lossy: the recovered image may not be the same as the original.

JPEG Compression
Case 1: all Ps are the same => an image of a single color with no variation at all; the AC coefficients are all zero.
Case 2: little variation in the Ps => many, but not all, AC coefficients are zero.
Case 3: large variation in the Ps => only a few AC coefficients are zero.
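Case 1 can be checked numerically. The sketch below assumes the coefficients come from the standard 2-D DCT-II applied to an 8 x 8 block of pixel values P, with the DC term at position (0, 0) and every other position an AC coefficient. The function names and the two test blocks are illustrative.

```python
import math

N = 8  # block size, assumed 8 x 8 as in baseline JPEG


def dct_2d(block):
    """Return the 2-D DCT-II of an N x N block of pixel values."""
    def scale(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    result = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            result[u][v] = (2 / N) * scale(u) * scale(v) * total
    return result


def ac_coefficients(T):
    """All coefficients except the DC term T[0][0]."""
    return [T[u][v] for u in range(N) for v in range(N) if (u, v) != (0, 0)]


flat = [[100] * N for _ in range(N)]                    # Case 1: one color
ramp = [[10 * y for y in range(N)] for _ in range(N)]   # brightness varies

T_flat = dct_2d(flat)
T_ramp = dct_2d(ramp)

print("flat block, largest |AC|:", max(abs(c) for c in ac_coefficients(T_flat)))
print("ramp block, largest |AC|:", max(abs(c) for c in ac_coefficients(T_ramp)))
```

For the flat block every AC coefficient is zero up to floating-point rounding and only the DC term carries information. The ramp block produces clearly nonzero AC coefficients.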

JPEG Compression
Quantization: provides a way of ignoring small differences in an image that may not be perceptible.
Another array Q is obtained by dividing each element of T by some number and rounding to the nearest integer => loss
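A minimal sketch of the quantization step, assuming a single divisor for every element, as the slide describes. Real JPEG encoders use a table of per-position divisors. The coefficient values in T are made up for illustration and the function names are hypothetical.

```python
def quantize(T, divisor):
    """Q[i][j] = T[i][j] / divisor, rounded to the nearest integer."""
    return [[round(t / divisor) for t in row] for row in T]


def dequantize(Q, divisor):
    """Receiver side: multiply back; the rounding error is not recovered."""
    return [[q * divisor for q in row] for row in Q]


# Hypothetical coefficients: one large DC term and small AC terms.
T = [
    [800.0, -31.0, 4.0, 1.0],
    [12.0, -6.0, 2.0, 0.0],
]
DIVISOR = 10  # assumed value

Q = quantize(T, DIVISOR)
restored = dequantize(Q, DIVISOR)

print(Q)
print(restored)
```

Small coefficients collapse to zero, which is what later makes the array compress well, and the restored values differ from T, which is where the loss comes from.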

Multimedia Compression
I (intra-picture) frame: just a JPEG-encoded image.
P (predicted) frame: encoded by computing the differences between the current frame and a previous frame.
B (bidirectional) frame: similar to a P-frame, except that it is interpolated between a previous and a future frame.
