09 CM0340 Basic Compression Algorithms
Recap: The Need for Compression

Raw video, image and audio files can be very large.

Uncompressed Audio (1 minute of audio):

Audio Type       44.1 KHz   22.05 KHz   11.025 KHz
16 Bit Stereo    10.1 Mb    5.05 Mb     2.52 Mb
16 Bit Mono      5.05 Mb    2.52 Mb     1.26 Mb
8 Bit Mono       2.52 Mb    1.26 Mb     630 Kb

Uncompressed Images:

Image Type                        File Size
512 x 512 Monochrome              0.25 Mb
512 x 512 8-bit colour image      0.25 Mb
512 x 512 24-bit colour image     0.75 Mb
Video

Can also involve a stream of audio plus video imagery.

Raw video: uncompressed image frames, 512x512 true colour, 25 fps, is 1125 MB per minute.

HDTV: gigabytes per minute uncompressed (1920 x 1080, true colour, 25 fps: 8.7 GB per minute).

Relying on higher bandwidths is not a good option (M25 Syndrome: extra capacity is soon filled).

Compression HAS TO BE part of the representation of audio, image and video formats.
Take the string "EIEIO" coded in 8-bit ASCII:

E        I        E        I        O
01000101 01001001 01000101 01001001 01001111   = 5 x 8 = 40 bits

The main aim of data compression is to find a way to use fewer bits per character, e.g. give each symbol its own short code:

E (2 bits) = xx
I (2 bits) = yy
O (3 bits) = zzz

The coded stream for "EIEIO" (2 E's, 2 I's, 1 O) is then:

xx yy xx yy zzz   = (2 x 2) + (2 x 2) + 3 = 11 bits

Note: We usually consider character sequences here for simplicity. Other token streams can be used, e.g. vectorised image blocks, binary streams.
Compression methods exploit several kinds of redundancy:

Temporal: in 1D data, 1D signals, audio, etc.
Spatial: correlation between neighbouring pixels or data items.
Spectral: correlation between colour or luminance components. This uses the frequency domain to exploit relationships between the frequency of change in data.
Psycho-visual: exploit perceptual properties of the human visual system.
Lossy methods are needed because the compression ratio of lossless methods (e.g., Huffman coding, arithmetic coding, LZW) is not high enough.
These methods are fairly straightforward to understand and implement. Simplicity is their downfall: they do NOT give the best compression ratios. Some methods still have their applications, e.g. as a component of JPEG, or for silence suppression.
Compression savings depend on the content of the data.

Applications of this simple compression technique include:

Suppression of zeros in a file (Zero Length Suppression)
Silence in audio data, pauses in conversation, etc.
Bitmaps
Blanks in text or program source files
Backgrounds in simple images
Other regular image or data tokens
can be encoded as: (1,4),(2,3),(3,6),(1,4),(2,4)

How Much Compression?

The savings are dependent on the data: in the worst case (random noise) the encoding is larger than the original file, 2 integers rather than 1 integer if the original data is an integer vector/array.

MATLAB example code: rle.m (run-length encode), rld.m (run-length decode).
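The following is a minimal Python sketch of the same (value, run-length) pairing used in the example above; it is an illustration only, not the course's MATLAB rle.m / rld.m code.

def rle_encode(data):
    """Encode a sequence as (value, run-length) pairs."""
    runs = []
    for value in data:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1            # extend the current run
        else:
            runs.append([value, 1])     # start a new run
    return [tuple(r) for r in runs]

def rle_decode(runs):
    """Expand (value, run-length) pairs back into the original sequence."""
    out = []
    for value, count in runs:
        out.extend([value] * count)
    return out

data = [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2]
print(rle_encode(data))    # [(1, 4), (2, 3), (3, 6), (1, 4), (2, 4)]
assert rle_decode(rle_encode(data)) == data

For random noise every run has length 1, so each value costs two numbers instead of one: the worst case mentioned above.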
The code is shorter than the pattern, giving us compression.

A simple Pattern Substitution scheme could employ predefined codes.
Token Assignment

More typically, tokens are assigned according to the frequency of occurrence of patterns:

Count the occurrences of tokens
Sort in descending order
Assign some symbols to the highest-count tokens

A predefined symbol table may be used, i.e. assign code c_i to token T_i (e.g. some dictionary of common words/tokens). However, it is more usual to assign codes to tokens dynamically.

The entropy encoding schemes below basically attempt to decide the optimum assignment of codes to achieve the best compression.
Lossless compression frequently involves some form of entropy encoding, based on information theoretic techniques.
The entropy of an information source S is:

H(S) = sum over i of  p_i * log2(1/p_i)

where p_i is the probability that symbol S_i in S will occur.

log2(1/p_i) indicates the amount of information contained in S_i, i.e., the number of bits needed to code S_i.

For example, in an image with a uniform distribution of gray-level intensity, p_i = 1/256 for every level, so the number of bits needed to code each gray level is 8 bits. The entropy of this image is 8.
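As a quick check of the definition, here is a small Python sketch (an illustration, not part of the course materials) that evaluates H(S) from a list of probabilities:

import math

def entropy(probabilities):
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

# Uniform 256-level image: 8 bits per gray level, as stated above.
print(entropy([1 / 256] * 256))                 # 8.0

# The A..E example used on the following slides (counts 15, 7, 6, 6, 5 of 39).
counts = [15, 7, 6, 6, 5]
total = sum(counts)
print(entropy([c / total for c in counts]))     # about 2.19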
Encoding for the Shannon-Fano Algorithm: a top-down approach

1. Sort symbols (tree sort) according to their frequencies/probabilities, e.g., ABCDE.
2. Recursively divide into two parts, each with approximately the same number of counts.
Raw token stream: 8 bits per token x 39 chars = 312 bits
Coded data stream: 89 bits
Ideal entropy = (15 x 1.38 + 7 x 2.48 + 6 x 2.7 + 6 x 2.7 + 5 x 2.96)/39
             = 85.26/39
             = 2.19

Number of bits needed for Shannon-Fano coding is: 89/39 = 2.28
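A recursive Python sketch of the top-down split (illustrative only, assuming the split point is chosen so that the two halves' counts are as equal as possible):

def shannon_fano(freqs):
    """freqs: dict symbol -> count. Returns dict symbol -> code string."""
    symbols = sorted(freqs, key=freqs.get, reverse=True)
    codes = {s: "" for s in symbols}

    def split(group):
        if len(group) <= 1:
            return
        total = sum(freqs[s] for s in group)
        best_i, best_diff, running = 1, float("inf"), 0
        for i in range(1, len(group)):           # candidate split points
            running += freqs[group[i - 1]]
            diff = abs(2 * running - total)      # |left total - right total|
            if diff < best_diff:
                best_i, best_diff = i, diff
        left, right = group[:best_i], group[best_i:]
        for s in left:
            codes[s] += "0"
        for s in right:
            codes[s] += "1"
        split(left)
        split(right)

    split(symbols)
    return codes

freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = shannon_fano(freqs)
print(codes)   # {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
print(sum(freqs[s] * len(codes[s]) for s in freqs))   # 89 bits, as above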
Huffman Coding

Based on the frequency of occurrence of a data item (pixels or small blocks of pixels in images).

Use a lower number of bits to encode more frequent data.
Codes are stored in a Code Book, as for Shannon-Fano (previous slides).
The code book is constructed for each image or a set of images.
The code book plus encoded data must be transmitted to enable decoding.
Symbol   Count   log2(1/p)   Code   Subtotal (# of bits)
------   -----   ---------   ----   --------------------
A        15      1.38        0      15
B        7       2.48        100    21
C        6       2.70        101    18
D        6       2.70        110    18
E        5       2.96        111    15

                       TOTAL (# of bits): 87
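A compact Python sketch of the Huffman construction using a heap (illustrative; the exact 0/1 labels it produces may differ from the table above, but the code lengths and the 87-bit total are the same):

import heapq
from itertools import count

def huffman_codes(freqs):
    tiebreak = count()                       # keeps heap entries comparable
    heap = [[f, next(tiebreak), [s, ""]] for s, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        lo = heapq.heappop(heap)             # two least frequent subtrees
        hi = heapq.heappop(heap)
        for pair in lo[2:]:
            pair[1] = "0" + pair[1]          # prepend bit for one branch
        for pair in hi[2:]:
            pair[1] = "1" + pair[1]          # prepend bit for the other
        heapq.heappush(heap, [lo[0] + hi[0], next(tiebreak)] + lo[2:] + hi[2:])
    return {s: c for s, c in heap[0][2:]}

freqs = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman_codes(freqs)
print(codes)
print(sum(freqs[s] * len(codes[s]) for s in freqs))   # 87 bits in total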
Decoding for the above two algorithms is trivial as long as the coding table/book is sent before the data. There is a small overhead for sending this, but it is negligible if the data file is big.

Unique Prefix Property: no code is a prefix to any other code (all symbols are at the leaf nodes), which is great for the decoder: decoding is unambiguous.

If prior statistics are available and accurate, then Huffman coding is very good.
Huffman Entropy

In the above example:

Ideal entropy = (15 x 1.38 + 7 x 2.48 + 6 x 2.7 + 6 x 2.7 + 5 x 2.96)/39
             = 85.26/39
             = 2.19

Number of bits needed for Huffman coding is: 87/39 = 2.23
Arithmetic Coding

A widely used entropy coder.
Also used in JPEG (more soon).
Its only problem is speed, due to the possibly complex computations and large symbol tables.
Good compression ratio (better than Huffman coding); entropy is around the Shannon ideal value.

Why better than Huffman?

Huffman coding etc. use an integer number (k) of bits for each symbol, hence k is never less than 1. Sometimes, e.g. when sending a 1-bit image, compression becomes impossible.
Basic Idea

The idea behind arithmetic coding is:

To have a probability line, [0, 1), and
Assign to every symbol a range on this line based on its probability:
The higher the probability, the larger the range assigned to it.

Once we have defined the ranges and the probability line:

Start to encode symbols.
Every symbol defines where the output floating point number lands within the current range.
Symbol   Range
A        [0.0, 0.5)
B        [0.5, 0.75)
C        [0.75, 1.0)

The first symbol in our example stream is B. We now know that the code will be in the range 0.5 to 0.74999...
Where:

range keeps track of where the next range should be.
high and low specify the output number.
Initially high = 1.0, low = 0.0.
Third Iteration

We now reapply the subdivision of our scale again to get, for our third symbol (range = 0.125, low = 0.5, high = 0.625):

Symbol   Range
BAA      [0.5, 0.5625)
BAB      [0.5625, 0.59375)
BAC      [0.59375, 0.625)
Fourth Iteration

Subdivide again (range = 0.03125, low = 0.59375, high = 0.625):

Symbol   Range
BACA     [0.59375, 0.609375)
BACB     [0.609375, 0.6171875)
BACC     [0.6171875, 0.625)

So the (unique) output code for BACA is any number in the range [0.59375, 0.609375).
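The interval narrowing above can be written directly as a short Python sketch (floating point only, ignoring the precision issues discussed later):

ranges = {"A": (0.0, 0.5), "B": (0.5, 0.75), "C": (0.75, 1.0)}

def encode_interval(message):
    low, high = 0.0, 1.0
    for symbol in message:
        span = high - low
        s_low, s_high = ranges[symbol]
        high = low + span * s_high       # shrink [low, high) to the
        low = low + span * s_low         # sub-range of this symbol
    return low, high

print(encode_interval("B"))      # (0.5, 0.75)
print(encode_interval("BA"))     # (0.5, 0.625)
print(encode_interval("BAC"))    # (0.59375, 0.625)
print(encode_interval("BACA"))   # (0.59375, 0.609375)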
Decoding

To decode is essentially the opposite:

We compile the table for the sequence's given probabilities.
Find the symbol range within which the code number lies, output that symbol, and carry on.
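A matching Python decode sketch, assuming the message length is known (a real coder would use a terminator symbol or transmit the length):

ranges = {"A": (0.0, 0.5), "B": (0.5, 0.75), "C": (0.75, 1.0)}

def decode(code, n_symbols):
    message = ""
    for _ in range(n_symbols):
        for symbol, (s_low, s_high) in ranges.items():
            if s_low <= code < s_high:
                message += symbol
                code = (code - s_low) / (s_high - s_low)   # rescale to [0, 1)
                break
    return message

print(decode(0.59375, 4))    # BACA (any number in [0.59375, 0.609375) works)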
So in binary we get:

0.1  binary = 2^-1          = 1/2 decimal
0.01 binary = 2^-2          = 1/4 decimal
0.11 binary = 2^-1 + 2^-2   = 3/4 decimal
To encode the message, just send enough bits of a binary fraction that uniquely specifies the interval.
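A small sketch of that idea (an illustration, not the course's code): find the shortest dyadic interval [m/2^k, (m+1)/2^k) that fits inside the message interval; the k binary digits of m are the bits to send.

import math

def interval_to_bits(low, high):
    """Shortest bit string whose dyadic interval lies within [low, high)."""
    k = 1
    while True:
        m = math.ceil(low * 2 ** k)
        if (m + 1) / 2 ** k <= high:
            return format(m, "0{}b".format(k))
        k += 1

print(interval_to_bits(0.59375, 0.609375))   # '100110', i.e. 0.59375

For the BACA interval, of width 1/64, this gives 6 bits, in line with the -log2(p) bound noted later.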
Similarly, we can map all possible length-3 messages to intervals in the range [0, 1).
Implementation Issues

FPU Precision: the resolution of the number we represent is limited by FPU precision.
Binary coding is an extreme example of rounding; decimal coding is the other extreme, with theoretically no rounding.
Some FPUs may use up to 80 bits.
As an example, let us consider working with 16-bit resolution.
0.000   0.250   0.500   0.750   1.000
0000h   4000h   8000h   C000h   FFFFh

If we take a number and divide it by the maximum (FFFFh) we will clearly see this:

0000h: 0/65535     = 0.0
4000h: 16384/65535 = 0.25
8000h: 32768/65535 = 0.5
C000h: 49152/65535 = 0.75
FFFFh: 65535/65535 = 1.0
The operation of coding is similar to what we have seen with the binary coding:

Adjust the probabilities so that the bits needed for operating with the number aren't above 16 bits.
Define a new interval.
The way to deal with the infinite number is to have only the first 16 bits loaded, and shift more bits in when needed:

1100 0110 0001 000 0011 0100 0100 ...

Work only with those bits; as new bits are needed they'll be shifted in.
Memory Intensive

What about an alphabet with 26 symbols, or 256 symbols, ...?

In general, the number of bits is determined by the size of the interval: from the entropy argument we need -log2(p) bits to represent an interval of size p.
Can be memory and CPU intensive.
Basic idea/example by analogy: suppose we want to encode the Oxford Concise English Dictionary, which contains about 159,000 entries. Why not just transmit each word as an 18-bit number?
LZW Constructs Its Own Dictionary

Problems:

Too many bits per word,
Everyone needs a dictionary,
Only works for English text.

Solution: find a way to build the dictionary adaptively.

The original methods (LZ) are due to Lempel and Ziv in 1977/8. There are quite a few variations on LZ. Terry Welch's improvement (1984) gave the patented LZW algorithm.

LZW introduced the idea that only the initial dictionary needs to be transmitted to enable decoding: the decoder is able to build the rest of the table from the encoded sequence.
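A minimal LZW encoder sketch in Python. The input string is assumed to be the 19-character example these slides refer to, with '^' standing in for a space; the initial dictionary holds the single characters and new codes start at 256.

def lzw_encode(text):
    table = {chr(i): i for i in range(256)}    # initial dictionary
    next_code = 256
    w, output = "", []
    for k in text:
        if w + k in table:
            w = w + k                          # keep extending the match
        else:
            output.append(table[w])            # emit code for w
            table[w + k] = next_code           # add the new string
            next_code += 1
            w = k
    if w:
        output.append(table[w])
    return output

codes = lzw_encode("^WED^WE^WEE^WEB^WET")
print([c if c > 255 else chr(c) for c in codes])
# ['^', 'W', 'E', 'D', 256, 'E', 260, 261, 257, 'B', 260, 'T']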
A 19-symbol input has been reduced to a 7-symbol plus 5-code output. Each code/symbol will need more than 8 bits, say 9 bits.

Usually, compression doesn't start until a large number of bytes (e.g. > 100) are read in.
Note (recall): the LZW decoder only needs the initial dictionary. The decoder is able to build the rest of the table from the encoded sequence.
LZW Decompression Algorithm Example:

Input string is "^WED<256>E<260><261><257>B<260>T"

w        k        output   index   symbol
-----------------------------------------
         ^        ^
^        W        W        256     ^W
W        E        E        257     WE
E        D        D        258     ED
D        <256>    ^W       259     D^
<256>    E        E        260     ^WE
E        <260>    ^WE      261     E^
<260>    <261>    E^       262     ^WEE
<261>    <257>    WE       263     E^W
<257>    B        B        264     WEB
B        <260>    ^WE      265     B^
<260>    T        T        266     ^WET
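A matching decoder sketch, which rebuilds the table exactly as traced above from nothing but the initial single-character dictionary:

def lzw_decode(codes):
    table = {i: chr(i) for i in range(256)}    # initial dictionary only
    next_code = 256
    w = table[codes[0]]
    output = [w]
    for k in codes[1:]:
        # the "not yet in table" case (w + w[0]) is not hit in this example,
        # but is needed in general
        entry = table[k] if k in table else w + w[0]
        output.append(entry)
        table[next_code] = w + entry[0]        # same entry the encoder added
        next_code += 1
        w = entry
    return "".join(output)

stream = [ord("^"), ord("W"), ord("E"), ord("D"), 256,
          ord("E"), 260, 261, 257, ord("B"), 260, ord("T")]
print(lzw_decode(stream))      # ^WED^WE^WEE^WEB^WET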
norm2lzw.m: LZW Encoder
lzw2norm.m: LZW Decoder
lzw_demo1.m: Full MATLAB demo

More info on the MATLAB LZW code.
Transform Coding

A simple transform coding example

A simple transform encoding procedure may be described by the following steps for a 2x2 block of monochrome pixels:

1. Take the top-left pixel as the base value for the block, pixel A.
2. Calculate three other transformed values by taking the difference between these (respective) pixels and pixel A, i.e. B-A, C-A, D-A.
3. Store the base pixel and the differences as the values of the transform.
Simple Transforms

Given the above we can easily form the forward transform:

X0 = A
X1 = B - A
X2 = C - A
X3 = D - A
Example

Consider the following 2x2 image block:

120   130
125   120

then we get:

X0 = 120
X1 = 10
X2 = 5
X3 = 0

We can then compress these values by taking fewer bits to represent the data.
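A Python sketch of this 2x2 block transform and its inverse, using the example block above (illustrative only):

def forward(block):
    a, b, c, d = block
    return [a, b - a, c - a, d - a]      # base pixel plus differences

def inverse(coeffs):
    x0, x1, x2, x3 = coeffs
    return [x0, x1 + x0, x2 + x0, x3 + x0]

block = [120, 130, 125, 120]             # A, B, C, D from the example above
coeffs = forward(block)
print(coeffs)                            # [120, 10, 5, 0]
assert inverse(coeffs) == block

The differences are small, so they can be stored in fewer bits than the original 8-bit pixel values.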
Practical approaches use more complicated transforms, e.g. the DCT (see later).
Differential Encoding

A simple example of the transform coding mentioned earlier and an instance of this approach. Here the difference between the actual value of a sample and a prediction of that value is encoded. Also known as predictive encoding.

Examples of the technique include differential pulse code modulation, delta modulation and adaptive pulse code modulation; they differ in the prediction part.

Suitable where successive signal samples do not differ much, but are not zero: e.g. video (difference between frames), some audio signals.
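A minimal differential (DPCM-style) Python sketch, using the previous sample as the prediction and omitting quantisation (the course's MATLAB example below includes quantisation):

def dpcm_encode(samples):
    diffs, prediction = [], 0
    for s in samples:
        diffs.append(s - prediction)     # transmit the (small) difference
        prediction = s                   # next prediction = current sample
    return diffs

def dpcm_decode(diffs):
    samples, prediction = [], 0
    for d in diffs:
        prediction += d
        samples.append(prediction)
    return samples

signal = [100, 102, 104, 103, 101, 101, 105]
diffs = dpcm_encode(signal)
print(diffs)                             # [100, 2, 2, -1, -2, 0, 4]
assert dpcm_decode(diffs) == signal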
MATLAB complete (with quantisation) DPCM example:

dpcm_demo.m: DPCM complete example
dpcm.zip: DPCM support files
Transformation from the time domain (e.g. 1D audio; video: 2D imagery over time) or the spatial domain (e.g. 2D imagery) to the frequency domain via:

Discrete Cosine Transform (DCT): the heart of JPEG and MPEG Video, and (alt.) MPEG Audio.
Fourier Transform (FT): MPEG Audio.
Vector Quantisation

The basic outline of this approach is:

The data stream is divided into (1D or 2D square) blocks: vectors.
A table or code book is used to find a pattern for each block.
The code book can be dynamically constructed or predefined.
Each pattern for a block is encoded as a lookup value into the table.
Compression is achieved as the data is effectively subsampled and coded at this level.
Used in MPEG-4, video codecs (Cinepak, Sorenson), speech coding, Ogg Vorbis.
Search Engine:

Group (cluster) the data into vectors.
Find the closest code vectors.

On decode, the output needs to be unblocked (smoothed).
Vector Quantisation Code Book Construction

How to cluster data?

Use some clustering technique, e.g. K-means, Voronoi decomposition.
Essentially cluster on some closeness measure, minimising inter-sample variance or distance.
Vector Quantisation Code Book Construction

How to code?

For each cluster choose a mean (median) point as the representative code for all points in the cluster.
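A small vector quantisation sketch in Python: a K-means-style code book of 2x2 blocks flattened to 4-vectors. Purely illustrative, with arbitrary toy data and parameters.

import random

def dist2(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def nearest(vec, book):
    return min(range(len(book)), key=lambda i: dist2(vec, book[i]))

def build_code_book(vectors, k, iterations=10):
    book = random.sample(vectors, k)                 # initial code vectors
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[nearest(v, book)].append(v)     # assign to closest code
        for i, members in enumerate(clusters):
            if members:                              # new code = cluster mean
                book[i] = [sum(x) / len(members) for x in zip(*members)]
    return book

def vq_encode(vectors, book):
    return [nearest(v, book) for v in vectors]       # one index per block

def vq_decode(indices, book):
    return [book[i] for i in indices]                # lookup, then unblock

blocks = [[10, 10, 12, 11], [11, 10, 12, 12], [200, 198, 201, 199],
          [199, 200, 200, 201], [90, 92, 91, 90], [92, 91, 90, 92]]
book = build_code_book(blocks, k=3)
indices = vq_encode(blocks, book)
print(indices)                      # e.g. [0, 0, 1, 1, 2, 2] (labels vary)
print(vq_decode(indices, book)[0])  # approximation of the first block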
Vector Quantisation Image Coding Example

Consider vectors of 2x2 blocks, and only allow 8 codes in the table. 9 distinct vector blocks are present in the example image.
Vector Quantisation Image Coding Example (Cont.)

There are 9 vector blocks but only 8 codes, so only one block has to be vector quantised (approximated by its nearest code) here.

Resulting code book for the above image.