Itc 11

The document discusses dictionary-based source coding techniques that enhance data compression by leveraging recurring patterns in data, particularly in text sources. It covers static and adaptive methods, including digram coding and the LZ77 and LZ78 algorithms, which utilize dictionaries to encode sequences efficiently. The effectiveness of these techniques is dependent on the size and adaptability of the dictionary used for encoding patterns.


Information Theory and Coding

Dr. Waquar Ahmad

National Institute of Technology Calicut



Dr. Waquar Ahmad (NITC) ITC 1 / 11


Dictionary based Source Coding- Overview

So far we have discussed coding techniques that assume a source
that generates a sequence of independent symbols.
We now look at techniques that exploit the structure in the
data in order to increase the amount of compression.
These techniques, which are both static and adaptive (or dynamic) in
nature, build a list of commonly occurring patterns and encode these
patterns by transmitting their index in the list.
They are most useful with sources that generate a relatively small
number of patterns quite frequently, such as text sources and
computer commands.



Dictionary based Source Coding

Recurring patterns are common in many applications, such as text
sources where certain words or sequences repeat frequently.
Encoding frequently occurring patterns with references to a dictionary
can result in more efficient compression of the source output.
This approach involves dividing the input into two classes: frequently
occurring patterns and infrequently occurring patterns.
The effectiveness of this technique relies on the size of the dictionary,
which should be much smaller than the total number of possible
patterns.
The utility of the encoding scheme depends on the probability of
encountering patterns from the dictionary.
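The dependence on the probability of a dictionary hit can be made concrete with a short calculation. The sketch below is illustrative only, and all of its numbers are assumptions rather than values from the slides: four-character patterns over a 32-symbol alphabet (20 bits uncoded), a 256-entry dictionary (8-bit index), and one extra flag bit that tells the decoder whether a dictionary index or a raw pattern follows.

```python
def average_bits(p, dict_bits, other_bits):
    """Average bits per pattern when a fraction p of patterns is in the dictionary."""
    return p * dict_bits + (1 - p) * other_bits


# Assumed (hypothetical) costs: 1 flag bit + 8-bit index, or 1 flag bit + 20 raw bits.
DICT_BITS = 1 + 8
OTHER_BITS = 1 + 20
UNCODED_BITS = 20

# The scheme pays off only if the average drops below the uncoded cost:
#   p * DICT_BITS + (1 - p) * OTHER_BITS < UNCODED_BITS
break_even = (OTHER_BITS - UNCODED_BITS) / (OTHER_BITS - DICT_BITS)

for p in (0.05, break_even, 0.5):
    print(f"p = {p:.3f}: {average_bits(p, DICT_BITS, OTHER_BITS):.2f} bits per pattern")
```

Under these assumptions the dictionary helps only when a pattern is found in it with probability above 1/12, which is why the choice of dictionary entries matters so much.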



Digram Coding

Digram coding is a common form of static dictionary coding used in
data compression.
The dictionary in digram coding consists of all letters of the source
alphabet and frequently used pairs of letters, known as digrams.
The digram encoder reads a two-character input and checks whether it
exists in the dictionary; if so, it encodes the corresponding index.
Otherwise, it encodes the first character alone, and the second
character becomes the first character of the next digram examined.
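The encoding rule can be sketched in a few lines of Python. This is a minimal illustration, not the course's reference implementation: the function name digram_encode and the 3-bit dictionary below (five single letters plus the digrams ab, ac and ad) are assumptions made for the example.

```python
def digram_encode(text, dictionary):
    """Encode text with a static dictionary of single letters and digrams."""
    codes = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in dictionary:
            # The two-character input is a known digram: emit its index.
            codes.append(dictionary[pair])
            i += 2
        else:
            # Emit the first character alone; the second character starts
            # the next digram to be examined.
            codes.append(dictionary[text[i]])
            i += 1
    return codes


# Hypothetical dictionary: every letter of the alphabet plus three digrams.
DICTIONARY = {
    "a": "000", "b": "001", "c": "010", "d": "011", "r": "100",
    "ab": "101", "ac": "110", "ad": "111",
}

print(digram_encode("abracadabra", DICTIONARY))
# → ['101', '100', '110', '111', '101', '100', '000']
```

With this assumed dictionary the 11 input letters are sent as 7 codewords of 3 bits each, because four of the codewords cover two letters at once.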



Example

Suppose we have a source with a five-letter alphabet
A = {a, b, c, d, r}. Based on knowledge about the source, we build
the dictionary shown in the Table given below



Adaptive Dictionary

Adaptive-dictionary-based techniques originated from landmark
papers by Jacob Ziv and Abraham Lempel in 1977 and 1978.
The 1977 paper introduced approaches belonging to the LZ77 family
(LZ1), while the 1978 paper introduced approaches belonging to the
LZ78 family (LZ2).
Adaptive dictionary techniques dynamically adjust the dictionary
based on the input data stream.
The dictionary evolves as new symbols are encountered, allowing for
efficient encoding of recurring patterns.



LZ-77

The LZ77 approach utilizes a dictionary that is a portion of the
previously encoded sequence.
The encoder operates with a sliding window mechanism, consisting of
a search buffer and a look-ahead buffer.
The search buffer holds a portion of the recently encoded sequence,
while the look-ahead buffer contains the next portion to be encoded.
To encode the sequence in the look-ahead buffer, the encoder moves
a search pointer back through the search buffer until it encounters a
match to the first symbol in the look-ahead buffer.
The distance of the pointer from the look-ahead buffer is called the
offset.



LZ-77

The encoder then examines the symbols following the symbol at the
pointer location to see if they match consecutive symbols in the
look-ahead buffer.
The number of consecutive symbols in the search buffer that match
consecutive symbols in the look-ahead buffer, starting with the first
symbol, is called the length of the match.
The encoder searches the search buffer for the longest match. Once
the longest match has been found, the encoder encodes it with a
triple < o, l, c >, where o is the offset, l is the length of the match,
and c is the codeword corresponding to the symbol in the look-ahead
buffer that follows the match.
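The search for the offset and the length of the longest match can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name longest_match is hypothetical, the match is limited to one symbol less than the look-ahead buffer so that a symbol c is always left to follow it, and ties are resolved in favour of the match closest to the look-ahead buffer.

```python
def longest_match(search_buffer, lookahead):
    """Return (offset, length) of the longest match for the start of lookahead.

    (0, 0) means that the first look-ahead symbol does not occur in the
    search buffer at all.
    """
    window = search_buffer + lookahead
    pos = len(search_buffer)          # index of the first look-ahead symbol
    max_length = len(lookahead) - 1   # keep one symbol for c in <o, l, c>
    best_offset, best_length = 0, 0
    # Move the search pointer back through the search buffer.
    for pointer in range(pos - 1, -1, -1):
        length = 0
        # A match may run past the search buffer into the look-ahead buffer.
        while (length < max_length
               and window[pointer + length] == window[pos + length]):
            length += 1
        if length > best_length:
            best_offset, best_length = pos - pointer, length
    return best_offset, best_length


print(longest_match("cabraca", "dabrar"))  # → (0, 0)
print(longest_match("abracad", "abrarr"))  # → (7, 4)
print(longest_match("adabrar", "rarrad"))  # → (3, 5)
```

In the last call the match has length 5 although its offset is only 3: the copy starts inside the search buffer and continues into the look-ahead buffer, which this formulation of LZ77 permits.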



LZ-77-Example

Suppose the length of the window is 13 and the size of the look-ahead
buffer is 6, so that the search buffer holds 7 symbols. Encode the
following sequence:
cabracadabrarrarrad
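One way to check a hand-worked answer is a small encoder. The sketch below is not a definitive implementation: it assumes that encoding starts with an empty search buffer, that a tie between equally long matches goes to the smaller offset, and that the symbol c is stored as the character itself rather than as a codeword. The name lz77_encode is hypothetical.

```python
def lz77_encode(data, window_size=13, lookahead_size=6):
    """Encode data as a list of (offset, length, symbol) triples."""
    search_size = window_size - lookahead_size
    triples = []
    pos = 0
    while pos < len(data):
        # Leave room for the symbol that follows the match.
        max_length = min(lookahead_size, len(data) - pos) - 1
        best_offset, best_length = 0, 0
        oldest = max(0, pos - search_size)
        for pointer in range(pos - 1, oldest - 1, -1):
            length = 0
            while (length < max_length
                   and data[pointer + length] == data[pos + length]):
                length += 1
            if length > best_length:
                best_offset, best_length = pos - pointer, length
        triples.append((best_offset, best_length, data[pos + best_length]))
        pos += best_length + 1   # slide the window past the match and c
    return triples


for triple in lz77_encode("cabracadabrarrarrad"):
    print(triple)
```

Under these assumptions the first four symbols are sent as literals with offset and length 0, and the sequence ends with the triples (7, 4, 'r') and (3, 5, 'd'). If the search buffer is instead taken to hold cabraca already, only the part from d onward needs to be encoded.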



LZ-78

In LZ78, the input is coded as a double <i, c>, where i is the
index of the dictionary entry that is the longest match to the input
and c is the code for the character in the input following the
matched portion of the input.
An index of 0 is used when no match is found.
Each double then becomes the newest entry in the dictionary.



LZ-78-Example

Let us encode the following sequence using the LZ78 approach.
wabba · wabba · wabba · wabba · woo · woo · woo
where · stands for space
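The encoding can be checked with a short sketch. It assumes that dictionary indices start at 1, that index 0 means "no match", that the character c is stored as the character itself rather than as a codeword, and that the dictionary grows without limit. The name lz78_encode is hypothetical, and the symbol · is kept as an ordinary character standing for the space.

```python
def lz78_encode(data):
    """Encode data as a list of (index, character) doubles."""
    dictionary = {}   # phrase -> index, starting at 1
    doubles = []
    phrase = ""
    for ch in data:
        if phrase + ch in dictionary:
            # Keep extending the match while it is still a known phrase.
            phrase += ch
        else:
            # Emit the longest match (0 if none) plus the next character,
            # then add the extended phrase as the newest dictionary entry.
            doubles.append((dictionary.get(phrase, 0), ch))
            dictionary[phrase + ch] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # The input ended inside a known phrase: send its prefix and last character.
        doubles.append((dictionary.get(phrase[:-1], 0), phrase[-1]))
    return doubles


sequence = "wabba·wabba·wabba·wabba·woo·woo·woo"
for number, double in enumerate(lz78_encode(sequence), start=1):
    print(number, double)
```

For this sequence the sketch emits 17 doubles, beginning (0, 'w'), (0, 'a'), (0, 'b'), (3, 'a') and ending (14, 'w'), (13, 'o'). The later doubles cover longer phrases such as wabb, which shows the dictionary adapting to the repeated pattern.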
