
Chapter 5

Data Compression
Introduction
Data compression is often referred to as coding, where coding is a very general term encompassing
any special representation of data which satisfies a given need.

Definition: Data compression is the process of encoding information using fewer bits so
that it takes less memory (storage) or bandwidth during transmission.
Two types of compression:
• Lossy data compression
• Lossless data compression

Lossless Data Compression: in lossless data compression, the original content of the data is not
lost/changed when it is compressed (encoded).

Examples:
RLE (Run Length Encoding)
Dictionary Based Coding
Arithmetic Coding

Lossy data compression: the original content of the data is lost to a certain degree when compressed.
The part of the data that is less important is discarded. The loss factor determines whether there
is a loss of quality between the original image and the image after it has been compressed and played
back (decompressed). The more compression, the more likely that quality will be affected. Even if
the quality difference is not noticeable, these are still considered lossy compression methods.

Examples
JPEG (Joint Photographic Experts Group)
MPEG (Moving Pictures Expert Group)
ADPCM

Information Theory

Information theory is defined to be the study of efficient coding and its consequences. It is the field
of study concerned with the storage and transmission of data. It covers both source coding and
channel coding.
Source coding: involves compression.
Channel coding: how to transmit data, how to overcome noise, etc.

Data compression may be viewed as a branch of information theory in which the primary objective
is to minimize the amount of data to be transmitted.
Fig Information coding and transmission

Need for Compression


With more colors, higher resolution, and faster frame rates, you produce better quality video, but you
need more computing power and more storage space for your video. A simple calculation (see below)
shows that 24-bit color video at 640 by 480 resolution and 30 fps requires an astonishing
26 megabytes of data per second! Not only does this surpass the capabilities of many home computer
systems, it also overburdens existing storage systems.

640 horizontal resolution
X 480 vertical resolution
= 307,200 total pixels per frame
X 3 bytes per pixel
= 921,600 total bytes per frame
X 30 frames per second
= 27,648,000 total bytes per second
/ 1,048,576 to convert to megabytes
= 26.36 megabytes per second!
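As a quick check of the arithmetic above, the same figures can be reproduced with a few lines of code. This is only an illustrative sketch (not part of the chapter); the variable names are mine.

```python
# Uncompressed video data rate for 640x480, 24-bit colour, 30 fps.
width, height = 640, 480
bytes_per_pixel = 3            # 24-bit colour = 3 bytes per pixel
frames_per_second = 30

bytes_per_frame = width * height * bytes_per_pixel       # 921,600
bytes_per_second = bytes_per_frame * frames_per_second   # 27,648,000
mb_per_second = bytes_per_second / (1024 * 1024)         # 1,048,576 bytes per MB

print(f"{bytes_per_frame:,} bytes/frame")
print(f"{bytes_per_second:,} bytes/second")
print(f"{mb_per_second:.2f} MB/second")   # about 26.37 MB/s; the chapter truncates this to 26.36
```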

The calculation shows that the space required for video is excessive. For video, the way to reduce this
amount of data to a manageable level is to compromise on the quality of the video to some extent.
This is done by lossy compression, which discards some of the original data.

Compression Algorithms

Compression methods use mathematical algorithms to reduce (or compress) data by eliminating,
grouping and/or averaging similar data found in the signal. Although there are various
compression methods, including Motion JPEG, only MPEG-1 and MPEG-2 are internationally
recognized standards for the compression of moving pictures (video).

A simple characterization of data compression is that it involves transforming a string of characters
in some representation (such as ASCII) into a new string (of bits, for example) which contains the
same information but whose length is as small as possible. Data compression has important
applications in the areas of data transmission and data storage.
The proliferation of computer communication networks is resulting in massive transfer of data over
communication links. Compressing data to be stored or transmitted reduces storage and/or
communication costs. When the amount of data to be transmitted is reduced, the effect is that of
increasing the capacity of the communication channel.

Lossless compression is a method of reducing the size of computer files without losing any
information. That means when you compress a file, it will take up less space, but when you
decompress it, it will still have exactly the same information. The idea is to get rid of any redundancy
in the information; this is exactly what is done in ZIP and GIF files. This differs from lossy
compression, such as in JPEG files, which loses some information that isn't very noticeable.
Why use lossless compression?

You can use lossless compression whenever space is a concern but the information must stay the
same. An example is sending text files over a modem or the Internet. If the files are smaller,
they will get there faster, yet they must arrive at the destination exactly as they were sent. Modems
use LZW compression (v.42bis) automatically to speed up transfers.

There are several popular algorithms for lossless compression. There are also variations of most of
them, and each has many implementations. Here is a list of the families, their variations, and the file
types where they are implemented:

Family                   Variations               Used in
Running-Length           none
Huffman                                           MNP5
                         Adaptive Huffman         COMPACT
                         Shannon-Fano             SQ
Arithmetic               none
LZ78 (Lempel-Ziv 1978)   LZW (Lempel-Ziv-Welch)   GIF, v.42bis, compress, ZIP
LZ77 (Lempel-Ziv 1977)   LZFG                     ARJ, LHA
Table: lossless coding algorithm families and variations

Variable Length Encoding


Claude Shannon and R.M. Fano created one of the first compression algorithms in the late 1940s. The
algorithm assigns a variable number of bits to letters/symbols.

Shannon-Fano Coding
Let us assume the source alphabet S={X1,X2,X3,…,Xn} and
Associated probability P={P1,P2,P3,…,Pn}
The steps to encode data using the Shannon-Fano coding algorithm are as follows:
Order the source letters into a sequence according to their probability of occurrence in non-increasing
(i.e. decreasing) order.
ShannonFano(sequence s)
If s has two letters
Attach 0 to the codeword of one letter and 1 to the codeword of another;
Else if s has more than two letters
Divide s into two subsequences S1, and S2 with the minimal difference between
probabilities of each subsequence;
extend the codeword for each letter in S1 by attaching 0, and by attaching 1 to each
codeword for letters in S2;
ShannonFano(S1);
ShannonFano(S2);
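The recursion above can be written out as a small Python sketch (the function name and the list-of-pairs representation are my own choices, not the chapter's). Run on the example that follows, it reproduces the codes derived below.

```python
def shannon_fano(symbols):
    """symbols: list of (letter, probability) pairs, already sorted in
    non-increasing order of probability. Returns a dict {letter: code}."""
    codes = {letter: "" for letter, _ in symbols}

    def split(seq):
        if len(seq) < 2:
            return
        if len(seq) == 2:
            codes[seq[0][0]] += "0"
            codes[seq[1][0]] += "1"
            return
        # Choose the split point that minimises the difference between the
        # total probabilities of the two subsequences.
        total = sum(p for _, p in seq)
        best_i, best_diff, running = 1, float("inf"), 0.0
        for i in range(1, len(seq)):
            running += seq[i - 1][1]
            diff = abs(running - (total - running))
            if diff < best_diff:
                best_i, best_diff = i, diff
        s1, s2 = seq[:best_i], seq[best_i:]
        for letter, _ in s1:
            codes[letter] += "0"    # 0 for the first subsequence
        for letter, _ in s2:
            codes[letter] += "1"    # 1 for the second subsequence
        split(s1)
        split(s2)

    split(symbols)
    return codes

print(shannon_fano([("A", 0.35), ("B", 0.17), ("C", 0.17), ("D", 0.16), ("E", 0.15)]))
# expected: {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
```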

Example: Suppose the following source and related probabilities


S={A,B,C,D,E}
P={0.35,0.17,0.17,0.16,0.15}
Message to be encoded=”ABCDE”

The probabilities are already arranged in non-increasing order. First we divide the letters into AB and
CDE. Why? This gives the smallest difference between the total probabilities of the two groups.
S1={A,B} P={0.35,0.17}=0.52
S2={C,D,E} P={0.17,0.16,0.15}=0.48
The difference is only 0.52-0.48=0.04. This is the smallest possible difference we can achieve when
dividing the letters into two groups.
Attach 0 to S1 and 1 to S2.
Subdivide S1 into sub groups.
S11={A} attach 0 to this
S12={B} attach 1 to this

Again subdivide S2 into subgroups considering the probability again.


S21={C} P={0.17}=0.17
S22={D,E} P={0.16,0.15}=0.31
Attach 0 to S21 and 1 to S22. Since S22 has more than one letter in it, we have to subdivide it.
S221={D} attach 0
S222={E} attach 1

Fig Shannon-Fano coding tree

The message is transmitted using the following code (by traversing the tree)
A=00 B=01
C=10 D=110
E=111

Instead of transmitting ABCDE, we transmit 000110110111.

Dictionary Encoding

Dictionary coding uses groups of symbols, words, and phrases with corresponding abbreviations. It
transmits the index of the symbol/word instead of the word itself. There are different variations of
dictionary based coding:
LZ77 (published in 1977)
LZ78 (published in 1978)
LZSS
LZW (Lempel-Ziv-Welch)

LZW Compression
LZW compression has its roots in the work of Jacob Ziv and Abraham Lempel. In 1977, they
published a paper on "sliding-window" compression, and followed it with another paper in 1978 on
"dictionary" based compression. These algorithms were named LZ77 and LZ78, respectively. Then
in 1984, Terry Welch made a modification to LZ78 which became very popular and was called
LZW.

The Concept
Many files, especially text files, have certain strings that repeat very often, for example " the ". With
the spaces, the string takes 5 bytes, or 40 bits, to encode. But what if we were to add the whole string
to the list of characters? Then every time we came across " the ", we could send its code instead of
32,116,104,101,32. This would take fewer bits.

This is exactly the approach that LZW compression takes. It starts with a dictionary of all the single
characters, with indexes 0-255. It then expands the dictionary as information gets sent through.
Redundant strings will then be coded by a single index, and compression has occurred.

The Algorithm:

LZWEncoding()
Enter all letters to the dictionary;
Initialize string s to the first letter from the input;
While any input is left
read symbol c;
if s+c exists in the dictionary
s = s+c;
else
output codeword(s); //codeword for s
enter s+c to dictionary;
s =c;
end loop
output codeword(s);
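As an illustration, here is a small Python sketch of this encoder (my own code, not the chapter's). The dictionary numbers codewords from 1 to match the tables in the example below.

```python
def lzw_encode(message, alphabet):
    # Source letters get codewords 1..n, as in the worked example.
    dictionary = {ch: i + 1 for i, ch in enumerate(alphabet)}
    next_code = len(alphabet) + 1
    output = []
    s = message[0]                         # s starts as the first letter
    for c in message[1:]:
        if s + c in dictionary:
            s = s + c                      # keep extending the current string
        else:
            output.append(dictionary[s])   # output the codeword for s
            dictionary[s + c] = next_code  # enter s+c into the dictionary
            next_code += 1
            s = c
    output.append(dictionary[s])           # output the codeword for the final s
    return output

print(lzw_encode("aababacbaacbaadaa", "abcd"))
# expected (as in the example below): [1, 1, 2, 6, 1, 3, 7, 9, 11, 4, 5]
```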

Example: encode the following string "aababacbaacbaadaa"

The program reads one character at a time. If the current work string plus the new character is in the
dictionary, the character is appended to the work string and the program waits for the next one. This
occurs on the first character as well. If the extended work string is not in the dictionary (such as when
the second character comes along), the program adds the extended work string to the dictionary and
sends over the wire the codeword for the work string without the new character. It then sets the work
string to the new character.

Example:
Encode the message aababacbaacbaadaa using the above algorithm.

Encoding
Create dictionary of letters found in the message
Encoder Dictionary
Input Output Index Entry
1 a
2 b
3 c
4 d

S is initialized to the first letter of message a (s=a)


Read symbol to c, and the next symbol is a (c=a)
Check if s+c (s+c=aa) is found in the dictionary (the one created above in step 1). It is not found.
So add s+c(s+c=aa) to dictionary and output codeword for s(s=a). The code for a is 1 from the
dictionary.
Then initialize s to c (s=c=a).

Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
aa 1 5 aa

Read the next letter from message to c (c=b)


Check if s+c (ab) is found in the dictionary. It is not found. Then, add s+c (s+c=ab) into the dictionary
and output the codeword for s (s=a), which is 1. Then initialize s to c (s=c=b).
Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
aa 1 5 aa
ab 1 6 ab

Read the next letter to c (c=a).


Check if s+c (s+c=ba) is found in the dictionary. It is not found. Then add s+c (s+c=ba) to the
dictionary. Then output the codeword for s (s=b). It is 2. Then initialize s to c (s=c=a).

Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
aa 1 5 aa
ab 1 6 ab
ba 2 7 ba

Read the next letter to c (c=b). Then check if s+c (s+c=ab) is found in the dictionary. It is there.
Then initialize s to s+c (s=s+c=ab).

Read again the next letter to c (c=a). Then check if s+c (s+c=aba) is found in the dictionary. It is not
there. Then transmit the codeword for s (s=ab). The code is 6. Initialize s to c (s=c=a).

Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
aa 1 5 aa
ab 1 6 ab
ba 2 7 ba
aba 6 8 aba

Again read the next letter to c and continue the same way till the end of message. At last you will
have the following encoding table.
Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
aa 1 5 aa
ab 1 6 ab
ba 2 7 ba
aba 6 8 aba
ac 1 9 ac
cb 3 10 cb
baa 7 11 baa
acb 9 12 acb
baad 11 13 baad
da 4 14 da
aa (end of message) 5
Table encoding string

Now, instead of the original message, you transmit the indexes in the dictionary. The codewords for
the message are 1, 1, 2, 6, 1, 3, 7, 9, 11, 4, 5, written compactly as 112613791145.

Decompression
The algorithm:

LZWDecoding()
Enter all the source letters into the dictionary;
Read priorCodeword and output one symbol corresponding to it;
While codeword is still left
read Codeword;
PriorString = string (PriorCodeword);
If codeword is in the dictionary
Enter in dictionary PriorString + firstsymbol(string(codeword));
output string(codeword);
else
Enter in the dictionary priorString +firstsymbol(priorString);
Output priorString+firstsymbol(priorstring);
priorCodeword=codeword;
end loop

The nice thing is that the decompressor builds its own dictionary on its side, that matches exactly the
compressor’s dictionary, so that only the codes need to be sent.
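A matching Python sketch of the decoder (again my own code, not the chapter's). The else branch handles the codeword that the encoder created but that is not yet in the decoder's dictionary, exactly as in the pseudocode above.

```python
def lzw_decode(codes, alphabet):
    # Codewords 1..n map back to the source letters.
    dictionary = {i + 1: ch for i, ch in enumerate(alphabet)}
    next_code = len(alphabet) + 1
    prior = dictionary[codes[0]]
    output = [prior]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            # Codeword not yet known: it must be priorString + its first symbol.
            entry = prior + prior[0]
        dictionary[next_code] = prior + entry[0]   # rebuild the encoder's entry
        next_code += 1
        output.append(entry)
        prior = entry
    return "".join(output)

print(lzw_decode([1, 1, 2, 6, 1, 3, 7, 9, 11, 4, 5], "abcd"))
# expected: aababacbaacbaadaa
```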

Example:
Let us decode the message 112613791145.
We will start with the following table.
Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d

Read the first code. It is 1. Output the corresponding letter, a.


Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
1 a

Read the next code. It is 1 and it is found in the dictionary. So add aa to the dictionary and output a
again.
Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
1 a
1 a 5 aa

Read the next code which is 2. It is found in the dictionary. We add ab to dictionary and output b.
Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
1 a
1 a 5 aa
2 b 6 ab
Read the next code which is 6. It is found in the dictionary. Add ba to dictionary and output ab
Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
1 a
1 a 5 aa
2 b 6 ab
6 ab 7 ba

Read the next code. It is 1. 1 is found in the dictionary. Add aba to the dictionary and output a.
Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
1 a
1 a 5 aa
2 b 6 ab
6 ab 7 ba
1 a 8 aba

Read the next code. It is 3 and it is found in the dictionary. Add ac to dictionary and output c.
Continue like this till the end of code is reached. You will get the following table:

Encoder Dictionary
Input(s+c) Output Index Entry
1 a
2 b
3 c
4 d
1 a
1 a 5 aa
2 b 6 ab
6 ab 7 ba
1 a 8 aba
3 c 9 ac
7 ba 10 cb
9 ac 11 baa
11 baa 12 acb
4 d 13 baad
5 aa 14 da
The decoded message is aababacbaacbaadaa

Huffman Compression

When we encode characters in computers, we assign each an 8-bit code based on an ASCII chart.
But in most files, some characters appear more often than others. So wouldn't it make more sense to
assign shorter codes for characters that appear more often and longer codes for characters that appear
less often? D.A. Huffman published a paper in 1952 that improved on the Shannon-Fano algorithm,
and the appropriately named Huffman coding soon superseded Shannon-Fano coding.

Huffman coding has the following properties:


• Codes for more probable characters are shorter than ones for less probable characters.
• Each code can be uniquely decoded.

To accomplish this, Huffman coding creates what is called a Huffman tree, which is a binary tree.

First count the number of times each character appears, and assign this as a weight/probability to
each character, or node. Add all the nodes to a list.
Then, repeat these steps until there is only one node left:
• Find the two nodes with the lowest weights.
• Create a parent node for these two nodes. Give this parent node a weight equal to the sum of the two
nodes' weights.
• Remove the two nodes from the list, and add the parent node.
This way, the nodes with the highest weight will be near the top of the tree, and have shorter codes.

Algorithm to create the tree


Assume the source alphabet S={X1, X2, X3, …,Xn} and
Associated Probabilities P={P1, P2, P3,…, Pn}

Huffman()
    for each letter create a tree with a single root node and order all trees according to the
    probability of occurrence of the letter;
    while more than one tree is left
        take the two trees t1 and t2 with the lowest probabilities p1 and p2, and create a tree with
        probability p1+p2 in its root and with t1 and t2 as its subtrees;
    associate 0 with each left branch and 1 with each right branch;
    create a unique codeword for each letter by traversing the tree from the root to the leaf containing
    the probability corresponding to this letter and putting together all encountered 0s and 1s;
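One way to sketch this construction in Python is with a priority queue, as below (my own code, not the chapter's). The exact 0/1 patterns depend on tie-breaking and on which subtree is placed on the left, so the bits may differ from the figure, but the code lengths, and therefore the length of the encoded message, are the same.

```python
import heapq

def huffman_codes(probabilities):
    """probabilities: dict {letter: probability}. Returns {letter: code}."""
    # Heap items are (probability, tie_breaker, tree); a tree is either a
    # letter or a (left, right) pair of subtrees.
    heap = [(p, i, letter) for i, (letter, p) in enumerate(probabilities.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, t1 = heapq.heappop(heap)    # the two trees with lowest probabilities
        p2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (p1 + p2, counter, (t1, t2)))
        counter += 1

    codes = {}
    def walk(tree, code):
        if isinstance(tree, tuple):
            walk(tree[0], code + "0")      # 0 on the left branch
            walk(tree[1], code + "1")      # 1 on the right branch
        else:
            codes[tree] = code
    walk(heap[0][2], "")
    return codes

codes = huffman_codes({"a": 0.15, "b": 0.16, "c": 0.17, "d": 0.17, "e": 0.35})
print(codes)                                 # a, b, c, d get 3-bit codes; e gets 1 bit
print("".join(codes[ch] for ch in "abcde"))  # 13 bits in total, as in the chapter
```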

Example: Suppose the following source and related probability


S={A,B,C,D,E}
P={0.15,0.16,0.17,0.17,0.35}
Message=”abcde”
Fig Huffman tree

To read the codes from a Huffman tree, start from the root and add a 0 every time you go left to a
child, and add a 1 every time you go right. So in this example, the code for the character b is 001 and
the code for d is 011.

As you can see, e (the most probable letter) has a shorter code than a. Notice that since all the
characters are at the leaves of the tree, there is never a chance that one code will be the prefix of
another one (e.g. we can never have a situation where a is 01 and b is 011). Hence, this unique prefix
property assures that each code can be uniquely decoded.

The code for each letter is:


a=000 b=001
c=010 d=011
e=1

The original message will be encoded to:


abcde=0000010100111

To decode a message coded by Huffman coding, the conversion table has to be known by the
receiver. Using this table, a tree can be constructed with the same paths as the tree used for coding.
The leaves store letters instead of probabilities, for efficiency.

The decoder can then use the Huffman tree to decode the string by following the paths according to
the bits and outputting a character every time it reaches a leaf.
Fig Huffman tree

The Algorithm

Move left if you get 0


Move right if you get 1
If you get letter (reach leaf node) output that letter.
Go back and start from root again with the remaining code.

Using this algorithm and the above decoding tree, let us decode the encoded message
0000010100111 at the destination.
0-move left
0-move left again
0-move left again, and we have reached leaf. Output the letter on the leaf node which is a.

Go back to root.
0-move left
0-move left
1-move right, and we have reached the leaf. Output letter on the leaf and it is b.

Go back to root.
0-move left
1-move right
0-move left, and we reach leaf. Output letter found on the leaf which is c.

Go back to root.
0-move left
1-move right
1-move right, and we reach leaf. Output letter on leaf which is d.

Go back to root.
1-move right, and we reach a leaf node. Output the letter on the node, which is e. Now we have
finished, i.e. no more code remains. Display the letters output as the message: abcde.
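The same walk can be written compactly in Python (my own sketch, not the chapter's). Instead of an explicit tree it looks up prefixes of the bit stream in the inverted code table, which is equivalent because no code is a prefix of another.

```python
def huffman_decode(bits, codes):
    """codes: {letter: bit string}. Reading a bit corresponds to moving
    left (0) or right (1); matching a full code corresponds to reaching a leaf."""
    inverse = {code: letter for letter, code in codes.items()}
    result, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:      # reached a leaf
            result.append(inverse[current])
            current = ""            # go back to the root
    return "".join(result)

codes = {"a": "000", "b": "001", "c": "010", "d": "011", "e": "1"}
print(huffman_decode("0000010100111", codes))   # expected: abcde
```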

How can the encoder let the decoder know which particular coding tree has been used? Two ways:
i) Both agree on a particular Huffman tree beforehand and both use it for sending any message.
ii) The encoder constructs a Huffman tree afresh every time a new message is sent and sends the
conversion table along with the message. This is more versatile, but has the additional overhead of
sending the conversion table. For large amounts of data, however, this overhead is small compared
to the savings.

It is also possible to create a tree for pairs of letters. This improves performance.

Example:
S={x, y, z}
P={0.1, 0.2, 0.7}
To get the probability of a pair, multiply the probabilities of its two letters.
xx=0.1*0.1=0.01
xy=0.1*0.2=0.02
xz=0.1*0.7=0.07
yx=0.2*0.1=0.02
yy=0.2*0.2=0.04
yz=0.2*0.7=0.14
zx=0.7*0.1=0.07
zz=0.7*0.7=0.49
zy=0.7*0.2=0.14
Using these probabilities, you can create Huffman tree of pairs the same way as we did previously.

Arithmetic Coding

The entire data set is represented by a single rational number whose value is between 0 and 1. This
range is divided into sub-intervals, each representing a certain symbol. The number of sub-intervals
is identical to the number of symbols in the current alphabet, and the size of each sub-interval is
proportional to the probability of appearance of its symbol. For each symbol in the original data a
new interval division takes place, on the basis of the last sub-interval.

Algorithm:

ArithmeticEncoding(message)
    CurrentInterval=[0,1); //includes 0 but not 1
    while the end of message is not reached
        read letter Xi from message;
        divide the CurrentInterval into SubIntervals IRCurrentInterval;
        CurrentInterval = SubIntervali of IRCurrentInterval;
    Output bits uniquely identifying CurrentInterval;

Assume the source alphabet s={X1, X2, X3,…, Xn} and associated probability of
P={p1, p2, p3,…, pn}

To calculate the sub-intervals of the current interval [L,R), use the following formula:
IR[L,R) = {[L, L+(R-L)*P1), [L+(R-L)*P1, L+(R-L)*P2), [L+(R-L)*P2, L+(R-L)*P3), …,
[L+(R-L)*Pn-1, L+(R-L)*Pn)}

where Pi = p1 + p2 + … + pi is the cumulative probability of the first i letters (so Pn = 1), and

[L,R) = the current interval for which the sub-intervals are calculated.

Cumulative probabilities are indicated using capital P and individual probabilities using small p.

Example:
Encode the message abbc# using arithmetic encoding.
s={a,b,c,#}
p={0.4,0.3,0.1,0.2}
At the beginning CurrentInterval is set to [0,1). Let us calculate the subintervals of [0,1).

First let us get cumulative probability Pi


P1=0.4
P2=0.4+0.3=0.7
P3=0.4+0.3+0.1=0.8
P4=0.4+0.3+0.1+0.2=1

Next calculate subintervals of [0,1) using the formula given above.


IR[0,1]={[0,0+(1-0)*0.4),[0+(1-0)*0.4, 0+(1-0)*0.7), [0+(1-0)*0.7, 0+(1-0)*0.8),
[0+(1-0)*0.8, 0+(1-0)*1)}
IR[0,1]={[0,0.4),[0.4,0.7),[0.7,0.8),[0.8,1)}-- four subintervals

Now the question is, which one of the SubIntervals will be the CurrentInterval? To determine this,
read the first letter of the message. It is a. Look where a is found in the source alphabet. It is found at
the beginning. So the next CurrentInterval will be [0,0.4), which is the first of the
SubIntervals.

Again let us calculate the SubIntervals of CurrentInterval [0,0.4). The cumulative probabilities do
not change, i.e. they are the same as before.
IR[0,0.4]={[0,0+(0.4-0)*0.4),[ 0+(0.4-0)*0.4, 0+(0.4-0)*0.7),[ 0+(0.4-0)*0.7, 0+(0.4-0)*0.8),
[0+(0.4-0)*0.8, 0+(0.4-0)*1)}
IR[0,0.4]={[0,0.16),[0.16,0.28),[0.28,0.32),[0.32,0.4)}.
Which interval will be the next CurrentInterval? Read the next letter from message. It is b. B is
found in the second place in the source alphabet list. The next CurrentInterval will be the second
SubInterval i.e [0.16,0.28).

Continue like this till no letter is left in the message. You will get the following result:
IR[0.16,0.28]={[0.16,0.208),[0.208,0.244),[0.244,0.256),[0.256,0.28)}. Next
IR[0.208,0.244]={[0.208,0.2224),[0.2224,0.2332),[0.2332,0.2368),[0.2368,0.244)}. Next
IR[0.2332,0.2368]={[0.2332,0.23464),[0.23464,0.23572),[0.23572,0.23608),[0.23608, 0.2368)}.

We are done because no more letters remain in the message. The last letter read was #. It is the
fourth letter in the source alphabet, so take the fourth SubInterval as the CurrentInterval, i.e.
[0.23608, 0.2368). Now any number inside this last CurrentInterval is sent as the message. So you can
send 0.23608 as the encoded message, or any other number between 0.23608 and 0.2368.
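For illustration, the interval-narrowing loop can be sketched in Python as follows (my own code, not the chapter's, using ordinary floating-point arithmetic, so the bounds are only approximately the exact fractions computed above).

```python
def arithmetic_encode(message, alphabet, probs):
    """Return the final interval [low, high); any number inside it can be sent."""
    # Cumulative probabilities P1..Pn, as in the formula above.
    cumulative, total = [], 0.0
    for p in probs:
        total += p
        cumulative.append(total)

    low, high = 0.0, 1.0                       # CurrentInterval = [0, 1)
    for letter in message:
        i = alphabet.index(letter)
        width = high - low
        new_low = low + width * (cumulative[i - 1] if i > 0 else 0.0)
        new_high = low + width * cumulative[i]
        low, high = new_low, new_high          # take the i-th SubInterval
    return low, high

print(arithmetic_encode("abbc#", "abc#", [0.4, 0.3, 0.1, 0.2]))
# expected: approximately (0.23608, 0.2368)
```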

Diagrammatically, calculating the SubIntervals looks like this:


Fig sub interval and current interval

Decoding
Algorithm:

ArithmeticDecoding(codeword)
CurrentInterval=[0,1];
While (1)
Divide CurrentInterval into SubIntervals IRcurrentInterval;
Determine the SubIntervali of CurrentInterval to which the codeword belongs;
Output letter Xi corresponding to this SubInterval;
If end of file
Return;
CurrentInterval=SubIntervali in IRcurrentInterval;
End of while

Example:
Decode 0.23608 which we previously encoded.
To decode, the source alphabet and related probabilities should be known by the destination. Let us
use the above source and probabilities.
s={a,b,c,#}
p={0.4,0.3,0.1,0.2}

First set CurrentInterval to [0,1], and then calculate SubInterval for it. The formula to calculate the
SubInterval is the same to encoding. The cumulative probabilities are:
P1=0.4
P2=0.4+0.3=0.7
P3=0.4+0.3+0.1=0.8
P4=0.4+0.3+0.1+0.2=1
IR[0,1]={[0,0+[1-0]*0.4),[ 0+[1-0]*0.4, 0+[1-0]*0.7),[ 0+[1-0]*0.7, 0+[1-0]*0.8),
[ 0+[1-0]*0.8, 0+[1-0]*1)}
IR[0,1]={[0,0.4),[0.4,0.7),[0.7,0.8),[0.8,1)}. Now check in which SubInterval the encoded message
(0.23608) falls. It falls in the first SubInterval, i.e. [0,0.4). Output the first letter from the source
alphabet. It is a. Set CurrentInterval to [0,0.4).

IR[0,0.4]={[0,0+(0.4-0)*0.4),[0+(0.4-0)*0.4, 0+(0.4-0)*0.7),[0+(0.4-0)*0.7, 0+(0.4-0)*0.8),
[0+(0.4-0)*0.8, 0+(0.4-0)*1)}
IR[0,0.4]={[0,0.16),[0.16,0.28),[0.28,0.32),[0.32,0.4)}. Again check where 0.23608 falls. It falls in
the second SubInterval i.e [0.16,0.28]. Set CurrentInterval to this SubInterval. Output the second
letter from source alphabet. It is b.

IR[0.16,0.28]={[0.16,0.208),[0.208,0.244),[0.244,0.256),[0.256,0.28)}. 0.23608 falls in the second
SubInterval. Output the second letter from the source alphabet. It is b.

IR[0.208,0.244]={[0.208,0.2224),[0.2224,0.2332),[0.2332,0.2368),[0.2368,0.244)}. 0.23608 falls in
the third SubInterval. Output the third letter from the source alphabet. It is c.

IR[0.2332,0.2368]={[0.2332,0.23464),[0.23464,0.23572),[0.23572,0.23608),[0.23608, 0.2368)}.
0.23608 falls in the fourth SubInterval. Output fourth letter which is #. Now end of the message has
been reached.
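A corresponding decoder sketch (again my own code, with floating-point arithmetic): because the transmitted value 0.23608 sits exactly on a sub-interval boundary, floating-point rounding can misclassify it, so the example decodes a value chosen safely inside the final interval instead.

```python
def arithmetic_decode(value, alphabet, probs, max_letters):
    """Repeatedly find the sub-interval of the current interval containing value."""
    cumulative, total = [], 0.0
    for p in probs:
        total += p
        cumulative.append(total)

    low, high = 0.0, 1.0
    out = []
    for _ in range(max_letters):
        width = high - low
        for i, letter in enumerate(alphabet):
            sub_low = low + width * (cumulative[i - 1] if i > 0 else 0.0)
            sub_high = low + width * cumulative[i]
            if sub_low <= value < sub_high:
                out.append(letter)
                low, high = sub_low, sub_high
                break
        if out and out[-1] == "#":             # terminator symbol ends the message
            break
    return "".join(out)

print(arithmetic_decode(0.2365, "abc#", [0.4, 0.3, 0.1, 0.2], 10))
# expected: abbc#
```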

Disadvantage: the arithmetic precision of the computer is soon exceeded, and hence long messages
cannot be encoded this way directly.

Implementation of Arithmetic Coding


To solve the above disadvantage, arithmetic coding is implemented as follows:

Algorithm:

OutputBits()
{
    While(1)
        If CurrentInterval is within [0,0.5)
            Output 0 and bitcount 1s; //"and" here means concatenation
            Bitcount=0;
        Else if CurrentInterval is within [0.5,1)
            Output 1 and bitcount 0s;
            Bitcount=0;
            Subtract 0.5 from left and right bounds of CurrentInterval;
        Else if CurrentInterval is within [0.25,0.75)
            Bitcount++;
            Subtract 0.25 from both left and right bounds of CurrentInterval;
        Else
            Break;
        Double left and right bounds of CurrentInterval;
}

FinishArithmeticCoding()
{
bitcount++;
if lowerbound of CurrentInterval <0.25
output 0 and bitcount 1s;
else
output 1 and bitcount 0s;
}

ArithmeticEncoding(message)
{
CurrentInterval=[0,1];
Bitcount=0;
While the end of message is not reached
{
read letter Xi from message;
divide CurrentInterval into SubInterval IRCurrentInterval;
CurrentInterval=SubIntervali in IRCurrentInterval;
OutputBits();
}
FinishArithmeticCoding();
}

Example:
Encode the message abbc#.
s={a,b,c,#}
p={0.4,0.3,0.1,0.2}
CurrentInterval     Input  Output  Bitcount  SubIntervals
[0,1)               a      -       0         [0,0.4) [0.4,0.7) [0.7,0.8) [0.8,1)
[0,0.4)             -      0       -
[0,0.8)             b      -       -         [0,0.32) [0.32,0.56) [0.56,0.64) [0.64,0.8)
[0.32,0.56)         -      -       1
[0.14,0.62)         b      -       -         [0.14,0.332) [0.332,0.476) [0.476,0.524) [0.524,0.62)
[0.332,0.476)       -      01      0
[0.664,0.952)       -      1       -
[0.328,0.904)       c      -       -         [0.328,0.5584) [0.5584,0.7312) [0.7312,0.7888) [0.7888,0.904)
[0.7312,0.7888)     -      1       -
[0.4624,0.5776)     -      -       1
[0.4248,0.6552)     -      -       2
[0.3496,0.8104)     #      -       -         [0.3496,0.53392) [0.53392,0.67216) [0.67216,0.71824) [0.71824,0.8104)
[0.71824,0.8104)    -      100     0
[0.43648,0.6208)    -      -       1
[0.37296,0.7416)    -      -       2
[0.24592,0.9832)    -      0111    -

The final code will be 001111000111, read off the output column of the table.
