Lec 5: Data Compression (Part 3)

CS258: Information Theory

Fan Cheng
Shanghai Jiao Tong University

http://www.cs.sjtu.edu.cn/~chengfan/
[email protected]
Spring, 2023
Outline

• Kraft inequality
• Optimal codes
• Huffman coding
• Shannon-Fano-Elias coding
• Generation of discrete distribution
• Universal source coding
Random Variable Generation
• We are given a sequence of fair coin tosses and we wish to generate a random variable X with probability mass function p(x).
• Let the random variable T denote the number of coin flips used in the algorithm.

Heads vs. Tails
• Generate a random variable according to the outcome of fair coin flips: HHHH, TTTTT, HTHTHT, THTHTH, ...
• If X takes values a, b, c with probabilities 1/2, 1/4, 1/4, map
    H:  X = a
    TH: X = b
    TT: X = c
• How many fair coin flips are needed to generate X?

The entropy of X: H(X) = (1/2)·1 + (1/4)·2 + (1/4)·2 = 1.5 bits
The expected number of coin flips: E[T] = 1·(1/2) + 2·(1/4) + 2·(1/4) = 1.5
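Below is a minimal simulation sketch (added for illustration, not from the slides) of the H/TH/TT mapping above; it draws X many times, estimates E[T] empirically, and compares it with H(X) = 1.5 bits.

```python
import random
from math import log2

# Added sketch: simulate the H/TH/TT scheme, which generates X ~ (1/2, 1/4, 1/4)
# from fair coin flips, and compare the average number of flips with H(X).

def generate_x():
    """Return (symbol, number of flips used) for one draw of X."""
    first = random.choice("HT")
    if first == "H":
        return "a", 1                                   # H  -> a (probability 1/2, 1 flip)
    second = random.choice("HT")
    return ("b", 2) if second == "H" else ("c", 2)      # TH -> b, TT -> c (2 flips each)

n = 100_000
flips = 0
counts = {"a": 0, "b": 0, "c": 0}
for _ in range(n):
    x, t = generate_x()
    counts[x] += 1
    flips += t

entropy = -(0.5 * log2(0.5) + 0.25 * log2(0.25) + 0.25 * log2(0.25))
print("empirical distribution:", {k: v / n for k, v in counts.items()})
print("average flips E[T] ≈", flips / n, " vs  H(X) =", entropy)
```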


Random Variable Generation
Representation of a generation algorithm
• We can describe the algorithm mapping strings of fair bits to possible outcomes X by a binary tree.
• The leaves of the tree are marked by output symbols, and the path to each leaf is given by the sequence of bits produced by the fair coin.

The tree representing the algorithm must satisfy certain properties:
• The tree should be complete (i.e., every node is either a leaf or has two descendants in the tree). The tree may be infinite, as we will see in some examples.
• The probability of a leaf at depth k is 2^{-k}. Many leaves may be labeled with the same output symbol; the total probability of all these leaves should equal the desired probability of the output symbol.
• The expected number of fair bits required to generate X is equal to the expected depth of this tree.

(Figure: tree for generating the distribution. Intuition: each coin toss generates one fair bit.)
Random Variable Generation
Let 𝒴 denote the set of leaves of a complete tree. Consider a distribution on the leaves such that the probability of a leaf at depth k on the tree is 2^{-k}. Let Y be a random variable with this distribution.
(Lemma) For any complete tree, consider a probability distribution on the leaves such that the probability of a leaf at depth k is 2^{-k}. Then the expected depth of the tree is equal to the entropy of this distribution: E[T] = H(Y).
• The expected depth of the tree is
    E[T] = Σ_{y∈𝒴} k(y) 2^{-k(y)}
• The entropy of the distribution of Y is
    H(Y) = −Σ_{y∈𝒴} 2^{-k(y)} log 2^{-k(y)} = Σ_{y∈𝒴} k(y) 2^{-k(y)}
  where k(y) denotes the depth of leaf y. Thus,
    H(Y) = E[T]
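As a quick added check of the lemma (not from the slides), take the complete tree with leaves at depths 1, 2, 2, i.e., the tree that generates the (1/2, 1/4, 1/4) distribution above:

```latex
% Added check: expected depth vs. leaf entropy for a tree with leaves at depths 1, 2, 2
\[
  E[T] = 1\cdot\tfrac{1}{2} + 2\cdot\tfrac{1}{4} + 2\cdot\tfrac{1}{4} = 1.5,
  \qquad
  H(Y) = -\tfrac{1}{2}\log_2\tfrac{1}{2} - \tfrac{1}{4}\log_2\tfrac{1}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} = 1.5 ,
\]
```

so indeed E[T] = H(Y).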
Random Variable Generation
(Theorem) For any algorithm generating X, the expected number of fair bits used is at least the entropy H(X), that is,
    E[T] ≥ H(X)
• Any algorithm generating X from fair bits can be represented by a complete binary tree. Label all the leaves of this tree by distinct symbols y ∈ 𝒴 = {1, 2, ...}. If the tree is infinite, the alphabet 𝒴 is also infinite.
• Now consider the random variable Y defined on the leaves of the tree, such that for any leaf y at depth k, the probability that Y = y is 2^{-k}. By the lemma, the expected depth of this tree is equal to the entropy of Y:
    E[T] = H(Y)
• Now the random variable X is a function of Y (one or more leaves map onto an output symbol), and hence H(X) ≤ H(Y), so we have
    E[T] ≥ H(X)
Random Variable Generation
(Theorem) Let the random variable X have a dyadic distribution. The optimal algorithm to generate X from fair coin flips requires an expected number of coin tosses precisely equal to the entropy:
    E[T] = H(X)
• For the constructive part, we use the Huffman code tree for X as the tree to generate the random variable. Each x will correspond to a leaf.
• For a dyadic distribution, the Huffman code is the same as the Shannon code and achieves the entropy bound.
• For any x, the depth of the leaf in the code tree corresponding to x is the length of the corresponding codeword, which is log(1/p(x)). Hence, when this code tree is used to generate X, the leaf x will have probability 2^{-log(1/p(x))} = p(x).
• The expected number of coin flips is the expected depth of the tree, which is equal to the entropy (because the distribution is dyadic). Hence, for a dyadic distribution, the optimal generating algorithm achieves E[T] = H(X).
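A small added numerical check, assuming the dyadic example distribution (1/2, 1/4, 1/8, 1/8): the codeword lengths −log2 p(x) are integers, and using the code tree as the generating tree gives an expected depth equal to H(X).

```python
from math import log2

# Added check (assumed dyadic distribution): Shannon/Huffman codeword lengths
# -log2 p(x) are integers, and the expected leaf depth equals the entropy.
p = {"a": 1/2, "b": 1/4, "c": 1/8, "d": 1/8}

depths = {x: int(-log2(px)) for x, px in p.items()}      # leaf depths = codeword lengths
expected_depth = sum(px * depths[x] for x, px in p.items())
entropy = -sum(px * log2(px) for px in p.values())

print("leaf depths:", depths)                             # {'a': 1, 'b': 2, 'c': 3, 'd': 3}
print("E[T] =", expected_depth, "  H(X) =", entropy)      # both 1.75
```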
Random Variable Generation
• What if the distribution is not dyadic? In this case we cannot use the same idea, since the code tree for the Huffman code would generate a dyadic distribution on the leaves, not the distribution with which we started.
• Since all the leaves of the tree have probabilities of the form 2^{-k}, it follows that we should split any probability p(x) that is not of this form into atoms of this form. We can then allot these atoms to leaves on the tree.
• Find the binary expansions of the probabilities p(x). Let the binary expansion of the probability p(x) be
    p(x) = Σ_{j≥1} p_j^{(x)},  where p_j^{(x)} = 2^{-j} or 0.
  Then the atoms of the expansion are the nonzero terms p_j^{(x)}.
• Since Σ_x p(x) = 1, the sum of the probabilities of these atoms is 1. We will allot an atom of probability 2^{-j} to a leaf at depth j on the tree.
• The depths j of the atoms satisfy the Kraft inequality, so we can always construct such a tree with all the atoms at the right depths.
Random Variable Generation
Example: let X have a non-dyadic distribution.
• We find the binary expansions of these probabilities.
• The atoms of the expansion are the resulting dyadic terms; each atom of probability 2^{-j} is allotted to a leaf at depth j.
(Figure: tree to generate the distribution.)
• This procedure yields a tree that generates the random variable X. We have argued that this procedure is optimal (it gives a tree of minimum expected depth).
• (Theorem) The expected number of fair bits E[T] required by the optimal algorithm to generate a random variable X lies between H(X) and H(X) + 2:
    H(X) ≤ E[T] < H(X) + 2
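The sketch below (added, not the slides' material) carries out the atom construction for an assumed non-dyadic example distribution (2/3, 1/3): it splits each probability into dyadic atoms via its truncated binary expansion, checks the Kraft inequality, and verifies H(X) ≤ E[T] < H(X) + 2.

```python
from math import log2

# Added sketch: split each probability into dyadic "atoms" via its binary
# expansion, check the Kraft inequality, and compute the expected depth of the
# resulting generating tree. The distribution (2/3, 1/3) is an assumed example;
# the expansions are truncated at a finite depth.
p = {"a": 2/3, "b": 1/3}
MAX_DEPTH = 40

def atoms(prob, max_depth=MAX_DEPTH):
    """Return the depths j of the dyadic atoms 2^{-j} in the binary expansion of prob."""
    depths, rest = [], prob
    for j in range(1, max_depth + 1):
        if rest >= 2**-j:
            depths.append(j)
            rest -= 2**-j
    return depths

all_depths = [j for prob in p.values() for j in atoms(prob)]
kraft = sum(2**-j for j in all_depths)
expected_depth = sum(j * 2**-j for j in all_depths)
entropy = -sum(px * log2(px) for px in p.values())

print("Kraft sum   :", kraft)                     # ≈ 1, so a complete tree exists
print("E[T]        :", expected_depth)            # ≈ 2 for (2/3, 1/3)
print("H(X), H(X)+2:", entropy, entropy + 2)      # bound H(X) <= E[T] < H(X) + 2
```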
Universal Source Coding
Challenge: For many practical situations, however, the probability distribution underlying the source may be unknown.
• One possible approach is to wait until we have seen all the data, estimate the distribution from the data, use this distribution to construct the best code, and then go back to the beginning and compress the data using this code.
• This two-pass procedure is used in some applications where there is a fairly small amount of data to be compressed.
• In yet other cases, there is no probability distribution underlying the data; all we are given is an individual sequence of outcomes. How well can we compress the sequence?
• If we do not put any restrictions on the class of algorithms, we get a meaningless answer: there always exists a function that compresses a particular sequence to one bit while leaving every other sequence uncompressed. This function is clearly "overfitted" to the data.
• Assume we have a random variable X drawn according to a distribution p_θ from the family {p_θ}, where the parameter θ is unknown.
• We wish to find an efficient code for this source.
Minimax Redundancy
• If we know θ, we can construct a code with codeword length l(x) = log(1/p_θ(x)).
• What happens if we do not know the true distribution p_θ, yet wish to code as efficiently as possible? In this case, using a code with codeword lengths l(x) and implied probability q(x) = 2^{-l(x)}, we define the redundancy of the code as the difference between the expected length of the code and the lower limit for the expected length:
    R(p_θ, q) = Σ_x p_θ(x) [ l(x) − log(1/p_θ(x)) ] = Σ_x p_θ(x) log( p_θ(x)/q(x) ) = D(p_θ ∥ q)
• We wish to find a code that does well irrespective of the true distribution p_θ, and thus we define the minimax redundancy as
    R* = min_q max_θ R(p_θ, q) = min_q max_θ D(p_θ ∥ q)
Redundancy and Capacity
How to compute R*: take p_θ(x) as a transition matrix
    θ → p_θ(x) → X
This is a channel {θ, p_θ(x), X}. The capacity of this channel is given by
    C = max_{π(θ)} I(θ; X) = max_π Σ_θ Σ_x π(θ) p_θ(x) log( p_θ(x) / q_π(x) )
where
    q_π(x) = Σ_θ π(θ) p_θ(x)
(Theorem) The minimax redundancy equals the capacity of the channel whose transition matrix has rows p_θ(·):
    R* = C
Channel capacity is well understood.
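A small added numerical sketch, with an assumed two-member family of binary distributions: it estimates R* = min_q max_θ D(p_θ ∥ q) and the capacity C = max_π I(θ; X) by brute-force grid search, and the two agree (up to grid resolution) as the theorem predicts.

```python
import numpy as np

# Added sketch (assumed two-member family of binary distributions): compare the
# minimax redundancy R* = min_q max_theta D(p_theta || q) with the capacity
# C = max_pi I(theta; X) of the channel whose transition matrix has rows p_theta.
P = np.array([[0.9, 0.1],      # p_1(x)
              [0.3, 0.7]])     # p_2(x)

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits (all entries positive here)."""
    return np.sum(p * np.log2(p / q))

grid = np.linspace(1e-6, 1 - 1e-6, 10001)

# Minimax redundancy: search over coding distributions q = (t, 1 - t)
R_star = min(max(kl(row, np.array([t, 1 - t])) for row in P) for t in grid)

# Capacity: maximize I(theta; X) over priors pi = (s, 1 - s)
def mutual_info(s):
    pi = np.array([s, 1 - s])
    q = pi @ P                                    # output distribution q_pi(x)
    return sum(pi[i] * kl(P[i], q) for i in range(2))

C = max(mutual_info(s) for s in grid)

print(f"minimax redundancy R* ≈ {R_star:.4f} bits")
print(f"channel capacity   C  ≈ {C:.4f} bits")    # equal up to grid error
```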
Shannon-Fano-Elias → Arithmetic Coding
Shannon-Fano-Elias coding:
• Motivation: using intervals to represent symbols.

Consider a random variable with a ternary alphabet {A, B, C}, with probabilities 0.4, 0.4, and 0.2, respectively.
Let the sequence to be encoded be ACAA:
• A → [0, 0.4)
• AC → [0.32, 0.4) (scale C's interval [0.8, 1.0) into [0, 0.4), i.e., by ratio 0.4)
• ACA → [0.32, 0.352)
• ACAA → [0.32, 0.3328)
• The procedure is incremental and can be used for any blocklength.
• Coding by intervals: a new insight.
"When the train was first invented, it was slower than a horse-drawn carriage."
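The following added sketch reproduces the interval refinement above for the sequence ACAA (symbols ordered A, B, C with probabilities 0.4, 0.4, 0.2).

```python
# Added sketch: interval refinement for the sequence "ACAA" with
# p(A) = 0.4, p(B) = 0.4, p(C) = 0.2 (symbols ordered A, B, C).
probs = {"A": 0.4, "B": 0.4, "C": 0.2}

# Cumulative lower bounds: A -> 0.0, B -> 0.4, C -> 0.8
cum, total = {}, 0.0
for sym, p in probs.items():
    cum[sym] = total
    total += p

def encode_interval(sequence):
    """Return the final [low, high) interval for the given symbol sequence."""
    low, width = 0.0, 1.0
    for sym in sequence:
        low = low + width * cum[sym]       # shift into the symbol's subinterval
        width = width * probs[sym]         # shrink by the symbol's probability
        print(f"after {sym!r}: [{low:.4f}, {low + width:.4f})")
    return low, low + width

encode_interval("ACAA")   # ends at [0.3200, 0.3328), matching the slide
```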
Lempel-Ziv Coding: Introduction
• The use of dictionaries for compression dates back to the invention of the telegraph.
    "25: Merry Christmas"
    "26: May Heaven's choicest blessings be showered on the newly married couple."
• The idea of adaptive dictionary-based schemes was not explored until Ziv and Lempel wrote their papers in 1977 and 1978. The two papers describe two distinct versions of the algorithm. We refer to these versions as LZ77 or sliding window Lempel-Ziv and LZ78 or tree-structured Lempel-Ziv.

Abraham Lempel, Yaakov Ziv

Used in gzip, pkzip, compress in Unix, GIF.
Lempel-Ziv Coding: Sliding Window
The key idea of the Lempel-Ziv algorithm is to parse the string into phrases and to replace phrases by pointers to where the same string has occurred in the past.

Sliding Window Lempel-Ziv Algorithm
• We assume that we have a string x_1, x_2, ... to be compressed from a finite alphabet. A parsing S of a string x_1 x_2 ... x_n is a division of the string into phrases, separated by commas. Let W be the length of the window.
• Assume that we have compressed the string until time i − 1. Then to find the next phrase, find the largest k such that for some j with i − W ≤ j ≤ i − 1, the string of length k starting at x_j is equal to the string of length k starting at x_i (i.e., x_{j+l} = x_{i+l} for all 0 ≤ l < k). The next phrase is then of length k (i.e., x_i ... x_{i+k−1}) and is represented by the pair (P, L), where P is the location of the beginning of the match and L is the length of the match.
• If a match is not found in the window, the next character is sent uncompressed.

Example: 0101010101010101011010101010101101, W = 7
(On the slide, the current window and the maximum repeated substring inside it are highlighted.)
Lempel-Ziv Coding: Sliding Window
Example: 0101010101010101011010101010101101, W = 6
(Again, the window and the maximum repeated substring inside it are highlighted on the slide.)

Example: parse the string ABBABBABBBAABABA step by step:
ABBABBABBBAABABA
A BBABBABBBAABABA
A, B BABBABBBAABABA

A, B, B ABBABBBAABABA

A, B, B, ABBABB BAABABA

A, B, B, ABBABB, BA ABABA

A, B, B, ABBABB, BA, A BABA

A, B, B, ABBABB, BA, A, BA BA

A, B, B, ABBABB, BA, A, BA, BA
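An added sketch of a sliding-window LZ77-style parser; the window length 4 used in the call below is an assumption, chosen because it reproduces the parse shown above.

```python
# Added sketch (not the slides' code): a simple sliding-window LZ77-style parser.
# Each phrase is either a (position, length) pointer to a match starting in the
# last `window` characters (the match may run past the current position), or a
# literal character when no match is found in the window.
def lz77_parse(s, window):
    i, tokens, phrases = 0, [], []
    while i < len(s):
        best_len, best_pos = 0, None
        for j in range(max(0, i - window), i):        # candidate match starts
            k = 0
            while i + k < len(s) and s[j + k] == s[i + k]:
                k += 1
            if k > best_len:
                best_len, best_pos = k, j
        if best_len > 0:
            tokens.append((best_pos, best_len))       # pointer (P, L)
            phrases.append(s[i:i + best_len])
            i += best_len
        else:
            tokens.append(s[i])                       # literal character
            phrases.append(s[i])
            i += 1
    return tokens, phrases

tokens, phrases = lz77_parse("ABBABBABBBAABABA", window=4)
print(phrases)   # ['A', 'B', 'B', 'ABBABB', 'BA', 'A', 'BA', 'BA']
print(tokens)
```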


Lempel-Ziv Coding: Tree-Structured
 In the 1978 paper, Ziv and Lempel described an algorithm that parses
a string into phrases, where each phrase is the shortest phrase
not seen earlier.
 This algorithm can be viewed as building a dictionary in the form of a
tree, where the nodes correspond to phrases seen so far.
 Find a string in a set of strings: Trie
ABBABBABBBAABABAA
A BBABBABBBAABABAA
A, B BABBABBBAABABAA

A, B, BA BBABBBAABABAA

A, B, BA, BB ABBBAABABAA

A, B, BA, BB, AB BBAABABAA

A, B, BA, BB, AB, BBA ABABAA

A, B, BA, BB, AB, BBA, ABA BAA

A, B, BA, BB, AB, BBA, ABA, BAA
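An added sketch of LZ78-style parsing, where each new phrase is the shortest prefix of the remaining string not seen before; it reproduces the parse above.

```python
# Added sketch: LZ78-style parsing into phrases, where each new phrase is the
# shortest prefix of the remaining string that has not been seen before.
# The set of previously seen phrases plays the role of the trie/dictionary.
def lz78_parse(s):
    seen, phrases = set(), []
    i = 0
    while i < len(s):
        k = 1
        # extend the candidate phrase until it is new (or the string ends)
        while i + k <= len(s) and s[i:i + k] in seen:
            k += 1
        phrase = s[i:i + k]
        phrases.append(phrase)
        seen.add(phrase)
        i += k
    return phrases

print(lz78_parse("ABBABBABBBAABABAA"))
# ['A', 'B', 'BA', 'BB', 'AB', 'BBA', 'ABA', 'BAA']
```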
Optimality of LZ77, LZ78
Ref. Ch. 13.5 T. Cover
Summary
Cover: 5.11, 13.1, 13.3, 13.4, 13.5
