Compression and Coding Algorithms
by
Alistair Moffat
The University of Melbourne, Australia
and
Andrew Turpin
Curtin University of Technology, Australia
Preface

2 Fundamental Limits
2.1 Information content
2.2 Kraft inequality
2.3 Human compression
2.4 Mechanical compression systems

3 Static Codes
3.1 Unary and binary codes
3.2 Elias codes
3.3 Golomb and Rice codes
3.4 Interpolative coding
3.5 Making a choice

4 Minimum-Redundancy Coding
4.1 Shannon-Fano codes
4.2 Huffman coding
4.3 Canonical codes
4.4 Other decoding methods
4.5 Implementing Huffman's algorithm
4.6 Natural probability distributions
4.7 Artificial probability distributions
4.8 Doing the housekeeping chores
4.9 Related material
5 Arithmetic Coding
5.1 Origins of arithmetic coding
5.2 Overview of arithmetic coding
5.3 Implementation of arithmetic coding
5.4 Variations
5.5 Binary arithmetic coding
5.6 Approximate arithmetic coding
5.7 Table-driven arithmetic coding
5.8 Related material

References
Index
Preface
None of us is comfortable with paying more for a service than the minimum
we believe it should cost. It seems wantonly wasteful, for example, to pay $
for a loaf of bread that we know should only cost $ , or $ more than the
sticker price of a car. And the same is true for communications costs – which
of us has not received our monthly phone bill and gone “ouch”? Common to
these cases is that we are not especially interested in reducing the amount of
product or service that we receive. We do want to purchase the loaf of bread
or the car, not half a loaf or a motorbike; and we want to make the phone calls
recorded on our bill. But we also want to pay as little as possible for the desired
level of service, to somehow get the maximal “bang for our buck”.
That is what this book is about – figuring out how to minimize the “buck”
cost of obtaining a certain amount of “bang”. The “bang” we are talking about
is the transmission of messages, just as in the case of a phone bill; and the
“buck” we seek to minimize is the dollar cost of sending that information. This
is the process of data compression; of seeking the most economical represen-
tation possible for a source message. The only simplification we make when
discussing compression methods is to suppose that bytes of storage or commu-
nications capacity and bucks of money are related, and that if we can reduce
the number of bytes of data transmitted, then the number of bucks spent will
be similarly minimal.
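As a small, purely illustrative sketch of this "fewer bytes, fewer bucks"
trade-off (ours, not drawn from the book; the message and the codeword
lengths are invented for the example), the following Python fragment counts
the bits needed to store one short string under a fixed-length code and
under a frequency-aware variable-length code of the kind developed in the
chapters that follow:

    # Illustrative sketch only: the same message stored under a
    # fixed-length code and under a hand-picked variable-length code that
    # gives shorter codewords to more frequent symbols. The message and
    # the codeword lengths are assumptions chosen to keep the arithmetic
    # easy to follow.
    from collections import Counter
    import math

    message = "abracadabra"
    freqs = Counter(message)   # {'a': 5, 'b': 2, 'r': 2, 'c': 1, 'd': 1}

    # Fixed-length code: every symbol costs ceil(log2(alphabet size)) bits.
    fixed_bits = math.ceil(math.log2(len(freqs))) * len(message)

    # Variable-length code: lengths assigned by hand for this example.
    lengths = {"a": 1, "b": 3, "r": 3, "c": 3, "d": 3}
    variable_bits = sum(lengths[s] * n for s, n in freqs.items())

    print(fixed_bits)      # 33 bits for the 11 symbols
    print(variable_bits)   # 23 bits for the same 11 symbols

The hand-picked lengths {1, 3, 3, 3, 3} satisfy the Kraft inequality of
Section 2.2 (2^-1 + 4 x 2^-3 = 1), so a prefix code with those lengths
really does exist; Chapter 4 shows how to compute the best possible set of
codeword lengths rather than guessing them.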
Data compression has emerged as an important enabling technology in a
wide variety of communications and storage applications, ranging from “disk
doubling” operating systems that provide extra storage space; to the facsim-
ile standards that facilitate the flow of business information; and to the high-
definition video and audio standards that allow maximal use to be made of
scarce satellite transmission bandwidth. Much has been written about data
compression – indeed, we can immediately recommend two excellent books,
only one of which involves either of us as an author [Bell et al., 1990, Witten
et al., 1999] – and as a research area data compression is relatively mature.
As a consequence of that maturity, it is now widely agreed that
compression arises from the conjunction of two quite distinct activities,
modeling and coding.
Acknowledgements
One of the nice things about writing a book is getting to name names without
fear of being somehow unacademic or too personal. Here are some names,
people who in some way or another contributed to the existence of this work.
Research collaborators come first. There are many, as it has been our good
fortune to enjoy the friendship and assistance of a number of talented and gen-
erous people. Ian Witten has provided enthusiasm and encouragement over
more years than are worth counting, and lent a strategic nudge to this project at
a delicate moment. Lang Stuiver devoted considerable energy to his investiga-
tion of arithmetic coding, and much of Chapter 5 is a result of his efforts. Lang
also contributed to the interpolative coding mechanism described in Chapter 3.
Justin Zobel has been an accomplice for many years, and has contributed to
this book by virtue of his own interests [Zobel, 1997]. Others that we have
enjoyed interacting with include Abe Bookstein, Bill Teahan, Craig Nevill-
Manning, Darryl Lovato, Glen Langdon, Hugh Williams, Jeff Vitter, Jesper
Larsson, Jim Storer, John Cleary, Julien Seward, Jyrki Katajainen, Mahesh
Naik, Marty Cohn, Michael Schindler, Neil Sharman, Paul Howard, Peter Fen-
wick, Radford Neal, Suzanne Bunton, Tim C. Bell, and Tomi Klein. We have
also benefited from the research work undertaken by a very wide range of other
people. To those we have not mentioned explicitly by name – thank you.
Mike Liddell, Raymond Wan, Tim A.H. Bell, and Yugo Kartono Isal un-
dertook proofreading duties with enthusiasm and care. Many other past and
present students at the University of Melbourne have also contributed: Alwin
Ngai, Andrew Bishop, Gary Eddy, Glen Gibb, Mike Ciavarella, Linh Huynh,
Owen de Kretser, Peter Gill, Tetra Lindarto, Trefor Morgan, Tony Wirth, Vo
Ngoc Anh, and Wayne Salamonsen. We also thank the Australian Research
Council, for their funding of the various projects we have been involved in;
our two Departments, who have provided environments in which projects such
as this are feasible; Kluwer, who took it out of our hands and into yours; and
Gordon Kraft, who provided useful information about his father.
Family come last in this list, but first where it counts. Aidan, Allison, Anne,
Finlay, Kate, and Thau Mee care relatively little for compression, coding, and
algorithms, but they know something far more precious – how to take us away
from our keyboards and help us enjoy the other fun things in the world. It is
because of their influence that we plant our tongues in our cheeks and suggest
that you, the reader, take a minute now to look out your window. Surely there
is a nice leafy spot outside somewhere for you to do your reading?
Alistair Moffat, Melbourne, Australia
Andrew Turpin, Perth, Australia
January 2002