
Information and Coding Theory

in Computer Science
INFORMATION AND CODING
THEORY IN COMPUTER SCIENCE

Edited by:
Zoran Gacovski

Arcler Press

www.arclerpress.com
Information and Coding Theory in Computer Science
Zoran Gacovski

Arcler Press
224 Shoreacres Road
Burlington, ON L7L 2H2
Canada
www.arclerpress.com
Email: [email protected]

e-book Edition 2023

ISBN: 978-1-77469-610-1 (e-book)

This book contains information obtained from highly regarded resources. Reprinted
material sources are indicated. Copyright for individual articles remains with the au-
thors as indicated and published under Creative Commons License. A wide variety of
references are listed. Reasonable efforts have been made to publish reliable data and
views articulated in the chapters are those of the individual contributors, and not neces-
sarily those of the editors or publishers. Editors or publishers are not responsible for
the accuracy of the information in the published chapters or consequences of their use.
The publisher assumes no responsibility for any damage or grievance to the persons or
property arising out of the use of any materials, instructions, methods or thoughts in the
book. The editors and the publisher have attempted to trace the copyright holders of all
material reproduced in this publication and apologize to copyright holders if permission
has not been obtained. If any copyright holder has not been acknowledged, please write
to us so that we may rectify the omission.

Notice: Registered trademark of products or corporate names are used only for explana-
tion and identification without intent of infringement.

© 2023 Arcler Press

ISBN: 978-1-77469-446-6 (Hardcover)

Arcler Press publishes a wide variety of books and eBooks. For more information about
Arcler Press and its products, visit our website at www.arclerpress.com
DECLARATION

Some content or chapters in this book are open-access, copyright-free published
research works, which are published under a Creative Commons License and are
indicated with citations. We are thankful to the publishers and authors of the
content and chapters, as without them this book would not have been possible.
ABOUT THE EDITOR

Dr. Zoran Gacovski is a full professor at the Faculty of Technical Sciences,
“Mother Tereza” University, Skopje, Macedonia. His teaching subjects include
Software Engineering and Intelligent Systems, and his areas of research are
information systems, intelligent control, machine learning, graphical models (Petri,
neural and Bayesian networks), and human-computer interaction. Prof. Gacovski
earned his PhD degree at the Faculty of Electrical Engineering, UKIM, Skopje. In his
career he was awarded a Fulbright postdoctoral fellowship (2002) for a research stay at
Rutgers University, USA. He also earned the best-paper award at the Baltic Olympiad
for Automation Control (2002), a US NSF grant for conducting research in the
field of human-computer interaction at Rutgers University, USA (2003), and DAAD
grants for research stays at the University of Bremen, Germany (2008 and 2012). The projects
in which he has taken an active part include: “A multimodal human-computer interaction and
modelling of the user behaviour” (for Rutgers University, 2002-2003), sponsored by
the US Army and Ford; “Development and implementation of algorithms for guidance,
navigation and control of mobile objects” (for the Military Academy, Skopje, 1999-2002);
and “Analytical and non-analytical intelligent systems for deciding and control of uncertain
complex processes” (for the Macedonian Ministry of Science, 1995-1998). He is the author
of three books (including the international edition “Mobile Robots”), 20 journal papers, over 40
conference papers, and he is also a reviewer/editor for IEEE journals and conferences.
TABLE OF CONTENTS

List of Contributors........................................................................................xv
List of Abbreviations..................................................................................... xxi
Preface................................................................................................... ....xxiii

Section 1: Information Theory Methods and Approaches

Chapter 1 Information Theory of Cognitive Radio System.......................................... 3


Introduction................................................................................................ 3
Cognitive Radio Network Paradigms........................................................... 4
Interference-Mitigating Cognitive Behavior: The Cognitive
Radio Channel.................................................................................. 8
Interference Avoiding Channel.................................................................. 13
Collaborative Cognitive Channel.............................................................. 15
Comparisons............................................................................................. 17
References................................................................................................ 19

Chapter 2 Information Theory and Entropies for Quantized Optical


Waves in Complex Time-Varying Media.................................................. 23
Introduction.............................................................................................. 23
Quantum Optical Waves in Time-Varying Media...................................... 25
Information Measures for Thermalized Quantum Optical Fields................ 29
Husimi Uncertainties and Uncertainty Relations....................................... 34
Entropies and Entropic Uncertainty Relations............................................ 35
Application to a Special System................................................................ 37
Summary and Conclusion......................................................................... 38
References................................................................................................ 42
Chapter 3 Some Inequalities in Information Theory Using Tsallis Entropy............... 45
Abstract.................................................................................................... 45
Introduction.............................................................................................. 46
Formulation of the Problem...................................................................... 47
Mean Codeword Length and its Bounds.................................................... 49
Conclusion............................................................................................... 53
References................................................................................................ 54

Chapter 4 The Computational Theory of Intelligence: Information Entropy............. 57


Abstract.................................................................................................... 57
Introduction.............................................................................................. 58
Entropy..................................................................................................... 59
Intelligence: Definition and Assumptions.................................................. 61
Global Effects........................................................................................... 65
Applications............................................................................................. 66
Related Works.......................................................................................... 69
Conclusions.............................................................................................. 70
References................................................................................................ 72

Section 2: Block and Stream Coding

Chapter 5 Block-Split Array Coding Algorithm for Long-Stream Data


Compression............................................................................................ 75
Abstract.................................................................................................... 75
Introduction.............................................................................................. 76
Problems of Long-Stream Data Compression for Sensors........................... 78
CZ-Array Coding...................................................................................... 85
Analyses of CZ-Array Algorithm................................................................ 96
Experiment Results.................................................................................. 102
Conclusions............................................................................................ 110
Acknowledgments.................................................................................. 111
References.............................................................................................. 112

Chapter 6 Bit-Error Aware Lossless Image Compression with


2D-Layer-Block Coding.......................................................................... 115
Abstract.................................................................................................. 115
Introduction............................................................................................ 116

Related Work on Lossless Compression.................................................. 119
Our Proposed Method............................................................................ 121
Experiments............................................................................................ 129
Conclusions............................................................................................ 142
Acknowledgments.................................................................................. 143
References.............................................................................................. 144

Chapter 7 Beam Pattern Scanning (BPS) versus Space-Time Block


Coding (STBC) and Space-Time Trellis Coding (STTC)........................... 149
Abstract.................................................................................................. 149
Introduction............................................................................................ 150
Introduction of STBC, STTC and BPS Techniques.................................... 152
BPS versus STBC, STTC........................................................................... 159
Simulations............................................................................................. 161
Conclusions............................................................................................ 170
References.............................................................................................. 171

Chapter 8 Partial Feedback Based Orthogonal Space-Time Block


Coding With Flexible Feedback Bits....................................................... 175
Abstract.................................................................................................. 175
Introduction............................................................................................ 176
Proposed Code Construction and System Model..................................... 177
Linear Decoder at the Receiver............................................................... 178
Feedback Bits Selection and Properties................................................... 179
Simulation Results.................................................................................. 183
Conclusions............................................................................................ 185
References.............................................................................................. 186

Chapter 9 Rateless Space-Time Block Codes for 5G Wireless


Communication Systems........................................................................ 187
Abstract.................................................................................................. 187
Introduction............................................................................................ 188
Concept of Rateless Codes...................................................................... 189
Rateless Coding and Hybrid Automatic Retransmission Query................ 191
Rateless Codes’ Literature Review........................................................... 192
Rateless Codes Applications................................................................... 196

Motivation to Rateless Space-Time Coding.............................................. 197
Rateless Space-Time Block Code for Massive MIMO Systems................. 197
Conclusion............................................................................................. 202
References.............................................................................................. 204

Section 3: Lossless Data Compression

Chapter 10 Lossless Image Compression Technique Using


Combination Methods............................................................................ 211
Abstract.................................................................................................. 211
Introduction............................................................................................ 212
Literature Review.................................................................................... 214
The Proposed Method............................................................................. 216
Conclusions............................................................................................ 229
Future Work............................................................................................ 230
References.............................................................................................. 231

Chapter 11 New Results in Perceptually Lossless Compression of


Hyperspectral Images............................................................................. 233
Abstract.................................................................................................. 233
Introduction............................................................................................ 234
Data and Approach................................................................................. 236
Experimental Results............................................................................... 242
Conclusion............................................................................................. 264
Acknowledgements................................................................................ 265
References.............................................................................................. 266

Chapter 12 Lossless compression of digital mammography using base


switching method................................................................................... 269
Abstract.................................................................................................. 269
Introduction............................................................................................ 270
Base-Switching Algorithm....................................................................... 274
Proposed Method................................................................................... 277
Results.................................................................................................... 284
Conclusions............................................................................................ 285
References.............................................................................................. 286

Chapter 13 Lossless Image Compression Based
on Multiple-Tables Arithmetic Coding................................................... 289
Abstract.................................................................................................. 290
Introduction............................................................................................ 290
The MTAC Method................................................................................. 291
Experiments............................................................................................ 300
Conclusions............................................................................................ 304
References.............................................................................................. 305

Section 4: Information and Shannon Entropy

Chapter 14 Entropy—A Universal Concept in Sciences............................................ 309


Abstract.................................................................................................. 309
Introduction............................................................................................ 310
Entropy as a Qualificator of the Configurational Order........................... 314
The Concept of Entropy in Thermodynamics and Statistical Physics........ 315
The Shannon-Like Entropies.................................................................... 318
Conclusions............................................................................................ 323
Appendix................................................................................................ 324
Notes ..................................................................................................... 328
References.............................................................................................. 331

Chapter 15 Shannon entropy: Axiomatic Characterization and Application............. 333


Introduction............................................................................................ 334
Shannon Entropy: Axiomatic Characterization........................................ 334
Total Shannon Entropy and Entropy of Continuous Distribution.............. 337
Application: Differential Entropy and Entropy in Classical Statistics........ 339
Conclusion............................................................................................. 341
References.............................................................................................. 343

Chapter 16 Shannon Entropy in Distributed Scientific Calculations on


Mobiles Ad-Hoc Networks (MANETs).................................................... 345
Abstract.................................................................................................. 345
Introduction............................................................................................ 346
Measuring the Problem........................................................................... 346
Simulation.............................................................................................. 352

Conclusion............................................................................................. 358
References.............................................................................................. 360

Chapter 17 Advancing Shannon Entropy for Measuring Diversity in Systems........... 361


Abstract.................................................................................................. 361
Introduction............................................................................................ 362
Renormalizing Probability: Case-Based Entropy and the
Distribution of Diversity................................................................ 365
Case-Based Entropy of a Continuous Random Variable........................... 367
Results.................................................................................................... 369
Using 𝐶𝑐 to Compare and Contrast Systems............................................. 377
Conclusion............................................................................................. 379
Acknowledgments.................................................................................. 380
References.............................................................................................. 381

Index...................................................................................................... 383

LIST OF CONTRIBUTORS

F. G. Awan
University of Engineering and Technology Lahore, 54890 Pakistan

N. M. Sheikh
University of Engineering and Technology Lahore, 54890 Pakistan

M. F. Hanif
University of the Punjab Quaid-e-Azam Campus 54590, Lahore Pakistan

Jeong Ryeol Choi


Department of Radiologic Technology, Daegu Health College, Yeongsong-ro 15, Buk-
gu, Daegu 702-722, Republic of Korea

Litegebe Wondie
Department of Mathematics, College of Natural and Computational Science, University
of Gondar, Gondar, Ethiopia

Satish Kumar
Department of Mathematics, College of Natural and Computational Science, University
of Gondar, Gondar, Ethiopia

Daniel Kovach
Kovach Technologies, San Jose, CA, USA

Qin Jiancheng
School of Electronic and Information Engineering, South China University of
Technology, Guangdong, China

Lu Yiqin
School of Electronic and Information Engineering, South China University of
Technology, Guangdong, China

Zhong Yu
Zhaoqing Branch, China Telecom Co., Ltd., Guangdong, China
School of Software, South China University of Technology, Guangdong, China
Jungan Chen
Department of Electronic and Computer Science, Zhejiang Wanli University, Ningbo,
China

Jean Jiang
College of Technology, Purdue University Northwest, Indiana, USA

Xinnian Guo
Department of Electronic Information Engineering, Huaiyin Institute of Technology,
Huaian, China

Lizhe Tan
Department of Electrical and Computer Engineering, Purdue University Northwest,
Indiana, USA

Peh Keong The


Department of Electrical and Computer Engineering, Michigan Technology University,
Houghton, Michigan, USA

Seyed (Reza) Zekavat


Department of Electrical and Computer Engineering, Michigan Technology University,
Houghton, Michigan, USA

Lei Wang
School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an,
China.

Zhigang Chen
School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an,
China.

Ali Alqahtani
College of Applied Engineering, King Saud University, Riyadh, Saudi Arabia

A. Alarabeyyat
Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied
University, Salt, Jordan

S. Al-Hashemi
Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied
University, Salt, Jordan

T. Khdour
Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied
University, Salt, Jordan

M. Hjouj Btoush
Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied
University, Salt, Jordan

S. Bani-Ahmad
Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied
University, Salt, Jordan

R. Al-Hashemi
The Computer Information Systems Department, College of Information Technology,
Al-Hussein Bin Talal University, Ma’an, Jordan.

Chiman Kwan
Applied Research LLC, Rockville, Maryland, USA

Jude Larkin
Applied Research LLC, Rockville, Maryland, USA

Ravi kumar Mulemajalu


Department of IS&E., KVGCE, Sullia, Karnataka, India

Shivaprakash Koliwad
Department of E&C., MCE, Hassan, Karnataka, India.

Rung-Ching Chen
Department of Information Management, Chaoyang University of Technology, No.
168, Jifong E. Rd., Wufong Township Taichung County 41349, Taiwan

Pei-Yan Pai
Department of Computer Science, National Tsing-Hua University, No. 101, Section 2,
Kuang-Fu Road, Hsinchu 300, Taiwan

Yung-Kuan Chan
Department of Management Information Systems, National Chung Hsing University,
250 Kuo-kuang Road, Taichung 402, Taiwan

Chin-Chen Chang
Department of Computer Science, National Tsing-Hua University, No. 101, Section 2,
Kuang-Fu Road, Hsinchu 300, Taiwan
Department of Information Engineering and Computer Science, Feng Chia University,
No. 100 Wenhwa Rd., Seatwen, Taichung 407, Taiwan

Vladimír Majerník
Mathematical Institute, Slovak Academy of Sciences, Bratislava, Slovakia

C. G. Chakrabarti
Department of Applied Mathematics, University of Calcutta, India

Indranil Chakrabarty
Department of Mathematics, Heritage Institute of Technology, Chowbaga Road,
Anandapur, India

Pablo José Iuliano


LINTI, Facultad de Informatica UNLP, La Plata, Argentina

Luís Marrone
LINTI, Facultad de Informatica UNLP, La Plata, Argentina

R. Rajaram
Department of Mathematical Sciences, Kent State University, Kent, OH, USA

B. Castellani
Department of Sociology, Kent State University, 3300 Lake Rd. West, Ashtabula, OH,
USA

A. N. Wilson
School of Social and Health Sciences, Abertay University, Dundee DD1 1HG, UK

LIST OF ABBREVIATIONS

5G Fifth-Generation
ACK Acknowledgments
AMC Adaptive Modulation And Coding
AR Augmented Reality
ARM Advanced RISC Machines
ARQ Automatic Repeat reQuest
ASIC Application-Specific Integrated Circuit
AVIRIS Airborne Visible Infrared Imaging Spectrometer
AWGN Additive White Gaussian Noise
BAB Block Array Builder
BB-ANS Bits Back With Asymmetric Numeral Systems
BBM Bialynicki-Birula and Mycielski
BCH Bose, Chaudhuri and Hocquenghem
BEC Backward Error Correction
BECH Binary Erasure Channel
BER Bit Error Rate
BPS Beam Pattern Scanning
BS Base Station
BS Base switching
BSC Binary Symmetrical Channel
BST Base-Switching Transformation
BWT Burrows-Wheeler Transform
CAD Computer Aided Diagnosis
CDF Cumulative Distribution Function
CI Computational Intelligence
CNN Convolutional Neural Network
CoBALP+ Context-Based Adaptive Linear Prediction
CR Cognitive Radio
CR Compression Ratio
CSI Channel State Information
DCT Discrete Cosine Transform
DMT Diversity Multiplexing Tradeoff
DST Dempster-Shafer Theory of Evidence
EDP Edge-Directed Prediction
EGC Equal Gain Combining
EURs Entropic Uncertainty Relations
FBB Fixed Block Based
FCC Federal Communications Commission
FEC Forward Error Correction
FLIF Free Lossless Image Format
FPGA Field-Programmable Gate Array
GAI General Artificial Intelligence
GAP Gradient Adjusted Predictor
GED Gradient Edge Detection
GP Gel’fand Pinsker
HD High-Definition
HIS Hyperspectral Images
HPBW Half Power Beam Width
HUR Heisenberg Uncertainty Relation
HVS Human Visual Systems
IC-DMS Interference Channel with Degraded Message Sets
IoT Internet of Things
IR Industrial Revolution
KLT Karhunen-Loeve Transform
LDGM Low-Density Generator Matrix
LDPC Low-Density Parity Check Code
LS Least-Square
LT Luby Transform
LZW Lempel-Ziv-Welch
MAC Multiple Access Channel
MAI Multiple Access Interference
MANETs Mobile Ad-Hoc Networks

MANIAC Meta-Adaptive Near-Zero Integer Arithmetic Coding
MBMS Multimedia Broadcast Multicast System
MC-CDMA Multi-Carrier Code Division Multiple Access
MED Median Edge Detector
MGF Moment Generating Function
MIAS Mammography Image Analysis
MIMO Multiple Input Multiple Output
ML Maximum Likelihood
MLB Matching Link Builder
MLP Multi-Level Progressive
MMSE Minimum Mean Square Error
MRC Maximal Ratio Combining
MTAC Multiple-Tables Arithmetic Coding
MTF Move-To-Front
MUI Multiuser Interference
OFDM Orthogonal Frequency Division Multiplexing
OSI Open Systems Interconnection
OSTBC Orthogonal Space-Time Block Coding
PC Pilot Contamination
PCA Principal Component Analysis
PNG Portable Network Graphics
PPM Prediction By Partial Matching
PPMM Partial Precision Matching Method
PSNR Peak-Signal-To-Noise Ratio
PU Primary User
QoS Quality of Service
RAM Random Access Memory
RCs Rateless Codes
RestNet Residual Neural Network
RF Radio Frequency
RLE Run-Length Encoding
RLS Recursive Least Squares
RNN Recurrent Neural Network
ROSIS Reflective Optics System Imaging Spectrometer

RSTBC Rateless Space-Time Block Code
SB Split Band
SDMA Spatial Division Multiple Access
SER Symbol-Error-Rate
SINR Signal-To-Interference-And-Noise Ratio
SISO Single-Input Single-Output
SNR Signal-To-Noise Ratio
STBC Space-Time Block Coding
STC Space-Time Coding
STTC Space-Time Trellis Coding
SVD Singular Value Decomposition
TMBA Two Modules Based Algorithm
UWB Ultrawideband
V-BLAST Vertical Bell Labs Layered Spacetime
VBSS Variable Block Size Segmentation
VM Virtual Memory
VSI Visual Saliency-based Index
WBAN Wearable Body Area Network
WSN Wireless Sensor Network
XML Extensible Markup Language

PREFACE

Coding theory is a field that studies codes, their properties, and their
suitability for specific applications. Codes are used for data compression,
cryptography, error detection and correction, data transfer, and data storage.
Codes are studied in a variety of scientific disciplines - such as information
theory, electrical engineering, mathematics, linguistics, and computer science -
in order to design efficient and reliable data transmission methods. This usually
involves removing redundancy and detecting/correcting errors in the
transmitted data.
There are four types of coding:
• Data compression (or, source coding)
• Error control (or channel coding)
• Cryptographic coding
• Line coding
Data compression tries to remove redundant data from a source in order to
transfer it as efficiently as possible. For example, Zip data compression
makes data files smaller for purposes such as reducing Internet traffic. Data
compression and error correction can be studied in combination.
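As a small, self-contained illustration of source coding (not taken from the book), the following Python snippet uses the standard zlib library, the algorithm family behind Zip and gzip, to compress a redundant byte string losslessly and recover it exactly:

import zlib

message = b"information theory " * 100          # a highly redundant source
compressed = zlib.compress(message, 9)          # source coding: remove statistical redundancy
restored = zlib.decompress(compressed)          # decoding recovers the data exactly (lossless)

assert restored == message
print(len(message), "bytes ->", len(compressed), "bytes")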
Error correction adds extra bits to make data transmission more robust to
interference that occurs on the transmission channel. The average user may not
be aware of the many types of applications that use error correction. A typical
music CD uses the Reed-Solomon code to correct the impact of scratches and
dust. In this application, the transmission channel is the CD itself. Mobile
phones also use coding techniques to correct attenuation and high frequency
transmission noise. Data modems, telephone transmissions, and NASA’s Deep
Space Network all use channel coding techniques to transmit bits, such as turbo
code and LDPC codes.
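The idea of channel coding can be illustrated with the simplest possible example, a rate-1/3 repetition code with majority-vote decoding over a binary symmetric channel. This sketch (ours, with illustrative parameters) is far weaker than the Reed-Solomon, turbo, or LDPC codes mentioned above, but it shows how redundant bits absorb channel errors:

import random

def encode(bits):
    return [b for b in bits for _ in range(3)]            # repeat every bit three times

def bsc(bits, p):
    return [b ^ (random.random() < p) for b in bits]      # binary symmetric channel: flip with prob. p

def decode(bits):
    return [int(sum(bits[i:i + 3]) >= 2) for i in range(0, len(bits), 3)]   # majority vote

random.seed(1)
data = [random.randint(0, 1) for _ in range(1000)]
received = decode(bsc(encode(data), 0.05))
print("residual bit errors:", sum(a != b for a, b in zip(data, received)), "out of", len(data))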
This edition covers different topics from information theory methods and
approaches, block and stream coding, lossless data compression, and information
and Shannon entropy.
Section 1 focuses on information theory methods and approaches, describing
information theory of cognitive radio system, information theory and entropies
for quantized optical waves in complex time-varying media, some inequalities
in information theory using Tsallis entropy, and computational theory of
intelligence: information entropy.
Section 2 focuses on block and stream coding, describing block-split array coding
algorithm for long-stream data compression, bit-error aware lossless image
compression with 2d-layer-block coding, beam pattern scanning (BPS) versus
space-time block coding (STBC) and space-time trellis coding (STTC), partial
feedback based orthogonal space-time block coding with flexible feedback bits,
and rateless space-time block codes for 5G wireless communication systems.
Section 3 focuses on lossless data compression, describing lossless image
compression technique using combination methods, new results in perceptually
lossless compression of hyperspectral images, lossless compression of digital
mammography using base switching method, and lossless image compression
based on multiple-tables arithmetic coding.
Section 4 focuses on information and Shannon entropy, describing entropy as
a universal concept in sciences, Shannon entropy - axiomatic characterization
and application, Shannon entropy in distributed scientific calculations on
mobile ad-hoc networks (MANETs), and advancing Shannon entropy for
measuring diversity in systems.
SECTION 1: INFORMATION THEORY
METHODS AND APPROACHES
Chapter 1

INFORMATION THEORY OF COGNITIVE RADIO SYSTEM

F. G. Awan¹, N. M. Sheikh¹, and M. F. Hanif²

¹ University of Engineering and Technology Lahore, 54890 Pakistan
² University of the Punjab Quaid-e-Azam Campus 54590, Lahore Pakistan

INTRODUCTION
Cognitive radio (CR) carries bright prospects for very efficient utilization
of spectrum in the future. Since cognitive radio is still in its early days, many
of its theoretical limits are yet to be explored. In particular, determination
of its maximum information rates for the most general case is still an open
problem. To date, many cognitive channel models have been presented,
and either achievable or maximum bit rates have been evaluated for each of them.
This chapter summarizes the existing results, makes a comparison
between the different channel models, and draws useful conclusions.

Citation: F. G. Awan, N. M. Sheikh and M. F. Hanif, “Information Theory of Cognitive Radio System”, Cognitive Radio Systems, 2009. DOI: 10.5772/7843.
Copyright: © 2009 The Author(s) and IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

The scarcity of the radio frequency (RF) spectrum, along with its severe
underutilization as reported by various government bodies such as the
Federal Communications Commission (FCC) in the USA and Ofcom in the UK
[1], [2], has triggered immense research activity on the concept
of CR all over the world. Of the many facets that need to be dealt with,
the information theoretic modeling of CR is of core importance, as it helps
predict the fundamental limit of its maximum reliable data transmission.
The information theoretic model proposed in [3] represents the real world
scenario that the CR will have to encounter in the presence of primary user
(PU) devices. Authors in [3] characterize the CR system as an interference
channel with degraded message sets (IC-DMS), since the spectrum
sensing nature of the CR may enable its transmitter (TX) to know PU’s
message provided the PU is in close proximity of the CR. Elegantly using a
combination of rate-splitting [4] and Gel’fand Pinsker (GP) [5] coding, [3]
has given an achievable rate region of the so called CR-channel or IC-DMS.
Further, in [3] time sharing is performed between the two extreme cases
when either the CR dedicates zero power (“highly polite”) or full power
(“highly rude”) to its message. A complete review of information theoretic
studies can be found in [6] and [7].
This chapter then discusses outer bounds on the individual rates and
the conditions under which these bounds become tight for the symmetric
Gaussian CR channel in the low interference-gain regime. The CR transmitter
is assumed to use dirty paper coding while deriving the outer bounds. The
capacity of the CR channel in the low-interference scenario is known when
the CR employs the “polite” approach of devoting some portion of its power
to transmitting the PU’s message, which helps in calculating quality of service
for the CR users. Then we focus on the scenario in which the CR takes the
“rude” approach, i.e., does not relay the PU’s message and tries to maximize
only its own rate. It will be shown that when both the CR and the PU operate in
the low interference-gain regime, treating interference as additive noise at
the PU receiver and doing dirty paper coding at the CR is nearly optimal.

COGNITIVE RADIO NETWORK PARADIGMS


Since its introduction in [8], the definition of cognitive radio has evolved
over the years. Consequently, different interpretations of cognitive radio
and different visions for its future exist today. In this section we describe a
few communication models that have been proposed for cognitive radio. We

broadly classify them into overlay or known interference models, underlay


or interference avoidance models.

Underlay Paradigm
The underlay paradigm encompasses techniques that allow communication
by the cognitive radio assuming it has knowledge of the interference
caused by its transmitter to the receivers of all noncognitive users [7].
In this setting the cognitive radio is often called a secondary user which
cannot significantly interfere with the communication of existing (typically
licensed) users, who are referred to as primary users. Specifically, the
underlay paradigm mandates that concurrent noncognitive and cognitive
transmissions may occur only if the interference generated by the cognitive
devices at the noncognitive receivers is below some acceptable threshold.
The interference constraint for the noncognitive users may be met by using
multiple antennas to guide the cognitive signals away from the noncognitive
receivers, or by using a wide bandwidth over which the cognitive signal
can be spread below the noise floor, then despread at the cognitive receiver.
The latter technique is the basis of both spread spectrum and ultrawideband
(UWB) communications.
The interference caused by a cognitive transmitter to a noncognitive
receiver can be approximated via reciprocity if the cognitive transmitter can
overhear a transmission from the noncognitive receiver’s location. Alternatively,
the cognitive transmitter can be very conservative in its output power to
ensure that its signal remains below the prescribed interference threshold.
In this case, since the interference constraints in underlay systems are
typically quite restrictive, this limits the cognitive users to short range
communications.
While the underlay paradigm is most common in the licensed spectrum
(e.g. UWB underlays many licensed spectral bands), it can also be used in
unlicensed bands to provide different classes of service to different users.
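As a toy illustration of the underlay constraint (a sketch with made-up numbers, not a method from the book), the snippet below caps the cognitive transmit power by whichever is tighter: the device limit or the interference threshold divided by the path gain towards the primary receiver:

def max_underlay_power(path_gain_to_primary_rx, interference_threshold, device_limit):
    # Largest transmit power meeting both the interference constraint and the device limit.
    return min(interference_threshold / path_gain_to_primary_rx, device_limit)

# Illustrative numbers: gain 1e-6 towards the primary receiver, threshold 1e-10 W, 0.1 W device limit.
print(max_underlay_power(1e-6, 1e-10, 0.1))     # -> 1e-4 W: the interference constraint binds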

Overlay Paradigm
The enabling premise for overlay systems is that the cognitive transmitter
has knowledge of the noncognitive users’ codebooks and their messages as
well [7]. The codebook information could be obtained, for example, if the
noncognitive users follow a uniform standard for communication based on
a publicized codebook. Alternatively, they could broadcast their codebooks
periodically. A noncognitive user message might be obtained by decoding

the message at the cognitive receiver. However, the overlay model assumes
the noncognitive message is known at the cognitive transmitter when the
noncognitive user begins its transmission. While this is impractical for an
initial transmission, the assumption holds for a message retransmission
where the cognitive user hears the first transmission and decodes it, while
the intended receiver cannot decode the initial transmission due to fading or
interference. Alternatively, the noncognitive user may send its message to the
cognitive user (assumed to be close by) prior to its transmission. Knowledge
of a noncognitive user’s message and/or codebook can be exploited in a
variety of ways to either cancel or mitigate the interference seen at the
cognitive and noncognitive receivers. On the one hand, this information can
be used to completely cancel the interference due to the noncognitive signals
at the cognitive receiver by sophisticated techniques like dirty paper coding
[9]. On the other hand, the cognitive users can utilize this knowledge and
assign part of their power for their own communication and the remainder
of the power to assist (relay) the noncognitive transmissions. By careful
choice of the power split, the increase in the noncognitive user’s signal-to-
noise power ratio (SNR) due to the assistance from cognitive relaying can
be exactly offset by the decrease in the noncognitive user’s SNR due to the
interference caused by the remainder of the cognitive user’s transmit power
used for its own communication. This guarantees that the noncognitive
user’s rate remains unchanged while the cognitive user allocates part of its
power for its own transmissions. Note that the overlay paradigm can be
applied to either licensed or unlicensed band communications. In licensed
bands, cognitive users would be allowed to share the band with the licensed
users since they would not interfere with, and might even improve, their
communications. In unlicensed bands cognitive users would enable a higher
spectral efficiency by exploiting message and codebook knowledge to
reduce interference.

Interweave Paradigm
The ‘interweave’ paradigm is based on the idea of opportunistic
communication, and was the original motivation for cognitive radio [10].
The idea came about after studies conducted by the FCC [8] and industry
[2] showed that a major part of the spectrum is not utilized most of the time.
In other words, there exist temporary space-time frequency voids, referred
to as spectrum holes, that are not in constant use in both the licensed and
unlicensed bands.

These gaps change with time and geographic location, and can be exploited
by cognitive users for their communication. Thus, the utilization of spectrum
is improved by opportunistic frequency reuse over the spectrum holes. The
interweave technique requires knowledge of the activity information of the
noncognitive (licensed or unlicensed) users in the spectrum. One could also
consider that all the users in a given band are cognitive, but existing users
become primary users, and new users become secondary users that cannot
interfere with communications already taking place between existing users.
To summarize, an interweave cognitive radio is an intelligent wireless
communication system that periodically monitors the radio spectrum,
intelligently detects occupancy in the different parts of the spectrum and
then opportunistically communicates over spectrum holes with minimal
interference to the active users. For a fascinating motivation and discussion
of the signal processing challenges faced in interweave cognitive radio, we
refer the reader to [11].
Table 1 [12] summarizes the differences between the underlay, overlay
and interweave cognitive radio approaches. While underlay and overlay
techniques permit concurrent cognitive and noncognitive communication,
avoiding simultaneous transmissions with noncognitive or existing users
is the main goal in the interweave technique. We also point out that the
cognitive radio approaches require different amounts of side information:
underlay systems require knowledge of the interference caused by the
cognitive transmitter to the noncognitive receiver(s), interweave systems
require considerable side information about the noncognitive or existing
user activity (which can be obtained from robust primary user sensing) and
overlay systems require a large amount of side information (non-causal
knowledge of the noncognitive user’s codebook and possibly its message).
Apart from device level power limits, the cognitive user’s transmit power
in the underlay and interweave approaches is decided by the interference
constraint and range of sensing, respectively. While underlay, overlay and
interweave are three distinct approaches to cognitive radio, hybrid schemes
can also be constructed that combine the advantages of different approaches.
For example, the overlay and interweave approaches are combined in [7].
Before launching into capacity results for these three cognitive radio
networks, we will first review capacity results for the interference channel.
Since cognitive radio networks are based on the notion of minimal
interference, the interference channel provides a fundamental building
block to the capacity as well as encoding and decoding strategies for these
networks.

Table 1. Comparison of underlay, overlay and interweave cognitive radio techniques.

INTERFERENCE-MITIGATING COGNITIVE BEHAVIOR: THE COGNITIVE RADIO CHANNEL

This discussion has been taken from Natasha’s paper and considers the
simplest possible scenario in which a cognitive radio could be employed.
Assume there exists a primary transmitter and receiver pair (S1 — R1), as
well as the cognitive secondary transmitter and receiver pair (S2 — R2). As
shown in Fig. 1.1, there are three possibilities for transmitter cooperation in
these two point-to-point channels. We have chosen to focus on transmitter
cooperation because such cooperation is often more insightful and general
than receiver-side cooperation [12, 13]. Thus assume that each receiver
decodes independently. Transmitter cooperation in this figure is denoted
by a directed double line. These three channels are simple examples of
the cognitive decomposition of wireless networks seen in [14]. The three
possible types of transmitter cooperation in this simplified scenario are:
• Competitive behavior: The two transmitters transmit
independent messages. There is no cooperation in sending the
messages, and thus the two users compete for the channel. This is
the same channel as the 2 sender, 2 receiver interference channel
[14, 15].
• Cognitive behavior: Asymmetric cooperation is possible
between the transmitters. This asymmetric cooperation is a result
of S2 knowing S1’s message, but not vice-versa. As a first step,
we idealize the concept of message knowledge: whenever the
cognitive node S2 is able to hear and decode the message of the
primary node S1, we assume it has full a priori knowledge. This
is called the genie assumption, as these messages could have been


given to the appropriate transmitters by a genie. The one way
double arrow indicates that S2 knows S1’s message but not vice
versa.
This is the simplest form of asymmetric non-causal cooperation at the
transmitters. Usage of the term cognitive behavior is to emphasize the need
for S2 to be a “smart” device capable of altering its transmission strategy
according to the message of the primary user. We can motivate considering
asymmetric side information in practice in three ways:
• Depending on the device capabilities, as well as the geometry and
channel gains between the various nodes, certain cognitive nodes
may be able to hear and/or obtain the messages to be transmitted
by other nodes. These messages would need to be obtained in real
time, and could exploit the geometric gains between cooperating
transmitters relative to receivers in, for example, a 2 phase
protocol [3].
• In an Automatic Repeat reQuest (ARQ) system, a cognitive
transmitter, under suitable channel conditions (if it has a better
channel to the primary transmitting node than the primary
receiver), could decode the primary user’s transmitted message
during an initial transmission attempt. In the event that the primary
receiver was not able to correctly decode the message, and it must
be retransmitted, the cognitive user would already have the to-be-
transmitted message, or asymmetric side information, at no extra
cost (in terms of overhead in obtaining the message).
• The authors in [16] consider a network of wireless sensors in
which a sensor S2 has a better sensing capability than another
sensor S1 and thus is able to sense two events, while S1 is only
able to sense one. Thus, when they wish to transmit, they must do
so under an asymmetric side-information assumption: sensor S2
has two messages, and the other has just one.
• Cooperative behavior: The two transmitters know each other’s
messages (two-way double arrows) and can thus fully and sym-
metrically cooperate in their transmission. The channel pictured in
Fig. 1.1(c) may be thought of as a two-antenna sender, two single-
antenna receiver broadcast channel [17].
Many of the classical, well known information theoretic channels fall into
the categories of competitive and cooperative behavior. For more details, we

refer the interested reader to the cognitive network decomposition theorem


of [13] and [18]. We now turn to the much less studied behavior which
spans and in a sense interpolates between the symmetric cooperative and
competitive behaviors. We call this behavior asymmetric cognitive behavior.
In this section we will consider one example of cognitive behavior: a two
sender, two receiver (with two independent messages) interference channel
with asymmetric and a priori message knowledge at one of the transmitters,
as shown in Fig. 1(b).
Certain asymmetric (in transmitter cooperation) channels have been
considered in the literature: for example in [19], the capacity region of a
multiple access channel with asymmetric cooperation between the two
transmitters is computed. The authors in [20] consider a channel which could
involve asymmetric transmitter cooperation, and explore the conditions
under which the capacity of this channel coincides with the capacity of the
channel in which both messages are decoded at both receivers. In [21, 18] the
authors introduced the cognitive radio channel, which captures the most basic
form of asymmetric transmitter cooperation for the interference channel.
We now study the information theoretic limits of interference channels with
asymmetric transmitter cooperation, or cognitive radio channels.

Figure 1. (a) Competitive behavior, the interference channel. The transmitters
may not cooperate. (b) Cognitive behavior, the cognitive radio channel.
Asymmetric transmitter cooperation. (c) Cooperative behavior, the two antenna
broadcast channel. The transmitters, but not the receivers, may fully and sym-
metrically cooperate.
The channel is thus expressed via the following pair of equations:

Y_p = X_p + a X_s + Z_p    (1)

Y_s = b X_p + X_s + Z_s    (2)
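The following minimal simulation (our illustration; the powers and gains are arbitrary choices, with noise variance normalized to 1) generates the standard-form channel of Eqs. (1)-(2) and checks the received power at the primary receiver:

import numpy as np

rng = np.random.default_rng(0)
n, Pp, Ps, a, b = 10_000, 6.0, 6.0, 0.5, 1.5    # block length, powers, interference gains

Xp = rng.normal(0.0, np.sqrt(Pp), n)            # primary codeword (Gaussian, power Pp)
Xs = rng.normal(0.0, np.sqrt(Ps), n)            # cognitive codeword (Gaussian, power Ps)
Zp, Zs = rng.normal(0.0, 1.0, (2, n))           # unit-variance AWGN at each receiver

Yp = Xp + a * Xs + Zp                           # Eq. (1): signal at the primary receiver
Ys = b * Xp + Xs + Zs                           # Eq. (2): signal at the cognitive receiver

print("empirical var(Yp):", Yp.var(), " expected:", Pp + a**2 * Ps + 1.0)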

While deriving the channel capacity, an assumption of low interference-


gain has been made. Low interference regime corresponds to the scenario
where the cognitive user is assumed to be near its own base station rather
than that of primary user, which normally is the case. When applied to the
interference channel in standard form, this situation corresponds to a ≤ 1.
At the same time the two devices are assumed to work in an environment
where the co-existence conditions exist, ensuring that cognitive radio
generates no interference with the primary user in its vicinity and the primary
receiver is a single user decoder. With all these assumptions, the channel
is named the cognitive (1, a, b, 1) channel. The capacity R_s (in bps) of
the cognitive radio under these conditions is expressed in closed form as:

(3)
where α ∈ [0,1] and its value is determined using the following arguments:

To ensure that the primary user remains unaware of the presence of
the cognitive device and continues to communicate at its original rate, the maximum
achievable rate of the primary system is found to be:

(4)
Now for 0 < a < 1, using the Intermediate Value Theorem, this quadratic
equation in α always has a unique root in [0, 1]:

(5)
It is to be noted that the above capacity expressions hold for any b∈R.
For detailed proofs the reader is referred to [8].
A few important points are worth mentioning here. Since the cognitive
radio knows both m_p (the message to be transmitted by the primary user) and
m_s (the message to be transmitted by the cognitive device), it generates its
codeword X_s^n such that it also incorporates the codeword X_p^n to be generated
by the primary user. By doing so, the cognitive device can implement the
concept of dirty paper coding, which helps it mitigate the interference caused
by the primary user at the cognitive receiver. Thus the cognitive device
performs superposition coding as follows:

(6)
where the first component of (6) encodes m_s and is generated by performing dirty paper
coding, treating the relayed primary codeword as a known interference that will affect the
secondary receiver. It is evident from (6) that the secondary device uses part
of its power, αP_s, to relay the primary user’s message to the primary receiver.
This relaying by the secondary user results in an elevated value of SNR at
the primary receiver. At the same time, the interference created at the primary
receiver by the secondary user’s own message, transmitted with power (1 - α)P_s,
balances the increase in SNR, and as a result the primary device remains
completely oblivious to the presence of the cognitive user. This approach
has been named the selfless approach in [23].
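As a rough numeric illustration of this power split (a sketch under our own assumptions, not the chapter's derivation), the snippet below assumes the standard relaying construction in which the relayed component adds coherently at the primary receiver while the cognitive user's own-message power is seen there as extra noise, and searches for the α that leaves the primary SNR exactly at its cognitive-free value (noise variance normalized to 1):

def primary_snr(alpha, Pp, Ps, a):
    # Assumed model: the relayed part of the cognitive signal adds coherently to the primary
    # signal, while the cognitive user's own-message power a^2*(1-alpha)*Ps is seen as noise.
    signal = (Pp ** 0.5 + a * (alpha * Ps) ** 0.5) ** 2
    return signal / (1.0 + a ** 2 * (1.0 - alpha) * Ps)

def balancing_alpha(Pp, Ps, a, iters=60):
    lo, hi = 0.0, 1.0                       # primary_snr is increasing in alpha, so bisect
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if primary_snr(mid, Pp, Ps, a) < Pp:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

Pp, Ps, a = 6.0, 6.0, 0.5                   # illustrative powers and interference gain
alpha = balancing_alpha(Pp, Ps, a)
print("alpha:", alpha, " primary SNR:", primary_snr(alpha, Pp, Ps, a), " target:", Pp)

For these illustrative values the balancing α lies strictly between 0 and 1, mirroring the unique root discussed around Eq. (5).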
In [8], results corresponding to the high interference regime a > 1 have also
been presented as ancillary conclusions. However, such results are of limited
significance, as they represent the scenario of high interference caused by the
secondary user, which is an event with a low probability of occurrence.
Similar results have also been obtained in [3], [16] and [15]. The results
in [15] correspond to the selfish approach of [23] and thus represent an upper
bound on the information rates of the cognitive radio, but with interference, because
in this case the cognitive user does not spend any of its power on relaying the
message to the primary receiver. Similarly, the authors in [16] have shown that,
in the Gaussian noise case, their capacity region is explicitly equal to that of
[8] and, numerically, to that of [23]. Very recently, [24] has also extended the
results of [8] to the case of a Gaussian multiple access channel (MAC) with n
cognitive users. [24] simply uses the result for a general MAC channel,
i.e., that the achievable sum rate for n users is the sum of the achievable rates of
the individual users. Using this, together with the result of [8], it determines
the achievable rate region of a Gaussian MAC cognitive channel.

Figure 2. The Gaussian interference channel in its standard form.

INTERFERENCE AVOIDING CHANNEL


This channel model, as devised in [22] and [6], works on the principle of
opportunistic communication i.e., the secondary communication takes
place only when the licensed user is found to be idle and a spectrum hole is
detected. Thus this model conforms to the basic requirement of the cognitive
device not interfering with the licensed users. The secondary sender S_S and
receiver R_S are each assumed to have a circular sensing region of radius R_r
centered on themselves, and the distance between them is d. They
are further supposed to be communicating in the presence of primary users
A, B and C, as shown in Fig. 3.

Figure 3. The scenario for interference avoiding channel.


The cognitive sender SS can detect a spectral hole when both A and
B are inactive whereas the secondary receiver RS determines this when it
finds both B and C to be not involved in a communication scenario. Since
14 Information and Coding Theory in Computer Science

secondary transmitter and receiver do not have complete knowledge of


primary users activity in each other’s sensing regions, the spectral activity
in their respective regions corresponds to the notion of being distributed.
Similarly, the primary user activity sensed by the secondary transmitter-receiver pair continues to change with time, i.e., different primary users become active and inactive at different times. Thus, the spectral activity
is also assumed to be dynamic. To incorporate both these features, the
conceptual model of Fig. 3 is reduced to the two switch mathematical model
as shown in Fig. 4.

Figure 4. The two switch model.


The two switches ss, sr are treated as binary random variables with ss, sr ∈ {0, 1}. The value ss = 1 or sr = 1 means that there are no primary users in the sensing region of the secondary transmitter or receiver, respectively, and the opposite holds if either of these two values is zero. The input X is related to the output Y via the following equation:

(7)
where N is additive white Gaussian noise (AWGN) at the secondary receiver.
sr is either 1 or 0 as mentioned above. So when it is multiplied as done in
(7), it simply shows whether the secondary receiver has detected the primary
device or not.
If ss is known only to the transmitter and sr is available only to the receiver, the situation corresponds to communication with partial side information. Determination of the capacity with partial side information involves an input distribution maximization that is difficult to solve. This is not done; instead, tight capacity upper and lower bounds are obtained for this communication channel. A capacity upper bound is determined by assuming that the receiver knows both ss and sr, whereas the transmitter only knows ss. The expression for the capacity Css* of the secondary user in this case is [23]:

 
   1  d2   λπ R 
2
−λ
C =e  2 Rr2  π − arccos    dRr 1 −  log  1 + Pe r

   2 Rr  2   
4R
 r  (8)
where P is the secondary transmitter power constraint. For the capacity lower bound, [23] uses the results of [25]. For this, a genie argument is used. It should be noted that utilization of the genie concept represents the notion that either the sender or the receiver is provided with some information noncausally. To determine the capacity lower bound, the genie is supposed to provide some side information to the receiver.
So if the genie provides some information it must have an associated
entropy rate. The results in [25] suggest that the improvement in capacity
due to this genie information cannot exceed the entropy rate of the genie
information itself. Using this argument and that the genie provides
information to the receiver every T channel uses, it is easy to establish that
the capacity lower bound approaches the upper bound even for very highly
dynamic environments.
It is assumed that the locations of primary users in the system follow a Poisson point process with a density of λ nodes per unit area, and that primary user detection at the secondary transmitter and receiver is perfect. The capacity expression in (8), as given in [23], is then evaluated to be:
 
   1  d2   λπ R 
2
C =e − λ  2 Rr2  π − arccos    dRr 1 −  log  1 + Pe r

   2 Rr  2   
4R
 r  (9)
where, again, P is the secondary power constraint.
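As a rough illustration, the expression in (9) can be evaluated numerically. The sketch below is illustrative only: the parameter values for Rr, λ and P, the natural-log convention, and the helper names union_area and capacity are assumptions, not taken from [23]. The exponential factor is the probability that the union of the two circular sensing regions is free of Poisson-distributed primary users, and the power is scaled by exp(λπRr²) because the secondary transmitter is active only for that fraction of the time.

import numpy as np

def union_area(d, Rr):
    # Area of the union of two discs of radius Rr whose centres are d apart (d <= 2*Rr).
    return 2 * Rr**2 * (np.pi - np.arccos(d / (2 * Rr))) + d * Rr * np.sqrt(1 - d**2 / (4 * Rr**2))

def capacity(d, Rr=1.0, lam=0.1, P=10.0):
    # Probability of a primary-user-free union region times the boosted-power log term.
    return np.exp(-lam * union_area(d, Rr)) * np.log(1 + P * np.exp(lam * np.pi * Rr**2))

for d in (0.2, 0.5, 1.0, 1.5, 2.0):
    print(f"d = {d:.1f}  ->  C = {capacity(d):.3f} nats per channel use")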

COLLABORATIVE COGNITIVE CHANNEL


In this channel, the cognitive user is modeled as a relay, with no information
of its own, working between the primary transmitter and receiver. Capacity
limits for collaborative communications have recently been explored [26], suggesting sufficient conditions on the geometry and the signal path loss of the transmitting entities for which performance close to the genie bound can be achieved.
Consider three nodes, source (s), relay (r) and destination (d) as
illustrated in Fig. 5.

Figure 5. The collaborative communication channel incorporating the source s, relay r and the destination d node.
The relay node is assumed to work in half-duplex mode, meaning that it cannot receive and transmit data simultaneously. Thus the system works in two phases, i.e., the listening phase and the collaborative phase. During the first phase the relay node receives data from the source node for the first n1 transmissions, while in the collaborative phase the relay transmits to the destination for the remaining n - n1 transmissions, where n is the number of channel uses in which the source node wishes to transmit one of the 2^{nR} messages.
Taking the channels as AWGN, considering X and U as column vectors representing the transmissions from the source and relay nodes respectively, and denoting by Y and Z the received messages at the relay and destination respectively, the listening phase is described via the following equations:
(10)

(11)
where Nz and Ny represent complex AWGN with variance 1/2, Hs is the
fading matrix between the source and destination nodes and Hr is the fading
matrix between the source and relay nodes. In the collaborative phase:

(12)
where Hc is the channel matrix that contains Hs as a submatrix. It is well
known that a Multiple Input Multiple Output (MIMO) system with a Gaussian codebook and rate R bits/channel use can reliably communicate over any channel with transfer matrix H such that where I denotes the identity matrix and H† represents the conjugate transpose of H. Before providing an explicit formula for the rates, a little explanation is in order here.

During the first phase, the relay listens for an amount of time n1 and, since it knows Hr, this results in nR ≤ n1C(Hr). On the other hand, the destination node receives information at the rate of C(Hs) bits/channel use during the first phase and at the rate of C(Hc) bits/channel use during the second phase. Thus it may reliably decode the message provided that nR ≤ n1C(Hs) + (n - n1)C(Hc). Taking the limit n → ∞, the ratio n1/n tends to a fraction f such that a code of rate R for the set of channels (Hr, Hc) satisfies:
(13)
(14)
where
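The trade-off implied by the two decoding constraints derived above can be made concrete with a short numerical sketch. It is illustrative only: the numbers assigned to C(Hr), C(Hs) and C(Hc) are assumptions, treated simply as given per-channel-use rates. The sketch searches over the listening fraction f = n1/n and reports the largest rate that both the relay and the destination can reliably decode.

import numpy as np

# Assumed per-channel-use rates for the relay, direct, and collaborative links.
C_Hr, C_Hs, C_Hc = 3.0, 1.0, 2.5

f_grid = np.linspace(0.0, 1.0, 10001)               # candidate listening fractions f = n1/n
relay_limit = f_grid * C_Hr                         # the relay must decode: R <= f*C(Hr)
dest_limit = f_grid * C_Hs + (1.0 - f_grid) * C_Hc  # the destination must decode over both phases
rates = np.minimum(relay_limit, dest_limit)

best = int(np.argmax(rates))
print(f"best listening fraction f = {f_grid[best]:.3f}, achievable R = {rates[best]:.3f}")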
Similarly, [23] presents a corollary which suggests that if the cognitive user has no message of its own, it can aid the primary system because it knows the primary's message, resulting in an improvement of the primary user's data rates.
New outer bounds on the individual rates, and the conditions under which these bounds become tight for the symmetric Gaussian cognitive radio (CR) channel in the low interference gain regime, are presented in [27]. The CR transmitter is assumed to use dirty paper coding while deriving the outer bounds. The capacity of the CR channel in the low interference scenario is known when the CR employs the “polite” approach of devoting some portion of its power to transmitting the primary user's (PU's) message. However, this approach does not guarantee any quality of service for the CR users. Hence, the focus is on the scenario where the CR takes the “rude” approach, does not relay the PU's message, and tries to maximize its own rates only. It is shown that when both the CR and the PU operate in the low interference gain regime, treating interference as additive noise at the PU receiver and doing dirty paper coding at the CR is nearly optimal.

COMPARISONS
The channel model presented in the first section uses complex coding techniques to mitigate channel interference, which naturally results in higher throughput than that of the channel model of the second section. But there is one serious constraint in the first channel model: the information throughput of the cognitive user is highly dependent upon the distance between the primary transmitter and the cognitive transmitter. If this distance is large, the secondary transmitter spends considerable time in obtaining the primary
user’s message. After obtaining and decoding the message, the cognitive
device dirty paper codes it and sends it to its receiver. This message transfer
from the primary to the cognitive transmitter results in a lower number of bits transmitted per second by the cognitive radio and hence in reduced data rates.
The capacity of the two switch model is independent of the distance
between the two transmitters as the secondary transmitter refrains from
sending data when it finds the primary user busy. Thus the benefit of
the channel interference knowledge in the first channel model quickly
disappears as the distance between the primary and secondary transmitter
tends to increase. Accurate estimation of the primary system’s message and
the interference channel coefficient needed in the first channel model for
dirty paper coding purposes is itself a problem. Inaccurate estimation of
these parameters will result in a decrease in the rates of the cognitive radio
because the dirty paper code, based on the knowledge of primary user’s
message and channel interference coefficient, will not be able to completely
mitigate the primary user's interference. At the same time, determination of the channel interference coefficient requires a handshaking protocol to be devised, which is a serious overhead and may result in poor performance.
On the other hand, the interference avoiding channel cannot overcome
the hidden terminal problem. This problem naturally arises as the cognitive
user would not be able to detect the presence of distant primary devices.
The degraded signals from the primary users due to multipath fading and
shadowing effects would further aggravate this problem.
This requires the secondary systems to be equipped with extremely sensitive detectors. But very sensitive detectors have prohibitively long sensing times. Thus a protocol needs to be devised by virtue of which the sensed information is shared between the cognitive devices.
Finally, the role of the CR as a relay in the third channel model restricts the real usefulness of the concept of dynamic spectrum utilization. Although limited, significant gains in the data rates of the existing licensed user system can be obtained by restricting a CR device to relaying the PU's message only.

REFERENCES
1. Federal Communications Commission Spectrum Policy Task Force,
“Report of the Spectrum Efficiency Working Group,” Technical Report
02-135, no. November, 2002.
2. Shared Spectrum Company, “Comprehensive Spectrum occupancy
measurements over six different locations,”August 2005.
3. N. Devroye, P. Mitran, and V. Tarokh, “Achievable rates in cognitive
radio channels,” IEEE Transactions on Information Theory, vol. 52,
no. 5, pp. 1813-1827, May 2006.
4. T. Han and K. Kobayashi, “A new achievable rate region for the
interference channel,” IEEE Transactions on Information Theory, vol.
27, no. 1, pp. 49-60,1981.
5. S. I. Gel’fand and M. S. Pinsker, “Coding for channel with random
parameters,” Problems of Control and Information Theory, vol. 9, no.
1, pp. 19-31, 1980.
6. F. G. Awan and M. F. Hanif, “A Unified View of Information-Theoretic
Aspects of Cognitive Radio,” in Proc. International Conference on
Information Technology: New Generations, pp. 327-331, April 2008.
7. S. Srinivasa and S. A. Jafar, “The throughput potential of cognitive
radio - a theoretical perspective,” IEEE Communications Magazine,
vol. 45, no. 5, pp. 73-79,2007.
8. A. Jovicic and P. Viswanath, “Cognitive radio: An information theoretic
perspective,” 2006 IEEE International Symposium on Information
Theory, July 2006.
9. M. H. M. Costa, “Writing on dirty paper,” IEEE Transactions on
Information Theory, vol. 29, no. 3, pp. 439-441, May 1983.
10. Joseph Mitola, “Cognitive Radio: An Integrated Agent Architecture
for Software Defined Radio,” PhD Dissertation, KTH, Stockholm,
Sweden, December 2000.
11. Paul J. Kolodzy, “Cognitive Radio Fundamentals,” SDR Forum,
Singapore, April 2005.
12. C. T. K. Ng and A. Goldsmith, “Capacity gain from transmitter and
receiver cooperation,” in Proc. IEEE Int. Symp. Inf. Theory, Sept.
2005.
13. N. Devroye, P. Mitran, and V. Tarokh, “Cognitive decomposition of
wireless networks,” in Proceedings of CROWNCOM, Mar. 2006.

14. A. B. Carleial, “Interference channels,” IEEE Trans. Inf. Theory, vol. IT-24,
no. 1, pp. 60-70, Jan. 1978.
15. N. Devroye, P. Mitran, and V. Tarokh, “Achievable rates in cognitive
radio channels,” IEEE Trans. Inf. Theory, vol. 52, no. 5, pp. 1813-
1827, May 2006.
16. W. Wu, S. Vishwanath, and A. Arapostathis, “On the capacity of the
interference channel with degraded message sets,” IEEE Trans. Inf.
Theory, June 2006.
17. H. Weingarten, Y. Steinberg, and S. Shamai, “The capacity region of
the Gaussian MIMO broadcast channel,” IEEE Trans. Inf. Theory, vol.
52, no. 9, pp. 3936-3964, Sept. 2006.
18. N. Devroye, P. Mitran, and V. Tarokh, “Achievable rates in cognitive
networks,” in 2005 IEEE International Symposium on Information
Theory, Sept. 2005.
19. E. C. van der Meulen, “Three-terminal communication channels,”
Adv. Appl. Prob., vol. 3, pp. 120-154, 1971.
20. I. Maric, R. Yates, and G. Kramer, “The strong interference channel with
unidirectional cooperation,” in Information Theory and Applications
ITA Inaugural Workshop, Feb. 2006.
21. N. Devroye, P. Mitran, and V. Tarokh, “Achievable rates in cognitive
radio channels,” in 39th Annual Conf. on Information Sciences and
Systems (CISS), Mar. 2005.
22. S. Jafar and S. Srinivasa, “Capacity limits of cognitive radio with
distributed dynamic spectral activity,” in Proc. of ICC, June 2006.
23. S. Srinivasa and S. A. Jafar, “The throughput potential of cognitive
radio: A theoretical perspective,” in Fortieth Asilomar Conference on
Signals, Systems and Computers, 2006., Oct. 2006.
24. P. Cheng, G. Yu, Z. Zhang, H.-H. Chen, and P. Qiu, “On the achievable
rate region of gaussian cognitive multiple access channel,” IEEE
Communications Letters, vol. 11, no.5, pp. 384-386, May. 2007.
25. S. A. Jafar, “Capacity with causal and non-causal side information-a
unified view,” IEEE Transactions on Information Theory, vol. 52, no.
12, pp. 5468-5474, Dec. 2006.
26. P. Mitran, H. Ochiai, and V. Tarokh, “Space-time diversity
enhancements using collaborative communication,” IEEE Transactions
on Information Theory, vol. 51, no.6, pp. 2041-2057, June 2005.

27. F. G. Awan, N. M. Sheikh and F. H. Muhammad, “Outer Bounds for the Symmetric Gaussian Cognitive Radio Channel with DPC Encoded
Cognitive Transmitter,” in Proc. The World Congress on Engineering
2009 (WCE 2009) by the International Association of Engineers
(IAENG), London, UK, July 2009.
Chapter 2

INFORMATION THEORY AND ENTROPIES FOR QUANTIZED OPTICAL WAVES IN COMPLEX TIME-VARYING MEDIA

Jeong Ryeol Choi


Department of Radiologic Technology, Daegu Health College, Yeongsong-ro 15, Buk-
gu, Daegu 702-722, Republic of Korea

INTRODUCTION
An important physical intuition that led to the Copenhagen interpretation of
quantum mechanics is the Heisenberg uncertainty relation (HUR) which is a
consequence of the noncommutativity between two conjugate observables. Our ability to observe is intrinsically limited by the HUR, which quantifies an amount of inevitable and uncontrollable disturbance on measurements (Ozawa, 2004).

Citation: Jeong Ryeol Choi, “Information Theory and Entropies for Quantized Optical
Waves in Complex Time-Varying Media”, Open Systems, Entanglement and Quantum
Optics, 2013, http://dx.doi.org/10.5772/56529.
Copyright: © 2013 The Author(s) and IntechOpen. This chapter is distributed under
the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original work is
properly cited.

Though the HUR is one of the most fundamental results of quantum mechanics as a whole, some drawbacks concerning its quantitative formulation have been reported. Being the expectation value of the commutator between two arbitrary noncommuting operators, the value of the HUR is not a fixed lower bound and varies depending on the quantum state (Deutsch, 1983). Moreover,
in some cases, the ordinary measure of uncertainty, i.e., the variance of
canonical variables, based on the Heisenberg-type formulation is divergent
(Abe et al., 2002).
These shortcomings are highly nontrivial issues in the context of the information sciences. Thereby, the theory of informational entropy has been proposed as an alternative optimal measure of uncertainty. The adequacy of the entropic uncertainty relations (EURs) as an uncertainty measure is owing to the fact that they regard only the probabilities of the different outcomes of a measurement, whereas the HUR regards the variances of the measured values themselves (Werner, 2004). According to Khinchin's axioms (Ash, 1990) for the requirements of common information measures, information measures should depend exclusively on a probability distribution (Pennini & Plastino, 2007). Thanks to active research and technological progress associated with quantum information theory (Nielsen & Chuang, 2000; Choi et al., 2011), the entropic uncertainty band has now become a new concept in quantum physics.
Information theory, proposed by Shannon (Shannon, 1948a; Shannon, 1948b), is important for information-theoretic uncertainty measures not only in quantum physics but also in other areas such as signal and/or image processing. The essential unity of the overall statistical information for a system can be demonstrated from the Shannon information, enabling us to know how information can be quantified with absolute precision. Another good measure of uncertainty or randomness is the Fisher information (Fisher, 1925), which appears as the basic ingredient in bounding entropy production. The Fisher information is a measure of accuracy in statistical theory and is useful for estimating the ultimate limits of quantum measurements.
Recently, quantum information theory, besides fundamental quantum optics, has aroused great interest due to its potential applicability in three sub-regions: quantum computation, quantum communication, and quantum cryptography. Information theory has contributed to the development of modern quantum computation (Nielsen & Chuang, 2000) and has become a cornerstone of quantum mechanics. A remarkable ability of
quantum computers is that they can carry out certain computational tasks
exponentially faster than classical computers utilizing the entanglement and
superposition principle.
Stimulated by these recent trends, this chapter is devoted to the study of
information theory for optical waves in complex time-varying media with
emphasis on the quantal information measures and informational entropies.
Information theoretic uncertainty relations and the information measures
of Shannon and Fisher will be managed. The EUR of the system will also
be treated, quantifying its physically allowed minimum value using the
invariant operator theory established by Lewis and Riesenfeld (Lewis, 1967;
Lewis & Riesenfeld, 1969). Invariant operator theory is crucial for studying
quantum properties of complicated time-varying systems, since it, in general,
gives exact quantum solutions for a system described by time-dependent
Hamiltonian so far as its counterpart classical solutions are known.

QUANTUM OPTICAL WAVES IN TIME-VARYING MEDIA
Let us consider optical waves propagating through a linear medium that has
time-dependent electromagnetic parameters. Electromagnetic properties of
the medium are in principle determined by three electromagnetic parameters
such as electric permittivity ϵ, magnetic permeability µ, and conductivity
σ. If one or more parameters among them vary with time, the medium
is designated as a time-varying one. The Coulomb gauge will be taken for convenience under the assumption that the medium has no net charge distribution. Then the scalar potential vanishes and, consequently, the vector potential is the only potential that needs to be considered when we develop the quantum theory of electromagnetic wave phenomena. Regarding this fact,
the quantum properties of optical waves in time-varying media are described
in detail in Refs. (Choi & Yeon, 2005; Choi, 2012; Choi et al, 2012) and they
will be briefly surveyed in this section as a preliminary step for the study of
information theory.
According to the separation of variables method, it is favorable to put the vector potential in the form

(1)
Then, considering the fact that the fields and current density obey the
relations, D = ϵ(t)E, B = µ(t)H, and J = σ(t)E, in linear media, we derive
equation of motion for ql from the Maxwell equations as (Choi, 2012; Choi, 2010a; Pedrosa & Rosas, 2009)

(2)
Here, the angular frequency (natural frequency) is given by ωl(t) = c(t)kl, where c(t) is the speed of light in the medium and kl (= |kl|) is the wave number. Because the electromagnetic parameters vary with time, c(t) is represented in a time-dependent form, i.e., . However, kl (= |kl|) is constant since it is not affected by the time variance of the parameters. The formula of the mode function ul(r) depends on the geometrical boundary condition in the medium (Choi & Yeon, 2005). For example, it is given by
(ν = 1, 2) for the fields propagating under
the periodic boundary condition, where V is the volume of the space and is a unit vector in the direction of polarization designated by ν.
From Hamilton’s equations of motion, and
, the classical Hamiltonian that gives Eq. (3) can be easily
established. Then, by converting canonical variables, ql and pl, into quantum
operators, and , from the resultant classical Hamiltonian, we have the
quantum Hamiltonian such that (Choi et al., 2012)

(3)
where is an arbitrary time function, and

(4)

(5)
The complete Hamiltonian is obtained by summing all individual
Hamiltonians:
From now on, let us treat the wave of a particular mode and drop the subscript l for convenience. It is well known that quantum problems of optical waves in nonstationary media are described in terms of the classical solutions of the system. Some researchers use real classical solutions (Choi, 2012; Pedrosa & Rosas, 2009) and others imaginary solutions (Angelow & Trifonov, 2010; Malkin et al., 1970). In this chapter, real solutions of the classical
equation of motion for q will be considered. Since Eq. (2) is a second-order differential equation, there are two linearly independent classical solutions. Let us denote them as s1(t) and s2(t), respectively. Then, we can define a Wronskian of the form

(6)
This will be used later, considering only the case Ω > 0 for convenience.
When we study the quantum problem of a system that is described by a time-dependent Hamiltonian such as Eq. (3), it is very convenient to introduce an invariant operator of the system. Such an idea (the invariant operator method) was first devised by Lewis and Riesenfeld (Lewis, 1967; Lewis & Riesenfeld, 1969) for a time-dependent harmonic oscillator, as mentioned in the introductory part, and has now become one of the potential tools for investigating the quantum characteristics of time-dependent Hamiltonian systems. By solving the Liouville-von Neumann equation of the form

(7)
we obtain the invariant operator of the system as (Choi, 2004)

(8)
where Ω is chosen to be positive from Eq. (6) and and are annihilation
and creation operators, respectively, that are given by

(9)

(10)
with

(11)
Since the system is somewhat complicated, let us develop our theory with b(t) = 0 from now on. Then, Eq. (5) just reduces to . Since the formula of Eq. (8) is very similar to the familiar Hamiltonian of the simple harmonic oscillator, we can solve its eigenvalue equation via the well-known
conventional method. The zero-point eigenstate is obtained from . Once is obtained, the nth eigenstates are also derived by acting on times. Hence we finally have (Choi, 2012)

(12)
where and Hn are Hermite polynomials.
According to the theory of the Lewis-Riesenfeld invariant, the wave functions that satisfy the Schrödinger equation are given in terms of ϕn(q, t):

(13)
where θn(t) are time-dependent phases of the wave functions. By substituting
Eq. (13) with Eq. (3) into the Schrödinger equation, we derive the phases to
be θn(t) = − (n + 1/2) η(t) where (Choi, 2012)

(14)
The probability densities in the q and p spaces are given by the squared modulus of the wave functions, i.e., ρn(q) = |ψn(q, t)|² and , respectively. From Eq. (13) and its Fourier component, we see that

(15)

(16)
where

(17)
The wave functions and the probability densities derived here will be
used in subsequent sections in order to develop the information theory of
the system.
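As a small numerical illustration of these densities, the sketch below is restricted to the constant-parameter, simple-harmonic-oscillator special case with ħ = m = ω = 1 (an assumption made only for this sketch, not the general time-varying states above) and checks that the probability densities ρn(q) = |ψn(q)|² built from the Hermite polynomials integrate to unity:

import numpy as np
from scipy.special import eval_hermite
from math import factorial, pi, sqrt

q = np.linspace(-8.0, 8.0, 4001)
dq = q[1] - q[0]

def rho_n(n, q):
    # |psi_n(q)|^2 for the simple-harmonic-oscillator limit (hbar = m = omega = 1).
    psi = eval_hermite(n, q) * np.exp(-q**2 / 2.0) / sqrt(2.0**n * factorial(n) * sqrt(pi))
    return psi**2

for n in range(4):
    print(f"n = {n}: integral of rho_n = {np.sum(rho_n(n, q)) * dq:.4f}")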

INFORMATION MEASURES FOR THERMALIZED QUANTUM OPTICAL FIELDS
Information about a physical system can be obtained from the statistical analysis of the results of a measurement performed on it. There are two important information measures: one is the Shannon information and the other is the Fisher information. The Shannon information is also called the Wehrl entropy in some of the literature (Wehrl, 1979; Pennini & Plastino, 2004) and is suitable for measuring uncertainties relevant to both quantum and thermal effects, whereas the quantum effect is overlooked in the concept of ordinary entropy. The Fisher information, which is also well known in the field of information theory, provides the extreme physical information through a potential variational principle. To manage these information measures, we start from the establishment of the density operator for the electromagnetic field equilibrated with its environment of temperature T. The density operator of the system obeys the Liouville-von Neumann equation such that (Choi et al., 2011)

(18)
Considering the fact that the invariant operator given in Eq. (8) is also established via the Liouville-von Neumann equation, we can easily construct the density operator as a function of the invariant operator. This implies that the
Hamiltonian in the density operator of the simple harmonic oscillator
should be replaced by a function of the invariant operator where
is inserted for the purpose of dimensional
consideration. Thus we have the density operator in the form

(19)
where W = y(0)Ω, Z is the partition function, β = 1/(kbT), and kb is Boltzmann’s
constant. If we consider the Fock state expression, the above equation can be expanded as

(20)
while the partition function becomes

(21)
If we consider that the coherent state is the most classical-like quantum
state, a semiclassical distribution function associated with the coherent state
may be useful for the description of information measures. As is well known,
the coherent state is obtained by solving the eigenvalue equation of :

(22)
Now we introduce the semiclassical distribution function related to the density operator via the equation (Anderson & Halliwell, 1993)

(23)
This is sometimes referred to as the Husimi distribution function (Husimi, 1940) and appears frequently in studies involving the Wigner distribution function for thermalized quantum systems. The Wigner distribution function is regarded as a quasi-distribution function because some parts of it are not positive but negative. In spite of the ambiguity in interpreting this negative value as a distribution function, the Wigner distribution function meets all requirements of both quantum and statistical mechanics, i.e., it gives correct expectation values of quantum mechanical observables. In fact, the Husimi distribution function can also be constructed from the Wigner distribution function through a mathematical procedure known as “Gaussian smearing” (Anderson & Halliwell, 1993). Since this smearing washes out the negative part, the negativity problem is resolved by Husimi's work. But it is interesting to note that a new drawback arises in that case, namely that the probabilities of different samplings of q and p, relevant to the Husimi distribution function, cannot be represented by mutually exclusive states due to the lack of orthogonality of the coherent states used, for example, in Eq. (23) (Anderson & Halliwell, 1993; Nasiri, 2005). This weak point is, however, almost fully negligible in the actual study of the system, allowing us to use the Husimi distribution function as a powerful means in the realm of semiclassical statistical physics.
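To see the effect of this Gaussian smearing concretely, the short sketch below (illustrative only, not from this chapter) convolves the Wigner function of the n = 1 Fock state, which is negative near the origin, with a Gaussian kernel of coherent-state width; the dimensionless units with ħ = 1 and the grid choices are assumptions:

import numpy as np
from scipy.signal import fftconvolve

# Phase-space grid (hbar = 1, dimensionless q and p; grid choices are illustrative).
x = np.linspace(-6.0, 6.0, 241)
dx = x[1] - x[0]
Qm, Pm = np.meshgrid(x, x)
R2 = Qm**2 + Pm**2

# Wigner function of the n = 1 Fock state: negative near the origin.
wigner = (2.0 * R2 - 1.0) * np.exp(-R2) / np.pi

# Gaussian smearing kernel (coherent-state width), normalized to unit integral.
kernel = np.exp(-R2) / np.pi

husimi = fftconvolve(wigner, kernel, mode="same") * dx * dx

print(f"min of Wigner : {wigner.min():+.4f}")    # clearly negative
print(f"min of Husimi : {husimi.min():+.4f}")    # non-negative up to numerical error
print(f"norm of Husimi: {husimi.sum() * dx * dx:.4f}")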
Notice that the coherent state can be rewritten in the form

(24)

where is the displacement operator of the form . A little algebra leads to

(25)
Hence, the coherent state is expanded in terms of Fock state wave
functions. Now using Eqs. (20) and (25), we can evaluate Eq. (23) to be

(26)
Here, we used a well known relation in photon statistics, which is

(27)
As you can see, the Husimi distribution function is strongly related to the coherent state, and it provides the necessary concepts for the establishment of both the Shannon and the Fisher informations. If we consider Eqs. (9) and
(22), α (with b(t) = 0) can be written as

(28)
Hence there is an innumerable number of α-samples that correspond to different pairs of (q, p), which need to be examined for measurement.
A natural measure of uncertainty in information theory is the Shannon
information as mentioned earlier. The Shannon information is defined as
(Anderson & Halliwell, 1993)

(29)
where d²α = dRe(α) dIm(α). With the use of Eq. (26), we easily derive it:

(30)
This is independent of time and, in the limiting case of fields propagating in time-independent media that have no conductivity, W becomes the natural frequency of light, so that this formula corresponds to that of the simple harmonic oscillator (Pennini & Plastino, 2004). It approaches IS ≃ ln[kbT/(ħW)] for sufficiently high temperature, reflecting the dominance of the thermal fluctuation and, consequently, permitting the quantum fluctuation to be neglected. On the other hand, as T decreases toward absolute zero, the Shannon information always remains no smaller than unity (IS ≥ 1). This condition is known as the Lieb-Wehrl condition because it was conjectured by Wehrl (Wehrl, 1979) and proved by Lieb (Lieb, 1978). From this we can see that IS has a lower bound which is connected with pure quantum effects. Therefore, while the usual entropy is suitable as a measure of uncertainty originating only from thermal fluctuation, IS plays the role of a more universal uncertainty measure covering both the thermal and quantum regimes (Anderson & Halliwell, 1993).
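These two limiting behaviours can be checked numerically. The closed form used below, IS = 1 - ln(1 - e^{-ħW/(kbT)}), is an assumed stand-in for Eq. (30): it is the simple-harmonic-oscillator-like expression with W playing the role of the natural frequency, consistent with the limits stated above; the value of W is arbitrary.

import numpy as np

hbar = kb = 1.0
W = 5.0                      # illustrative value of W

def shannon_info(T):
    # Assumed harmonic-oscillator-like closed form standing in for Eq. (30).
    return 1.0 - np.log(1.0 - np.exp(-hbar * W / (kb * T)))

for T in (0.1, 1.0, 10.0, 100.0, 1000.0):
    print(f"T = {T:7.1f}:  I_S = {shannon_info(T):7.3f},  ln(kb*T/(hbar*W)) = {np.log(kb * T / (hbar * W)):7.3f}")
# Low temperature: I_S -> 1 (the Lieb-Wehrl bound); high temperature: I_S grows
# like ln(kb*T/(hbar*W)), so the thermal fluctuation dominates.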
Other potential measures of information are the Fisher informations, which enable us to assess intrinsic accuracy in statistical estimation theory. Let us consider a system described by the stochastic variable α =
α(x) with a physical parameter x. When we describe a measurement of α in
order to infer x from the measurement, it is useful to introduce the coherent-
state-related Fisher information that is expressed in the form

(31)
In fact, there are many different scenarios of this information depending
on the choice of x. For a more general definition of the Fisher information,
you can refer to Ref. (Pennini & Plastino, 2004).
If we take and x = β, the Fisher information measure can be written as (Pennini & Plastino, 2004)

(32)
Since β is the parameter to be estimated here, Iβ reflects the change of
according to the variation of temperature. A straightforward calculation
yields

(33)
This is independent of time and, of course, agrees, in the limit of the simple harmonic oscillator case, with the well-known formula of Pennini and Plastino (Pennini & Plastino, 2004). Hence, the change of the electromagnetic parameters with time does not affect the value of IF,β. IF,β reduces to zero at absolute zero temperature (T → 0), in agreement with the third law of thermodynamics (Pennini & Plastino, 2007).
Another typical form of the Fisher information worth considering is the one obtained with the choice of and x = {q, p} (Pennini et al., 1998):

(34)
where σqq,α and σpp,α are the variances of q and p in the Glauber coherent state, respectively. Notice that σqq,α and σpp,α are inserted here in order to account for the weights of the two independent terms in Eq. (34). As you can see, this information is jointly determined by means of the canonical variables q and p. To evaluate this, we need

(35)

(36)
It may be favorable to represent in terms of at this stage. They are easily obtained from the inverse representation of Eqs. (9) and (10) to be
(37)

(38)
Thus with the use of these, Eqs. (35) and (36) become

(39)

(40)
A little evaluation after substituting these quantities into Eq. (34) leads
to

(41)

Notice that this varies depending on time. In case the time dependence of every electromagnetic parameter vanishes and σ → 0, this reduces to that of the simple harmonic oscillator limit, where the natural frequency ω is constant, which exactly agrees with the result of Pennini and Plastino (Pennini & Plastino, 2004).

HUSIMI UNCERTAINTIES AND UNCERTAINTY RELATIONS
The uncertainty principle is one of the intrinsic features of quantum mechanics, which distinguishes it from classical mechanics. Aside from the conventional procedure used to obtain the uncertainty relation, it may be instructive to compute a somewhat different uncertainty relation for optical waves through a complete mathematical description of the Husimi distribution function. Bearing this in mind, let us examine the uncertainties of the canonical variables, associated with information measures, and their corresponding uncertainty relation. The definitions of the uncertainties suitable for this purpose are

(42)
(43)
(44)
where with an arbitrary operator is the expectation value
relevant to the Husimi distribution function and can be evaluated from

(45)
While and for l = 1, rigorous algebra for higher orders gives

(46)

(47)

(48)
where

(49)
Thus we readily have

(50)

(51)
Like other types of uncertainties in physics, the relationship between σµ,qq and σµ,pp is rather unique, i.e., if one of them becomes large the other becomes small, and there is nothing whatever one can do about it.
We can represent the uncertainty product σµ and the generalized
uncertainty product Σµ in the form

(52)

(53)
Through the use of Eqs. (50) and (51), we get

(54)
(55)
Notice that σµ varies depending on time, while Σµ does not and has a simpler form. The relationship between σµ and the usual thermal uncertainty relation σ obtained using the method of thermofield dynamics (Choi, 2010b; Leplae et al., 1974) is given by σµ = r(β)σ, where .

ENTROPIES AND ENTROPIC UNCERTAINTY RELATIONS
The HUR is employed in many statistical and physical analyses of optical data measured from experiments. It is a mathematical outcome of nonlocal Fourier analysis (Bohr, 1928) and can be represented simply by multiplying the standard deviations of q and p together. According to the Heisenberg uncertainty principle, simultaneous prediction of q and p from measurements with high precision for both, beyond certain limits levied by quantum mechanics, is impossible. It is plausible to use the HUR as a measure of the spread when the distribution curve involves only a single hump, such as the Gaussian case. However, the HUR is known to be inadequate when the distribution of the statistical data is somewhat complicated or reveals two or more humps (Bialynicki-Birula, 1984; Majernik & Richterek, 1997).
For this reason, the EUR was suggested as an alternative to the HUR by Bialynicki-Birula and Mycielski (Bialynicki-Birula & Mycielski, 1975). To study the EUR, we start from the entropies of q and p associated with Shannon's information theory:

(56)

(57)
By executing some algebra after inserting Eqs. (15) and (16) into the
above equations, we get

(58)

(59)
where E(Hn) are entropies of Hermite polynomials of the form (Dehesa et
al, 2001)

(60)
By adding Eqs. (58) and (59) together,

(61)
we obtain the alternative uncertainty relation, the so-called EUR, such that

(62)

The EUR is always larger than or at least equal to a minimum value given by the BBM (Bialynicki-Birula and Mycielski) inequality: UE ≥ 1 + ln π ≃ 2.14473 (Haldar & Chakrabarti, 2012). Of course, Eq. (62) also satisfies this inequality. The BBM inequality gives a lower bound of the uncertainty relation, and the equality holds for the case of the simple harmonic oscillation of fields with n = 0.
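For the n = 0 Gaussian case, saturation of the BBM bound can be verified by direct numerical integration of the position and momentum entropies of Eqs. (56)-(57); the sketch below is illustrative only and assumes ħ = 1 with ground-state variances of 1/2 in both q and p.

import numpy as np

x = np.linspace(-12.0, 12.0, 40001)
dx = x[1] - x[0]
sigma2 = 0.5                                  # ground-state variance (hbar = 1 assumed)
rho = np.exp(-x**2 / (2.0 * sigma2)) / np.sqrt(2.0 * np.pi * sigma2)

S = -np.sum(rho * np.log(rho)) * dx           # differential Shannon entropy of rho
U_E = 2.0 * S                                 # q- and p-entropies coincide for n = 0
print(f"U_E = {U_E:.5f},  1 + ln(pi) = {1.0 + np.log(np.pi):.5f}")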
The EUR with evolution in time, as well as the information entropy itself, is a potential tool to demonstrate the effects of the time dependence of the electromagnetic parameters on the evolution of the system and, consequently, deserves special interest. The general form of the EUR can also be extended not only to other pairs of observables, such as photon number and phase, but also to higher dimensional systems, even up to infinite dimensions.

APPLICATION TO A SPECIAL SYSTEM


The application of the theory developed in the previous sections to a particular system may provide us with a better understanding of the information theory of the system. Let us see the case that and

(63)
where µ0[= µ(0)], h, and ω0 are real constants and |h| ≪ 1. Then, the classical
solutions of Eq. (2) are given by
(64)
(65)
where s0 is a real constant, Ceν and Seν are the Mathieu functions of the cosine and sine elliptic type, respectively, and . Figure 1 shows information measures for this system, plotted as a function of time. Whereas IS and IF,β do not vary with time, IF,{q,p} oscillates as time goes by.

Figure 1. The time evolution of IF,{q,p}. The values of (k, h) used here are (1, 0.1)
for solid red line, (3, 0.1) for long dashed blue line, and (3, 0.2) for short dashed
green line. Other parameters are taken to be ϵ0 = 1, µ0 = 1, β = 1, ħ = 1, ω0 = 5,
and s0 = 1.
In the case h → 0, the natural frequency in Eq. (2) becomes constant and W → ω. Then, Eqs. (64) and (65) become s1 = s0 cos ωt and s2 = s0 sin ωt, respectively. We can confirm in this situation that our principal results, Eqs. (30), (33), (41), (54), and (62), reduce to those of the wave described by the simple harmonic oscillator, as expected.

SUMMARY AND CONCLUSION


Information theory of optical waves traveling through arbitrary time-varying media is studied on the basis of invariant operator theory. The time-dependent Hamiltonian that gives the classical equation of motion for the time function q(t) of the vector potential is constructed. The quadratic invariant operator is obtained from the Liouville-von Neumann equation given in Eq. (7) and is used as a basic tool for developing the information theory of the system. The eigenstates ϕn(q, t) of the invariant operator are identified using the annihilation and creation operators. From these eigenstates, we are able to obtain the Schrödinger solutions, i.e., the wave functions ψn(q, t), since ψn(q, t) is merely given in terms of ϕn(q, t).

Figure 2. The Uncertainty product σµ (thick solid red line) together with σµ,qq
(long dashed blue line) and σµ,pp (short dashed green line). The values of (k, h)
used here are (1, 0.1) for (a) and (3, 0.1) for (b). Other parameters are taken to
be ϵ0 = 1, µ0 = 1, β = 1, ħ = 1, ω0 = 5, and s0 = 1.

The semiclassical distribution function is the expectation value of in the coherent state, which is the most classical-like quantum state. From Eq. (30), we see that the Shannon information does not vary with time. However, Eq. (41) shows that the Fisher information IF,{q,p} varies depending on time. It is known that the localization of the density is determined in accordance with the Fisher information (Romera et al., 2005). For this reason, the Fisher measure is regarded as a local measure, while the Shannon information is a global information measure of the spreading of the density. Local information measures vary depending on various derivatives of the probability density, whereas global information measures follow Khinchin's axioms for information theory (Pennini & Plastino, 2007; Plastino & Casas, 2011).

Figure 3. The EUR UE (thick solid red line) together with S(ρn) (long dashed
blue line) and (short dashed green line). The values of (k, h) used here are
(1, 0.1) for (a), (3, 0.1) for (b), and (3, 0.2) for (c). Other parameters are taken
to be ϵ0 = 1, µ0 = 1, ħ = 1, ω0 = 5, n = 0, and s0 = 1.
Two kinds of uncertainty products relevant to the Husimi distribution function are considered: one is the usual uncertainty product σµ and the other is the more generalized product Σµ defined in Eq. (53). While σµ varies as time goes by, Σµ is constant, and both have particular relations with their counterparts in the standard thermal state.
Fock state representations of the Shannon entropies in q- and p-space are derived and given in Eqs. (58) and (59), respectively. The EUR, which is an alternative uncertainty relation, is obtained by adding these two entropies. The EUR is more advantageous than the HUR in the context of information theory. Information theory is not only important in the modern technology of quantum computing, cryptography, and communication; its scope now extends to a wide range of emerging fields that require rigorous data analysis, such as neural systems and the human brain. Further developments of the theoretical and physical background for analyzing statistical data obtained from measurements beyond the standard formulation are necessary in order to promote the advance of such relevant sciences and technologies.

REFERENCES
1. Abe, S.; Martinez, S.; Pennini, F. & Plastino, A. (2002). The EUR for
power-law wave packets. Phys. Lett. A, Vol. 295, Nos. 2-3, pp. 74-77.
2. Anderson, A. & Halliwell, J. J. (1993). Information-theoretic measure
of uncertainty due to quantum and thermal fluctuations. Phys. Rev. D,
Vol. 48, No. 6, pp. 2753-2765.
3. Angelow, A. K. & Trifonov, D. A. (2010). Dynamical invariants and
Robertson-Schrödinger correlated states of electromagnetic field in
nonstationary linear media. arXiv:quant-ph/1009.5593v1.
4. Ash, R. B. (1990). Information Theory. Dover Publications, New York,
USA.
5. Bialynicki-Birula, I. (1984). Entropic uncertainty relations in quantum
mechanics. In: L. Accardi and W. von Waldenfels (Editors), Quantum
Probability and Applications, Lecture Notes in Mathematics 1136,
Springer, Berlin.
6. Bialynicki-Birula, I. & Mycielski, J. (1975). Uncertainty relations for
information entropy in wave mechanics. Commun. Math. Phys. Vol.
44, No. 2, pp. 129-132.
7. Bohr, N. (1928). Como Lectures. In: J. Kalckar (Editor), Niels Bohr
Collected Works, Vol. 6, North-Holand Publishing, Amsterdam, 1985.
8. Choi, J. R. (2004). Coherent states of general time-dependent harmonic
oscillator. Pramana-J. Phys., Vol. 62, No. 1, pp. 13-29.
9. Choi, J. R. (2010a). Interpreting quantum states of electromagnetic
field in time-dependent linear media. Phys. Rev. A, Vol. 82, No. 5, pp.
055803(1-4).
10. Choi, J. R. (2010b). Thermofield dynamics for two-dimensional
dissipative mesoscopic circuit coupled to a power source. J. Phys. Soc.
Japan, Vol. 79, No. 4, pp. 044402(1-6).
11. Choi, J. R. (2012). Nonclassical properties of superpositions of coherent
and squeezed states for electromagnetic fields in time-varying media.
In: S. Lyagushyn (Editor), Quantum Optics and Laser Experiments, pp.
25-48, InTech, Rijeka.
12. Choi, J. R.; Kim, M.-S.; Kim, D.; Maamache, M.; Menouar, S. &
Nahm, I. H. (2011). Information theories for time-dependent harmonic
oscillator. Ann. Phys.(N.Y.), Vol. 326, No. 6, pp. 1381-1393.

13. Choi, J. R. & Yeon, K. H. (2005). Quantum properties of light in linear media with time-dependent parameters by Lewis-Riesenfeld invariant
operator method. Int. J. Mod. Phys. B, Vol. 19, No. 14, pp. 2213-2224.
14. Choi, J. R.; Yeon, K. H.; Nahm, I. H.; Kweon, M. J.; Maamache, M.
& Menouar, S. (2012). Wigner distribution function and nonclassical
properties of schrodinger cat states for electromagnetic fields in time-
varying media. In: N. Mebarki, J. Mimouni, N. Belaloui, and K. Ait
Moussa (Editors), The 8th International Conference on Progress in
Theoretical Physics, AIP Conf. Proc. Vol. 1444, pp. 227-232, American
Institute of Physics, New York.
15. Dehesa, J. S.; Martinez-Finkelshtein, A. & Sanchez-Ruiz, J. (2001).
Quantum information entropies and orthogonal polynomials. J.
Comput. Appl. Math., Vol. 133, Nos. 1-2, pp. 23-46.
16. Deutsch, D. (1983). Uncertainty in quantum measurements. Phys. Rev.
Lett., Vol. 50, No. 9, pp. 631-633.
17. Fisher, R. A. (1925). Theory of statistical estimation. Proc. Cambridge
Philos. Soc., Vol. 22, No. 5, pp. 700-725.
18. Haldar, S. K. & Chakrabarti, B. (2012). Dynamical features of quantum
information entropy of bosonic cloud in the tight trap. arXiv:cond-mat.
quant-gas/1111.6732v5.
19. Husimi, K. (1940). Some formal properties of the density matrix. Proc.
Phys. Math. Soc. Jpn., Vol. 22, No. 4, pp. 264-314.
20. Leplae, L.; Umezawa, H. & Mancini, F. (1974). Derivation and
application of the boson method in superconductivity. Phys. Rep., Vol.
10, No. 4, pp. 151-272.
21. Lewis, H. R. Jr. (1967). Classical and quantum systems with time-
dependent harmonic-oscillator-type Hamiltonians. Phys. Rev. Lett.,
Vol. 18, No. 13, pp. 510-512.
22. Lewis, H. R. Jr. & Riesenfeld W. B. (1969). An exact quantum theory
of the time-dependent harmonic oscillator and of a charged particle in
a time-dependent electromagnetic field. J. Math. Phys., Vol. 10, No. 8,
pp. 1458-1473.
23. Lieb, E. H. (1978). Proof of an entropy conjecture of Wehrl. Commun.
Math. Phys., Vol. 62, No. 1, pp. 35-41.
24. Majernik, V. & Richterek, L. (1997). Entropic uncertainty relations.
Eur. J. Phys., Vol. 18, No. 2, pp. 79-89.

25. Malkin, I. A.; Man’ko, V. I. & Trifonov, D. A. (1970). Coherent states and transition probabilities in a time-dependent electromagnetic field.
Phys. Rev. D, Vol. 2, No. 8, pp. 1371-1384.
26. Nasiri, S. (2005). Distribution functions in light of the uncertainty
principle. Iranian J. Sci. & Tech. A, Vol. 29, No. A2, pp. 259-265.
27. Nielsen, M. A. & Chuang, I. L. (2000). Quantum Computation and
Quantum Information. Cambridge University Press, Cambridge,
England.
28. Ozawa, M. (2004). Uncertainty relations for noise and disturbance in
generalized quantum measurements. Ann. Phys., Vol. 311, No. 2, pp.
350-416.
29. Pedrosa, I. A. & Rosas, A. (2009). Electromagnetic field quantization
in time-dependent linear media. Phys. Rev. Lett., Vol. 103, No. 1, pp.
010402(1-4).
30. Pennini, F. & Plastino, A. (2004). Heisenberg-Fisher thermal uncertainty
measure. Phys. Rev. E, Vol. 69, No. 5, pp. 057101(1-4).
31. Pennini, F. & Plastino, A. (2007). Localization estimation and global
vs. local information measures. Phys. Lett. A, Vol. 365, No. 4, pp. 263-
267.
32. Pennini, F.; Plastino, A. R. & Plastino, A. (1998) Renyi entropies and
Fisher informations as measures of nonextensivity in a Tsallis setting.
Physica A, Vol. 258, Nos. 3-4, pp. 446-457.
33. Plastino, A & Casas, A. (2011). New microscopic connections of
thermodynamics. In: M. Tadashi (Editor), Thermodynamics, pp. 3-23,
InTech, Rijeka.
34. Romera, E.; Sanchez-Moreno, P. & Dehesa, J. S. (2005). The Fisher
information of single-particle systems with a central potential. Chem.
Phys. Lett., Vol. 414, No. 4-6, pp. 468-472.
35. Shannon, C. E. (1948a). A mathematical theory of communication.
Bell Syst. Tech., Vol. 27, No. 3, pp. 379-423.
36. Shannon, C. E. (1948b). A mathematical theory of communication II.
Bell Syst. Tech., Vol. 27, No. 4, pp. 623-656.
37. Wehrl, A. (1979). On the relation between classical and quantum-
mechanical entropy. Rep. Math. Phys., Vol. 16, No. 3, pp. 353-358.
38. Werner, R. F. (2004). The uncertainty relation for joint measurement of
position and momentum. Quantum Inform. Comput., Vol. 4, Nos. 6-7,
pp. 546-562.
Chapter 3

SOME INEQUALITIES IN INFORMATION THEORY USING TSALLIS ENTROPY

Litegebe Wondie and Satish Kumar


Department of Mathematics, College of Natural and Computational Science, University of
Gondar, Gondar, Ethiopia

ABSTRACT
We present a relation between Tsallis's entropy and the generalized Kerridge inaccuracy, which is called the generalized Shannon inequality and is a well-known generalization in information theory, and then give its application in coding theory. The objective of the paper is to establish a result on the noiseless coding theorem for the proposed mean code length in terms of the generalized information measure of order ξ.

Citation: Litegebe Wondie, Satish Kumar, “Some Inequalities in Information Theory Using Tsallis Entropy”, International Journal of Mathematics and Mathematical Sciences, vol. 2018, Article ID 2861612, 4 pages, 2018. https://doi.org/10.1155/2018/2861612.
Copyright: © 2018 by Authors. This is an open access article distributed under the
Creative Commons Attribution License, which permits unrestricted use, distribution,
and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION
Throughout the paper N denotes the set of natural numbers and for n ∈ N we set

(1)

where n = 2, 3, 4, . . ., to denote the set of all n-component complete and incomplete discrete probability distributions.
For we define a
nonadditive measure of inaccuracy, denoted by as

(2)
If then reduces to nonadditive entropy.

(3)
Entropy (3) was first characterized by Havrda and Charvát [1]. Later on, Daróczy [2] and Behara and Nath [3] studied this entropy. Vajda [4] also characterized this entropy for finite discrete generalized probability distributions. Sharma and Mittal [5] generalized this measure, which is known as the entropy of order α and type β. Pereira and Gur Dial [6] and Gur Dial [7] also studied the Sharma and Mittal entropy for a generalization of the Shannon inequality and gave its applications in coding theory. Kumar and Choudhary [8] also gave its application in coding theory. Recently, Wondie and Kumar [9] gave a joint representation of Renyi's and Tsallis's entropy. Tsallis [10] gave its applications in physics; for ξ → 1 it reduces to the Shannon [11] entropy

(4)
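As a small illustration, the sketch below evaluates the order-ξ entropy using the standard Havrda-Charvát/Tsallis form, (Σ p_k^ξ - 1)/(1 - ξ), which is assumed here to correspond to (3), and shows that it approaches the Shannon entropy (4) as ξ → 1; the probability vector is an arbitrary example.

import numpy as np

def tsallis_entropy(p, xi):
    # Standard Havrda-Charvat / Tsallis form of order xi (assumed to match (3)).
    p = np.asarray(p, dtype=float)
    if np.isclose(xi, 1.0):
        return -np.sum(p * np.log(p))         # Shannon limit, in nats
    return (np.sum(p ** xi) - 1.0) / (1.0 - xi)

P = [0.5, 0.25, 0.125, 0.125]                 # an arbitrary example distribution
for xi in (2.0, 1.5, 1.1, 1.01, 1.001, 1.0):
    print(f"xi = {xi:6.3f}:  H_xi = {tsallis_entropy(P, xi):.5f}")
# As xi -> 1 the values approach the Shannon entropy, about 1.21301 nats here.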
Inequality (6) has been generalized in the case of Renyi’s entropy.

FORMULATION OF THE PROBLEM


For and then an important property of Kerridge’s
inaccuracy [12] is that

(5)
Equality holds if and only if A = B. In other words, Shannon’s entropy is
the minimum value of Kerridge’s inaccuracy. If then (5)
is no longer necessarily true. Also, the corresponding inequality

(6)
is not necessarily true even for generalized probability distributions. Hence,
it is natural to ask the following question: “For generalized probability distributions, what are the quantities the minimum values of which are ?” We give below an answer to the above question separately for by dividing the discussion into two parts, (i) and (ii) . We shall also assume that n ≥ 2, because the problem is trivial for n = 1.
Case 1. Let . If then as remarked earlier (5) is
true. For it can be easily seen by using Jensen’s inequality
that (5) is true if equality in (5) holding if and only if

(7)

Case 2. Let . Since (6) is not necessarily true, we need an inequality

(8)
such that and equality holds if and only if
Since by the reverse Hölder inequality, that is, if
and are positive real numbers, then

(9)

Let and
Putting these values into (9), we get

(10)
where we used (8), too. This implies however that

(11)
Or

(12)
using (12) and the fact that ξ > 1, we get (6).
Particular Case. If ξ = 1, then (6) becomes

(13)
which is Kerridge’s inaccuracy [12].

MEAN CODEWORD LENGTH AND ITS BOUNDS


We will now give an application of inequality (6) in coding theory for

(14)
Let a finite set of n input symbols

(15)
be encoded using an alphabet of D symbols; then it has been shown by Feinstein [13] that there is a uniquely decipherable code with lengths if and only if the Kraft inequality holds; that is,

(16)

where D is the size of the code alphabet.
Furthermore, if

(17)
is the average codeword length, then for a code satisfying (16), the inequality

(18)
is also fulfilled and equality holds if and only if

(19)
and, by suitable encoding into words of long sequences, the average length can be made arbitrarily close to H(A) (see Feinstein [13]). This is Shannon's noiseless coding theorem.
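A minimal numerical sketch of these statements (illustrative only; the source probabilities and alphabet size below are arbitrary assumptions) assigns the usual Shannon code lengths, checks the Kraft inequality (16), and confirms that the average codeword length lies within one symbol of the entropy.

import math

p = [0.4, 0.3, 0.2, 0.1]                      # an arbitrary source distribution
D = 2                                         # size of the code alphabet

lengths = [math.ceil(-math.log(pk, D)) for pk in p]   # Shannon code lengths
kraft = sum(D ** (-l) for l in lengths)                # left side of the Kraft inequality
L = sum(pk * l for pk, l in zip(p, lengths))           # average codeword length
H = -sum(pk * math.log(pk, D) for pk in p)             # source entropy, base D

print(f"lengths = {lengths}, Kraft sum = {kraft:.4f} (<= 1)")
print(f"H(A) = {H:.4f}, L = {L:.4f}, H <= L < H + 1: {H <= L < H + 1}")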
By considering Renyi’s entropy (see, e.g., [14]), a coding theorem,
analogous to the above noiseless coding theorem, has been established by
Campbell [15] and the authors obtained bounds for it in terms of

(20)
Kieffer [16] defined a class of rules and showed that Hξ(A) is the best decision rule for deciding which of two sources can be coded with expected cost of sequences of length N when N → ∞, where the cost of
encoding a sequence is assumed to be a function of length only. Further, in


Jelinek [17] it is shown that coding with respect to Campbell’s mean length
is useful in minimizing the problem of buffer overflow which occurs when
the source symbol is produced at a fixed rate and the code words are stored
temporarily in a finite buffer. Concerning Campbell’s mean length the reader
can consult [15].
It may be seen that the mean codeword length (17) has been generalized parametrically by Campbell [15] and its bounds have been studied in terms of generalized measures of entropy. Here we give another generalization of (17) and study its bounds in terms of the generalized entropy of order ξ. Generalized coding theorems considering different information measures under the condition of unique decipherability were investigated by several authors; see, for instance, the papers [6, 13, 18].
An investigation is carried out concerning discrete memoryless sources possessing an additional parameter ξ, which seems to be significant in problems of storage and transmission (see [9, 16-18]).
In this section we study a coding theorem by considering a new
information measure depending on a parameter. Our motivation is, among
others, that this quantity generalizes some information measures already
existing in the literature such as Arndt [19] entropy, which is used in physics.
Definition 1. Let be arbitrarily fixed, then the
mean length corresponding to the generalized information measure
is given by the formula

(21)
where and are positive
integers so that

(22)
Since (22) reduces to the Kraft inequality when ξ = 1, it is called the generalized Kraft inequality, and codes obtained under this generalized inequality are called personal codes.

Theorem 2. Let be arbitrarily fixed. Then there exist code lengths so that

(23)
holds under condition (22) and equality holds if and only if

(24)
where and are given by (3) and (21), respectively.
Proof. First of all we shall prove the lower bound of .
By the reverse Hölder inequality, that is, if and
are positive real numbers then

(25)

Let and
Putting these values into (25), we get

(26)
where we used (22), too. This implies however that

(27)

For ξ >1, (27) becomes

(28)
using (28) and the fact that ξ >1, we get

(29)
From (24) and after simplification, we get

(30)
This implies

(31)
which gives . Then the equality sign holds in (29).
Now we will prove inequality (23) for the upper bound of .
We choose the codeword lengths in such a way that

(32)
is fulfilled for all
From the left inequality of (32), we have

(33)

multiplying both sides by and then taking the sum over k, we get the generalized inequality (22). So there exists a generalized code with code lengths

Since ξ > 1, (32) can be written as

(34)

Multiplying (34) throughout by and then summing up from k = 1 to n, we obtain inequality

(35)
Since for ξ >1, we get from (35) inequality (23).
Particular Case. For ξ → 1, (23) becomes

(36)
which is the Shannon [11] classical noiseless coding theorem

(37)
where H(A) and L are given by (4) and (17), respectively.

CONCLUSION
In this paper we prove a generalization of Shannon’s inequality for the case
of entropy of order ξ with the help of the Hölder inequality. A noiseless coding
theorem is also proved. Considering Theorem 2, we remark that the optimal code
length depends on ξ, in contrast with the optimal code lengths of Shannon,
which do not depend on a parameter. However, it is possible to prove a coding
theorem with respect to (3) such that the optimal code lengths are identical
to those of Shannon.

REFERENCES
1. J. Havrda and F. S. Charvát, “Quantification method of classification
processes. Concept of structural α-entropy,” Kybernetika, vol. 3, pp.
30–35, 1967.
2. Z. Daróczy, “Generalized information functions,” Information and
Control, vol. 16, no. 1, pp. 36–51, 1970.
3. M. Behara and P. Nath, “Additive and non-additive entropies of finite
measurable partitions,” in Probability and Information Theory II, pp.
102–138, Springer-Verlag, 1970.
4. I. Vajda, “Axioms for α-entropy of a generalized probability scheme,”
Kybernetika, vol. 4, pp. 105–112, 1968.
5. B. D. Sharma and D. P. Mittal, “New nonadditive measures of entropy
for discrete probability distributions,” Journal of Mathematical
Sciences, vol. 10, pp. 28–40, 1975.
6. R. Pereira and Gur Dial, “Pseudogeneralization of Shannon inequality
for Mittal’s entropy and its application in coding theory,” Kybernetika,
vol. 20, no. 1, pp. 73–77, 1984.
7. Gur Dial, “On a coding theorems connected with entropy of order α
and type β,” Information Sciences, vol. 30, no. 1, pp. 55–65, 1983.
8. S. Kumar and A. Choudhary, “Some coding theorems on generalized
Havrda-Charvat and Tsallis’s entropy,” Tamkang Journal of
Mathematics, vol. 43, no. 3, pp. 437–444, 2012.
9. L. Wondie and S. Kumar, “A joint representation of Renyi’s and
Tsalli’s entropy with application in coding theory,” International
Journal of Mathematics and Mathematical Sciences, vol. 2017, Article
ID 2683293, 5 pages, 2017.
10. C. Tsallis, “Possible generalization of Boltzmann-Gibbs statistics,”
Journal of Statistical Physics, vol. 52, no. 1-2, pp. 479–487, 1988.
11. C. E. Shannon, “A mathematical theory of communication,” Bell
System Technical Journal, vol. 27, no. 4, pp. 623–656, 1948.
12. D. F. Kerridge, “Inaccuracy and inference,” Journal of the Royal
Statistical Society. Series B (Methodological), vol. 23, pp. 184–194,
1961.
13. A. Feinstein, Foundations of Information Theory, McGraw-Hill, New
York, NY, USA, 1956.

14. A. Rényi, “On measures of entropy and information,” in Proceedings


of the 4th Berkeley Symposium on Mathematical Statistics and
Probability, pp. 547–561, University of California Press, 1961.
15. L. L. Campbell, “A coding theorem and Rényi’s entropy,” Information
and Control, vol. 8, no. 4, pp. 423–429, 1965.
16. J. C. Kieffer, “Variable-length source coding with a cost depending
only on the code word length,” Information and Control, vol. 41, no.
2, pp. 136–146, 1979.
17. F. Jelinek, “Buffer overflow in variable length coding of fixed rate
sources,” IEEE Transactions on Information Theory, vol. 14, no. 3, pp.
490–501, 1968.
18. G. Longo, “A noiseless coding theorem for sources having utilities,”
SIAM Journal on Applied Mathematics, vol. 30, no. 4, pp. 739–748,
1976.
19. C. Arndt, Information Measure-Information and Its Description in
Science and Engineering, Springer, Berlin, Germany, 2001.
Chapter 4

THE COMPUTATIONAL THEORY OF INTELLIGENCE: INFORMATION ENTROPY

Daniel Kovach
Kovach Technologies, San Jose, CA, USA

ABSTRACT
This paper presents an information theoretic approach to the concept
of intelligence in the computational sense. We introduce a probabilistic
framework from which computational intelligence is shown to be an entropy
minimizing process at the local level. Using this new scheme, we develop a
simple data driven clustering example and discuss its applications.

Keywords: Machine Learning, Artificial Intelligence, Entropy, Computer Science, Intelligence

Citation: Kovach, D. (2014), “The Computational Theory of Intelligence: Information Entropy”. International Journal of Modern Nonlinear Theory and Application, 3, 182-190. doi: 10.4236/ijmnta.2014.34020.
Copyright: © 2014 by authors and Scientific Research Publishing Inc. This work is
licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0.

INTRODUCTION
This paper attempts to introduce a computational approach to the study
of intelligence that the researcher has accumulated for many years. This
approach takes into account data from Psychology, Neurology, Artificial
Intelligence, Machine Learning, and Mathematics.
Central to this framework is the fact that the goal of any intelligent
agent is to reduce the randomness in its environment in some meaningful
ways. Of course, formal definitions in the context of this paper for terms like
“intelligence”, “environment”, and “agent” will follow.
The approach draws from multidisciplinary research and has many
applications. We will utilize the construct in discussions at the end of the
paper. Other applications will follow in future works. Implementations
of this framework can apply to many fields of study including General
Artificial Intelligence (GAI), Machine Learning, Optimization, Information
Gathering, Clustering, and Big Data, and extend outside of the applied
mathematics and computer science realm to even more areas including
Sociology, Psychology, and Neurology, and even Philosophy.

Definitions
One cannot begin a discussion about the philosophy of artificial intelligence
without a definition of the word “intelligence” in the first place. With the
panoply of definitions available, it is understandable that there may be
some disagreement, but typically each school of thought generally shares a
common thread. The following are three different definitions of intelligence
from respectable sources:
• “The aggregate or global capacity of the individual to act
purposefully, to think rationally, and to deal effectively with his
environment.”[1] .
• “A process that entails a set of skills of problem solving enabling
the individual to resolve genuine problems or difficulties that he
or she encounters and, when appropriate, to create an effective
product and must also entail the potential for finding or creating
problems and thereby providing the foundation for the acquisition
of new knowledge.” [2] .
• “Goal-directed adaptive behavior.” [3] .
Vernon’s hierarchical model of intelligence from the 1950s [1], and
Hawkins’ On Intelligence from 2004 [4] are some other great resources
on this topic. Consider the following working definition of this paper, with
regard to information theory and computation: Computational Intelligence
(CI) is an information processing algorithm that
• Records data or events into some type of store, or memory.
• Draws from the events recorded in memory, to make assertions,
or predictions about future events.
• Using the disparity between the predicted events and the new
incoming events, the memory structure in step 1 can be updated
such that the predictions of step 2 are optimized.
The mapping in step 3 is called learning, and is endemic to CI. Any entity
that is facilitating the CI process we will refer to as an agent, in particular
when the connotation is that the entity is autonomous. The surrounding
infrastructure that encapsulates the agent together with the ensuing events is
called the environment.

Structure
The paper is organized as follows. In Section 2 we provide a brief summary
of the concept of information entropy as it is used for our purposes. In
Section 3, we provide a mathematical framework for intelligence and
discuss its relation to entropy. Section 4 discusses the global ramifications
of local entropy minimization. In Section 5 we present a simple application
of the framework to data analytics, which is available for free download.
Sections 6 and 7 discuss relevant related research, and future work.

ENTROPY
A key concept of information theory is that of entropy, which amounts to
the uncertainty in a given random variable [5]. It is, essentially, a measure
of unpredictability (among other interpretations). The concept of entropy
is a much deeper principle of nature that penetrates to the deepest core of
physical reality and is central to physics and cosmological models [6]-[8].

Mathematical Representation
Although terms like Shannon entropy are pervasive in the field of information
theory, it will be insightful to review the formulation in our context. To arrive
at the definition of entropy, we must first recall what is meant by information
content. The information content of a random variable,


denoted I[X], is given by

I[X] = −log ℙ[X] (1)
where ℙ[X] is the probability of X. The entropy of X, denoted 𝔼[X], is then
defined as the expectation value of the information content.

𝔼[X] = ⟨I[X]⟩ (2)
Expanding using the definition of the expectation value, we have

𝔼[X] = −∑x ℙ[x] log ℙ[x] (3)

Relationship of Shannon Entropy to Thermodynamics


The concept of entropy is deeply rooted at the heart of physical reality. It is
a central concept in thermodynamics, governing everything from chemical
reactions to engines and refrigerators. The relationship of entropy as it is
known in information theory, however, is not mapped so straightforwardly
to its use in thermodynamics.
In statistical thermodynamics, the entropy S, of a system is given by

S = −kb ∑i pi ln pi (4)
where pi denote the probability of each microstate, or configuration of the
system, and kb is the Boltzmann constant which serves to map the value of
the summation to physical reality in terms of quantity and units.
The connection between the thermodynamic and information theoretic
versions of entropy relate to the information needed to detail the exact state
of the system, specifically, the amount of further Shannon information
needed to define the microscopic state of the system that remains ambiguous
when given its macroscopic definition in terms of the variables of Classical
Thermodynamics. The Boltzmann constant serves as a constant of
proportionality.

Renyi Entropy
We can extend the logic of the beginning of this section to a more general
formulation called the Renyi entropy of order α, where α ≥ 0 and α ≠ 1
defined as

ℍα[X] = (1/(1 − α)) log (∑i pi^α) (5)
Under this convention we can apply the concept of entropy more
generally to extend the utility of the concept to a variety of applications. It
is important to note that this formulation approaches the Shannon entropy in the limit as α → 1.
Although the discussions of this paper were inspired by Shannon entropy,
we wish to present a much more general definition and a bolder proposition.
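To make these quantities concrete, the following short Python sketch computes the Shannon entropy of Equations (2)-(3) and the Rényi entropy of Equation (5) for a discrete distribution. The function names, the choice of base-2 logarithms, and the example distribution are illustrative assumptions rather than part of the original presentation.

```python
import math

def shannon_entropy(p):
    # E[X] = -sum_x P[x] * log2 P[x], skipping zero-probability outcomes
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def renyi_entropy(p, alpha):
    # H_alpha = (1 / (1 - alpha)) * log2(sum_i p_i ** alpha), alpha >= 0, alpha != 1
    if abs(alpha - 1.0) < 1e-9:
        return shannon_entropy(p)  # limiting case alpha -> 1
    return math.log2(sum(pi ** alpha for pi in p if pi > 0)) / (1.0 - alpha)

p = [0.5, 0.25, 0.25]
print(shannon_entropy(p))       # 1.5 bits
print(renyi_entropy(p, 2.0))    # ~1.415 bits (collision entropy)
print(renyi_entropy(p, 1.001))  # close to the Shannon value
```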

INTELLIGENCE: DEFINITION AND ASSUMPTIONS


Consider a function 𝕀 : S → O. The function 𝕀 represents the intelligence process, a member of ℐ,
the set of all such functions. It maps input from set S to output in O. First, let

(6)
reflect the fact that 𝕀 is mapping one element from S to one element in O,
each tagged by the identifier i ∈ ℕ, which is bounded by the cardinality of
the input set. The cardinality of these two sets need not match, nor does
the mapping 𝕀 need to be bijective, or even surjective. This is an
iterative process, as denoted by the index, t. Finally, let Ot represent the
collection of .
Over time, the mapping should converge to the intended element, oi ∈ O,
as is reflected in notation by

(7)
Introduce the function

(8)
which in practice is usually some type of error or fitness measuring
function. Assuming that 𝕃t is continuous and differentiable, let the updating
mechanism at some particular t for 𝕀 be

(9)
In other words, 𝕀 iteratively updates itself with respect to the gradient
of some function, 𝕃. Additionally, 𝕃 must satisfy the following partial
differential equation

(10)
where the function d is some measure of the distance between O and
Ot, assuming such a function exists, and ρ is called the learning rate. A
generalization of this process to abstract topological spaces where such a
distance function is a commodity is an open question.
Finally, for this to qualify as an intelligence process, we must have

(11)
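As a deliberately small illustration of the update rule in (9)-(11), the Python sketch below treats the mapping as a single-parameter linear predictor and 𝕃 as a squared-error distance between predicted and target outputs; the linear model, the learning rate value, and the toy data are assumptions made only for illustration.

```python
import random

# Toy environment: the intended outputs are o = 2 * s for each input s.
samples = [(s, 2.0 * s) for s in range(1, 6)]
w = random.uniform(-1.0, 1.0)   # current state of the mapping I_t(s) = w * s
rho = 0.01                      # learning rate (rho in Eq. (10))

for t in range(200):
    # Gradient of the mean squared distance between the predictions and O
    grad = sum(2.0 * (w * s - o) * s for s, o in samples) / len(samples)
    w -= rho * grad             # Eq. (9): update against the gradient of L_t

loss = sum((w * s - o) ** 2 for s, o in samples) / len(samples)
print(w, loss)  # w approaches 2 and the distance approaches 0, as required by (11)
```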

Unsupervised and Supervised Learning


Some consideration should be given to the sets S and O. If O = P(S) where P(S)
is the power set of S, then we will say that the mapping 𝕀 is an unsupervised
mapping. Otherwise, the mapping is supervised. The ramifications of this
distinction are as follows. In supervised learning, the agent is given two
distinct sets and trained to form a mapping between them explicitly. With
unsupervised learning, the agent is tasked with learning subtle relationships
in a single data set or, put more succinctly, to develop the mapping between
S and its power set discussed above [9] [10] .

Overtraining
Further, we should note that just because we have some function 𝕀 : S → O
satisfying the definitions and assumptions of this section does not mean
that this mapping is necessarily meaningful. After all, we could make a
completely arbitrary but consistent mapping via the prescription above, and
although this would satisfy all the definitions and assumptions, it would be
complete memorization on the part of the agent. But this, in fact, is exactly the
definition of overtraining, a common pitfall in the training stage of machine
learning, which one must be very diligent to avoid.

Entropy Minimization
One final part of the framework remains, and that is to show that entropy is
minimized, as was stated at the beginning of this section. To show that, we
consider 𝕀 as a probabilistic mapping, with

(12)
indicating the probability that 𝕀 maps si ∈ S to some particular oj ∈ O.
From this, we can calculate the entropy in the mapping from S to O, at each
iteration t. If the projection [si] has N possible outcomes, then the Shannon
entropy of each si ∈ S is given by

(13)

The total entropy is simply the sum of 𝔼t[si] over i. Let , then
for the purposes of standardization across varying cardinalities, it may be
insightful to speak of the normalized entropy 𝔼t[S],

(14)

As t → ∞, the mapping from each to its corresponding oj converges,


and we have

(15)
Therefore

(16)
Further, using the definition for Renyi entropy in 5 for each t and i

(17)
To show that the Renyi entropy is also minimized, we can use an identity
involving the p-norm

(18)
and show that the log function is maximized as t → ∞ for α > 1, and mini-
mized for α < 1. The case α → 1 was shown above with the definition of
Shannon entropy. To continue, note that

(19)
since the summation is taken over all possible states oj ∈ O. But from 15,
we have

(20)
for finite t and thus the log function is minimized only as t → ∞. To show that
the Renyi entropy is also minimized for , we repeat the above logic

but note that, with the sign reversal of , the quantity is
minimized as t → ∞.
Finally, we can take a normalized sum over all i to obtain the total Renyi
entropy of . By this definition, the total entropy is minimized
along with its components.

Entropic Self Organization


In Section 3 we talked about the definitions of intelligence via the mapping
𝕀 : S → O. Here, we seek to apply the entropy minimization concept to P(S)
itself, rather than a mapping. Explicitly, σ ⊂ P(S), where

(21)
and for every s ∈ S, there is a unique element of σ containing s. That is, every
element of S has one and only one element of σ containing it. The term
entropic self-organization refers to finding the Σ ⊂ P(S) such that ℍα[σ] is
minimized over all σ satisfying (21)

(22)

GLOBAL EFFECTS
In nature, whenever a system is taken from a state of higher entropy to a
state of lower entropy, there is always some amount of energy involved in
this transition, and an increase in the entropy of the rest of the environment
greater than or equal to that of the entropy loss [6] . In other words, consider
a system S composed of two subsystems, s1 and s2, then

(23)
Now, consider that system in equilibrium at times t1 and t2, t2 > t1, and
denote the states at each by S1 and S2, respectively. Then, due to the second law
of thermodynamics,

. (24)
Therefore

(25)
Now, suppose one of the subsystems, say, s1 decreases in entropy by
some amount, Δs between t1, and t2, i.e. . Then to preserve the
inequality, the entropy of the rest of the system must be such that

. (26)
So the entropy of the rest of the system has to increase by an amount
greater than or equal to the loss of entropy in s1. This will require some
amount of energy, ΔE.
Observe that all we have done thus far is follow the natural consequences
of the Second Law of Thermodynamics with respect to our considerations
with intelligence. While the second law of thermodynamics has been verified
time and again in virtually all areas of physics, few have extended it as a
more general principle in the context of information theory. In fact, we will
conclude this section with a postulate about intelligence:
Computational intelligence is a process that locally minimizes and
globally maximizes Renyi entropy.

It should be stressed that although the above is a necessary condition of
intelligence, it is not sufficient to justify an algorithm or process as being
intelligent.

APPLICATIONS
Here, we apply the discussions of this paper to practical examples.
First, we consider a simple example of unsupervised learning; a clustering
algorithm based on Shannon entropy minimization. Next we look at some
simple behavior of an intelligent agent as it acts to maximize global entropy
in its environment.

Clustering by Entropy Minimization


Consider a data set consisting of a number of elements organized into rows.
In the experiment that follows, we consider 300 samples, each a vector from R3.
In this simple proof of concept we will group the data into like neighborhoods
by minimizing the entropy across all elements at each respective index in
the data set. This is a data driven example, so essentially we use a genetic
algorithm to perturb the juxtaposition of members of each neighborhood
until the global entropy reaches a minimum (entropic self organization),
while at the same time avoiding trivial cases such as a neighborhood with
only one element.
We leverage the Python framework heavily for this example, which is
freely available for many operating systems at [11] .
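As a rough sketch of the idea (not the author's released prototype), the Python fragment below groups points into neighborhoods by repeatedly proposing random reassignments and keeping only those that lower the summed per-coordinate Shannon entropy of the partition; a simple hill-climbing loop stands in for the genetic algorithm, and the histogram binning, cluster count, and iteration budget are assumptions.

```python
import math, random

def cluster_entropy(cluster):
    # Sum of Shannon entropies of each coordinate, estimated by a coarse histogram.
    if not cluster:
        return 0.0
    total, n = 0.0, len(cluster)
    for d in range(len(cluster[0])):
        hist = {}
        for point in cluster:
            b = round(point[d], 1)            # coarse bin of width 0.1
            hist[b] = hist.get(b, 0) + 1
        total += -sum((c / n) * math.log2(c / n) for c in hist.values())
    return total

def entropic_clustering(points, k=3, iterations=10000):
    labels = [random.randrange(k) for _ in points]
    def total_entropy():
        return sum(cluster_entropy([p for p, l in zip(points, labels) if l == c])
                   for c in range(k))
    best = total_entropy()
    for _ in range(iterations):
        i, new = random.randrange(len(points)), random.randrange(k)
        old = labels[i]
        # Skip moves that would leave a trivial (nearly empty) neighborhood.
        if new == old or sum(1 for l in labels if l == old) <= 2:
            continue
        labels[i] = new
        score = total_entropy()
        if score < best:
            best = score                      # keep the entropy-lowering move
        else:
            labels[i] = old                   # otherwise revert
    return labels
```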
Please note that this is a simple prototype, a proof of concept used to
exemplify the material in this paper. It is not optimized for latency or memory
utilization, and it has not been performance tested against other
algorithms in its comparative class, although dramatic improvements could
easily be achieved by integrating the information content of the elements
into the algorithm. Specifically, we would move elements with high
information content to clusters where that element would otherwise have
low information content. Furthermore, observe that for further efficacy, a
preprocessing layer may be beneficial, especially with topological data sets.
Nevertheless, applications of this concept applied to clustering on small and
large scales will be discussed in a future work.

We can visualize the progression of the algorithm and the final results,
respectively, in Figure 1. For simplicity, only the first two (non-noise)
dimensions are plotted. The clustering algorithm reached an 8.3% error rate
in 10,000 iterations, with an average simulation time of 480.1 seconds.
Observe that although there are a few “blemishes” in the final clustering
results, with a proper choice of parameters including the maximum
computational epochs the clustering algorithm will eventually succeed with
100% accuracy.
Also pictured in Figure 2 are the results of the clustering algorithm applied
to a data set containing four additional fields of pseudo-randomly generated
noise, each in the interval [−1,1]. The performance of this trial was worse
than the last in terms of speed, but it had about the same classification
accuracy: a 6.0% error rate in 10,000 iterations, with an average simulation
time of 1013.1 seconds.

Entropy Maximization
In our next set of examples, consider a virtual agent confined to move about
a “terrain”, represented by a three-dimensional surface, given by one of the
two following equations, each of which is plotted in Figure 3 and
defined by the following functions, respectively:

(27)
and

(28)

Figure 1. Entropic clustering algorithm results over time.

We will confine x and y such that


and note that the range of each respective surface is z ∈ [0,1]. The algorithm
proceeds as follows. First, the agent is initialized with a starting position, p0
= (x0, y0). It updates the coordinates of the agent’s position by incrementing
or decrementing by some small value, ε = (εx, εy). As the agent meanders
about the surface, data is collected as to its position on the z-axis.
If we partition the range of each surface into equally spaced intervals,
we can form a histogram H of the agent’s positional information. From this
H we can construct a discrete probability function, ℙH and thus calculate the
Renyi entropy. The agent can then use feedback from the entropy determined
using H to calculate an appropriate ε from which it upates its position, and
the cycle continues. The overall goal is to maximize its entropy, or timeout
after a predetermined number of iterations.
In this particular simulation, the agent is initialized using a “random
walk”, in which ε is chosen at random. Next, it is updated using feedback
from the entropy function.
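A minimal Python sketch of this feedback loop is given below; the stand-in surface, the step size, the Rényi order, and the greedy choice among a handful of candidate steps are all simplifying assumptions, since the paper's exact surfaces (27)-(28) and update rule are not reproduced here.

```python
import math, random

def surface(x, y):
    # Stand-in terrain with range z in [0, 1]; not the paper's surfaces (27)-(28).
    return 0.5 * (math.sin(3 * x) * math.cos(3 * y) + 1.0)

def renyi_entropy(counts, alpha=2.0):
    n = sum(counts)
    probs = [c / n for c in counts if c > 0]
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha) if n else 0.0

bins, hist = 10, [0] * 10          # histogram H over the partitioned range of z
x, y = random.uniform(-1, 1), random.uniform(-1, 1)

for step in range(5000):
    best_eps, best_h = (0.0, 0.0), -1.0
    for _ in range(8):             # try a few candidate steps epsilon = (eps_x, eps_y)
        eps = (random.uniform(-0.05, 0.05), random.uniform(-0.05, 0.05))
        z = surface(x + eps[0], y + eps[1])
        trial = hist[:]
        trial[min(int(z * bins), bins - 1)] += 1
        h = renyi_entropy(trial)
        if h > best_h:
            best_h, best_eps = h, eps
    x = max(-1.0, min(1.0, x + best_eps[0]))   # keep the agent on the confined domain
    y = max(-1.0, min(1.0, y + best_eps[1]))
    hist[min(int(surface(x, y) * bins), bins - 1)] += 1

print(hist)                        # occupancy tends toward a roughly uniform distribution
```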

From this simple set of rules, we see an emergent desire for parsimony with
respect to position on the surface, even in the less probable partitions of
z, as z → 1. As our simulation continues to run, ℙH tends to a uniform
distribution.
Figure 3 depicts a random walk on surface 1 and surface 2,
respectively, where the top and bottom right figures show surface traversal
using an entropic search algorithm.

RELATED WORKS
Although there are many approaches to intelligence from the angle of
cognitive science, few have been proposed from the computational side.
However, as of late, some great work in this area is underway.

Figure 2. Entropic clustering algorithm results over time.



Figure 3. Surfaces for hill climbing agent simulation.


Many sources claim to have computational theories of intelligence, but
for the most part these “theories” merely act to describe certain aspects of
intelligence [12] [13] . For example, Meyer in [12] suggests that performance
on multiple tasks is dependent on adaptive executive control, but makes no
claim on the emergence of such characteristics. Others discuss how data is
aggregated. This type of analysis is especially relevant in computer vision
and image recognition [13] .
The efforts in this paper seek to introduce a much broader theory of
emergence of autonomous goal directed behavior. Similar efforts are
currently under way.
Inspired by physics and cosmology, Wissner-Gross asserts autonomous
agents act to maximize the entropy in their environment [14] . Specifically
he proposes a path integral formulation from which he derives a gradient
that can be analogized as a causal force propelling a system along a gradient
of maximum entropy over time. Using this idea, he created a startup called
Entropica [15] that applies this principle in ingenious ways in a variety of
different applications, ranging from anything to teaching a robot to walk
upright, to maximizing profit potential in the stock market.
Essentially, what Wissner-Gross did was start with a global principle
and work backwards. What we did in this paper was to arrive at a similar
result from a different perspective, namely entropy minimization.

CONCLUSIONS
The purpose of this paper was to lay the groundwork for a generalization of
the concept of intelligence in the computational sense. We discussed how
entropy minimization can be utilized to facilitate the intelligence process,
and how the disparities between the agent’s prediction and the reality of
the training set can be used to optimize the agent’s performance. We also
showed how such a concept could be used to produce a meaningful, albeit
simplified, practical demonstration.
Some future work includes applying the principles of this paper to data
analysis, specifically in the presence of noise or sparse data. We will discuss
some of these applications in the next paper.
More future work includes discussing the underlying principles under
which data can be collected hierarchically, discussing how computational
processes can implement the discussions in this paper to evolve and work
together to form processes of greater complexity, and discussing the
relevance of these contributions to abstract concepts like consciousness and
self awareness.
In the following paper we will examine how information can aggregate
together to form more complicated structures, and the roles these structures
can play. More concepts, examples, and applications will follow in future
works.

REFERENCES
1. Wechsler, D. and Matarazzo, J.D. (1972) Wechsler’s Measurement and
Appraisal of Adult Intelligence. Oxford UP, New York.
2. Gardner, H. (1993) Frames of the Mind: The Theory of Multiple
Intelligences. Basic, New York.
3. Sternberg, R.J. (1982) Handbook of Human Intelligence. Cambridge
UP, Cambridge Cambridgeshire.
4. Hawkins, J. and Sandra, B. (2004) On Intelligence. Times, New York.
5. Ihara, S. (1993) Information Theory for Continuous Systems. World
Scientific, Singapore.
6. Schroeder, D.V. (2000) An Introduction to Thermal Physics. Addison
Wesley, San Francisco.
7. Penrose, R. (2011) Cycles of Time: An Extraordinary New View of the
Universe. Alfred A. Knopf, New York.
8. Hawking, S.W. (1998) A Brief History of Time. Bantam, New York.
9. Jones, M.T. (2008) Artificial Intelligence: A Systems Approach. Infinity
Science, Hingham.
10. Russell, S.J. and Norvig, P. (2003) Artificial Intelligence: A Modern
Approach. Prentice Hall/Pearson Education, Upper Saddle River.
11. (2013) Download Python. N.p., n.d. Web. 17 August 2013. http://www.
python.org/getit
12. Marr, D. and Poggio, T. (1979) A Computational Theory of Human
Stereo Vision. Proceedings of the Royal Society B: Biological Sciences,
204, 301-328. http://dx.doi.org/10.1098/rspb.1979.0029
13. Meyer, D.E. and Kieras, D.E. (1997) A Computational Theory of
Executive Cognitive Processes and Multiple-Task Performance: Part
I. Basic Mechanisms. Psychological Review, 104, 3-65.
14. Wissner-Gross, A. and Freer, C. (2013) Causal Entropic Forces.
Physical Review Letters, 110, Article ID: 168702. http://dx.doi.
org/10.1103/PhysRevLett.110.168702
15. (2013) Entropica. N.p., n.d. Web. 17 August 2013. http://www.
entropica.com
SECTION 2: BLOCK AND STREAM CODING
Chapter 5

BLOCK-SPLIT ARRAY CODING ALGORITHM FOR LONG-STREAM DATA COMPRESSION

Qin Jiancheng,1 Lu Yiqin,1 and Zhong Yu2,3

1 School of Electronic and Information Engineering, South China University of Technology, Guangdong, China
2 Zhaoqing Branch, China Telecom Co., Ltd., Guangdong, China
3 School of Software, South China University of Technology, Guangdong, China

ABSTRACT
With the advent of IR (Industrial Revolution) 4.0, the spread of sensors in
IoT (Internet of Things) may generate massive data, which will challenge
the limited sensor storage and network bandwidth. Hence, the study of
big data compression is valuable in the field of sensors. A problem is how
to compress the long-stream data efficiently with the finite memory of a sensor. To maintain the performance, traditional techniques of compression have to treat the data streams on a small and incompetent scale, which will reduce the compression ratio. To solve this problem, this paper proposes a block-split coding algorithm named “CZ-Array algorithm,” and implements it in the shareware named “ComZip.” CZ-Array can use a relatively small data window to cover a configurable large scale, which benefits the compression ratio. It is fast with the time complexity O(N) and fits the big data compression. The experiment results indicate that ComZip with CZ-Array can obtain a better compression ratio than gzip, lz4, bzip2, and p7zip in the multiple stream data compression, and it also has a competent speed among these general data compression software. Besides, CZ-Array is concise and fits the hardware parallel implementation of sensors.

Citation: Qin Jiancheng, Lu Yiqin, Zhong Yu, “Block-Split Array Coding Algorithm for Long-Stream Data Compression”, Journal of Sensors, vol. 2020, Article ID 5726527, 22 pages, 2020. https://doi.org/10.1155/2020/5726527.
Copyright: © 2020 by Authors. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION
With the advent of IR (Industrial Revolution) 4.0 and the following rapid
expanding of IoT (Internet of Things), lots of sensors are available in
various fields, which will generate massive data. The soul of IR 4.0 and IoT
with intelligent decision and control relies on these valuable data. But the
spreading sensors’ data also bring problems to the smart systems, especially
in the WSN (wireless sensor network) with precious bandwidth. Due to the
limited storage capacity and network bandwidth, GBs or TBs of data in IoT
make an enormous challenge to the sensors.
Data compression is a desirable way to reduce storage usage and speed
up network transportation. In practice, stream data are widely used to support
the large data volume which exceeds the maximum storage of a sensor. For
example, a sensor with an HD (high-definition) camera can deal with its
video data as a long stream, despite its small memory. And in most cases,
a lot of sensors in the same zone may generate similar data, which can be
gathered and compressed as a stream, and then transmitted to the back-end
cloud platform.
This paper focuses on the sensors’ compression which has strict demands
about low computing consumption, fast encoding, and energy saving. These
special demands exclude most of the unfit compression algorithms. And we
pay attention to the lossless compression because it is general. Even a lossy
compression system usually contains an entropy encoder as the terminal
unit, which depends on the lossless compression. For example, in the image
compression, DCT (discrete cosine transform) algorithm [1] needs a lossless


compressor. Some lossy compression algorithms can avoid the entropy
encoder, such as SVD (singular value decomposition) algorithm [2], but
they often consume more computation resources and energy than a lossless
compressor.
A problem is about the finite memory of each sensor under the long-
stream data. The sensors have to compress GBs of data or more, while a
sensor has only MBs of RAM (random access memory) or less. In most of
the traditional compression algorithms, the compression ratio depends on
the size of the data window, which is limited by the capacity of the RAM.
To maintain the performance, traditional techniques have to treat the data
streams on a small and incompetent scale, which will reduce the compression
ratio.
For example, the 2 MB data window cannot see the stream data out
of 2 MB at a time; thus, the compression algorithm cannot merge the data
inside and outside this window, even if they are similar and compressible.
The window scale restricts the compression ratio. Unfortunately, due to the
limited hardware performance and energy consumption of the sensors, it is
difficult to enlarge the data window for the long-stream data compression.
Moreover, multiple data streams are common in IoT, such as the dual-
camera video data, and these streams may have redundant data that ought to
be compressed. But since the small data window can see only a little part of
a stream, how can it see more streams so that they can be merged?
In our previous papers, we have designed and upgraded a compression
format named “CZ format” which can support the data window up to 1 TB
(or larger) [3, 4] and implemented it in our compression software named
“ComZip.” But the sensor’s RAM still limits the data window size. And
using flash memory to extend the data window is not good, because the
compression speed will fall evidently.
To solve the problems of long-stream data compression with limited
data window size, this paper proposes a block-split coding algorithm named
“CZ-Array algorithm” and implements it in ComZip. CZ-Array algorithm
has the following features:
• It splits the data stream into blocks and rebuilds them with the
time complexity O(N) so that the data window can cover a larger
scale to fit the big data compression

• It builds a set of matching links to speed up the LZ77 compression


algorithm [5], reducing the time complexity from O(N²) to
O(N)
• It is concise for the hardware design in the LZ77 encoder pipeline,
so that the sensors may use parallel hardware to accelerate the LZ
compression
We do some experiments on both platforms x86/64 and ARM (Advanced
RISC Machines), to compare the efficiencies of data compression among
ComZip with CZ-Array, gzip, lz4, bzip2, and p7zip. The experiment results
indicate that ComZip with CZ-Array can obtain the best compression ratio
among these general data compression software in the multiple stream data
compression, and it also has competent speed. Besides, the algorithm analysis
infers CZ-Array is concise and fits the hardware parallel implementation of
sensors.
To make further experiments, we provide 2 versions of ComZip on the
website: for Ubuntu Linux (x86/64 platform) and Raspbian (ARM platform).
Other researchers may download them from http://www.28x28.com:81/doc/
cz_array.html.
The remainder of this paper is structured as follows:
Section 2 expresses the problems of long-stream data compression for
sensors. Section 3 introduces the algorithm of CZ-Array coding. Section
4 analyzes the complexities of the CZ-Array algorithm. The experiment
results are given in Section 5. The conclusions are given in Section 6.

PROBLEMS OF LONG-STREAM DATA


COMPRESSION FOR SENSORS
Video/audio or other special sensors in IoT can generate long-stream data,
but the bottlenecks of data transportation, storage, and computation in
the networks of sensors need to be eliminated. Data compression meets
this requirement. Figure 1 shows a typical scene in a network with both
lightweight and heavy sensors, where long-stream data are generated.

Figure 1. Network of lightweight and heavy sensors.


This network has lots of lightweight nodes to sense the situation and
generate long-stream data. Since they have limited energy, storing capacity,
and computing resources, they can neither keep the data locally nor make
strong compression. Meanwhile, a few heavy nodes in the network can
gather and compress the data to reduce the long-distance bandwidth, and
then transport them to the back-end cloud platform. The cloud platform has
plenty of resources to store, decompress, and analyze the data.
In our previous papers, we have discussed the big data compression and
encryption in the heavy nodes among a network of sensors [4, 6], but if
the heavy nodes compress long-stream data, we still have problems with
the finite memory, energy consumption, speed, and compression ratio. This
paper focuses on the following problems:
• Limited by the data window size; how to cover a larger scale to
see more data streams for a better compression ratio?
• Keeping the compression ratio roughly unchanged; how to
improve the speed of the compression algorithm?
In our previous papers [3] and [4], we have shown that a larger data window can
gain a better compression ratio. In this paper, we still use the same definition
of the compression ratio as follows:

R = 1 − Dzip/D (1)
Dzip and D are the volumes of the compressed and original data,
respectively. If the original data are not compressed, R = 0. If the compressed
data are larger than the original data, R < 0. Always R < 1.
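For instance, under this definition, compressing D = 100 MB of original data into Dzip = 25 MB gives R = 1 − 25/100 = 0.75, while an expanded result of Dzip = 120 MB would give R = 1 − 120/100 = −0.2.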
The compression algorithms can merge similar data fragments within a
data window. Thus, a larger data window can see more data and have more
merging opportunities. But the data window size is limited by the capacity
of RAM. Although in Figure 1 the cloud platforms have plenty of RAM for
large data windows, a typical heavy-node sensor has only MBs of RAM at
present. Using the VM (virtual memory) to enlarge the data window is not
good enough, because the flash memory is much slower than RAM.
Moreover, a heavy node may gather data streams from different
lightweight nodes, and the streams may have similar data fragments. But
how can a data window see more than one stream to get more merging
opportunities?
For the second problem, the compression speed is important for
the sensors, especially the heavy nodes. GBs of stream data have to be
compressed in time, while the sensors’ computing resources are finite.
Cutting down the data window size to accelerate the computations is not
a good way, because the compression ratio will sink evidently, which runs
into the first problem again.
To solve the problems, we need to review the main related works around
sensors and stream data compression.
In [3] and [4], we have discussed that current mathematical models and
methods of lossless compression can be divided into 3 classes:
• The compression based on probabilities and statistics: typical
algorithms in this class are Huffman and arithmetic coding [7].
The data window size can hardly influence the speed of such a
class of compression
• To maintain the statistic data for compression, the time complexity
of Huffman coding is O(lbM) and that of traditional arithmetic
coding is O(M). M is the amount of coding symbols, such as 256
characters and the index code-words in some LZ algorithms.
O(M) is not fast enough, but current arithmetic coding algorithms
have been optimized and reached O(lbM), including that in


ComZip [3].
• The compression based on the dictionary indexes: typical
algorithms in this class are the LZ series [5], such as LZ77/LZ78/
LZSS
• To achieve the string matching, the time complexities of
traditional LZ77, LZ78, and LZSS coding are O(N²), O(N lbN),
and O(N lbN). N is the data window size. While in ComZip, we
optimize LZ77 and reach O(N) by the CZ-Array algorithm. This
paper focuses on it.
• The compression based on the order and repeat of the symbols:
typical algorithms in this class are BWT (Burrows-Wheeler
transform) [8], MTF (move-to-front) [9], and RLE (run-length
encoding).
The time complexity of traditional BWT coding is O(N² lbN), which is
too slow. But current BWT algorithms have been optimized and reached
O(N), including the CZ-BWT algorithm in ComZip [6].
Moreover, we use a new method to split the data stream into blocks
and rebuild them with the time complexity O(N). This is neither BWT
nor MTF/RLE, but it is another way to improve the compression ratio by
accommodating the data window scale. This paper focuses on it as a part of
the CZ-Array algorithm.
MTF and RLE are succinct and easy to implement, but their
compression ratios are uncompetitive in the current compression field.
To achieve better compression ratio and performance, current popular
compression software commonly combine different compression models
and methods. Table 1 lists the features of popular compression software and
ComZip.

Table 1. Features of compression software.

Software | Format | Basic algorithms | Maximum data window (support / current) | Shortages
WinZip | Deflate (LZSS) | LZSS & Huffman | 512 KB / 512 KB | Small data window; low compression ratio; weak big data support
WinRAR | RAR (LZSS) | LZSS & Huffman | 4 MB / 4 MB | Small data window; low compression ratio; weak big data support
PPMd | PPM | — | — | Good compression ratio for text data only; weak big data support
gzip | Deflate (LZ77) | LZ77 & Huffman | 32 KB / 32 KB | Small data window; low compression ratio; weak big data support
lz4 | lz4 (LZ77) | LZ77 | 64 KB / 64 KB | Small data window; low compression ratio; limited big data support
bzip2 | bz2 (BWT) | BWT & Huffman | 900 KB / 900 KB | Small BWT block; low compression ratio; weak big data support
7-zip/p7zip | LZMA (LZSS) | LZSS & arithmetic | 4 GB / 1.5 GB | Separated data windows for multithreads; limited big data support
ComZip | cz (LZ77) | BWT & LZ77 & arithmetic | 1 TB or more / 512 GB | Need larger data window for higher compression ratio

To focus on the balance of compression ratio and speed, this paper


ignores some existing methods, such as MTF, RLE, PPMd [10], and PAQ,
which have too low compression ratio or speed.
In this paper, the term “data window” refers to different data structures in
each compression algorithm. If software combines multiple data windows,
we take its bottleneck window as the focus. Table 2 shows the data windows
in typical compression algorithms.

Table 2. Data windows in different algorithms.

Algorithm | Data window | Example of window size
LZ77 | Sliding window | 32 KB (gzip)
LZSS | Sliding window & binary tree | 1 GB (p7zip)
LZ78/LZW | Dictionary trie | 4 KB (GIF)
BWT | BWT block | 900 KB (bzip2)
Huffman / Arithmetic / PPMd | Statistic table | 64 KB (order-1 context); 16 MB (order-2 context)

To enlarge the data window and improve performance, researchers try


various ways. Each compression class has a different algorithm optimization.
As mentioned above, ComZip also keeps up with the optimal algorithms of
the 3 classes, but each time we cannot focus on too many points. In [6], we
have compared the time complexities of traditional BWT, BWT with SA-
IS [11], BWT with GSACA [12], and CZ-BWT, which are current BWT
studies.
The study of compression with AI (artificial intelligence) is a hopeful
direction. Current research mostly surrounds some special fields, such as
image compression by neural networks [13–15]. Yet, it is still a problem for
AI to achieve the general-field lossless compression efficiently, especially
in the sensors with limited computing resources. This paper discusses the
general compression, and some algorithms are more practical than AI. We
may continue the study of AI algorithms as our future work.
The performance of compression is important for sensors, so the studies
of parallel computing and hardware acceleration such as ASIC (application-
specific integrated circuit) and FPGA (field-programmable gate array) [16–
18] are valuable. A hotspot of research is the GPU (graphics processing
unit) acceleration [19–21]. But as we mentioned in [4], the problem is the
parallel threads split the data window into smaller slices and then reduce the
compression ratio. Exactly, this paper cares about the data window size and
scale.
In [6], we have considered the concision of CZ-BWT for hardware
design, and this paper also considered the hardware implementation of CZ-
Array. To enlarge the data window scale, CZ-Array follows the proposition
of RAID (redundant arrays of independent disks) [22], which was previously


used for the storage capacity, the performance, and the fault tolerance. To
improve the compression speed, CZ-Array draws the reference from lz4
[17], one of the current fastest LZ algorithms based on LZ77. Both RAID
and lz4 are concise, but RAID was not used for the compression before, and
lz4 has small data windows and a low compression ratio.
We are arranging our work in data coding research. Table 3 shows
the relationship within our work. We use the same platform: the ComZip
software system. ComZip is a complex platform with unrevealed parts for
the research of various coding problems, and it is still developing. To make
the paper description clear, each time we can only focus on a few points. So
each paper shows different details, and we call them Focus A, B, C, and D.

Table 3. Relationship of data coding research on ComZip.

Paper | Similar base | Different focus
[3] | ComZip platform (shown framework) | A. Compression format for multi-level coding (including its framework)
[4] | ComZip platform (shown parallel pipeline) | B. Combined compression & encryption coding (including its parallel pipeline)
[6] | ComZip platform (shown BWT filter) | C. Fast BWT coding (including its matching link builder); hardware design
This paper | ComZip platform (shown CZ-Array units) | D. Array coding (including block array builder & matching link builder); hardware design (comparing the similar structure with [6])

Figure 2 shows these focuses within the platform ComZip. We can see
that the figures in each paper have some different appearances of ComZip
because different problems are considered. Typically, this paper focuses on
array coding, so the green arrows in Figure 2 point to the ComZip details
which are invisible in the figures of [3, 4, 6]. Besides, this paper compares
the similar structure in Focuses C and D: the Matching Link Builder (MLB).
Although these MLBs in CZ-BWT [6] and CZ-Array have different functions,
we cover them with a similar structure, which can simplify the hardware
design and save the cost.

Figure 2. Focuses on research of ComZip.

CZ-ARRAY CODING

Concepts of CZ-Array
The compression software ComZip uses the parallel pipeline named “CZ
pipeline.” We have introduced the framework of the CZ encoding pipeline
in [4, 6], and the reverse framework is the CZ decoding pipeline. Figure 3 is
the same encoding framework, and the difference is that CZ-Array units are
embedded, including
• The BAB (block array builder), which can mix multiple data
streams into a united array stream and enlarge the data window
scale
• The MLB (matching link builder), which can improve the speed
of LZ77 and BWT encoding

Figure 3. The framework of CZ encoding pipeline with CZ-Array.


CZ-Array combines the following methods to cover a larger data window
scale and speed up the compression:
• CZ-Array uses the BAB to mix multiple data streams into a united
array stream so that the multiple streams can be seen in the same
data window

As shown in Figure 4, no matter how long a data stream is, the


compression unit can only see the part in the data window. A traditional
compression algorithm in Figure 4(a) has to treat the streams serially, which
means the streams in the queue are out of the window scale and invisible. A
parallel compression algorithm in Figure 4(b) can treat multiple streams to
accelerate, but the window has to be divided into parts, and different parts
cannot see each other; thus, the window scale is shrunk.
The window scale in Figure 4(c) is not shrunk while CZ-Array treats
multiple streams in the same window. This case implies a better compression
ratio because similar parts of different streams may be seen and matched by
chance. Such a chance will not appear in Figure 4(b).
Mixing multiple streams into a united stream by BAB is fast. We got
the hint from RAID [22] and implemented the BAB in the field of lossless
compression. In BAB, streams are split into blocks, and the blocks are
rearranged in a different sequence and then transmitted into the data window
as a new stream.
• CZ-Array uses the MLB to make a BWT or LZ77 encoding
pipeline so that the encoding speed can be optimized
The MLB can create matching links before BWT or LZ77 encoding.
As shown in Figure 5, the matching links are helpful for the fast location
without byte-searching in the data stream. Figure 5(a) shows the basic
matching links, which are used in CZ-BWT. In [6], we call them “bucket
sorting links.” Figure 5(b) shows the endless matching links, which are used
in LZ77 shown in Figure 5(c).
From Figure 5(c), we can see that the reversed string “ZYXCBA…”
is the current target to be encoded. k is the basic match length. The LZ77
encoder may follow the matching link to find the maximum match length
and its location, which is much faster than simply searching the data window.
In the rest of this section, we will introduce the algorithms of making and
following the matching links.
• Both BAB and MLB are concise for the hardware design so
that the sensors can gain better compression performance by the
acceleration of FPGA/ASIC
CZ-Array algorithm is made up of several subalgorithms, such as the
block array building algorithm and the matching link building algorithm.
These 2 algorithms are concise and fit for the hardware design. We will
provide their primary hardware design in Section 4.

Figure 4. Data window scale for compression.

Figure 5. Matching link for compression (k = 3).

BAB Coding
The BAB benefits from the hint of RAID, although they are different in
detail. Figure 6 shows the simplest scenes of RAID-0 and BAB coding. In
Figure 6(a), we suppose the serial file is written to the empty RAID-0. This
file data stream is split into blocks and then written into different disks in
the RAID. In Figure 6(b), the blocks are read from these disks and then re-
arranged into a serial file.

Figure 6. RAID-0 and BAB coding (n = 4).


In Figure 6(c), we assume all data streams are endless or have the same
length. They are split into blocks, organized as a block array, and then
arranged into a united array stream in the BAB. In Figure 6(d), we see the
reversed process: a united array stream is restored into multiple data streams.
As RAID-5 can use n + 1 blocks (n data blocks and 1 parity-check block)
to obtain redundant reliability, the block array can also use m + 1 blocks (m
data blocks and 1 error-correction-code block) for the information security.
m and n are different. For example, m = 1000 and n = 4. If the compressed
data stream is changed, the error-correction-code block will be found unfit
for the data blocks. But this paper just cares about the compression ratio and
speed, so we focus on the simplest block array, like RAID-0.
In Figure 6, the RAID and BAB coding algorithms can easily determine
the proper order of data blocks because the cases are simple and the block
sizes are the same. But in practice, the data streams are more complex, such
as the multiple streams with different lengths, the sudden ending stream,
and the newly starting stream. Thus, the BAB coding algorithm has to treat
these situations.
In the BAB encoding process, we define n as the row amount of the data
block array; B as the size of a block; A as the amount of original data streams
(to be encoded); Stream[i] as the data stream with ID i (i = 0, 1, ⋯, A − 1);
Stream[i].length is the length of Stream[i] (i = 0, 1, ⋯, A − 1).
Figure 6 shows the simplest situation n = A. But in practice, the data
window size is limited, so this n cannot be too large. Otherwise, the data in the
window may be overdispersed and decrease the compression ratio. Hence, n
< A is usual, and each stream (to be encoded/decoded) has 2 status: active and
inactive. A stream under the BAB coding (especially in the block array rows)
is active, while a stream in the queue (outside of the rows) is inactive.
As shown in Figure 7, we divide the situations into 2 cases to simplify
the algorithm design:
• Case 1: Stream[i].length and A are already known before encoding.
For example, ComZip generates an XML (Extensible Markup
Language) document at the beginning of the compression. Each
file is regarded as a data stream, and the XML document stores
the files’ catalog
As shown in Figure 7(a), this case indicates all the streams are ready and
none is absent during the encoding process. So the algorithm can use a fixed
amount n to encode the streams. When an active stream ends, a new inactive
stream in the queue will connect this end. It is a seamless connection in the
same block. And in the decompression process, streams can be separated by
Stream[i].length.
• Case 2: Stream[i].length and A are unknown before encoding.
For example, the sensors send video streams occasionally. If a
sensor stops the current stream and then continues, we regard the
continued stream as a new one
As shown in Figure 7(b), this case indicates the real-time stream amount
A is dynamic, and each length is unknown until Stream[i] ends. So the
algorithm has a maximum amount Nx and maintains the dynamic amount


n. When an active stream ends, n − 1. When n < Nx, the algorithm will try
to get a new inactive stream in the queue to add to the empty array row. If it
succeeds, n + 1.

Figure 7. BAB encoding algorithm design (n = 4).


The different cases in Figure 7 lead to different block structures. A
block in Figure 7(a) contains pure data, while a block in Figure 7(b) has the
structure shown in Table 4, which has a block header before the payload data.
We design this block header to provide information for the BAB decoding
algorithm so that it can separate the streams properly.

Table 4. Block structure for Case 2.

Type Content Length


Block header Structure version 2B
Payload length (B) 3B
Stream ending tag 1b
Reserved tags 7b
Payload Payload data Fixed length (e.g., 1 MB)

To focus on the accurate problems around compression ratio and speed,


this paper skips the implementation for Case 2 and discusses the algorithms for
Case 1 only. Algorithms 1 and 2 show the BAB encoding and decoding for
Case 1.

Algorithm 1. BAB Encoding for Case 1.

Algorithm 2. BAB Decoding for Case 1.



To make the algorithm description clear, the implementing details


are omitted. For example, since each stream length is known in Case 1,
Algorithms 1 and 2 simply take the length control for granted: If the current
stream ends, the input/output procedure will get the actual data length only,
even if the fixed block length B is wanted.
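Since the pseudocode figures themselves are not reproduced above, the following Python sketch conveys the Case-1 behavior in the spirit of Algorithms 1 and 2, under simplifying assumptions: the streams are in-memory byte strings rather than real stream I/O, n equals the number of streams (no waiting queue), and the decoder knows each stream length, as the XML catalog would provide.

```python
def bab_encode(streams, B):
    # Interleave the streams block by block (RAID-0 style) into one array stream.
    out, offsets = bytearray(), [0] * len(streams)
    remaining = sum(len(s) for s in streams)
    while remaining > 0:
        for i, s in enumerate(streams):            # one block per array row
            block = s[offsets[i]:offsets[i] + B]
            offsets[i] += len(block)
            remaining -= len(block)
            out += block
    return bytes(out)

def bab_decode(array_stream, lengths, B):
    # Split the array stream back into the original streams, using the known lengths.
    outs, pos = [bytearray() for _ in lengths], 0
    while pos < len(array_stream):
        for i, L in enumerate(lengths):
            take = min(B, L - len(outs[i]))        # a short (or empty) final block
            outs[i] += array_stream[pos:pos + take]
            pos += take
    return [bytes(o) for o in outs]

streams = [b"A" * 5000, b"B" * 3000, b"C" * 7000]
coded = bab_encode(streams, B=1024)
assert bab_decode(coded, [len(s) for s in streams], B=1024) == streams
```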

MLB and LZ77 Encoding


The MLBs can accelerate BWT and LZ77 encoding. We have introduced
CZ-BWT encoding in [6], which actually includes the MLB algorithm in
its phase 1. This paper just focuses on the other MLB algorithm and LZ77
encoding with matching links.
As shown in Figure 5(b), the MLB can use a bucket array to make
endless matching links, such as the “ACO” link and “XYZ” link, and then
provide them to the LZ77 encoder. Figure 5(c) shows the example of string
matching in the LZ77 encoder. Following the “XYZ” link, we can see 2
matching points: 4 bytes matching and 5 bytes matching.
More details about these algorithms are shown in Figures 8(a) and 8(b).
There are 2 phases in our LZ77 encoding: (Figure 8(a)) building the matching
links in the MLB and (Figure 8(b)) each time following a matching link
to find the maximum matching length in LZ77 encoder. So, the outputting
length/index code-words come from these maximum matching lengths and
the corresponding matching point positions. When a matching point cannot
be found, a single character code-word is outputted.

Figure 8. LZ77 string match with matching link (k = 3).



In Figure 8(a), the data stream and its matching links can be endless. But
in practice, the data window is limited by the RAM of the sensor. While the
data stream passes through, the data window slides in the stream buffer, and
the data outside the buffer are lost, so we cannot follow the matching links
beyond the data window. Cutting the links explicitly is slow, but we can
simply check whether a link pointer has exceeded the window bound.
To locate the data window sliding on the stream buffer, we define the
“straight” string s in the window as follows:

(2)
where buf[0...M − 1] is the stream buffer (M = 2N), pos is the current position
of the data window on the stream buffer, s[i...i + 2] is a 24b integer (i = 0, 1,
⋯, N − 1), and buf[i...i + 2] is also a 24b integer (i = 0, 1, ⋯, M − 3).
In [6], we have built the matching links for CZ-BWT, which are also
called “bucket sorting links.” Now, we build the matching links for LZ77 on
buf. Figure 8 shows the example of the “XYZ” link, and we see 3 matching
points in it. The bucket array has 256^3 link headers, and all links are endless.
We define the structure of the links and their headers as follows:

(3)

(4)
k is the fixed match length of each matching point in the links. In the example
of Figure 5, k = 3 for “ACO” and “XYZ”. i is the plain position in Equations
(3) and (4). We can see that link[i] stores the distance (i.e., the relative position)
of the next matching point. Be aware that if the next matching point is outside
the window bound, link[i] stores a useless value. The algorithm can detect this
condition: let the value null = M; if i − link[i] < 0, then link[i] is expired.
Figure 8(b) shows the fast string matching by following the matching
links. We can see the reversed string “ZYXCBA…” starting in s[N] gets
2 matching points: reversed “ZYXC” and “ZYXCB,” with the matching
lengths 4 and 5.

The data stream may be very long or endless, while the stream buffer is
finite; thus, the data window sliding on this buffer will reach the edge buf[M
− 1]. We could copy the data window back to buf[0], but that is slow. So, we
treat this buffer as a “cycle” string buf[0...M − 1], which has the following
feature:
feature:
(5)
Then, Equation (2) is changed and extended into a congruent modulus
equation:

(6)
As M = 2N, the data window always occupies half of the stream buffer,
and the fresh stream data fill the other half. To simplify the algorithm
description, this paper omits the implementation details of Equation (6) and
takes for granted that Equation (5) holds.
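A minimal sketch of this cycle-buffer idea, assuming the feature in Equation (5) is simply index wrapping modulo M; the function names are illustrative only.

    def cyc(buf, i):
        # Treat buf[0...M-1] as a cycle string: index modulo M.
        return buf[i % len(buf)]

    def key24(buf, pos, i):
        # 24b key of the 3 bytes s[i...i+2] = buf[(pos+i)...(pos+i+2)] (mod M),
        # used to index the bucket array of link headers (k = 3).
        return (cyc(buf, pos + i) << 16) | (cyc(buf, pos + i + 1) << 8) | cyc(buf, pos + i + 2)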
Following the matching links is still not fast enough and needs further
acceleration. Seeking all the matching points along a link for the maximum
matching length consumes too much time, yet most of the attempts fail to
increase the matching length. So, we propose a configurable parameter sight
to limit the number of matching attempts, and we observe that LZ77 encoding
speeds up noticeably while the compression ratio decreases only slightly.
Algorithm 3 shows the MLB coding, including the matching link
building and LZ77 encoding. And we can use a common LZ77 decoding
algorithm without matching links.

Algorithm 3. MLB Coding (k = 3).
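Since the listing of Algorithm 3 is not reproduced here, the following Python sketch illustrates the two phases of Figure 8 under simplifying assumptions (a plain, non-cyclic buffer and a dictionary-based bucket array): phase (a) threads every position into the matching link of its 3-byte key, and phase (b) follows at most sight matching points of that link to pick the longest match. The names and code-word layout are illustrative, not the exact ComZip format.

    def build_links(buf, k=3):
        # Phase (a): bucket[key] holds the most recent position of each k-byte
        # string; link[i] holds the distance back to the previous occurrence.
        bucket = {}
        link = [None] * len(buf)
        for i in range(len(buf) - k + 1):
            key = bytes(buf[i:i + k])
            if key in bucket:
                link[i] = i - bucket[key]
            bucket[key] = i
        return link, bucket

    def lz77_encode(buf, link, bucket, k=3, sight=10, window=1 << 16):
        # Phase (b): follow at most `sight` matching points of the current link.
        out, i = [], 0
        while i < len(buf):
            best_len, best_pos = 0, -1
            j = bucket.get(bytes(buf[i:i + k]))
            while j is not None and j >= i:          # skip occurrences at/after i
                j = (j - link[j]) if link[j] is not None else None
            tries = 0
            while j is not None and i - j <= window and tries < sight:
                m = 0
                while i + m < len(buf) and buf[j + m] == buf[i + m]:
                    m += 1
                if m > best_len:
                    best_len, best_pos = m, j
                j = (j - link[j]) if link[j] is not None else None
                tries += 1
            if best_len >= k:
                out.append(("match", i - best_pos, best_len))   # length/index code-word
                i += best_len
            else:
                out.append(("literal", buf[i]))                 # single-character code-word
                i += 1
        return out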



ANALYSES OF CZ-ARRAY ALGORITHM


The CZ-Array algorithm includes BAB encoding/decoding and MLB
encoding. MLB decoding is unnecessary because MLB encoding only creates
matching links to accelerate LZ77 encoding. We refer to RAID-0 in the
analysis of BAB coding, and we compare LZ77 with MLB encoding against
the typical LZ algorithms in popular compression software such as
7-zip/p7zip, gzip, and lz4.

Time Complexities

BAB Coding
In practice, RAID-0 is very fast, and its algorithm is simple. The key factors
of its performance are N and n, where N is the input/output data stream length
and n is the number of disks in RAID-0. We regard n as a constant, so the time
complexity of RAID-0 encoding/decoding is O(N/n) = O(N).
As shown in Algorithms 1 and 2 and Figure 6, BAB coding is similar to
RAID-0. The difference is that BAB coding has to treat multiple streams with
different lengths, but this operation is also fast and does not influence the
performance. So the time complexity of BAB coding is also O(N), and we can
expect practical BAB coding to be as fast as RAID-0.

MLB and LZ77 Encoding


As shown in Algorithm 3 and Figure 8, we can divide the algorithm into
2 parts: Phase (a) is building the matching links, and Phase (b) is LZ77
encoding with matching links.
Phase (a) is similar to Phase 1 of CZ-BWT encoding in [6]. We have
discussed the time complexity of bucket sorting is O(N) in [6]. N is the data
window (BWT block) size. So Phase (a) also has the time complexity O(N).
With the aid of matching links, Phase (b) is faster than the traditional
LZ77 algorithm. According to the principle of LZ77 compression [5], the
key computation of LZ77 encoding is the string matching, which determines
the encoding speed. The traditional LZ77 encoding has the time complexity
O(N²) because it has to find the matching points by scanning the data
window byte by byte.
The software gzip also uses matching links to accelerate the LZ77
encoding, but each time it traces a whole link to find all matching points
in the data window, so the time complexity is still O(N²). 7-zip, by contrast,
uses binary trees for LZSS encoding. Searching a binary tree is faster than
tracing a matching link, so the time complexity of 7-zip (LZSS) encoding is
O(N lb N).
We have a faster way in Phase (b) of Algorithm 3. A configurable
parameter sight is used to control the amount of the applicable matching
points in each link. Then we need not trace the whole matching link. Take
sight = 10 for example, each time only the first 10 matching points in the link
are inspected to find their maximum matching length. In practice, we regard
sight as a constant, so the time complexity of ComZip (LZ77) encoding is
O(sight N) = O(N).
As we have investigated, lz4 is one of the fastest parallel compression
software of the LZ series. It has the top speed because each time it just
inspects a single matching point, and it makes full use of the CPU cache.
Then, we may roughly regard lz4 as a fast LZ77 with the short matching
links of sight = 1. The time complexity of lz4 encoding is also O(N).
In [6], we have discussed that bzip2 (BWT) encoding has the time complexity
O(N² lb N).

Space Complexities

BAB Coding
The memory usage is important for the sensors. As shown in Algorithms 1
and 2, BAB coding needs a block buffer and a set of arrays. The block buffer
has space complexity O(B), where B is the block size. The stream information
arrays have space complexity O(n), where n is the number of array rows.
Since n is very small, the space complexity of BAB coding is O(B), with B < N.

MLB and LZ77 Encoding


The software gzip has a 32 KB data window. It needs the RAM 10N (2N
for the dual window buffer, 4N for matching links, and 4N for the 4 × 32 KB
hash table), so the space complexity of gzip (LZ77) encoding is O(N). lz4
is multithread parallel compression software; each thread has a 64 KB data
window and needs the RAM 1.25N (N for the window buffer and 0.25N for
the 16 KB hash table), so the space complexity of lz4 encoding is O(pN),
where p is the number of parallel threads. In [6], we have discussed that bzip2
(BWT) encoding needs the RAM 2.5N, so its space complexity is O(N).
Both 7-zip and ComZip support multithread parallel compression, and
they also support the large data window at the GB level. Each thread of 7-zip

(LZSS) encoding needs the RAM 9.5 N (mainly for the binary tree). So, the
space complexity of 7-zip (LZSS) encoding is O(pN).
ComZip needs RAM for the window buffer, matching links, and link
headers. In Figure 8, the stream buffer is a dual window buffer and needs
the RAM 2N, but in ComZip we have optimized the dynamic RAM usage,
which needs 1N–2N. The matching links need the RAM 4N (supporting a 2 GB
data window) or 5N (supporting 512 GB). The link headers form a bucket
array with 256^k elements. If k = 3, a bucket array needs 64 MB of RAM
(supporting a 2 GB data window) or 80 MB (supporting 512 GB), which can
be ignored if the data window is larger than 16 MB. In summary, ComZip
(LZ77) encoding needs the RAM 5N-7N, so it has the space complexity O(N).

Compression Ratios
The LZ series of compression is based on string matching, so the compression
ratio depends on the matching opportunities and the maximum matching length
of each matching point. If a matching point falls outside the data window, it
cannot contribute to the compression ratio.
BAB coding rebuilds the data stream so that the multiple streams can
be seen in the same data window. If the streams are similar, many matching
points will benefit the compression ratio. But if the streams are completely
different, the compression ratio will slightly decrease, like the effect of
reducing the data window size. These results were observed in the data
experiments, and we may infer an experimental rule: a closer matching point
may have a larger matching length.
The data in practice are complex, so we need to do various data
experiments for this rule modeling. This paper just takes a simple example
to explain these BAB effects. According to Equation (1) and Figure 4, we
assume Dzip has the following trend:

(7)
N is the data window size. When BAB mixes n streams, the data window
size for each stream is N/n. Then, Equation (7) becomes

(8)
We regard N as a constant in this example. Then, Equation (8) becomes

(9)
Block-Split Array Coding Algorithm for Long-Stream Data Compression 99

Equation (9) implies that the compression ratio will slightly decrease if
the streams in the block array are completely different. But if the streams are
similar, as a simple example we may assume Dzip has the following trend:

(10)
When the effects of Equations (9) and (10) are combined, the result is
dominated by Equation (10), which implies that the compression ratio will
increase.
We may still use Equation (7) as an example to estimate the compression
ratio limited by the data window size, but we should also consider the
influence of the compression format. The 32 KB window of gzip is far smaller
than the 1 GB data window of 7-zip, but the length/index code-word of gzip
is short, which helps to improve the compression ratio. Still, 32 KB is so small
that we might expect the compression ratio of gzip to be the lowest in our
experiments; however, the results show that lz4 is the lowest.
lz4 is designed for high-speed parallel compression, and each compression
thread has an independent 64 KB data window. But to reach the time
complexity of O(N), each time in the LZ encoding it tries only one matching
point, which forgoes the remaining matching opportunities in the window.
In contrast, gzip tries all the matching points in its 32 KB window and also
applies Huffman coding, so its compression ratio is higher than that of lz4.
Although bzip2 uses BWT encoding instead of the LZ series, we may
still use the example of Equation (7), which is supported by the experiment
results. Because the data windows of gzip, lz4, and bzip2 are all small, their
compression ratios are uncompetitive. And if they use larger windows, their
compression speed will slow down.
7-zip and ComZip have the advantage of large data windows at the
GB level, so their compression ratios are high, and both of them support
multithread parallel compression for high speed. But we ought to consider
the RAM consumption of the multiple threads: if the RAM of a sensor is
limited, can it still support a large window for a high compression ratio?
Like lz4, 7-zip also uses independent data windows for the parallel
threads. As mentioned above, 7-zip (LZSS) encoding has space complexity
O(pN). For example, if it uses p = 16 threads and each thread uses an
N = 512 MB window, the total window size is pN = 8 GB. On the other hand,
if the RAM limits the total window size to pN = 512 MB, the independent
window for each thread is only N = 32 MB. So parallel 7-zip encoding
reduces the per-thread window N and hence the compression ratio.

ComZip shares a whole data window for all parallel threads. If the total
window size is N = 512 MB, each (LZ77) encoding thread can make full use
of the 512 MB window. So the parallel encoding of ComZip can keep the
high compression ratio.
Moreover, ComZip (LZ77) encoding has the time complexity O(N),
which implies that it has the latent capacity to enlarge the data window for a
higher compression ratio without an obvious rapid speed loss. By contrast,
7-zip (LZSS) encoding has the time complexity O(N lb N).
ComZip uses sight to limit the number of applicable matching points for
high speed and skips the remainder of each matching link. This decreases the
compression ratio slightly, but we can enlarge either N or sight to cover this
loss.
Table 5 shows the comparison of CZ-Array and other LZ/BWT encoding
of these popular compression software.

Table 5. Comparison of data encoding.

Compression encoding | Window size support | Time complexity | Space complexity | Compression ratio
gzip (LZ77)   | 32 KB  | O(N²)      | O(N)  | Very low; small data window to keep the speed
lz4 (LZ77)    | 64 KB  | O(N)       | O(pN) | Very low; small data window and single matching point for fast compression
bzip2 (BWT)   | 900 KB | O(N² lb N) | O(N)  | Low; small data window to keep the speed
7-zip (LZSS)  | 4 GB   | O(N lb N)  | O(pN) | High; independent data windows for multi-threads limit the window size
ComZip (BAB)  | 512 GB | O(N)       | O(B)  | High for similar data streams
ComZip (MLB)  | 512 GB | O(N)       | O(N)  | Matching point control for speed slightly decreases the compression ratio
ComZip (LZ77) | 512 GB | O(N)       | O(N)  | High; shared data window for multi-threads

Complexities of Hardware Design


Hardware acceleration is valuable for the sensors which have limited
computing resources. The algorithms of CZ-Array, including BAB and
MLB coding, are concise for the hardware implementation.

As shown in Figure 6, BAB and RAID coding are alike. In practice,
the hardware of the RAID controller is mature and fast, which may benefit
the BAB hardware implementation. Figure 9 shows the primary hardware
design of BAB. The encoder uses a multiplexer to mix the data streams,
and the decoder uses a demultiplexer to split the combined array stream. The
accumulators count the clock; when a block has been output, the multiplexer/
demultiplexer switches to the next data stream.

Figure 9. Primary hardware design of BAB (n = 4).


This primary design can only handle data streams with the same length;
the implementation of Algorithms 1 and 2 needs additional logic circuits for
the situation control.
Figure 10 shows the primary hardware design of MLB. To show the
design clearly, we use 3 blocks of RAM, but in practice we can use the
same block of RAM and divide it into 3 data areas by address. This is also
only the primary design, and the implementation of Algorithm 3 Phase (a)
needs additional detailed design. For example, this design cannot stop and
reset the accumulator when it goes out of the data window.

Figure 10. Primary hardware design of MLB (k = 3).



From a hardware point of view, the designs in Figures 9 and 10 are both
succinct and easy to optimize for speed.

EXPERIMENT RESULTS
In [3, 4, 6], we have done some experiments to compare ComZip with other
compression software, but they used different versions and different cases.
This paper focuses on CZ-Array and compares ComZip with p7zip (the
Linux version of 7-zip), bzip2, lz4, and gzip. These compression software
are popular, and many other compression tools have been compared against
them, so we compare ComZip only with them because the comparison with
others can be estimated.
In this paper, we use the parameter “-9” for gzip and lz4 to obtain their
maximum compression ratios. For ComZip itself, we use n = 4 to test BAB
coding and then use n = 1 to simulate ComZip without BAB; this is done
by modifying the parameter “column” in the configuration file “cz.ini”. We
also use sight = 20 and B = 1 MB by modifying the parameters “sight” and
“cluster”.
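For reference, a hypothetical cz.ini fragment matching the parameter values above might look as follows; the key=value syntax and comment style are assumptions, since only the parameter names “column”, “sight”, and “cluster” are documented here.

    column  = 4      ; n, the number of BAB array rows (1 disables BAB)
    sight   = 20     ; matching points inspected per link
    cluster = 1M     ; B, the block size (1 MB)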
We found 4 public files to simulate the data streams, and we thank the
developers of the OpenWRT project. Table 6 shows these files. They can be
downloaded from http://downloads.openwrt.org/

Table 6. Experiment data file.

File name | Data file size (B) | Method of fetch
lede-imagebuilder-17.01.0-x86-64.Linux-x86_64.tar | 128 225 280 | Download and decompress the xz file.
lede-imagebuilder-17.01.1-x86-64.Linux-x86_64.tar | 128 716 800 | Download and decompress the xz file.
lede-imagebuilder-17.01.2-x86-64.Linux-x86_64.tar | 128 788 480 | Download and decompress the xz file.
lede-imagebuilder-17.01.3-x86-64.Linux-x86_64.tar | 128 798 720 | Download and decompress the xz file.
Total | 514 529 280 |

The experiments in this paper are run on 2 hardware platforms: x86/64 and
ARM. Their performance may provide a reference for current and future
heavy sensor nodes. The operating systems of both experiment platforms
are Linux. We still provide ComZip on the website, but the port has been
changed to 81. Researchers may use it to do more experiments with new data.
It can be downloaded from http://www.28x28.com:81/doc/cz_array.html

Tests on x86/64 Platform


This platform is a common desktop computer with the following equipment:
AMD Ryzen 2700X 8-core and 16-thread CPU, 64 GB DDR4 RAM,
250 GB SSD (solid state disk) and Ubuntu Linux 16.04 (x64). We regard
this computer as a future high-end mobile sensor because the carrier’s motor
can provide enough energy. The software versions are ComZip v20180421
(64b), p7zip v16.02, bzip2 v1.0.6, lz4 v1.8.3, and gzip v1.6. And to enhance
the efficiency of matching links, ComZip uses k = 3.25 in this experiment,
which means the bucket array in Figure 8 has matching strings with 26b
instead of 24b.
Table 7 and Figure 11 show the relationship between the compressed file size
Dzip and the data window size N. From Equation (1), this is effectively the
relationship between the compression ratio and N.

Table 7. x86/64: comparison of compressed file size (B).

Data window | ComZip with BAB | ComZip without BAB | p7zip | bzip2 | lz4 | gzip
0.03 MB | — | — | — | — | — | 234 066 273
0.06 MB | — | — | — | — | 261 855 600 | —
0.1 MB | — | — | — | 231 491 898 | — | —
0.2 MB | — | — | — | 228 255 698 | — | —
0.3 MB | — | — | — | 226 771 672 | — | —
0.4 MB | — | — | — | 225 478 871 | — | —
0.5 MB | — | — | — | 224 732 694 | — | —
0.6 MB | — | — | — | 223 767 839 | — | —
0.7 MB | — | — | — | 223 097 527 | — | —
0.8 MB | — | — | — | 222 165 703 | — | —
0.9 MB | — | — | — | 221 587 446 | — | —
1 MB | 223 780 016 | 225 163 088 | 191 232 707 | — | — | —
2 MB | 127 784 120 | 220 720 560 | 186 481 526 | — | — | —
4 MB | 126 559 056 | 214 789 848 | 179 617 113 | — | — | —
8 MB | 107 222 416 | 210 012 232 | 175 085 076 | — | — | —
16 MB | 106 406 528 | 174 312 656 | 148 283 229 | — | — | —
32 MB | 104 671 480 | 167 591 040 | 132 927 341 | — | — | —
64 MB | 93 272 224 | 166 164 400 | 130 137 294 | — | — | —
128 MB | 91 735 488 | 92 899 880 | 96 912 630 | — | — | —
256 MB | 91 472 240 | 92 575 008 | 96 763 233 | — | — | —
512 MB | 91 476 872 | 92 516 000 | 79 314 280 | — | — | —

Figure 11. Compressed file size and data window size in Table 7.
Table 8 and Figure 12 show the relationship between the compression/
decompression time and N. Figure 12 hides the decompression time because
we have not optimized the decompression of ComZip yet; the current
decompression program only verifies that the compression algorithms of
ComZip are correct, so it may be slower than other software. We focus on
the compression performance first, and the optimization of decompression
for ComZip is left as future work.

Table 8. x86/64: comparison of file compression/decompression time (seconds); each cell is encode/decode time.

Data window | ComZip with BAB | ComZip without BAB | p7zip | bzip2 | lz4 | gzip
0.03 MB | — | — | — | — | — | 60/3
0.06 MB | — | — | — | — | 15/1 | —
0.1 MB | — | — | — | 41/18 | — | —
0.2 MB | — | — | — | 39/19 | — | —
0.3 MB | — | — | — | 39/19 | — | —
0.4 MB | — | — | — | 40/19 | — | —
0.5 MB | — | — | — | 40/20 | — | —
0.6 MB | — | — | — | 40/19 | — | —
0.7 MB | — | — | — | 42/19 | — | —
0.8 MB | — | — | — | 42/18 | — | —
0.9 MB | — | — | — | 42/19 | — | —
1 MB | 18/26 | 19/26 | 18/10 | — | — | —
2 MB | 18/16 | 19/25 | 18/10 | — | — | —
4 MB | 20/15 | 22/24 | 20/9 | — | — | —
8 MB | 18/14 | 19/24 | 21/9 | — | — | —
16 MB | 18/14 | 19/20 | 21/8 | — | — | —
32 MB | 18/13 | 19/19 | 32/7 | — | — | —
64 MB | 19/12 | 20/19 | 59/7 | — | — | —
128 MB | 19/11 | 16/12 | 65/5 | — | — | —
256 MB | 18/12 | 16/11 | 64/5 | — | — | —
512 MB | 15/12 | 14/11 | 128/4 | — | — | —

Figure 12. Compression time and data window size in Table 8.


In Figure 11 and Table 7, we observe that p7zip (N = 512 MB) has the
highest compression ratio R, but we ought to consider its total window size.
In this experiment, p7zip has 16 parallel compression threads, so when Table
7 shows N = 512 MB, the total window size is 8 GB. On the other hand, we
may compare the compression ratios of ComZip (N = 512 MB) and p7zip
(N = 16 MB) because their RAM consumptions are similar: ComZip (5N or
6N) vs. p7zip (16 × 9.5N).
So, we may consider that ComZip with BAB has the best compression
ratio among these software, and lz4 has the worst R owing to its small 64 KB
window and single-matching-point algorithm. We also observe that the
compression ratio of ComZip with BAB is obviously higher than that
without BAB. But when N = 128 MB or larger, ComZip achieves almost the
same compression ratio whether or not it uses BAB; the reason is that the
window is then large enough to see 2 data streams or more, which has the
same effect as BAB coding. Another special point is at N = 1 MB, where
ComZip also achieves almost the same compression ratio, because B = N
makes the blocks invisible to each other and BAB coding useless.
According to the compression speeds shown in Figure 12 and Table 8,
ComZip without BAB (N = 512 MB) is the fastest, while ComZip with BAB
(N = 512 MB) and lz4 are very close to it. Both lz4 and ComZip have a
compression speed advantage because they have the time complexity O(N).
But we observe abnormal curves showing that ComZip with a small N is
slower than with a large N; the reason is that a large N brings more matching
opportunities and decreases the total number of encoding operations.
The curve of p7zip is complex. When N < 16 MB, p7zip compression is
also fast. But when N is between 16 and 64 MB, its speed decreases rapidly,
fitting the time complexity O(N lb N). When N = 128 MB, the window can
see 2 data streams and brings many matching opportunities, which changes
the O(N lb N) trend. When N = 512 MB, the speed decreases rapidly again
because the memory requirement for the data windows (16 × 9.5N) exceeds
the RAM (64 GB) and VM (virtual memory) is used; fortunately, VM on an
SSD is faster than on an HDD (hard disk drive). By contrast, ComZip can use
64 GB RAM to build a 10 GB data window (6N).
The compression ratios and speeds of gzip and bzip2 are all uncompetitive,
so this paper simply provides their results.
Overall, the experiment results on this x86/64 platform show that CZ-Array
in ComZip has the following advantages when using large data windows:
high compression ratio and speed, a shared window, and efficient RAM
consumption. ComZip with BAB can achieve the highest compression ratio
and speed among these software, which is practical for long-stream data
compression.
Moreover, since the experimental data are virtual router image files of
OpenWRT, the results suggest that CZ-Array in ComZip also has the latent
capacity to compress large numbers of virtual machine files on cloud
platforms with good performance.

Tests on ARM Platform


This platform is a popular Raspberry Pi 3B+ with the following equipment:
ARM Cortex-A53 4-core CPU, 1 GB DDR2 RAM, 32 GB Micro SDXC (SD
eXtended Capacity), and Raspbian Linux 9. We regard this Raspberry Pi as
a current heavy node of mobile sensors which is inexpensive. The software
versions are: ComZip v20180421 (32b), p7zip v16.02, bzip2 v1.0.6, lz4
v1.8.3, and gzip v1.6. In this experiment, ComZip uses k = 3.
Table 9 and Figure 13 show the relationship of the compressed file
size Dzip and the data window size N. Table 10 and Figure 14 show the
relationship of the compression/decompression time and N.

Table 9. ARM: comparison of compressed file size (B).

Data window | ComZip with BAB | ComZip without BAB | p7zip | bzip2 | lz4 | gzip
0.03 MB | — | — | — | — | — | 234 066 467
0.06 MB | — | — | — | — | 261 738 789 | —
0.1 MB | — | — | — | 231 491 802 | — | —
0.2 MB | — | — | — | 228 255 442 | — | —
0.3 MB | — | — | — | 226 773 089 | — | —
0.4 MB | — | — | — | 225 478 012 | — | —
0.5 MB | — | — | — | 224 733 757 | — | —
0.6 MB | — | — | — | 223 766 291 | — | —
0.7 MB | — | — | — | 223 090 167 | — | —
0.8 MB | — | — | — | 222 267 771 | — | —
0.9 MB | — | — | — | 221 587 957 | — | —
1 MB | 227 955 840 | 230 038 056 | 191 232 707 | — | — | —
2 MB | 130 044 680 | 224 420 768 | 186 481 526 | — | — | —
4 MB | 128 748 280 | 218 482 480 | 179 617 112 | — | — | —
8 MB | 109 182 888 | 213 749 592 | 175 085 076 | — | — | —
16 MB | 108 328 480 | 178 654 224 | 148 283 229 | — | — | —
32 MB | 106 662 856 | 172 251 192 | 131 328 609 | — | — | —
48 MB | 97 264 744 | 171 096 888 | — | — | — | —
64 MB | 95 014 848 | 170 786 384 | 129 993 993 | — | — | —
80 MB | 94 946 696 | 165 405 752 | — | — | — | —
96 MB | Insufficient RAM | Insufficient RAM | — | — | — | —

Figure 13. Compressed file size and data window size in Table 9.

Table 10. ARM: comparison of file compression/decompression time (seconds).

Data window | ComZip with BAB | ComZip without BAB | p7zip | bzip2 | lz4 | gzip   (each cell: encode/decode time)
0.03 MB | — | — | — | — | — | 385/31
0.06 MB | — | — | — | — | 117/24 | —
0.1 MB | — | — | — | 261/98 | — | —
0.2 MB | — | — | — | 269/114 | — | —
0.3 MB | — | — | — | 281/119 | — | —
0.4 MB | — | — | — | 297/127 | — | —
0.5 MB | — | — | — | 312/125 | — | —
0.6 MB | — | — | — | 324/128 | — | —
0.7 MB | — | — | — | 338/130 | — | —
0.8 MB | — | — | — | 356/133 | — | —
0.9 MB | — | — | — | 365/132 | — | —
1 MB | 374/466 | 375/472 | 255/33 | — | — | —
2 MB | 217/269 | 363/455 | 469/47 | — | — | —
4 MB | 216/268 | 352/443 | 384/37 | — | — | —
8 MB | 221/229 | 340/431 | 295/42 | — | — | —
16 MB | 186/222 | 282/353 | 318/25 | — | — | —
32 MB | 191/224 | 280/343 | 591/39 | — | — | —
48 MB | 183/206 | 284/339 | — | — | — | —
64 MB | 186/197 | 286/348 | 943/38 | — | — | —
80 MB | 186/200 | 273/335 | — | — | — | —
96 MB | Insufficient RAM | Insufficient RAM | — | — | — | —

Figure 14. Compression time and data window size in Table 10.
This experiment is limited by the platform hardware, especially the 1 GB
RAM. When N = 96 MB, ComZip aborts due to insufficient RAM, but the
other software fare worse: only p7zip can use N = 64 MB (p = 1), where p is
the number of parallel compression threads, and p7zip can otherwise only
support N = 32 MB (p = 2) and N = 16 MB (p = 4); larger N does not work.
Figure 13 and Table 9 show that in this experiment p7zip cannot reach
N = 128 MB, so its compression ratio does not keep up with that of ComZip
with BAB. Thus, ComZip with BAB has the best compression ratio, and lz4
has the worst.
In Figure 14 and Table 10, we observe that lz4 is the fastest and ComZip
with BAB is the second fastest. We estimate the reason is that the ARM CPU
cache is small, which suits the small 64 KB data window of lz4; for ComZip,
the 4-core CPU limits the performance of the parallel CZ encoding pipeline.
The curve of p7zip is complex. When N is between 2 and 8 MB, the
larger N brings more matching opportunities and benefits the speed, and
the RAM is sufficient to support p = 4. When N = 32 MB (p = 2) and N =
64 MB (p = 1), the speed falls sharply because the number of compression
threads is reduced. Again, this paper simply provides the results of gzip and
bzip2 for comparison.
Overall, the experiment results on this ARM platform also show that
CZ-Array in ComZip has a high compression ratio and speed. Within the
limited RAM of a sensor, only ComZip with BAB among these software
can see 2 or more streams in the data window, and its shared window brings
efficient RAM consumption. On this platform, ComZip with BAB has the
largest data window, the highest compression ratio, and the second fastest
speed among these software, which is practical for long-stream data
compression.
From all of the above experiment results, we obtain support for the
advantages of CZ-Array: a large and shared window, efficient RAM
consumption, and high compression ratio and speed compared with the other
popular compression software. These results also provide references for the
performance of CZ-Array on x86/64 and ARM platforms, which suggests
that it is feasible and practical to use CZ-Array in current and future sensors.
Based on the current results, we may continue the research in the
following points as our future works:
• Analyzing the relationship between the compression ratio and the
matching opportunities
• Enhancing the MLB and LZ77 coding to obtain a better
compression ratio
• Optimizing the decompression algorithms of ComZip for better
speed
• Accelerating the encoding/decoding with GPU, ASIC or other
hardware equipment.

CONCLUSIONS
The rapid expansion of IoT leads to numerous sensors, which generate
massive data and bring challenges for data transmission and storage. A
desirable way to meet this requirement is stream data compression, which
can handle long-stream data within the limited memory of a sensor.
But the problems of long-stream compression in sensors still exist.
Owing to the limited computation resources of each sensor, enlarging the
data window without a rapid decrease of the encoding/decoding speed is one
problem. Compressing multiple data streams and seeing them in the same
window limited by a sensor's RAM is another. And if the sensor needs
hardware acceleration, the compression algorithms must be simple enough
for the hardware design.
To solve these problems, this paper presents the CZ-Array algorithm, a
block-split coding algorithm including BAB and MLB coding. CZ-Array
is implemented in the shareware ComZip. It currently supports data windows
up to 512 GB, which meets the requirements of long-stream data
compression.
The analyses indicate that CZ-Array encoding has the time complexity
O(N), which is better than that of 7-zip and keeps up with lz4, one of the
currently fastest compression software. The space complexity of CZ-Array
encoding is also O(N), which is better than those of 7-zip and lz4 in
multithread parallel compression, so the compression ratio of CZ-Array
with a large data window can be high. The primary hardware designs of BAB
and MLB suggest that hardware acceleration for CZ-Array is relatively easy
to realize.
The experiment results show that if CZ-Array in ComZip treats
multiple streams with similar data, it can obtain the best compression
ratio and the fastest or second fastest compression speed in comparison
with the popular compression software p7zip, bzip2, lz4, and gzip. These
results also provide references for the performance of CZ-Array on x86/64
and ARM platforms, suggesting that it is feasible and practical to use
CZ-Array in current and future sensors.
On the other hand, these experiment results also reveal a weakness of
CZ-Array: compared with p7zip, the compression ratio of ComZip without
BAB is not high enough, because the parameter sight limits the matching
opportunities.

ACKNOWLEDGMENTS
This paper is supported by the R&D Program in Key Areas of Guangdong
Province (2018B010113001, 2019B010137001), Guangzhou Science and
Technology Foundation of China (201802010023, 201902010061), and
Fundamental Research Funds for the Central Universities.

REFERENCES
1. G. Y. Lee, S. H. Lee, and H. J. Kwon, “DCT-based HDR exposure
fusion using multiexposed image sensors,” Journal of Sensors, vol.
2017, Article ID 2837970, 14 pages, 2017.
2. C. M. Li, K. Z. Deng, J. Y. Sun, and H. Wang, “Compressed sensing,
pseudodictionary-based, superresolution reconstruction,” Journal of
Sensors, vol. 2016, Article ID 1250538, 9 pages, 2016.
3. J. C. Qin and Z. Y. Bai, “Design of new format for mass data
compression,” The Journal of China Universities of Posts and
Telecommunications, vol. 18, no. 1, pp. 121–128, 2011.
4. J. C. Qin, Y. Q. Lu, and Y. Zhong, “Parallel algorithm for wireless data
compression and encryption,” Journal of Sensors, vol. 2017, Article
ID 4209397, 11 pages, 2017.
5. J. Ziv and A. Lempel, “A universal algorithm for sequential data
compression,” in IEEE Transactions on Information Theory, vol. 23,
no. 3, pp. 337–343, 1977.
6. J. C. Qin, Y. Q. Lu, and Y. Zhong, “Fast algorithm of truncated Burrows-
Wheeler transform coding for data compression of sensors,” Journal of
Sensors, vol. 2018, Article ID 6908760, 17 pages, 2018.
7. M. Alistair, M. N. Radford, and H. W. Ian, “Arithmetic coding
revisited,” ACM Transactions on Information Systems, vol. 16, no. 3,
pp. 256–294, 1998.
8. M. Burrows and D. J. Wheeler, A Block-Sorting lossless Data
Compression Algorithm, DIGITAL System Research Center, 1994.
9. J. L. Bentley, D. D. Sleator, R. E. Tarjan, and V. K. Wei, “A locally
adaptive data compression scheme,” Communications of the ACM, vol.
29, no. 4, pp. 320–330, 1986.
10. M. Alistair, “Implementing the PPM Data Compression Scheme,” in
IEEE Transactions on Communications, vol. 38, no. 11, pp. 1917–
1921, 1990.
11. N. Timoshevskaya and W. C. Feng, “SAIS-OPT: on the characterization
and optimization of the SA-IS algorithm for suffix array construction,”
in 4th IEEE International Conference on Computational Advances in
Bio and Medical Sciences (ICCABS), pp. 1–6, Florida, USA, 2014.
12. U. Baier, Linear-time Suffix Sorting - a New Approach for Suffix Array
Construction, Master Thesis at Ulm University, 2015.

13. F. Hussain and J. Jeong, “Efficient deep neural network for digital
image compression employing rectified linear neurons,” Journal of
Sensors, vol. 2016, Article ID 3184840, 7 pages, 2016.
14. G. L. Sicuranza, G. Ramponi, and S. Marsi, “Artificial neural network
for image compression,” Electronics Letters, vol. 26, no. 7, pp. 477–
479, 1990.
15. J. Jiang, “Image compression with neural networks - a survey,” Signal
Processing: Image Communication, vol. 14, no. 9, pp. 737–760, 1999.
16. P. C. Tseng, Y. C. Chang, Y. W. Huang, H. C. Fang, C. T. Huang, and
L. G. Chen, “Advances in hardware architectures for image and video
coding – a survey,” in Proceedings of the IEEE, vol. 93, no. 1, pp.
184–197, 2005.
17. S. M. Lee, J. H. Jang, J. H. Oh, J. K. Kim, and S. E. Lee, “Design
of hardware accelerator for Lempel-Ziv 4 (LZ4) compression,” IEICE
Electronics Express, vol. 14, no. 11, pp. 1–6, 2017.
18. U. I. Cheema and A. A. Khokhar, “High performance architecture for
computing Burrows-Wheeler transform on FPGAs,” in Proceedings of
the International Conference on Reconfigurable and FPGAs, pp. 1–6,
Cancun, Mexico, 2013.
19. S. Funasaka, K. Nakano, and Y. Ito, “Adaptive loss-less data
compression method optimized for GPU decompression,” Concurrency
and Computation-practice & Experience, vol. 29, no. 24, 2017.
20. A. Ozsoy and M. Swany, “CULZSS: LZSS lossless data compression
on CUDA,” in 2011 IEEE International Conference on Cluster
Computing, pp. 403–411, Austin, TX, USA, 2011.
21. M. Deo and S. Keely, “Parallel suffix array and least common prefix
for the GPU,” ACM SIGPLAN Notices, vol. 48, no. 8, pp. 197–206,
2013.
22. D. A. Patterson, P. Chen, G. Gibson, and R. H. Katz, “Introduction
to redundant arrays of inexpensive disks (RAID),” in IEEE Computer
Society International Conference: Intellectual Leverage, pp. 112–117,
San Francisco, CA, USA, 1989.
Chapter 6

BIT-ERROR AWARE LOSSLESS IMAGE COMPRESSION WITH 2D-LAYER-BLOCK CODING

Jungan Chen1, Jean Jiang2, Xinnian Guo3, and Lizhe Tan4


1 Department of Electronic and Computer Science, Zhejiang Wanli University, Ningbo, China
2 College of Technology, Purdue University Northwest, Indiana, USA
3 Department of Electronic Information Engineering, Huaiyin Institute of Technology, Huaian, China
4 Department of Electrical and Computer Engineering, Purdue University Northwest, Indiana, USA

ABSTRACT
With the development of IoT, it has become more common for image data
to be transmitted via wireless communication systems. If bit errors occur
during transmission, the recovered image may become useless.
Citation: Jungan Chen, Jean Jiang, Xinnian Guo, Lizhe Tan, “Bit-Error Aware Loss-
less Image Compression with 2D-Layer-Block Coding”, Journal of Sensors, vol. 2021,
Article ID 7331459, 18 pages, 2021. https://doi.org/10.1155/2021/7331459.
Copyright: © 2021 by Authors. This is an open access article distributed under the
Creative Commons Attribution License, which permits unrestricted use, distribution,
and reproduction in any medium, provided the original work is properly cited.

To solve this problem, a bit-error aware lossless image compression based
on bi-level coding has been proposed for gray image compression. However,
bi-level coding does not consider the inherent statistical correlation in the
2D context region. To address this shortcoming, a novel variable-size
2D-block extraction and encoding method with built-in bi-level coding is
developed for color images to decrease the entropy of the information and
improve the compression ratio. A lossless color transformation from RGB to
the YCrCb color space is used for the decorrelation of color components. In
particular, a layer-extraction method is proposed to preserve the Laplacian
distribution of the data in 2D blocks, which is suitable for bi-level coding.
In addition, optimization of the 2D-block start bits is used to improve the
performance. To evaluate the proposed method, many experiments are
conducted, including comparisons with state-of-the-art methods and studies
of the effects of different color spaces. The comparison experiments under
a bit-error environment show that the average compression rate of our
method is better than those of bi-level, JPEG 2000, WebP, FLIF, and L3C
(a deep learning method) with Hamming code, while our method achieves
the same image quality as the bi-level method. Other experiments illustrate
the positive effects of built-in bi-level encoding and encoding with zero-mean
values, which maintain high image quality. Finally, the results on the decrease
of entropy and the procedure of our method are given and discussed.

INTRODUCTION
With cloud computing and Internet of Things (IoT) development, the
requirement for data transmission and storage is increasing. Fast and efficient
compression of data plays a very important role in many applications. For
instance, image data compression has been used in many areas such as
medical, satellite remote sensing, and multimedia.
There are many methods to compress image data including prediction-
based, transformation-based, and other methods such as fractal image
compression and deep learning with Auto Encoder (AE) [1, 2], Recurrent
Neural Network (RNN), Convolutional Neural Network (CNN) [3], and
Residual Neural Network (ResNet) [4]. The transformation-based method
includes Discrete Cosine Transform (DCT), Karhunen-Loeve Transform
(KLT), Hadamard transform, Slant transform, Haar transform, and singular
value decomposition [5]. Usually, transformation-based or deep learning
methods are used in lossy compression while prediction-based methods are
used for lossless compression.

In some cases, lossless compression must be applied when data
acquisition is expensive. For example, lossless image compression must
be applied to aerial, medical, and space images [6, 7]. In industry, many
engineered lossless compression methods are used, including Portable
Network Graphics (PNG), WebP [8], and Free Lossless Image Format
(FLIF) [9]. Some deep learning-based lossless compression methods [10–12]
have also been researched recently. As one classical approach to lossless
compression, the prediction-based method takes into account the difference
between pixel values and their predicted values, which are generally smaller
numbers than the pixel values themselves; thus, each difference value needs
a smaller number of bits to encode [13]. It mainly has three kinds of methods:
context-based, least-squares (LS)-based, and spatial structure-based. Among
these methods, the spatial structure-based method with a 2D context region
is an effective solution to improve the compression ratio (CR) because it
considers the inherent statistical correlation using block-based methods such
as quadtree-based blocks [14], reference blocks [15], template matching [16],
and hierarchical decomposition [17]. Quadtree-based block and hierarchical
decomposition methods split the image into many subimages, and the
reference block method considers the phenomenon that a physical object is
constructed from a number of structural components. Inspired by these
methods, splitting the image into many blocks, each with similar color, is
taken as an effective approach in this work.
With IoT development, it has become more usual for image data to be
transmitted through wireless communication systems, and lossless image
compression is used to improve transmission throughput. However, if bit
errors occur in a noisy wireless channel during transmission, the recovered
image will be damaged or become useless. So, lossless image compression
must resolve this problem and keep the recovered image useful. Most
methods, including engineered lossless compression methods and deep
learning-based methods, are not suitable for transmission over a noisy
channel; to the best of our knowledge, little research has addressed this case
except our previous work [7]. By protecting the key information bits with
error control coding, that work proposed a bit-error aware lossless image
compression based on bi-level coding for the gray image treated as a
one-dimensional signal. In the coding method, only the linear predictive
bi-level block coding parameters are encoded using (7,4) Hamming codes,
and the residue sequences are left as they are to improve the compression
rate (CR). One reason for the efficiency of bi-level coding is that it exploits
the sparsity property of the data, which requires fewer encoding bits.
In this work, we use bi-level coding [7] for natural images with red (R),
green (G), and blue (B) components. As R, G, and B are highly correlated,
a linear transformation is applied to map RGB to another color space and
achieve a better CR [17, 18]. As discussed above, the spatial structure-based
method with a 2D context region is taken as an effective solution to improve
CR. Therefore, the image is split into many 2D blocks, which have the
sparsity property and are suitable for encoding with bi-level coding. Finally,
a novel variable-size 2D-block extraction and encoding method with built-in
bi-level coding is proposed to improve CR for color images and to be robust
in a bit-error environment. An important 2D-layer-block extraction method
is used to split the image into many 2D blocks with similar color and to keep
the Laplacian or Gaussian distribution of the data in each 2D block, which
has the sparsity property.
The contributions of this paper are summarized as follows:
• For color image compression, a lossless color transformation
from RGB to the YCrCb color space is used for the decorrelation
of color components. The prediction-based method is used to
remove data correlation and produce the residue sequence.
• To keep the data distribution with the sparsity property suitable
for bi-level coding, a novel 2D-layer-block extraction method is
proposed to keep the Laplacian or Gaussian distribution of the data
in 2D blocks. Furthermore, by rearranging the order of the encoded
data, the extraction method can decrease the entropy of the data
and improve CR.
• A novel variable-size 2D-block encoding method with built-in
bi-level coding is proposed to improve CR and remain robust in a
bit-error environment, just as the bi-level coding method does. The
mean or min value of each 2D block and the key information bits
in built-in bi-level coding are protected with Hamming code, so the
image can be recovered and remain useful.
The rest of this paper is organized as follows. In Section 2, related
work on lossless compression is discussed. In Section 3, the details of the
proposed method are briefly introduced. In Section 4, several experiments
including comparisons and analysis of the basic principles are conducted.
Finally, the conclusion and future research are given in Section 5.

RELATED WORK ON LOSSLESS COMPRESSION

Prediction-Based Methods
The context-based adaptive prediction method is based on a static predictor
which is usually a switching predictor able to adapt to several types of
contexts, like horizontal edge, vertical edge, or smooth area. Many static
predictors can be found in [6, 19]. Median edge detector (MED) used in
LOCO-I uses only three causal pixels to determine a type of pixel area which
is currently predicted [20]. LOCO-I is further improved and standardized
as the JPEG-LS lossless compression algorithm, which has eight different
predictive schemes including three one-dimensional and four tow-
dimensional predictors [21]. To detect edges, Gradient Adjusted Predictor
(GAP) embedded in the CALIC algorithm uses local gradient estimation
and three heuristic-defined thresholds [22]. Gradient edge detection (GED)
predictor combines simplicity of MED and efficiency of GAP [23]. In [19],
the prediction errors are encoded using codes adaptively selected from the
modified Golomb-Rice code family. To enable processing of images with
higher bit depths, a simple context-based entropy coder is presented [6].
LS-based optimization is proposed as an approach to accommodate
varying statistics of coding images. To reduce computational complexity,
edge-directed prediction (EDP) initiates the LS optimization process only
when the prediction error is beyond a preselected threshold [24]. In [25],
the LS optimization is processed only when the coding pixel is around an
edge or when the prediction error is large. And a switching coding scheme
is further proposed that combines the advantages of both run-length and
adaptive linear predictive coding [26]. Minimum Mean Square Error
(MMSE) predictor uses least mean square principle to adapt k-order linear
predictor coefficients for optimal prediction of the current pixel, from a fixed
number of m causal neighbors [27]. The paper [28] presents a lossless coding
method based on blending approach with a set of 20 blended predictors, such
as recursive least squares (RLS) predictors and Context-Based Adaptive
Linear Prediction (CoBALP+).
Although individual prediction is favored, the morphology of the 2D context
region would accordingly be disrupted, and the inherent statistical correlation
among the correlated region becomes obscure. As an alternative, spatial structure
has been considered to compensate the pixel-wise prediction [29]. In [14],
quadtree-based variable block-size partitioning is introduced into the
adaptive prediction technique to remove spatial redundancy in a given

image and the resulting prediction errors are encoded using context-adaptive
arithmetic coding. Inspired by the success of prediction by partial matching
(PPM) in sequential compression, the paper [30] introduces the probabilistic
modeling of the encoding symbol based on its previous context occurrences.
In [15], superspatial structure prediction is proposed to find an optimal
prediction of the structure components, e.g., edges, patterns, and textures,
within the previously encoded image regions instead of the spatial causal
neighborhood. The paper [17] presents a lossless color image compression
algorithm based on the hierarchical prediction and context-adaptive
arithmetic coding. By exploiting the decomposition and combinatorial
structure of the local prediction task and making the conditional prediction
with multiple max-margin estimation in a correlated region, a structured
set prediction model with max-margin Markov networks is proposed [29].
In [16], the image data is treated as an interleaved sequence generated by
multiple sources and a new linear prediction technique combined with
template-matching prediction and predictor blending method is proposed.
Our method uses a variable-size 2D-block extraction and encoding method
with built-in bi-level to improve the compression rate.

Engineered Lossless Compression Algorithms


PNG removes redundancies from the RGB representation with autoregressive
filters, and then the deflate algorithm, based on the LZ77 algorithm and
Huffman coding, is used for data compression. Lossless WebP compression
uses many types of transformation, including spatial transformation,
color transformation, green subtraction transformation, color indexing
transformation, and color cache coding, and then performs entropy
coding with a variation of LZ77 Huffman coding [8]. FLIF uses Adam7
interlacing and YCoCg interleaving to traverse the image and performs
entropy coding with “meta-adaptive near-zero integer arithmetic coding”
(MANIAC), which is based on context-adaptive binary arithmetic coding
(CABAC) [9].

Deep Learning-Based Lossless Compression


Huffman coding, arithmetic coding, and asymmetric numeral systems are the
algorithms for implementing lossless compression, but they do not cater for
latent variable models, so bits back with asymmetric numeral systems (BB-
ANS) was proposed to solve this issue [10]. BB-ANS becomes inefficient
when the number of latent variables grows; to improve its performance on
hierarchical latent variable models, Bit-Swap was proposed [11]. In contrast
to these works, which focus on smaller datasets, a fully parallel hierarchical
probabilistic model (termed L3C) was proposed to enable practical
compression of super-resolution images [12].

OUR PROPOSED METHOD


The proposed method is shown in Figure 1. In color image data, R, G, and
B are highly correlated, so their straightforward encoding is not efficient.
Therefore, a linear transformation from RGB to the YCrCb color space
is used for the decorrelation of color components [31]. To approximate
the original color transform well enough, the lossless color transform
with equations (1) and (2) in [32] is adopted in our algorithm. As mentioned
in [7, 19], the prediction residues have reduced amplitudes and are assumed
to be statistically independent with an approximate Laplacian distribution.
Therefore, the predictor in Figure 1 is employed to further remove data
correlation in the Y, Cb, and Cr channels, respectively. The predicted value
Xp can be obtained with equation (3), where Ap, Bp, and Cp are pixel values
whose locations are illustrated in Figure 2. After the prediction step,
variable-size 2D blocks are extracted, and key information about the blocks
is encoded with Hamming code. Finally, these 2D blocks are separately
encoded with built-in bi-level coding to make use of the sparsity property
of the Laplacian distribution, achieve better signal quality, and be robust to bit
errors [7].

(1)

(2)

(3)

Figure 1. The proposed method.

Figure 2. Neighboring pixels for the predictor.
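Equation (3) is not reproduced above, so as an illustration only, the sketch below uses the classic median-edge-detector (MED) form over the three causal neighbors Ap, Bp, and Cp (one common choice for such predictors; see the MED discussion in Section 2). Whether this is exactly the predictor of equation (3), and the assignment of Ap/Bp/Cp to the left, upper, and upper-left neighbors, are assumptions.

    def predict_xp(A, B, C):
        # MED-style prediction from the left (A), upper (B), and upper-left (C)
        # neighbors; assumed form, for illustration only.
        if C >= max(A, B):
            return min(A, B)
        if C <= min(A, B):
            return max(A, B)
        return A + B - C

    def residue(x, A, B, C):
        # Prediction residue passed on to 2D-block extraction.
        return x - predict_xp(A, B, C)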


The procedure of 2D-block extraction and encoding is further shown in
Figure 3. The 2D-layer-block extraction method is used to keep the Laplacian
or Gaussian distribution of the data in Layer-1~n 2D blocks, which have the
sparsity property and are suitable for bi-level coding. The n in Layer-n
represents the number of bits required to encode each value in the extracted
blocks, and the remaining data not belonging to any block is left to the next
layer for extraction. The 2D-block encoding method with built-in bi-level
coding is used to improve CR and remain robust to bit errors. The built-in
bi-level procedure splits the 2D block into many one-dimensional signals,
and each signal is encoded separately, because the bi-level method has a
maximum encoding length, which is normally the same as the width of the
image.

Figure 3. The description of 2D-block extraction and encoding.

2D-Layer-Block Extraction Method

Principle of the Extraction Method


In the proposed algorithm, to keep the data distribution with the sparsity
property suitable for bi-level coding, a novel 2D-layer-block extraction
method is proposed to preserve the Laplacian or Gaussian distribution of the
data in 2D blocks. In addition, the extraction method rearranges the order of
the encoded data so that the entropy of the data is decreased and CR can be
improved. The principle of the method is introduced as follows.
For encoding residues, if a two-dimensional block, called a 2D block, can
be encoded with n bits per residue, all of the data x in the block must satisfy
the condition shown in (4). Therefore, it is feasible to find these blocks for
n = 1 bit, then n = 2, ⋯, 8.

(4)
Let us consider all the data x in the block governed by a probability
density f(x); the entropy is calculated by (6) [33].

(5)

(6)
By inserting (5) into (6), the entropy for a Gaussian distribution is
expressed as

(7)
Since the residue sequence with Gaussian distribution has maximum
entropy, the following inequality holds in general.

(8)
According to (4), the standard deviation σ is less than (2^n − 1)/2, and (9)
can be deduced when we assume μ = 0; L is the sample size of all the data
x in one block. By substituting (9) into (8), equation (10) can be obtained.
When blocks for n bits are found starting from n = 1 to 8, the entropy of the
data in the blocks increases from layer to layer according to (10). So, the
entropy is decided by n, and it is possible to improve the compression ratio
with this method.

(9)

(10)
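Since equations (5)-(10) do not reproduce well above, the following LaTeX sketch restates the standard relations the argument relies on; the exact constants and the precise forms of (9) and (10) in the original paper are assumptions beyond these well-known Gaussian results.

    % Gaussian density (5) and differential entropy (6):
    f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \qquad
    H = -\int f(x)\,\mathrm{lb}\, f(x)\,dx .
    % Entropy of a Gaussian (7), and the maximum-entropy bound (8):
    H_{\mathrm{Gauss}} = \tfrac{1}{2}\,\mathrm{lb}\!\left(2\pi e \sigma^2\right), \qquad
    H \le \tfrac{1}{2}\,\mathrm{lb}\!\left(2\pi e \sigma^2\right).
    % With |x-\mu| \le (2^n-1)/2 inside an n-bit block, \sigma \le (2^n-1)/2,
    % so the block entropy is bounded by a quantity that grows with n:
    H \le \tfrac{1}{2}\,\mathrm{lb}\!\left(2\pi e\right) + \mathrm{lb}\,\frac{2^n-1}{2}.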
According to the discussion above, we assume μ = 0. After performing
prediction and making the data zero-mean by removing the average, many
residue values are close to zero, and the residues follow a Laplacian
distribution as shown in Figure 4(a). That is, all the data in one of these
encoded 2D blocks will satisfy (11). Note that the sample size L in the block
is above a threshold value thn, and the data in the block possess an
approximately Laplacian or Gaussian distribution.

(11)

Figure 4. Data distribution. (a) data distribution in blocks; (b) data distribution
after 2D-block extraction; (c) data in one block.
To proceed, once all of the 2D blocks with n = 1 bit are found, they are
extracted from the residue data. The rest of the residue data consists of three
portions: the first portion has values bigger than (2^n − 1)/2, the second
portion has values smaller than −(2^n − 1)/2, and the third portion contains
data whose sample size is smaller than thn. It is noted that after the residues
in [−(2^n − 1)/2, (2^n − 1)/2] shown in Figure 4(a) are extracted, the rest of
the residue data nearly keeps the Laplacian distribution. When the extraction
is repeated from n = 1 to 8, the Laplacian distribution of the remaining
residue data changes, with decreasing probability density around zero, as
shown in Figure 4(b). In addition, the Laplacian or Gaussian distribution
in these 2D blocks is flattened, as depicted in Figure 4(c), because of the
increasing value of (2^n − 1). In this paper, this procedure is called layer
extraction.

Procedure of the Extraction Method


According to (11), 2D-layer blocks, each having a sample size above the
threshold thn, are extracted repeatedly. For example, in Figure 5, the image
residue data is given as a 40 × 40 matrix, and many 2D blocks belonging to A
are extracted. The data which are not included in blocks are reshaped as a
matrix M with the same height as the original residue image, while the other
remaining data are collected as an array B. After the first layer is finished,
matrix M is processed similarly in the next layer. With the extraction and
matrix reshape operations, many edge values are merged with other
data and have less effect on compression [24]. The pseudocode of the block
extraction procedure is shown in Figure 6, and a simplified sketch of the idea
follows Figure 6's caption.

Figure 5. Example of extracting 2D blocks.

Figure 6. Extraction of 2D block.
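As an illustration only (the actual pseudocode is in Figure 6), the following sketch shows one way such a layer extraction could proceed; the fixed block shape, the scanning order, and the names used here are our own simplifying assumptions:

```python
import numpy as np

def extract_layer(residues, n, th_n, block=(8, 8)):
    """Extract non-overlapping 2D blocks whose samples all fit in n bits.

    Returns the extracted blocks, the remaining data reshaped into a matrix M
    with the original height, and leftover samples collected in array B.
    """
    h, w = residues.shape
    bh, bw = block
    half_range = (2 ** n - 1) / 2
    keep_mask = np.ones_like(residues, dtype=bool)
    blocks = []
    for r in range(0, h - bh + 1, bh):
        for c in range(0, w - bw + 1, bw):
            tile = residues[r:r + bh, c:c + bw]
            if tile.size >= th_n and np.all(np.abs(tile) <= half_range):
                blocks.append(tile.copy())
                keep_mask[r:r + bh, c:c + bw] = False
    rest = residues[keep_mask]            # data not covered by any extracted block
    cols = rest.size // h                 # reshape the remainder to the original height
    M = rest[:cols * h].reshape(h, cols)  # matrix M, processed again in the next layer
    B = rest[cols * h:]                   # leftover samples kept as array B
    return blocks, M, B
```

Repeating this from n = 1 to 8 on the returned matrix M implements the layer-by-layer extraction described above.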



Built-In Bilevel Coding

Bilevel Coding
As most of the data in a 2D block have the sparse distribution discussed
above, the bi-level coding scheme proposed in our previous works [7, 34] and
shown in Figure 7 can be applied.

Figure 7. Bi-level coding rule.


Let p0 be the probability that a data sample requires more than N1 bits and
no more than N0 bits to encode. Assuming that nb·p0 ≤ 0.3 [34], the average
total length is expressed as follows:

(12)
For a given 2D block for n bits, N0 = n and the original total length is N0
∗ ns. When bi-level block coding is applied, the compression ratio is
improved according to (12).
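A minimal sketch of this idea (our own simplified reading of the rule in Figure 7; the exact bitstream layout of [7, 34] may differ) is:

```python
import numpy as np

def bilevel_length(samples, n0, n1, nb):
    """Estimated bit cost of bi-level block coding for one 2D block.

    Samples are grouped into sub-blocks of nb values; a sub-block whose values
    all fit in n1 bits is coded with n1 bits/sample, otherwise with n0 bits/sample.
    One flag bit per sub-block records the chosen level (a simplifying assumption).
    """
    total = 0
    for i in range(0, len(samples), nb):
        sub = np.asarray(samples[i:i + nb])
        bits_needed = int(np.max(np.abs(sub))).bit_length() + 1  # +1 sign bit for zero-mean data
        level = n1 if bits_needed <= n1 else n0
        total += 1 + level * len(sub)  # 1 flag bit + payload
    return total
```

Comparing this cost with the fixed-length cost N0 ∗ ns shows the saving promised by (12) whenever most sub-blocks fit within N1 bits.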

Optimization of 2D-Block Start Bits


According to (12), for 8-bit gray image data, N0 is a constant. Given N1 and p0,
which can be estimated, the optimal nb can be determined to achieve the minimum
length Lave. By taking the derivative of (12) and setting it equal to zero, the
optimized sub-block size can be calculated from equation (13). Then, the minimum
N1 satisfying nb·p0 ≤ 0.3 can be found through Figure 8. Finally, the “bits”
value in Figure 7 can start from this minimum N1, and the efficiency of
2D-block extraction is improved.

(13)

Figure 8. Optimization of Start Bits.
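As an illustration of how the start value might be chosen in practice (a sketch under our own assumptions, since (13) itself is not reproduced here), one can scan candidate N1 values and keep the smallest one meeting the nb·p0 ≤ 0.3 constraint from [34]:

```python
import numpy as np

def choose_start_bits(samples, n0, nb):
    """Pick the smallest N1 such that nb * p0 <= 0.3.

    p0 is estimated as the fraction of samples that need more than N1 bits
    (and at most n0 bits) to encode, including a sign bit for zero-mean data.
    """
    bits_needed = np.array([int(abs(int(s))).bit_length() + 1 for s in samples])
    for n1 in range(1, n0):
        p0 = np.mean((bits_needed > n1) & (bits_needed <= n0))
        if nb * p0 <= 0.3:
            return n1
    return n0  # fall back to fixed-length coding
```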

2D-Block Encoding
Figures 9(a)–9(c) show the details of the encoding scheme. When a color
image is given, the three channels are encoded separately, and the header
information, including the color space, predictor information, and their Hamming
coding, is encoded as in (a). In each channel, 2D blocks are extracted with the
extraction method layer by layer, so encoding is also implemented recursively
layer by layer, and each layer is encoded separately. In each layer, as in (b),
header data and image data are encoded separately. Panel (c) shows the encoding
scheme of the header data. The width of the parent matrix times the height of the
image gives the size of M, and the length of B is the length of the remaining data
in Figures 5 and 6. Every block has its start position (x, y), its size (w, h), the
mean value of the data in the block, the maximum number of bits used by the
data in the block, and the key information of built-in bi-level coding, including
N0, N1, nb, the number of sub-blocks ns/nb, and the bitstream of block types.
In particular, the mean value of the data in a block has two functions. One is
to improve robustness to bit errors, because the mean value keeps the key
information of the block. The other is to make the block data zero-mean, which
is the feature required by bi-level coding.

Figure 9. 2D-block encoding scheme: (a) data encoding of one image; (b) data
encoding in one channel of image; (c) header data encoding of one layer.
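For concreteness, the per-block header fields described above could be modelled as follows (the field names and types are illustrative assumptions, not the paper’s exact bitstream):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class BlockHeader:
    x: int                  # start position (column) of the block
    y: int                  # start position (row) of the block
    w: int                  # block width
    h: int                  # block height
    mean: int               # mean removed from the block; also aids bit-error recovery
    max_bits: int           # maximum bits used by any residue in the block
    n0: int                 # bi-level coding upper level (bits per sample)
    n1: int                 # bi-level coding lower level (bits per sample)
    nb: int                 # sub-block size used by bi-level coding
    block_types: List[int]  # one flag per sub-block: coded with n1 or n0 bits
```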

EXPERIMENTS
To validate our proposed algorithm, Open Images from http://data.
vision.ee.ethz.ch/mentzerf/validation_sets_lossless/val_oi_500_r.tar.gz,
the CLIC mobile dataset from https://data.vision.ee.ethz.ch/cvl/clic/mobile_
valid_2020.zip shown in Table 1, and many classic images from http://sipi.
usc.edu/database/ and http://homepages.cae.wisc.edu/~ece533/images/ are
used.

Table 1. Images from Open Images and CLIC.



In the extraction of 2D blocks, an optimal threshold of the sample size thn
is given in Table 1. All the data in one layer are split into pieces of 512
samples, which is the same as the width of the images. To evaluate the
effects of 2D-block encoding, built-in bi-level coding, color space, etc.,
experiments with different combinations are implemented. All the results
are average values over 10 runs. The bit-error rate (BER) is set to 0.001 by
default. The bi-level method is applied to the RGB color image [7].
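To make the test setup concrete, the sketch below shows one way random bit errors at a given BER could be injected into an encoded bitstream (our own helper, not the paper’s simulator):

```python
import numpy as np

def inject_bit_errors(bitstream: np.ndarray, ber: float, seed: int = 0) -> np.ndarray:
    """Flip each bit of a 0/1 array independently with probability `ber`."""
    rng = np.random.default_rng(seed)
    flips = rng.random(bitstream.shape) < ber
    return np.bitwise_xor(bitstream, flips.astype(bitstream.dtype))
```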
To evaluate the performance in the bit-error environment, the Peak
Signal to Noise Ratio (PSNR in dB), Structural Similarity (SSIM) as in (16)
[35], Multi-Scale SSIM (MSSSIM_Y), Spectral Residual-based Similarity
(SRSIM), Riesz-transform-based Feature SIMilarity (RFSIM), Feature
Similarity (FSIM), FSIMc, and Visual Saliency-based Index (VSI) [36, 37]
are used as error metrics to measure the recovered image quality. In (15), I(x,
y) represents the original pixel, while Î(x, y) is the recovered pixel. In (17)
and (18), K1 = 0.01, K2 = 0.03, and L = 255.

(14)

(15)

(16)

(17)

(18)
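Since (14)–(18) are not reproduced above, the following sketch shows the standard forms of PSNR and a single-window SSIM with the constants given in the text; it is illustrative rather than the authors’ exact implementation (SSIM is normally computed over local windows and averaged):

```python
import numpy as np

def psnr(original, recovered, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((original.astype(float) - recovered.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(x, y, k1=0.01, k2=0.03, L=255.0):
    """Single-window SSIM using C1 = (k1*L)^2 and C2 = (k2*L)^2."""
    x, y = x.astype(float), y.astype(float)
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```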

Comparison
In this experiment, our proposed method is compared with state-
of-the-art methods from Refs. [16, 17], engineered lossless compression
algorithms including PNG, JPEG 2000, WebP, and FLIF, and the deep learning-
based lossless compression algorithm L3C. Since none of these methods is
designed for a bit-error situation, each is combined with the Hamming (7,4)
code as a solution robust to the bit-error environment. The results are
given in Tables 2 and 3 and Figure 10.

Table 2. Comparison with prediction-based methods.

Image Ref. [17] Ref. [16] Best CR(7,4) Our


Calic jpeg2000 jpeg-xr Ref. MRP CoBaLP2 LOCO-I
method
Lena 13.1787 13.5848 14.0942 13.6461 11.872 12.399 13.173 1.155179053 1.3876
Peppers 13.8661 14.8 15.3245 15.2102 0.989051407 1.2764
Mandrill 18.1511 18.0939 18.2553 18.5305 16.041 17.039 17.822 0.854952043 1.13
Barbara 14.9567 11.1612 12.1408 11.4575 1.228746525 1.2406
Airplane 10.121 11.06 1.355032676 1.5302
Average 1.116592 1.31296

Table 3. Comparison with engineering and deep learning methods.

Our CR Bilevel PNG(7,4) Jpeg2000(7,4) WebP(7,4) FLIF(7,4) L3C(7,4)


(1) 1.5141 1.4958 1.201423 1.195981 1.209832 1.26147 1.201423
(2) 1.8985 1.9171 1.589528 1.591757 1.629015 1.697848 1.589528
(3) 2.0597 1.9877 1.762672 1.724237 1.767789 1.811668 1.762672
(4) 1.8084 1.7509 1.350212 1.362216 1.343821 1.373226 1.350212
(5) 1.4849 1.4478 1.098277 1.11419 1.104521 1.131249 1.098277
(6) 1.7418 1.7397 1.578617 1.500231 1.544284 1.571556 1.578617
(7) 1.4761 1.4772 1.122815 1.098124 1.086112 1.100787 1.122815
(8) 1.7614 1.7707 1.667265 1.566052 1.611617 1.709558 1.667265
(9) 1.295 1.2774 0.918406 0.904853 0.911615 0.928585 0.918406
(10) 1.5852 1.5504 1.19649 1.176481 1.216922 1.201159 1.19649
(11) 1.807 1.8237 1.740748 1.670904 1.780918 1.795483 1.740748
(12) 1.7631 1.7602 1.528354 1.461382 1.506129 1.576181 1.528354
Average 1.682933 1.66655 1.396234 1.363867 1.392715 1.429898 1.396234

Figure 10. Comparison of recovered image quality: (a) PSNR; (b) other assess-
ment methods.
In Table 2, the results are taken from Refs. [15, 16], and the best results
of CR with the Hamming code are listed in the second-to-last column. The average
CR of our method is 1.31296, better than 1.116592. In Table 3, the results
of Jpeg2000, WebP, and FLIF are obtained with the compression tools
OpenJPEG, WebP from Google, and FLIF from Cloudinary. As a
deep learning method, the result of L3C is obtained by using the network model
trained on the Open Images dataset to compress the images. The average CR of our
method is 1.682933, better than the others such as bi-level (1.66655), FLIF
(1.429898), and L3C (1.396234). In addition, it is noticed that the CRs of L3C
on images (5) and (9) from the CLIC dataset are the worst results,
1.098277 and 0.918406. One reason is that the network
model used for compression is trained on the Open Images dataset, so L3C does not
perform well on images from the different CLIC dataset.
Figure 10 shows the image quality assessment results. PSNR, SSIM and
MSSSIM_Y better reflect the effect of the bit-error channel. Therefore,
only these three assessment results are discussed in the later sections.
According to the comparison results in Tables 2 and 3 and Figure
10, the compression ratio of our proposed method is higher than that of bi-level
coding, although 2D-block encoding requires more header bits to encode the
position and size of each block, and an image quality similar to bi-level coding
is kept in Figure 10. The reason is that the 2D-layer-block extraction method
rearranges the data order to decrease the entropy, and the data distribution of
the one-layer blocks nearly keeps a Laplacian distribution, which is suitable
for bi-level coding as discussed before. This analysis is further discussed in
Sections 4.6 and 4.7.

The Effect of Built-In Bilevel Encoding


To investigate the advantage of the bi-level coding method, two experiments
including “built-in bi-level” and “no bi-level” are implemented. The results
are shown in Figures 11 and 12.

Figure 11. Effect of bi-level on CR (BER = 0.001, optimal thn).



Figure 12. Effect of bi-level on PSNR (BER = 0.001, optimal thn).


When the center of the Laplacian distribution is located at zero, bi-level
coding requires fewer bits to encode. So the built-in bi-level encoding
method achieves the best compression ratio, as shown in Figure 11. As bi-
level coding was proposed for noisy channels [7], it achieves higher PSNR, SSIM,
and MSSSIM_Y and maintains better image quality, as shown in
Figure 12.

Comparison between Zero-Mean and Positive Integers


The minimum value can also serve as the key information to improve image
quality, just as the mean value does. In this experiment, positive integer values
obtained by removing the minimum value are encoded. The results are shown in
Figures 13 and 14.

Figure 13. Effect of zero-mean on CR (BER = 0.001, thn = 128, no bi-level).



Figure 14. Effect of zero-mean on PSNR (BER = 0.001, thn = 128, no bi-level).
As discussed in Section 4.2, the compression ratio with built-in bi-level
coding is higher when zero-mean values are used. Without bi-level coding,
however, the compression ratio with positive integer values obtained by
removing the minimum value is higher than with zero-mean values, as shown
in Figure 13. The reason is that encoding with zero-mean values requires a sign
bit. On the other hand, with the sign bit, the zero-mean values have smaller
amplitudes around the mean, so errors in the encoded zero-mean values have
less effect on PSNR and SSIM than errors in the positive integer values.
Consequently, according to the PSNR and SSIM shown in Figures 14(a) and
14(b), encoding with zero-mean values achieves better PSNR and SSIM
than encoding with positive integers, which is consistent with Section 3.3.
Figure 15 shows the reconstructed images for the different methods. According
to the results for image (9) in the last row, encoding with “positive integer”
values shows worse image quality than the others. But in Figure 14(c), the
MSSSIM value of “positive integer” is higher than that of “zero-mean.” Therefore,
the MSSSIM result is not consistent with the real image quality in some
cases.

Figure 15. Effects on images with different methods.

Evaluation with Optimization of 2D-Block Start Bits


In the 2D-block coding method, an excessive number of blocks can decrease
the compression ratio. Therefore, an experiment based on the optimization
is conducted and the results are given in Figure 16. It is observed that the
optimization works and improves the compression ratio.

Figure 16. Effects on CR with optimization (BER = 0.001, thn = 128).



Evaluation with Different Color Spaces and Predictors


In Figure 17, “RGB Direct” denotes that the RGB image is encoded directly
without a predictor, while “RGB Predictor” denotes that the RGB residue
from a predictor is used. The figure shows that the YCrCb color space in “2D block”
performs better than the RGB color space and that “RGB Direct” is the worst, which
validates that the differences between pixel values and their predictions are
generally smaller than the pixel values themselves [13].

Figure 17. Comparison with different color space and predictor (BER = 0.001,
thn = 128).

The Decrease of Entropy


In Figure 18, the original entropy of the gray image of (4) is 3.8453, while the
average entropy with the 2D-block encoding method is 3.772696. This
indicates that the compression ratio is improved, since the required bits per
sample fall below the original entropy bound from information theory, which is
consistent with the principle in Section 3.1.

Figure 18. The entropy of 2D blocks for n bits (entropy ∗ number of samples,
thn = 128). Left is the entropy of blocks extracted.

The Change of Data Distribution in Every Layer


Figure 19 shows the images with 2D blocks and the data distributions of the
remaining data and of the data in blocks for the gray image of (4), without the
optimization of 2D-block start bits. The images indicate that the reshape operator
has changed the distribution of the edge data that many predictors care about
[24]. The distribution of the remaining data is close to the Laplacian distribution,
as shown in the left histogram, and the data distributions in the blocks are close
to a Gaussian or Laplacian distribution, as shown in the right histogram. All of
these observations are consistent with the analysis in Section 3.1.

Figure 19. Change of data distribution (Thn = 128).

Discussion
The comparison results in these experiments show that our
method performs better than state-of-the-art methods, engineered lossless
compression algorithms, and deep learning methods under bit-error conditions.
There are four main reasons.
First, the 2D-block extraction method extracts the data that can be encoded with
fewer bits layer by layer; thus, the entropy is decreased, as Section 4.6
shows.
Second, edge data usually cause a poor compression ratio, but the
2D-block extraction method changes the edge data distribution, and the
data distribution of each layer’s blocks nearly keeps a Laplacian distribution,
which is suitable for bi-level coding, as shown in Figure 19 of Section 4.7.
Third, built-in bi-level coding with zero-mean values preserves high
image quality under a bit-error environment, as discussed in Sections 4.2 and 4.3.
Finally, the optimization of 2D-block start bits and the color space used in “2D
block” are important mechanisms for improving the compression ratio, as
discussed in Sections 4.4 and 4.5.

CONCLUSIONS
When image data are transferred through wireless communication systems,
bit errors may occur and corrupt the image data. To reduce
the bit-error effect, a bit-error aware lossless image compression algorithm
based on bi-level coding can be applied. However, bi-level coding is a
one-dimensional coding method and does not consider the inherent
statistical correlation in the 2D context region. To resolve this shortcoming,
a novel 2D-layer-block extraction and encoding method with built-in bi-
level coding is proposed to improve the compression ratio. With the layer
extraction method, the data distribution is close to the Laplacian distribution
after each layer extraction, which is suitable for bi-level coding. For color
images, a lossless color transformation from RGB to the YCrCb color space
is used to decorrelate the color components. Through experiments,
it is demonstrated that our proposed method obtains a better lossless
compression ratio and keeps the same image quality as the bi-level
method over a noisy transmission channel. Although it is sometimes not as
efficient as state-of-the-art methods in terms of lossless compression
ratio, it is more robust to the bit errors caused by a noisy channel.
Furthermore, after applying a feed-forward error control scheme, a different
predictor, and a different coding method, we can achieve better compression
efficiency, since the bi-level block coder requires fewer bits for the bit-
error protection algorithm than the amount required by an entropy coder.
It is also noted that deep learning methods trained on the Open Images
dataset perform poorly on images from a different dataset. Therefore,
the generalization ability of deep learning methods needs to be improved
in the future.

ACKNOWLEDGMENTS
This work was supported in part by the National Natural Science Foundation
of China (No. 61502423), Zhejiang Provincial Natural Science Foundation
(No. LY16G020012), and Zhejiang Province Public Welfare Technology
Application Research Project (Nos. LGF19F010002, LGN20F010001,
LGF20F010004, and LGG21F030014).

REFERENCES
1. J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston,
“Variational image compression with a scale hyperprior,” 2018, arXiv
preprint arXiv:1802.01436.
2. J. Lee, S. Cho, and M. Kim, An End-to-End Joint Learning Scheme of
Image Compression and Quality Enhancement with Improved Entropy
Minimization, 2019, https://arxiv.org/pdf/1912.12817.
3. F. Jiang, W. Tao, S. Liu, J. Ren, X. Guo, and D. Zhao, “An end-to-
end compression framework based on convolutional neural networks,”
IEEE Transactions on Circuits and Systems for Video Technology, vol.
28, no. 10, pp. 3007–3018, 2018.
4. Z. Cheng, H. Sun, M. Takeuchi, and J. Katto, “Learned image
compression with discretized Gaussian mixture likelihoods and
attention modules,” in 2020 IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR), Seattle, WA, USA, June 2020.
5. K. N. Shruthi, B. M. Shashank, Y. S. Saketh et al., “Comparison analysis
of a biomedical image for compression using various transform coding
techniques,” in 2016 IEEE 6th International Conference on Advanced
Computing (IACC), pp. 297–303, Bhimavaram, India, February 2016.
6. A. Avramović and G. Banjac, “On predictive-based lossless
compression of images with higher bit depths,” Telfor Journal, vol. 4,
no. 2, pp. 122–127, 2012.
7. L. Tan and L. Wang, “Bit-error aware lossless image compression,”
International Journal of Modern Engineering, vol. 11, no. 2, pp. 54–
59, 2011.
8. “WebP image format,” https://developers.google.com/speed/webp/.
9. J. Sneyers and P. Wuille, “FLIF: free lossless image format based on
MANIAC compression,” in 2016 IEEE International Conference on
Image Processing (ICIP), Phoenix, AZ, USA, September 2016.
10. J. Townsend, T. Bird, and D. Barber, “Practical lossless compression
with latent variables using bits back coding,” The International
Conference on Learning Representations, Louisiana, USA, 2019,
arXiv preprint arXiv:1901.04866.
11. F. Kingma, P. Abbeel, and J. Ho, “Bit-swap: recursive bits-back
coding for lossless compression with hierarchical latent variables,”
in In International Conference on Machine Learning, pp. 3408–3417,
California, USA, 2019.

12. F. Mentzer, E. Agustsson, M. Tschannen, R. Timofte, and L. Van Gool,


“Practical full resolution learned lossless image compression,” in 2019
IEEE/CVF Conference on Computer Vision and Pattern Recognition
(CVPR), Long Beach, CA, USA, June 2019.
13. R. Jovanovic and R. A. Lorentz, “Adaptive lossless prediction based
image compression,” Applied Mathematics & Information Sciences,
vol. 8, no. 1, pp. 153–160, 2014.
14. I. Matsuda, N. Ozaki, Y. Umezu, and S. Itoh, “Lossless coding using
variable block-size adaptive prediction optimized for each image,”
EUSIPCO, pp. 818–821, 2005.
15. X. Zhao and Z. He, “Lossless image compression using super-spatial
structure prediction,” IEEE Signal Processing Letters, vol. 17, no. 4,
pp. 383–386, 2010.
16. T. Strutz, “Context-based predictor blending for lossless color image
compression,” IEEE Transactions on Circuits and Systems for Video
Technology, vol. 26, no. 4, pp. 687–695, 2016.
17. S. Kim and N. Cho, “Hierarchical prediction and context adaptive
coding for lossless color image compression,” IEEE Transactions on
Image Processing, vol. 23, no. 1, pp. 445–449, 2014.
18. H. S. Malvar, G. J. Sullivan, and S. Srinivasan, “Lifting-based reversible
color transformations for image compression,” Proceedings of SPIE-
The International Society for Optical Engineering, 2008.
19. R. Starosolski, “Simple fast and adaptive lossless compression
algorithm,” Software: Practice and Experience, vol. 37, no. 1, pp.
65–91, 2007.
20. M. Weinberger, G. Seroussi, and G. Sapiro, “LOCO-I: a low complexity,
context-based, lossless image compression algorithm,” in Proceedings
of the Conference on Data Compression, pp. 140–148, Snowbird, UT,
USA, 31 March-3 April 1996.
21. S. D. Rane and G. Sapiro, “Evaluation of JPEG-LS, the new lossless
and near-lossless still image compression standard for compression
of high-resolution elevation data,” IEEE Transactions of Geosciences
and Remote Sensing, vol. 39, no. 10, pp. 2298–2306, 2001.
22. X. Wu and N. Memon, “CALIC–a context based adaptive lossless
image codec,” in 1996 IEEE International Conference on Acoustics,
Speech, and Signal Processing Conference Proceedings, pp. 1890–
1893, Atlanta, GA, USA, May 1996.

23. A. Avramović and B. Reljin, “Gradient edge detection predictor for


image lossless compression,” in In Proceedings ELMAR-2010, pp.
131–134, Zadar, Croatia, September 2010.
24. L. Xin and M. T. Orchard, “Edge-directed prediction for lossless
compression of natural images,” IEEE Transactions on Image
Processing, vol. 10, no. 6, pp. 813–817, 2001.
25. L. Kau and Y. Lin, “Adaptive lossless image coding using least-squares
optimization with edge-look-ahead,” IEEE Transactions on Circuits
and Systems II: Express Briefs, vol. 52, no. 11, pp. 751–755, 2005.
26. L. Kau and Y. Lin, “Least-squares-based switching structure for
lossless image coding,” IEEE Transactions on Circuits and Systems I:
Regular Papers, vol. 54, no. 7, pp. 1529–1541, 2007.
27. F. Hsieh and K. Fan, “A high performance lossless image coder,” in
Conference on Computer Vision & Graphic Image Processing, Taipei,
Taiwan, 2005.
28. G. Ulacha and R. Stasinski, “Performance optimized predictor blending
technique for lossless image coding,” in 2011 IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), pp.
1541–1544, Prague, Czech Republic, 2011.
29. W. Dai, H. Xiong, J. Wang, and Y. Zheng, “Large discriminative
structured set prediction modeling with max-margin Markov network
for lossless image coding,” IEEE Transactions on Image Processing,
vol. 23, no. 2, pp. 541–554, 2014.
30. Y. Zhang and D. A. Adjeroh, “Prediction by partial approximate
matching for lossless image compression,” IEEE Transactions on
Image Processing, vol. 17, no. 6, pp. 924–935, 2008.
31. M. Domanski and K. Rakowski, “Color transformations for lossless
image compression,” in IEEE Signal Processing Conference, pp. 1–4,
Tampere, Finland, 2000.
32. S. C. Pei and J. J. Ding, “Improved reversible integer-to-integer color
transforms,” in IEEE International Conference on Image Processing,
pp. 473–476, Cairo, Egypt, 2009.
33. P. J. Coles, M. Berta, M. Tomamichel, and S. Wehner, “Entropic
uncertainty relations and their applications,” Reviews of Modern
Physics, vol. 89, no. 1, pp. 1–58, 2017.

34. L. Tan, J. Jiang, and Y. Zhang, “Bit-error aware lossless compression


of waveform data,” IEEE Signal Processing Letters, vol. 17, no. 6, pp.
547–550, 2010.
35. Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image
quality assessment: from error measurement to structural similarity,”
IEEE Trans. on Image Processing, vol. 13, no. 4, pp. 1–14, 2004.
36. H. Amirpour, A. M. G. Pinheiro, M. Pereira, and M. Ghanbari,
“Reliability of the most common objective metrics for light field quality
assessment,” in ICASSP 2019-2019 IEEE international conference on
acoustics, speech and signal processing (ICASSP), pp. 2402–2406,
Brighton, United Kingdom, 2019.
37. https://sse.tongji.edu.cn/linzhang/IQA/IQA.htm.
Chapter 7

BEAM PATTERN SCANNING (BPS) VERSUS SPACE-TIME BLOCK CODING (STBC) AND SPACE-TIME TRELLIS CODING (STTC)

Peh Keong Teh and Seyed (Reza) Zekavat


Department of Electrical and Computer Engineering, Michigan Technological University,
Houghton, Michigan, USA

ABSTRACT
In this paper, Beam Pattern Scanning (BPS), a transmit diversity technique,
is compared with two well known transmit diversity techniques, space-time
block coding (STBC) and space-time trellis coding (STTC). In BPS (also
called beam pattern oscillation), controlled time varying weight vectors
are applied to the antenna array elements mounted at the base station (BS).
This creates a small movement in the antenna array pattern directed toward

Citation: P. Keong Teh and S. Zekavat, “Beam Pattern Scanning (BPS) versus Space-
Time Block Coding (STBC) and Space-Time Trellis Coding (STTC),” International
Journal of Communications, Network and System Sciences, Vol. 2, No. 6, 2009, pp.
469-479. doi: 10.4236/ijcns.2009.26051.
Copyright: © 2009 by authors and Scientific Research Publishing Inc. This work is
licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0.

the desired user. In rich scattering environments, this small beam pattern
movement creates an artificial fast fading channel. The receiver is designed to
exploit time diversity benefits of the fast fading channel. Via the application
of simple combining techniques, BPS improves the probability-of-error
performance and network capacity with minimal cost and complexity. In this
work, to highlight the potential of BPS, we compare BPS and Space-Time
Coding (i.e., STBC and STTC) schemes. The comparisons are in terms of
their complexity, system physical dimension, network capacity, probability-
of-error performance, and spectrum efficiency. It is shown that BPS leads to
higher network capacity and performance with a smaller antenna dimension
and complexity with minimal loss in spectrum efficiency. This identifies
BPS as a promising scheme for future wireless communications with smart
antennas.

Keywords: Antenna Array, Beam Pattern Sweeping, Transmit Diversity,


Space-Time Block Codes, and Space-Time Trellis Coding.

INTRODUCTION
Transmit diversity schemes use arrays of antennas at the transmitter to
create diversity at the receiver. Different transmit diversity techniques have
been introduced to mitigate fading effects in wireless communications [1–
5]. Examples are space-time block coding [1–3], space-time trellis coding
[3–5], antenna hopping [6] and delay diversity [6,7].
In Space-Time Block Coding (STBC), data is encoded by a channel
coder and the encoded data is split into N unique streams, simultaneously
transmitted over N antenna array elements. At the receiver, the symbols
are decoded using a maximum likelihood decoder. This scheme combines
the benefits of channel coding and diversity transmission, providing BER
performance gains. However, receiver complexity increases as a function of
bandwidth efficiency [3], and a high number of antennas is required to achieve
high diversity orders. Moreover, antenna elements should be located far
enough apart to achieve space diversity, and when antenna arrays at the base
station (BS) are used in this fashion, directionality benefits are no longer
available [1–3]. This reduces the network capacity of wireless systems in
terms of the number of users.
In Space-Time Trellis Coding (STTC) information symbols are encoded
by a unique space-time channel coder and the encoded information symbols

are split into N unique streams, simultaneously transmitted over N antenna


array elements. At the receiver, after receiving a block of symbols denoted
as a frame (e.g., 130 symbols per frame), the Viterbi algorithm is used to recover
and error-correct the information symbols in the frame [3–5]. This
scheme combines the benefits of space diversity and coding gain, providing
a significant probability-of-error performance gain. However, the receiver
complexity increases exponentially as a function of the number of trellis states
(transmit antennas); and, in general, a high number of trellis states (transmit
antennas) is required to achieve high diversity and coding gain [8,9].
Moreover, similar to STBC, in STTC the antenna array elements should be
located far enough apart to achieve space diversity, which reduces STTC network
capacity in terms of the number of users.
BPS has been introduced as a powerful transmit diversity technique
capable of enhancing both wireless network capacity and probability-of-
error performance with minimal cost [10–13]. In this scheme, antenna
elements located at the distance of half a wavelength form an antenna array.
These antenna arrays are mounted at the BS. They are incorporated to create
directional beams steered toward the desired users. Time-varying phase
shifts are applied to the antenna elements to move the antenna pattern within
the symbol duration Ts. The antenna pattern starts from a point in space at
time zero, sweeps an area of space from time 0 to Ts, returns to
its initial position after time Ts, and then repeats the sweep. The
beam pattern movement is small, e.g., on the order of 5% of the half-power
beam width (HPBW). Simulations in [10] have shown that in rich scattering
environments, BPS leads to a time-varying channel with a small coherence
time Tc with respect to Ts. This generates an artificially created fast fading
channel leading to a time diversity that can be exploited at the receiver
[10,11]. Hence, BPS leads to: a) high performance via time diversity, and
b) high network capacity (in terms of number of users) via directionality
inherent in BPS.
Here, BPS is compared with STBC and STTC schemes with their antennas
replaced by directional antenna arrays (without scanning) [9] in order to
achieve directionality (i.e., Spatial Division Multiple Access (SDMA)
benefit) available in BPS. The elements of comparison are: 1) probability-
of-error (bit-error-rate, BER, and frame-error-rate, FER) performance, 2)
network capacity, 3) system complexity (in terms of physical dimension),
and 4) bandwidth efficiency.

Figure 1. Space-time block codes system (N = 2)

Table 1. STBC structure for (N=2).

The results confirm that the BPS scheme leads to higher network capacity
and BER/FER performance and lower complexity. However, the relative spectral
efficiency of the BPS technique is less than that of STBC and STTC, e.g., on the
order of 5%. In other words, the BPS technique offers higher quality-of-service and
network capacity with a minimal cost in spectrum efficiency. This introduces
BPS as a powerful scheme for future generations of wireless communications
with smart antenna arrays.
Section 2 introduces the STBC, STTC and BPS schemes. Section 3 compares
their characteristics, and Section 4 presents and compares their capacity and
BER/FER performance simulations. Section 5 concludes the paper.

INTRODUCTION OF STBC, STTC AND BPS TECHNIQUES
Here, we briefly introduce the fundamentals of the three techniques, STBC,
STTC and BPS.

STBC
STBC is a transmit diversity technique capable of creating diversity at the
receiver to improve the performance of communications systems. STBC
utilizes N transmit antennas separated far apart to ensure independent fades
[1,2]. At a given symbol period, N signals are transmitted simultaneously
from N antennas. The signal transmitted from each antenna has a unique
structure that allows the signal to be combined and recovered at the receiver.
For simplicity in presentation, we only consider STBC with 2 transmit
antennas (N = 2) (see Figure 1).
We consider s0 and s1 to be two consecutive signals generated at two
consecutive times t0 and t1 = t0 + Ts, respectively. The signal transmitted from
antenna zero is denoted by s0 and the one from antenna one is denoted by
s1. At the next symbol period, the transmitted signal from antenna zero is
−s1* and the signal transmitted from antenna one is s0*, where * is the
complex conjugate operation (see Table 1). The channel is denoted by h0
for transmit antenna 0 and h1 for transmit antenna 1. The main assumption
here is that the fading is constant across two consecutive symbols (i.e., over
t and t1 = t +Ts, t ∈ [0,Ts]); we can represent the channel fading for antenna
0 and 1 as:

(1)
respectively, where Ts is the symbol duration and αi, θi, i ∈ {0,1} are the Rayleigh
fading gain and phase, respectively. The received signal at time t and t + Ts
corresponds to

(2)
respectively. Here, nt and nt+Ts are complex random variables representing
receiver noise and interference at time t and t + Ts, respectively.
In the STBC receiver, Maximal Ratio Combining (MRC) leads to an
estimation of s0 and s1, corresponding to:

(3)
respectively (note: rt=r(t)). Substituting (1) and (2) into (3), we obtain

(4)
In other words, a maximum likelihood receiver leads to the removal of
the s1 and s0 dependent terms in ŝ0 and ŝ1, respectively. This generates a high
probability-of-error performance at the receiver.
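A compact NumPy sketch of this two-antenna transmission and combining rule (our own illustration; the variable names and the noiseless channel are assumptions made for brevity) is:

```python
import numpy as np

def alamouti_combine(r0, r1, h0, h1):
    """Combine the two received samples r0 = r(t) and r1 = r(t + Ts).

    Assumed transmission: [s0, s1] at time t and [-conj(s1), conj(s0)] at t + Ts,
    with the channels h0, h1 constant over both symbol periods.
    """
    s0_hat = np.conj(h0) * r0 + h1 * np.conj(r1)
    s1_hat = np.conj(h1) * r0 - h0 * np.conj(r1)
    return s0_hat, s1_hat

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    s0, s1 = 1 + 0j, -1 + 0j                      # two BPSK symbols
    h0, h1 = (rng.normal(size=2) + 1j * rng.normal(size=2)) / np.sqrt(2)
    r0 = h0 * s0 + h1 * s1                        # received at time t (noise omitted)
    r1 = -h0 * np.conj(s1) + h1 * np.conj(s0)     # received at time t + Ts
    print(alamouti_combine(r0, r1, h0, h1))       # ≈ (|h0|²+|h1|²)·s0 and (|h0|²+|h1|²)·s1
```

The combiner output contains only one symbol each, scaled by the total channel gain, which is why the interfering terms disappear from ŝ0 and ŝ1.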

STTC Technique
STTC is a transmit diversity technique that combines space diversity and
coding gain to improve the performance of communication systems [3,5,8].
STTC utilizes N transmit antennas separated far apart to ensure independent
channels. At a given symbol period, N signals are transmitted simultaneously
from N antennas. The signal transmitted from each antenna has a unique
structure with inherent error-correction capability to allow signal to be
recovered and corrected at the receiver [8]. In this paper, we only consider
the simulation scenario presented in [3], that is, π/4-QPSK, 4-state, 2 b/s/Hz
STTC (hereafter denoted STTC-QPSK), which utilizes two transmit
antennas and one receive antenna.
The trellis structure of STTC-QPSK is shown in Figure 2(a) and
the constellation mapping in Figure 2(b). In STTC-QPSK, information
symbols are encoded using a channel coder by mapping input symbols to
a vector of output (codewords) based on a trellis structure (Figure 2(a)).
Here, information symbols are encoded based on the current state of the
encoder and the current information symbols. Thus, the encoded codewords
are correlated in time.
At the left of the trellis structure (Figure 2(a)) are the STTC codewords
(s1,s2), s1,s2 ∈ {0,1,2,3}. In Figure 2(a), there are four emerging branches
from each trellis state, because there are four possible QPSK symbols,
namely {0,1,2,3}. For example, consider the space time trellis coder that
starts at state (q1,q2) = (0,0) (represented by 00). When the information
symbol is 10, the coder transition from state 00 to 10 produces the output

code-words (s1,s2) of (0,2). When the next information symbol is 11, the
coder transition from state 10 to 11 produces the output codeword (2,3). The
channel coder continues to change from its current state to a new state based
on the incoming information symbols. Based on the design, the channel
coder resets to state 0 after completing the coding of a frame (e.g., 130
symbols). The output codewords of the encoder are then mapped onto a π/4-
QPSK constellation (Figure 2(b)). The mapping results in two information
symbols, each of which is transmitted on one antenna
simultaneously. Through this encoding scheme, redundancy is introduced
into the system but at the same time, the symbols are transmitted over two
antennas. Therefore, coding redundancy does not impact the throughput. In
order to achieve SDMA to improve network capacity, each STTC-QPSK
antenna element is replaced with one antenna array [9] to generate two static
beams directed toward the desired users (Figure 3).

Figure 2. (a) STTC-QPSK trellis structure, and (b) Constellation mapping us-
ing gray code

Figure 3. STTC far located antenna elements are replaced by antenna arrays to
support SDMA.

The channel is denoted by h0 for transmit antenna 0 and h1 for transmit


antenna 1. We represent the channel fading for antenna i, i ∈ {0,1} as:
(5)
respectively, where αi, θi, i ∈ {0,1} are the Rayleigh fading gain and phase,
respectively. The received signal at time t can be modeled as

(6)
where si(t) is the transmitted symbol and n(t) is the complex random variable
representing receiver noise at time t. The receiver is designed using the Viterbi
algorithm. The branch metric for a transition labeled q1(t) q2(t) corresponds
to [3]

(7)
where P is the number of transmit antennas. The Viterbi algorithm is used to
compute the path with the lowest accumulated metric [3].
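As a sketch of this decoding metric (the usual form for such decoders; equation (7) itself is not reproduced above, so the exact expression is an assumption), the per-transition cost could look like:

```python
import numpy as np

def branch_metric(r_t, h, labeled_symbols):
    """Squared Euclidean distance between the received sample r(t) and the
    noiseless superposition of the transition's labeled symbols over channels h."""
    return abs(r_t - np.dot(h, labeled_symbols)) ** 2

# In the Viterbi recursion, each trellis state keeps the smallest accumulated cost:
#   acc_new[state] = min over predecessors of (acc_old[prev] + branch_metric(r_t, h, labels(prev, state)))
# and the surviving path with the lowest total metric over the frame gives the decoded symbols.
```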

BPS
BPS is a new transmit diversity technique utilizing an antenna array to
support directionality and transmit diversity via carefully controlled time
varying phase shifts applied to each antenna element. This creates a slight
motion of the beam pattern directed toward the desired users [10]. Beam
pattern movement creates an artificial fast fading environment that leads to
time diversity exploitable by the BPS receiver [11]. Beam pattern movement
is created by applying a time-varying phase θ(t) to the elements of the antenna
array (see Figure 4).

Figure 4. Antenna array structure.


In BPS, the beam pattern sweeps an area of space within Ts (the symbol
duration), returns to its initial position, and starts moving again. Properly
selecting the phase offset θ(t) leads to a movement of the antenna beam pattern
that ensures: 1) constant large-scale fading over Ts, and 2) the generation of
L independent fades within each Ts.
1) Achieving constant large-scale fading: In order to ensure constant
large-scale fading over each symbol period Ts, the mobile must
remain within the antenna array’s HPBW at all times. This
corresponds to

(8)
where β is the HPBW, φ is the azimuth angle, dφ/dt is the rate of antenna
pattern movement, and Ts·(dφ/dt) is the amount of antenna pattern
movement within Ts. The received antenna pattern amplitude is ensured
to remain within the HPBW for the entire symbol duration, Ts, using the
control parameter k, 0 < k < 1.
2) Achieving L independent fades within each Ts: Using (8), the
phase offset applied to the antenna array is found to be (see
[3,6,7]):

(9)
where λ is the wavelength of the carrier and d is the distance between adjacent
antenna elements.
The sweeping of the beam pattern creates an artificial fast fading channel
with a coherence time that may lead to L independent fades over Ts. This is a
direct result of the departure and the arrival of scatterers within the antenna
array beam pattern window. Simulation results in [10] and [11] assuming a
medium-size city center, with 0.0005 < k < 0.05, reveal that time diversity
gains as high as L = 7 are achievable using the BPS scheme.
Assuming BPSK modulation, the transmitted signal can be represented
as

(10)
where b0 ∈ {–1,+1} is the transmitted bit, fo is the carrier frequency, and
gTs(t) is the pulse shape (e.g., a rectangular waveform with unity height over
0 to Ts). The normalized signal received at the mobile receiver input cor-
responds to:

(11)
where m ∈ {0,1,2,…, M–1} is the mth antenna array element (Figure 2),
nl(t) is an additive white Gaussian noise (AWGN), which is considered
independent for different time slots (l), αl is the fade amplitude in the lth time
slot, and xl is its phase offset (hereafter, this phase offset is assumed to be
tracked and removed). Moreover, in (11),
(12)
where (2πd/λ)·cos φ is the phase offset caused by the difference in distance
between antenna array elements and the mobile (assuming the antenna array
is mounted horizontally), and θ(t) is introduced in Equation (9). Applying
the summation over m, Equation (11) corresponds to

(13)
Here,

(14)
is the antenna array factor. Assuming the mobile is located at φ = π/2, (12) can
be approximated by g(t, φ) = g(t) = −θ(t). Moreover, assuming that the antenna
array’s peak is directed towards the intended mobile at time 0, and that the
movement of the antenna array pattern over Ts is small, i.e., k in Equation (9) is
small, the array factor is well approximated by AF(t, φ) ≈ 1.
The time-varying phase of (9) in (12) and (13) leads to a spectrum
expansion of the transmitted (and the received) signal. Because the parameter
k in (9) is considered small (e.g., k = 0.05), this expansion is minimal (see
Subsection 3.2). After returning the signal to baseband, the received
signal corresponds to:

(15)
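To visualize the beam-pattern sweep, the following NumPy sketch evaluates a uniform-linear-array factor with a small linear phase sweep over one symbol period; it is an illustrative model with our own parameter choices and steering law, not the exact expressions (9)–(14):

```python
import numpy as np

def array_factor(t, phi, M=6, d_over_lambda=0.5, Ts=1.0, k=0.05, beta=0.5):
    """Normalized array factor of an M-element uniform linear array whose
    steering direction is swept by k*beta (a fraction of the HPBW, in radians) over Ts."""
    sweep = k * beta * (t / Ts)                       # small angular sweep within the symbol
    psi = 2 * np.pi * d_over_lambda * (np.cos(phi) - np.cos(np.pi / 2 + sweep))
    m = np.arange(M)
    return np.abs(np.sum(np.exp(1j * m * psi))) / M

# The pattern peak drifts only slightly away from the mobile at phi = pi/2 during the symbol:
for t in np.linspace(0.0, 1.0, 5):
    print(f"t/Ts = {t:.2f}: |AF| toward the user = {array_factor(t, np.pi / 2):.4f}")
```

For small k the array factor toward the user stays close to 1, consistent with the approximation AF(t, φ) ≈ 1 above, while the scatterers entering and leaving the moving pattern create the fast-fading behavior exploited by the receiver.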

BPS VERSUS STBC, STTC


STBC, STTC and BPS are compared in terms of physical antenna dimension,
complexity, spectrum efficiency, network capacity and BER performance.

Complexity and Physical Antenna Dimension


The main complexity of BPS scheme is at the transmitter mounted at the
BS to generate a time varying beam pattern directed toward the desired
user, whereas, the complexity of STBC scheme is mainly due to the number
of transmitting antennas, N, at the BS and the combining scheme at the
receiver [3].
The complexity of STTC scheme is mainly due to both the encoder
(transmitter) and decoder (receiver). The encoding process requires a
space-time channel coder to encode the information symbols according to
a specific trellis structure (e.g., Figure 1). The decoding complexity that
utilizes Viterbi algorithm increases exponentially with the number of states
(transmit antennas) of the trellis structure [3].
Here, we consider:
• Space-Time Coding (STC) techniques (i.e., both STBC and STTC)
use two antenna arrays to generate directional beam patterns: a)
each antenna array contains six antenna elements (each element
is separated by λo/2), and b) the antenna arrays are separated far
enough (e.g., by 5λo) to ensure independent fades. Here, λo is the
wavelength of the carrier frequency (or the average wavelength
of all carrier frequencies if multi-carrier transmission is used).
• The BPS technique uses: a) a single 6-element antenna array (elements
are separated by λo/2), and b) beam-pattern movement is assumed
to result in up to sevenfold diversity (in general, a function of the
parameter k) [10].
The antenna dimension of STC schemes is larger than that of BPS, since the STBC
scheme utilizes 2 antenna arrays (in general, any number of antenna arrays).
Considering that antenna array elements are separated by λo/2, the length of one
antenna array would be 2.5λo. To ensure independent fades, these arrays
should be located far enough apart (e.g., by 5λo). This leads to a total length of
10λo for the STBC antenna configuration, while BPS needs just a 2.5λo-long antenna
array. Thus, the physical antenna dimensions of STC techniques are much
greater than the antenna array dimensions for the BPS scheme. Moreover,
STTC physical antenna array dimensions (specifically, with each antenna

element replaced by an antenna array) increase as the number of antenna


arrays increases.
Antenna array pattern characteristics (e.g., the HPBW) change with
frequency [12,13]. Hence, in wideband multi-carrier systems (e.g., in multi-
carrier code division multiple access, MC-CDMA, or orthogonal frequency
division multiplexing (OFDM) systems) each group of sub-carriers might
be required to be transmitted over unique antenna arrays in order to create an
ideal SDMA; and hence, a number of antenna array clusters or antenna array
vector clusters are required (see [12,13] for more information). In this case,
the complexity and the dimensions of STBC and STTC are much higher than
BPS scheme. In general, the dimensions (and, as a result, the complexity)
of STC schemes increase as the number of antenna arrays increases. In
addition, the complexity of STTC increases as the number of trellis states
increases and as a result the required number of antenna arrays increases (in
order to create higher orders of space diversity and coding gain).

Spectrum Efficiency and Throughput


The BPS technique creates a bandwidth expansion, as discussed in the
previous section, while the STBC scheme with static beam patterns does not
generate this expansion. The BPS system bandwidth is expanded by a factor
that corresponds to

(16)
where (B.W.)BPS = bandwidth needed with BPS and (B.W.)withoutBPS =
bandwidth needed without BPS. Considering (13) and using (12) and (9),
the expansion factor fexp. corresponds to

(17)
Hence, with constant Ts, λ, β, d and M for both BPS and STBC systems,
the relative reduction in bandwidth efficiency due to BPS corresponds to

(18)
Considering d = λ/2, typical values of β (e.g., β = 0.5 rad), and M =
6, (18) can be approximated by

(19)
With this definition, the relative reduction in BPS spectrum efficiency is
determined by the control parameter k. For example, considering k = 0.05
(an antenna sweep equivalent to 5% of the HPBW), ηR = 95%. On the
other hand, with a constant bandwidth available to BPS, STBC and
STTC, the throughput of BPS is less than that of the STC techniques by the factor
fexp (e.g., by a factor of less than 5%). This disadvantage of BPS is very minimal
with respect to the advantages of the BPS technique as discussed in this paper.

Capacity and Performance


In this paper, we have assumed the same antenna arrays (with the same
HPBW and approximately the same dimension and complexity) for both
the BPS and STC systems. This assumption leads to a higher order of diversity
via BPS compared to STC (e.g., up to 7-fold diversity in BPS versus 2-
fold diversity in STC), which better mitigates fading effects in the BPS system
compared to the STC systems. Hence, while this leads to a higher probability-
of-error performance in BPS systems, considering a constant signal power
to noise power ratio, it also leads to a higher network capacity as the number of
users increases. The details of the capacity and performance enhancements are
presented in the next section via simulations.

SIMULATIONS

BER Performance Simulations


Simulations are performed assuming:
• Mid-size city center (e.g., 3 scatterers per 1000 m²) that leads to 7-
fold diversity with the BPS technique;
• BPSK transmission for the STBC and BPS comparison and QPSK
transmission for the STTC and BPS comparison;
• One receive antenna;
• Switched-beam smart antenna arrays (with HPBW = 18°) are
mounted at the BS;
• Quasi-static channel, i.e., the channel characteristic is static over 2
consecutive symbol periods, Ts, for STBC and over the entire
frame for STTC-QPSK, and then changes in an independent
manner; and
• The STTC-QPSK frame is equal to 130 symbols.
For simplicity of comparison and to illustrate the benefits of the time
diversity induced by the BPS scheme, Equal Gain Combining (EGC) over time
components is assumed. The EGC technique does not rely on channel estimation
to perform the combining. The performance simulations for STBC compared
to BPS are shown in Figure 5(a). It can be observed that the BPS scheme offers
5 dB and 15 dB improvements in performance at a probability of error of 10⁻³
compared to the STBC scheme and the traditional BPSK system without diversity,
respectively. The performance improvement of the BPS scheme is due to the
high order of time diversity gains achieved through beam pattern movement.
The diversity order achievable via STBC is lower than that of BPS; therefore,
its BER performance is lower compared to the BPS scheme.
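For intuition about why a higher diversity order drives these gains, a toy Monte Carlo of BPSK with L independently faded copies per bit and equal-gain combining (our own simplified model and SNR normalization, not the simulator used for Figure 5) behaves as follows:

```python
import numpy as np

def ber_bpsk_egc(snr_db, L, n_bits=200_000, seed=0):
    """Monte Carlo BER of BPSK with L independently Rayleigh-faded copies per bit,
    combined by equal-gain combining (fades co-phased, equal weights).
    snr_db is interpreted as the average per-branch SNR (an illustrative choice)."""
    rng = np.random.default_rng(seed)
    snr = 10 ** (snr_db / 10)
    bits = rng.integers(0, 2, n_bits)
    s = 2.0 * bits - 1.0                                    # BPSK mapping: 0 -> -1, 1 -> +1
    a = rng.rayleigh(scale=np.sqrt(0.5), size=(L, n_bits))  # unit-mean-power fade amplitudes
    noise = rng.normal(scale=np.sqrt(1 / (2 * snr)), size=(L, n_bits))
    decision = np.sum(a * s + noise, axis=0)                # EGC sum over the L time slots
    return float(np.mean((decision > 0).astype(int) != bits))

for L in (1, 2, 7):
    print(f"L = {L}: BER ≈ {ber_bpsk_egc(10.0, L):.4f}")
```

The BER falls sharply as L grows, which is the mechanism behind the BPS gains reported above.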
The performance simulations for BPS versus STTC-QPSK are shown
in Figure 5(b). It is observed that the BPS scheme offers 12 dB and 22 dB
improvements in performance at a probability of error of 10⁻³ compared to the STTC-
QPSK scheme with antenna arrays and without beam pattern movement,
respectively. The performance improvement via BPS is the result of the high
order of time diversity gains achieved through beam pattern movement.
Although STTC-QPSK offers both diversity and coding gain, the diversity
order offered by STTC-QPSK is much inferior to that of BPS-QPSK;
thus, even without a coding gain benefit, the BPS-QPSK scheme surpasses
the performance of STTC-QPSK with relatively lower complexity.
In Figure 6, the BER performance of the BPS system is generated for different
relative spectrum efficiencies, ηR. Increasing the parameter k leads to a higher
order of diversity that enhances the BER performance of the system; on the
other hand, it reduces the BPS relative bandwidth efficiency. For example,
as discussed in [10], in a rich scattering environment, k = 0.005 leads
to two-fold diversity, which is equivalent to ηR = 99.5%. Increasing k from
0.005 to 0.05 increases the achievable diversity to 7-fold and reduces the
relative spectrum efficiency to ηR = 95%. This is equivalent to a decrease in
throughput of 0.5% to 5%.

Figure 5. BER/FER performance comparing (a) STBC versus BPS scheme,


and (b) STTC-QPSK versus BPS.

Figure 6. BPS performance for different ηR values.

Network Capacity Simulations


Network capacity simulations are performed assuming:
• MC-CDMA transmission with N = 32 carriers;
• Four-fold frequency diversity over the entire bandwidth;
• For the STBC-BPS comparison, we consider inter-cell interference
effects from the first-tier cells (see Figure 7). This interference
is reduced via long codes assigned to the signals transmitted to the
users of each cell;
• For the STTC-BPS comparison, inter-cell interference effects are
ignored (see Figure 7);
• Mid-size city center (e.g., 3 scatterers per 1000 m²) that leads to 7-
fold diversity with the BPS technique;
• Users are distributed uniformly in the cell;
• Inter-user interference within the cell is reduced via random
assignment of Hadamard-Walsh codes (in MC-CDMA systems);
• Equal Gain Combining (EGC) over frequency components;
• Switched-beam smart antenna arrays (with HPBW = 18°) are
mounted at the BS; and
• Signal power to noise power ratio of SNR = 10 dB for STBC and
SNR = 12 dB for STTC.

Figure 7. Interfering cells assuming one-tier cellular network. The direction of


beam patterns that will interfere with intended mobile is represented.
With these assumptions, the received BPS/MC-CDMA signal corresponds
to [12]:

(20)
Here, AF(t, φc) is the array factor introduced in (14), nl(t) is an additive
white Gaussian noise (AWGN), which is considered independent for
different time slots (l), bc,k∈{+1,-1} is the cth cell’s kth user’s
transmitted bit, is the Hadamard-Walsh spreading code for kth user and
nth sub-carrier in the cth cell, ψcn is the long code of the nth sub-carrier for
cth cell, is the Rayleigh fade amplitude on the nth sub-carrier in the lth
time slot in the cth cell and is its phase (which is assumed to be tracked
and removed). is assumed independent over time components, l, and
correlated over frequency components, n [14]. Kc represents the number of
users that effectively interfere with the desired user.
In the neighboring cells, these users are located at the antenna pattern
(sector) with directions shown in Figure 7. Considering assumptions (f) and
(i)

(21)
where E(·) denotes the expectation and K is the number of users available
in each cell. In (20), 1/(Rc)^α represents the long-term path loss of the signal
received by the mobile (MS) in the cell 0. This signal is transmitted by
the BS of neighboring cells to the users located in those cells, and in the
directions which interfere with the intended mobile (see Figure 7).
In Figure 7, D is the cell radius. Assuming the intended mobile is located at
and approximating the coverage area by a triangle, represents
the approximate center of mass of users in the beam pattern coverage area.
Rc represents the distance between the BS of the cell c, c ∈ {0,1,2,…6}, and
the intended mobile in the cell 0. From the geometry in Figure 7, vector R
formed by the elements Rc, c ∈ {0,1,2,…6}, corresponds to [12]

(22)
where R0 is normalized to one and the others are normalized with respect to
this value.
In (20), the power factor α is a function of the user location, BS antenna
height and environment. Considering urban areas, the parameter α changes with
the carrier frequency and BS antenna height. In urban areas, α = 1 if Rc <
Dmax, and α = 2 if Rc > Dmax, where Dmax = D(fo,ha) (Dmax is a function of the
carrier frequency fo and antenna height ha). Considering fo = 900 MHz and a
BS height ha > 25 m, Dmax ≈ 1000 m (see [15]). Assuming a cell of radius D ≈
500 m, and by referring to [15], we find that α = 2 for cells 1, 2 and 6, whereas
α = 1 for cells 3, 4 and 5. Thus, in the simulations we ignore the interference
from cells 1, 2 and 6 and only consider inter-cell interference from cells 3, 4
and 5 with little loss in accuracy.
With the model introduced in (20), the received STBC/MC-CDMA
signal corresponds to

(23)
where bc,k[i] and bc,k[i+1], i ∈ {0,2,4,…}, are the kth user’s ith information bits in
the cth cell for STBC, and are the Rayleigh fade amplitudes due to
antenna 0 and antenna 1 on the nth sub-carrier in the cth cell and and
are their phases, respectively, is the Hadamard-Walsh spreading code for the
kth user and nth sub-carrier, ψcn is the long code of the nth sub-carrier in the
cth cell, 1/(Rc)^α characterizes the long-term path loss, and n(t) is an additive
white Gaussian noise (AWGN).
Figure 8(a) represents the network capacity simulation results generated
considering MRC across time components in BPS and across space
components in STBC (see [3] and [4]) and EGC across frequency components
in both BPS and STBC. It is observed that a higher network capacity is
achievable with BPS/MC-CDMA. For example, at a probability of error
of 10⁻², BPS/MC-CDMA offers up to two-fold higher capacity. It is also
observed that STBC/MC-CDMA offers better performance compared
to the traditional MC-CDMA without diversity when the number of users
in the cell is less than 80. However, as the number of users in the cell
increases beyond 80, the performance of STBC/MC-CDMA becomes even
worse than traditional MC-CDMA (i.e., MC-CDMA with an antenna array but
without diversity benefits). This is because the STBC scheme discussed in this
paper (see [1]) is designed to utilize MRC. It has been shown that the MRC
combining scheme is optimal when there is only one user,
while in a multiple-access environment, MRC enhances the
Multiple Access Interference (MAI) and therefore degrades the performance
of the system [16].

Figure 8. Capacity performance (a) STBC and BPS, and (b) STTC and BPS.
Considering STTC-QPSK, with assumption (d), the STTC-QPSK/MC-
CDMA received signal corresponds to

(24)
Here, s0,k and s1,k are the kth user’s information bits transmitted from antenna
0 and antenna 1, respectively, and are the Rayleigh fade amplitudes
due to antenna 0 and antenna 1 on the nth sub-carrier and and are their
phases, respectively, is the Hadamard-Walsh spreading code for the kth user
and nth sub-carrier, and n(t) is an additive white Gaussian noise (AWGN).
Network capacity simulations for STTC-QPSK are generated assuming
EGC across time components (in BPS), space components (in STTC-QPSK)
and frequency components for BPS and STTC-QPSK [Figure 8(b)]. Figure
8(b) represents STTC versus BPS-QPSK simulation results. This figure shows that BPS-QPSK is superior to both STTC-QPSK and QPSK without diversity. In this simulation, BPS-QPSK leads to significantly better capacity due to the time diversity induced by beam-pattern movement and the frequency diversity inherent in MC-CDMA. The results also show that QPSK performance is superior to that of STTC-QPSK. This agrees with the FER simulation results in Figure 5(b), where QPSK is better than STTC-QPSK at low SNRs (e.g., at SNR = 10 dB). This is because STTC-QPSK is designed under the assumption of sufficiently high SNR values; thus, it is less efficient than QPSK at low SNRs [17]. (The capacity curve for higher SNR values may lead to better STTC-QPSK performance compared to QPSK; however, STTC-QPSK shows lower performance than BPS-QPSK at all SNRs.) Thus, it is observed that a higher network capacity is achievable via BPS/MC-CDMA. It is also worth mentioning that STTC-QPSK performance can be significantly improved via interference suppression/cancellation techniques at the cost of system complexity, as discussed in [19–21]. In this paper, we conducted the comparison without adding such complexity to the STTC scheme, i.e., without implementing interference suppression algorithms.
Simulations confirm that BPS offers superior network capacity compared to STC schemes; however, there are two issues associated with the BPS scheme: 1) the diversity achievable via BPS changes with distance; the greater the distance of the mobile from the BS, the higher the diversity and network capacity [10]. It is notable that, in general, the average number of users located in constant-width annuluses (with the BS at the center) increases as the distance from the BS increases; and 2) BPS works only in urban areas (or in rich scattering environments); but, because a high network capacity is only required in urban areas, this is not a critical issue. Moreover, BPS can also be merged with STC techniques, e.g., via the structure shown in Figure 4. In this case, the traditional antenna arrays are replaced with time-varying weight-vector antenna arrays to direct and move the antenna pattern. Another approach for merging BPS with STBC is introduced in [18].
Nevertheless, it is worth mentioning that the BPS scheme achieves its probability-of-error performance and network capacity benefits with relatively low complexity. This makes BPS a prominent scheme for future wireless generations with smart antennas. However, the spectrum efficiency of BPS is about 5% less than that of STC, which is a minimal disadvantage compared to the benefits created by the BPS technique.

CONCLUSIONS
A comparison was performed between the STBC, STTC-QPSK and BPS transmit diversity techniques in terms of network capacity, BER/FER performance, spectrum efficiency, complexity and antenna dimensions. BER performance and network capacity simulations were generated for the BPS, STBC, and STTC schemes. This comparison shows that the BPS transmit diversity scheme is much superior to both the STBC and STTC-QPSK schemes: a) the BS physical antenna dimensions of BPS are much smaller than those of STC techniques, and b) the BER/FER performance and network capacity of BPS are much higher than those of STC schemes. The complexity of the BPS system is minimal because the complexity is mainly located at the BS, and the receiver complexity is low because all the diversity components enter the receiver serially in time. In terms of spectrum efficiency, both STC schemes outperform the BPS scheme by a very small percentage (on the order of 5%). The BPS scheme introduces a small bandwidth expansion due to the movement of the beam pattern, which eventually results in a lower throughput per bandwidth.
REFERENCES
1. S. M. Alamouti, “A simple transmit diversity technique for wireless
communications,” IEEE Journal on Selected areas in Communications,
Vol. 16, No. 8, pp. 1451–1458, 1998.
2. V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block
codes from orthogonal designs,” IEEE Transactions on Information
Theory, Vol. 45, No. 5, pp. 1456–1467, July 1999.
3. V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for
high data rate wireless communication: Performance criterion and code
construction,” IEEE Transactions on Information Theory, Vol. 44, pp.
744–765, March 1998.
4. V. Tarokh, A. F. Naguib, N. Seshadri, and A. Calderbank, “Space-
time codes for high data rate wireless communications: Performance
criteria in the presence of channel estimation errors, mobility, and
multiple paths,” IEEE Transactions on Communications, Vol. 47, No.
2, February 1999.
5. A. F. Naguib, V. Tarokh, N. Seshadri, and A. R Calderbank, “A space-
time coding modem for high-data-rate wireless communications,”
IEEE Journal on Selected Areas in Communications, Vol. 16, No. 8,
October 1998.
6. N. Seshadri and J. H. Winters, “Two signaling schemes for improving
the error performance of frequency division-duplex transmission
system using transmitter antenna diversity,” International Journal
Wireless Information Networks, Vol. 1, No. 1, pp. 49–60, January
1994.
7. J. H. Winters, “The diversity gain of transmit diversity in wireless
systems with Rayleigh fading,” in Proceedings of the 1994 ICC/
SUPERCOMM, New Orleans, Vol. 2, pp. 1121–1125, May 1994.
8. R. W. Heath, S. Sandhu, and A. J. Paulraj, “Space-time block coding
versus space-time trellis codes,” Proceedings of IEEE International
Conference on Communications, Helsinki, Finland, June 11–14, 2001.
9. V. Tarokh, A. Naguib, N. Seshadri, and A. R. Calderbank, “Combined
array processing and space-time coding,” IEEE Transactions on
Information Theory, Vol. 45, No. 4, pp. 1121–1128, May 1999.
10. S. A. Zekavat and C. R. Nassar, “Antenna arrays with oscillating
beam patterns: Characterization of transmit diversity using semi-
elliptic coverage geometric-based stochastic channel modeling,” IEEE
Transactions on Communications, Vol. 50, No. 10, pp. 1549–1556,
October 2002.
11. S. A. Zekavat, C. R. Nassar, and S. Shattil, “Oscillating beam adaptive
antennas and multi-carrier systems: Achieving transmit diversity,
frequency diversity and directionality,” IEEE Transactions on Vehicular
Technology, Vol. 51, No. 5, pp. 1030 –1039, September 2002.
12. S. A. Zekavat and C. R. Nassar, “Achieving high capacity wireless
by merging multi-carrier CDMA systems and oscillating-beam smart
antenna arrays,” IEEE Transactions on Vehicular Technology, Vol. 52,
No. 4, pp. 772– 778, July 2003.
13. P. K. Teh and S. A. Zekavat, “A merger of OFDM and antenna array
beam pattern scanning (BPS): Achieving directionality and transmit
diversity,” accepted in IEEE 37th Asilomar Conference on Signals,
Systems and Computers, November 9–12, 2003.
14. J. W. C. Jakes, Microwave Mobile Communications, New York, Wiley,
1974.
15. A. J. Rustako, N. Amitay, G. J. Owens, and R. S. Roman, “Radio
propagation at microwave frequencies for line-ofsight microcellular
mobile and personal communications,” IEEE Transactions on Vehicular
Technology, Vol. 40, No. 1, pp. 203–210, February 1991.
16. J. M. Auffray and J. F. Helard “Performance of multicarrier CDMA
technique combined with space-time block coding over Rayleigh
channel,” IEEE 7th International Symposium on Spread-Spectrum
Technology, Vol. 2, pp. 348–352, September 2–5, 2002.
17. A. G. Amat, M. Navarro, and A. Tarable, “Space-time trellis codes
for correlated channels,” IEEE International Symposium on Signal
Processing and Information Technology, Darmstadt, Germany,
December 14–17, 2003.
18. S. A. Zekavat and P. K. Teh, “Beam-pattern-scanning dynamic-time
block coding,” Proceedings of Wireless Networking Symposium, The
University of Texas at Austin, October 22–24, 2003.
19. B. Lu and X. D. Wang, “Iterative receivers for multiuser space-time
coding systems,” IEEE Journal on Selected Areas in Communications,
Vol. 18. No. 11, pp. 2322– 2335, November 2000.
20. E. Biglieri, A. Nordio, and G. Taricco, “Suboptimum receiver interfaces
and space-time codes,” IEEE Transactions on Signal Processing, Vol.
51, No. 11, pp. 2720– 2728, November 2003.
21. H. B. Li and J. Li, “Differential and coherent decorrelating multiuser
receivers for space-time-coded CDMA systems,” IEEE Transactions
on Signal Processing, Vol. 50, No. 10, pp. 2529–2537, October 2002.
Chapter 8

PARTIAL FEEDBACK BASED ORTHOGONAL SPACE-TIME BLOCK CODING WITH FLEXIBLE FEEDBACK BITS

Lei Wang and Zhigang Chen


School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an,
China.

ABSTRACT
The conventional orthogonal space-time block code (OSTBC) with limited
feedback has fixed p-1 feedback bits for the specific nTp transmit antennas. A
new partial feedback based OSTBC which provides flexible feedback bits is
proposed in this paper. The proposed scheme inherits the properties of having a simple decoder and the full diversity of OSTBC; moreover, it preserves the full data rate. Simulation results show that for nTp transmit antennas, the proposed scheme has similar performance to the conventional one when using p−1 feedback bits, and better performance with more feedback bits.

Citation: Wang, L. and Chen, Z. (2013), “Partial Feedback Based Orthogonal Space-Time Block Coding With Flexible Feedback Bits”. Communications and Network, 5, 127-131. doi: 10.4236/cn.2013.53B2024.
Copyright: © 2013 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0

Keywords: MIMO, Transmit Diversity, Space-time Block Coding, Partial Feedback

INTRODUCTION
Orthogonal space-time block coding (OSTBC) is a simple and effective transmission paradigm for MIMO systems, due to achieving full diversity with low complexity [1]. One of the most effective OSTBC schemes is the
Alamouti code [2] for two transmit antennas, which has been adopted as the
open-loop transmit diversity scheme by current 3GPP standards. However,
the Alamouti code is the only rate-one OSTBC scheme [3]. With a higher number of transmit antennas, OSTBCs for complex constellations will suffer a rate loss.
To address this drawback, open-loop solutions have been presented, such as the quasi-OSTBC (QOSTBC) [4] with rate one for four transmit antennas, and other STBC schemes [5,6] with full rate and full diversity. Alternatively, closed-loop solutions have also been designed to improve the performance of OSTBC by exploiting limited channel information feedback at the transmitter. In this paper, we focus on the closed-loop scheme.
Based on the group-coherent code, the p−1 bits feedback based OSTBC for nTp transmit antennas has been constructed in [7], and generalized to an arbitrary number of receive antennas in [8]. The partial feedback based schemes in [7,8] exhibit a higher diversity order while preserving low decoding complexity. However, these schemes for nTp transmit antennas require a fixed number of p−1 feedback bits. That is to say, for such a scheme, improving the performance by increasing the feedback bits implies that the number of transmit antennas nTp must be increased at the same time. Therefore, the scheme is inflexible in balancing the performance against the feedback overhead.
In this paper, by multiplying each signal to be transmitted from each antenna by a well-designed feedback vector, we propose a novel partial feedback based OSTBC scheme with flexible feedback bits. In this scheme, the OSTBC can be directly extended to more than two antennas. Importantly, we show that the proposed scheme preserves the simple decoding structure of OSTBC, full diversity and full data rate.
Notations: Throughout this paper, (⋅)T and (⋅)H represent “transpose” and “Hermitian”, respectively. Re(a) denotes the real part of a complex number a.

PROPOSED CODE CONSTRUCTION AND SYSTEM MODEL
Consider a MIMO system with nTp transmit and nR receive antennas.
Assume we have an OSTBC for nT transmit antennas, denoted as , where cm is the T × 1 signal to be transmitted from the mth antenna for . Then a code to be transmitted from nTp antennas, where p ≥ 2 is an integer, may be constructed as

(1)
where is the 1 × nTp feedback vector for the mth antenna, which is defined
as where ⊗ denotes the Kronecker product, is the mth
row of the identity matrix , and 1 × p vector bl is given by

(2)
where . The feedback vector at the mth antenna contains a subset of all possible Qp−1 feedback vectors.
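To make the construction in (1) and (2) concrete, the following Python sketch enumerates candidate feedback vectors bl and forms the per-antenna vectors via the Kronecker product with rows of the identity matrix. It is only an illustration under stated assumptions: the entries of bl are taken as unit-magnitude Q-ary phase rotations with the first entry fixed (b0 = 0), and all function names are ours rather than the paper's.

```python
import numpy as np
from itertools import product

def candidate_b_vectors(p, Q):
    """Enumerate the Q**(p-1) candidate 1 x p feedback vectors b_l.

    Assumption (not stated in this excerpt): each entry is a unit-magnitude
    phase rotation exp(j*2*pi*q/Q), with the first phase fixed to zero
    (b0 = 0), so only p-1 entries need to be signalled back.
    """
    phases = [2 * np.pi * q / Q for q in range(Q)]
    for tail in product(phases, repeat=p - 1):
        yield np.exp(1j * np.array((0.0,) + tail))

def feedback_vector(m, nT, b_l):
    """w_m = e_m (Kronecker) b_l: the 1 x (nT*p) feedback vector of antenna m."""
    e_m = np.zeros(nT)
    e_m[m] = 1.0
    return np.kron(e_m, b_l)

# Example: nT = 2 (Alamouti), p = 2, Q = 2 -> two candidates, i.e. 1 feedback bit.
nT, p, Q = 2, 2, 2
for b_l in candidate_b_vectors(p, Q):
    W = np.vstack([feedback_vector(m, nT, b_l) for m in range(nT)])
    print(W.round(2))
```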
With the transmission of T × nTp code matrix , the T × nR receive signal
can be written as
(3)
where is the nTp × nR channel matrix, and
is the T × nR complex Gaussian noise matrix. The entries of H and N are
independent samples of a zero-mean complex Gaussian random variable
with variance 1 and nTp/ρ respectively, where ρ is the average signal-to-
noise ratio (SNR) at each receive antenna.
LINEAR DECODER AT THE RECEIVER


The received signal at the ith receive antenna can be rewritten as

(4)
where the nT × nTp matrix bl is composed of nT feedback vectors, and can be
expressed in a stacked form given by

We divide nTp × 1 channel vector hi into nT segments in the following


way

(5)

where each segment can be denoted as with dimension


p × 1. Then the equivalent channel vector in (4) has the form of

(6)
For convenience, we will use the Alamouti code as the basic OSTBC matrix in the rest of this paper; the results can be directly extended to other OSTBCs. For the received signal in (4), after performing the conjugate operation on the second entry of yi, the received signal yi can be equivalently expressed as

(7)
where is the equivalent channel matrix corresponding to the entries of and their conjugates, and has a pair of symbols in the Alamouti code. Denote the kth entry of as ; then, according to the linearity of the OSTBC [9], the equivalent channel matrix has the form of
(8)
where the matrices Ck and Dk specifying the Alamouti code are defined in
[9]. Since matched filtering is the first step in the detection process, left-
multiplying by will yield

(9)
where . Due to the properties of Ck and Dk for the Alamouti
code, we get

(10)
where denotes the equivalent channel gain for receive antenna i. It is clear that is a diagonal matrix; therefore, the simple decoder of OSTBC can be directly applied to (7), and thus s1 and s2 can be decoded independently.
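The independence of the s1 and s2 decisions is easiest to see for the plain Alamouti building block. The sketch below is a minimal matched-filter decoder for one Alamouti block over two effective channel coefficients; it deliberately ignores the feedback-vector combining of (6) and simply treats h1 and h2 as the resulting equivalent channels, so it is an illustrative simplification rather than the chapter's exact receiver.

```python
import numpy as np

def alamouti_decode(y0, y1, h1, h2):
    """Matched-filter detection of one Alamouti block.

    y0, y1 : received samples in the two time slots,
    h1, h2 : equivalent channel coefficients (stand-ins for the combined
             feedback-vector/channel terms of (6)).
    The equivalent channel matrix is orthogonal, so after matched filtering
    the estimates of s1 and s2 decouple and scale by |h1|^2 + |h2|^2.
    """
    gain = abs(h1) ** 2 + abs(h2) ** 2
    s1_hat = (np.conj(h1) * y0 + h2 * np.conj(y1)) / gain
    s2_hat = (np.conj(h2) * y0 - h1 * np.conj(y1)) / gain
    return s1_hat, s2_hat

# Noiseless sanity check with two QPSK symbols.
rng = np.random.default_rng(0)
s1, s2 = (1 + 1j) / np.sqrt(2), (-1 + 1j) / np.sqrt(2)
h1, h2 = rng.normal(size=2) + 1j * rng.normal(size=2)
y0 = h1 * s1 + h2 * s2                        # slot 1: transmit (s1, s2)
y1 = -h1 * np.conj(s2) + h2 * np.conj(s1)     # slot 2: transmit (-s2*, s1*)
print(alamouti_decode(y0, y1, h1, h2))        # ~ (s1, s2)
```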

FEEDBACK BITS SELECTION AND PROPERTIES


In this section, we will discuss the feedback bits selection criterion and the
key properties of the proposed scheme.

Feedback Bits Selection

At the ith receive antenna, can be expressed in the following quadratic


form

(11)
where

and
For all the nR receive antennas, the total channel gain is given by

(12)
It is clear that in order to improve the system performance, we must feed back the specific l, using (p−1)logQ bits, which provides the largest γl. Denote the (m, n) entry of Al as , thus , and

, where b0 = 0 is preset. Moreover, it is easy to verify

that . Then the quadratic form in (11) can be represented as

(13)
where gik(n) denotes the nth element in gik, and

Substituting (13) into (11) leads to

(14)
Thus, the (p−1)logQ feedback bits will be selected as

(15)
In this way, we can choose the optimal feedback vector bl and further construct the feedback vector for the mth transmit antenna.
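A direct way to realize the selection rule behind (15) is an exhaustive search over the candidate feedback vectors, keeping the one that maximizes the total channel gain. The sketch below assumes the channel vector of each receive antenna is split into nT segments of length p as in (5); the function names, the reshape ordering, and the gain expression as the sum of |bl · h_i^(m)|^2 over antennas are our reading of the criterion, not code from the paper.

```python
import numpy as np

def select_feedback_index(H, nT, p, candidates):
    """Exhaustive search over candidate feedback vectors b_l.

    H          : (nT*p) x nR channel matrix, one column per receive antenna,
    candidates : list of 1 x p candidate vectors (e.g. Q-ary phase vectors).
    Each column is split into nT segments of length p as in (5); the chosen
    index maximizes the total equivalent gain summed over |b_l . h_i^(m)|^2.
    """
    nR = H.shape[1]
    best_l, best_gain = 0, -np.inf
    for l, b in enumerate(candidates):
        gain = 0.0
        for i in range(nR):
            segments = H[:, i].reshape(nT, p)     # nT segments of length p
            gain += np.sum(np.abs(segments @ b) ** 2)
        if gain > best_gain:
            best_l, best_gain = l, gain
    return best_l, best_gain

# Example: nT = 2, p = 2, Q = 2, nR = 1 -> pick 1 of 2 candidates (1 feedback bit).
rng = np.random.default_rng(1)
H = (rng.normal(size=(4, 1)) + 1j * rng.normal(size=(4, 1))) / np.sqrt(2)
cands = list(candidate_b_vectors(2, 2))           # from the earlier sketch
print(select_feedback_index(H, nT=2, p=2, candidates=cands))
```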

Diversity Analysis
The key property of the proposed partial feedback based OSTBC scheme is
proved in the following.
Property 1: The partial feedback based OSTBC in (1) can


achieve full diversity.
Proof: For simplicity, we denote L = Qp−1. Selecting the optimal lopt will

provide the largest channel gain which can be lower


bounded by

(16)

For the summed matrix , it is clear that its diagonal elements equal L, and its non-diagonal elements have the form of

(17)

Let since is reduced to

(18)

Therefore, we can obtain which can be substituted into


(16) and yields

(19)
Since the lower bound of the channel gain provides a full diversity order of nTpnR, the proposed scheme can certainly guarantee full diversity.
Configuration of Flexible Feedback Bits

Furthermore, the proposed scheme supports flexible feedback bits. For a specific p, the scheme uses (p−1)logQ feedback bits. However, for a number of feedback bits not equal to (p−1)logQ, we can rewrite the vector bl in (2) as

thus the number of feedback bits is . For example, for nT = 2 and p = 4, the number of feedback bits is 3 and 6 in the case of Q = 2 and Q = 4, respectively. If we set Q1 = Q2 = 2 and Q3 = 4 in bl, then the number of feedback bits is 4, and if we set Q1 = 2 and Q2 = Q3 = 4 in bl, then the number of feedback bits is 5, and so on.
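The mixed-alphabet configuration can be enumerated directly: give each of the p−1 fed-back entries its own alphabet size Qi, and the feedback cost becomes the sum of log2 Qi bits. The snippet below is a small illustration under the same assumed phase-alphabet convention as in the earlier sketch.

```python
import numpy as np
from itertools import product

def mixed_q_candidates(q_list):
    """Candidate b_l vectors when each fed-back entry has its own alphabet Q_i.

    q_list : per-entry alphabet sizes, e.g. [2, 2, 4] for p = 4,
             costing 1 + 1 + 2 = 4 feedback bits in total.
    (The unit-magnitude phase alphabet is an illustrative assumption.)
    """
    alphabets = [[np.exp(2j * np.pi * q / Q) for q in range(Q)] for Q in q_list]
    return [np.array((1.0,) + tail) for tail in product(*alphabets)]

cands = mixed_q_candidates([2, 2, 4])
print(len(cands), "candidates =", int(np.log2(len(cands))), "feedback bits")  # 16 -> 4
```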

BER Analysis
Assuming the power of each symbol in x = [s1 s2]T is normalized to unity, i.e., for i = 1, 2, the average SNR per bit has the form of

Furthermore, assuming QPSK modulation and maximum likelihood


(ML) decoding are used in the considered system, the conditional BER is
given by

(20)
By using (16), the upper bound of the conditional BER can be formulated
as

(21)
Using the technique of Moment Generating Function (MGF)[10], the
average BER can be expressed as

(22)
where and is the MGF of η. The average


BER can be further expressed as

(23)
Using the result of (5A.4) in [10], this definite integral has the closed form of

(24)

where

SIMULATION RESULTS
In all simulations, we consider QPSK symbols in the Alamouti code and a single receive antenna with nR = 1, where the channels are assumed to be independent and identically distributed (i.i.d.) quasi-static Rayleigh flat-fading channels. In Figure 1, we plot the bit error rate (BER) performance
of the generalized partial feedback based OSTBC scheme in [7,8] (“GPF”
for short) and the proposed flexible feedback bits scheme (“FFB” for short)
with nTp = 4 transmit antennas. In this case p = 2, and the GPF scheme can only feed back 1 bit, whereas the proposed scheme can feed back more bits to
improve the system performance. For comparison, in Figure 1 we also give
the BER figures of the complex orthogonal code for four transmit antennas
[11], and the numerical results of the upper bound in (24) of the proposed
scheme. Figure 1 shows that with 1 bit feedback, the GPF and FFB schemes
have close performance, whereas the FFB scheme has better performance
with more feedback bits. In comparison with the complex orthogonal code, both schemes have better performance.
Figure 1. BER performance of the two schemes with nTp = 4 transmit antennas.
In Figure 2, the BER performance of the two schemes with nTp = 8
transmit antennas is depicted. In this case p = 4, and the GPF scheme can only feed back 3 bits, whereas the proposed FFB scheme can feed back more bits. We can observe that with the same 3 feedback bits, the two schemes have very similar performance, and with more feedback bits, the proposed FFB scheme can further improve the performance. In the simulations of
these two schemes, the exhaustive search over all possible feedback vectors
is used.

Figure 2. BER performance of the two schemes with nTp = 8 transmit antennas.
CONCLUSIONS
In this paper, we proposed a partial feedback based OSTBC scheme with
flexible feedback bits. The new scheme inherits the OSTBC properties of full diversity and low decoding complexity, and it has full rate. Moreover, compared with the conventional partial feedback based
OSTBC schemes, the new scheme can support flexible feedback bits and
can improve the system performance with more feedback bits.
REFERENCES
1. V. Tarokh, H. Jafarkhani and A. R. Calderbank, “Space-time Block
Codes from Orthogonal Designs,” IEEE Transactions on Information
Theory, Vol. 45, No. 5, 1999, pp. 1456-1467. doi:10.1109/18.771146
2. S. M. Alamouti, “A Simple Transmitter Diversity Scheme for Wireless
Communications,” IEEE Journal on Selected Areas in Communications,
Vol. 16, No. 8, 1998, pp. 1451-1458. doi:10.1109/49.730453
3. S. Sandhu and A. J. Paulraj, “Space-time Block Codes: A Capacity
Perspective,” IEEE, Communications Letters, Vol. 4, No. 12, 2000, pp.
384-386. doi:10.1109/4234.898716
4. H. Jafarkhani, “A Quasi-orthogonal Space-time Block Code,” IEEE
Transactions on Communications, Vol. 49, No. 1, 2001, pp. 1-4.
doi:10.1109/26.898239
5. W. Su and X. G. Xia, “Signal Constellations for Quasi- orthogonal
Space-time Block Codes with Full Diversity,” IEEE Transactions
on Information Theory, Vol. 50, 2004, pp. 2331-2347. doi:10.1109/
TIT.2004.834740
6. X. L. Ma and G. B. Giannakis, “Full-diversity Full-rate Complex-field
Space-time Coding,” IEEE Transactions on Signal Processing, Vol. 51,
No. 11, 2003, pp. 2917-2930. doi:10.1109/TSP.2003.818206
7. J. Akhtar and D. Gesbert, “Extending Orthogonal Block Codes with
Partial Feedback,” IEEE Transactions on Wireless Communications,
Vol. 3, No. 6, 2004, pp. 1959-1962. doi:10.1109/TWC.2004.837469
8. A. Sezgin, G. Altay and A. Paulraj, “Generalized Partial Feedback
Based Orthogonal Space-time Block Coding,” IEEE Transactions
on Wireless Communications, Vol. 8, No. 6, 2009, pp. 2771-2775.
doi:10.1109/TWC.2009.080352
9. B. Hassibi and B. M. Hochwald, “High-rate Codes That Are Linear in
Space and Time,” IEEE Transactions on Information Theory, Vol. 48,
No. 7, 2002, pp. 1804-1824. doi:10.1109/TIT.2002.1013127
10. M. K. Simon and M. S. Alouini, “Digital Communication over Fading
Channels,” John Wiley & Sons Inc., 2000.
11. G. Ganesan and P. Stoica, “Space-time Block Codes: A Maximum SNR Approach,” IEEE Transactions on Information Theory, Vol. 47, No. 4, 2001, pp. 1650-1656. doi:10.1109/18.923754
Chapter 9

RATELESS SPACE-TIME BLOCK CODES FOR 5G WIRELESS COMMUNICATION SYSTEMS

Ali Alqahtani
College of Applied Engineering, King Saud University, Riyadh, Saudi Arabia

ABSTRACT
This chapter presents a rateless space-time block code (RSTBC) for massive
multiple-input multiple-output (MIMO) wireless communication systems.
We discuss the principles of rateless coding compared to the fixed-rate
channel codes. A literature review of rateless codes (RCs) is also addressed.
Furthermore, the chapter illustrates the basis of RSTBC deployments
in massive MIMO transmissions over lossy wireless channels. In such
channels, data may be lost or are not decodable at the receiver end due to
a variety of factors such as channel losses or pilot contamination.

Citation: Ali Alqahtani, “Rateless Space-Time Block Codes for 5G Wireless Communication Systems”, Intech Open - The Fifth Generation (5G) of Wireless Communication, 2018, DOI: 10.5772/intechopen.74561.
Copyright: © 2018 the Author(s) and IntechOpen. This chapter is distributed under the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Massive MIMO is a breakthrough wireless transmission technique proposed for future wireless standards due to its spectrum and energy efficiencies. We
show that RSTBC guarantees the reliability of the system in such highly
lossy channels. Moreover, pilot contamination (PC) constitutes a particularly
significant impairment in reciprocity-based multi-cell systems. PC results
from the non-orthogonality of the pilot sequences in different cells. In this
chapter, RSTBC is also employed in the downlink transmission of a multi-
cell massive MIMO system to mitigate the effects of signal-to-interference-
and-noise ratio (SINR) degradation resulting from PC. We conclude that
RSTBC can effectively mitigate such interference. Hence, RSTBC is a
strong candidate for the upcoming 5G wireless communication systems.

Keywords: massive MIMO, rateless codes, STBC, pilot contamination, 5G

INTRODUCTION
In practice, the transmitted data over the channel are usually affected by
noise, interference, and fading. Several channel models, such as additive
white Gaussian noise (AWGN), binary symmetric channel (BSC), binary erasure channel (BECH), wireless fading channel, and lossy (or erasure) channel, are introduced, in which an error (or loss) control technique is required to reduce the errors (or losses) caused by such channel impairments
[1].
This technique is called channel coding, which is a main part of digital communication theory. A historical perspective on channel coding is given in [2]. Generally speaking, channel coding, characterized by a code rate, is designed by adding controlled redundancy to the data to detect and/or correct errors and, hence, achieve reliable delivery of digital data over
unreliable communication channels. Error correction may generally be
realized in two different error control techniques, namely: forward error
correction (FEC), and backward error correction (BEC). The former omits
the need for the data retransmission process, while the latter is widely known
as automatic repeat request (or sometimes automatic retransmission query)
(ARQ).
For large data sizes, a large number of errors will occur, making it difficult for FEC to work reasonably. The ARQ technique, in such conditions, requires more retransmission processes, which causes significant growth in power consumption. Moreover, these processes sustain additional overhead that includes data retransmission and redundancy added to the original data, and they cannot correctly decode the source data when the packet loss rate is high [3]. Therefore, it is of significant interest to design a simple channel code with a flexible code rate and capacity-approaching behavior to achieve robust and reliable transmission over universal lossy channels. Rateless codes constitute such a class of schemes. We describe the concept of rateless coding in the next section.

CONCEPT OF RATELESS CODES


Rateless codes (RCs) are channel codes used for encoding data to generate incremental redundancy, which is then transmitted over channels with a variable packet error rate. The interpretation of the terminology “rateless”
is that the code does not fix its code rate before transmission. Rather, it
can only be determined after correctly recovering the transmitted data. In
the available literature, the rateless code is typically referred to by some
associated terminologies such as “variable-rate,” “rate-compatible,”
“adaptive-rate,” or “incremental redundancy” scheme [4]. However, the rate of a rateless code can be considered from two perspectives: the instantaneous rate and the effective rate. The instantaneous rate is the ratio of the number of
information bits to the total number of bits transmitted at a specific instant.
On the other hand, the effective rate is the rate realized at the specific point
when the codeword has been successfully received [5].
The counterpart of rateless coding is fixed-rate coding, which is basically
well known in the literature of channel codes. The relationship between
rateless and fixed-rate channel codes can be seen as the correspondence
between continuous and discrete signals or the construction of a video
clip from video frames. In this illustrating analogy, the fixed-rate code
corresponds to the discrete signal or to the video frame, while the rateless
code is viewed as the continuous signal or the video clip [5]. Basically,
rateless codes are proposed to solve the problem of data packet losses. They can continuously generate a potentially unlimited number of data streams until an acknowledgment from the receiver is received declaring successful
decoding.
The basic concept of rateless codes is illustrated in Figure 1 [1]. From
the figure, a total of kc packets, obtained from the fragmented source data,
are encoded by the transmitter to get a large number of encoded packets
nc. Due to the lossy channel, several encoded packets are lost during the
transmission, and finally, only rc encoded packets are collected by the
receiver. The decoding process on the received packets should be able to
recover all original kc packets.

Figure 1. General encoding and decoding processes of rateless codes [1].
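The kc → nc → rc flow of Figure 1 can be mimicked with a toy simulation over a packet-erasure channel. The sketch below uses random XOR (GF(2)) combinations of the source packets instead of a properly designed LT/Soliton degree distribution, and it stops once the received combinations are decodable; all names and parameter values are illustrative.

```python
import numpy as np

def gf2_rank(rows):
    """Rank of a set of binary row vectors over GF(2) (Gaussian elimination)."""
    rows = [r.copy() for r in rows]
    rank, n = 0, (len(rows[0]) if rows else 0)
    for col in range(n):
        pivot = next((i for i in range(rank, len(rows)) if rows[i][col]), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for i in range(len(rows)):
            if i != rank and rows[i][col]:
                rows[i] ^= rows[rank]
        rank += 1
    return rank

def rateless_over_erasure(kc, erasure_prob, rng):
    """Send random XOR combinations of kc source packets over a lossy channel
    until the received combinations are decodable (i.e. span GF(2)^kc)."""
    received, sent = [], 0
    while gf2_rank(received) < kc:
        sent += 1
        coeffs = rng.integers(0, 2, size=kc, dtype=np.uint8)
        if not coeffs.any():
            continue                          # skip the useless all-zero combination
        if rng.random() > erasure_prob:       # packet survives the erasure channel
            received.append(coeffs)
    return sent, len(received)

rng = np.random.default_rng(0)
nc, rc = rateless_over_erasure(kc=32, erasure_prob=0.25, rng=rng)
print(f"sent nc = {nc}, received rc = {rc}, recovered kc = 32")
```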


To illustrate the importance of rateless codes, let’s assume that we
have a fixed-rate code Cfixed of fixed-code rate Rfixed designed to achieve a
performance close to the channel capacity target Ctarget at a specific signal-to-
noise-ratio (SNR), ϕfixed. However, the channel fluctuations make the fixed-
rate code impose two limitations [1]. First, if the actual SNR at the receiver is greater than ϕfixed, then the code essentially becomes an inefficient channel code, because it incorporates more redundancy than the actual channel conditions require. Second, if the actual SNR becomes lower than ϕfixed, then the channel will be in outage because the fixed-rate code Cfixed no longer provides sufficient redundancy for the actual channel conditions. Contrasting with
fixed-rate code, the rateless code has a flexible code rate in accordance with
the channel quality, which is time varying in nature. Another benefit of RCs
is that they potentially do not require the channel state information (CSI)
at the transmitter. This property is of particular importance in the design of
codes for wireless channels. In particular, RCs can be employed in multi-
cell cellular systems when channel estimation errors severely degrade the
performance.
RATELESS CODING AND HYBRID AUTOMATIC RETRANSMISSION QUERY
For further discussion, there is an analogy between hybrid ARQ (HARQ)
and rateless codes, since they transmit additional symbols until the received
information is successfully decoded. On the other hand, they do have some
differences. HARQ refers to a special transmission mechanism, which
combines the conventional ARQ principle with error control coding. The
basic three ARQ protocols are stop-and-wait ARQ, go-back-N ARQ, and
selective repeat ARQ. All the three ARQ protocols use the sliding window
protocol to inform the transmitter on which data frames or symbols should
be retransmitted. Figure 2 illustrates the ARQ schemes: stop-and-wait ARQ
(half duplex), continuous ARQ with pullback (full duplex), and continuous
ARQ with selective repeat (full duplex). In each of them, time is advancing
from lift to right [6].

Figure 2. Automatic repeat request (ARQ) [6]. (a) Stop and wait ARQ (half
duplex); (b) continuous ARQ with pullback (full duplex); (c) continuous ARQ
with selective repeat (full duplex).
These protocols reside in the data link or transport layers of the open
systems interconnection (OSI) model. This is one difference between the
proposed RSTBC and HARQ, since RSTBC is employed in the physical
layer. Comparing rateless codes to HARQ, we summarize the following
points:
• RC is often viewed as a form of continuous incremental
redundancy HARQ in the literature [5].
• HARQ is not capable of working over the entire SNR range, and
therefore, it necessitates combination with some form of adaptive
modulation and coding (AMC). On the other hand, RC can
entirely eliminate AMC and work over a wide range of SNR [7].
• From the point of view of redundancy, HARQ has more
redundancy, since it requires many acknowledgments (ACK) or
negative acknowledgments (NACK) for each packet transmission
return to show successful/unsuccessful decoding, respectively.
In contrast, only a single-bit acknowledgment is needed for the
transmission of a message with RC [8]. When the number of
receivers is large, ARQ acknowledgments may cause significant
delays and bandwidth consumption. Consequently, using ARQ
for wireless broadcast is not scalable [9].
• It was seen in [8] that RC is capable of outperforming ARQ
completely at low SNRs in broadcast communication. However,
they behave the same in point to point as well as in high-SNR
broadcast communications.
• RC and the basic ARQ differ in code construction. RCs can
generate different redundant blocks, while ARQ merely
retransmits the same block [8]. For different receivers, distinct
and independent errors are often encountered. In such cases, the merely retransmitted data packets are only useful to a specific user while they are of no value to others. Hence, it is highly undesirable to send the respective erroneous data frames or symbols to each user.
• The physical layer RCs are useful since the decoder can exploit
useful information from packets that are dropped by ARQ
protocols in higher layers [7].

RATELESS CODES’ LITERATURE REVIEW


In the past decade, rateless codes have gained a lot of attention in both the communication and information theory research communities, which has led to a strong theory behind these codes, mostly for erasure channels. Most
of the available works in the rateless codes literature are extensions of the
fountain codes over the erasure channels [10]. The name “fountain” came
from the analogy to a water supply capable of giving an unlimited number of
water drops. Due to this reason, rateless codes are also referred to as fountain
codes. They were initially developed to achieve efficient transmission in
erasure channels, to which the initial work on rateless codes has mainly been
limited, with the primary application in multimedia video streaming [10].
The first practical class of rateless codes is the Luby Transform (LT)
code which was originally intended to be used for recovering packets
that are lost (erased) during transmission over computer networks. The
fundamentals of LT are introduced in [11] in which the code is rateless
since the number of encoded packets that can be generated from the original
packets is potentially limitless. Figure 3 illustrates the block diagram of LT
encoder. The original packets can be generated from slightly larger encoded
packets. Although the encoding process of LT is quite simple, however, LT
requires the proper design of the degree distribution (Soliton distribution-
based) which significantly affects the performance of the code.

Figure 3. Block diagram of the LT encoder [5].


Afterward, the LT code was extended to the well-known Raptor code [12] by concatenating a weak LT encoder with an outer pre-code such as an irregular low-density parity-check (LDPC) code. Figure 4 depicts the general block
diagram of Raptor code.

Figure 4. Block diagram of the Raptor code encoder [5].


The decoding algorithm of the Raptor code depends on the decoder of the LT code and the pre-code used. The Raptor code requires less overhead; however, its disadvantages are that the lower bound of the total overhead depends on the outer code and that the decoding algorithm implementation is slightly more complicated due to the multiple decoding processes.
Online codes [13] also belong to the family of fountain rateless codes and
work based on two layers of packet processing (inner and outer). However,
in contrast to the LT and Raptor codes, online codes have more encoding and
decoding complexity as a function of the block length. The overall design of
the online code is shown in Figure 5. LT and Raptor codes were originally intended to be used for transmission over the BEC, such as the Internet channel, where the transmitted packets are erased by routers along the path.

Figure 5. Online code encoding and decoding design [13].


On the other hand, some works have studied their performance on
noisy channels such as BSC and AWGN channels [14]. Although it was demonstrated that Raptor codes have better performance than LT codes on a wide variety of noisy channels, both schemes exhibit high error floors in such channels. The previous rateless codes have a fixed degree distribution, which causes degradation in performance when they are employed over noisy channels.
Motivated by this result, a reconfigurable rateless code was introduced in [5], which can adaptively modify its encoding/decoding algorithms by adjusting its degree distribution incrementally according to the channel conditions. Such a code is not systematic and remains fixed if no new knowledge of the channel condition is obtained from feedback. By dropping this assumption, as in [15], a significant overhead reduction can be achieved while still maintaining the same encoding and decoding complexities. In addition, the effective code rate of [5] is actually determined by the decoder, not the encoder.
From another perspective, the use of rateless codes in the physical layer
can be beneficial since the decoder can exploit useful information even
from packets that cannot be correctly decoded and therefore are ignored by
higher layers [7]. A construction of physical-layer Raptor codes based on
protographs was proposed in [16].
Other works of rateless coding over the AWGN channel were provided
in [17, 18]. For wireless channels, rateless code paradigm was found in
many works. In [19], a rateless coding scheme based on traditional Raptor
code is introduced in a single-input single-output (SISO) system over
fading channels. A similar approach is presented in [20] by the same authors
for relay networks. The authors in [21] have considered one of the latest
works of rateless coding over wireless channels. They tackle the high error
floor problem arising from the low-density generator matrix (LDGM)-like
encoding scheme in conventional rateless codes.
While there are significant works on rateless codes for AWGN channels, little work exists on rateless codes for MIMO systems. Rateless codes
for MIMO channels were introduced in [22], where two rateless code
constructions were developed. The first one was based on simple layering
over an AWGN channel. The second construction used a diagonal layering
structure over a particular time-varying scalar channel. However, the latter
is merely concatenating a rateless code (outer code) using dithered repetition
with the vertical Bell Labs layered space-time (V-BLAST) code (inner code)
[23].
Away from digital fountain codes, discussed so far, performance limits
and code construction of block-wise rateless coding for conventional
MIMO fading channels are studied in [24]. The authors have used the
diversity multiplexing tradeoff (DMT) as a performance metric [25]. Also,
they have demonstrated that the design principle of rateless codes follows
the approximately universal codes [26] over MIMO channels. In addition,
simple rateless codes that are DMT optimal for a SISO channel have also been
examined. However, [24] considered the whole MIMO channel as parallel
sub-channels, in which each sub-channel is a MIMO channel. Furthermore,
for each block, the code construction of symbols within the redundant
block is not discussed. Hence, more investigation of other performance
metrics for the scheme proposed in [24] under different channel scenarios is
required. In [27], a cognitive radio network employs rateless coding along
with queuing theory to maximize the capacity of the secondary user while
achieving primary users’ delay requirement. Furthermore, [28] presents a
novel framework of opportunistic beam-forming employing rateless code
in multiuser MIMO downlink to provide faster and high quality of service
(QoS) wireless services.

RATELESS CODES APPLICATIONS


There are various applications of rateless codes:
• Video streaming over the Internet and packet-based wireless
networks: The application of rateless codes to video streaming
was initially proposed for multimedia broadcast multicast system
(MBMS) standard of the 3GPP [29, 30].
• Broadcasting has been extensively used in wireless networks
to distribute information of universal attention, for example,
safety warning messages, emergency information, and weather
information, to a large number of users [31, 32]. Rateless coding
has been utilized in the 3GPP’s Release 6 multimedia broadcast/
multicast service (MBMS) [33].
• Wearable wireless networks: A wearable body area network
(WBAN) is an emerging technology that is developed for
wearable monitoring application. Wireless sensor networks
are usually considered one of the technological foundations of
ambient intelligence. Agile, low-cost, ultra-low power networks
of sensors can collect a massive amount of valuable information
from the surrounding environment [34, 35]. Wireless sensor
network (WSN) technologies are considered one of the key
research areas in computer science and the health-care application
industries for improving the quality of life. A block-based scheme
of rateless channel erasure coding was proposed in [36] to reduce
the impact of wireless channel errors on the augmented reality
(AR) video streams, while also reducing energy consumption.
MOTIVATION TO RATELESS SPACE-TIME CODING


According to the literature survey, there is not enough research work yet on rateless space-time codes (STCs), even for regular MIMO systems. Few
works in rateless STCs are available such as [37, 38]. In [37], a rateless
coding scheme was introduced for the AWGN channel, using layering,
repetition and random dithering. The authors also extended their work to
multiple-input single-output (MISO) Gaussian channels where the MISO
channel is converted to parallel AWGN channels. In [38], the performance of
MIMO radio link is improved by a rate-varying STC under a high-mobility
communication system. Rateless coding can be extended to space-time
block codes (STBCs), where the coding process is performed blockwise
over time and space. The main advantage of STBCs is that they can provide
full diversity gain with relatively simple encoding and decoding schemes.
Unlike the conventional fixed-rate STBC, rateless STBC is designed such
that the code rate is not fixed before transmission. Instead, it depends on the
instantaneous channel conditions.
Incorporating RSTBC in massive MIMO systems is reasonable and very attractive, since rateless coding is based on generating a massive number of encoded blocks, and the massive MIMO technique uses a large number of antenna elements. Motivated by this fact, in this chapter, a new approach
has been developed to fill the gap between rateless STCs and massive
MIMO systems by exploiting significant degrees of freedom available in
massive MIMO systems for rateless coding. The contribution of RSTBC is
to convert lossy massive MIMO channels into lossless ones and provide a
reliable robust transmission when very large MIMO dimensions are used.

RATELESS SPACE-TIME BLOCK CODE FOR MASSIVE MIMO SYSTEMS
Massive MIMO wireless communication systems have been targeted for deployment in the fifth-generation (5G) cellular standards, to fundamentally enhance wireless capacity and communication reliability [39]. In
massive MIMO systems, a large number of antennas, possibly hundreds or
even thousands, work together to deliver big data to the end users. Despite
the significant enhancement in capacity and/or link quality offered by
MIMO systems and space-time codes (STCs) [40, 41], it has been shown
recently that massive MIMO can even improve the performance of MIMO
systems dramatically. This has prompted a lot of research works on massive
MIMO systems lately.
In this section, we illustrate the mechanism of rateless space-time block


code (RSTBC) in a massive MIMO system, as we have addressed in [42,
43, 44, 45]. Figure 6 shows simply the encoding and decoding processes,
where a part of the encoded packets (or blocks) cannot be received due
to channel losses. Hence, with the availability of slightly larger encoded
packets, the receiver can recover the original packets from the minimum
possible number of transmitted encoded packets that are already received.
The required number of blocks for recovery depends on the loss rate of the
channel. During the transmission, the receiver of a specific user measures
the mutual information between the transmitter and the receiver after it
receives each block and compares it to a threshold.

Figure 6. Encoding and decoding processes of RSTBC in a massive MIMO


system.
Namely, it is desired to decode a message of total mutual information M. Assume that the required packets to deliver the message correctly are , where is the codeword matrix transmitted during the lth block, Nt is the number of transmit antenna elements at the base station (BS), T is the number of time slots, and L is the number of required blocks at the receiver to recover the transmitted block. Let ml denote the measured mutual information after receiving the codeword block . If ml ≤ M, the receiver waits for further blocks; if ml > M, the receiver sends a simple feedback to the transmitter to stop transmitting the remaining part of the encoded packets and remove them from the BS buffer. This process
continues until the receiver accumulates a sufficient number of blocks (L) to recover the message or the allowed time exceeds the channel coherence time. The decoding process is conducted sequentially, first using , then if is not sufficient, and so forth. Once the check-sum condition is satisfied, the received blocks are linearly combined at the receiver to decode the whole underlying message. It should be noted that the code is described as “rateless” because the number of required blocks (L) to recover the message is not fixed before transmission; rather, it depends on the channel state. The dimensions in which the code is extended ratelessly are time (number of channel uses) and space (number of functional antennas), and it belongs to the class of block codes. Therefore, it is called a rateless space-time block code (RSTBC).
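The stop-and-feedback behavior described above can be summarized in a few lines of code: the receiver accumulates measured mutual information block by block and acknowledges as soon as the message information M is exceeded, or gives up when the coherence-time budget is spent. Treating the per-block contributions as additive and modelling them as log2(1 + SNR) values are our simplifying assumptions for illustration; the names M, L_max and blocks_needed are not from the chapter.

```python
import numpy as np

def blocks_needed(M, per_block_info, L_max):
    """Accumulate measured mutual information block by block and stop once the
    message information M is exceeded (the receiver then feeds back a one-bit
    acknowledgment); give up when the coherence-time budget L_max is spent.
    Returns the number of blocks L actually used, or None on failure.
    """
    m_l = 0.0
    for L, info in enumerate(per_block_info, start=1):
        m_l += info
        if m_l > M:
            return L            # "stop" feedback; BS flushes the remaining blocks
        if L >= L_max:
            break
    return None                 # allowed time exceeded before decoding

# Illustrative numbers only: each block observed at a random SNR on a lossy link.
rng = np.random.default_rng(2)
infos = (np.log2(1 + rng.exponential(2.0)) for _ in range(64))
print("blocks used:", blocks_needed(M=12.0, per_block_info=infos, L_max=64))
```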
Before proceeding, each RSTBC matrix is constructed based on the following random process
(1)
where ⊙ denotes the element-wise multiplication operation (Hadamard product); X is the complex data matrix to be transmitted, and Dl is the lth binary matrix, generated randomly, where each of its entries is either 0 or 1 and occurs with probability P0 or P1, respectively. For each l, a new Dl is constructed with different positions of zeros. This means that D1 ≠ D2 ≠ ⋯ ≠ Dl ≠ ⋯ ≠ DL and, consequently, X1 ≠ X2 ≠ ⋯ ≠ Xl ≠ ⋯ ≠ XL. Such a method is considered rateless coding in the sense that the encoder can generate, on the fly, a potentially very large number of blocks. A power constraint is introduced on each Xl so that the average power does not exceed Nt.
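The block generation of Eq. (1) amounts to masking the data matrix with an independent random binary pattern for every block. The sketch below follows that description; the way the "average power does not exceed Nt" constraint is enforced (a conditional rescaling) is our assumption, since the exact normalization is not spelled out in this excerpt.

```python
import numpy as np

def rstbc_blocks(X, L, p1, rng):
    """Generate L rateless blocks X_l = X (element-wise) D_l as in Eq. (1).

    X   : T x Nt complex data matrix to be transmitted,
    L   : number of blocks generated on the fly,
    p1  : probability that an entry of the random binary mask D_l equals 1,
    rng : numpy random Generator.
    Each call draws an independent D_l, so the zero patterns (and hence the
    blocks X_l) all differ, which is what makes the code rateless.
    """
    T, Nt = X.shape
    for _ in range(L):
        D_l = (rng.random((T, Nt)) < p1).astype(X.dtype)
        X_l = X * D_l                          # Hadamard (element-wise) product
        p_avg = np.sum(np.abs(X_l) ** 2) / T   # average power per time slot
        if p_avg > Nt:                         # one (assumed) way to enforce the
            X_l *= np.sqrt(Nt / p_avg)         # "average power <= Nt" constraint
        yield X_l

rng = np.random.default_rng(3)
X = (rng.normal(size=(2, 4)) + 1j * rng.normal(size=(2, 4))) / np.sqrt(2)
blocks = list(rstbc_blocks(X, L=4, p1=0.75, rng=rng))
print(blocks[0].round(2))
```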
Now, we consider a downlink massive multiuser MIMO (MU-MIMO)
system in which RSTBC is applied as shown in Figure 7.

Figure 7. RSTBC code for massive MU-MIMO system (BS-to-users scenario).


In this system, a BSTx, equipped with a large number of antennas,
communicates simultaneously with K independent users on the same time-
frequency resources where each user device has Nr receive antennas. The
overall channel matrix can be written as

(2)
where is the channel matrix corresponding to
the kth user. To eliminate the effects of the multiuser interference (MUI) at
the specific receiving users, a precoding technique is applied at the BSTx

with, for example, a zero-forcing (ZF) precoding matrix


which is calculated as

(3)
where β is a normalization factor.
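As a reference, a zero-forcing precoder of the form W = βH^H(HH^H)^{-1} in (3) can be computed as sketched below. The stacking of the K users' channels into the rows of H and the choice of β so that the precoder meets a Frobenius-norm power constraint are illustrative assumptions, not details taken from the chapter.

```python
import numpy as np

def zf_precoder(H):
    """Zero-forcing precoder W = beta * H^H (H H^H)^{-1} for the downlink.

    H : (K*Nr) x Nt aggregate channel, users stacked row-wise (an assumed
        convention). beta is chosen here so that ||W||_F^2 = Nt; the chapter
        only states that beta is a normalization factor.
    """
    Nt = H.shape[1]
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T)   # right pseudo-inverse
    beta = np.sqrt(Nt / np.sum(np.abs(W) ** 2))
    return beta * W

rng = np.random.default_rng(4)
K, Nr, Nt = 10, 1, 100
H = (rng.normal(size=(K * Nr, Nt)) + 1j * rng.normal(size=(K * Nr, Nt))) / np.sqrt(2)
W = zf_precoder(H)
print(np.abs(H @ W).round(3).diagonal()[:3])   # H @ W ~ beta * I: MUI removed
```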
In this system, channel reciprocity is exploited to estimate the downlink
channels via uplink training, as the resulting overhead is a linear function of the number of users rather than the number of BS antennas [46].
For a single-cell MU-MIMO system, the received signal at the kth user at
time instant t can be expressed as

(4)

where corresponds to the average SNR per user (Ex is the symbol energy, and N0 is the noise power at the receiver); L is the maximum number of required blocks of RSTBC at the user; is the signal transmitted by the nth antenna; is the channel coefficient from the nth transmit antenna to the kth user; is the (n, l)th element of the matrix Dl; and wk is the noise at the kth user receiver.
It has been demonstrated in [42, 43, 44, 45] that RSTBC is able to
compensate for data losses. For more details, the reader is referred to these
references. Here are some sample simulation results. The averaged symbol-
error-rate (SER) performance when RSTBC is applied for Nt = 100 with
QPSK is shown in Figure 8, where the loss rate is assumed to be 25%.
Figure 8. SER curves for massive MU-MIMO system with 25%-rate loss and
Nt = 100, K = 10 users, with QPSK, when RSTBC is applied.
It is inferred from Figure 8 that for small values of L, the averaged
SER approaches a fixed level at high SNR because RSTBC, with the
current number of blocks, is no longer able to compensate for further losses. Therefore, it is required to increase L to achieve enhancements until the loss effects are eliminated. As shown, for RSTBC with L = 32, for instance, the flooring in the SER curves has vanished due to the diversity gain achieved
by RSTBC (as the slopes of the SER curves increase) so that the effect of
losses is eliminated considerably. Thus, the potential for employing RSTBC
to combat losses in massive MU-MIMO systems has been shown.
Furthermore, Figure 9 shows the cumulative distribution function (CDF)
of the averaged downlink SINR (in dB) in the target cell for simulation
and analytical results for a multi-cell massive MU-MIMO system with Nt
= 100, K = 10 users, QPSK, and pilot reuse factor = 3/7, when RSTBC is applied with L = 4, 8, 16, 32, where a lossy channel with a 25% loss rate is assumed. Notably, RSTBC helps the system alleviate the effects of pilot contamination by increasing the downlink SINR. Simulation and
analytical results show good matching as seen. Also, it is obvious that the
improvements in SINR are linear functions of the number of RSTBC blocks
L. It should be mentioned that the simulation parameters are tabulated in
Table 1.
Figure 9. CDF simulation and analytical results’ comparisons of SINR for


multi-cell massive MU-MIMO system with Nt = 100, K = 10, QPSK, pilot reuse
factor = 3/7, RSTBC with L = 4, 8, 16, 32, and 25% loss rate. Analytical curves
are plotted using Eq. (21) in [43].

Table 1. Simulation parameters for massive MU-MIMO system.

Parameter Value
Cell radius 500 m
Reference distance from the BS 100 m
Path loss exponent 3.8
Carrier frequency 28 GHz
Shadow fading standard deviation 8 dB

CONCLUSION
In this chapter, we have considered the rateless space-time block code
(RSTBC) for massive MIMO wireless communication systems. Unlike
the fixed-rate codes, RSTBC adapts the amount of redundancy over time
and space for transmitting a message based on the instantaneous channel
conditions. RSTBC can be used to protect data transmission in lossy systems
and still guarantee the reliability of the regime when transmitting big data.
It is concluded that, using RSTBC with very large MIMO dimensions, it
is possible to recover the original data from a certain amount of encoded
data even when the losses are high. Moreover, RSTBC can be employed
in a multi-cell massive MIMO system at the BS to mitigate the downlink
inter-cell interference (resulting from pilot contamination) by improving
the downlink SINR. These results strongly introduce the RSTBC for the
upcoming 5G wireless communication systems.
REFERENCES
1. Abdullah A, Abbasi M, Fisal N. Review of rateless-network-
coding based packet protection in wireless sensor networks. Mobile
Information Systems. 2015;2015:1-13
2. Liew T, Hanzo L. Space–time codes and concatenated channel codes for
wireless communications. Proceedings of the IEEE. 2002;90(2):187-
219
3. Huang J-W, Yang K-C, Hsieh H-Y, Wang J-S. Transmission control
for fast recovery of rateless codes. International Journal of Advanced
Computer Science and Applications (IJACSA). 2013;4(3):26-30
4. Bonello N, Yang Y, Aissa S, Hanzo L. Myths and realities of rateless
coding. IEEE Communications Magazine. 2011;49(8):143-151
5. Bonello N, Zhang R, Chen S, Hanzo L. Reconfigurable rateless codes.
In: IEEE 69th Vehicular Technology Conference, 2009, VTC Spring
2009; IEEE. 2009, pp. 1-5
6. Bernard S. Digital Communications Fundamentals and Applications.
USA: Prentice Hall; 2001
7. Mehran F, Nikitopoulos K, Xiao P, Chen Q. Rateless wireless systems:
Gains, approaches, and challenges. In: 2015 IEEE China Summit and
International Conference on Signal and Information Processing (Chi-
naSIP). IEEE; 2015. pp. 751-755
8. Wang X, Chen W, Cao Z. ARQ versus rateless coding: from a point
of view of redundancy. In: 2012 IEEE International Conference on
Communications (ICC); IEEE. 2012. pp. 3931-3935
9. Wang P. Finite length analysis of rateless codes and their application
in wireless networks [PhD dissertation]. University of Sydney; 2015
10. Byers JW, Luby M, Mitzenmacher M, Rege A. A digital fountain
approach to reliable distribution of bulk data. ACM SIGCOMM
Computer Communication Review. 1998;28(4):56-67
11. Luby M. LT codes. In: The 43rd Annual IEEE Symposium on
Foundations of Computer Science, 2002. Proceedings. 2002. pp. 271-
280
12. Shokrollahi A. Raptor codes. IEEE Transactions on Information
Theory. 2006;52(6):2551-2567
13. Maymounkov P. Online codes. Technical report. New York University;
2002
14. Palanki R, Yedidia JS. Rateless codes on noisy channels. In:
International Symposium on Information Theory, 2004. ISIT 2004.
Proceedings; June 2004; p. 38
15. Chong KFE, Kurniawan E, Sun S, Yen K. Fountain codes with varying
probability distributions. In: 2010 6th International Symposium on
Turbo Codes Iterative Information Processing; Sept 2010. pp. 176-180
16. Kuo S-H, Lee H-C, Ueng Y-L, Lin M-C. A construction of physical layer
systematic Raptor codes based on protographs. IEEE Communications
Letters. 2015;19(9):1476-1479
17. Chen S, Zhang Z, Zhu L, Wu K, Chen X. Accumulate rateless codes
and their performances over additive white Gaussian noise channel.
IET Communications. March 2013;7(4):372-381
18. Erez U, Trott MD, Wornell GW. Rateless coding for Gaussian channels.
IEEE Transactions on Information Theory. Feb 2012;58(2):530-547
19. Castura J, Mao Y. Rateless coding over fading channels. IEEE
Communications Letters. Jan 2006;10(1):46-48
20. Castura J, Mao Y. Rateless coding and relay networks. IEEE Signal
Processing Magazine. Sept 2007;24(5):27-35
21. Tian S, Li Y, Shirvanimoghaddam M, Vucetic B. A physical-layer rateless
code for wireless channels. IEEE Transactions on Communications.
June 2013;61(6):2117-2127
22. Shanechi MM, Erez U, Wornell GW. Rateless codes for MIMO channels.
In: IEEE GLOBECOM 2008-2008 IEEE Global Telecommunications
Conference; Nov 2008. pp. 1-5
23. Wolniansky PW, Foschini GJ, Golden G, Valenzuela RA. V-BLAST:
An architecture for realizing very high data rates over the rich-scattering
wireless channel. In: 1998 URSI International Symposium on Signals,
Systems, and Electronics, 1998. ISSSE 98; IEEE. 1998. pp. 295-300
24. Fan Y, Lai L, Erkip E, Poor HV. Rateless coding for MIMO fading
channels: performance limits and code construction. IEEE Transactions
on Wireless Communications. 2010;9(4):1288-1292
25. Zheng L, Tse DNC. Diversity and multiplexing: A fundamental trade-
off in multiple-antenna channels. IEEE Transactions on Information
Theory. 2003;49(5):1073-1096
26. Tavildar S, Viswanath P. Approximately universal codes over
slow-fading channels. IEEE Transactions on Information Theory.
2006;52(7):3233-3258
27. Chen Y, Huang H, Zhang Z, Qiu P, Lau VK. Cooperative spectrum
access for cognitive radio network employing rateless code. In: ICC
Workshops-2008 IEEE International Conference on Communications
Workshops; IEEE. 2008. pp. 326-331
28. Chen X, Zhang Z, Chen S, Wang C. Adaptive mode selection for
multiuser MIMO downlink employing rateless codes with QoS
provisioning. IEEE Transactions on Wireless Communications.
2012;11(2):790-799
29. Afzal J, Stockhammer T, Gasiba T, Xu W. System design options for
video broadcasting over wireless networks. In: Proceedings of IEEE
CCNC, vol. 54. Citeseer; 2006. p. 92
30. Afzal J, Stockhammer T, Gasiba T, Xu W. Video streaming over MBMS:
A system design approach. Journal of Multimedia. 2006;1(5):25-35
31. Molisch AF. Wireless Communications, vol. 2. New York, USA: John
Wiley & Sons; 2011
32. Labiod H. Wireless ad hoc and Sensor Networks. Vol. 6. New York,
USA: John Wiley & Sons; 2010
33. Hartung F, Horn U, Huschke J, Kampmann M, Lohmar T, Lundevall
M. Delivery of broadcast services in 3G networks. IEEE Transactions
on Broadcasting. 2007;53(1):188-199
34. Culler D, Estrin D, Srivastava M. Guest editors’ introduction: Overview
of sensor networks. IEEE Computer Society. Aug 2004;37(8):41-49
35. Zhao F, Guibas LJ. Wireless Sensor Networks: An Information
Processing Approach. San Francisco, USA: Elsevier Science &
Technology; 2004
36. Razavi R, Fleury M, Ghanbari M. Rateless coding on a wearable
wireless network for augmented reality and biosensors. In: 2008 IEEE
19th International Symposium on Personal, Indoor and Mobile Radio
Communications; IEEE. 2008. pp. 1-4
37. Erez U, Wornell G, Trott MD. Rateless space–time coding. In:
Proceedings. International Symposium on Information Theory, 2005.
ISIT 2005; IEEE. 2005, pp. 1937-1941
38. Wang C, Zhang Z. Performance analysis of a rate varying space–time
coding scheme. In: 2013 International Workshop on High Mobility
Wireless Communications (HMWC). IEEE. 2013. pp. 151-156
Rateless Space-Time Block Codes for 5G Wireless Communication Systems 207

39. Larsson E, Edfors O, Tufvesson F, Marzetta T. Massive MIMO for


next generation wireless systems. Communications Magazine, IEEE.
2014;52(2):186-195
40. Tarokh V, Seshadri N, Calderbank AR. Space–time codes for
high data rate wireless communication: Performance criterion
and code construction. IEEE Transactions on Information Theory.
1998;44(2):744-765
41. Alamouti SM. A simple transmit diversity technique for wireless
communications. IEEE Journal on Selected Areas in Communications.
1998;16(8):1451-1458
42. Alqahtani AH, Sulyman AI, Alsanie A. Rateless space time block code
for massive MIMO systems. International Journal of Antennas and
Propagation. 2014;2014:1-10
43. Alqahtani AH, Sulyman AI, Alsanie A. Rateless space time block
code for mitigating pilot contamination effects in multicell massive
MIMO system with lossy links. IET Communications Journal.
2016;10(16):2252-2259
44. Alqahtani AH, Sulyman AI, Alsanie A. Rateless space time block code
for antenna failure in massive MU-MIMO systems. IEEE Wireless
Communications and Networking Conference (WCNC); Doha, Qatar;
April 2016. pp. 1-6
45. Alqahtani AH, Sulyman AI, Alsanie A. Loss-tolerant large-scale
MU-MIMO system with rateless space time block code. In: 22nd
Asia-Pacific Conference on Communications (APCC); Yogyakarta,
Indonesia; August 2016. pp. 342-347
46. Marzetta TL. How much training is required for multiuser MIMO?
In: Fortieth Asilomar Conference on Signals, Systems and Computers,
2006. ACSSC’06; IEEE; 2006. pp. 359-363
SECTION 3: LOSSLESS DATA COMPRESSION
Chapter 10

LOSSLESS IMAGE COMPRESSION TECHNIQUE USING COMBINATION METHODS

A. Alarabeyyat1, S. Al-Hashemi1, T. Khdour1, M. Hjouj Btoush1, S. Bani-Ahmad1, and R. Al-Hashemi2

1 Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied University, Salt, Jordan
2 The Computer Information Systems Department, College of Information Technology, Al-Hussein Bin Talal University, Ma'an, Jordan.

ABSTRACT
The development of multimedia and digital imaging has led to a high quantity of data required to represent modern imagery. This requires large disk space for storage and long transmission times over computer networks, both of which are relatively expensive. These factors prove the need for image compression. Image compression addresses the problem of reducing the amount of space required to represent a digital image, yielding a compact representation of the image and thereby reducing the image storage and transmission time requirements. The key idea here is to remove the redundancy of the data present within an image so as to reduce its size without affecting its essential information. We are concerned with lossless image compression in this paper. Our proposed approach is a mix of a number of already existing techniques. Our approach works as follows: first, we apply the well-known Lempel-Ziv-Welch (LZW) algorithm to the image in hand. The output of this first step is forwarded to the second step, where the Bose, Chaudhuri and Hocquenghem (BCH) error detection and correction algorithm is used. To improve the compression ratio, the proposed approach applies the BCH algorithm repeatedly until "inflation" is detected. The experimental results show that the proposed algorithm achieves an excellent compression ratio without losing data when compared to the standard compression algorithms.

Citation: A. Alarabeyyat, S. Al-Hashemi, T. Khdour, M. Hjouj Btoush, S. Bani-Ahmad, R. Al-Hashemi and S. Bani-Ahmad, "Lossless Image Compression Technique Using Combination Methods," Journal of Software Engineering and Applications, Vol. 5, No. 10, 2012, pp. 752-763. doi: 10.4236/jsea.2012.510088.
Copyright: © 2012 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0

Keywords: Image Compression, LZW, BCH

INTRODUCTION
Image applications are widely used, driven by recent advances in technology and breakthroughs in the price and performance of hardware and firmware. This leads to an enormous increase in the storage space and the transmission time required for images, and it emphasizes the need to provide efficient and effective image compression techniques.

In this paper we provide a method which is capable of compressing images without degrading their quality. This is achieved by minimizing the number of bits required to represent each pixel, which in turn reduces the amount of memory required to store images and allows them to be transmitted in less time.

Image compression techniques fall into two categories: lossless and lossy. The choice between the two depends on the application and on the degree of compression required [1,2].
Lossless image compression is used to compress images in critical applications, as it allows the exact original image to be reconstructed from the compressed one without any loss of image data. Lossy image compression, on the other hand, suffers from the loss of some data; thus, repeatedly compressing and decompressing an image results in poor image quality. An advantage of lossy compression is that it allows a higher compression ratio than lossless compression [3,4].
Compression is achieved by removing one or more of the three basic
data redundancies:
• Coding redundancy, which is present when less than optimal
code words are used;
• Interpixel redundancy, which results from correlations between
the pixels of an image;
• Psychovisual redundancy, which is due to data that are ignored by
the human visual system [5].
So, image compression becomes a solution to many imaging applications
that require a vast amount of data to represent the images, such as document
imaging management systems, facsimile transmission, image archiving,
remote sensing, medical imaging, entertainment, HDTV, broadcasting,
education and video teleconferencing [6].
One major difficulty that faces lossless image compression is how to
protect the quality of the image in a way that the decompressed image
appears identical to the original one. In this paper we are concerned with
lossless image compression based on the LZW and BCH algorithms, which
compress different types of image formats. The proposed method repeats
the compression three times in order to increase the compression ratio.
The proposed method is an implementation of lossless image compression. The steps of our approach are as follows: first, we perform a preprocessing step to convert the image in hand into binary. Next, we apply the LZW algorithm to compress the image. In this step, the codes from 0 to 255 represent 1-character sequences consisting of the corresponding 8-bit character, and the codes from 256 through 4095 are created in a dictionary for sequences encountered in the data as it is encoded. The code for a sequence (without the new character) is emitted, and a new code (for the sequence with that character) is added to the dictionary [7]. Finally, we use the BCH algorithm to increase the image compression ratio. An error correction method is used in this step: the normal data and the first parity data are stored in a memory cell array, and together they form the BCH encoded data. We also generate second parity data from the stored normal data; to check for errors, we compare the first parity data with the second parity data, as in [8,9].
Notice that we repeat the BCH compression until the required level of compression is achieved. Decompression is done by reversing these steps, which produces an image identical to the original one.

LITERATURE REVIEW
A large number of data compression algorithms have been developed and used throughout the years. Some are of general use, i.e., they can be used to compress files of different types (e.g., text files, image files, video files, etc.); others are developed to compress a particular type of file efficiently. The choice also depends on the representation form of the data at which the compression process is performed. Below we review some of the literature in this field.
In [10], the authors present lossless image compression with four modular components: pixel sequence, prediction, error modeling, and coding. They used two methods that clearly separate these four components, called the Multi-Level Progressive Method (MLP) and the Partial Precision Matching Method (PPMM), both involving linear prediction, modeling of prediction errors by estimating the variance of a Laplace distribution (symmetric exponential), and coding using arithmetic coding applied to pre-computed distributions [10].
In [11], a composite modeling method (a hybrid compression algorithm for binary images) is used to reduce the amount of data coded by arithmetic coding: uniform areas are coded with less computation and arithmetic coding is applied to the remaining areas. Each image block is classified into three categories: all-white, all-black, and mixed; the image is processed 16 rows at a time and then operated on by two stages, one global and one local [11].
In [12], the authors propose an algorithm that works by applying a reversible transformation to the fourteen commonly used files of the Calgary Compression Corpus. It does not process its input sequentially, but instead processes a block of text as a single unit, forming a new block that contains the same characters but is easier to compress with simple compression algorithms because characters are grouped together based on their contexts. This technique makes use of the context on only one side of each character, so that the probability of finding a character close to another instance of the same character is increased substantially. The transformation does not itself compress the data, but reorders it to make it easy to compress with simple algorithms such as move-to-front coding in combination with Huffman or arithmetic coding [12].
In [13], the authors present TMW, a lossless grayscale image compression method based on the use of linear predictors and implicit segmentation. The compression process is split into an analysis step and a coding step. In the analysis step, a set of linear predictors and other parameters suitable for the image is calculated in a way that minimizes the length of the encoded image; this parameter set is included in the compressed file and subsequently used for the coding step. To do the actual encoding, the chosen parameter set obviously has to be considered part of the encoded image and has to be stored or transmitted alongside the result of the coding stage [13].
In [14], the authors propose a lossless compression scheme for binary images which consists of a novel encoding algorithm and uses a new edge tracking algorithm. The proposed scheme consists of two major steps: the first step encodes the binary image data using the proposed encoding method, which encodes only the characteristic vector information of the objects in the image by using a new edge tracing method. Unlike existing approaches, the method encodes information about the edge lines obtained with the modified edge tracing method instead of directly encoding the whole image data. The second step compresses the encoded image using Huffman and Lempel-Ziv-Welch (LZW) coding [14].
In [15], the author presents an algorithm for lossless binary image compression, called the Two Modules Based Algorithm (TMBA), which consists of two modules: the first is direct redundancy exploitation and the second is improved arithmetic coding [15].
In [16], a two-dimensional dictionary-based lossless image compression scheme for grayscale images is introduced. The proposed scheme reduces the correlation in image data by finding two-dimensional blocks of pixels that are approximately matched throughout the data and replacing them with short codewords.
In [16], the two-dimensional Lempel-Ziv image compression scheme
(denoted GS-2D-LZ) is proposed. This scheme is designed to take advantage
of the two-dimensional correlations in the image data. It relies on three
different compression strategies, namely: two-dimensional block matching,
prediction, and statistical encoding.
In [17], the authors presented a lossless image compression method based on the Multiple-Tables Arithmetic Coding (MTAC) method, which encodes a gray-level image by first classifying the data and then encoding each cluster of data using a distinct code table. The MTAC method employs a median edge detector (MED) to reduce the entropy rate of the image, since the gray levels of two adjacent pixels in an image are usually similar. A base-switching transformation approach is then used to reduce the spatial redundancy of the image, exploiting the fact that the gray levels of some pixels in an image are more common than those of others. Finally, the arithmetic encoding method is applied to reduce the coding redundancy of the image [17].
In [18], a lossless method of image compression and decompression is proposed. It uses a simple coding technique called Huffman coding. A software algorithm was developed and implemented to compress and decompress a given image using Huffman coding on a MATLAB platform. The authors are concerned with compressing images by reducing the number of bits per pixel required to represent them, and with decreasing the transmission time for images. The image is reconstructed by decoding it using the Huffman codes [18].
In [19], an adaptive bit-level text compression schema based on Hamming-code data compression (HCDC) is used. The schema consists of six steps that are repeated to increase the compression rate; the overall compression ratio is found by multiplying the compression ratios of the individual loops, and the schema is referred to as HCDC(k), where k represents the number of repetitions [19].
In [20], the authors presented a lossless image compression method based on BCH codes combined with the Huffman algorithm [20].

THE PROPOSED METHOD


The objective of the proposed method in this paper is to design an efficient and effective lossless image compression scheme. This section deals with the design of such a method. The proposed method is based on the LZW algorithm and the BCH algorithm, an error-correcting technique, in order to improve the compression ratio of the image compared to the other compression techniques in the literature review. Below we explain the methodology in detail and the architecture of the proposed method.
The proposed method is a lossless image compression scheme which can be applied to all types of images; it is based on the LZW algorithm, which reduces the repeated values in the image, and on BCH codes, which detect and correct errors. The BCH algorithm works by adding extra bits, called parity bits, whose role is to verify the correctness of the original message sent to the receiver, and the system in this paper benefits from this feature. BCH converts blocks of size k to size n by adding parity bits, depending on the size of the message k, which is encoded into a codeword of length n. The proposed method is shown below in Figure 1.

Figure 1. Proposed image compression approach.

Lempel-Ziv-Welch (LZW)
The compression system improves the compression of the image through the implementation of the LZW algorithm. First, the input image is converted to grayscale and then from decimal to binary, a form suitable for compression. The algorithm builds a data dictionary (also called a translation table or string table) of data occurring in the uncompressed data stream. Patterns of data are identified in the data stream and are matched to entries in the dictionary. If a pattern is not present in the dictionary, a code phrase is created based on the data content of that pattern, and it is stored in the dictionary; the phrase is then written to the compressed output stream. When a reoccurrence of a pattern is identified in the data, the code of the pattern already stored in the dictionary is written to the output.
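To make the dictionary-building step concrete, the following is a minimal LZW compressor sketch in Python (illustrative only, not the authors' MATLAB implementation); the binary/byte stream of the image would play the role of the input data.

import sys

# Minimal LZW compressor sketch: codes 0-255 hold single bytes, new codes
# (256 upward, here capped at 4095) are created for sequences seen in the input.
def lzw_compress(data, max_code=4095):
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    sequence = b""
    output = []
    for byte in data:
        candidate = sequence + bytes([byte])
        if candidate in dictionary:
            sequence = candidate                     # keep extending the match
        else:
            output.append(dictionary[sequence])      # emit code for the known sequence
            if next_code <= max_code:                # add the new sequence to the dictionary
                dictionary[candidate] = next_code
                next_code += 1
            sequence = bytes([byte])
    if sequence:
        output.append(dictionary[sequence])
    return output

# Example: a repetitive byte string compresses to fewer codes than input bytes.
codes = lzw_compress(b"ABABABABABABAB")
print(len(codes), codes, file=sys.stdout)            # 7 codes for 14 input bytes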

Bose, Chaudhuri and Hocquenghem (BCH)


The binary input image is first divided into blocks of 7 bits each; only 7 bits are needed to represent each byte (128 values in total), while the eighth bit represents the sign of the number (the most significant bit) and does not affect the value of the block. Each block is converted to a Galois field representation so that it can be accepted as an input to the BCH code. Each block is then decoded using the BCH decoder and checked to see whether it is a valid codeword. If it is, the BCH decoder converts the 7-bit block to 4 bits and the proposed method adds a 1 as an indicator of a valid codeword to an extra file called the map; otherwise the block remains 7 bits long and a 0 is added to the same file. The benefit of the extra map file is that it is used as the key for image decompression, in order to distinguish the compressed blocks from the uncompressed ones (codeword or not).
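The codeword test that drives the map file can be sketched as follows, assuming the (7,4) code is the standard single-error-correcting BCH/Hamming code; the parity-check matrix H below is one common choice and may differ in bit ordering from MATLAB's bchenc/bchdec, so this is an illustrative sketch rather than the authors' exact code.

import numpy as np

# A 7-bit block is a valid (7,4) codeword iff its syndrome H.c^T is zero (mod 2).
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])

def is_codeword(block7):
    syndrome = H.dot(block7) % 2
    return not syndrome.any()

def build_map(bits):
    # Scan 7-bit blocks; record 1 for valid codewords (compressible to 4 bits), 0 otherwise.
    map_bits = []
    for i in range(0, len(bits) - len(bits) % 7, 7):
        map_bits.append(1 if is_codeword(np.array(bits[i:i + 7])) else 0)
    return map_bits

print(build_map([0, 0, 0, 0, 0, 0, 0,     # all-zero block: always a codeword
                 1, 0, 0, 0, 0, 0, 0]))   # weight-1 block: never a codeword -> [1, 0]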
After the image is compressed, the map file is compressed by RLE to decrease its size and is then attached to the header of the image. This step is iterated: the BCH decoding is repeated three times to improve the compression ratio. We stopped at three repetitions after experimentation, because decoding further affects the other performance factors, increasing the time needed for compression; moreover, the map file grows with each BCH pass, which increases the size of the image and opposes the objective of this paper of reducing the image size. Below is an example of the compression stage:

Example
Next is an example of the proposed system's compression stage, in which a segment of the image (A = original segment) is processed using the proposed algorithm. First the decimal values are converted into binary and compressed by LZW; the LZW output is then converted to binary and divided into blocks of 7 bits each.
After dividing the image into blocks of 7 bits, the system applies the BCH code, which checks whether each block is a codeword by matching the block against the 16 valid codewords of the (7,4) BCH code. In the first iteration four codewords were found; each such block is compressed by the BCH algorithm, which converts it to a 4-bit block.

When the BCH algorithm is applied, the file Map 1 is initialized: a 1 is added to the file if the block is a codeword and a 0 is added if it is a non-codeword. In this example Map 1 is:
Map 1 = 0 1 0 0 1 0 0 0 1 0 1
This operation is repeated three times. The files (Map 3, Map 2, and Map 1) are compressed by RLE before being attached to the header of the image to gain a higher compression ratio.

Compression Algorithm Steps


The proposed method compresses the original image by implementing a number of steps; Figure 2 represents the flowchart of the proposed method. The algorithm steps are:
Input: image (f)
Output: compressed file
Begin
Initialize parameters
SET round to zero
READ image (f)
Convert (f) to gray scale
SET A = ( )                              // set empty value to matrix A
A = image (f)
Bn = convert matrix A into binary
Initialize matrices Map 1, Map 2, Map 3 to store the codeword indicator bits
Out1 = compress matrix Bn by the LZW algorithm: norm2lzw (Bn)
Convert the LZW-compressed matrix into binary
SET n = 7, k = 4
WHILE (there is a codeword) and (round ≤ 3)
    xxx = the size of Out1
    remd = xxx mod n
    div = xxx / n
    FOR i = 1 to xxx − remd step n
        FOR R = i to i + (n − 1)
            divide the image into blocks of size 7; save into parameter msg = Out1[R]
        END FOR R
        c2 = convert (msg) to Galois field
        origin = c2
        d2 = decode by BCH decoder: bchdec (c2, n, k)
        c2 = encode by BCH encoder for test: bchenc (d2, n, k)
        IF (c2 == origin) THEN                       // the block is a valid codeword
            INCREMENT the parameter test (the number of codewords found) by 1
            add the compressed block d2 to the matrix CmprsImg
            add 1 to the map[round] matrix
        ELSE
            add the original block (origin) to the matrix CmprsImg
            add 0 to the map[round] matrix
        END IF
    END FOR i
    Pad and add the remaining remd bits to the matrix CmprsImg and encode them
    Final map file = map[round], to reuse the map file in the next iteration
    FOR stp = 1 to 3
        compress the map by the RLE encoder: map_RLE[stp] = RLE (map[stp])
    END FOR stp
    INCREMENT round by 1
END WHILE
END
Figure 2. Algorithm 1. Encoding algorithm.


Decompression
Decompression reverses the steps of the compression stage to reconstruct the image. First, the system decompresses the attached map file with the RLE decoder, because its values indicate which blocks in the compressed image are codewords to be expanded. If the value in the map file is 1, the system reads a 4-bit block from the compressed image (a codeword) and expands it with the BCH encoder. If the value is 0, it reads a 7-bit block from the compressed image, which is not a codeword. This operation is repeated three times, after which the output from the BCH stage is decompressed using the LZW algorithm. The example below explains these steps.

Example
Read the map file after decompressing it with the RLE algorithm. In positions 4 and 7 of the map file the value is 1, which means that the system reads 4 bits from the compressed image at those points; each such block is a codeword, and re-encoding it with BCH reconstructs the matching 7-bit block from the 16 valid BCH codewords. The remaining values in the file are 0, meaning the corresponding blocks are non-codewords and 7 bits are read from the compressed image. The compressed image is:
The decompression procedure shown in Figure 3 is implemented to recover the original image from the compressed image, and it is performed as follows:
Input: compressed image, attached map file (mapi)
Output: original image
Begin
Initialize parameters
SET P = ( )                              // set empty value to matrix P
SET j = 1
SET n = 7
SET k = 4
SET round = 3                            // number of iterations
Rle_matrix = RLE decoder (mapi)
WHILE round > 0
    FOR i = 0 to length of (Rle_matrix)
        IF Rle_matrix[i] = 1 THEN                    // encode by BCH
            FOR s = j to j + (k − 1)
                encode the compressed block by the BCH encoder and put it in parameter c2
                c2 = bchenc (CmprsImg (s), n, k)
                INCREMENT j by 4
                add c2 to matrix P
            END FOR s
        ELSE
            // the block is not compressed, so read it as it is
            FOR s1 = j to j + (n − 1)
                add the uncompressed block from CmprsImg[s1] to matrix P
                INCREMENT j by 7
            END FOR s1
        END IF
    END FOR i
    DECREMENT round by 1
END WHILE
LZW_dec = decompress matrix P by LZW
Image post-processing
Original_image = bin2dec (LZW_dec)       // convert from binary to decimal to reconstruct the original image
End
The above steps explain the implementation of the compression and decompression of the proposed method using a combination of the LZW and BCH algorithms, reached after extensive testing.

Figure 3. Algorithm 2. Proposed decoding algorithm.


The next section shows the results of using this method on the MATLAB platform. The compression ratio is calculated using the equation Cr = S0 / Sc, i.e., the size of the original image divided by the size of the compressed image. The same dataset and the same image sizes are used to compare the proposed method with LZW, RLE, and Huffman, and the methods are also compared in terms of the number of bits needed to represent each pixel, computed as bit per pixel = q × Sc / S0, or equivalently bit per pixel = q / Cr, where q is the number of bits representing each pixel in the uncompressed image, S0 is the size of the original data, and Sc is the size of the compressed data. The proposed method is also compared with standard compression techniques, and finally the test parameter (the number of codewords found in the image) is reported relative to the original size of the image in bits.
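As a hedged numerical illustration of these two metrics (the sizes below are made up and are not taken from the paper's tables):

# Compression ratio and bits per pixel for a hypothetical 65,536-byte image
# compressed to 40,000 bytes, with q = 8 bits per pixel in the original.
def compression_ratio(original_size, compressed_size):
    return original_size / compressed_size

def bits_per_pixel(q, original_size, compressed_size):
    return q * compressed_size / original_size    # equivalently q / Cr

S0, Sc, q = 65536, 40000, 8
print(round(compression_ratio(S0, Sc), 3))        # Cr  = 1.638
print(round(bits_per_pixel(q, S0, Sc), 3))        # bpp = 4.883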

Result and Discussion


In order to evaluate the compression performance of our proposed method, we compared it with other standard lossless image compression schemes from the literature review. The first comparison is based on the compression ratio, and the second is based on bits per pixel.
Lossless image compression lets images occupy less space. In lossless compression, no data are lost during the process, which means that the quality of the image is protected; the decompression process restores the original image without losing essential data. The tested images are stored in GIF, PNG, JPG, and TIFF formats, all of which are compressed automatically by the proposed method. Hardware used: a PC with an Intel® Core™ i3 CPU, a 200 GB hard disk, and 2.00 GB of RAM; software: the Windows 7 Ultimate operating system and MATLAB Version 7.5.0.342 (R2007b).
This section analyzes and discusses the results obtained by applying the LZW and BCH algorithms discussed above to the set of images. The proposed system uses images that are commonly used in image processing (airplane, baboon, F-18, Lena, peppers, etc.) as a test set. The proposed method has been tested on different image sizes, and the simulation results are compared with RLE, Huffman, and LZW.
The results show that the proposed method has a higher compression ratio than the standard compression algorithms mentioned above. The results based on compression ratio are shown in Table 1, and Table 2 shows the comparison based on bits per pixel.

Table 1. Comparison with typical compression methods based on compression ratio, i.e., the original image size divided by the size of the compressed image.
Table 2. Compression results of images in bit/pixel.

The above results show that the compression achieved by the proposed system is the best compared to compressing the images with the RLE, LZW, or Huffman algorithms.
In Figure 4, we illustrate the comparison based on compression ratio between the proposed algorithm (BCH and LZW) and the standard image compression algorithms (RLE, Huffman, and LZW), which can be distinguished by color. Figure 5 compares the size of the original image with the size of the image after compression by the standard image compression algorithms and by the proposed method. Table 2 shows the compression results based on the bit per pixel rate for the proposed method and the standard compression algorithms.
Figure 4. Comparing the proposed method with (RLE, LZW and Huffman)
based on compression ratio.

Figure 5. Comparing the proposed method with (RLE, LZW and Huffman)
based on image size.
Figure 6 illustrates the results in Table 2 above.

Figure 6. Comparing the proposed method with (RLE, LZW and Huffman)
based on bit per pixel.
Discussion
In this section we show the efficiency of the proposed system, which uses MATLAB to implement the algorithm. In order to demonstrate the compression performance of the proposed method, we compared it with some representative lossless image compression techniques on the set of ISO test images that were made available to us and that are listed in the first column of all tables.
Table 1 lists the compression ratio results of the tested images, calculated as the size of the original image divided by the size of the image after compression. The second column of this table lists the compression ratios obtained by compressing the images with the RLE algorithm; columns three and four list the compression ratios obtained with the LZW and Huffman algorithms, respectively, while the last column lists the compression ratios achieved by the proposed method. The average compression ratios of the methods over all tested images are 1.2017 for RLE, 1.4808 for LZW, 1.195782 for Huffman, and 1.676091 for the proposed BCH and LZW combination. The average compression ratio of the proposed method is the best achieved, which means that image size is reduced more when compressed by the combined LZW and BCH method than by the standard lossless compression algorithms; Figure 4 makes clear that the proposed method has a higher compression ratio than RLE, LZW, and Huffman. Figure 5 displays the original image size and the size of the image after compression by RLE, LZW, Huffman, and the proposed method, which yields the smallest image size; this achieves the goal of this paper of reducing the storage needed for the image and, therefore, the time required for transmission.
The second comparison, shown in Table 2, is based on bits per pixel. The goal of image compression is to reduce the size as much as possible while maintaining image quality. Smaller files take less space to store, so it is better to need fewer bits to represent each pixel. The table uses the same image sets and shows that the proposed method needs fewer bits per pixel than the other standard image compression methods; the average bits per pixel over all tested images are 6.904287, 5.656522, 6.774273, and 5.01157 for RLE, LZW, Huffman, and the proposed method, respectively.

CONCLUSIONS
This paper was motivated by the desire to improve the effectiveness of lossless image compression using the BCH and LZW algorithms. We provided an overview of various existing lossless image compression techniques and coding standards. We have proposed a highly efficient algorithm which is implemented using the BCH coding approach.
The proposed method combines the advantages of the BCH algorithm with those of the LZW algorithm, which is known for its simplicity and speed. The ultimate goal is to give a relatively good compression ratio while keeping the time and space complexity to a minimum.

The experiments were carried out on a dataset of 20 test images, and the results were evaluated using compression ratio and bits per pixel. The experimental results show that the proposed algorithm improves the compression of images compared with the RLE, Huffman, and LZW algorithms; the proposed method's average compression ratio is 1.636383, which is better than the standard lossless image compression algorithms.

FUTURE WORK
In this paper, we developed a method for improving image compression based on BCH and LZW. For future work we suggest using BCH with another compression method that enables the compression to be repeated more than three times, investigating how to provide a higher compression ratio for given images, and finding an algorithm that decreases the size of the map file. The experimental dataset in this paper was somewhat limited, so applying the developed methods to a larger dataset could be a subject for future research. Finally, extending the work to video compression is also very interesting. Video data is basically a three-dimensional array of color pixels that contains spatial and temporal redundancy. Similarities can thus be encoded by registering differences within a frame (spatial) and/or between frames (temporal), where a frame is the set of all pixels that correspond to a single time instant; basically, a frame is the same as a still picture.
Spatial encoding in video compression is performed by taking advantage
of the fact that the human eye is unable to distinguish small differences in
color as easily as it can perceive changes in brightness, so that very similar
areas of color can be “averaged out” in a similar way to JPEG images. With
temporal compression, only the changes from one frame to the next are encoded, as often a large number of the pixels will be the same across a series of frames.
REFERENCES
1. R. C. Gonzalez, R. E. Woods and S. L. Eddins, “Digital Image
Processing Using MATLAB,” Pearson Prentice Hall, USA, 2003.
2. K. D. Sonal, “Study of Various Image Compression Techniques,”
Proceedings of COIT, RIMT Institute of Engineering & Technology,
Pacific, 2000, pp. 799-803.
3. M. Rabbani and W. P. Jones, “Digital Image Compression Techniques,”
SPIE, Washington. doi:10.1117/3.34917
4. D. Shapira and A. Daptardar, “Adapting the Knuth-Morris-Pratt
Algorithm for Pattern Matching in Huffman Encoded Texts,”
Information Processing and Management, Vol. 42, No. 2, 2006, pp.
429-439. doi:10.1016/j.ipm.2005.02.003
5. H. Zha, “Progressive Lossless Image Compression Using Image
Decomposition and Context Quantization,” Master Thesis, University
of Waterloo, Waterloo.
6. W. Walczak, “Fractal Compression of Medical Images,” Master Thesis,
School of Engineering Blekinge Institute of Technology, Sweden.
7. R. Rajeswari and R. Rajesh, “WBMP Compression,” International
Journal of Wisdom Based Computing, Vol. 1, No. 2, 2011. doi:10.1109/
ICIIP.2011.6108930
8. M. Poolakkaparambil, J. Mathew, A. M. Jabir, D. K. Pradhan and
S. P. Mohanty, “BCH Code Based Multiple Bit Error Correction in
Finite Field Multiplier Circuits,” Proceedings of the 12th International
Symposium on Quality Electronic Design (ISQED), Santa Clara, 14-
16 March 2011, pp. 1-6. doi:10.1109/ISQED.2011.5770792
9. B. Ranjan, “Information Theory, Coding and Cryptography,” 2nd
Edition, McGraw-Hill Book Company, India, 2008.
10. P. G. Howard and V. J. Scott, “New Method for Lossless Image
Compression Using Arithmetic Coding,” Information Processing &
Management, Vol. 28, No. 6, 1992, pp. 749-763. doi:10.1016/0306-
4573(92)90066-9
11. P. Franti, “A Fast and Efficient Compression Method for Binary
Image,” 1993.
12. M. Burrows and D. J. Wheeler, “A Block-Sorting Lossless Data
Compression Algorithm,” Systems Research Center, Vol. 22, No. 5,
1994.
13. B. Meyer and P. Tischer, "TMW—a New Method for Lossless Image Compression," Australia, 1997.
14. M. F. Talu and I. Türkoglu, "Hybrid Lossless Compression Method for Binary Images," University of Firat, Elazig, Turkey, 2003.
15. L. Zhou, "A New Highly Efficient Algorithm for Lossless Binary Image Compression," Master Thesis, University of Northern British Columbia, Canada, 2004.
16. N. J. Brittain and M. R. El-Sakka, "Grayscale True Two-Dimensional Dictionary-Based Image Compression," Journal of Visual Communication and Image Representation, Vol. 18, No. 1, pp. 35-44.
17. R.-C. Chen, P.-Y. Pai, Y.-K. Chan and C.-C. Chang, "Lossless Image Compression Based on Multiple-Tables Arithmetic Coding," Mathematical Problems in Engineering, Vol. 2009, 2009, Article ID: 128317. doi:10.1155/2009/128317
18. J. H. Pujar and L. M. Kadlaskar, "A New Lossless Method of Image Compression and Decompression Using Huffman Coding Technique," Journal of Theoretical and Applied Information Technology, Vol. 15, No. 1, 2010.
19. H. Bahadili and A. Rababa'a, "A Bit-Level Text Compression Scheme Based on the HCDC Algorithm," International Journal of Computers and Applications, Vol. 32, No. 3, 2010.
20. R. Al-Hashemi and I. Kamal, "A New Lossless Image Compression Technique Based on Bose," International Journal of Software Engineering and Its Applications, Vol. 5, No. 3, 2011, pp. 15-22.
Chapter 11

NEW RESULTS IN PERCEPTUALLY LOSSLESS COMPRESSION OF HYPERSPECTRAL IMAGES

Chiman Kwan and Jude Larkin


Applied Research LLC, Rockville, Maryland, USA

ABSTRACT
Hyperspectral images (HSI) have hundreds of bands, which impose a heavy burden on data storage and transmission bandwidth. Quite a few compression techniques have been explored for HSI in the past decades. One high performing technique is the combination of principal component analysis (PCA) and JPEG-2000 (J2K). However, since several new compression codecs have been developed after J2K in the past 15 years, it is worthwhile to revisit this research area and investigate if there are better techniques for HSI compression. In this paper, we present some new results in HSI compression. We aim at perceptually lossless compression of HSI. Perceptually lossless means that the decompressed HSI data cube has a performance metric near 40 dBs in terms of peak-signal-to-noise ratio (PSNR) or human visual system (HVS) based metrics. The key idea is to compare several combinations of PCA and video/image codecs. Three representative HSI data cubes were used in our studies. Four video/image codecs, including J2K, X264, X265, and Daala, have been investigated and four performance metrics were used in our comparative studies. Moreover, some alternative techniques such as video, split band, and PCA only approaches were also compared. It was observed that the combination of PCA and X264 yielded the best performance in terms of compression performance and computational complexity. In some cases, the PCA + X264 combination achieved more than 3 dBs of gain over the PCA + J2K combination.

Citation: Kwan, C. and Larkin, J. (2019), "New Results in Perceptually Lossless Compression of Hyperspectral Images". Journal of Signal and Information Processing, 10, 96-124. doi: 10.4236/jsip.2019.103007.
Copyright: © 2019 by authors and Scientific Research Publishing Inc. This work is licensed under the Creative Commons Attribution International License (CC BY). http://creativecommons.org/licenses/by/4.0

Keywords: Hyperspectral Images (HSI), Compression, Perceptually Lossless, Principal Component Analysis (PCA), Human Visual System (HVS), PSNR, SSIM, JPEG-2000, X264, X265, Daala

INTRODUCTION
Hyperspectral images (HSI) have found a wide range of applications,
including remote chemical monitoring [1] , target detection [2] , anomaly
and change detection [3] [4] [5] , etc. Due to the presence of hundreds of
bands in HSI, however, heavy burden in data storage and transmission
bandwidth has been introduced.
For many practical applications, it is unnecessary to compress data
losslessly because lossless compression can achieve only two to three times
of compression. Instead, it will be more practical to apply perceptually
lossless compression [6] [7] [8] [9] . A simple rule of thumb is that if the
peak-signal-to-noise ratio (PSNR) or human visual system (HVS) inspired
metric is above 40 dBs, then the decompressed image is considered as
“near perceptually lossless” [10] . In several recent papers, we have applied
perceptually lossless compression to maritime images [10] , sonar images
[10] , and Mastcam images [11] [12] [13] .
In the past few decades, some alternative techniques for compressing HSI have been developed. In [14] , a tensor approach was proposed to compress
the HSI. In [15] , a missing data approach was presented to compress HSI.
Another simple and straightforward approach is to apply PCA directly
to HSI. For instance, in [3] , the authors have used 10 PCA compressed
bands for anomaly detection. There are also some conventional, simple, and
somewhat naïve approaches, to compressing HSI. One idea known as split
band (SB) is to split the hundreds of HSI bands into groups of 3-band images
and then compress each 3-band image separately. Another idea known as the
video approach (Video) is to treat the 3-band images as video frames and
compress the frames as a video. The SB and Video approaches have been
used for multispectral images [13] and were observed to achieve reasonable
performance.
One powerful approach to HSI compression is the combination of PCA
and J2K [16] . The idea was to first apply PCA to decorrelate the hundreds of
bands and then a J2K codec is then applied to compress the few PCA bands.
In the compression literature, there are a lot of new developments after
J2K [17] in the past 15 years. X264 [18] , a fast implementation of H264
standard, has been widely used in Youtube and many other social media
platforms. X265 [19] , a fast implementation of H265, is a new codec that
will succeed X264. Moreover, a free video codec known as Daala, emerged
recently [20] . In light of these new codecs, it is about time and worthwhile
to revisit the HSI compression problem.
In this paper, we summarize our study in this area. Our aim is to achieve
perceptually lossless compression of HSI at 100 to 1 compression. The key
idea is to compare several combinations of PCA and video/image codecs.
Three representative HSI data cubes such as the Pavia and AVIRIS datasets
were used in our studies. Four video/image codecs, including J2K, X264,
X265, and Daala, have been investigated and four performance metrics were
used in our comparative studies. Moreover, some alternative techniques
such as video, split band, and PCA only approaches were also compared.
It was observed that the combination of PCA and X264 yielded the best
performance in terms of compression performance (rate-distortion curves)
and computational complexity. In the Pavia data case, the PCA + X264
combination achieved more than 3 dBs of gain over the PCA + J2K combination. Most
importantly, our investigations showed that the PCA + X264 combination
can achieve more than 40 dBs of PSNR at 100 to 1 compression. This means
that perceptually lossless compression of HSI is achievable even at 100 to
1 compression.
The key contributions are as follows. First, we revisited the hyperspectral
image compression problem and extensively compared several approaches:
PCA only, Video approach, Split Band approach, and a two-step approach.
Second, for the two-step approach, we compared four variants: PCA + J2K,
PCA + X264, PCA + X265, and PCA + Daala. We observed that the two-
step approach is better than PCA only, Video, and Split Band approaches, as
perceptually lossless compression can be achieved at 100 to 1 ratio. Third,
within the two-step approach, our experiments showed that the PCA +
X264 combination is better than other variants in terms of performance and
computational complexity. To the best of our knowledge, we have not seen
such a study in the literature.
Our paper is organized as follows. Section 2 summarizes the HSI data,
the technical approach, the various algorithms, and performance metrics.
In Section 3, we focus on the experimental results, including the PCA only
results, video approach, split band approach, and two-step approach (PCA
+ video codecs). Four performance metrics were used to compare different
algorithms. Finally, some concluding remarks are included in Section 4.

DATA AND APPROACH

Data
We have used several representative HSI data in this paper. The Pavia and
AVIRIS image cubes were collected using airborne sensors and the Air
Force image was collected on the ground. The numbers of bands in the three
data sets vary from one hundred to more than two hundred.
Image 1: Pavia [21]
The first image we had tested was the Pavia data with a 610 × 340 × 103
image cube. The image was taken with a Reflective Optics System Imaging
Spectrometer (ROSIS) sensor during a flight over northern Italy. Figure 1
shows the RGB bands of the Pavia image cube.
Image 2: AF image
The second image was the image cube used in [3] and it consists of 124
bands and has a height of 267 pixels and a width of 342 pixels. The RGB
image of this data set is shown in Figure 2.
Image 3: AVIRIS
The third image was taken from NASA’s Airborne Visible Infrared
Imaging Spectrometer (AVIRIS). There are 213 bands with wavelengths
from 380 nm to 2500 nm. The image size is 300 × 300 × 213. Figure 3 shows
the RGB image of the data cube.
Figure 1. RGB image of the Pavia image cube.

Figure 2. RGB image of the AF image cube.


Figure 3. RGB image of the AVIRIS image cube.

Compression Approaches
Here, we first present the various work flows of several representative
compression approaches for HSI. We then include some background
materials for several video/image codecs in the literature. We will also
mention two conventional performance metrics and two other metrics
motivated by human visual systems (HVS).

PCA Only
PCA is also known as the Karhunen-Loève transform (KLT). Compared with
discrete cosine transform (DCT) and wavelet transform, PCA is optimal
because it is data-dependent whereas the DCT and WT are independent of
input data. The work flow is shown in Figure 4. After some preprocessing
steps, PCA compresses the raw HSI data cube (N bands) into a pre-defined
number of bands (r bands) and those r bands will be saved or transmitted.
At the receiving end, an inverse PCA will be performed to reconstruct the
HSI image cube.
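A minimal NumPy sketch of this PCA-only path, assuming the cube is reshaped to pixels-by-bands and reduced to r principal-component bands (illustrative only; function names and details here are not the authors' implementation):

import numpy as np

def pca_compress(cube, r):
    # Reshape (height, width, bands) to (pixels, bands) and keep r PCA bands.
    h, w, n_bands = cube.shape
    X = cube.reshape(-1, n_bands).astype(np.float64)
    mean = X.mean(axis=0)
    # The right singular vectors of the mean-removed data give the PCA basis.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    basis = Vt[:r]                            # r x n_bands
    scores = (X - mean) @ basis.T             # the r "PCA bands" to store or transmit
    return scores.reshape(h, w, r), basis, mean

def pca_reconstruct(scores, basis, mean):
    # Inverse PCA at the receiving end.
    h, w, r = scores.shape
    X_hat = scores.reshape(-1, r) @ basis + mean
    return X_hat.reshape(h, w, basis.shape[1])

# Example with a random stand-in cube (e.g., 103 bands reduced to 6).
cube = np.random.rand(32, 32, 103)
scores, basis, mean = pca_compress(cube, r=6)
recon = pca_reconstruct(scores, basis, mean)
print(scores.shape, recon.shape)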

Split Band (SB) Approach


This idea is very simple. The HSI bands are divided into groups of 3-band
images. Each 3-band image is then compressed as a still image with an image
codec. This approach has been observed to work well for multispectral (MS)
image cubes [13] where there are only nine bands. The work flow is shown
in Figure 5.
Video Approach
This approach is similar to the SB approach. Here, the 3-band images are
treated as video frames and then a video codec is then applied. Details can
be found in Figure 5.
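Both the SB and Video approaches start from the same grouping of consecutive bands into 3-band frames; a minimal sketch is given below (illustrative only; leftover bands are simply dropped here, whereas padding them into a final frame is equally possible).

import numpy as np

def group_into_frames(cube):
    # Turn an (h, w, n_bands) cube into a list of (h, w, 3) frames/images.
    h, w, n_bands = cube.shape
    usable = n_bands - (n_bands % 3)
    return [cube[:, :, i:i + 3] for i in range(0, usable, 3)]

cube = np.zeros((610, 340, 103))
frames = group_into_frames(cube)
print(len(frames), frames[0].shape)   # 34 full frames of shape (610, 340, 3); 1 band left over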
We include some details for some of the blocks.

Pre-processing
The preprocessing has a few components. First, it is important to ensure
the input image dimensions to have even numbers because some codecs
may crash if the image size has odd dimensions. Second, the input image
is normalized to double precision with values between 0 and 1. Third, the
different bands are saved into tiff format. Fourth, all the bands are written
into YUV444 and Y4M formats.

Codecs
Different codecs have different requirements. For J2K, we used Matlab’s
video writer to create a J2K format with certain quality parameters. We then
used Matlab’s data reader to decode the compressed data and the individual
frames will be retrieved. For X264 and X265, the videos are encoded using
the respective encoders with certain quality parameters. The video decoding
was done within FFMPEG. For Daala, we directly used the Daala’s functions
for encoding and decoding.

Performance Evaluation
In the evaluation part, each frame is reconstructed and compared to the
original input band. Four performance metrics have been used.

Figure 4. PCA only compression work flow.


Figure 5. Work flow for SB and Video approaches.

Two-step Approach: PCA + Video


The two-step approach has been used in [13] [16] before. In [13] , X265 was
observed to perform better in the second step. In [16] , the second step was
a J2K codec. However, the study in [13] was for MS images rather than an
HSI.
The work flow for the two-step approach is summarized in Figure 6. In
the second step, we propose to treat the PCA bands as a video.
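A hedged sketch of the second step is shown below, assuming the PCA bands have already been written to a Y4M file; the file names and the constant-QP setting are illustrative (the experiments described later sweep roughly 50 quality settings per codec), and only the X264 path is shown.

import subprocess

def encode_pca_bands(y4m_path="pca_bands.y4m", out_path="pca_bands.mp4", qp=20):
    # Hand the PCA-band "video" to ffmpeg's libx264 encoder at a fixed QP.
    cmd = [
        "ffmpeg", "-y",
        "-i", y4m_path,        # frames built from the PCA bands
        "-c:v", "libx264",     # X264 encoder
        "-qp", str(qp),        # quality/quantization parameter
        out_path,
    ]
    subprocess.run(cmd, check=True)

# encode_pca_bands(qp=20)   # lower qp -> higher quality, lower compression ratio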

Brief Review of Relevant Compression Algorithms


Instead of reinventing the wheel, we use image codecs available in the market, objectively evaluate the different codecs, and eventually recommend the best codec to our customer.
With the above in mind, we include a brief overview of some
representative codecs.

DCT based algorithms


• JPEG [22]: JPEG is the very first image compression standard.
The video counterparts are the MPEG-1 and MPEG-2 standards.
• JPEG-XR [23]: It was developed by Microsoft. The performance
is comparable to JPEG-2000. It is mainly used for still image
compression.
• VP8 and VP9 [24] [25]: These video compression algorithms are
owned by Google. The performance is somewhat close to X-264.
We did not include VP8 and VP9 in our study because they are not as popular as X264 and X265.
• X-264 [18]: X264 is the current state-of-the-art in video compression. Youtube uses X264. It has good still image compression.
• X-265 [19]: This is the next-generation video codec and has
excellent still image compression and video compression.
However, the computational complexity is much more than
that of X264. In general, X265 has the same basic structure as
previous standards.

Figure 6. Two-step approach to HSI compression.


Several studies concluded that X265 yields the same quality as X264,
but with only half of the bitrate. It should be noted that X264 and X265 are
optimized versions of H264 and H265, respectively.

Daala [20]
Recently, there has been a parallel activity at the xiph.org foundation, which has implemented a compression codec called Daala [20]. It is based on the DCT. There are pre- and post-filters to increase energy compaction and remove block artifacts. Daala borrows ideas from [26] .
The block-coding framework in Daala can be illustrated in Figure 7.
In this study, we compared Daala with X264, X265, and J2K in our
experiments.

Wavelet-based Algorithms
J2K is a wavelet [17] [27] [28] [29] based compression standard. It has
better performance than JPEG. However, J2K requires the use of the whole
image for coding and hence is not suitable for real-time applications. In
addition, motion-J2K for video compression is not popular in the market.

Performance Metrics
In almost all compression systems, researchers used peak signal-to-noise
ratio (PSNR) or structural similarity (SSIM) to evaluate the compression
algorithms. Given a fixed compression ratio, algorithms that yield higher
PSNR or SSIM will be regarded as better algorithms. However, PSNR or


SSIM do not correlate well with human perception. Recently, a group of
researchers investigated a number of different performance metrics [30]
. Extensive experiments were performed to investigate the correlation
between human perceptions with various performance metrics. According
to the results found in [30] , it was determined that two performance metrics
correlate well with human perception. One image example shown in Figure
8 demonstrates that HVS and HVSm have high correlation with human
subjective evaluation results. In the past, we have used HVS and HVSm in
several applications [11] [12] [13] .
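For reference, the PSNR metric quoted throughout can be computed as in the minimal sketch below (the HVS and HVSm metrics are weighted variants from [30] and are not reproduced here; the toy data are made up).

import numpy as np

def psnr(original, reconstructed, peak=255.0):
    # Peak-signal-to-noise ratio in dB for 8-bit data (peak = 255 by default).
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    return np.inf if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# Toy example: an 8-bit band corrupted by small errors of at most +/-2 levels.
band = np.random.default_rng(0).integers(0, 256, size=(64, 64))
decoded = np.clip(band + np.random.default_rng(1).integers(-2, 3, size=band.shape), 0, 255)
print(round(psnr(band, decoded), 2))   # a PSNR above ~40 dB is treated as perceptually lossless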

EXPERIMENTAL RESULTS
Here, we briefly describe the experimental settings. In PCA only approach, a
program was written for PCA. The input is one hyperspectral image and the
number of principal components to be used in the compression. The outputs
are the PCA bands. The performance metrics are generated by comparing
the original hyperspectral image with the inverse-PCA outputs.
In the Video only approach, we used ffmpeg to call X264 and X265. For
Daala, we used the latest open-source code in Daala’s website. For J2K, we
used the built-in J2K function in Matlab.

Figure 7. Daala codec for block-based image coding systems.


Figure 8. Comparison of SSIM and PSNR-HVS-M (HVSm). HVSm has better correlation with human perception [30] .
In each codec, there is a quantization or quality parameter (qp) that controls
the compression ratio. We chose around 50 qp parameters in our experiments
in order to generate smooth performance curves.
In the two-step approach, the PCA is applied first, followed by the video
codecs.

Experiment 1: Pavia Data

PCA Only
Here, we applied PCA directly to compress the 103 bands to 3, 6, and 9
bands, which we denote as PCA3, PCA6, and PCA9, respectively. From
Figure 9, one can see that PCA3 achieved 33 times of compression with
44.75 dB of PSNR. The other metrics are also high. Similarly, PCA6 and
PCA9 also attained high values in performance metrics. This means that
PCA alone can achieve reasonable compression performance. However, if
our goal is to achieve 100 to 1 compression with higher than 40 dBs of
PSNR, then the PCA only approach may be insufficient.

Video Approach
As mentioned earlier, the video approach treats the HSI data cube as a video
where each frame takes 3 bands out of the data cube. There are 35 frames
in total in the video for the Pavia data. We then applied four video codecs
(J2K, X264, X265, and Daala) to the video. Four performance metrics were
generated as shown in Figure 10. If one compares the metrics in Figure 9 and
Figure 10, one can see that Video approach is slightly better than the PCA
only approach. For instance, at 0.03 compression ratio, PCA3 yielded 38.2
dBs and the Video approach yielded more than 40 dBs in terms of PSNR.
X265 performed better than others at compression ratios less than 0.1.

Split Band (SB) Approach


Here, SB approach means that every 3 bands in the hyperspectral image
cube are treated as a separate image. We then applied four image codecs
to each 3-band image. The averaged metrics from all 3-band images were
computed. Figure 11 summarizes the performance metrics. J2K has better
scores in three out of four metrics. Comparing the Video and SB approaches
as shown in Figure 10 and Figure 11, one can see that the Video approach
is slightly better. For example, at a compression ratio of 0.05, J2K has 43
dBs (HVSm) using the SB approach and X265 has 52 dBs (HVSm) using
the Video approach.

Two-step approach
The two-step approach first compresses the HSI cube using PCA to a small number of bands (3, 6, 9, etc.). The second step applies a video codec to compress the PCA bands. We have five case studies below.

Figure 9. Performance of PCA only: (a) PSNR in dB for Pavia; (b) SSIM for
Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.
Figure 10. Performance of video approach: (a) PSNR in dB for Pavia; (b) SSIM
for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.

Figure 11. Performance of SB approach: (a) PSNR in dB for Pavia; (b) SSIM
for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.

PCA3 + Video Codec


Figure 12 summarizes the two-step approach (PCA3 + Video). It can be
seen that, at 0.01 compression ratio, the two-step approach can get above
40 dBs of PSNR. The other metrics are also high. Daala has better visual performance (HVS and HVSm) than the others. We can also notice that the PCA3 + Video approach can attain much higher compression than the PCA only, SB, and Video approaches. That is, the two-step approach can reach more than 100 times compression with close to 40 dBs of HVSm, whereas the SB and Video approaches cannot achieve 100 to 1 compression at the same performance level (40 dBs).

PCA6 + Video
Figure 13 summarizes the PCA6 + Video results. At 0.01 compression ratio,
the PCA6 + Video approach appears to be slightly better than PCA3
+ Video. X264 is better than others in three out of four metrics. In particular,
at 0.01 compression, X264 has 45 dBs in terms of HVSm. This value is very
high and can be considered as perceptually lossless.

PCA9 + Video Approach


From Figure 14, it is clear that PCA9 + Video is slightly worse as compared
to the PCA6 + Video case. For example, at 0.01 compression ratio, PCA9 +
Video has 44 dBs in terms of PSNR whereas PCA6 + Video has 45 dBs of
PSNR. X264 is better than other codecs.

Figure 12. Performance of two-step (PCA3 + Video) approach: (a) PSNR in dB for Pavia; (b) SSIM for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.

Figure 13. Performance of two-step (PCA6 + Video) approach: (a) PSNR in dB for Pavia; (b) SSIM for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.

Figure 14. Performance of two-step (PCA9 + Video) approach: (a) PSNR in dB for Pavia; (b) SSIM for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.

PCA12 + Video
As shown in Figure 15, the performance of PCA12 + Video is somewhat
similar to PCA9 + Video.

Figure 15. Performance of two-step (PCA12 + Video) approach: (a) PSNR in dB for Pavia; (b) SSIM for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.

PCA15 + Video
As shown in Figure 16, the performance of PCA15 + Video is somewhat
similar to PCA12 + Video.

Figure 16. Performance of two-step (PCA15 + Video) approach: (a) PSNR in dB for Pavia; (b) SSIM for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.

Comparison of Different Combinations of the Two-step Approaches
The performance comparison of different combinations of the two-step
approaches is summarized in Table 1. First, we observe that PCA3 and
PCA6 have better performance than PCA9 to PCA15. Second, PCA6 has
better performance in X264 and Daala. Third, for PCA6, we observed that
X264 is 3.16 dBs better than J2K in terms of PSNR and 4.2 dBs better in terms of HVSm, which is quite significant. For PCA6, Daala has slightly better performance than X264 and X265. However, we noticed that Daala took more computation time than X264. Hence, for practical applications, X264 may be a better choice for HSI compression.

Table 1. Performance comparison of different combinations of two-step approach. Bold numbers indicate the best performing method for each column.

Experiment 2: AF Image Cube

PCA only Approach


Here, we applied PCA directly to compress the 124 bands to 3, 6, and 9
bands, which we denote as PCA3, PCA6, and PCA9, respectively. From
Figure 17, one can see that PCA3 achieved 42 times of compression with
40.7 dBs of PSNR. The other metrics are also high. Similarly, PCA6 and
PCA9 also attained high performance, but lower compression ratios. This
means PCA alone can achieve reasonable compression. However, PCA only
is not enough to achieve 100 to 1 compression.

Figure 17. Performance of PCA only: (a) PSNR in dB for AF image cube; (b)
SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB
for AF image cube.

Video Approach
Comparing the performance of video approach (Figure 18) with the PCA
only approach (Figure 17), one can immediately notice that the Video
approach allows higher compression ratios to be achieved. For instance,
at 0.01 compression ratio, X265 achieved about 38 dBs in PSNR. X265
performs well for small ratios (high compression).

Figure 18. Performance of Video approach: (a) PSNR in dB for AF image cube;
(b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in
dB for AF image cube.

SB Approach
Comparing the SB approach in Figure 19 with the Video approach in Figure
18, the Video approach is better. For instance, if one looks at the PSNR values at 0.05 compression ratio, one can see that the X265 codec in the Video approach has a value of 44 dBs whereas the best codec in the SB approach (J2K) has a value of 41.5 dBs.

Figure 19. Performance of SB approach: (a) PSNR in dB for AF image cube; (b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB for AF image cube.

Two-step Approach
Here, the PCA is combined with the Video approach. That is, the PCA is
first applied to the 124 bands to obtain 3, 6, 9, 12, and 15 bands. After that,
a video codec is applied to further compress the PCA bands.

PCA3 + Video
From Figure 20, we can see that the PCA3 + Video can achieve 0.01
compression ratio with more than 40 dBs of PSNR. Hence, the performance
is better than the earlier approaches (PCA only, Video, and SB). Daala has better performance in terms of HVS and HVSm.

Figure 20. Performance of PCA3 + Video approach: (a) PSNR in dB for AF image cube; (b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB for AF image cube.

PCA6 + Video
From Figure 21 and Figure 20, we can see that PCA6 + Video is better than
PCA3 + Video. For example, at 0.01 compression ratio, Daala has 44 dBs
(HVSm) for PCA6 + Video whereas Daala only has 34.75 dB for PCA3 +
Video. X264 has better metrics in PSNR and SSIM, but Daala has better
performance in terms of HVS and HVSm.

Figure 21. Performance of PCA6 + Video approach: (a) PSNR in dB for AF image cube; (b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB for AF image cube.

PCA9 + Video
Comparing Figure 21 and Figure 22, PCA9 + Video is worse than
PCA6 + Video. For instance, at 0.01 compression ratio, PCA9 + Video has
42 dBs (PSNR) and PCA6 + Video has slightly over 44 dBs of PSNR. Daala
has better scores in HVS and HVSm, but X264 has higher values in PSNR
and SSIM.

Figure 22. Performance of PCA9 + Video approach: (a) PSNR in dB for AF image cube; (b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB for AF image cube.

PCA12 + Video
As shown in Figure 23, the performance of PCA12 + Video is worse than
some of the earlier combinations. For example, Daala’s HVSm value is 40
dBs at 0.01 compression ratio and this is lower than PCA6 + Video (Figure
21) and PCA9 + Video (Figure 22). PCA12 + Video is better than PCA3 +
Video (Figure 20).

Figure 23. Performance of PCA12 + Video approach: (a) PSNR in dB for AF image cube; (b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB for AF image cube.

PCA15 + Video
From Figure 24, we can see that PCA15 + Video is similar to PCA3 + Video
(Figure 20), but worse than the other PCA + Video combinations (Figure 21,
Figure 22, Figure 23).

Figure 24. Performance of PCA15 + Video approach: (a) PSNR in dB for AF image cube; (b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB for AF image cube.

Comparison of Different Combinations of the Two-step Approach


From Table 2, we have the following observations. First, the PCA6 + Video combination has the best performance for each codec. Second, X264 has the best performance in PSNR whereas Daala has the best performance in HVS and HVSm. Third, X264 is 1.45 dBs better than J2K in the PCA6 case.
As mentioned earlier, X264 is faster to run than Daala. Hence, X264 may be
more suitable in practical applications.

Table 2. Performance comparison of different combinations of two-step approach. Bold numbers indicate the best performing method for each column.

Experiment 3: AVIRIS Image Cube

PCA Only Approach


Here, we applied PCA directly to compress the 213 bands to 3, 6, and 9
bands, which we denote as PCA3, PCA6, and PCA9, respectively. From
Figure 25, one can see that PCA3 achieved 72 times of compression with
39.7 dBs of PSNR. The other metrics are also high. Similarly, PCA6 and
PCA9 also attained high performance, but lower compression ratios. This
means PCA alone can achieve reasonable compression.

Figure 25. Performance of PCA only: (a) PSNR in dB for AVIRIS image cube;
(b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube; (d)
HVSm in dB for AVIRIS image cube.

Video Approach
Here, the 213 bands are divided into groups of 3 bands. As a result, there are 71 groups, which are then treated as 71 frames in a video. After that, different video codecs are applied. The performance metrics are shown in Figure 26. Compared with the PCA only approach, the Video approach is slightly inferior. For instance, PCA6 has a PSNR of 44 dBs at a compression ratio of 0.028 whereas the Video only approach has about 42.5 dBs at the same ratio.

Figure 26. Performance of Video approach: (a) PSNR in dB for AVIRIS image
cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube;
(d) HVSm in dB for AVIRIS image cube.

SB Approach
Here the 71 groups of 3-band images are compressed separately. The results
shown in Figure 27 are worse than the video approach. This is understandable
as the correlations between frames were not taken into account in the SB
approach.

Figure 27. Performance of SB approach: (a) PSNR in dB for AVIRIS image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube; (d) HVSm in dB for AVIRIS image cube.

Two-step Approach
We have the following five case studies based on the number of PCA bands
coming out of the first step.

PCA3 + Video
From Figure 28, the performance metrics appear to flatten out after a
compression ratio of 0.005. The maximum PSNR value is below 40 dBs.
Other metrics are also not very high. Compared with the PCA only, Video, and SB approaches, PCA3 + Video does not show any advantages.

Figure 28. Performance of PCA3 + Video approach: (a) PSNR in dB for AVIRIS
image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS im-
age cube; (d) HVSm in dB for AVIRIS image cube.

PCA6 + Video
From Figure 29, we can see that PCA6 + Video has much better performance
than PCA3 + Video as well as PCA only, Video, and SB approaches. At
0.01 compression ratio, the PSNR values reached more than 42 dBs. Other
metrics also performed well. Daala has higher scores in HVS and HVSm.
X265 is slightly better in PSNR and SSIM.

Figure 29. Performance of PCA6 + Video approach: (a) PSNR in dB for AVIRIS
image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS im-
age cube; (d) HVSm in dB for AVIRIS image cube.

PCA9 + Video
Comparing Figure 29 and Figure 30, we can see that PCA9 + Video has better metrics than PCA6 + Video. For instance, at 0.01 compression
ratio, PCA9 + Video has achieved 40 dBs of HVSm (Daala), but PCA6 +
Video has 38.5 dBs.

Figure 30. Performance of PCA9 + Video approach: (a) PSNR in dB for AVIRIS
image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS im-
age cube; (d) HVSm in dB for AVIRIS image cube.

PCA12 + Video
Comparing Figure 30 and Figure 31, we can see that PCA12 + Video is
slightly worse than PCA9 + Video.

Figure 31. Performance of PCA12 + Video approach: (a) PSNR in dB for AVIRIS image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube; (d) HVSm in dB for AVIRIS image cube.

PCA15 + Video
Comparing Figure 31 and Figure 32, it can be seen that PCA15 + Video is
slightly worse than PCA12 + Video.

Figure 32. Performance of PCA15 + Video approach: (a) PSNR in dB for AVIRIS image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube; (d) HVSm in dB for AVIRIS image cube.

Comparison of Different Combinations in the Two-step Approach


Table 3 summarizes the performance metrics of different combinations of
the two-step approach. First, PCA6 performed better than other PCA + Video
combinations. Second, X265 performed better than other codecs. However,
since X265 requires a lot of computational power and X264 is only slightly
inferior to X265, it is better to use the PCA6 + X264 combination. Third,
the HVSm values in Daala and X265 are somewhat strange, but we could
not find an explanation for that even though we repeated the experiments
several times. This behavior only happened in the AVIRIS data. Fourth, we
noticed that X264 is 0.25 dBs better than J2K in terms of PSNR.

Table 3. Performance comparison of different combinations of two-step approach at 0.01 compression ratio for the AVIRIS data. Bold numbers indicate the best performing method for each column.

CONCLUSION
In this paper, we summarize some new results for HSI compression. The
key idea is to revisit a two-step approach to HSI data compression. The first
step adopts PCA to compress the HSI data spectrally. That is, the number of
bands is greatly reduced to a few bands via PCA. The second step applies
the latest video/image codecs to further compress the few PCA bands. Four
well-known codecs (J2K, X264, X265, and Daala) were used in the second
step. Three HSI data sets with diversely varying numbers of bands were used
in our studies. Four performance metrics were utilized in our experiments.
We have several key observations. First, we observed that compressing the HSI to six bands gives the best overall performance in all of the three HSI data sets. This is different from the observation in [16] where more
PCA bands were included in the J2K step. Second, the X264 codec gave the
best performance in terms of compression performance and computational
complexity. Third, the PCA6 + X264 combination can be 3 dBs better than
the PCA6 + J2K combination in the Pavia data at 100 to 1 compression and
this is quite significant. Fourth, even at 100 to 1 compression, the PCA6 +
X264 combination can attain better than 40 dBs in PSNR for all of the three
data sets. This means the compression performance is perceptually lossless
at 100 to 1 compression.

ACKNOWLEDGEMENTS
This research was supported by NASA Jet Propulsion Laboratory under
contract # 80NSSC17C0035. The views, opinions and/or findings expressed
are those of the author(s) and should not be interpreted as representing the
official views or policies of NASA or the U.S. Government.

REFERENCES
1. Ayhan, B., Kwan, C. and Jensen, J.O. (2019) Remote Vapor Detection
and Classification Using Hyperspectral Images. Proceedings SPIE,
Chemical, Biological, Radiological, Nuclear, and Explosives (CBRNE)
Sensing XX, Vol. 11010, 110100U. https://doi.org/10.1117/12.2518500
2. Zhou, J., Kwan, C. and Ayhan, B. (2017) Improved Target Detection
for Hyperspectral Images Using Hybrid In-Scene Calibration. Journal
of Applied Remote Sensing, 11, Article ID: 035010. https://doi.
org/10.1117/1.JRS.11.035010
3. Zhou, J., Kwan, C., Ayhan, B. and Eismann, M. (2016) A Novel Cluster
Kernel RX Algorithm for Anomaly and Change Detection Using
Hyperspectral Images. IEEE Transactions on Geoscience and Remote
Sensing, 54, 6497-6504. https://doi.org/10.1109/TGRS.2016.2585495
4. Zhou, J., Kwan, C. and Budavari, B. (2016) Hyperspectral Image Super-
Resolution: A Hybrid Color Mapping Approach. Journal of Applied
Remote Sensing, 10, Article ID: 035024. https://doi.org/10.1117/1.
JRS.10.035024
5. Qu, Y., Wang, W., Guo, R., Ayhan, B., Kwan, C., Vance, S. and Qi, H.
(2018) Hyperspectral Anomaly Detection through Spectral Unmixing
and Dictionary Based Low Rank Decomposition. IEEE Transactions
on Geoscience and Remote Sensing, 56, 4391-4405. https://doi.
org/10.1109/TGRS.2018.2818159
6. Wu, H.R., Reibman, A., Lin, W., Pereira, F. and Hemami, S.
(2013) Perceptual Visual Signal Compression and Transmission.
Proceedings of the IEEE, 101, 2025-2043. https://doi.org/10.1109/
JPROC.2013.2262911
7. Wu, D., Tan, D.M., Baird, M., DeCampo, J., White, C. and Wu,
H.R. (2006) Perceptually Lossless Coding of Medical Images. IEEE
Transactions on Medical Imaging, 25, 335-344. https://doi.org/10.1109/
TMI.2006.870483
8. Oh, H., Bilgin, A. and Marcellin, M.W. (2013) Visually Lossless
Encoding for JPEG 2000. IEEE Transactions on Image Processing, 22,
189-201. https://doi.org/10.1109/TIP.2012.2215616
9. Tan, D.M. and Wu, D. (2016) Perceptually Lossless and Perceptually
Enhanced Image Compression System & Method. U.S. Patent
9,516,315,6.

10. Kwan, C., Larkin, J., Budavari, B., Chou, B., Shang, E. and Tran, T.D.
(2019) A Comparison of Compression Codecs for Maritime and Sonar
Images in Bandwidth Constrained Applications. Computers, 8, 32.
https://doi.org/10.3390/computers8020032
11. Kwan, C., Larkin, J., Budavari, B. and Chou, B. (2019) Compression
Algorithm Selection for Multispectral Mastcam Images. Signal &
Image Processing: An International Journal, 10, 1-14. https://doi.
org/10.5121/sipij.2019.10101
12. Kwan, C. and Larkin, J. (2018) Perceptually Lossless Compression for
Mastcam Images. IEEE Ubiquitous Computing, Electronics & Mobile
Communication Conference, New York, 8-10 November 2018, 559-
565. https://doi.org/10.1109/UEMCON.2018.8796824
13. Kwan, C., Larkin, J. and Chou, B. (2019) Perceptually Lossless
Compression of Mastcam Images with Error Recovery. Proceedings
SPIE, Signal Processing, Sensor/Information Fusion, and Target
Recognition XXVIII, Vol. 11018. https://doi.org/10.1117/12.2518482
14. Li, N. and Li, B. (2010) Tensor Completion for On-Board Compression
of Hyperspectral Images. IEEE International Conference on
Image Processing, Hong Kong, 517-520. https://doi.org/10.1109/
ICIP.2010.5651225
15. Zhou, J., Kwan, C. and Ayhan, B. (2012) A High Performance
Missing Pixel Reconstruction Algorithm for Hyperspectral Images.
2nd International Conference on Applied and Theoretical Information
Systems Research, Taipei, 27-29 December 2012.
16. Du, Q. and Fowler, J.E. (2007) Hyperspectral Image Compression
Using JPEG2000 and Principal Component Analysis. IEEE Geoscience
and Remote Sensing Letters, 4, 201-205. https://doi.org/10.1109/
LGRS.2006.888109
17. JPEG-2000. https://en.wikipedia.org/wiki/JPEG_2000
18. X264. http://www.videolan.org/developers/x264.html
19. X265. https://www.videolan.org/developers/x265.html
20. Daala. https://xiph.org/daala/
21. http://lesun.weebly.com/hyperspectral-data-set.html
22. JPEG. https://en.wikipedia.org/wiki/JPEG
23. JPEG-XR. https://en.wikipedia.org/wiki/JPEG_XR
24. VP8. https://en.wikipedia.org/wiki/VP8

25. VP9. https://en.wikipedia.org/wiki/VP9


26. Tran, T.D., Liang, J. and Tu, C. (2003) Lapped Transform via Time-
Domain Pre- and Post-Filtering. IEEE Transactions on Signal
Processing, 51, 1557-1571. https://doi.org/10.1109/TSP.2003.811222
27. Kwan, C., Li, B., Xu, R., Tran, T. and Nguyen, T. (2001) Very
Low-Bit-Rate Video Compression Using Wavelets. Wavelet
Applications VIII, Proceedings SPIE, Vol. 4391, 176-180. https://doi.
org/10.1117/12.421197
28. Kwan, C., Li, B., Xu, R., Tran, T. and Nguyen, T. (2001) SAR Image
Compression Using Wavelets. Wavelet Applications VIII, Proceedings
SPIE, Vol. 4391, 349-357. https://doi.org/10.1117/12.421215
29. Kwan, C., Li, B., Xu, R., Li, X., Tran, T. and Nguyen, T. (2006) A
Complete Image Compression Codec Based on Overlapped Block
Transform with Post-Processing. EURASIP Journal on Applied Signal Processing, 2006, Article ID: 010968. https://doi.org/10.1155/
ASP/2006/10968
30. Ponomarenko, N., Silvestri, F., Egiazarian, K., Carli, M., Astola, J.
and Lukin, V. (2007) On Between-Coefficient Contrast Masking of
DCT Basis Functions. Proceedings of the 3rd International Workshop
on Video Processing and Quality Metrics for Consumer Electronics,
Scottsdale, Arizona.
Chapter 12

LOSSLESS COMPRESSION OF DIGITAL MAMMOGRAPHY USING BASE SWITCHING METHOD

Ravi kumar Mulemajalu1 and Shivaprakash Koliwad2

1 Department of IS&E., KVGCE, Sullia, Karnataka, India
2 Department of E&C., MCE, Hassan, Karnataka, India

ABSTRACT
Mammography is a specific type of imaging that uses low-dose x-ray
system to examine breasts. This is an efficient means of early detection
of breast cancer. Archiving and retaining these data for at least three
years is expensive, difficult and requires sophisticated data compression
techniques. We propose a lossless compression method that makes use of
the smoothness property of the images. In the first step, de-correlation of the
given image is done using two efficient predictors. The two residue images

Citation: Mulemajalu, R. and Koliwad, S. (2009), “Lossless Compression of digital mammography using base switching method”. Journal of Biomedical Science and Engineering, 2, 336-344. doi: 10.4236/jbise.2009.25049.
Copyright: © 2019 by authors and Scientific Research Publishing Inc. This work is
licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0

are partitioned into non overlapping sub-images of size 4x4. At every instant
one of the sub-images is selected and sent for coding. The sub-images with
all zero pixels are identified using one bit code. The remaining sub- images
are coded by using base switching method. Special techniques are used to
save the overhead information. Experimental results indicate an average
compression ratio of 6.44 for the selected database.

Keywords: Lossless Compression, Mammography Image, Prediction, Storage Space

INTRODUCTION
Breast cancer is the most frequent cancer in women worldwide, with 1.05 million new cases every year, and represents over 20% of all malignancies among females. In India, 80,000 women were affected by breast cancer in
2002. In the US, alone in 2002, more than 40,000 women died of breast
cancer. 98% of women survive breast cancer if the tumor is smaller than 2 cm
[1]. One of the effective methods of early diagnosis of this type of cancer is
non-palpable, non-invasive mammography. Through mammogram analysis
radiologists have a detection rate of 76% to 94%, which is considerably higher than the 57% to 70% detection rate for a clinical breast examination [2].
Mammography is a low dose x-ray technique to acquire an image of
the breast. Digital image format is required in computer aided diagnosis
(CAD) schemes to assist the radiologists in the detection of radiological
features that could point to different pathologies. However, the usefulness
of the CAD technique mainly depends on two parameters of importance:
the spatial and grey level resolutions. They must provide a diagnostic
accuracy in digital images equivalent to that of conventional films. Both
pixel size and pixel depth are factors that critically affect the visibility of
small low contrast objects or signals, which often are relevant information
for diagnosis [3]. Therefore, digital image recording systems for medical
imaging must provide high spatial resolution and high contrast sensitivity.
Due to this, mammography images commonly have a spatial resolution of
1024x1024, 2048x2048 or 4096x4096 and use 16, 12 or 8 bits/pixel. Figure
1 shows a mammography image of size 1024x1024 which uses 8 bits/pixel.

Figure 1. A Mammography image of size 1024x1024 which uses 8 bits/pixel.


Nevertheless, this requirement slows the adoption of digital technologies due to the increase in processing and transmission time, storage capacity and cost that good digital image quality implies. A typical
mammogram digitized at a resolution of 4000x5000 pixels with 50-µm spot
size and 12 bits results in approximately 40 Mb of digital data. Processing
or transmission time of such digital images could be quite long. An efficient
data compression scheme to reduce the digital data is needed.
The goal of the image compression techniques is to represent an image
with as few bits as possible in such a way that the original image can be
reconstructed from this representation without or with minimum error or
distortion. Basically image compression techniques have been classified
into two categories namely lossy and lossless methods. Lossy compression
methods cannot achieve exact recovery of the original image, but achieves
significant compression ratio. Lossless compression techniques, as their name
implies, involve no loss of information. The original data can be recovered
exactly from the compressed data. In medical applications, lossless coding
methods are required since loss of any information is usually unacceptable
[4]. Performance of the lossless compression techniques can be measured in
terms of their compression ratio, bits per pixels required in the compressed
image and the time for encoding and decoding. On the other hand, since

the lossy compression techniques discard some information, performance


measure includes the mean square error and peak signal to noise ratio
(PSNR) in addition to the measures used for the lossless compression.
Lossless image compression systems typically function in two stages
[5]. In the first stage, two-dimensional spatial redundancies are removed
by using an image model which can range from a relatively simple causal
prediction used in the JPEG-LS [6,7] standard to a more complicated multi-
scale segmentation based scheme. In the second stage, the two-dimensional
de-correlated residual which is obtained from the first stage, along with any
parameters used to generate the residual is coded with a one-dimensional
entropy coder such as the Huffman or the Arithmetic coder.
Existing lossless image compression algorithms can be broadly classified
into two kinds: Those based on prediction and those that are transform
based. The predictive coding system consists of a predictor that, at each pixel of the input image, generates the anticipated value of that pixel based on some of the past pixels; the prediction error is then entropy coded. Various
local, global and adaptive methods can be used to generate prediction. In
most cases, however the prediction is formed by a linear combination of
some previous pixels. The variance of the prediction error is much smaller
than the variance of the gray levels in the original image. Moreover, the
first order estimate of the entropy of the error image is much smaller than
the corresponding estimate for the original image. Thus higher compression
ratio can be achieved by entropy coding the error image. The past pixels
used in the prediction are collectively referred to as a context. The popular
JPEG-LS standard uses the prediction based coding technique [8]. Transform
based algorithms, on the other hand, are often used to produce a hierarchical
or multi-resolution representation of an image and work in the frequency
domain. The popular JPEG-2000 standard uses the transform based coding
technique [9].
Several techniques have been proposed for the lossless compression of
digital mammography. A. Neekabadi et al. [10] use chronological sifting of prediction errors and code the errors using arithmetic coding. For the 50 MIAS (Mammography Image Analysis Society) images, CSPE gives a better average compression ratio than JPEG-LS and JPEG-2000. Xiaoli Li et al.
[11] use grammar codes: the original image is first transformed into a context-free grammar, from which the original data sequence can be fully reconstructed by performing parallel and recursive substitutions, and an arithmetic coding algorithm is then used to compress the context-free grammar.

The compression ratio achieved is promising, but the method involves more complicated processing and a large computation time. The Delaunay triangulation method [12] is another approach. It uses a geometric predictor based on irregular sampling and the Delaunay triangulation. The difference between the original and the predicted image is calculated and coded using the JPEG-LS approach. The method offers a lower bit rate than JPEG-LS, JPEG-lossless, JPEG2000 and PNG. A limitation is the slow execution time. Lossless JPEG2000 and JPEG-LS are considered the best methods for mammography images. Lossless JPEG2000 methods are preferred due to their wide variety of features, but suffer from a slightly longer encoding and decoding time [13].
JPEG 2000 methods are preferred due to the wide variety of features, but are
suffered from a slightly longer encoding and decoding time [13].
Recently, there have been a few instances of using segmentation
for lossless compression. Shen and Rangayyan [14] proposed a simple
region growing scheme which generates an adaptive scanning pattern. A
difference image is then computed and coded using the JBIG compression
scheme. A higher compression ratio is possible with such a scheme for high-resolution medical images, but the application of the same scheme to normal images did not result in significant performance improvement. Another scheme reported in the literature involves using a variable block size segmentation (VBSS) to obtain a context-sensitive encoding of wavelet
coefficients, the residual being coded using a Huffman or Arithmetic coder
[15,16]. The performance of the method is comparable to that of the lossless
JPEG standard. Marwan Y. et al. [17] proposed fixed block based (FBB)
lossless compression methods for the digital mammography. The algorithm
codes blocks of pixels within the image that contain the same intensity value,
thus reducing the size of the image substantially while encoding the image at
the same time. The FBB method alone gives a small compression ratio, but when used in conjunction with LZW it provides a better compression ratio.
We propose a method based on Base switching (BS). Trees-Juen Chuang
et al. [18] have used the Base-switching method to compress general images. References [19] and [20] have also used the same concept for the compression of digital images. The algorithm segments the image into non-overlapping fixed blocks
of size n × n and codes the pixels of the blocks based on the amount of
smoothness. In the proposed work we have optimized the original BS method
for the compression of mammography images. Specific characteristics of
mammography images are well suited for the proposed method. These
characteristics include low number of edges and vast smooth regions.
The organization of the paper is as follows. Section 2 describes the
basic Base Switching (BS) method. The proposed algorithm is given in

Section 3. Experimental results and conclusions are given in Sections 4 and 5, respectively.

BASE-SWITCHING ALGORITHM
The BS method divides the original image (gray-level data) into non-
overlapping sub-images of size n × n. Given an n × n sub-image A, whose N gray values are g0, g1, …, gN−1, define the “minimum” m, the “base” b and the “modified sub-image” AI, whose N gray values are g0′, g1′, …, gN−1′, by

m = Min(g0, g1, …, gN−1)    (1)

b = Max(g0, g1, …, gN−1) − Min(g0, g1, …, gN−1) + 1    (2)

AI = A − m·I, that is, gi′ = gi − m for i = 0, 1, …, N−1    (3)

Also,

A = AI + m·I    (4)
where N = n × n and each of the elements of I is 1. The value of ‘b’ is
related to smoothness of the sub-image where smoothness is measured as
the difference between maximum and minimum pixel values in the sub-
image.
The number of bits required to code the gray values is

B = ⌈log2(b)⌉    (5)

Then, the total number of bits required for the whole sub-image is

ZA = N × B    (6)
For example, for the sub-image of Figure 2, n = 4, N = 16, m = 95 & b = 9.
The modified sub-image of Figure 3 is obtained by subtracting 95 from every gray value of A.

Figure 2. A sub-image A with n = 4, N = 16, m = 95, b = 9 & B = 4.



For the sub-image in Figure 3, since B = 4, ZA =64 bits.

Figure 3. Modified sub-image AI.


In order to reconstruct A, the values of B and m should be known. Therefore, the encoded bit stream consists of m, B and AI coded using B bits per gray value. In the computation of B, if b is not an integer power of 2, log2(b) is rounded to the next higher integer. Thus, in such cases, a higher number of bits is used than absolutely required. The BS method uses the following concept to exploit this redundancy.
It is found that

0 ≤ gi′ ≤ b − 1, for i = 0, 1, …, N−1    (7)

The modified sub-image AI can therefore be treated as an N-digit number in the base-b number system. An integer value function f can be defined such that f(AI, b) equals the decimal integer equivalent of this base-b number:

f(AI, b) = g0′·b^(N−1) + g1′·b^(N−2) + … + gN−1′    (8)

0 ≤ f(AI, b) ≤ b^N − 1    (9)

Then, the number of bits required to store the integer f(AI, b) is

ZB = ⌈N·log2(b)⌉    (10)
Reconstruction of AI is done by switching the binary (base 2) number to
a base b number. Therefore, reconstruction of A needs the value of m and b.
The format of representation of a sub-image is as shown below.

For the example of Figure 3, b = 9 and therefore ZB = 51 bits. It is easy to prove that ZB ≤ ZA always holds. We know that the maximum value of f(AI, b) is b^N − 1. The total number of bits required to represent f in binary is

⌈log2(b^N)⌉ = ⌈N·log2(b)⌉    (11)

Always,

⌈N·log2(b)⌉ ≤ N·⌈log2(b)⌉    (12)

This verifies that

ZB ≤ ZA.    (13)
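The following Python sketch illustrates the quantities defined above (m, b, B, ZA and ZB) for a single sub-image. The function names and the example block values are ours; the block is only constructed so that it reproduces the m = 95, b = 9, B = 4, ZA = 64 and ZB = 51 figures quoted in the text.

```python
# Minimal sketch of the base-switching idea for one n x n sub-image.
# The variable names follow the text; the pixel values are made up.
import math
import numpy as np

def base_switch_encode(block):
    g = block.flatten().astype(int)
    N = g.size
    m = int(g.min())                               # Eq. (1): minimum
    b = int(g.max() - g.min() + 1)                 # Eq. (2): base
    gI = g - m                                     # Eq. (3): modified sub-image AI
    B = max(1, math.ceil(math.log2(b))) if b > 1 else 1          # Eq. (5)
    ZA = N * B                                     # Eq. (6): fixed-length cost
    f = 0
    for digit in gI:                               # Eq. (8): N-digit base-b number
        f = f * b + int(digit)
    ZB = max(1, math.ceil(N * math.log2(b))) if b > 1 else 1     # Eq. (10)
    return m, b, f, ZA, ZB

def base_switch_decode(f, m, b, N):
    """Switch the stored integer back to N base-b digits and add m back."""
    digits = []
    for _ in range(N):
        digits.append(f % b)
        f //= b
    return np.array(digits[::-1]) + m

block = np.array([[ 98,  97,  96,  95],
                  [ 99, 100, 101,  97],
                  [103, 102,  98,  96],
                  [ 95,  99, 100, 101]])
m, b, f, ZA, ZB = base_switch_encode(block)
print(m, b, ZA, ZB)                 # 95 9 64 51, matching the worked example
decoded = base_switch_decode(f, m, b, block.size)
assert np.array_equal(decoded, block.flatten())
```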

Formats Used for Encoding


The original BS algorithm uses a block size of 3x3 for segmentation. There are
three formats used by the original algorithm for encoding the sub-images.

Format 1
If b ∈ {1, 2, …, 11}, then the coding format is

This format is economical when b < 23.4

Format 2
If b ∈ {12, 13, …, 128}, then the coding format is

Here P(min, max) is a pair of two 3-bit numbers indicating the positions of the minimum and maximum values. If b > 11, writing the positions of the minimum and maximum values is more economical than coding them.

Format 3
If b ∈ {129, 130, …, 256}, then the coding format is

Here, c stands for the category bit. If c is 0, then the block is encoded
using Formats 1 or 2; otherwise Format 3 is used.

Hierarchical Use of BS Technique


The encoded result of Subsection 2.1 can be compressed further in a
hierarchical manner. We can imagine that there is a so-called “base-image”,
whose gray values are b0, b1, b2, …, b255; then, since it is a kind of image
(except that each value is a base value of a sub-image rather than a gray
value of a pixel), we can use the same BS technique to compress these base
values. The details are omitted. Besides b, the minimal value m of each
block can also be grouped and compressed similarly. We can repeat the same
procedure to encode b and m values further.

PROPOSED METHOD
In the proposed method, we made following modifications to the basic BS
method.
• Prediction
• Increasing the block size from 3x3 to 4x4
• All-zero block removal
• Coding the minimum value and base value

Prediction
After reviewing the BS method, it is found that the number of bits required for a sub-image is determined by the value of the base ‘b’. If ‘b’ is reduced, the number
of bits required for a sub-image is also reduced. In the proposed method,
prediction is used to reduce the value of ‘b’ significantly. A predictor
generates at each pixel of the input image the anticipated value of that
pixel based on some of the past inputs. The output of the predictor is then
rounded to the nearest integer, denoted x̂n and used to form the difference
or prediction error between the actual pixel value xn and its prediction,

en = xn − x̂n    (14)

This prediction error is coded by the proposed entropy coder. The decoder reconstructs en from the received code words and performs the reverse operation

xn = x̂n + en    (15)
The quality of the prediction for each pixel directly affects how efficiently the prediction error can be encoded. The better this prediction is, the less information must be carried by the prediction error signal. This, in turn, leads to fewer bits. One way to improve the prediction used for each pixel is to make a number of predictions and then choose the one which comes closest to the actual value [21]. This method, also called switched prediction, has the major disadvantage that the choice of prediction for each pixel must be sent as overhead information. The proposed prediction scheme uses two
predictors and one of them is chosen for every block of pixels of size 4x4.
Thus, the choice of prediction is to be made only once for the entire 4x4
block. This reduces the amount of overhead. The two predictions are given
in Eqs. (16) and (17); Pr1 is the popular MED predictor used in the JPEG-LS standard and Pr2 is the one used by [5] for the compression of mammography images.
For all the pixels of a block of size 4 × 4, one of the two predictions is chosen depending on the smoothness property of the predicted blocks. Here,
smoothness is measured as the difference between maximum and minimum
pixel values in the block. The predictor that gives the lowest difference
value will be selected for that block. The advantage here is that the overhead
required for each block is only one bit.

(16)

(17)
Here, A, B, C, D, E and F are the neighbors of the pixel involved in prediction, as depicted in Figure 4.

Figure 4. Neighbors of pixel involved in prediction.


The proposed switched prediction is described in equation form in Eq. (18), where d1 and d2 are the differences between the maximum and minimum values for the two blocks obtained using the two predictors Pr1 and Pr2, respectively.

Pr = Pr1 if d1 ≤ d2, and Pr = Pr2 otherwise    (18)
Figure 5 illustrates the prediction technique. It shows two error images
which are obtained by using two predictors Pr1 and Pr2 respectively. The BS
algorithm divides them into 4x4 sub-images and computes the difference
between maximum and minimum pixel values for all the four sub-images.
For the first sub-image, difference ‘d1’ is 6 and ‘d2’ is 8, where d1 and d2
are the differences of the sub-images corresponding to the predictors Pr1 and
Pr2, respectively. Now, since d1 < d2, the prediction Pr selects the 4x4 sub-image of predictor Pr1 for further processing. This procedure is repeated for
all the other sub-images. The resulting error image is shown in Figure 6.
We use a separate file predict to store the choice of predictor made at every
sub-image.
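A rough Python sketch of this block-wise switched prediction is given below. Pr1 follows the standard MED rule of JPEG-LS; since Eq. (17) for Pr2 is not reproduced in this excerpt, a simple two-neighbour average is used as a stand-in, so the second predictor, the function names and the border handling are assumptions rather than the authors' exact scheme.

```python
# Sketch of the block-wise switched prediction. med_predict is the JPEG-LS MED
# rule (Pr1); avg_predict is only a placeholder for Pr2, which is not given here.
import numpy as np

def med_predict(img, i, j):
    a = int(img[i, j - 1]) if j > 0 else 0              # left neighbour
    b = int(img[i - 1, j]) if i > 0 else 0              # upper neighbour
    c = int(img[i - 1, j - 1]) if (i > 0 and j > 0) else 0
    if c >= max(a, b):
        return min(a, b)
    if c <= min(a, b):
        return max(a, b)
    return a + b - c

def avg_predict(img, i, j):                             # assumed stand-in for Pr2
    a = int(img[i, j - 1]) if j > 0 else 0
    b = int(img[i - 1, j]) if i > 0 else 0
    return (a + b) // 2

def switched_prediction(img, block=4):
    H, W = img.shape
    error = np.zeros((H, W), dtype=int)
    choices = []                                        # one bit per block ("predict" file)
    for bi in range(0, H, block):
        for bj in range(0, W, block):
            errs = []
            for pred in (med_predict, avg_predict):
                e = np.array([[int(img[i, j]) - pred(img, i, j)
                               for j in range(bj, min(bj + block, W))]
                              for i in range(bi, min(bi + block, H))])
                errs.append(e)
            # Pick the predictor whose error block has the smaller max-min
            # spread, i.e. the smoother block, as described in the text.
            spreads = [int(e.max() - e.min()) for e in errs]
            k = int(np.argmin(spreads))
            choices.append(k)
            error[bi:bi + block, bj:bj + block] = errs[k]
    return error, choices

demo = np.random.randint(0, 256, size=(16, 16))
err_image, predict_bits = switched_prediction(demo)
```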

Figure 5. Prediction technique.

Figure 6. Resulting error image.

Increasing the Block Size from 3x3 to 4x4


It is obvious that a smaller block size gives a lower b value, but at the cost of increased overhead for the total image. The basic BST algorithm uses a block size of 3x3 to achieve an optimum balance between the amount of overhead and
compression efficiency. Since the prediction increases smoothness, a larger
block size can be chosen without significant difference in the smoothness.
Certainly, this will improve the compression ratio. We have tested the
proposed algorithm using different block sizes and found that block size of
4x4 gives the best result. This is supported by Table 1 giving the average
compression ratios obtained for the 50 mammography images for various
block sizes.

Table 1. Average compression ratio obtained for various block sizes.

All-Zero Block Removal


Mammography images are highly correlated, so the pixels inside most of the sub-images are the same. During prediction and subtraction they have the
highest chance of becoming zeros. If a sub-image of the error image has all
its pixel values as zero, then that is marked as an all-zero block by storing
a bit 1 in the encoding format. For each of the remaining sub-images, bit
0 is stored in the encoding format and is retained for further processing.
Mammography images of the MIAS dataset show a very large number of all-zero blocks. This is supported by Table 2, which gives the average number of all-zero blocks present in the 50 mammography images of the MIAS dataset.

Table 2. Average number of all-zero blocks present in the 50 MIAS images.

For these images, the total number of bits required for marking the presence or absence of all-zero blocks is 65536, since there are 65536 sub-images in total. The error
image will have both negative and positive pixel values since the prediction
has changed the range of the pixel values from [0, 255] to [-255, 255].
Therefore, 9 bits are required to record the pixel values. The approximate
average compression ratio obtained by all-zero block removal for the 50
MIAS images considered in Table 2 can be estimated by the following
formula.

Here the value 65536 indicates the overhead bits required for marking
the presence or absence of all-zero blocks. The numerator gives total bits
used by the original uncompressed image. This computation clearly shows
that removal of all-zero blocks alone gives an approximate compression ratio of 1.73.
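A small sketch of this back-of-envelope estimate is shown below. The count of all-zero blocks passed in is an assumed figure, since the averages of Table 2 are not reproduced here; with that assumed count the estimate lands near, but not exactly at, the 1.73 reported above.

```python
# Back-of-envelope estimate of the gain from all-zero block removal for a
# 1024 x 1024, 8-bit image split into 4x4 blocks. The all-zero block count
# below is an assumed figure; the measured averages are given in Table 2.
def zero_block_ratio_estimate(num_zero_blocks, H=1024, W=1024, bpp=8, block=4):
    total_blocks = (H // block) * (W // block)        # 65536 blocks
    flag_bits = total_blocks                          # 1 bit per block (zero / non-zero)
    residual_bits = (total_blocks - num_zero_blocks) * block * block * 9  # 9 bits/pixel
    original_bits = H * W * bpp
    return original_bits / (flag_bits + residual_bits)

# Roughly 1.71 for an assumed 32000 all-zero blocks; the paper reports 1.73
# for the measured average counts.
print(round(zero_block_ratio_estimate(num_zero_blocks=32000), 2))
```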

Coding the Minimum Value and the Base Value


The error image will have both negative and positive pixel values. The
prediction has changed the range of the minimum value from [0,255] to
[-255,255]. Therefore, 9 bits are required to record the minimum value. By
studying various images, it is found that the minimum values range from -128 to 64, with a higher concentration between -8 and 0. Similarly, base values are concentrated in the range 1 to 15. These statistics are supported by
Table 3 of average number of minimum values and base values for the 50
mammography images.

Table 3. Average number of minimum and base values in 50 MIAS images.

To exploit this redundancy, we use a simple categorize-and-code technique. A four-bit code is used to identify the minimum values. Minimum values less than 1 and greater than -15 are given the codes 0 to 14, whereas other minimum values are represented by the code 15 followed by their actual 9-bit values. A similar scheme can be used to code the base values. Base values between 1 and 15 are identified by using 4-bit codes
actual 9 bit values. A scheme similar to this can be used to code the base
values also. Base values between 1 and 15 are identified by using 4 bit codes
whereas values greater than 15 are identified using the code 15 followed by
their actual 9 bit values. The two four bit codes for each of the sub-images
are combined to get an 8 bit number. Such 8 bit numbers are stored in a file
and Huffman encoded at the end.
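The escape-code idea can be sketched as follows. The exact 4-bit code assignments and the bit layout of the escaped 9-bit values are assumptions; only the rule that code 15 acts as an escape, and that the two 4-bit codes are packed into one 8-bit symbol for later Huffman coding, follows the text.

```python
# Sketch of the categorize-and-code scheme for the per-block minimum and base
# values. Code assignments and the sign convention of the escaped 9-bit value
# are assumptions.
def min_code(m):
    """Minimums in -14..0 map to 4-bit codes (here: code = -m); others escape."""
    if -14 <= m <= 0:
        return -m, ""
    sign = "1" if m < 0 else "0"
    return 15, sign + format(abs(m), "08b")       # escape code 15 + raw 9-bit value

def base_code(b):
    """Bases 1..15 map to 4-bit codes 0..14 (code = b - 1); larger bases escape."""
    if 1 <= b <= 15:
        return b - 1, ""
    return 15, "0" + format(b, "08b")

def pack_pair(mc, bc):
    """Combine the two 4-bit codes into one 8-bit symbol for Huffman coding."""
    return (mc << 4) | bc

mc, m_extra = min_code(-3)
bc, b_extra = base_code(9)
symbol = pack_pair(mc, bc)        # 0x38 for (m = -3, b = 9)
```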

System Overview
As shown in Figure 7, we first divide the error image into sub-images of size 4x4. The sub-images are then processed one by one. For each sub-image, we determine whether it belongs to the all-zero category and, if so, it is removed. The remaining blocks are retained for further processing.

Figure 7. System overview.


The two binary files zero and predict are separately run length encoded,
grouped and Huffman encoded. Also, the two four bit files min and base are
combined to form an eight bit file and Huffman encoded.

Decoding a Sub-Image
Following are the decoding steps:
• The files min, base, zero and predict are reconstructed.
• The decoding algorithm first checks whether the block is an
all-zero block or not. If all-zero, then a 4x4 block of zero’s is
generated.
• Otherwise, base value and minimum value are first obtained
by using the files min and base. The modified image AI is
reconstructed as explained in Section 2 and 4x4 error image is
obtained by adding min value to it.
• Type of prediction used is read from the predict file. The prediction
rule is applied to the 4x4 error image and the original 4x4 sub-
image is reconstructed.
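A compact sketch of the per-block decoding path described in the steps above is given below. It mirrors the base-switching encode sketch shown earlier; the function name and the simplified bookkeeping are ours.

```python
# Sketch of decoding one 4x4 sub-image, following the steps listed above.
import numpy as np

def decode_block(is_zero, f=None, m=None, b=None, n=4):
    if is_zero:
        return np.zeros((n, n), dtype=int)        # all-zero block: nothing was stored
    digits = []
    value = f
    for _ in range(n * n):                        # switch the integer back to base-b digits
        digits.append(value % b)
        value //= b
    block = np.array(digits[::-1]).reshape(n, n)
    return block + m                              # add the minimum back: 4x4 error block

# The recovered error block is then mapped back to pixel values by applying the
# selected predictor (Pr1 or Pr2, read from the "predict" file) in scan order.
print(decode_block(True))                         # an all-zero block decodes to zeros
```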

RESULTS
We evaluated the performance of the proposed scheme using the 50 gray-
scale images of the MIAS dataset that include three varieties of images:
Normal, Benign and Malignant. The MIAS is a European Society that
researches mammograms and supplies over 11 countries with real world
clinical data. Films taken from the UK National Breast Screening Program have been digitized to a 50-micron pixel edge, with each pixel represented by an 8-bit word. MATLAB is the tool used for simulation. All the simulations were conducted on a 1.7 GHz processor with the same set of 50 mammography images. Each mammogram has a resolution of 1024x1024
and uses 8 bits/pixel. Results of the proposed method are compared with
that of the popular methods. Comparison of Compression results for the 50
MIAS images is shown in Table 4. This set includes all the three varieties of
images namely normal, benign and malignant.

Table 4. Comparison of compression results for the 50 MIAS images.

Figure 8 and Figure 9 show the two images mdb040.pgm and mdb025.pgm, which give the best and the worst compression ratios, respectively.

Figure 8. Image mdb040. CR=14.16.



Figure 9. Image mdb025. CR=4.55.

CONCLUSIONS
Several techniques have been used for the lossless compression of
mammography images; none of them have used the smoothness property
of the images. Our study has shown that there is very large number of zero
blocks present in mammography images. We have picked up the concept of the Base switching transformation and successfully optimized and applied it, in conjunction with other existing compression methods, to digitized high-resolution mammography images. A comparison with other approaches is given for a set of 50 high-resolution digital mammograms comprising normal, benign and malignant images. Compared with the PNG method, one of the international standards, JBIG performs better by 36%. The transform-based JPEG2000, another international compression standard, when used in lossless mode, performs slightly better than JBIG, by 7%. The latest standard for lossless image compression, the prediction-based JPEG-LS, performs best among the four international standards of lossless coding techniques. It gives a compression ratio of 6.39 which is
1.5% better than the JPEG 2000. Finally, for these images, the proposed
method performs better than PNG, JBIG, JPEG2000 and JPEG-LS by 50%,
9.5%, 2.4% and approximately 1% respectively. The success of our method
is primarily due to its zero block removal procedure, compression of the
overheads and the switched prediction used. It should also be noted that the speed of the BST method is very much comparable with the speed of other standard methods, as verified by [18]. Further work to improve the performance of our method is under way, by developing more suitable prediction methods. Motivated by the results obtained here, our next study will carry out the compression of a larger database of natural images and of medical images obtained by other modalities.

REFERENCES
1. Saxena, S., Rekhi, B., Bansal, A., Bagya, A., Chintamani, and Murthy,
N. S., (2005) Clinico-morphological patterns of breast cancer including
family history in a New Delhi hospital, India: a cross sectional study, World Journal of Surgical Oncology, 1–8.
2. Cahoon, T. C., Sulton, M. A, and Bezdek, J. C. (2000), Breast
cancer detection using image processing techniques, The 9th IEEE
International Conference on Fuzzy Systems, 2, 973–976.
3. Penedo, M., Pearlman, W. A., Tahoces, P. G., Souto, M., and Vidal,
J. J., (2003) Region-based wavelet coding methods for digital
mammography, IEEE Transactions on Medical Imaging, 22, 1288–
1296.
4. Wu, X. L., (1997), Efficient lossless compression of Continu-ous-tone
images via context selection and quantization, IEEE Transaction. on
Image Processing, 6, 656– 664.
5. Ratakonda, K. and Ahuja, N. (2002), Lossless image compres-sion with
multi-scale segmentation, IEEE Transactions on Im-age Processing,
11, 1228–1237.
6. Weinberger, M. J., Rissanen, J., and Asps, R., (1996) Applica-tion
of universal context modeling to lossless compression of gray scale
images, IEEE Transaction on Image Processing, 5, 575–586.
7. Grecos, C., Jiang, J., and Edirisinghe, E. A., (2001) Two Low cost
algorithms for improved edge detection in JPEG-LS, IEEE Transactions
on Consumer Electronics, 47, 466–472.
8. Weinberger, M. J., Seroussi, G., and Sapiro, G., (2000) The LOCO-I
lossless image compression algorithm: Principles and standardization
into JPEG-LS, IEEE Transactions on Image processing, 9, 1309–1324.
9. Sung, M. M., Kim, H.-J., Kim, E.-K., Kwak, J.-Y., Kyung, J., and
Yoo, H.-S., (2002) Clinical evaluation of JPEG 2000 compression for digital mammography, IEEE Transactions on Nuclear Science, 49,
827–832.
10. Neekabadi, A., Samavi, S., and Karimi, N., (2007) Lossless compression
of mammographic images by chronological sifting of prediction errors, IEEE Pacific Rim Conference on Communications, Computers
& Signal Processing, 58–61.

11. Li, X., Krishnan, S., and Marwan, N. W., (2004) A novel way of lossless compression of digital mammograms using grammar codes, Canadian
Conference on Electrical and Computer Engineering, 4, 2085–2088.
12. da Silva, L. S. and Scharcanski, J., (2005) A lossless compression approach for mammographic digital images based on the Delaunay triangulation, International Conference on Image Processing, 2, 758–
761.
13. Khademi, A. and Krishnan, S., (2005) Comparison of JPEG2000 and other lossless compression schemes for digital mammograms, Proceedings of the IEEE Engineering in Medicine and Biology Conference, 3771–3774.
14. Shen, L. and Rangayyan, R. M., (1997) A segmentation based lossless
image coding method for high-resolution medical image compression, IEEE Transactions on Medical Imaging, 16, 301–307.
15. Ranganathan, N., Romaniuk, S. G., and Namuduri, K. R., (1995)
A lossless image compression algorithm using variable block
segmentation, IEEE Transactions on Image Processing, 4, 1396–1406.
16. Namuduri, K. R., Ranganathan, N., and Rashedi, H., (1996) SVBS: A
high-resolution medical image compression algorithm using slicing with variable block size segmentation, IEEE Proceedings of ICPR,
919–923.
17. Alsaiegh, M. Y. and Krishnan, S. (2001), Fixed block-based lossless compression of digital mammograms, Canadian Conference on Electrical and Computer Engineering, 2, 937–942.
18. Chuang, T.-J. and Lin, J. C., (1998) A new algorithm for lossless still image compression, Pattern Recognition, 31, 1343–1352.
19. Chang, C.-C., Hsieh, C.-P., and Hsiao, J.-Y., (2003) A new approach to
lossless image compression, Proceedings of ICCT’03, 1734–38.
20. Ravikumar, M. S., Koliwad, S., and Dwarakish, G. S., (2008) Lossless
compression of digital mammography using fixed block segmentation
and pixel grouping, Proceedings of IEEE 6th Indian Conference on Computer Vision, Graphics and Image Processing, 201–206.
21. Sayood, K., (2003) Lossless compression handbook, First edition, Academic Press, USA, 207–223.
Chapter 13

LOSSLESS IMAGE COMPRESSION BASED ON MULTIPLE-TABLES ARITHMETIC CODING

Rung-Ching Chen,1 Pei-Yan Pai,2 Yung-Kuan Chan,3 and Chin-Chen Chang2,4

1 Department of Information Management, Chaoyang University of Technology, No. 168, Jifong E. Rd., Wufong Township, Taichung County 41349, Taiwan
2 Department of Computer Science, National Tsing-Hua University, No. 101, Section 2, Kuang-Fu Road, Hsinchu 300, Taiwan
3 Department of Management Information Systems, National Chung Hsing University, 250 Kuo-kuang Road, Taichung 402, Taiwan
4 Department of Information Engineering and Computer Science, Feng Chia University, No. 100 Wenhwa Rd., Seatwen, Taichung 407, Taiwan

Citation: Rung-Ching Chen, Pei-Yan Pai, Yung-Kuan Chan, Chin-Chen Chang, “Lossless Image Compression Based on Multiple-Tables Arithmetic Coding”, Mathematical Problems in Engineering, vol. 2009, Article ID 128317, 13 pages, 2009. https://doi.org/10.1155/2009/128317.
Copyright: © 2020 by Authors. This is an open access article distributed under the
Creative Commons Attribution License, which permits unrestricted use, distribution,
and reproduction in any medium, provided the original work is properly cited.

ABSTRACT
This paper presents a lossless image compression method based on a multiple-tables arithmetic coding (MTAC) method to encode a gray-level image f. First, the MTAC method employs a median edge detector (MED)
to reduce the entropy rate of f. The gray levels of two adjacent pixels in
an image are usually similar. A base-switching transformation approach is
then used to reduce the spatial redundancy of the image. The gray levels of
some pixels in an image are more common than those of others. Finally, the
arithmetic encoding method is applied to reduce the coding redundancy of
the image. To promote high performance of the arithmetic encoding method,
the MTAC method first classifies the data and then encodes each cluster
of data using a distinct code table. The experimental results show that, in
most cases, the MTAC method provides a higher efficiency in use of storage
space than the lossless JPEG2000 does.

INTRODUCTION
With the rapid development of image processing and Internet technologies,
a great number of digital images are being created every moment. Therefore,
it is necessary to develop an effective image-compression method to reduce
the storage space required to hold image data and to speed the image
transmission over the Internet [1–16].
Image compression reduces the amount of data required to describe a
digital image by removing the redundant data in the image. Lossless image
compression deals with reducing coding redundancy and spatial redundancy.
Coding redundancy consists in using variable-length codewords selected to
match the statistics of the original source. The gray levels of some pixels
in an image are more common than those of others; that is, different gray levels occur with different probabilities, so coding redundancy reduction uses shorter codewords for the more common gray levels and longer
codewords for the less common gray levels. We call this process variable-
length coding. This type of coding is always reversible and is usually
implemented using look-up tables. Examples of image coding schemes that
explore coding redundancy are the Huffman coding [4, 5, 7] and arithmetic
coding techniques [8, 9].
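As a small illustration of variable-length coding (not of the MTAC coder itself, which uses arithmetic coding with multiple tables), the sketch below builds a Huffman table for a toy set of gray levels; the frequencies are made up, and the point is simply that the most common gray level receives the shortest codeword.

```python
# Minimal illustration of variable-length coding: common gray levels receive
# shorter codewords. This is a generic Huffman construction, not the MTAC coder.
import heapq
from collections import Counter

def huffman_code(symbols):
    freq = Counter(symbols)
    heap = [[count, i, {sym: ""}] for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)           # two least frequent subtrees
        c2, _, t2 = heapq.heappop(heap)
        for s in t1: t1[s] = "0" + t1[s]          # prepend a bit on each side
        for s in t2: t2[s] = "1" + t2[s]
        heapq.heappush(heap, [c1 + c2, next_id, {**t1, **t2}])
        next_id += 1
    return heap[0][2]

pixels = [128] * 70 + [130] * 20 + [250] * 9 + [7] * 1   # made-up frequencies
table = huffman_code(pixels)
# The most common gray level (128) ends up with the shortest codeword.
print(sorted(table.items(), key=lambda kv: len(kv[1])))
```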
There exists a significant correlation among the neighboring pixels in an image, which may result in spatial redundancy in the data. Spatial redundancy
reduction exploits the fact that the gray levels of the pixels in an image
region are usually the same or almost the same. Methods, such as the LZ77,
LZ78, and LZW methods, exploit the spatial redundancy in several ways,
one of which is to predict the gray level of a pixel through the gray levels of
its neighboring pixels [14].
To encode an image effectively, a statistical-model-based compression method needs to precisely predict the occurrence probabilities of the data
patterns in the image. This paper proposes a lossless image compression
method based on multiple-tables arithmetic coding (MTAC) method to
encode a gray-level image.
A statistical-model-based compression method generally creates a code
table to hold the probabilities of occurrence of all data patterns. The type of
data pattern significantly affects the encoding efficiency when minimizing
storage space. When the data come from different sources, it is difficult to
find an appropriate code table to describe all the data. Therefore, this MTAC
method categorizes the data and adopts distinct code tables that record the
frequencies with which the data patterns occur in different clusters.

THE MTAC METHOD


The proposed MTAC method contains three approaches: median edge
detector (MED) processing, base-switching transformation, and statistical-
model-based compressing. This section introduces these three approaches.

MED Processing Approach


Shannon’s entropy equation can estimate the average minimum number of
bits needed to encode a data pattern based on the frequency with which the data
pattern occurs in a data set [11, 12]. Let l be the total number of different
data patterns in a data set and pi the probability of the ith data pattern’s
occurring in the data set. The entropy rate E of the data set is defined as

E = −∑_{i=1}^{l} p_i log2 p_i. (2.1)
It is impossible to encode the data set, in a lossless manner, with a bit
rate lower than E. The bit rate is defined as the ratio of the
number of bits holding the compression data to the number of pixels in the
compressed image. The higher the entropy rate, the less one can compress it
using a statistical-model-based compression method.
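As an illustration of formula (2.1), the following minimal Python sketch (the function name and the synthetic test image are ours, not part of the MTAC method) estimates the probabilities from an 8-bit image histogram and evaluates the entropy rate:

import numpy as np

def entropy_rate(image: np.ndarray) -> float:
    """Entropy rate E of a gray-level image, formula (2.1):
    E = -sum_i p_i * log2(p_i), with p_i estimated from the histogram."""
    counts = np.bincount(image.ravel(), minlength=256)
    p = counts / counts.sum()
    p = p[p > 0]                      # gray levels that never occur contribute nothing
    return float(-(p * np.log2(p)).sum())

# Example: a synthetic 512 x 512 image whose gray levels are mostly similar
rng = np.random.default_rng(0)
img = np.clip(rng.normal(128, 10, (512, 512)).round(), 0, 255).astype(np.uint8)
print(entropy_rate(img))              # a narrow histogram gives a low entropy rate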
Let f be the encoded image consisting of H × W pixels, where H and W


are the height and the width of f, respectively. MED [10] estimates the gray
level of a pixel by detecting whether there is an edge passing through the
pixel. MED scans each pixel in f, starting from the left-top pixel of f, in the
order shown in Figure 1. While scanning a pixel P(i, j), MED estimates the
gray level g(i, j) of P(i, j) via the gray-levels g(i, j − 1), g(i − 1, j − 1), and
g(i − 1, j) of the pixels P(i, j − 1), P(i − 1, j − 1), and P(i − 1, j), where P(i,
j) is the pixel located at the coordinates (i, j) in f. Figure 2 shows the spatial
relationships of P(i, j), P(i, j − 1), P(i − 1, j − 1), and P(i − 1, j).

Figure 1. The scanning order of the MED in an image represented by 8 × 8 pixels.

Figure 2. Part of the pixels in an image.

For i = 1 or j = 1, the estimated gray level of P(i, j) is defined as

ĝ(i, j) = g(i, j − 1) for i = 1 and j > 1,  ĝ(i, j) = g(i − 1, j) for j = 1 and i > 1. (2.2)
In addition, for i = 1 to H, and j = 1 to W:
ĝ(i, j) = Min(g(i, j − 1), g(i − 1, j)) if g(i − 1, j − 1) ≥ Max(g(i, j − 1), g(i − 1, j)),
ĝ(i, j) = Max(g(i, j − 1), g(i − 1, j)) if g(i − 1, j − 1) ≤ Min(g(i, j − 1), g(i − 1, j)),
ĝ(i, j) = g(i, j − 1) + g(i − 1, j) − g(i − 1, j − 1) otherwise. (2.3)
Here, Max(g(i, j − 1), g(i − 1, j)) and Min(g(i, j − 1), g(i − 1, j)) are the
maximum and the minimum between g(i, j − 1) and g(i − 1, j), respectively.
If the estimate equals g(i, j − 1), MED considers that a horizontal edge passes through
P(i, j) or some pixels above P(i, j); when the estimate equals g(i − 1, j), MED
perceives that a vertical edge passes through P(i, j) or some pixel on the
left of P(i, j).

Let e(i, j) = g(i, j) − ĝ(i, j) be the difference between the gray level g(i, j)
and its estimate ĝ(i, j); we call e(i, j) the estimated error of g(i, j). Similarly, in the
decompressing phase, based on g(i, j − 1), g(i − 1, j − 1), and g(i − 1, j),
the MTAC method can compute ĝ(i, j) through formulas (2.2) or (2.3); then
it can get g(i, j) = ĝ(i, j) + e(i, j). To recover f without loss, the MTAC
method needs to save g(0, 0) and the estimated errors of all other pixels in f.
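The prediction and error-image steps can be sketched in Python as follows. The functions are our own illustration, not the authors' code; the border handling follows formula (2.2) as given above, and e(i, j) = g(i, j) − ĝ(i, j):

import numpy as np

def med_predict(g_left, g_up, g_diag):
    """Median edge detector estimate from the left, upper and diagonal neighbors."""
    lo, hi = min(g_left, g_up), max(g_left, g_up)
    if g_diag >= hi:
        return lo                       # an edge is detected: take the smaller neighbor
    if g_diag <= lo:
        return hi                       # an edge is detected: take the larger neighbor
    return g_left + g_up - g_diag       # smooth region: planar prediction

def error_image(f: np.ndarray) -> np.ndarray:
    """Estimated errors e(i, j) = g(i, j) - ghat(i, j); the first pixel is kept raw."""
    H, W = f.shape
    g = f.astype(np.int16)
    e = np.zeros((H, W), dtype=np.int16)
    for i in range(H):
        for j in range(W):
            if i == 0 and j == 0:
                e[i, j] = g[i, j]                      # stored as-is
            elif i == 0:
                e[i, j] = g[i, j] - g[i, j - 1]        # first row: left neighbor
            elif j == 0:
                e[i, j] = g[i, j] - g[i - 1, j]        # first column: upper neighbor
            else:
                e[i, j] = g[i, j] - med_predict(g[i, j - 1], g[i - 1, j], g[i - 1, j - 1])
    return e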
The estimated error e(i, j) is within the interval between −255 and 255.
Each e(i, j) can be represented by an 8-bit memory space that describes the
absolute value | e(i, j) | of e(i, j) and one bit b that records the sign of e(i, j).
All the | e(i, j) |s compose a gray-level image fe, and all the sign bits bs make
up a binary image fs. We call fe the error image and fs the sign bit image of f.
Figure 3 shows two 512 × 512 gray-level images Airplane and Baboon,
and their gray-level histograms. Let l = 256, and

p_i = n_i / (H × W), i = 1, 2, …, l, (2.4)
where n_i denotes the number of pixels in the image whose gray level is the i-th gray level.
Figure 3. Two gray-level images, Airplane and Baboon, and their gray-level histograms.
According to formula (2.1), the entropy rates of Airplane and Baboon,
in theory, reach 6.5 and 7.2 bits/pixel, respectively. From Shannon’s limit
[11, 12], with such an entropy rate, the minimum number of bits required to
describe a pixel in Airplane (resp., Baboon) is 6.5 (resp., 7.2) bits/pixel. Since
these bit rates are too high to allow effective compression directly, the MTAC method
utilizes MED [10] to decrease the entropy rate of f before encoding f. Figure
4 demonstrates the error images of Airplane and Baboon shown in Figure 3,
and the gray-level histograms of the error images. The gray levels of most
pixels in the error images are close to 0; the entropy rates of the error images
of Airplane and Baboon are 3.6 and 5.2 bits/pixel, respectively, which are far
lower than the entropy rates of the original Airplane and Baboon.
Figure 4. The error images of Airplane and Baboon and their gray-level histograms.
Figure 5 shows the sign bit images of Airplane and Baboon. It is clear that
both sign bit images are messy, so it is difficult to find a method with
which to encode them effectively. To deal with this problem, the MTAC
method transforms the error image and sign bit image into a difference image
and an MSB (most significant bit) image, respectively. The MTAC method
pulls out the MSB of all the | e(i, j) |s to create an H × W binary image fMSB,
where the MSB of | e(i, j) | is given to the pixel located at the coordinates
(i, j) of fMSB. We call the binary image the MSB image fMSB of f. Meanwhile,
the MTAC method concatenates the sign bit b of e(i, j) and the remaining
| e(i, j) |, whose MSB has been drawn out, by appending b to the rightmost
bit of the remaining | e(i, j) | in order to generate another gray-level image.
We name the gray-level image the difference image of f. Figure 6 illustrates
these actions.
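A possible bit-level realization of this transformation is sketched below in Python; the array names and the sign convention (1 for a negative error) are our own illustrative choices:

import numpy as np

def split_error(e: np.ndarray):
    """Split estimated errors into the MSB image and the difference image.
    |e| fits in 8 bits; its MSB goes to f_msb, and the remaining 7 bits are
    concatenated with the sign bit, which is appended as the rightmost bit."""
    abs_e = np.abs(e).astype(np.uint8)
    sign = (e < 0).astype(np.uint8)          # 1 for negative errors (assumed convention)
    f_msb = (abs_e >> 7) & 1                 # most significant bit of |e|
    remaining = abs_e & 0x7F                 # |e| with its MSB drawn out
    f_diff = (remaining << 1) | sign         # append the sign bit on the right
    return f_msb, f_diff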

Figure 5. The sign bit images of the images Airplane and Baboon.
Figure 6. The pixel values of e(i, j).


Figure 7 shows the MSB images of Airplane and Baboon. Almost all
the pixels on the MSB images are 0. Figure 8 displays the difference images
of Airplane and Baboon and their gray-level histograms. Clearly, the gray
levels of most pixels in the difference images are equal to 0. The entropy
rates of the difference images of Airplane and Baboon are 4.4 and 6.2 bits/
pixel, respectively. These entropy rates are higher than the entropy rates of
their error images but are much lower than those of the original Airplane
and Baboon.

Figure 7. The MSB images of Airplane and Baboon.


Figure 8. The difference images and gray-level histograms of Airplane and Baboon.

Base-Switching Transformation Approach


The gray level of a pixel in a gray-level image is generally represented by
an 8-bit memory space. However, it is uneconomical if the gray levels of the
pixels in a gray-level image are similar. Hence, the MTAC method adopts
the base-switching transformation (BST) algorithm [1, 2] to compress a
difference image.
The BST algorithm partitions a difference image into small
nonoverlapping image blocks, each consisting of m × n pixels. Let gmin and
gmax be the minimal and maximal gray levels of the pixels in an image block
B. The difference between gmin and the gray level g of each pixel in B can
be depicted by S bits. The MTAC method uses a 3-bit memory space to describe S, where
S = ⌈log2(gmax − gmin + 1)⌉. (2.5)
For each image block, the BST algorithm needs to hold only gmin, S, and
the gray-level differences between gmin and the gray levels of all the pixels
in B. We call the difference between the gray level of a pixel P and gmin the
gray-level difference of P. Figure 9 shows a 4 × 4 image block B. Storing B
directly requires 16 × 8 = 128 bits of memory space. In the BST
algorithm, gmax, gmin, and S of B are 137, 122, and 4, respectively. The BST
algorithm uses 8 bits, 3 bits, and 4 × 16 bits to hold gmin, S, and the gray-level
differences of all the pixels in B; hence, the BST algorithm requires only a
total of 75 bits to store B.
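Under these conventions (the ceiling-log definition of S given in (2.5) and the 3-bit field for S are the assumptions used here; the function name is illustrative), the per-block bookkeeping can be sketched as:

import numpy as np

def bst_block(block: np.ndarray):
    """Return (gmin, S, differences, bit cost) for one image block."""
    gmin, gmax = int(block.min()), int(block.max())
    span = gmax - gmin
    S = max(1, span.bit_length())        # bits needed for differences in [0, span]
    diffs = (block - gmin).astype(np.uint8)
    bits = 8 + 3 + S * block.size        # gmin + 3-bit field for S + S bits per pixel
    return gmin, S, diffs, bits

# The 4 x 4 block of Figure 9 spans gray levels 122..137, so S = 4 and the
# block costs 8 + 3 + 4 * 16 = 75 bits instead of the raw 128 bits.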

Figure 9. An image block of 4 × 4 pixels.

Statistical-Model-Based Compressing Approach


After the MED processing approach, image f is transformed into an MSB
image and a difference image. In the base-switching transformation approach,
the difference image is segmented into nonoverlapping small image blocks.
The MTAC method then writes down gmin, S, and the gray-level differences
of all the pixels in each image block. However, a few pixels may have big
gray-level differences in an image block, so each gray-level difference in
this image block requires a large number of bits to hold it. For example, the
maximal gray-level difference of the image block in Figure 9 is 15; therefore,
each gray-level difference can be expressed by at least 4 bits. To remedy this
problem, the MTAC method then applies the arithmetic coding algorithm to further
compress the data obtained in the MED processing and BST approaches.
The arithmetic coding algorithm [8, 15] is one of the statistical-model-
based compressing methods that decide the bit length of a code according
to the occurrence frequencies of data patterns. These methods give shorter
codes to the data patterns that occur more frequently and longer codes to
those that occur less often. Hence, the type of data pattern significantly
affects the encoding’s efficiency in minimizing storage space. The MTAC


method will adopt the arithmetic coding algorithm to compress the MSB
image, all the gmins, and all Ss. Since the MSB image, all the gmins, and all Ss
have different statistics, the MTAC method will require distinct code tables
to record the data patterns of the MSB image, all the gmins, and all Ss. Each
data pattern of the MSB image, of the gmins, and of the Ss is described by
8 bits, 8 bits, and 3 bits in length, respectively.
Next, the MTAC method concatenates the gray-level differences of all
the image blocks into binary strings GD_S, where the bit length of each
gray-level difference in the contributing image blocks is S. For example, the gray-level
differences in all the image blocks with S = 4 are concatenated into GD4.
The MTAC method then uses an arithmetic coding algorithm to encode each
GDS, where the bit length of each data pattern in encoding GDS is S. Finally,
the MTAC method needs to hold only the height H and the width W of f,
all the code tables, and all the compression data generated by the arithmetic
coding algorithm.
After the statistical-model-based compressing approach has been
employed, the MTAC method concatenates W, H, String_CODE-TABLE, String_MSB,
String_gmin, String_S, and String_GD into the compression data. Here, String_CODE-TABLE
represents all the code tables; String_MSB, String_gmin, String_S, and
String_GD are the compression data of the MSB image, all the gmins, the Ss, and the GD_Ss,
respectively.
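The sketch below does not implement an arithmetic coder; it only estimates, for each cluster, the ideal code length that a table-driven arithmetic coder approaches (the sum of −log2 of the pattern probabilities), which illustrates why separate code tables for data with different statistics pay off. The cluster contents are made-up examples:

from collections import Counter
from math import log2

def ideal_coded_bits(symbols) -> float:
    """Lower bound (in bits) for coding a cluster with its own frequency table:
    the sum over all occurrences of -log2(p(symbol))."""
    freq = Counter(symbols)
    total = sum(freq.values())
    return sum(-count * log2(count / total) for count in freq.values())

# Each cluster (MSB image bytes, gmin values, S values, each GD_S string)
# gets its own table, so the totals are simply added up.
clusters = {"msb": [0, 0, 0, 1], "gmin": [122, 122, 130], "S": [4, 4, 3]}
total_bits = sum(ideal_coded_bits(v) for v in clusters.values())
print(round(total_bits, 1))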

Image Decompression
In the decompression phase, the MTAC method first draws W, H, and
String_CODE-TABLE from the compression data. The bit length of each data pattern
in the MSB image is 8. Hence, the MTAC method can reconstruct the MSB
image based on String_MSB by using the arithmetic decoding method. Since
f consists of H × W/9 image blocks, the MTAC method will decompress
the (H × W/9) gmins from String_gmin using the arithmetic decoding method,
where the bit length of a data pattern is 8. Similarly, it can decode the (H × W/9)
Ss from String_S, where each data pattern is described by 3 bits. The number of
data patterns in each GD_S can easily be computed from the Ss; hence, each
GD_S can be decoded as well.
EXPERIMENTS
The purpose of this section is to investigate the performance of the MTAC
method by experiments. In these experiments, ten 256 × 256 gray-level
images Airplane, Lena, Baboon, Gold, Sailboat, Boat, Toy, Barb, Pepper,
and Girl, shown in Figure 10, are used as test images. The first experiment
explores the effect of the MED processing approach on reducing the entropy
rate of the compressed image. Table 1 lists the entropy rates of the ten original
test images and the entropy rates of their error images. The experimental
results show that most of the entropy rates of the error images are close to
half those of the original test images.

Table 1. Entropies of ten original images and their error images.

Image Entropy rate of original image Entropy rate of error image


Airplane 6.529 3.567
Baboon 7.224 5.091
Lena 7.432 3.681
Toy 6.748 3.123
Gold 7.452 3.853
Sailboat 7.248 4.126
Boat 6.975 3.595
Barb 7.647 4.557
Pepper 7.570 3.535
Girl 7.260 3.935
Figure 10. The testing images.


In experiment 2, the arithmetic coding method is used to encode the
sign bit images of the ten test images, where the bit length of each data
pattern is 8 bits. Table 2 shows the sizes of the original sign bit images and
of their compression data obtained by the arithmetic coding method. The experimental results illustrate
the difficulty of obtaining a good performance in compressing the sign bit


images. Since the sign bit images are very messy, the arithmetic coding
method cannot encode them effectively; for most of the sign bit images, the sizes of
their compression data are even larger than the sizes of the original images; that
is, there is hardly any coding or spatial redundancy in the sign bit images.
Table 3 lists the entropy rates of the difference images of the ten test images
where m × n is set to 3 × 3. Although the entropy rates of the difference
images are higher than those of the error images, they are much lower than
those of the original images.

Table 2. The size of the sign bit images and their compression data.

Image Size of original image (bytes) Size of compression data (bytes)


Airplane 8192 8069
Baboon 8192 8256
Lena 8192 8187
Toy 8192 8135
Gold 8192 8169
Sailboat 8192 8310
Boat 8192 8152
Barb 8192 8222
Pepper 8192 8201
Girl 8192 8296

Table 3. Entropy rates of the difference images.

Image Entropy rate


Airplane 4.368
Baboon 6.048
Lena 4.527
Toy 3.900
Gold 4.742
Sailboat 4.997
Boat 4.396
Barb 5.442
Pepper 4.390
Girl 4.801

The last experiment compares the performance of the MTAC method


with that of the lossless JPEG2000 [3, 6, 13]. Table 4 displays the bit rates
obtained by the MTAC method and the lossless JPEG2000 in encoding


the test images. Table 4 reveals that the MTAC method is more efficient in
storage space use than the lossless JPEG2000, except for the image Barb. Figure
11 shows the huge gray-level variations among adjacent pixels in
the partially magnified difference image of Barb. The MTAC method obtains
better performance in terms of storage space use than that of the lossless
JPEG2000 in encoding an image with small gray-level variations among
adjacent pixels but performs worse in compressing an image with great
gray-level variations among adjacent pixels. Hence, the MTAC method
performs worse in terms of storage space use than the lossless JPEG2000
does when encoding Barb.

Table 4. Bit rates (bits/pixel) obtained by the MTAC and lossless JPEG 2000.

Image MTAC Lossless JPEG2000
Airplane 4.14 4.35
Lena 4.20 4.25
Baboon 6.10 6.11
Gold 4.71 4.90
Sailboat 4.88 5.10
Boat 4.24 4.44
Toy 3.90 4.16
Barb 5.28 5.14
Pepper 4.34 4.43
Girl 4.63 4.73

Figure 11. The difference image of Barb and its partial image.
CONCLUSIONS
This paper proposes the MTAC method to encode a gray-level image f. The
MTAC method contains the MED processing, BST, and statistical-model-
based compressing approaches. The MED processing approach reduces the
entropy rate of f. The BST approach decreases the spatial redundancy of
the difference image of f based on the similarity among adjacent pixels.
The statistical-model-based compressing approach further compresses the
data generated in the MED processing and BST approaches, based on their
coding redundancy. The data patterns of the data produced by the MED
processing approach and the BST approach have different bit lengths and
distinct occurrence frequencies. Hence, the MTAC method first classifies
the data into clusters before compressing the data in each cluster using the
arithmetic coding algorithm via separated code tables.
The experimental results reveal that the MTAC method usually gives a
better bit rate than the lossless JPEG2000 does, particularly for the images
with small gray-level variations among adjacent pixels. However, when the
gray-level variations among adjacent pixels in an image are very large, the
MTAC method performs worse in terms of bit rate.
REFERENCES
1. T. J. Chuang and J. C. Lin, “On the lossless compression of still
image,” in Proceedings of the International Computer Symposium-On
Image Processing and Character Recognition (ICS ‘96), pp. 121–128,
Kaohsiung, Taiwan, 1996.
2. T. J. Chuang and J. C. Lin, “New approach to image encryption,”
Journal of Electronic Imaging, vol. 7, no. 2, pp. 350–356, 1998.
3. S. C. Diego, G. Raphaël, and E. Touradj, “JPEG 2000 performance
evaluation and assessment,” Signal Processing: Image Communication,
vol. 17, no. 1, pp. 113–130, 2002.
4. Y. C. Hu and C.-C. Chang, “A new lossless compression scheme based
on huffman coding scheme for image compression,” Signal Processing:
Image Communication, vol. 16, no. 4, pp. 367–372, 2000.
5. D. A. Huffman, “A method for the construction of minimum redundancy
codes,” in Proceedings of the IRE, vol. 40, pp. 1098–1101, 1952.
6. ISO/IEC FCD 15444-1, Information Technology-JPEG 2000 Image
Coding System, 2000.
7. D. E. Knuth, “Dynamic Huffman coding,” Journal of Algorithms, vol.
6, no. 2, pp. 163–180, 1985.
8. G. G. Langdon Jr., “An introduction to arithmetic coding,” IBM Journal
of Research and Development, vol. 28, no. 2, pp. 135–149, 1984.
9. J. Rissanen and G. G. Langdon Jr., “Arithmetic coding,” IBM Journal
of Research and Development, vol. 23, no. 2, pp. 149–162, 1979.
10. S. A. Martucci, “Reversible compression of HDTV images using
median adaptive prediction and arithmetic coding,” in Proceedings of
the IEEE International Symposium on Circuits and Systems, vol. 2, pp.
1310–1313, New York, NY, USA, 1990.
11. C. E. Shannon, “A mathematical theory of communication,” Bell
System Technical Journal, vol. 27, pp. 379–423, 1948.
12. C. E. Shannon, “Prediction and entropy of printed English,” Bell
System Technical Journal, vol. 30, pp. 50–64, 1951.
13. A. N. Skodras, C. A. Christopoulos, and T. Ebrahimi, “JPEG 2000:
the upcoming still image compression standard,” Pattern Recognition
Letters, pp. 1337–1345, 2001.
14. U. Topaloglu and C. Bayrak, “Polymorphic compression,” in


Algorithms and Database Systems, P. Yolum et al., Ed., vol. 3733 of
Lecture Notes in Computer Science, pp. 759–767, Springer, New York,
NY, USA, 2005.
15. T. Welch, “A technique for high-performance data compression,” IEEE
Computer, vol. 17, no. 6, pp. 8–19, 1984.
16. M. U. Celik, G. Sharma, and A. M. Tekalp, “Gray-level-
embedded lossless image compression,” Signal Processing: Image
Communication, vol. 18, no. 6, pp. 443–454, 2003.
SECTION 4:
INFORMATION AND SHANNON ENTROPY
Chapter 14

ENTROPY—A UNIVERSAL CONCEPT IN SCIENCES

Vladimír Majerník
Mathematical Institute, Slovak Academy of Sciences, Bratislava, Slovakia

ABSTRACT
Entropy represents a universal concept in science suitable for quantifying
the uncertainty of a series of random events. We define and describe
this notion in an appropriate manner for physicists. We start with a brief
recapitulation of the basic concept of the theory probability being useful
for the determination of the concept of entropy. The history of how this
concept came into its to-day exact form is sketched. We show that the
Shannon entropy represents the most adequate measure of the probabilistic
uncertainty of a random object. Though the notion of entropy has been
introduced in classical thermodynamics as a thermodynamic state variable

Citation: Majerník, V. (2014), “Entropy - A Universal Concept in Sciences”. Natural


Science, 6, 552-564. doi: 10.4236/ns.2014.67055.
Copyright: © 2014 by authors and Scientific Research Publishing Inc. This work is
licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0
it relies on concepts studied in the theory of probability and mathematical


statistics. We point out that the whole formalism of statistical mechanics can be
rewritten in terms of Shannon entropy. The notion “entropy” is differently
understood in various science disciplines: in classical physics it represents
the thermodynamical state variable; in communication theory it represents
the efficiency of transmission of communication; in the theory of general
systems the magnitude of the configurational order; in ecology the measure
for bio-diversity; in statistics the degree of disorder, etc. All these notions
can be mapped on the general mathematical concept of entropy. By means
of entropy, the configurational order of complex systems can be exactly
quantified. Besides the Shannon entropy, there exists a class of Shannon-
like entropies which converge, under certain circumstances, toward
Shannon entropy. The Shannon-like entropy is sometimes easier to handle
mathematically than Shannon entropy. One of the important Shannon-like
entropies is the well-known Tsallis entropy. The application of the Shannon and
Shannon-like entropies in science is really versatile. Besides the mentioned
statistical physics, they play a fundamental role in the quantum information,
communication theory, in the description of disorder, etc.

Keywords: Probability, Uncertainty, Shannon Entropy, Shannon-Like En-


tropies

INTRODUCTION
At the most fundamental level, all our further considerations rely on the
concept of probability. Although there is a well-defined mathematical
theory of probability, there is no universal agreement about the meaning of
probability. Thus, for example, there is the view that probability is an objective
property of a system and another view that it describes a subjective state of
belief of a person. Then there is the frequentist view that the probability of an
event is the relative frequency of its occurrence in a long or infinite sequence
of trials. This latter interpretation is often employed in the mathematical
statistics and statistical physics. The probability means in everyday life the
degree of ignorance about the outcome of a random trial. This is why the
probability is commonly interpreted as degree of the subjective expectation
of an outcome of a random trial. Both subjective and statistical probability
are “normed”. It means that the degree of expectation that an outcome of a
random trial occurs, and the degree of the “complementary” expectation,
that it does not, is always equal to one [1] 1.
Although the concept of probability is here covered in a sophisticated


mathematical language, it expresses only the commonly familiar properties
of probability used in everyday life. For example, each number of spots at
the throw of a simple die represents an elementary random event to which
a positive real number is associated called its probability (relation (i)). The
probability of two (or more) numbers of spots at the throw of a simple die is
equal to the sum of their probabilities (relation (iii)). The sum of probabilities
of all possible numbers of spots is normed to one (relation (iv)).
The word “entropy”2 was first used in 1865 by Clausius in his book
Abhandlungen über Wärmetheorie to describe a quantity accompanying
a change from the thermal to mechanical energy and it continued to have
this meaning in thermodynamics. Boltzmann [2] in his Vorlesungen über
Gastheorie presented the statistical interpretation of the thermodynamical
entropy. He linked the thermodynamic entropy with the molecular
disorder. The general concept of entropy as a measure of uncertainty was
first introduced by Shannon and Wiener. Shannon is also credited for the
development of a quantitative measure of the amount of information [3] .
Shannon entropy may be considered as a generalization of the entropy defined
by Hartley for the case when the probabilities of all events are equal. Nyquist [4] was the
first author who introduced a measure of information. His paper has largely
remained unnoticed. After publication of Shannon seminal paper in 1948 [3]
, the use of entropy as measure of uncertainty grew rapidly and was applied
with various successes in most area of human endeavor.
Mathematicians were attracted to the possibility of providing axiomatic
structure of entropy and to the ramification thereof. The axiomatic approach to
the concept of entropy attempts to find a system of postulates which provides a
unique mathematical characteristic of entropy3 and which adequately reflects
the properties asked from the probabilistic uncertainty measure in a diversified
real situation. This has been very interesting and thought-provoking area for
scientists. Khinchin [5] was the first who gave a clear and rigorous presentation
of the mathematical foundation of entropy. A good number of works have
been done to describe the properties of entropy. An extensive list of works in
this field can be found in the book of Aczcél and Daróczy [6] .
The fundamental concept for the description of random processes is the
notion of the random trial. A random trial is characterized by a set of its
outcomes (values) and the corresponding probability distribution. A typical
random trial is the throw of a single die characterized by the following
scheme (S1, S2, …, S6 are the positions of the die after its throw)
S S1 S2 S3 S4 S5 S6
P 1/6 1/6 1/6 1/6 1/6 1/6
x 1 2 3 4 5 6

To any random trial a random variable is assigned, which represents
a mathematical quantity assuming a set of values with the corresponding
probabilities (see, e.g. [1] ).
There are two measures which express the uncertainty of a random trial:
(i) The moment measures, which contain in their definition both the values
assigned to the trial outcomes and the set of the corresponding
probabilities. The moment uncertainty measures are given, as a
rule, by the higher statistical moments of a random variable.
As is well known, the k-th moment about zero (uncorrelated
moment) and the central moment of the k-th order assigned
to a discrete random variable with the probability distribution
P(x_1), P(x_2), …, P(x_n) are defined as
⟨x^k⟩ = ∑_i x_i^k P(x_i)
and
μ_k = ∑_i (x_i − ⟨x⟩)^k P(x_i),
respectively.
The statistical moments of a random variable are often used as the
uncertainty measures of the random trial, especially in the experimental
physics, where, e.g., the standard deviation of measured quantities
characterizes the accuracy of a physical measurement. The moment
uncertainty measures of a random variable are also used by formulating the
uncertainty relations in quantum mechanics [7] .
(ii)The probabilistic or entropic measures of uncertainty of a random
trial contain in their expressions only the components of the
probability distribution of a random trial.
To determine the notion of entropy we consider quantities, called
partial uncertainties, which are assigned to the individual probabilities
P_1, P_2, …, P_n. A partial uncertainty is denoted by the symbol H_i. In any
probabilistic uncertainty measure, a partial uncertainty is a function only of
the corresponding probability P_i. The requirements


asked from a partial uncertainty H(Pi) are the following [8] : (see Appendix):
(i) It is a monotonously decreasing continuous and unique function
of the corresponding probability;
(ii) The common value of the uncertainty of a certain outcome of two
statistically independent trials H(Pi, Pj) is additive, i.e.

H(P_i P_j) = H(P_i) + H(P_j), (1)
where Pi and Pj are the probability of the i-th and j-th outcome, respectively;
(iii) .
It was shown that the only function which satisfies these requirements
has the form [8] H(P_i) = −log P_i.

The mean value of the partial uncertainties is

S = ∑_i P_i H(P_i) = −∑_i P_i log P_i. (2)
The quantity (2) is called the information-theoretical or Shannon entropy.
We denote it by symbol S. Shannon entropy is a real and positive number.
It is a function only of the components of the probability distribution
assigned to the set of outcomes of a random trial.
Shannon entropy satisfies the following demands (see Appendix):
(i) If the probability distribution contains only one nonzero component, e.g.
P_1 = 1, and the rest of the components are equal to zero,
then S = 0. In this case, there is no uncertainty in the random
trial because an outcome is realized with certainty.
(ii) The more spread is the probability distribution P, the larger
becomes the entropy S.

(iii) For a uniform probability distribution, P_i = 1/n for all i, the entropy S becomes
maximal. In this case, the probabilities of all outcomes are equal,
therefore the mean uncertainty of such a random trial becomes
maximum.
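These demands are easy to check numerically; a minimal sketch, using natural logarithms and arbitrary example distributions:

import math

def shannon_entropy(p):
    """S = -sum_i P_i log P_i, formula (2); terms with P_i = 0 contribute nothing."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

print(shannon_entropy([1.0, 0.0, 0.0]))        # (i)  a certain outcome: S = 0
print(shannon_entropy([0.7, 0.2, 0.1]))        # (ii) a more spread distribution gives a larger S
print(shannon_entropy([1/3, 1/3, 1/3]))        # (iii) uniform distribution: S = log 3, the maximum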
One uses for the characterization of a random trial a random scheme. If
X is a discrete random variable assigned to a random trial, then its random
scheme has the form
S: S1 S2 … Sn
P: P(x1) P(x2) … P(xn)
X: x1 x2 … xn

Here S1, S2, …, Sn are the outcomes of a random trial (in quantum physics, e.g.,
the quantum states), P(x1), P(x2), …, P(xn) are their probabilities, and
x1, x2, …, xn are the values defined on them (in quantum physics,
e.g., the eigenvalues). A probability distribution P = (P(x1), P(x2), …, P(xn)) is the
complete set of probabilities of all individual outcomes of a random trial.
We note that there is a set of the probabilistic uncertainty measures
defined by means of functions other than −log P_i. They are called
nonstandard or Shannon-like entropies. We shall deal with them in the next
sections.

ENTROPY AS A QUALIFICATOR OF THE


CONFIGURATIONAL ORDER
Since the simple rule holds that the smaller the order in a system the larger its
entropy, the entropy appears to be the appropriate quantity for the expression
of the measure of the configurational order (organization). The orderliness
and entropy of a physical system are related to each other inversely so that
any increase in the degree of configurational order must necessarily result
in the decrease of its entropy. The measure of the configurational order
constructed by using entropies is called Watanabe measure and is defined
as follows [9] :
configurational order of a system = (sum of entropies of the parts of the
system) − (entropy of the whole system).
The Watanabe measure for configurational order is related to the other
measure of configurational organization well-known in theory of information,
called redundancy. Both measures express quantitatively the property of the
configurationally organized systems to have order between its elements,
which causes that the system as a whole behaves in a more deterministic
way than its individual parts. If a system consists only of elements which
are statistically independent, the Watanabe measure for the configurational
organization becomes zero. If the elements of a system are deterministically
dependent, its configurational organization gets the maximum value. A
general system has its configurational organization between these extreme
values. To the prominent systems which can be organized configurationally


belong physical statistical systems (i.e., above all, Ising systems of spins)
[10] . High configurational organization is exhibited especially by systems
which have some spatial, temporal or spatio-temporal structures that have
arisen in a process which takes place far from thermal equilibrium (e.g. laser,
fluid instabilities, etc.) [11] . These systems can be sustained only by a steady
flow of energy and matter, therefore they are called open systems [12] . A
large class of systems, which are generally organized configurationally as
well as functionally, comprises the so-called string systems which represent
sequences of elements forming finite alphabets. To these systems belong,
e.g., language, music, genetic DNA and various bio-polymers. Since many
of such systems are goal-directed and have a functional organization as well,
they are especially appropriate for the study of the interrelation between the
configurational and functional organization [10] .
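For a system consisting of two parts described by a joint probability table, the Watanabe measure can be computed directly from the above definition. The following sketch (with toy example tables and natural logarithms) shows that it vanishes for statistically independent parts and is maximal for deterministically dependent ones:

import numpy as np

def entropy(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def watanabe_measure(joint):
    """(Sum of entropies of the parts) - (entropy of the whole system),
    for a two-part system given by its joint probability table."""
    joint = np.asarray(joint, dtype=float)
    parts = entropy(joint.sum(axis=1)) + entropy(joint.sum(axis=0))
    return parts - entropy(joint)

independent = np.outer([0.5, 0.5], [0.5, 0.5])   # statistically independent parts
dependent = np.array([[0.5, 0.0], [0.0, 0.5]])   # one part determines the other
print(watanabe_measure(independent))             # ~0
print(watanabe_measure(dependent))               # ~0.693 = ln 2, the maximum here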

THE CONCEPT OF ENTROPY IN


THERMODYNAMICS AND STATISTICAL PHYSICS
A remarkable event in the history of physics was the interpretation of the
phenomenological thermodynamics in terms of motion and randomness. In
this interpretation, the temperature is related to motion while the randomness
is linked with the Clausius entropy. The homeomorphous mapping of the
phenomenological thermodynamics on the formalism of mathematical
statistics gave rise to two entropy concepts: the Clausius thermodynamic
entropy as a thermodynamic state variable of a thermodynamic system
and the Boltzmann statistical entropy as the logarithm of probability of
state of a physical ensemble. The fact that the thermodynamic entropy is a
state variable means that it is completely defined when the state (pressure,
volume, temperature, etc.) of a thermodynamic system is defined. This is
derived from mathematics, which shows that only the initial and final states
of a thermodynamic system determine the change of its entropy. The larger
the value of the entropy of a particular state of a thermodynamic system, the
less available is the energy of this system to do work.
The statistical concept of entropy was introduced in physics when
seeking a statistical quantity homeomorphous with the thermodynamic
entropy. As it is well-known, the Clausius entropy of a thermodynamic
system St is linked with ensemble probability W by the celebrated Boltzmann
law, , where W is so-called “thermodynamic” probability
determined by the configurational properties of a statistical system and KB is
the Boltzmann constant4. The Boltzmann law represents the solution to the
functional equation between St and W. Let us consider a set of the isolated
thermodynamic systems 1, 2, …, n. According to Clausius, the total
entropy of this set is an additive function of the entropies of its parts,
i.e., it holds
S_t = S_1 + S_2 + ⋯ + S_n. (3)
On the other side, the joint “thermodynamic” probability of this set of systems
is
W = W_1 W_2 ⋯ W_n. (4)
To obtain the homomorphism between Equations (3) and (4), it is
sufficient that
S_i = K_B ln W_i, i = 1, 2, …, n, (5)
which is just the Boltzmann law [2] .
We give some remarks regarding the relationship between the Clausius,
Boltzmann and Shannon entropies:
(i) The thermodynamic probability W in the Boltzmann law is given
by the number of the possibilities how to distribute N particles in
n cells having different energies,
W = N! / (N_1! N_2! ⋯ N_n!),
where N_i denotes the number of particles in the i-th cell.
We show that the physical entropy given by Boltzmann’s law is equal to


the sum of Shannon entropies of energies taken as random variables defined
on the individual particles, i.e.,

(6)
The probability Pi that a particle of the statistical ensemble has the i-th
value of energy is given by the ratio Ni/N. Inserting the probabilities
P_i = N_i / N, i = 1, 2, …, n, (7)
into Boltzmann’s entropy formula we have

(8)
Supposing that the number of particles in a statistical ensemble is very
large, we can use the asymptotic formula

which inserted in Boltzmann’s entropy yields

(9)
For very large N, the second term in Equation (9) can be neglected and
we find

We see that the Boltzmann entropy of an ensemble with large N is equal


to the sum of Shannon entropies of the individual particles. The asymptotical
equality between Boltzmann and Shannon entropies for the large N makes
it possible to use the Shannon entropy also for describing an statistical
ensemble. The pioneer on this field was E. Jaynes who published, already
in fifties, works in which only Shannon entropy was used to formulate
statistical physics [13] . However, many authors advocating the use of
Shannon entropy in statistical physics do not fully realized the difference
between Boltzmann’s and Shannon entropy. The use of Shannon entropy
can be only justified if one considers the physical ensemble as a system of
random objects on which energy (or other physical quantity) is taken as a
random variable. Then, the total entropy of the whole ensemble is given
as the sum of Shannon entropies of individual statistical elements (e.g.,
particles). While the Boltzmann’s entropy loses its sense for an ensemble
containing only a few particles, Shannon entropy is defined also for an
“ensemble” with even one particle. Boltzmann’s entropy is typical ensemble
concept while Shannon entropy is a probabilistic concept. This is not only


the change of the methodology when treating statistical ensemble but it has
also long-reaching conceptual and even pedagogical consequences.
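This asymptotic equality is easy to verify numerically. The sketch below compares ln W, computed exactly through the logarithms of the factorials, with N times the Shannon entropy for a made-up occupation vector; the Boltzmann constant is omitted, so both quantities are in natural units:

from math import lgamma, log

def ln_thermodynamic_probability(counts):
    """ln W for W = N! / (N_1! ... N_n!), using lgamma(n + 1) = ln n!."""
    N = sum(counts)
    return lgamma(N + 1) - sum(lgamma(n + 1) for n in counts)

def n_times_shannon(counts):
    """N * sum_i (-P_i ln P_i) with P_i = N_i / N."""
    N = sum(counts)
    return -sum(n * log(n / N) for n in counts if n > 0)

counts = [5000, 3000, 2000]                   # N = 10^4 particles in three energy cells
print(ln_thermodynamic_probability(counts))   # ~10287
print(n_times_shannon(counts))                # ~10297 -- the two agree closely for large N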
According to Jaynes [14] , the equilibrium probability distribution of
the particle energy of a statistical ensemble should maximize the Shannon
entropy

S = −∑_i P(E_i) ln P(E_i) (10)
subject to given constraints. For example, by taking the mean energy
per particle as the constraint at the extremizing procedure, we obtain the
following probability distribution for the particle energy
P(E_i) = exp(−λ − µE_i),
where the constants λ and µ are to be determined by substituting P(Ei) into


constraint’s equations. We see how easily and quickly we obtain results
forming the essence of the classical statistical mechanics. The use of
Shannon entropy in statistical physics makes it possible to rewrite it in terms
of modern theory of probability where a statistical ensemble is treated as a
collection of the mutually interacting random objects [13] .
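The following sketch carries out this extremization numerically rather than analytically: assuming the exponential form P(E_i) ∝ exp(−µE_i) obtained above, it finds the multiplier µ by bisection so that the mean-energy constraint is met. The energy levels and the constraint value are arbitrary examples:

import numpy as np

def maxent_distribution(energies, mean_energy, lo=-50.0, hi=50.0, iters=200):
    """Maximum-entropy distribution P(E_i) ~ exp(-mu * E_i) whose mean energy
    equals the given constraint; mu is found by bisection."""
    E = np.asarray(energies, dtype=float)

    def mean_for(mu):
        w = np.exp(-mu * (E - E.min()))      # shift by E.min() for numerical stability
        p = w / w.sum()
        return (p * E).sum(), p

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        m, p = mean_for(mid)
        if m > mean_energy:                  # mean too high -> increase mu
            lo = mid
        else:
            hi = mid
    return p

E = [0.0, 1.0, 2.0, 3.0]
p = maxent_distribution(E, mean_energy=1.0)
print(p, (p * np.array(E)).sum())            # Boltzmann-like weights with mean energy ~1.0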

THE SHANNON-LIKE ENTROPIES


Recently, there is an endeavour in the applied sciences (see, e.g. [15] )
to employ entropic measures of uncertainty having similar properties as
information entropy, but they are simpler to handle mathematically. The
classical measure of probabilistic uncertainty which has dominated in the
literature since it was proposed by Shannon, is the information or Shannon
entropy defined for a discrete random variable by the formula
S = −∑_i P_i log P_i. (11)
Since Shannon has introduced his entropy, several other classes of
probabilistic uncertainty measures (entropies) have been described in the
literature (see, e.g., [16] ). We can broadly divide them into two classes:
(i) The Shannon-like uncertainty measures which for a certain value
of the corresponding parameters converge towards the Shannon
entropy, e.g., Rényi’s entropy
(ii) The Maassen and Uffink uncertainty measures, which also converge,
under certain conditions, to the Shannon entropy,

(12)
(iii) The uncertainty measures having no direct connection to Shannon
entropy, e.g., information “energy” defined in information theory
as [16]

E(P) = ∑_i P_i^2, (13)
and called Hilbert-Schmidt norm in quantum physics. The most important
uncertainty measures of the first class are:
(i) The Rényi entropy defined as follows [17]

H_α = (1 / (1 − α)) log ∑_i P_i^α, α > 0, α ≠ 1. (14)
(ii) The Havrda-Charvat entropy (or α-entropy)5 is defined as [18]

H_α = (1 / (1 − α)) (∑_i P_i^α − 1), α > 0, α ≠ 1. (15)
For the sake of completeness, we list some other entropy-like uncertainty
measures presented in the literature [19] :
(i) The trigonometric entropy is defined as [10]

(16)

(ii) The R-norm entropy defined by the formula

All the above-listed Shannon-like entropies converge towards the Shannon
entropy if α → 1, b → 1, R → 1, γ → 1, and β → 1. In some instances, it is
simpler to compute one of these entropies and then recover the Shannon entropy by taking
the corresponding limit.
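This convergence is easy to verify numerically. The sketch below uses the Tsallis form (1 − ∑ P_i^α)/(α − 1) for the Havrda-Charvat entropy, in line with its identification with the Tsallis entropy, and an arbitrary example distribution:

import numpy as np

def shannon(p):
    p = np.asarray(p, float); p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def renyi(p, alpha):
    """Renyi entropy: (1 / (1 - alpha)) * ln(sum_i p_i^alpha), alpha != 1."""
    p = np.asarray(p, float); p = p[p > 0]
    return float(np.log((p ** alpha).sum()) / (1.0 - alpha))

def tsallis(p, alpha):
    """Havrda-Charvat / Tsallis entropy: (1 - sum_i p_i^alpha) / (alpha - 1)."""
    p = np.asarray(p, float); p = p[p > 0]
    return float((1.0 - (p ** alpha).sum()) / (alpha - 1.0))

p = [0.5, 0.25, 0.125, 0.125]
for a in (1.5, 1.1, 1.01, 1.001):
    print(a, renyi(p, a), tsallis(p, a))      # both approach the Shannon value as a -> 1
print(shannon(p))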
A quick inspection shows that all five Shannon-like entropies listed
above are mutually functionally related. For example, each of the Havrda-
Charvat entropies can be expressed as a function of the Rényi entropy, and
vice versa.

There are six properties which are usually considered desirable for a
measure of the uncertainty of a random trial: (i) symmetry, (ii) expansibility, (iii) subadditivity,
(iv) additivity, (v) normalization, and (vi) continuity. The only uncertainty
measure which satisfies all these requirements is the Shannon entropy. Each
of the other entropies violates at least one of them; e.g., Rényi’s entropy
violates only the subadditivity property, Havrda-Charvat’s entropy violates
the additivity property, and the R-norm entropies violate both subadditivity and
additivity. More details about the properties of each of these entropies can be found
elsewhere (e.g., [15]). The Shannon entropy satisfies all the above requirements
put on an uncertainty measure and exactly matches the properties of physical
entropy6.
measures which have similar mathematical properties as Shannon entropy.
The best known Shannon-like probabilistic uncertainty measure is the
Havrda and Charvat entropy [18], which is more general than the Shannon
measure and much simpler than Rényi’s measure. It depends on a parameter
α taken from the interval (0, ∞), α ≠ 1. As such, it represents a family of
uncertainty measures which includes the information entropy as a limiting case
when α → 1. We note that in physics the Havrda-Charvat entropy is known
as the Tsallis entropy [20]. All the mentioned entropic measures of uncertainty
are functions of the components of the probability distribution of a random
variable and they have three important properties: (i) They assume their
maximal values for the uniform probability distribution. (ii) They
become zero for probability distributions having only one nonzero component.
(iii) They express a measure of the spread of a probability distribution: the
larger this spread becomes, the larger the values they assume. These properties
qualify them for being measures of uncertainty (inaccuracy) in the
physical theory of measurement.
The entropic uncertainty measures for a discrete random variable are,
in the frame of theory of probability, exactly defined. The transition from
the discrete to the continuous entropic uncertainty measures is, however,
not always unique and has still many open problems. A continuous random
variable is characterized by the function of its probability density p(x). The


moment and probabilistic uncertainty measures exist also for the continuous
random variables. The typical moment measure is the k-th central moment of
the random variable. The classical probabilistic uncertainty measure of a continuous random
variable is the corresponding Shannon entropy. It is a function of
the probability density p(x) and consists of two terms7,
H = H^(1)(p) + H^(2),
where H^(1)(p) = −∫ p(x) log p(x) dx and the term H^(2) always diverges. Usually, one
“renormalizes” by taking only the term H^(1)(p) (called the differential
entropy or the Shannon entropy functional H(p)) for the entropic uncertainty
measure of a continuous random variable. This functional is well known
to play an important role in probability and statistics. We refer to [15] for
applications of the Shannon entropy functional to the theory of probability
and statistics.
As it is well-known, the Shannon entropy functionals of some continuous
variables represent complicated integrals which often are difficult to
compute analytically or even numerically. Everybody, who tried to calculate
analytically the differential entropies of the continuous variables, became
aware how difficult it may be. From the purely mathematical point of view,
the differential entropy can be taken as a formula for expressing the spread
of any standard single-valued function (the probability density belongs to
this class of functions). Generally, the Shannon entropy functional assigns
to a probability density function (belonging to the class of functions L2(R1))
a real number H through a mapping L2(R1) → R. H is a monotonously
increasing function of the degree of “spreading” of p(x), i.e. the larger H
becomes, the more spread out p(x) is.
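As a concrete example (our own, with natural logarithms), the differential entropy of a Gaussian density with standard deviation σ equals (1/2) ln(2πeσ²); the sketch below checks this by numerical integration on a grid:

import numpy as np

def differential_entropy(pdf, x):
    """H(p) = -integral of p(x) ln p(x) dx, approximated by a Riemann sum on a grid."""
    p = pdf(x)
    mask = p > 0
    dx = x[1] - x[0]
    return float(np.sum(-p[mask] * np.log(p[mask])) * dx)

sigma = 2.0
gauss = lambda x: np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
x = np.linspace(-30, 30, 20001)
print(differential_entropy(gauss, x))               # numerical value
print(0.5 * np.log(2 * np.pi * np.e * sigma**2))    # closed form: (1/2) ln(2*pi*e*sigma^2)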
The Shannon entropy functional was studied just at the beginning of
information theory [17] . Since that time, besides the Shannon entropy
functional, several other entropy functionals were introduced and studied
in the probability theory. The majority of them are dependent on certain
parameters. As such, they form a whole family of different functionals
(including the Shannon entropy functional as a special case). In a sense,


they are a generalization of the Shannon entropy functional. Some of them
can equally well express the spread of the probability density functions as
differential entropy and are considerably easier to handle mathematically.
These include:
(i) The Rényi entropic functional [17]

H_α(p) = (1 / (1 − α)) log ∫ p(x)^α dx, (17)
(ii) The Havrda-Charvat entropic functional [18]

H_β(p) = (1 / (1 − β)) (∫ p(x)^β dx − 1), (18)
(iii) The trigonometric entropic functional [10]

(19)

Note that these functionals tend to H(p) as α, β, γ tend to
1. Again, in some instances, it is simpler to compute one of them
and then recover H(p) by taking the limits α, β, γ → 1. In treating
them, it is often enough to study the properties of the
functional

As has been known for a long time, these entropy functionals
have a mathematical shortcoming connected with the dimension of
the physical probability density function. In contrast with the probability,
which is a dimensionless number, the probability density function has a
dimension, so that its appearance behind the logarithm and cosine in the
entropy functionals is mathematically inadmissible.
This brings complications when calculating them for a
physical random variable (see, e.g., [21])8,9.
CONCLUSIONS
From what has been said so far it follows:
(i) The concept of entropy is inherently connected with the probability
distribution of outcomes of a random trial. The entropy quantifies
the probability uncertainty of a general random trial.
(ii) There are two ways how to express the uncertainty of a random
trial:
The moment and probabilistic measure. The former measure includes in
its definition both values assigned to trial outcomes and their probabilities.
The latter measure contains in its definition only the corresponding
probabilities. The moment uncertainty measures are given as a rule by the
higher statistical moments of a random variable whereas the probabilistic
measure is expressed by means of entropy. The most important probabilistic
uncertainty measure is the Shannon entropy defined by the formula
S = −∑_i P_i log P_i,
where P_i is the probability of the i-th outcome of a random trial.


(iii) By means of Shannon entropy it is possible to quantify the
configurational order in the set of elements of a general system.
The corresponding quantity is called the Watanabe measure of
configurational order and is defined as follows configurational
order of a system = (sum of entropies of the parts of the system)
− (entropy of the whole system).
This measure expresses quantitatively the property of a configurationally
organized systems to have order between its elements, which causes that the
system as a whole behaves in a more deterministic way than its individual
parts.
(iv) The asymptotical equality between the Boltzmann and Shannon
entropies for statistical systems with a large number of particles makes
it possible to use the Shannon entropy for describing statistical
ensembles.
(v) Besides the Shannon entropy there exists a class of so-called
Shannon-like Entropies. The most important Shannon-like
entropies are (a) The Rényi entropy Equation (14); (b) The
Havrda-Charvat entropy Equation (15). The well-known Tsallis
entropy is mathematically identical with Havrda-Charvat entropy.
Entropy quantifies the probabilistic uncertainty of a physical (statistical) system and, therefore, it belongs to the important
quantities for describing natural phenomena. This is why entropy
represents in physics a fundamental quantity next to energy.

APPENDIX

The Essential Mathematical Properties of Entropy


We ask from the Shannon entropy the following desirable properties [24] :
(i) S(P_1, P_2, …, P_n) is a continuous function of all components of
a probability distribution of a random variable which is
invariant under any permutation of the indices of the probability
components.
(ii) If the probability distribution has only one component which is
different from zero, then S = 0.
(iii) For any probability distribution (P_1, …, P_n) it holds
S(P_1, …, P_n, 0) = S(P_1, …, P_n). (A1)

(iv) S(P_1, …, P_n) ≤ log n, (A2)
with equality if and only if P_i = 1/n for all i.


(v) If R = {P_kl} is a joint probability
distribution whose marginal probability distributions
are P = {P_k} and Q = {Q_l}, respectively, then
S(R) = S(P) + S(Q | P), (A3)
where the conditional entropy S(Q | P) = ∑_k P_k S(Q | k) is comput-
ed only for those values of k for which P_k > 0.


(vi) Using the notation given above it holds
S(R) ≤ S(P) + S(Q). (A4)
The equality in (A4) is valid if and only if P_kl = P_k Q_l for all k and l,
in which case (A3) becomes S(R) = S(P) + S(Q).
All these properties can be proved in an elementary manner. Without


entering into the technical details, we note that properties (i)-(iii) are obvious
while property (v) can be obtained by a straightforward computation taking
into account only the definition of entropy. Finally, from Jensen’s inequality
applied to the concave function log x we obtain property (iv), and the inequality
(A4) follows by a suitable choice of the arguments and, in
the last case, by summing the resulting n inequalities.
Interpretation of the above properties agrees with common sense,
intuition, and the reasonable requirements that can be asked from a measure
of uncertainty. Indeed, a random experiment which has only one possible
outcome (that is, a strictly deterministic trial) contains no uncertainty at all;
we know what will happen before performing the experiment. This is just
property (ii). If we add possible outcomes having probability zero, the
amount of uncertainty with respect to what will happen in the trial remains
unchanged (property (iii)). Property (iv) tells us that in the class of all
probabilistic trials having m possible outcomes, the maximal uncertainty
is contained in the special probabilistic trial whose outcomes are equally
likely. Before interpreting the last properties let us consider two discrete
random variable and , whose ranges contain m and n numerical values,
respectively. Using the notations as in property (v), suppose that is the
joint probability distribution of the pair , and P and Q are the marginal
probability distributions of and , respectively. In this case equality iii


may be written more compactly

(A5)
where

and

Here is the conditional entropy of given . According


to (v), the amount of uncertainty contains in a pair of random variables
(or, equivalently, in compoundor product-probabilistic trial) is obtained
by summing the amount of uncertainty contained in and the uncertainty
contained in conditioned by random variable . Similarly, we get

(A6)
where

and

Here

is the conditional entropy of given the l-th value of . Hm is defined only


for those values of l for which . From (A5) and (A6) we get
which is the so-called “uncertainty balance”, the only conservation law for
entropy.
Finally, property (vi) shows that some data on can only decrease the
uncertainty on , namely

(A7)
with equality if and only if and are independent. From (A5) and (A7)
we get

with equality if and only if and are independent.


Fortunately this inequality holds for any number of components. More
generally, for s random variables with arbitrary finite range we can write

with equality if and only if are globally independent. Therefore

(A8)

measures the global dependence between the random variables


, that is, the extent to which the system , due to interdependence,
makes up “something more” that the mere juxtaposition of components. In
particular, W = 0 if and only if are independent.
Note that the difference between the amount of uncertainty contained by
the pair and the amount of dependence between the components
and , namely

or, equivalently,

is the distance between the random variables and , with the two random
variables considered identical if either one completely determines the other,
or if and . Therefore, the “pure randomness”


contained in the pair , i.e., the uncertainty of the whole, minus the
dependence between the components, measured by , is a distance
between and .
Khintchin [25] proved that properties (i), (iii), (iv) and (v), taken as
axioms, imply uniquely the formula of the Shannon entropy (except for a
numerical factor). It is worth remarking that there is also another way to
determine the uncertainty of a probabilistic object. The Shannon entropy is
a measure of the degree of uncertainty of random object whose probability
distribution is given. In algorithmic theory, the primary concept is that of
the information content of an individual object, which is a measure of how
difficult it is to specify or describe and how to construct or calculate that
object. This notion is also known as information-theoretical complexity. The
information content I(s) of a binary string s is defined to be the size in bits
of the smallest program for a canonical universal computer to calculate s
[26] [27] .

NOTES
1
The concept of probability was mathematically clarified and rigorously
determined about sixty years ago. The probability is interpreted as a
complete measure on the σ-algebra γ of the subsets S1, S2,···Sn of the set of
the elementary random events B. The probability measure P fulfils following
relations:

(i) P(S_i) ≥ 0 for every S_i ∈ γ.
(ii) From S_i ⊂ S_j it follows P(S_i) ≤ P(S_j).
(iii) If S_1, S_2, … are pairwise disjoint elements of the σ-algebra γ,
then it holds the following equation: P(⋃_i S_i) = ∑_i P(S_i).

(iv) P(B) = 1.
The σ-algebra, on which the set function P is defined, is called the
Kolmogorov probability algebra. The triplet [B, γ, P] denotes the probability
space. Under a random variable we understand each real-valued measurable


function defined on the elementary random events B .
2
The word “entropy” stems from the Greek word “τροπή” which means
“transformation”.
3
Entropy is sometimes called “missing information”.
4
The probability as well as the Shannon entropy are dimensionless quantities.
On the other side, the thermodynamical entropy has the physical dimension
equal to J·K⁻¹ (energy divided by temperature). Therefore, in order to get the correct physical dimension
for the thermodynamic entropy we must multiply the Shannon entropy by
the Boltzmann constant, which has the value K_B ≈ 1.38 × 10⁻²³ J·K⁻¹.
5
The Havrda-Charvat α-entropy exactly matches the Tsallis non-extensive
entropy of statistical physics.
6
The Tsallis entropy, which is mathematically identical with the Havrda-
Charvat α-entropy introduced by the two mathematicians Havrda and
Charvát in the sixties, violates the additivity property, which is considered an
essential property of physical entropy. This is why the other Shannon-like
entropies could be more suitable for the formulation of an alternative statistical
physics.
7
In order to apply the formula for the Shannon entropy to a continuous
random variable with the probability density function p(x), we divide the
x-axis into n equidistant intervals of length Δx. The probability that the random variable assumes a value
from the i-th interval is P_i ≈ p(x_i)Δx, where x_i is from the
i-th interval. Inserting p(x_i)Δx into the Shannon entropy we have
S ≈ −∑_i p(x_i)Δx log[p(x_i)Δx].
Passing to the infinitesimal interval, we obtain
S = H^(1)(p) + H^(2),
where
H^(2) = −lim_{Δx→0} log Δx
and
H^(1)(p) = −∫ p(x) log p(x) dx
is the Shannon entropy functional.


8
It is typical for the entropy and even for the probability itself that their un-
certainty measures are determined through a set of certain reasonable require-
ments (axioms), therefore they are more “abstract” than the measures, e.g.,
for work or energy in physics. Physical quantities are mostly derived from
the concepts of motion or field which are more concrete than the concept of
probability and entropy.
9
In many textbooks and also in some advanced books, Shannon entropy and
information are used as if they were synonymous. This can be confusing and
may lead to some conceptual shortcomings. Especially Brillouin’s
use of the concept of information in his celebrated work
“Science and Information Theory” [22] has often been criticized (see, e.g., [23]).
REFERENCES
1. Feller, W. (1968) An Introduction to Probability Theory and Its
Applications. Volume I., John Wiley and Sons, New York
2. Boltzmann, L. (1896) Vorlesungen über Gastheorie. J. A. Barth,
Leipzig.
3. Shannon, C.E. (1948) A Mathematical Theory of Communication.
The Bell System Technical Journal, 27, 53-75. http://dx.doi.
org/10.1002/j.1538-7305.1948.tb00917.x
4. Nyquist, H. (1924) Certain Factors Affecting Telegraph Speed.
Bell System Technical Journal, 3, 324-346. http://dx.doi.
org/10.1002/j.1538-7305.1924.tb01361.x
5. Khinchin, A.I. (1957) Mathematical Foundation of Information Theory.
Dover Publications, New York.
6. Aczél, J. and Daróczy, Z. (1975) On Measures of Information and Their Characterization. Academic Press, New York.
7. Merzbacher, E. (1967) Quantum Physics. 7th Edition, John Wiley and
Sons, New York.
8. Faddejew, D.K. (1957) Der Begriff der Entropie in der
Wahrscheinlichkeitstheorie. In: Arbeiten zur Informationstheorie I.
DVdW, Berlin.
9. Watanabe, S. (1969) Knowing and Guessing. John Wiley and Sons,
New York.
10. Majerník, V. (2001) Elementary Theory of Organization. Palacký
University Press, Olomouc.
11. Haken, H. (1983) Advanced Synergetics. Springer-Verlag, Berlin.
12. Ke-Hsuch, L. (2000) Physics of Open Systems. Physics Reports, 165,
1-101.
13. Jaynes, E.T. (1957) Information Theory and Statistical Mechanics.
Physical Review, 106, 620-630. http://dx.doi.org/10.1103/
PhysRev.106.620
14. Jaynes, E.T. (1967) Foundations of Probability Theory and Statistical
Mechanics. In: Bunge, M., Ed., Delavare Seminar in the Foundation
of Physics, Springer, New York. http://dx.doi.org/10.1007/978-3-642-
86102-4_6
15. Ang, A.H. and Tang, W.H. (2004) Probability Concepts in Engineering.
Planning, 1, 3-5.
16. Vajda, I. (1995) Theory of Information and Statistical Decisions. Kluwer Academic Publishers, Dordrecht.
17. Rényi, A. (1961) On the Measures of Entropy and Information. 4th
Berkeley Symposium on Mathematical Statistics, 547-555.
18. Havrda, J. and Charvát, F. (1967) Quantification Method of Classification
Processes. Concept of Structural α-Entropy. Kybernetika, 3, 30-35.
19. Majerník, V. Majerníkova, E. and Shpyrko, S. (2003) Uncertainty
Relations Expressed by Shannon-Like Entropies. Central European
Journal of Physics, 6, 363-371. http://dx.doi.org/10.2478/s11534-008-
0057-6
20. Tsallis, C. (1988). Possible Generalization of Boltzmann-Gibbs
Statistics. Journal of Statistical Physics, 52, 479-487. http://dx.doi.
org/10.1007/BF01016429
21. Majerník, V. and Richterek L. (1997) Entropic Uncertainty Relations.
European Journal of Physics, 18, 73-81. http://dx.doi.org/10.1088/0143-
0807/18/2/005
22. Brillouin, L. (1965) Science and Information Theory. Academic Press,
New York.
23. Flower, T.B. (1983) The Notion of Entropy. International Journal of
General Systems, 9, 143-152.
24. Guiasu, S. and Shenitzer, A. (1985) The Principle of Maximum Entropy.
The Mathematical Intelligencer, 7, 42-48. http://dx.doi.org/10.1007/
BF03023004
25. Khinchin, A.I. (1957) Mathematical Foundation of Information Theory.
Dover Publications, New York.
26. Chaitin, G.J. (1982) Algorithmic Information Theory. John Wiley and
Sons, New York.
27. Kolmogorov, A.N. (1965) Three Approaches to the Quantitative Definition of Information. Problems of Information Transmission, 1, 1-7.
Chapter 15

SHANNON ENTROPY: AXIOMATIC CHARACTERIZATION AND APPLICATION

C. G. Chakrabarti1 and Indranil Chakrabarty2


1 Department of Applied Mathematics, University of Calcutta, India
2 Department of Mathematics, Heritage Institute of Technology, Chowbaga Road, Anandapur, India

We have presented a new axiomatic derivation of Shannon entropy for a discrete probability distribution on the basis of the postulates of additivity and concavity of the entropy function. We have then modified Shannon entropy to take account of observational uncertainty. The modified entropy reduces, in the limiting case, to the form of Shannon differential entropy. As an application, we have derived the expression for classical entropy of statistical mechanics from the quantized form of the entropy.

Citation: C. G. Chakrabarti, Indranil Chakrabarty, “Shannon Entropy: Axiomatic Characterization and Application”, International Journal of Mathematics and Mathematical Sciences, vol. 2005, Article ID 234590, 8 pages, 2005. https://doi.org/10.1155/IJMMS.2005.2847.
Copyright: © 2005 by Authors. This is an open access article distributed under the
Creative Commons Attribution License, which permits unrestricted use, distribution,
and reproduction in any medium, provided the original work is properly cited.

INTRODUCTION
Shannon entropy is the key concept of information theory [12]. It has found
wide applications in different fields of science and technology [3, 4, 5, 7]. It is
a characteristic of probability distribution providing a measure of uncertainty
associated with the probability distribution. There are different approaches
to the derivation of Shannon entropy based on different postulates or axioms
[1, 8].
The object of present paper is to stress the importance of the properties of
additivity and concavity in the determination of functional form of Shannon
entropy and its generalization. The main content of the paper is divided into
three sections. In Section 2, we have provided an axiomatic derivation of
Shannon entropy on the basis of the properties of additivity and concavity
of entropy function. In Section 3, we have generalized Shannon entropy
and introduced the notion of total entropy to take account of observational
uncertainty. The entropy of a continuous distribution, called the differential entropy, has been obtained as a limiting value. In Section 4, the differential
entropy along with the quantum uncertainty relation has been used to derive
the expression of classical entropy in statistical mechanics.

SHANNON ENTROPY: AXIOMATIC CHARACTERIZATION
Let ∆n be the set of all finite discrete probability distributions

∆n = {P = (p1, p2,..., pn) : pi ≥ 0, p1 + p2 + ··· + pn = 1}. (2.1)
In other words, P may be considered as a random experiment having
n possible outcomes with probabilities (p1, p2,..., pn). There is uncertainty
associated with the probability distribution P and there are different measures
of uncertainty depending on different postulates or conditions. In general,
the uncertainty associated with the random experiment P is a mapping [9]

H : ∆n → ℝ, (2.2)
where ℝ is the set of real numbers. It can be shown that (2.2) is a reasonable
measure of uncertainty if and only if it is Schur-concave on ∆n [9]. A general class of uncertainty measures is given by

H(P) = Σi φ(pi), (2.3)
where φ : [0,1] → ℝ is a concave function. By taking different concave functions defined on [0,1], we get different measures of uncertainty or entropy. For example, if we take φ(pi) = −pi log pi, we get the Shannon entropy [12]

H(P) = −k Σi pi log pi, (2.4)
where 0log0 = 0 by convention and k is a constant depending on the unit of
measurement of entropy. There are different axiomatic characterizations of
Shannon entropy based on different set of axioms [1, 8]. In the following,
we will present a different approach depending on the concavity character of
entropy function. We set the following axioms to be satisfied by the entropy function H(P) = H(p1, p2,..., pn).
Axiom 1. We assume that the entropy H(P) is nonnegative, that is, for all
P = (p1, p2, ..., pn), H(P) ≥ 0. This is essential for a measure.
Axiom 2. We assume the generalized form (2.3) of the entropy function, that is,

H(P) = Σi φ(pi). (2.5)
Axiom 3. We assume that the function φ is a continuous concave function
of its arguments.
Axiom 4. We assume the additivity of entropy, that is, for any two
statistically independent experiments P = (p1, p2,..., pn) and Q = (q1,q2,...,qm),

H(P · Q) = H(P) + H(Q). (2.6)
Then we have the following theorem.
Theorem 2.1. If the entropy function H(P) satisfies Axioms 1 to 4, then
H(P) is given by

H(P) = −k Σi pi log pi, (2.7)
where k is a positive constant depending on the unit of measurement of
entropy.

Proof. For two statistically independent experiments, the joint probability distribution pjα is the direct product of the individual probability distributions

pjα = pj qα (j = 1, 2,..., n; α = 1, 2,..., m). (2.8)
Then according to the axiom of additivity of entropy (2.6), we have

Σj Σα φ(pj qα) = Σj φ(pj) + Σα φ(qα). (2.9)
Let us now make small changes of the probabilities pk and pj of the
probability distribution P = (p1, p2,..., pj,...,pk,..., pn) leaving others undisturbed
and keeping the normalization condition fixed. By the axiom of continuity
of φ, the relation (2.9) can be reduced to the form

Σα qα [φ′(pj qα) − φ′(pk qα)] = φ′(pj) − φ′(pk). (2.10)
The right-hand side of (2.10) is independent of qα and the relation (2.10)
is satisfied independently of p’s if

φ′(pj qα) − φ′(pk qα) = φ′(pj) − φ′(pk). (2.11)
The above leads to the Cauchy functional equation

ψ(q1 q2) = ψ(q1) + ψ(q2), where ψ(q) = φ′(pq) − φ′(p). (2.12)
The solution of the functional equation (2.12) is given by

φ′(p) = A log p + B, (2.13)
or

φ(p) = A p log p + (B − A) p + C, (2.14)
where A, B, and C are all constants. The condition of concavity (Axiom 3)
requires A < 0, and let us take A = −k, where k (> 0) is a positive constant by Axiom 1. The generalized entropy (2.5) then reduces to the form

H(P) = Σi [−k pi log pi + (B − A) pi + C] (2.15)
or
H(P) = −k Σi pi log pi, (2.16)
where constants (B − A) and C have been omitted without changing the
character of the entropy function. This proves the theorem.
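As a quick numerical illustration of the additivity axiom and the resulting functional form (this sketch is ours, not part of the original paper; the distributions P and Q are arbitrary example values), the following Python snippet builds the joint distribution of two independent experiments as in (2.8) and checks that the Shannon entropy of (2.7) is additive:

```python
import numpy as np

def shannon_entropy(p, k=1.0):
    """H(P) = -k * sum_i p_i log p_i, with 0 log 0 = 0 by convention."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -k * np.sum(p * np.log(p))

P = np.array([0.5, 0.3, 0.2])    # example distribution P
Q = np.array([0.6, 0.4])         # example distribution Q
PQ = np.outer(P, Q).ravel()      # joint distribution p_{j,alpha} = p_j * q_alpha, as in (2.8)

print(shannon_entropy(P) + shannon_entropy(Q))   # ~1.7027
print(shannon_entropy(PQ))                       # same value, illustrating Axiom 4 / (2.6)
```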

TOTAL SHANNON ENTROPY AND ENTROPY OF CONTINUOUS DISTRIBUTION
The definition (2.4) of entropy can be generalized straightforwardly to
define the entropy of a discrete random variable.
Definition 3.1. Let X ∈ ℝ denote a discrete random variable which takes
on the values x1,x2,...,xn with probabilities p1, p2,..., pn, respectively, the
entropy H(X) of X is then defined by the expression [4]

H(X) = −k Σi pi log pi. (3.1)
Let us now generalize the above definition to take account of an
additional uncertainty due to the observer himself, irrespective of the
definition of random experiment. Let X denote a discrete random variable
which takes the values x1,x2,...,xn with probabilities p1, p2,..., pn. We
decompose the practical observation of X into two stages. First, we assume
that X ∈ L(xi) with probability pi, where L(xi) denotes the ith interval of the
set {L(x1),L(x2),...,L(xn)} of intervals indexed by xi. The Shannon entropy
of this experiment is H(X). Second, given that X is known to be in the ith
interval, we determine its exact position in L(xi) and we assume that the
entropy of this experiment is U(xi). Then the global entropy associated with
the random variable X is given by

HT(X) = H(X) + Σi pi U(xi). (3.2)
Let hi denote the length of the ith interval L(xi), (i = 1,2,...,n), and define

U(xi) = k log hi (i = 1, 2,..., n). (3.3)
We have then

HT(X) = −k Σi pi log pi + k Σi pi log hi = −k Σi pi log (pi/hi). (3.4)

The expression HT(X) given by (3.4) will be referred to as the total entropy of the random variable X. The above derivation is physical. In fact,
what we have used is merely a randomization of the individual event X =
xi (i = 1,2,...,n) to account for the additional uncertainty due to the observer
himself, irrespective of the definition of random experiment [4]. We will
derive the expression (3.4) axiomatically as generalization of Theorem 2.1.
Theorem 3.2. Let the generalized entropy (2.3) satisfy, in addition to
Axioms 1 to 4 of Theorem 2.1, the boundary conditions
φi(0) = 0, φi(1) = k log hi (i = 1, 2,..., n), (3.5)
to take account of the postobservational uncertainty, where hi is the length of
the ith class L(xi) (or width of the observational value xi). Then the entropy
function reduces to the form of the total entropy (3.4).
Proof. The procedure is the same as that of Theorem 2.1 up to the relation
(2.13):

φ′(pj) = A log pj + B. (3.6)
Integrating (3.6) with respect to pj and using the boundary condition
(3.5), we have

φ(pj) = A pj log pj + k pj log hj, (3.7)
so that the generalized entropy (2.3) reduces to the form

H(P) = A Σj pj log pj + k Σj pj log hj = −k Σj pj log (pj/hj), (3.8)
where we have taken A = −k < 0 for the same unit of measurement of entropy
and the negative sign to take account of Axiom 1. The constants appearing in
(3.8) have been neglected without any loss of characteristic properties. The
expression (3.8) is the required expression of total entropy obtained earlier.
Let us now see how to obtain the entropy of a continuous probability
distribution as a limiting value of the total entropy HT(X) defined above. For
this, let us first define the differential entropy HC(X) of a continuous random variable X.
Definition 3.3. The differential entropy HC(X) of a continuous random
variable with probability density f (x) is defined by [2]

HC(X) = −k ∫R f(x) log f(x) dx, (3.9)

where R is the support set of the random variable X. We divide the range
of X into bins of length (or width) h. Let us assume that the density f(x)
is continuous within the bins. Then, by the mean-value theorem, there exists a value xi within each bin such that

f(xi) h = ∫ f(x) dx (the integral being taken over the ith bin). (3.10)
We define the quantized or discrete probability distribution (p1, p2,..., pn)
by

pi = f(xi) h, (3.11)
so that we have then

Σi pi = Σi f(xi) h = ∫R f(x) dx = 1. (3.12)
The total entropy HT(X) defined for hi = h (i = 1,2,...,n),

HT(X) = −k Σi pi log (pi/h), (3.13)
then reduces to the form

HT(X) = −k Σi f(xi) h log f(xi). (3.14)
Let h → 0; then, by the definition of the Riemann integral, we have HT(X) → HC(X) as h → 0, that is,

lim h→0 HT(X) = −k ∫R f(x) log f(x) dx = HC(X). (3.15)
Thus we have the following theorem.
Theorem 3.4. The total entropy HT(X) defined by (3.13) approaches the differential entropy HC(X) in the limiting case when the length of each bin tends to zero.
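The limit in Theorem 3.4 is easy to check numerically. The Python sketch below is our illustration (not part of the original paper): it uses an exponential density with the support truncated at x = 30, bin midpoints, and k = 1, all choices made only for the example. It computes the total entropy HT(X) of (3.13) on progressively finer bins and compares it with the exact differential entropy of the exponential distribution, 1 − log λ; it also shows that the quantized entropy −Σi pi log pi differs from it by −log h, anticipating the relation discussed in the next section.

```python
import numpy as np

lam = 2.0
f = lambda x: lam * np.exp(-lam * x)   # exponential density; support truncated at x = 30 below
exact_Hc = 1.0 - np.log(lam)           # differential entropy of Exp(lam), in nats

for h in [0.5, 0.1, 0.01, 0.001]:
    x = np.arange(h / 2, 30.0, h)      # bin midpoints
    p = f(x) * h                       # quantized probabilities p_i = f(x_i) h, cf. (3.11)
    p = p[p > 0]
    H_T = -np.sum(p * np.log(p / h))   # total entropy (3.13)
    H_q = -np.sum(p * np.log(p))       # quantized entropy -sum p_i log p_i
    # H_T and H_q + log h both converge to the exact differential entropy as h -> 0
    print(h, round(H_T, 4), round(H_q + np.log(h), 4), round(exact_Hc, 4))
```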

APPLICATION: DIFFERENTIAL ENTROPY AND ENTROPY IN CLASSICAL STATISTICS
The above analysis leads to an important relation connecting quantized
entropy and differential entropy. From (3.13) and (3.15), we see that
−k Σi pi log pi ≈ −k ∫R f(x) log f(x) dx − k log h, (4.1)
showing that when h → 0, that is, when the length of the bins h is very small, the quantized entropy given by the left-hand side of (4.1) approaches not the differential entropy HC(X) defined in (3.9) but the form given by the right-hand side of (4.1), which we call the modified differential entropy. This
relation has important physical significance in statistical mechanics. As an
application of this relation, we now find the expression of classical entropy
as a limiting case of quantized entropy.
Let us consider an isolated system with configuration space volume V and a fixed number of particles N, which is constrained to the energy shell R = (E, E + ∆E). We consider the energy shell rather than just the energy surface because the Heisenberg uncertainty principle tells us that we can never determine the energy E exactly, though we can make ∆E as small as we like. Let f(XN) be the probability density of microstates defined on the phase space Γ = {XN = (q1, q2,..., q3N ; p1, p2,..., p3N)}. The normalization condition is

∫Γ f(XN) dXN = 1, (4.2)

where

dXN = dq1 dq2 ··· dq3N dp1 dp2 ··· dp3N. (4.3)
Following (4.1), we define the entropy of the system as

S = −k ∫Γ f(XN) log [CN f(XN)] dXN. (4.4)
The constant CN appearing in (4.4) is to be determined later on. The probability density for statistical equilibrium, determined by maximizing the entropy (4.4) subject to the condition (4.2), leads to

f(XN) = 1/Ω(E,V,N) if E ≤ H(XN) ≤ E + ∆E, and f(XN) = 0 otherwise, (4.5)

where H(XN) is the Hamiltonian of the system and Ω(E,V,N) is the volume of the energy shell (E, E + ∆E) [10]. Putting (4.5) in (4.4), we obtain the entropy of the system as [10]

S = k log [Ω(E,V,N)/CN]. (4.6)
The constant CN has the same unit as Ω(E,V,N) and cannot be determined classically. However, it can be determined from quantum mechanics. Then we have CN = h^{3N} for distinguishable particles and CN = N! h^{3N} for indistinguishable particles. From the Heisenberg uncertainty principle, we know that if h^{3N} is the volume of a single state in phase space, then Ω(E,V,N)/h^{3N} is the total number of microstates in the energy shell (E, E + ∆E). The expression (4.6) then becomes identical to the Boltzmann entropy. With this interpretation of the constant CN, the correct expression of classical entropy is given by [6, 10]

S = k log [Ω(E,V,N)/(N! h^{3N})]. (4.7)
The classical entropy that follows as a limiting case of the von Neumann entropy is given by [14]

S = −k ∫Γ f(XN) log f(XN) dXN. (4.8)
This is, however, different from the one given by (4.7) and it does not
lead to the form of Boltzmann entropy (4.6).

CONCLUSION
The literature on the axiomatic derivation of Shannon entropy is vast [1,
8]. The present approach is, however, different. It is based mainly on the postulates of additivity and concavity of the entropy function. These are, in fact, variant forms of the additivity and nondecreasing characters of entropy in
thermodynamics. The concept of additivity is dormant in many axiomatic
derivations of Shannon entropy. It plays a vital role in the foundation of
Shannon information theory [15]. Nonadditive entropies like Renyi entropy
and Tsallis entropy need a different formulation and lead to different
physical phenomena [11, 13]. In the present paper, we have also provided
a new axiomatic derivation of Shannon total entropy which in the limiting
case reduces to the expression of modified differential entropy (4.1). The
modified differential entropy together with quantum uncertainty relation
provides a mathematically strong approach to the derivation of the expression
of classical entropy.

REFERENCES
1. J. Aczél and Z. Daróczy, On Measures of Information and Their Characterizations, Mathematics in Science and Engineering, vol. 115, Academic Press, New York, 1975.
2. T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley
Series in Telecommunications, John Wiley & Sons, New York, 1991.
3. E. T. Jaynes, Information theory and statistical mechanics, Phys. Rev.
(2) 106 (1957), 620–630.
4. G. Jumarie, Relative Information. Theories and Applications, Springer
Series in Synergetics, vol. 47, Springer, Berlin, 1990.
5. J. N. Kapur, Measures of Information and Their Applications, John
Wiley & Sons, New York, 1994.
6. L. D. Landau and E. M. Lifshitz, Statistical Physics, Pergamon Press,
Oxford, 1969.
7. V. Majerník, Elementary Theory of Organization, Palacký University Press, Olomouc, 2001.
8. A. Mathai and R. N. Rathie, Information Theory and Statistics, Wiley
Eastern, New Delhi, 1974.
9. D. Morales, L. Pardo, and I. Vajda, Uncertainty of discrete stochastic
systems: general theory and statistical inference, IEEE Trans. Syst.,
Man, Cybern. A 26 (1996), no. 6, 681–697.
10. L. E. Reichl, A Modern Course in Statistical Physics, University of
Texas Press, Texas, 1980.
11. A. Rényi, Probability Theory, North-Holland Publishing, Amsterdam, 1970.
12. C. E. Shannon and W. Weaver, The Mathematical Theory of
Communication, The University of Illinois Press, Illinois, 1949.
13. C. Tsallis, Possible generalization of Boltzmann-Gibbs statistics, J.
Statist. Phys. 52 (1988), no. 1-2, 479–487.
14. A. Wehrl, On the relation between classical and quantum-mechanical
entropy, Rep. Math. Phys. 16 (1979), no. 3, 353–358.
15. T. Yamano, A possible extension of Shannon’s information theory,
Entropy 3 (2001), no. 4, 280– 292.
Chapter 16

SHANNON ENTROPY IN DISTRIBUTED SCIENTIFIC CALCULATIONS ON MOBILES AD-HOC NETWORKS (MANETS)

Pablo José Iuliano and Luís Marrone


LINTI, Facultad de Informatica UNLP, La Plata, Argentina

ABSTRACT
This paper addresses the problem of giving a formal metric to estimate
uncertainty at the moment of starting a distributed scientific calculation on
clients working over mobile ad-hoc networks (MANETs). Measuring the
uncertainty related to the successful completion of a distributed computation
on the aforementioned network infrastructure is based on the Dempster-
Shafer Theory of Evidence (DST). Shannon Entropy will be the formal
mechanism by which the conflict in the scenarios proposed in this paper
will be estimated. This paper will begin with a description of the procedure

Citation: Iuliano, P. and Marrone, L. (2013), “Shannon Entropy in Distributed Sci-


entific Calculations on Mobiles Ad-Hoc Networks (MANETs)”. Communications and
Network, 5, 414-420. doi: 10.4236/cn.2013.53B2076.
Copyright: © 2013 by authors and Scientific Research Publishing Inc. This work is
licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0

by which connectivity probability is to be obtained and will continue by presenting the mobility model most appropriate for the performed
simulations. Finally, simulations will be performed to calculate the Shannon
Entropy, after which the corresponding conclusions will be presented.

Keywords: MANETs, Shannon, Uncertain, Simulation, Distributed

INTRODUCTION
Mobile computing has been established as the de facto standard for Web
access, owing to users preferring it to other connection alternatives.
Mobile ad-hoc networks, or MANETs, are currently the focus of attention
in mobile computing, as they are the most flexible and adaptable network
technology in existence today [1]. These qualities are particularly desirable
in the development of applications meant for this kind of infrastructure—a
number of American government projects, such as the military investment
in resources for the development of this technology, bear witness to this fact.
As previously mentioned, ad-hoc mobile networks are the most flexible
and adaptable communication architecture currently in existence. These
wireless networks are comprised of interconnected autonomous nodes.
They are self-organized and self-generated, which eliminates the need for a
centralized infrastructure.
The use of this type of networks as a new alternative for the implementation
of distributed computing systems is closely related to the capability to begin
calculation, assign parts and collect results once computation is finished.
Due to the intrinsic nature of this kind of network, there is no certainty that
all the stages involved in this kind of calculation can be completed, which makes estimating the uncertainty in these scenarios a vital capability.

MEASURING THE PROBLEM


The movement patterns of the autonomous nodes, and consequently their interaction, will have a significant impact on the success or failure of collecting the results of a distributed computation. In order to incorporate
the notion of connectivity among the nodes, a development will now be
presented that shows a formalization of the Connectivity Probability among
all the nodes that make up a MANET, that is, the probability that there is a
path between one node and any of the rest.

Afterwards, we will take on the task of characterizing the mobility of the nodes, particularly their median speed and direction, the range of their
communication signal and the size of the surface on which they circulate.
Finally, another section will detail how to estimate Shannon Entropy.

Defining Connectivity Probability


Let D be a bounded domain in the Euclidean plane R2 = {x, y}; within D there are n nodes. At initial time t = 0, the nodes are somehow located and
moving. Let ri = {xi, yi} be the radius vector of node i. Thus, we assume that
each node has a communication capacity in the range r: if the distance between
two nodes is greater than r, then they cannot establish communication. Nodes
can transmit information using multihop connections.
Therefore, we can define a network as connected if each pair of nodes
has a path between them. Connectivity Probability quantifies the likelihood
of obtaining a connected network from a set of nodes. Clearly, in scenarios
where nodes maintain fixed positions, the connectivity will depend on node
density and connection range. Typically the simulations of static scenarios
that attempt to determine the connection probability of a number of nodes
located randomly in the simulation area introduce a random variable that
equals 1 when the network is connected and 0 otherwise. Thus, the average of
the said variable over the number of trials gives the Connectivity Probability
[2].
For nodes with mobility, time interval divisions are introduced and
defined thus:

[0, ∞) = T1+ ∪ T1− ∪ T2+ ∪ T2− ∪ ··· , (1)

where Ti+ (Ti−) denotes a time interval during which the network is connected (unconnected). The following function can then be introduced:

f(t) = 1 if t ∈ Ti+ for some i, and f(t) = 0 if t ∈ Ti− for some i. (2)

The Ti± time intervals can be considered to be randomly distributed, whereby the previously presented function turns into a stochastic process. Consequently, in dynamical environments, Connectivity Probability is defined as follows:
P+(t) = E[f(t)], (3)
where E[·] is the expected value, as long as it exists. It can be seen that P+ is time-dependent: P+ = P+(t). For stationary stochastic processes P+ = const. If the stationary process is ergodic, then (3) can be substituted by:

P+ = lim T→∞ (1/T) ∫0T f(t) dt. (4)
This equality is equivalent to:

P+ = lim T→∞ mes{t ∈ [0, T] : f(t) = 1}/T, (5)
where the mes function is used to measure the total length of the interval. The problem of whether the network is connected is
thus reduced to determining the existence and estimation of the expected
value (3), and if the mobility model is stationary and ergodic, (5) can be used
to estimate connectivity [2].

Dynamical Systems and Stochastic Processes


In a homogeneous network system where node capacity and properties are
equal among all, it can be reasonably assumed that it can be described by
a single system of differential equations, both for a single node and for all
of them. If some form of randomness is introduced to node movement, a
differential stochastic process will be needed. If, moreover, the stochastic
process is considered to be stationary, a system of autonomous differential
equations can be used where the right side of the equation does not explicitly
depend on time and where nodes differ only from their initial conditions [2].
In dynamical systems theory, a phase flow is defined as a group of changes
along the trajectory during a time interval. Dynamical systems are generated
by phase flows and can be described by differential equations as follows:

ẋ = v(x), x ∈ Π, (6)
where Π is the phase space, x is a set of coordinates in Π (usually position and speed) and the dot indicates differentiation with respect to time. Let n be the number of nodes and x1, x2,..., xn their phase coordinates; then these coordinates satisfy the following differential equation:

ẋi = v(xi), i = 1, 2,..., n. (7)

Thus the dynamics of the n nodes is completely defined by dynamical system (7), which is the direct product of the n copies of the original dynamical system (6). Its phase space Π̂ = Π × ··· × Π = (Π)^n is a direct product of the n copies of the initial phase space, and the phase coordinates x̂ = (x1,..., xn) are a set of coordinates of individual nodes. If system (6) has an invariant measure µ in Π, system (7) will also have an invariant measure in Π̂, and the direct product will be µ̂ = µ × ··· × µ. In the connectivity problem, phase space Π̂ can be divided into two domains D and D̄ thus: when x̂ ∈ D, all the nodes out of the existing n can communicate with each other. And when x̂ ∈ D̄, some nodes cannot be reached by some others. Following the approach from dynamical systems, the connectivity probability can be estimated as a time interval when x̂(t) ∈ D.
Estimating the connectivity measure can be significantly simplified if dynamical system (7) is ergodic in Π̂. By definition, a system is ergodic if the measure of any invariant subdomain of the phase space equals either zero or the measure of the entire space.
Let g(x̂) be a measurable and integrable function in Π̂; for all the solutions of ergodic system (6) there is:

lim T→∞ (1/T) ∫0T g(x̂(t)) dt = (1/mes Π̂) ∫Π̂ g(x̂) dµ̂, (8)

where

mes Π̂ = ∫Π̂ dµ̂ (9)

is the measure of the entire phase space [2].
Let f be a function characteristic of a measurable domain D:

f(x̂) = 1 if x̂ ∈ D, and f(x̂) = 0 otherwise. (10)
Since f is limited and D is measurable, f is integrable. In this case, the left side of (12) is equivalent to the time interval 0 ≤ t ≤ T when x̂(t) resides in the domain D.
Thus the Connectivity Probability of an ad-hoc mobile network will be
equivalent to the right side of (12):
P+ = mes D / mes Π̂. (11)
This approximation can be interpreted in terms of the theory of stochastic
processes in phase space Π̂. The probability for a system to reside in measurable domain D is determined by formula (11). Let f(x) be a function characteristic
of domain D and x (t) the solution of system (6). Thus the function f (x(t))
can be interpreted as a stochastic process. Let E[f(t)] be the expected value
of the function f(t) at time t. If the right side of Equation (6) is not time-
dependent, then the stochastic process is stationary. In particular, this means
that E[f(t)] does not depend on t. If the system is also ergodic, the expected
value can be calculated using formula (11):

E[f(t)] = mes D / mes Π̂. (12)
Therefore, the problem of calculating expected value (3) is reduced to a
geometric problem in which we must determine the volume of the domains
in a phase space if the process is ergodic [2].

Shannon Entropy: Measuring Uncertainty


Uncertainty, in particular the amount of conflict in the system, will be
measured using the Dempster-Shafer Theory of Evidence (DST). Functions
for estimating the conflict in a system using a probability distribution must
fulfill certain axiomatic requirements [3], namely:
Let fc be the estimator of the amount of conflict and p =〈 p1, p2, … , pn〉
the probability distribution, fc must fulfill:
• Expansibility: adding a 0 component to the probability distribution
does not modify the value of the uncertainty measure.
• Symmetry: the calculated uncertainty does not vary in relation to
the permutation of the arguments.
• Continuity: function fc is continuous for all p =〈 p1, p2, … , pn〉.
• Subadditivity: the uncertainty of the joint probability distribution is less than or equal to the sum of the uncertainties of the marginal distributions.
• Additivity: for any pair of marginal probability distributions that are non-interactive, the uncertainty of the associated joint distribution must be equal to the sum of the uncertainties of the marginal distributions.
• Monotonicity: uncertainty must increase if the number of
elements increases.
• Branching: let p =〈p1, p2, … , pn〉 be defined over X = {x1, x2, … , xn}. If X is split into blocks and the uncertainty is computed first between the blocks and then within each block, the combined value must again equal fc(p1, p2, … , pn).
• Normalization: to ensure uncertainty can be measured in bits, it
is required that:

fc(1/2, 1/2) = 1. (13)
Shannon Entropy will be the formal mechanism by which the conflict
will be estimated in this document. This measure of uncertainty stems from a
probability distribution obtained from observing the results of an experiment
or any other research mechanism. Probability distribution p has the form p =
〈p(x) ∣ x ∈ X〉 where X is the domain of discourse. Additionally, a decreasing
function in relation to incidence probability is defined, called anticipatory uncertainty, which must be a continuous, monotonically decreasing mapping, and be additive and normalized. This yields that the anticipatory uncertainty of a result x is −log2 p(x).
Thus, Shannon Entropy, which provides the expected value of the
anticipatory uncertainties for each element of the domain of discourse [3],
takes the following form:

S(p) = −Σx∈X p(x) log2 p(x). (14)
The normalized version of (14) takes the following form:

S(p) = −Σx∈X p(x) log2 p(x) / log2 |X|, (15)
and is the one used to calculate uncertainty in the simulations performed.
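As a small illustration (ours, not the authors'), the normalized entropy of (15) can be computed directly; note that for the two-outcome connected/unconnected case used later, the normalizing factor log2 of the number of outcomes equals 1, so the normalized and unnormalized values coincide:

```python
import numpy as np

def normalized_shannon_entropy(p):
    """S(p) = -sum p_i log2 p_i / log2 n, so that 0 <= S <= 1 (cf. (15))."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    if len(p) < 2:
        return 0.0
    return -np.sum(p * np.log2(p)) / np.log2(len(p))

print(normalized_shannon_entropy([0.5, 0.5]))   # 1.0: maximal uncertainty
print(normalized_shannon_entropy([0.9, 0.1]))   # ~0.47
print(normalized_shannon_entropy([1.0, 0.0]))   # 0.0: no uncertainty
```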

SIMULATION
In the following, we present an adjustment to the previously obtained theoretical
results, in order to reach a simulation method that is consistent with them. A
description of scenarios posed and results obtained will follow.

Adjustment of mes Π̂ and mes D


It is considered that the area where the computational model proposed for distributed calculations on ad-hoc mobile networks operates will be small: the work surface will be comparable to that of a university campus, governmental building or office [4]. This results in mes Π̂ being the total simulation surface and mes D the area where the nodes are in positions that keep the network connected. However, calculating mes Π̂ and mes D as previously proposed is an extremely complicated and laborious task [2].
For this reason, an alternative method is presented to determine first the
Connectivity Probability and later the uncertainty involved.
With the goal of validating the scenario put forth in previous sections,
the simulation will take place using a modified version of the Monte-Carlo
method, where the nodes will be initially located in random positions in such
a way that they will form a connected network. Their position will be updated
in each instance of the simulation, in accordance with the specifications
of the RWMM model [5], and afterwards the network connectivity will
be verified. Thus, with the calculation framed within the aforementioned
simulation process, the Connectivity Probability will be obtained by means of the M/N quotient, where N is the total number of simulations and M is the number of simulations in which the network was connected, provided N is large enough. Thus, mes Π̂ is N and mes D is M [2].

Results of the Simulation


Different ad-hoc mobile network topologies will have different Connectivity
Probability values, and, therefore, the Shannon Entropy will vary.
In RWMM [5], the nodes for each simulation stage will select a direction in which they will move randomly between (0, 2π], and the speed at which they will move will be the expected value uniformly distributed between the speeds of 1 m/s and 10 m/s (the rates at which we move by foot), which will equal 3.90 m/s. When a node reaches the edge of the simulation area,
it will rotate 180 degrees and will be placed again within the area, after
which the process will continue.

All the simulations begin with a connected network topology and a fixed connectivity radius within which a node can be connected to another. Then, stage after stage, the following operations will take place (a short code sketch of this loop is given below, after the entropy formula):
• For each node nodei of the ad-hoc mobile network, a direction diri
is randomly chosen between (0, 2π], and its position posi = (xi,
yi) is updated in accordance with newposi = posi + (diri × V × dt)
where newposi is the new position of nodei.
• Once all the node positions have been updated, for each node nodei
in position posi it is verified whether it can establish a connection
with another node nodej located within its connection radius.
This verification is performed by means of the calculation of the
Euclidean distance Disti,j between the two nodes, later checking whether Disti,j <= 2·RADIUS is fulfilled, where RADIUS is the
connection radius between the two nodes.
• If each of the nodes can establish a connection with all other
nodes in the network, the resulting network is still connected and
the connected topology incidence counter M is increased by one
unit.
• Once the N stages are completed, with N = 10^5 (this number is sufficient to achieve at least 99.999999995% confidence and a distance between the empirical and the real value of at most 0.01 using the Dvoretzky-Kiefer-Wolfowitz inequality [6], and to obtain adequate statistical guarantees), Connectivity Probability and Shannon Entropy emerge as a result of the simulation process. Two considerations must be emphasized regarding the calculation of these two measures:
Two considerations must be emphasized regarding the calculation
of these two measures:
– As previously mentioned, because the calculation of
Connectivity Probability is framed within the simulation
process, it can be obtained by means of the M/N quotient,
where N is the total number of simulation stages and M is
the number of times when the network was connected.
– Shannon Entropy is calculated by means of the following
formula:

S(p) = −[p log2 p + (1 − p) log2 (1 − p)], where p is the Connectivity Probability. (16)
As shown, S(p) is zero when p reaches the value zero or one; for any other value of p there will always be some uncertainty about the result. Another interesting fact is that the formula of Shannon Entropy which results from this is normalized, since:

S(1/2) = −[(1/2) log2 (1/2) + (1/2) log2 (1/2)] = 1. (17)
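As promised above, here is a minimal Python sketch of one such simulation run. It is our illustration rather than the authors' code: the node count, area side, connection radius, and stage count are example values; connectivity is checked over multihop paths with a breadth-first search, consistent with the definition of a connected network given earlier; and the initial placement and boundary handling are simplified.

```python
import numpy as np
from collections import deque

def is_connected(pos, radius):
    """Multihop connectivity: BFS over the graph whose edges join nodes within 2*radius."""
    n = len(pos)
    dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=2)
    adj = dist <= 2 * radius
    seen, queue = {0}, deque([0])
    while queue:
        i = queue.popleft()
        for j in np.nonzero(adj[i])[0]:
            if j not in seen:
                seen.add(j)
                queue.append(j)
    return len(seen) == n

def simulate(n_nodes=20, side=100.0, radius=15.0, v=3.90, dt=1.0, stages=2000, seed=0):
    rng = np.random.default_rng(seed)
    pos = rng.uniform(0, side, size=(n_nodes, 2))    # simplified initial placement
    M = 0
    for _ in range(stages):
        ang = rng.uniform(0, 2 * np.pi, n_nodes)     # RWMM: fresh random direction each stage
        pos += v * dt * np.column_stack((np.cos(ang), np.sin(ang)))
        pos = np.clip(pos, 0, side)                  # simplified boundary handling
        if is_connected(pos, radius):
            M += 1
    p = M / stages                                   # Connectivity Probability, the M/N quotient
    s = 0.0 if p in (0.0, 1.0) else -(p * np.log2(p) + (1 - p) * np.log2(1 - p))  # (16)
    return p, s

print(simulate())
```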
The results of the simulation scenarios are shown in Figure 1.

Figure 1. Simulation Results.

Correlation between Connectivity Probability and Shannon Entropy
The theoretical development of the first lines of this document and the
results of the simulations will allow the reader to sense the existence of a
relationship between Connectivity Probability and uncertainty in a system.
For this reason, the following section will analyze this relationship formally.
We can begin to study the relationship between Shannon Entropy and
Connectivity Probability by observing what happens when the first of the
two magnitudes reach its limit values, i.e., its minimum and maximum.
The first of these values, equal to zero, is registered when the Connectivity
Probability has reached either one or zero. This shows that when there is
no possibility to maintain connectivity or when the possibility is absolute,
uncertainty disappears. The maximum value of entropy is reached when the
probability takes medium values, which means that there is a state of total
uncertainty.
In both cases, the relationship between Shannon Entropy and
Connectivity Probability is evident, but for the analysis of the other cases,
correlation coefficients must be used.
The correlation coefficient that is most widely used is Pearson’s
coefficient (r):

r = s_ps / (s_p s_s), (18)

where

s_ps = (1/n) Σi (pi − p̄)(si − s̄), (19)

s_p = [(1/n) Σi (pi − p̄)²]^{1/2}, (20)

s_s = [(1/n) Σi (si − s̄)²]^{1/2}, (21)

p̄ = (1/n) Σi pi, (22)

s̄ = (1/n) Σi si, (23)
and the extreme values of its possible results are: 0 (no relationship) and ±1
(maximal relationship) [7]. The variables analyzed by means of this method,
p and s, must fulfill certain requirements. The two that are the most relevant
to this study are as follows:
• Variables p and s must be continuous.
• The relationship between p and s must be linear
Of the two previous conditions, the most relevant to our observations,
or restrictive of them, as the reader prefers, is 2, since it would require
the relationship between Connectivity Probability and Shannon Entropy
to be linear, which is not the case. Therefore, it is evident that Pearson’s
Coefficient cannot be used as proposed, since the relationship under analysis
is curvilinear. For this reason, this behavior will be analyzed dividing
variable p into two segments, which will result in two study groups: the first, which we will call g1, where p ∈ (0, 0.5), and the second, g2, where p ∈ [0.5, 1). Correlation will therefore be calculated separately in each group.
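To illustrate the grouped correlation analysis, here is a short Python sketch (ours; the (p, s) pairs are made-up illustrative values following the entropy curve, not the paper's simulation output):

```python
import numpy as np

def pearson_r(x, y):
    """Pearson's correlation coefficient between two samples."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    xc, yc = x - x.mean(), y - y.mean()
    return np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2))

# Hypothetical (p, s) pairs: entropy rises and then falls as connectivity grows.
p = np.array([0.05, 0.15, 0.30, 0.45, 0.55, 0.70, 0.85, 0.95])
s = np.array([0.29, 0.61, 0.88, 0.99, 0.99, 0.88, 0.61, 0.29])

g1 = p < 0.5                        # group g1: p in (0, 0.5)
print(pearson_r(p[g1], s[g1]))      # positive: direct dependence
print(pearson_r(p[~g1], s[~g1]))    # negative: inverse dependence
```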
The results obtained and detailed in Table 1, show that:
• g1 exhibits direct (positive) dependence between p and s, i.e., for
large values of p there will be large values of s.
• g2 shows that the relationship between p and s is an inverse (negative) dependence, i.e., for large values of p the values of s will be small.
Based on these results, we can conclude the following for MANETs that
operate on surfaces of:

• 50 m × 50 m: uncertainty will decrease as the amount of network


nodes increases, due to greater Connectivity Probability.
• 100 m × 100 m: first, if the amount of nodes varies between two
and eleven, starting from two and taking eleven as a maximum, as
more nodes are added to the network, uncertainty and probability
will increase together. Once the twelve node threshold is reached,
uncertainty will begin to decrease, while probability increases.
• 150 m × 150 m: this case is similar to the last, with the exception
that node intervals are displaced—when the amount of nodes
varies between two and twenty-eight, uncertainty and probability
will increase as nodes increase, and if the amount varies between
[29, 100], uncertainty will decrease while probability still
increases.
It is clear that the most interesting results are those registered in g2 for
all surfaces, as it is there that computations will have the greatest probability
to succeed with less uncertainty. However, the question remains as to what
amount of nodes and Connectivity Probability will bring a success certainty
high enough to begin computation. One valid criterion is to detect value vi
∈ [0.5,1] where Connectivity Probability and uncertainty are equal or close
enough and operate on the uncertainty interval between [0, vi]. Values vi for
the performed simulations are detailed in Table 2. Therefore, for surfaces of
50 m × 50 m, 100 m × 100 m and 150 m × 150 m, distributed calculations
will begin when 2, 14 and 34 nodes have been reached, respectively.

Table 1. Connectivity P/Shannon Entropy Correlation by Groups.

Table 2. Values of vi.



CONCLUSION
The development of this work duly evidenced and documented that the
uncertainty existing at the beginning of a distributed computation on a
MANET will depend directly on the amount of nodes participating in it and
on the surface involved. This statement is based on the results obtained from
the simulations detailed in this document, which allowed us to conclude
that uncertainty begins to decrease once node density has reached a certain
threshold, and that this threshold takes different values for different surfaces.
Works oriented towards correctly identifying the amount of uncertainty
existing at the time the results of a distributed calculation on ad-hoc mobile
networks are collected bring the potential benefit that they can be used
to develop more intelligent workload distribution strategies that take into
account the amount of uncertainty they will have to deal with, which will
necessarily result in more efficient computations. In this sense, and based
on the latest studies oriented towards providing more certain mechanisms as
to the conservation of power in the devices that comprise a MANET [8] or
on equally relevant studies focusing on achieving the greatest cooperation
possible between the nodes of an ad hoc mobile network [9], thus mitigating
their intrinsic egotism, the results of having an uncertainty measure that would
either indicate that there is no certainty to achieve calculation completion
or ensure its success will be twofold. In the first of the aforementioned
two fields of study, preventing workload distribution in situations where
calculation concretion is not ensured will have a direct repercussion in the
conservation of power in devices, which will result in longer operational
periods which unable to identify the aforementioned scenarios. The second
research field seeks to maximize cooperation among the nodes. With
this in mind, in scenarios where completion certainty is medium or low,
one possible distribution strategy oriented toward collaboration could be
assigning workload only to the most collaborative nodes, to avoid the risk
of assigning load to un-collaborative nodes, which, in the event of result
collection failure, may take a more selfish or conservationist attitude toward
their resources (such as power) and leave the MANET. In a scheme of
mobile distributed calculation where all participants offer their collaboration
to find the answer to a common interest problem, such as the SETI@
Home program [10], measuring uncertainty can be used as a function to
grant credit to collaborators—when a participant is notified that there is
a medium to high level of uncertainty regarding computation success and
they decide to participate nonetheless, more credits can be granted than in
scenarios where total certainty of success exists. If more credits mean more
benefits for the participant in some way, for example, publicity of the most
committed participant in the calculation environment, then we would have
a psychological mechanism of positive reinforcement that would promote
node collaboration, which would enable a network formed by more collaborative and satisfied participants that are less egotistic.
All potential strategies of distributed computing over MANETs
presented in this document and others that can emerge from an intelligent
use of uncertainty measures will bring with them new types of applications
that will seize all the power of the underlying network infrastructure.

REFERENCES
1. P. Mohapatra and S. Krishnamurthy, “Ad Hoc Networks Technologies and Protocols,” 2004.
2. T. K. Madsen, F. H. P. Fitzek, R. Prasad and G. Schulte, “Connectivity
Probability of Wireless Ad Hoc Networks: Definition, Evaluation,
Comparison,” Vol. 35, 2005, ISSN: 0929-6212, pp. 135-151.
3. C. S. Duque, “Medidas Para Estimar la Incertidumbre Basada en
Información Deficiente,” 2012.
4. F. Bai and A. Helmy, “A Survey of Mobility Models in Wireless Adhoc
Networks,” 2008.
5. C. Bettstetter, H. Hartenstein and X. Perez-Costa, “Stochastic
Properties of the Random Waypoint Mobility Model in ACM/Kluwer
Wireless Networks,” Special Issue on Modeling and Analysis of
Mobile Networks, Vol. 10, No. 5, 2004.
6. F. Xu, “Correlation-Aware Statistical Methods for Sampling-Based
Group by Estimates,” Doctoral Thesis, University of Florida, 2009.
7. P. M. Vallejo, Correlación y Covarianza. Revision, 30 de Octubre de
2007.
8. S. Prakash, J. P. Saini and S. C. Gupta, “A Vision of Energy Management
Schemes in Wireless Ad Hoc Networks,” ISSN: 2278-7844, 2012.
9. A. E. Hilal and A. B. Mackenzie, “Mitigating the Effect of Mobility
on Cooperation in Wireless Ad Hoc Networks,” 8th IEEE International
Conference on Wireless and Mobile Computing, Networking and
Communications (WiMob), 2012.
10. SETI@home. http://setiathome.berkeley.edu/
Chapter 17

ADVANCING SHANNON ENTROPY FOR MEASURING DIVERSITY IN SYSTEMS

R. Rajaram,1 B. Castellani,2 and A. N. Wilson3


1 Department of Mathematical Sciences, Kent State University, Kent, OH, USA
2 Department of Sociology, Kent State University, 3300 Lake Rd. West, Ashtabula, OH, USA
3 School of Social and Health Sciences, Abertay University, Dundee DD1 1HG, UK

ABSTRACT
From economic inequality and species diversity to power laws and the
analysis of multiple trends and trajectories, diversity within systems
is a major issue for science. Part of the challenge is measuring it.
Shannon entropy H has been used to rethink diversity within probability
distributions, based on the notion of information. However, there are

Citation: R. Rajaram, B. Castellani, A. N. Wilson, “Advancing Shannon Entropy for Measuring Diversity in Systems”, Complexity, vol. 2017, Article ID 8715605, 10 pages, 2017. https://doi.org/10.1155/2017/8715605.
Copyright: © 2017 by Authors. This is an open access article distributed under the
Creative Commons Attribution License, which permits unrestricted use, distribution,
and reproduction in any medium, provided the original work is properly cited.

two major limitations to Shannon’s approach. First, it cannot be used to


compare diversity distributions that have different levels of scale. Second,
it cannot be used to compare parts of diversity distributions to the whole.
To address these limitations, we introduce a renormalization of probability
distributions based on the notion of case-based entropy 𝐶𝑐 as a function of
the cumulative probability 𝑐. Given a probability density 𝑃(𝑥), 𝐶𝑐 measures the diversity of the distribution up to a cumulative probability of 𝑐, by computing the length or support of an equivalent uniform distribution that has the same Shannon information as the conditional distribution of 𝑃(𝑥) up to cumulative probability 𝑐. We illustrate the utility of our approach by
renormalizing and comparing three well-known energy distributions in
physics, namely, the Maxwell-Boltzmann, Bose-Einstein, and Fermi-Dirac
distributions for energy of subatomic particles. The comparison shows that
𝐶𝑐 is a vast improvement over H as it provides a scale-free comparison of
these diversity distributions and also allows for a comparison between parts
of these diversity distributions.

INTRODUCTION
Statistical distributions play an important role in any branch of science that
studies systems comprised of many similar or identical particles, objects,
or actors, whether material or immaterial, human or nonhuman. One of
the key features that determines the characteristics and range of potential
behaviors of such systems is the degree and distribution of diversity, that is,
the extent to which the components of the system occupy states with similar
or different features.
As Page outlined in a series of inquiries [1, 2], including The Difference
and Diversity and Complexity, diversity within systems is an important
concern for science, be it making sense of economic inequality, expanding
the trade portfolio of countries, measuring the collapse of species diversity
in various ecosystems, or determining the optimal utility/robustness of
a network. However, an important major challenge in the literature on
diversity and complexity, which Page also points out [1, 2], remains: the
issue of measurement. Although statistical distributions that directly reflect
the spread of key parameters (such as mass, age, wealth, or energy) provide
descriptions of this diversity, it can be difficult to compare the diversity
of different distributions or even the same distribution under different
conditions, mostly because of differences in scales and parameters. Also,
many of the measures currently available compress diversity into a single score or are not intuitive [1–4].
At the outset, motivated by examples of measuring diversity in ecology
and evolutionary biology from [3, 4], we sought to address these challenges.
We begin with some definitions and a review of our previous research.
First, in terms of definitions, we follow the ecological literature, defining
diversity as the interplay of “richness” and “evenness” in a probability
distribution. Richness refers to the number of different diversity types in a
system. Examples include (a) the different levels of household income in a
city, (b) the number of different species in an ecosystem, (c) the diversity
of a country’s exports, (d) the distribution of different nodes in a complex
network, (e) the various health trends for a particular disease across time/
space, or (f) the cultural or ethnic diversity of an organization or company.
In all such instances, the greater the number of diversity types (be these
types discrete or continuous), the greater the degree of richness in a system.
In the case of the current study, for example, richness was defined as the
number of different energy states.
In turn, evenness refers to the uniformity or “equiprobability” of
occurrence of such states. In terms of the above examples, evenness would
be defined as (a) a city where household income was evenly distributed,
(b) an ecosystem where the diversity of its species was equal in number,
(c) a country with an even distribution of exports, (d) a complex network
where all nodes had the same probability of occurrence, (e) a disease
where all possible health trends were equiprobable, or (f) a company or
organization where people of different cultural or ethnic backgrounds were
evenly distributed. In the case of the current study, for example, evenness
was defined as the uniformity or “equiprobability” of the occurrence of all
possible energy states.
More specifically, as we will see later in the paper, we define the diversity
of a probability distribution as the number of equivalent equiprobable types
required to maintain the same amount of Shannon entropy 𝐻 (i.e., the
number of Shannon-equivalent equiprobable states). Given such a definition,
a system with a high degree of richness and evenness would have a higher
degree of 𝐻, whereas a system with a low degree of richness and evenness
would have a low degree of 𝐻. In turn, a system with high richness but low
evenness (as in the case of a skewed-right system with long tail) would have
a lower degree of 𝐻 than a system with high richness and high evenness.

Purpose of the Current Study


Recently, we have introduced a novel approach to representing diversity
within statistical distributions [5, 6], which overcomes such difficulties
and allows the distribution of diversity in any given system (or cumulative
portions thereof) to be directly compared to the distribution of diversity
within any other system. In effect, it is a renormalization that can be applied
to any probability distribution to produce a direct representation of the
distribution of diversity within that distribution. Arising from our work in the
area of complex systems, the approach is based on the notion of case-based
entropy, 𝐶𝑐 [5]. This approach has two major advantages over the Shannon
Entropy 𝐻, which, as we alluded to above, is one of the most commonly used
measures of diversity within probability distributions and which calculates
the average amount of uncertainty (or information, depending on one’s
perspective) present in a given probability distribution. First, 𝐶𝑐 can be used
to compare distributions that have different levels of scale; and, second, 𝐶𝑐
can be used to compare parts of distributions to their whole.
After developing the concept and formalism for case-based entropy for
discrete distributions [5], we first applied it to compare complexity across
a range of complex systems [6]. In that work, we investigated a series of
systems described by a variety of skewed-right probability distributions,
choosing examples that are often suggested to exhibit behaviors indicative of
complexity such as emergent collectivity, phase changes, or tipping points.
What we found was that such systems obeyed an apparent “limiting law
of restricted diversity” [6], which constrains the majority of cases in these
complex systems to simpler types. In fact, for these types of distribution,
the distributions of diversity were found to follow a scale-free 60/40
rule, with 60% or more of cases belonging to the simplest 40% or less of
equiprobable diversity types. This was found to be the case regardless of
whether the original distribution fit a power law or was long-tailed, making
it fundamentally distinct from the well-known (but often misunderstood)
Pareto Principle [7].
In the following, we continue to explore the use of case-based entropy
in comparing systems described by statistical distributions. However, we
now go beyond our prior work in the following ways. First, we extend the
formalism in order to compute case-based entropy for continuous as well
as discrete distributions. Second, we broaden our focus from complexity/
complex systems to diversity in any type of statistically distributed system.

That is, we start to explore distributions of diversity for systems where richness is not a function of the degree of complexity types.
Third, the discrete indices we used had a degree of subjectivity to them,
for example, how should household income be binned and what influence
does that have on the distribution of diversity? As such, we wanted to see
how well 𝐶𝑐 worked for distributions where the unit of measurement was
universally agreed upon.
Fourth, we had not emphasized how 𝐶𝑐 was a major advance on
Shannon entropy 𝐻. As known, while 𝐻 has proven useful, it compresses
its measurement of diversity into a single number; it is also nonintuitive;
and, as we stated above, it is not scale-free and therefore cannot be used to
compare the diversity of different systems; neither can it be used to compare
parts of the diversity within a system to the entire system.
Hence, the purpose of the current study, as a demonstration of the utility
of 𝐶𝑐, is to renormalize and compare three physically significant energy
distributions in statistical physics: the energy probability density functions
for systems governed by Boltzmann, Bose-Einstein, and Fermi-Dirac
statistics.

RENORMALIZING PROBABILITY: CASE-BASED ENTROPY AND THE DISTRIBUTION OF DIVERSITY
The quantity case-based entropy [5], 𝐶𝑐, renormalizes the diversity contribution of any probability distribution 𝑃(𝑥), by computing the true diversity 𝐷 of an equiprobable distribution (called the Shannon-equivalent uniform distribution) that has the same Shannon entropy 𝐻 as 𝑃(𝑥). 𝐶𝑐
is precisely the number of equiprobable types in the case of a discrete
distribution, or the length, support, or extent of the variable in the case of
continuous distributions, which is required to keep the value of the Shannon
entropy the same across the whole or any part of the distribution up to a
cumulative probability 𝑐. We choose the Shannon-equivalent uniform
distribution for two reasons:
• First, it is well known that, on a finite measure space, the uniform
distribution maximizes entropy: that is, the uniform distribution
has the maximal entropy among all probability distributions on a
set of finite Lebesgue measures [8].
• Second, a Shannon-equivalent uniform distribution will, by
definition, count the number of values (or range of values) of 𝑥 that are required to give the same information as the original distribution 𝑃(𝑥) if we assume that all the values (or range of
values) are equally probable.
Hence, the uniform distribution renormalizes the effect of varying
relative frequencies (or probabilities) of occurrence of the values of 𝑥
without losing information (or entropy). In other words, if all choices of the
random variable are equally likely, the number of values (or the length, if
it is a continuous random variable) needed for the random variable to keep
the same amount of information as the given distribution is a measure of
diversity. In a sense, each new value (or type) is counted as adding to the
diversity, only if the new value has the same probability of occurrence as
the existing values. Diversity necessarily requires the values of the random
variable to be equiprobable since lower probability, for example, means
that such values occur rarely in the random variable and hence cannot be
treated as equally diverse as other values with higher probabilities. Hence,
by choosing an equiprobable (or uniform) distribution for normalization,
we are counting the true diversity, that is, the number of equiprobable types
that are required to match the same amount of Shannon information 𝐻 as the
given distribution.
This calculation (as we have shown elsewhere [5]) can be done for
parts of the distribution up to a cumulative probability of 𝑐. This means that
a comparison of 𝐶𝑐 for a variety of distributions is actually a comparison
of the variation of the fraction of diversity 𝐶𝑐 contributed by values of the
random variable up to 𝑐.
Since, regardless of the scale and units of the original distribution, 𝑐 and
𝐶𝑐 both vary from 0 to 1, one can plot a curve for 𝐶𝑐 versus 𝑐 for multiple
distributions on the same axes. 𝐶𝑐 thus provides us with a scale-free measure
to compare distributions without omitting any of the entropy information,
but by renormalizing the variable to one that has equiprobable values. What
is more, it also allows us to compare different parts of the same distribution,
or parts to wholes. That is, we can generate a 𝐶𝑐 versus 𝑐 curve for any part
of a distribution (normalizing the probabilities to add up to 1 in that part)
and compare the 𝐶𝑐 curve of the part to the 𝐶𝑐 curve of the whole or another
part to see if the functional dependence of 𝐶𝑐 on 𝑐 is the same or different.
In essence, 𝐶𝑐 has the ability to compare distributions in a “fractal” or self-
similar way.
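To make this renormalization concrete, the following minimal sketch (Python/NumPy; not part of the original study, and the helper names true_diversity and case_based_entropy_discrete are ours) computes 𝐷 = exp(𝐻) and the part-to-whole ratio for a discrete distribution whose types are listed in order:

import numpy as np

def true_diversity(p):
    """Shannon entropy H and true diversity D = exp(H) of a discrete distribution p."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                        # convention: 0 * ln 0 = 0
    H = -np.sum(p * np.log(p))
    return H, np.exp(H)

def case_based_entropy_discrete(p, c):
    """Fraction of the total diversity contributed by the leading types whose
    cumulative probability does not exceed c (types assumed already ordered)."""
    p = np.asarray(p, dtype=float)
    part = p[np.cumsum(p) <= c + 1e-12]
    part = part / part.sum()            # conditional distribution of the part
    return true_diversity(part)[1] / true_diversity(p)[1]

# Four equiprobable types: H = ln 4, so D = 4 equiprobable types, as expected.
print(true_diversity([0.25, 0.25, 0.25, 0.25]))
# Same richness (four types), but a skewed distribution: D falls well below 4.
print(true_diversity([0.70, 0.15, 0.10, 0.05]))
# Share of the total diversity carried by the first 70% of the cases.
print(case_based_entropy_discrete([0.70, 0.15, 0.10, 0.05], 0.70))

Because the ratio is taken against the same Shannon-equivalent yardstick, the printed fraction can be compared directly across distributions with different numbers of types or different scales.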
In [5], we showed how to carry out the renormalization for discrete
probability distributions, both mathematical and empirical. In this paper, as
we stated in the Introduction, we make the case for how 𝐶𝑐 constitutes an
advance over 𝐻, in terms of providing a scale-free comparison of probability
distributions and also comparisons between parts of distributions. More
importantly, we demonstrate how 𝐶𝑐 works for continuous distributions,
by examining the Maxwell-Boltzmann, Bose-Einstein, and Fermi-Dirac
distributions for energy of subatomic particles. We begin with a more
detailed review of 𝐶𝑐.

CASE-BASED ENTROPY OF A CONTINUOUS RANDOM VARIABLE


Our impetus for making an advance over the Shannon entropy 𝐻 comes
from the study of diversity in evolutionary biology and ecology, where
it is employed to measure the true diversity of species (types) in a given
ecological system of study [3, 4, 9, 10]. As we show here, it can also be
used to measure the diversity of an arbitrary probability distribution of a
continuous random variable.
Given the probability density function 𝑝(𝑥) of a random variable 𝑥 in a
measure space 𝑋, the Shannon-Wiener entropy index 𝐻 is given by

𝐻 = −∫𝑋 𝑝(𝑥) ln 𝑝(𝑥) d𝑥 (1)
The problem, however, with the Shannon entropy index 𝐻, as we identified
in our abstract and Introduction, is that while being useful for studying the
diversity of a single system, it cannot be used to compare the diversity
across probability distributions. In other words, 𝐻 is not multiplicative: a
doubling of value for 𝐻 does not mean that the actual diversity has doubled.
To address this problem, we turned to the true diversity measure 𝐷 [3, 11,
12], which gives the range of equiprobable values of 𝑥 that gives the same
value of 𝐻:

𝐷 = exp(𝐻) = exp(−∫𝑋 𝑝(𝑥) ln 𝑝(𝑥) d𝑥) (2)
The utility of 𝐷 for comparing the diversity across probability
distributions is that, in 𝐷, a doubling of the value means that the number
of equiprobable ranges of values of 𝑥 has doubled as well. 𝐷 calculates
the range of such equiprobable values of 𝑥 that will give the same value of
Shannon entropy 𝐻 as observed in the distribution of 𝑥. We say that two
probability densities 𝑝1(𝑥) and 𝑝2(x) are Shannon-equivalent if they have
the same value of Shannon entropy. Case-based entropy is then the range of
values of 𝑥 for the Shannon-equivalent uniform distribution for 𝑝(𝑥). We also
note that Shannon entropy can be recomputed from 𝐷 by using 𝐻 = ln(𝐷).
In order to measure the distribution of diversity, we next need to
determine the fractional contribution to overall diversity up to a cumulative
probability 𝑐. In other words, we need to be able to compute the diversity
contribution 𝐷𝑐 up to a certain cumulative probability 𝑐. To do so, we
replace 𝐻 with 𝐻𝑐, the conditional entropy, given that only the portion of
the distribution up to a cumulative probability 𝑐 (denoted by 𝑋𝑐) is observed,
with conditional probability of occurrence given by the density 𝑝(𝑥)/𝑐 up to the
given cumulative probability 𝑐. That is,

𝐻𝑐 = −∫𝑋𝑐 (𝑝(𝑥)/𝑐) ln(𝑝(𝑥)/𝑐) d𝑥, 𝐷𝑐 = exp(𝐻𝑐) (3)
The value of 𝐷𝑐 for a given value of cumulative probability 𝑐 is the
number of Shannon-equivalent equiprobable energy states (or of values
of the variable in the 𝑥-axis in general) that are required to explain the
information up to a cumulative probability of 𝑐 within the distribution. If
𝑐 = 1, then 𝐷𝑐 = 𝐷 is the number of such Shannon-equivalent equiprobable
energy states for the entire distribution itself.
We can then simply calculate the fractional diversity contribution or
case-based entropy as

𝐶𝑐 = 𝐷𝑐/𝐷 (4)
It is at this point that the renormalization (𝐶𝑐 as a function of 𝑐) becomes
scale independent as both axes range between values of 0 and 1 with the
graph of 𝐶𝑐 versus 𝑐 passing through (0, 0) and (1, 1). Hence, irrespective
of the range and scale of the original distributions, all distributions can be
plotted on the same graph and their diversity contributions can be compared
in a scale-free manner.
To check the validity of our formalism, we calculate 𝐷𝑐 for the simple
case of a uniform distribution given by 𝑝(𝑥) = (1/𝐿)𝟙[0,𝐿](𝑥) on the interval 𝑋 = [0,
𝐿]. Intuitively, if we choose 𝑋𝑐 = [0, 𝑐], then, owing to the uniformity of the
distribution, we expect 𝐷𝑐 = 𝑐 itself. In other words, the diversity of the part
[0, 𝑐] is simply equal to 𝑐, that is, the length of the interval [0, 𝑐], and hence
the 𝐶𝑐 versus 𝑐 curve will simply be the straight line with slope equal to 1.
This can be shown as follows:

𝐻𝑐 = −∫[0,𝑐] (1/𝑐) ln(1/𝑐) d𝑥 = ln 𝑐, 𝐷𝑐 = exp(𝐻𝑐) = 𝑐, 𝐶𝑐 = 𝐷𝑐/𝐷 = 𝑐/𝐿 (5)
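The same check can be run numerically. The sketch below (Python with SciPy; an illustration under our own naming, not code from the original paper) computes 𝑐 and 𝐶𝑐 by direct numerical integration for an arbitrary continuous density and confirms that a uniform density reproduces the identity line:

import numpy as np
from scipy.integrate import quad

def cc_curve(pdf, a, b, ks):
    """Return (c, C_c) pairs for a density pdf on [a, b], at the upper limits ks."""
    def entropy(hi, norm):
        f = lambda x: 0.0 if pdf(x) <= 0 else -(pdf(x) / norm) * np.log(pdf(x) / norm)
        return quad(f, a, hi, limit=200)[0]
    D = np.exp(entropy(b, 1.0))                 # true diversity of the whole distribution
    cs, Ccs = [], []
    for k in ks:
        c = quad(pdf, a, k, limit=200)[0]       # cumulative probability up to k
        cs.append(c)
        Ccs.append(np.exp(entropy(k, c)) / D)   # C_c = D_c / D
    return np.array(cs), np.array(Ccs)

# Uniform density on [0, L]: the C_c versus c curve should be the line C_c = c.
L = 4.0
uniform = lambda x: 1.0 / L if 0.0 <= x <= L else 0.0
c, Cc = cc_curve(uniform, 0.0, L, np.linspace(0.4, L, 8))
print(np.allclose(c, Cc, atol=1e-6))            # expected: True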
With our formulation of 𝐶𝑐 complete, we turn to the energy distributions
for particles governed by Boltzmann, Bose-Einstein, and Fermi-Dirac
statistics.

RESULTS

𝐶𝑐 for the Boltzmann Distribution in One Dimension


We first illustrate our renormalization by applying it to a relatively simple
case: that of an ideal gas at temperature 𝑇. The kinetic energies 𝐸 of particles
in such a gas are described by the Boltzmann distribution [8]. In one
dimension, this is

𝑝𝐵,1𝐷(𝐸) = 𝛽 exp(−𝛽𝐸), 𝐸 ≥ 0 (6)
where 𝑘𝐵 is the Boltzmann constant and 𝛽 = (1/𝑘𝐵𝑇).
The entropy of 𝑝𝐵,1𝐷(𝐸) can be shown to be 𝐻𝐵 = 1− ln(𝛽), and hence the
true diversity of energy in the range [0,∞) is given by
𝐷 = exp(𝐻𝐵) = exp(1 − ln 𝛽) = e/𝛽 (7)
The cumulative probability 𝑐 from 𝐸 = 0 to 𝐸 = 𝑘 is then given by

𝑐 = ∫[0,𝑘] 𝛽 exp(−𝛽𝐸) d𝐸 = 1 − exp(−𝛽𝑘) (8)
Hence, 𝑘 can be computed in terms of 𝑐 as

𝑘 = −(1/𝛽) ln(1 − 𝑐) (9)
Equation (9) is useful for the one-dimensional Boltzmann case to
eliminate the parameter 𝑘 altogether in (11) to obtain an explicit relationship
between 𝐶𝑐 and 𝑐. It is to be noted that, in most cases, both 𝐶𝑐 and 𝑐 can
only be parametrically related through 𝑘. The other quantities introduced in
Section 3 can then be calculated as follows:

𝐻𝑐 = −∫[0,𝑘] (𝛽 exp(−𝛽𝐸)/𝑐) ln(𝛽 exp(−𝛽𝐸)/𝑐) d𝐸 = ln(𝑐/𝛽) + [1 − exp(−𝛽𝑘)(1 + 𝛽𝑘)]/𝑐 (10)

𝐷𝑐 = exp(𝐻𝑐) = (𝑐/𝛽) exp([1 − exp(−𝛽𝑘)(1 + 𝛽𝑘)]/𝑐) (11)

𝐶𝑐 = 𝐷𝑐/𝐷 = (𝑐/e) exp([1 − exp(−𝛽𝑘)(1 + 𝛽𝑘)]/𝑐) (12)

𝐶𝑐 = (𝑐/e) exp([1 − (1 − 𝑐)(1 − ln(1 − 𝑐))]/𝑐) (13)
We note that, in (13), the temperature factor 𝛽 cancels out, indicating that
the distribution of diversity for an ideal gas in one dimension is independent
of temperature. The resulting graph of 𝐶𝑐 as a function of 𝑐 is shown in Figure
1. It is worth noting in passing that 𝐶𝑐 reaches 40% when 𝑐 ≈ 69%, indicating
that approximately 69% of the molecules in the gas are contained within the
lower 40% of diversity of energy probability states at all temperatures (here,
diversity is defined as the number of equivalent equiprobable energy states
required to maintain the same amount of Shannon entropy 𝐻). Thus, the one-
dimensional Boltzmann distribution obeys an interesting phenomenon that
we have identified in a wide range of skewed-right complex systems, which
(as we briefly discussed in the Introduction) we call restricted diversity and,
more technically, the 60/40 rule [6]. The independence of temperature in
the 𝐶𝑐 versus 𝑐 curve, for the Boltzmann distribution, shows that the effect
of increasing 𝑇 is to shift the mean of the distribution to higher energies
and to increase its standard deviation, but not to change its characteristic
shape. Still, what is key to our results is that the temperature independence
of the 𝐶𝑐 curve for the Boltzmann distribution in one dimension validates
that our renormalization preserves the fundamental features of the original
distribution.
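The 69%/40% point is easy to reproduce numerically. A short sketch (Python; ours, using the temperature-free closed form discussed above) is:

import numpy as np

def cc_boltzmann_1d(c):
    """C_c as a function of c for p(E) = beta * exp(-beta * E); beta cancels out."""
    c = np.asarray(c, dtype=float)
    r = 1.0 - c
    inner = (1.0 - r * (1.0 - np.log(r))) / c   # conditional-entropy term (in nats)
    return (c / np.e) * np.exp(inner)

c = np.linspace(0.01, 0.999, 500)
Cc = cc_boltzmann_1d(c)
# Cumulative probability at which 40% of the diversity is reached: about 0.69.
print(np.interp(0.40, Cc, c))

The absence of 𝛽 anywhere in the function is the computational counterpart of the temperature independence noted above.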

Figure 1. 𝐶𝑐 as a function of 𝑐 for the Boltzmann distribution in one dimension.


𝐶𝑐 for the Boltzmann Distribution in Three Dimensions


We now turn to the calculation of 𝐶𝑐 for the physically more important case
of the Boltzmann distribution in three dimensions [8]:

𝑝𝐵,3𝐷(𝐸) = (2/√𝜋) 𝛽^(3/2) √𝐸 exp(−𝛽𝐸) (14)
where the additional factor of √𝐸 accounts for the density of states.
The cumulative probability 𝑐 from 𝐸 = 0 to 𝐸 = 𝑘 can be computed as
follows:

𝑐 = ∫[0,𝑘] (2/√𝜋) 𝛽^(3/2) √𝐸 exp(−𝛽𝐸) d𝐸 = erf(√(𝛽𝑘)) − (2/√𝜋) √(𝛽𝑘) exp(−𝛽𝑘) (15)
As we would hope, (15) has the property that as 𝑘 → ∞, the cumulative
probability 𝑐 → 1.
However, it is difficult to solve (15) for 𝑘 directly in terms of 𝑐. We
therefore compute 𝐶𝑐 in parametric form with 𝑘 being the parameter. Also,
analytical forms are not possible, so Matlab was used to compute 𝐻𝑐, 𝐷𝑐, and
𝐶𝑐, respectively:

𝐻𝑐 = −∫[0,𝑘] (𝑝𝐵,3𝐷(𝐸)/𝑐) ln(𝑝𝐵,3𝐷(𝐸)/𝑐) d𝐸, 𝐷𝑐 = exp(𝐻𝑐), 𝐶𝑐 = 𝐷𝑐/𝐷 (16)
Thus, 𝐶𝑐 can also only be computed in parametric form with parameter
𝑘 that varies from 0 to ∞. Figure 2 shows the 𝐶𝑐 curve thus calculated for the
Boltzmann distribution in three dimensions.
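A sketch of this parametric computation (Python with SciPy; our illustration rather than the original Matlab code, with the cutoff and grid choices ours) is given below. Because 𝐶𝑐 is invariant under a rescaling of the energy axis, the same (𝑐, 𝐶𝑐) points are produced at every temperature when the upper limits 𝑘 are taken on the same grid in units of 𝑘𝐵𝑇:

import numpy as np
from scipy.integrate import quad

kB = 1.380649e-23   # Boltzmann constant, J/K

def cc_boltzmann_3d(T, ks):
    """Parametric (c, C_c) points for the 3D Boltzmann energy density at temperature T."""
    beta = 1.0 / (kB * T)
    pdf = lambda E: 2.0 * np.sqrt(beta * E / np.pi) * beta * np.exp(-beta * E)
    def entropy(hi, norm):
        f = lambda E: -(pdf(E) / norm) * np.log(pdf(E) / norm)
        return quad(f, 0.0, hi, limit=200)[0]
    Emax = 60.0 / beta                          # effective infinity: exp(-60) is negligible
    D = np.exp(entropy(Emax, 1.0))
    pts = []
    for k in ks:
        c = quad(pdf, 0.0, k, limit=200)[0]
        pts.append((round(c, 3), round(np.exp(entropy(k, c)) / D, 3)))
    return pts

# The printed rows should coincide across temperatures.
for T in (50.0, 500.0, 5000.0):
    ks = np.array([0.25, 0.5, 1.0, 2.0, 4.0, 8.0]) * kB * T
    print(T, cc_boltzmann_3d(T, ks))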
Figure 2. 𝐶𝑐 versus 𝑐 for Boltzmann 3D superimposed at three different tem-
peratures: 𝑇 = 50 K, 500 K, and 5000 K.
Although the temperature independence of this distribution is not
immediately evident from Figure 2, one would, following the same logic
as for the one-dimensional case, expect the distribution of diversity to be
the same for all 𝑇. That is, as in the one-dimensional case, because changes
in 𝑇 do not affect the original distribution's characteristic shape, we expect
the renormalized distribution to be independent of temperature. This does,
indeed, turn out to be the case. This is illustrated in Figure 2, which overlays
the results of the calculations for 𝑇 = 50 K, 500 K, and 5000 K. It is also worth
noting that, just like our one-dimensional case, the curve obeys the 60/40
rule of restricted diversity [6]: regardless of temperature, over 60 percent of
molecules are in the lower 40 percent of diversity of energy probability states
(here again, diversity is defined as the number of equivalent equiprobable
energy states required to maintain the same amount of Shannon entropy 𝐻).
In addition, it is worth noting that, as we might expect, adding more
degrees of freedom increases the average energy by (1/2)𝑘𝐵𝑇 per degree of
freedom while maintaining the same shape for the distribution of energy.
Hence, the current result will still hold true for gas molecules with higher
degrees of freedom; that is, the distribution of diversity is always exactly the
same for an ideal gas, whether monoatomic or polyatomic.

The Bose-Einstein Distributions for Massive and Massless Bosons


We now move on to consider the second of our example distributions. The
Bose-Einstein distribution gives the energy probability density function for
massive bosons above the Bose temperature 𝑇𝐵 as

(17)
where 𝐶 is a normalization constant and

(18)
where 𝜁 is the Riemann zeta function. In the following calculations, we use
the Bose temperature for helium, 𝑇𝐵 = 3.14 K.
For massless bosons such as photons, the energy probability density
function is [13]

𝑝(𝐸) = 𝐶𝐸^2/(exp(𝛽𝐸) − 1) (19)
It is important to note that the “density of states” factors shown in (17)
and (19) result in different energy distributions, despite the two types of
boson obeying the same statistics.
The conditional probabilities, conditional entropies, true diversities, and
case-based entropies for these distributions cannot be calculated analytically
but can be calculated numerically. The results of such calculations, using the
software Matlab, are shown in Figure 3.
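For reference, the corresponding numerical setup for the massless-boson (photon) case can be sketched as follows (Python with SciPy; our illustration, not the original Matlab code, with the density normalized numerically rather than analytically):

import numpy as np
from scipy.integrate import quad

kB = 1.380649e-23   # J/K
T = 500.0
beta = 1.0 / (kB * T)

unnorm = lambda E: E**2 / np.expm1(beta * E)    # Planck-type density, up to a constant
Emax = 60.0 / beta                              # effective infinity for the integrals
Z = quad(unnorm, 0.0, Emax, limit=200)[0]
pdf = lambda E: unnorm(E) / Z

def entropy(hi, norm):
    f = lambda E: -(pdf(E) / norm) * np.log(pdf(E) / norm)
    return quad(f, 0.0, hi, limit=200)[0]

D = np.exp(entropy(Emax, 1.0))
for k in np.array([0.5, 1.0, 2.0, 4.0, 8.0, 16.0]) / beta:
    c = quad(pdf, 0.0, k, limit=200)[0]
    print(round(c, 3), round(np.exp(entropy(k, c)) / D, 3))

The helium-4 curve is produced in the same way once a normalized form of (17) is supplied for the chosen temperature.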
Figure 3. 𝐶𝑐 versus 𝑐 for Helium-4 and for photons. Note: the results of calcula-
tions carried out at 𝑇 = 50 K, 500 K, and 5000 K are overlaid.
As with the Boltzmann distributions, we find that the distributions
of diversity for the two boson systems are independent of temperature.
Although the curves for the two types of boson are very similar, it is evident
that the distributions of diversity do differ to some extent. For helium-4
bosons, a slightly larger fraction of particles are contained in lower diversity
energy states than is the case for photons, with 60% of atoms contained
in approximately the lowest 37% of diversity states, as compared to
approximately 42% for photons. In other words, using 𝐶𝑐, we are able to
identify, even in such instances where intuition might suggest it to be true,
common patterns within and across these different energy systems, as
well as their variations. With this point made, we move to our final energy
distribution.

The Fermi-Dirac Distribution


The final distribution we use to illustrate our approach is the Fermi-Dirac
distribution:

𝑝𝐹𝐷(𝐸) = 𝐶√𝐸/(exp(𝛽(𝐸 − 𝜇)) + 1) (20)
where 𝐶 is again a normalization constant and 𝜇 is the Fermi energy [13]. In
the following, we calculate distributions for sodium electrons, for which 𝜇 =
3.4 eV. Once again, 𝐻𝑐, 𝐷𝑐, and 𝐶𝑐 cannot be calculated analytically, and
so we rely on numerical calculations using Matlab.
The Fermi-Dirac distribution differs from the previous examples in that
it is not simply scaled by changes in energy. Instead, its shape changes,
transforming from a skewed-left distribution, with a sharp cut-off at the
Fermi energy at low temperatures, to a smooth, skewed-right distribution
at high temperatures. Thus, unlike the situation for Boltzmann and Bose-
Einstein distributions, one would expect the distributions of diversity for
fermions such as electrons to be dependent on temperature. Figure 4 compares
the results of calculating 𝐶𝑐 as a function of 𝑐 for electrons in sodium at
temperatures of 2.7 K (the temperature of space), 300 K (representing
temperatures on earth), 6000 K (the temperature of the surface of the sun),
and 15 × 10⁶ K (the temperature of the core of the sun).
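The corresponding computation for fermions can be sketched as follows (Python with SciPy; our illustration rather than the paper's Matlab code; the value 𝜇 = 3.4 eV is taken from the text, while the cutoff and grid choices are ours):

import numpy as np
from scipy.integrate import quad
from scipy.special import expit

kB_eV = 8.617333e-5     # Boltzmann constant in eV/K
mu = 3.4                # Fermi energy of sodium, in eV (value quoted in the text)

def cc_fermi_dirac(T, ks):
    """(c, C_c) points for p(E) proportional to sqrt(E)/(exp((E - mu)/(kB_eV*T)) + 1)."""
    kT = kB_eV * T
    unnorm = lambda E: np.sqrt(E) * expit(-(E - mu) / kT)   # expit avoids overflow
    Emax = mu + 60.0 * kT                                   # effective infinity
    Z = quad(unnorm, 0.0, Emax, points=[mu], limit=300)[0]
    pdf = lambda E: unnorm(E) / Z
    def entropy(hi, norm):
        f = lambda E: 0.0 if pdf(E) <= 0 else -(pdf(E) / norm) * np.log(pdf(E) / norm)
        return quad(f, 0.0, hi, points=[mu] if hi > mu else None, limit=300)[0]
    D = np.exp(entropy(Emax, 1.0))
    pts = []
    for k in ks:
        c = quad(pdf, 0.0, k, points=[mu] if k > mu else None, limit=300)[0]
        pts.append((round(c, 3), round(np.exp(entropy(k, c)) / D, 3)))
    return pts

# Unlike the Boltzmann and Bose-Einstein cases, the rows now differ with temperature.
for T in (2.7, 300.0, 6000.0, 15e6):
    scale = max(mu, kB_eV * T)
    print(T, cc_fermi_dirac(T, np.linspace(0.5, 6.0, 5) * scale))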

Figure 4. Diversity curves for sodium electrons at a range of temperatures with
𝐶𝑐 on the 𝑥-axis and 𝑐 on the 𝑦-axis.
This figure shows that the degree of diversity is the highest for fermions
at low temperatures; for example, at 2.7 K, fully 70% of the lowest
equiprobable diversity states are needed to contain 60% of the particles,
compared with only approximately 38% at 15 × 10⁶ K. It also shows that,
for sodium electrons, the diversity curve at normal temperatures on earth
(300 K) is almost identical to that at very low temperatures. That is, a room
temperature Fermi gas of sodium electrons has a distribution of diversity
very similar to that of a “Fermi condensate.”

USING 𝐶𝑐 TO COMPARE AND CONTRAST SYSTEMS


With our renormalization complete for all three distributions, we sought next
to demonstrate, albeit somewhat superficially, the utility of 𝐶𝑐 for comparing
and contrasting systems, given how widely known the results are for these
three classic energy distributions. To begin with, it is usual to assume that, in
the limit of high 𝑇, both Bose-Einstein and Fermi-Dirac distributions reduce
to Boltzmann distributions, and so the physical properties of both bosons
and fermions in this limit should be those of an ideal gas.
In Figures 5 and 6, we show a comparison of all three energy distributions
for temperatures of 6000 K and 15 × 10⁶ K (the Bose-Einstein distribution for
massless bosons is included for comparison). In these figures, it appears that,
by 6000 K, the Bose-Einstein distribution for helium-4 is indistinguishable
from the 3D Boltzmann distribution. Also, while the Fermi-Dirac distribution
has clearly not reduced to the Boltzmann distribution even at 15 × 10⁶ K, it
appears to be trending towards it.

Figure 5. Energy density curves.


Figure 6. 𝐶𝑐 versus 𝑐 curves.


However, comparison of the diversity distributions suggests that even
when the energy probability density functions appear to coincide, significant
physical differences remain between the systems. Figure 7 compares all the
diversity curves calculated in the present work.

Figure 7. Superposition of all diversity curves for Boltzmann 1D, Boltzmann
3D, Bose-Einstein Helium, Bose-Einstein Photon, and Fermi-Dirac Na at 2.7 K,
300 K, 6000 K, and 15000000 K.
It is clear from Figure 7 that the distributions of diversity for a classical
ideal gas and for both Bose-Einstein and Fermi-Dirac distributions are
significantly different. Because these renormalized distributions are
independent of temperature, this suggests that there is no limit in which
the Bose-Einstein distribution for the photon becomes completely
indistinguishable from the Boltzmann distribution. Even more strikingly,
the distribution of diversity in a system obeying Fermi-Dirac statistics only
approaches that of bosonic systems at extremely high temperatures, similar
to those at the core of the sun. At lower temperatures, the Fermi gas has
substantially higher degrees of diversity than all the other systems. This is
because, at lower temperatures, most of the fermions are yet to surpass the
barrier created by the Fermi energy and hence are all restricted to the lower
end of the energy range.
Thus, the transformation from the usual probability distribution to a
distribution of case-based entropy (𝐶𝑐 versus 𝑐) has allowed us to make
direct scale-free comparisons, of the ways in which the Maxwell-Boltzmann,
Bose-Einstein, and Fermi-Dirac energy distributions are similar or differ
both internally (as a function of temperature 𝑇) and across distributions.
It appears that, except for extremely high temperatures, the Fermi-Dirac
distribution has a larger value of 𝐶𝑐 than the others. This means that there
are a larger number of Shannon-equivalent equiprobable states of energy
for the Fermi-Dirac distribution as compared to the others. A speculative
explanation could be that Pauli’s exclusion principle does not allow for more
than one fermion to occupy the same quantum state, thereby restricting the
accumulation of fermions in the same state (i.e., more diversity).

CONCLUSION
As we have hopefully shown in this paper, while Shannon entropy 𝐻 has
been used to rethink probability distributions in terms of diversity, it suffers
from two major limitations. First, it cannot be used to compare distributions
that have different levels of scale. Second, it cannot be used to compare
parts of distributions to the whole.
To address these limitations, we introduced a renormalization of
probability distributions based on the notion of case-based entropy 𝐶𝑐 (as
a function of the cumulative probability 𝑐). We began with an explanation
of why we rethink probability distributions in terms of diversity, based on
a Shannon-equivalent uniform distribution, which comes from the work of
Jost and others on the notion of true diversity in ecology and evolutionary
biology [4, 9, 10]. With this approach established, we then reviewed our
construction of case-based entropy 𝐶𝑐. Given a probability density 𝑃(𝑥), 𝐶𝑐
measures the diversity of the distribution up to a cumulative probability of
𝑐, by computing the length or support of an equivalent uniform distribution
that has the same Shannon information as the conditional distribution of 𝑃(𝑥)
up to a cumulative probability 𝑐.
With our conceptualization of 𝐶𝑐 complete, we used it to renormalize and
compare three physically significant energy distributions in physics, namely,
the Maxwell-Boltzmann, Bose-Einstein, and Fermi-Dirac distributions for
energy of subatomic particles. We chose these three distributions for three
key reasons: (1) we wanted to see if 𝐶𝑐 works for continuous distributions;
(2) the focus was on diversity of types and not on their rank order
in terms of complexity; and (3) the unit of measure was both
objective and widely accepted. Based on our results, we concluded that 𝐶𝑐
is a vast improvement over 𝐻 as it provides an intuitively useful, scale-free
comparison of probability distributions and also allows for a comparison
between parts of distributions as well.
The renormalization obtained will have a different shape for different
distributions. In fact, a bimodal, right skewed, or other kinds of distributions
will lead to a different 𝐶𝑐 versus 𝑐 curve. There are two interesting points of
inquiry in future papers, namely, (a) how the shape of the original distribution
influences the 𝐶𝑐 versus 𝑐 curve and (b) whether we can reconstruct the
original shape of the distribution given the 𝐶𝑐 versus 𝑐 curve. Because of
the scale-free nature of 𝐶𝑐, all distributions can be compared in the same
plot without reference to their original scales. In our future work, we will
endeavor to connect the shape of the 𝐶𝑐 versus 𝑐 curve to the shape of the
original distribution. This will allow us to locate portions of the original
distribution (irrespective of their scale), where diversity is concentrated, and
portions where it is sparse, even though the original distributions cannot be
plotted on the same graph due to huge variation in their scales.

ACKNOWLEDGMENTS
The authors would like to thank the following colleagues at Kent State
University: (1) Dean Susan Stocker, (2) Kevin Acierno and Michael Ball
(Computer Services), and (3) the Complexity in Health and Infrastructure
Group for their support. They also wish to thank Emma Uprichard and
David Byrne and the ESRC Seminar Series on Complexity and Method in
the Social Sciences (Centre for Interdisciplinary Methodologies, University
of Warwick, UK) for the chance to work through the initial framing of these
ideas.

REFERENCES
1. S. E. Page, The Difference: How the Power of Diversity Creates Better
Groups, Firms, Schools, and Societies, Princeton University Press,
2008.
2. S. E. Page, Diversity and Complexity, Princeton University Press,
2008.
3. M. O. Hill, “Diversity and evenness: a unifying notation and its
consequences,” Ecology, vol. 54, no. 2, pp. 427–432, 1973.
4. L. Jost, “Entropy and diversity,” Oikos, vol. 113, no. 2, pp. 363–375,
2006.
5. R. Rajaram and B. Castellani, “An entropy based measure for comparing
distributions of complexity,” Physica A. Statistical Mechanics and Its
Applications, vol. 453, pp. 35–43, 2016.
6. B. Castellani and R. Rajaram, “Past the power law: complex systems
and the limiting law of restricted diversity,” Complexity, vol. 21, no. 2,
pp. 99–112, 2016.
7. M. E. J. Newman, “Power laws, Pareto distributions and Zipf’s law,”
Contemporary Physics, vol. 46, no. 5, pp. 323–351, 2005.
8. M. C. Mackey, Time’s Arrow: The Origins of Thermodynamic Behavior,
Springer Verlag, Germany, 1992.
9. T. Leinster and C. A. Cobbold, “Measuring diversity: the importance
of species similarity,” Ecology, vol. 93, no. 3, pp. 477–489, 2012.
10. J. Beck and W. Schwanghart, “Comparing measures of species
diversity from incomplete inventories: an update,” Methods in Ecology
and Evolution, vol. 1, no. 1, pp. 38–44, 2010.
11. R. H. Macarthur, “Patterns of species diversity,” Biological Reviews,
vol. 40, pp. 510–533, 1965.
12. R. Peet, “The measurement of species diversity,” Annual Review of
Ecological Systems, vol. 5, pp. 285–307, 1974.
13. C. H. Tien and J. H. Lienhard, Statistical Thermodynamics, Hemisphere,
1979.
INDEX

A B
acknowledgments (ACK) 192 BAB (block array builder) 86
adaptive modulation and coding backward error correction (BEC)
(AMC) 192 188
additive white Gaussian noise base station (BS) 149, 150, 198
(AWGN) 14, 158, 165, 167, Base switching (BS) 273
169, 188 base-switching transformation
Airborne Visible Infrared Imaging (BST) algorithm 297
Spectrometer (AVIRIS) 236 BBM (Bialynicki-Birula and My-
antenna array elements 149, 150, cielski) 37
151, 158, 159 Beam Pattern Scanning (BPS) 149
arithmetic coding 290, 291, 298, Big Data 58
299, 301, 304, 305 binary erasure channel (BECH) 188
ARM (Advanced RISC Machines) binary symmetrical channel (BSC)
78 188
Artificial Intelligence 57, 58, 72 bit error rate (BER) 183
ASIC (application-specific integrat- bits back with asymmetric numeral
ed circuit) 83 systems (BB-ANS) 120
augmented reality (AR) video Boltzmann distribution 369, 371,
streams 196 372, 377, 379
Automatic repeat request (ARQ) Boltzmann law 315, 316
191 Bose, Chaudhuri and Hocquenghem
autonomous differential equations (BCH) 212, 217
348 Breast cancer 270, 286
autonomous nodes 346 BWT (Burrows-Wheeler transform)
81
C 374, 375, 376, 378, 379, 380,


381
case-based entropy 362, 364, 365,
diversity multiplexing tradeoff
368, 379
(DMT) 195
Cauchy functional equation 336
Dynamical systems 348
channel state information (CSI) 190
classical thermodynamics 309 E
Clausius entropy 315
edge-directed prediction (EDP) 119
Clustering 58, 66
entropic uncertainty relations
Coding redundancy 213
(EURs) 24
coding theorem 45, 49, 50, 53, 55
Entropy 309, 310, 314, 315, 324,
Cognitive radio (CR) 3
332
communication signal 347
Equal Gain Combining (EGC) 162,
communication theory 310
164
compression ratio (CR) 117
ergodic system 349
computational complexity 119
error image 272, 279, 280, 281, 282,
Computational Intelligence (CI) 59
283, 284
computer aided diagnosis (CAD)
270 F
Context-Based Adaptive Linear Pre-
Federal Communications Commis-
diction (CoBALP+) 119
sion (FCC) 4
Convolutional Neural Network
Fermi-Dirac distributions 362, 367,
(CNN) 116
377, 378, 380
correlation coefficient 355
fifth-generation (5G) 197
cumulative distribution function
firmware 212
(CDF) 201
fixed block based (FBB) 273
cumulative probability 362, 365,
forward error correction (FEC) 188
366, 368, 370, 372, 379
FPGA (field-programmable gate ar-
D ray) 83
Free Lossless Image Format (FLIF)
DCT (discrete cosine transform) al-
117
gorithm 77
Dempster-Shafer Theory of Evi- G
dence (DST) 345, 350
Gel’fand Pinsker (GP) 4
digital images 290
General Artificial Intelligence (GAI)
Discrete Cosine Transform (DCT)
58
116
Gradient Adjusted Predictor (GAP)
diversity 361, 362, 363, 364, 365,
119
366, 367, 368, 369, 370, 373,
Gradient edge detection (GED) 119 L


H least-square (LS) 117
Lempel-Ziv-Welch (LZW) 212, 217
Haar transform 116
Liouville-von Neumann equation
Hadamard transform 116
27, 29, 38
half power beam width (HPBW)
Lossless image compression sys-
151
tems 272
hardware 212
Lossy image compression 212
HD (high-definition) camera 76
lossy (or erasure) channel 188
Heisenberg uncertainty relation
low-density generator matrix
(HUR) 23
(LDGM) 195
Hölder inequality 48, 51, 53
low-density parity check code
homogeneous network system 348
(LDPC) 193
Huffman coding 290, 305
Luby Transform (LT) 193
human visual system (HVS) 234
Hyperspectral images (HSI) 233, M
234
Machine Learning 57, 58
I Mammography 269, 270, 271, 272,
281
image compression 77, 83, 113
Markov networks 120
image processing 290
Matching Link Builder (MLB) 84
Information Gathering 58
mathematical statistics 310, 315
Information theory 24
Mathematics 58
interference channel with degraded
Maximal Ratio Combining (MRC)
message sets (IC-DMS) 4
153
Interpixel redundancy 213
maximum likelihood (ML) decoding
IoT (Internet of Things) 75, 76
182
IR (Industrial Revolution) 75, 76
Maxwell equations 26
J measurement 362, 365, 381
median edge detector (MED) 216,
Jenson’s inequality 47
290, 291
K Median edge detector (MED) 119
memory 212, 213
Karhunen-Loeve Transform (KLT) meta-adaptive near-zero integer
116 arithmetic coding\” (MANI-
Karhunen-Loève transform (KLT) AC) 120
238 MIAS (Mammography Image Anal-
Kraft inequality 49, 50 ysis Society) images 272
Minimum Mean Square Error Orthogonal space-time block coding


(MMSE) 119 (OSTBC) 176
MLB (matching link builder) 86
P
mobile ad-hoc networks (MANETs)
345 Partial Precision Matching Method
Mobile computing 346 (PPMM) 214
Moment Generating Function peak-signal-to-noise ratio (PSNR)
(MGF) 182 234
MTF (move-to-front) 81 pilot contamination (PC) 188
multi-carrier code division multiple Portable Network Graphics (PNG)
access, MC-CDMA 160 117
Multi-Level Progressive Method prediction by partial matching
(MLP) 214 (PPM) 120
multimedia broadcast/multicast ser- primary user (PU) 4
vice (MBMS) 196 primary user’s (PU’s) message 17
multimedia broadcast multicast sys- principal component analysis (PCA)
tem (MBMS) 196 233
Multiple Access channel (MAC) 12 probability 309, 310, 311, 312, 313,
Multiple Access Interference (MAI) 314, 315, 316, 318, 320, 321,
167 322, 323, 324, 325, 328, 329,
Multiple Input Multiple Output 330
(MIMO) 16 probability distribution 333, 334,
multiple-input multiple-output 336, 338, 339
(MIMO) wireless communi- Psychology 58
cation systems 187 Psychovisual redundancy 213
multiple-tables arithmetic coding
Q
(MTAC) 290, 291
multiuser interference (MUI) 200 quadratic form 179, 180
quality of service (QoS) wireless
N
services 196
Neurology 58 quantum information theory 24
quantum mechanics 23, 24, 34, 35,
O
42
open systems interconnection (OSI)
R
191
opportunistic communication 6, 13 radio frequency (RF) 4
Optimization 58 RAM (random access memory) 77
orthogonal frequency division mul-
tiplexing (OFDM) 160
randomness 348 space-time trellis coding (STTC)


random trial 310, 311, 312, 313, 149
314, 320, 323 Space-Time Trellis Coding (STTC)
random variable 312, 313, 317, 318, information symbols 150
320, 321, 322, 323, 324, 325, Spatial Division Multiple Access
326, 329 (SDMA) 151
rapid development 290 split band (SB) 235
rateless codes (RCs) 187 Statistical distributions 362
rateless space-time block code statistical mechanics 310, 318
(RSTBC) 187, 198, 199, 202 stochastic process 347, 348, 350
Recurrent Neural Network (RNN) SVD (singular value decomposition)
116 algorithm 77
recursive least squares (RLS) 119 symbol-error-rate (SER) 200
Reflective Optics System Imaging
T
Spectrometer (ROSIS) 236
Renyi’s entropy 47, 49 thermodynamic 309, 311, 315, 316,
Residual Neural Network (RestNet) 329
116 time-dependent Hamiltonian sys-
RLE (run-length encoding) 81 tems 27
trajectory 348
S
transmitter (TX) 4
Schrödinger equation 28 Tsallis entropy 310, 320, 323, 329
Shannon inequality 45, 47, 54 Two Modules Based Algorithm
Shannon’s entropy 47 (TMBA) 215
Shannon’s entropy equation 291
U
signal-to-interference-and-noise ra-
tio (SINR) 188 ultrawideband (UWB) 5
signal-to-noise power ratio (SNR) 6
V
signal-to-noise ratio (SNR) 177
single-input single-output (SISO) variable block size
195 segmentation(VBSS) 273
singular value decomposition 116 vertical Bell Labs layered space-
Slant transform 116 time (V-BLAST) 195
small memory 76 Visual Saliency-based Index (VSI)
space-time block coding (STBC) 130
149 VM (virtual memory) 80, 106
Space-Time Coding (STC) 159
W wireless fading channel 188


WSN (wireless sensor network) 76
wavelet transform 238
wearable body area network X
(WBAN) 196
XML (Extensible Markup Lan-
Wehrl entropy 29
guage) 90
