INFORMATION AND CODING
THEORY IN COMPUTER SCIENCE
Edited by:
Zoran Gacovski
Arcler Press
www.arclerpress.com
Information and Coding Theory in Computer Science
Zoran Gacovski
Arcler Press
224 Shoreacres Road
Burlington, ON L7L 2H2
Canada
www.arclerpress.com
Email: [email protected]
This book contains information obtained from highly regarded resources. Reprinted
material sources are indicated. Copyright for individual articles remains with the authors as indicated and published under a Creative Commons License. A wide variety of
references are listed. Reasonable efforts have been made to publish reliable data and
views articulated in the chapters are those of the individual contributors, and not neces-
sarily those of the editors or publishers. Editors or publishers are not responsible for
the accuracy of the information in the published chapters or consequences of their use.
The publisher assumes no responsibility for any damage or grievance to the persons or
property arising out of the use of any materials, instructions, methods or thoughts in the
book. The editors and the publisher have attempted to trace the copyright holders of all
material reproduced in this publication and apologize to copyright holders if permission
has not been obtained. If any copyright holder has not been acknowledged, please write
to us so we may rectify.
Notice: Registered trademarks of products or corporate names are used only for explanation and identification without intent to infringe.
Arcler Press publishes a wide variety of books and eBooks. For more information about Arcler Press and its products, visit our website at www.arclerpress.com
DECLARATION
Some content or chapters in this book are open-access, copyright-free published research works, published under a Creative Commons License and indicated with a citation. We are thankful to the publishers and authors of this content and these chapters, as without them this book would not have been possible.
ABOUT THE EDITOR
Dr. Zoran Gacovski is currently a full professor at the Faculty of Technical Sciences, “Mother Tereza” University, Skopje, Macedonia. His teaching subjects include software engineering and intelligent systems, and his areas of research are information systems, intelligent control, machine learning, graphical models (Petri, neural and Bayesian networks), and human-computer interaction. Prof. Gacovski earned his PhD degree at the Faculty of Electrical Engineering, UKIM, Skopje. In his career he was awarded a Fulbright postdoctoral fellowship (2002) for a research stay at Rutgers University, USA. He has also earned a best-paper award at the Baltic Olympiad for Automation Control (2002), a US NSF grant for research in the field of human-computer interaction at Rutgers University, USA (2003), and DAAD grants for research stays at the University of Bremen, Germany (2008 and 2012). The projects in which he took an active part include: “A multimodal human-computer interaction and modelling of the user behaviour” (for Rutgers University, 2002-2003), sponsored by the US Army and Ford; “Development and implementation of algorithms for guidance, navigation and control of mobile objects” (for the Military Academy – Skopje, 1999-2002); and “Analytical and non-analytical intelligent systems for deciding and control of uncertain complex processes” (for the Macedonian Ministry of Science, 1995-1998). He is the author of 3 books (including the international edition “Mobile Robots”), 20 journal papers, and over 40 conference papers, and he is also a reviewer/editor for IEEE journals and conferences.
TABLE OF CONTENTS
List of Contributors........................................................................................xv
List of Abbreviations..................................................................................... xxi
Preface................................................................................................... ....xxiii
Related Work on Lossless Compression.................................................. 119
Our Proposed Method............................................................................ 121
Experiments............................................................................................ 129
Conclusions............................................................................................ 142
Acknowledgments.................................................................................. 143
References.............................................................................................. 144
Motivation to Rateless Space-Time Coding.............................................. 197
Rateless Space-Time Block Code for Massive MIMO Systems................. 197
Conclusion............................................................................................. 202
References.............................................................................................. 204
Chapter 13 Lossless Image Compression Based
on Multiple-Tables Arithmetic Coding................................................... 289
Abstract.................................................................................................. 290
Introduction............................................................................................ 290
The MTAC Method................................................................................. 291
Experiments............................................................................................ 300
Conclusions............................................................................................ 304
References.............................................................................................. 305
Conclusion............................................................................................. 358
References.............................................................................................. 360
Index...................................................................................................... 383
LIST OF CONTRIBUTORS
F. G. Awan
University of Engineering and Technology Lahore, 54890 Pakistan
N. M. Sheikh
University of Engineering and Technology Lahore, 54890 Pakistan
M. F. Hanif
University of the Punjab Quaid-e-Azam Campus 54590, Lahore Pakistan
Litegebe Wondie
Department of Mathematics, College of Natural and Computational Science, University
of Gondar, Gondar, Ethiopia
Satish Kumar
Department of Mathematics, College of Natural and Computational Science, University
of Gondar, Gondar, Ethiopia
Daniel Kovach
Kovach Technologies, San Jose, CA, USA
Qin Jiancheng
School of Electronic and Information Engineering, South China University of
Technology, Guangdong, China
Lu Yiqin
School of Electronic and Information Engineering, South China University of
Technology, Guangdong, China
Zhong Yu
Zhaoqing Branch, China Telecom Co., Ltd., Guangdong, China
School of Software, South China University of Technology, Guangdong, China
Jungan Chen
Department of Electronic and Computer Science, Zhejiang Wanli University, Ningbo,
China
Jean Jiang
College of Technology, Purdue University Northwest, Indiana, USA
Xinnian Guo
Department of Electronic Information Engineering, Huaiyin Institute of Technology,
Huaian, China
Lizhe Tan
Department of Electrical and Computer Engineering, Purdue University Northwest,
Indiana, USA
Lei Wang
School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an,
China.
Zhigang Chen
School of Electronics and Information Engineering, Xi’an Jiaotong University, Xi’an,
China.
Ali Alqahtani
College of Applied Engineering, King Saud University, Riyadh, Saudi Arabia
A. Alarabeyyat
Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied
University, Salt, Jordan
S. Al-Hashemi
Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied
University, Salt, Jordan
T. Khdour
Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied
University, Salt, Jordan
M. Hjouj Btoush
Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied
University, Salt, Jordan
S. Bani-Ahmad
Prince Abdulah Bin Gazi Faculty of Information Technology, Al-Balqa Applied
University, Salt, Jordan
R. Al-Hashemi
The Computer Information Systems Department, College of Information Technology,
Al-Hussein Bin Talal University, Ma’an, Jordan.
Chiman Kwan
Applied Research LLC, Rockville, Maryland, USA
Jude Larkin
Applied Research LLC, Rockville, Maryland, USA
Shivaprakash Koliwad
Department of E&C., MCE, Hassan, Karnataka, India.
Rung-Ching Chen
Department of Information Management, Chaoyang University of Technology, No.
168, Jifong E. Rd., Wufong Township Taichung County 41349, Taiwan
Pei-Yan Pai
Department of Computer Science, National Tsing-Hua University, No. 101, Section 2,
Kuang-Fu Road, Hsinchu 300, Taiwan
Yung-Kuan Chan
Department of Management Information Systems, National Chung Hsing University,
250 Kuo-kuang Road, Taichung 402, Taiwan
Chin-Chen Chang
Department of Computer Science, National Tsing-Hua University, No. 101, Section 2,
Kuang-Fu Road, Hsinchu 300, Taiwan
Department of Information Engineering and Computer Science, Feng Chia University,
No. 100 Wenhwa Rd., Seatwen, Taichung 407, Taiwan
Vladimír Majerník
Mathematical Institute, Slovak Academy of Sciences, Bratislava, Slovakia
C. G. Chakrabarti
Department of Applied Mathematics, University of Calcutta, India
Indranil Chakrabarty
Department of Mathematics, Heritage Institute of Technology, Chowbaga Road,
Anandapur, India
Luís Marrone
LINTI, Facultad de Informatica UNLP, La Plata, Argentina
R. Rajaram
Department of Mathematical Sciences, Kent State University, Kent, OH, USA
B. Castellani
Department of Sociology, Kent State University, 3300 Lake Rd. West, Ashtabula, OH,
USA
A. N. Wilson
School of Social and Health Sciences, Abertay University, Dundee DD1 1HG, UK
LIST OF ABBREVIATIONS
5G Fifth-Generation
ACK Acknowledgments
AMC Adaptive Modulation And Coding
AR Augmented Reality
ARM Advanced RISC Machines
ARQ Automatic Repeat reQuest
ASIC Application-Specific Integrated Circuit
AVIRIS Airborne Visible Infrared Imaging Spectrometer
AWGN Additive White Gaussian Noise
BAB Block Array Builder
BB-ANS Bits Back With Asymmetric Numeral Systems
BBM Bialynicki-Birula and Mycielski
BCH Bose, Chaudhuri and Hocquenghem
BEC Backward Error Correction
BECH Binary Erasure Channel
BER Bit Error Rate
BPS Beam Pattern Scanning
BS Base Station
BS Base switching
BSC Binary Symmetrical Channel
BST Base-Switching Transformation
BWT Burrows-Wheeler Transform
CAD Computer Aided Diagnosis
CDF Cumulative Distribution Function
CI Computational Intelligence
CNN Convolutional Neural Network
CoBALP+ Context-Based Adaptive Linear Prediction
CR Cognitive Radio
CR Compression Ratio
CSI Channel State Information
DCT Discrete Cosine Transform
DMT Diversity Multiplexing Tradeoff
DST Dempster-Shafer Theory of Evidence
EDP Edge-Directed Prediction
EGC Equal Gain Combining
EURs Entropic Uncertainty Relations
FBB Fixed Block Based
FCC Federal Communications Commission
FEC Forward Error Correction
FLIF Free Lossless Image Format
FPGA Field-Programmable Gate Array
GAI General Artificial Intelligence
GAP Gradient Adjusted Predictor
GED Gradient Edge Detection
GP Gel’fand Pinsker
HD High-Definition
HIS Hyperspectral Images
HPBW Half Power Beam Width
HUR Heisenberg Uncertainty Relation
HVS Human Visual Systems
IC-DMS Interference Channel with Degraded Message Sets
IoT Internet of Things
IR Industrial Revolution
KLT Karhunen-Loeve Transform
LDGM Low-Density Generator Matrix
LDPC Low-Density Parity Check Code
LS Least-Square
LT Luby Transform
LZW Lempel-Ziv-Welch
MAC Multiple Access Channel
MAI Multiple Access Interference
MANETs Mobile Ad-Hoc Networks
MANIAC Meta-Adaptive Near-Zero Integer Arithmetic Coding
MBMS Multimedia Broadcast Multicast System
MC-CDMA Multi-Carrier Code Division Multiple Access
MED Median Edge Detector
MGF Moment Generating Function
MIAS Mammography Image Analysis
MIMO Multiple Input Multiple Output
ML Maximum Likelihood
MLB Matching Link Builder
MLP Multi-Level Progressive
MMSE Minimum Mean Square Error
MRC Maximal Ratio Combining
MTAC Multiple-Tables Arithmetic Coding
MTF Move-To-Front
MUI Multiuser Interference
OFDM Orthogonal Frequency Division Multiplexing
OSI Open Systems Interconnection
OSTBC Orthogonal Space-Time Block Coding
PC Pilot Contamination
PCA Principal Component Analysis
PNG Portable Network Graphics
PPM Prediction By Partial Matching
PPMM Partial Precision Matching Method
PSNR Peak-Signal-To-Noise Ratio
PU Primary User
QoS Quality of Service
RAM Random Access Memory
RCs Rateless Codes
RestNet Residual Neural Network
RF Radio Frequency
RLE Run-Length Encoding
RLS Recursive Least Squares
RNN Recurrent Neural Network
ROSIS Reflective Optics System Imaging Spectrometer
RSTBC Rateless Space-Time Block Code
SB Split Band
SDMA Spatial Division Multiple Access
SER Symbol-Error-Rate
SINR Signal-To-Interference-And-Noise Ratio
SISO Single-Input Single-Output
SNR Signal-To-Noise Ratio
STBC Space-Time Block Coding
STC Space-Time Coding
STTC Space-Time Trellis Coding
SVD Singular Value Decomposition
TMBA Two Modules Based Algorithm
UWB Ultrawideband
V-BLAST Vertical Bell Labs Layered Space-Time
VBSS Variable Block Size Segmentation
VM Virtual Memory
VSI Visual Saliency-based Index
WBAN Wearable Body Area Network
WSN Wireless Sensor Network
XML Extensible Markup Language
PREFACE
Coding theory is a field that studies codes, their properties, and their
suitability for specific applications. Codes are used for data compression,
cryptography, error detection and correction, data transfer and data storage.
Codes are studied in a variety of scientific disciplines - such as information
theory, electrical engineering, mathematics, linguistics, and computer science -
in order to design efficient and reliable data transmission methods. This usually involves removing redundant digits and detecting/correcting errors in the transmitted data.
There are four types of coding:
• Data compression (or, source coding)
• Error control (or channel coding)
• Cryptographic coding
• Line coding
Data compression tries to remove redundant data from a source in order to
transfer it as efficiently as possible. For example, Zip data compression
makes data files smaller for purposes such as reducing Internet traffic. Data
compression and error correction can be studied in combination.
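As a minimal illustration of source coding in practice, the following sketch uses Python's standard zlib module (a DEFLATE implementation from the same family of algorithms used in Zip) to compress a deliberately redundant byte string; the sample data and the reported ratio are illustrative only.

```python
import zlib

# A highly redundant source: the same short phrase repeated many times.
data = b"information and coding theory " * 200

compressed = zlib.compress(data, 9)      # 9 = strongest compression level
restored = zlib.decompress(compressed)

assert restored == data                  # lossless: the original is fully recovered
print(f"original: {len(data)} bytes, compressed: {len(compressed)} bytes")
print(f"compression ratio: {len(data) / len(compressed):.1f}:1")
```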
Error correction adds extra bits to make data transmission more robust to
interference that occurs on the transmission channel. The average user may not
be aware of the many types of applications that use error correction. A typical
music CD uses the Reed-Solomon code to correct the impact of scratches and
dust. In this application, the transmission channel is the CD itself. Mobile
phones also use coding techniques to correct attenuation and high frequency
transmission noise. Data modems, telephone transmissions, and NASA’s Deep
Space Network all use channel coding techniques to transmit bits, such as turbo
code and LDPC codes.
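As a concrete, self-contained illustration of how added redundancy enables correction, the sketch below implements a Hamming(7,4) code, a simple stand-in for (not an implementation of) the Reed-Solomon, turbo, and LDPC codes mentioned above: every four message bits gain three parity bits, and any single flipped bit in the resulting 7-bit block can be located and corrected.

```python
import numpy as np

# Generator and parity-check matrices of the Hamming(7,4) code (systematic form).
G = np.array([[1, 0, 0, 0, 1, 1, 0],
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[1, 1, 0, 1, 1, 0, 0],
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def encode(msg4):
    """Map 4 message bits to a 7-bit codeword (mod-2 arithmetic)."""
    return np.dot(msg4, G) % 2

def decode(word7):
    """Correct at most one flipped bit, then return the 4 message bits."""
    syndrome = np.dot(H, word7) % 2
    if syndrome.any():
        # A nonzero syndrome equals the column of H at the error position.
        for pos in range(7):
            if np.array_equal(H[:, pos], syndrome):
                word7 = word7.copy()
                word7[pos] ^= 1
                break
    return word7[:4]

msg = np.array([1, 0, 1, 1])
codeword = encode(msg)
received = codeword.copy()
received[2] ^= 1                       # channel flips one bit
print("decoded:", decode(received))    # recovers [1 0 1 1]
```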
This edition covers different topics from information theory methods and
approaches, block and stream coding, lossless data compression, and information
and Shannon entropy.
Section 1 focuses on information theory methods and approaches, describing
information theory of cognitive radio system, information theory and entropies
for quantized optical waves in complex time-varying media, some inequalities
in information theory using Tsallis entropy, and computational theory of
intelligence: information entropy.
Section 2 focuses on block and stream coding, describing block-split array coding
algorithm for long-stream data compression, bit-error aware lossless image
compression with 2d-layer-block coding, beam pattern scanning (BPS) versus
space-time block coding (STBC) and space-time trellis coding (STTC), partial
feedback based orthogonal space-time block coding with flexible feedback bits,
and rate-less space-time block codes for 5G wireless communication systems.
Section 3 focuses on lossless data compression, describing lossless image
compression technique using combination methods, new results in perceptually
lossless compression of hyperspectral images, lossless compression of digital
mammography using base switching method, and lossless image compression
based on multiple-tables arithmetic coding.
Section 4 focuses on information and Shannon entropy, describing entropy as
universal concept in sciences, Shannon entropy - axiomatic characterization
and application, Shannon entropy in distributed scientific calculations on
mobiles ad-hoc networks (MANETs), the computational theory of intelligence:
information entropy, and advancing Shannon entropy for measuring diversity
in systems.
SECTION 1: INFORMATION THEORY
METHODS AND APPROACHES
Chapter 1
INFORMATION THEORY OF COGNITIVE RADIO SYSTEM
INTRODUCTION
Cognitive radio (CR) carries bright prospects for very efficient utilization
of spectrum in future. Since cognitive radio is still in its early days, many
of its theoretical limits are yet to be explored. In particular, determination
of its maximum information rates for the most general case is still an open
problem. To date, many cognitive channel models have been presented, and either achievable or maximum bit rates have been evaluated for each of them. This chapter summarizes the existing results, makes a comparison between the different channel models, and draws useful conclusions.
The scarcity of the radio frequency (RF) spectrum, along with its severe underutilization as suggested by various government bodies like the Federal Communications Commission (FCC) in the USA and Ofcom in the UK in reports such as [1] and [2], has triggered immense research activity on the concept
of CR all over the world. Of the many facets that need to be dealt with,
the information theoretic modeling of CR is of core importance, as it helps
predict the fundamental limit of its maximum reliable data transmission.
The information theoretic model proposed in [3] represents the real world
scenario that the CR will have to encounter in the presence of primary user
(PU) devices. Authors in [3] characterize the CR system as an interference
channel with degraded message sets (IC-DMS), since the spectrum
sensing nature of the CR may enable its transmitter (TX) to know PU’s
message provided the PU is in close proximity of the CR. Elegantly using a
combination of rate-splitting [4] and Gel’fand Pinsker (GP) [5] coding, [3]
has given an achievable rate region of the so called CR-channel or IC-DMS.
Further, in [3] time sharing is performed between the two extreme cases
when either the CR dedicates zero power (“highly polite”) or full power
(“highly rude”) to its message. A complete review of information theoretic
studies can be found in [6] and [7].
This chapter then discusses outer-bounds to the individual rates and
the conditions under which these bounds become tight for the symmetric
Gaussian CR channel in the low interference gain regime. The CR transmitter
is assumed to use dirty paper coding while deriving the outer-bounds. The
capacity of the CR channel in the low interference scenario is known when the CR employs the “polite” approach by devoting some portion of its power to transmitting the PU’s message; however, this approach does not guarantee any quality of service for the CR users. Then, we will focus on the scenario when the CR goes for the “rude” approach, i.e., does not relay the PU’s message and tries to maximize its own rates only. It will be shown that when both the CR and the PU operate in the
own rates only. It will be derived that when both CR and the PU operate in
low interference gain regime, then treating interference as additive noise at
the PU receiver and doing dirty paper coding at the CR is nearly optimal.
Underlay Paradigm
The underlay paradigm encompasses techniques that allow communication
by the cognitive radio assuming it has knowledge of the interference
caused by its transmitter to the receivers of all noncognitive users [7].
In this setting the cognitive radio is often called a secondary user which
cannot significantly interfere with the communication of existing (typically
licensed) users, who are referred to as primary users. Specifically, the
underlay paradigm mandates that concurrent noncognitive and cognitive
transmissions may occur only if the interference generated by the cognitive
devices at the noncognitive receivers is below some acceptable threshold.
The interference constraint for the noncognitive users may be met by using
multiple antennas to guide the cognitive signals away from the noncognitive
receivers, or by using a wide bandwidth over which the cognitive signal
can be spread below the noise floor, then despread at the cognitive receiver.
The latter technique is the basis of both spread spectrum and ultrawideband
(UWB) communications.
The interference caused by a cognitive transmitter to a noncognitive
receiver can be approximated via reciprocity if the cognitive transmitter can
overhear a transmission from the cognitive receiver’s location. Alternatively,
the cognitive transmitter can be very conservative in its output power to
ensure that its signal remains below the prescribed interference threshold.
In this case, since the interference constraints in underlay systems are
typically quite restrictive, this limits the cognitive users to short range
communications.
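A back-of-the-envelope sketch of this conservative power setting, assuming a simple distance-power-law path-loss model; the threshold, reference gain, and path-loss exponent below are illustrative assumptions rather than values taken from the chapter.

```python
def max_underlay_power(i_threshold_w, distance_m, path_loss_exp=3.5, ref_gain=1e-3):
    """Largest transmit power (W) keeping received interference below the threshold.

    Received interference is modeled as P_tx * ref_gain * distance**(-path_loss_exp);
    both the model and the numbers are illustrative assumptions, not from the text.
    """
    channel_gain = ref_gain * distance_m ** (-path_loss_exp)
    return i_threshold_w / channel_gain

# Example: a -90 dBm interference cap at a primary receiver 200 m away.
i_th = 1e-12  # -90 dBm expressed in watts
print(f"allowed transmit power: {max_underlay_power(i_th, 200.0):.3e} W")
```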
While the underlay paradigm is most common in the licensed spectrum
(e.g. UWB underlays many licensed spectral bands), it can also be used in
unlicensed bands to provide different classes of service to different users.
Overlay Paradigm
The enabling premise for overlay systems is that the cognitive transmitter
has knowledge of the noncognitive users’ codebooks and its messages as
well [7]. The codebook information could be obtained, for example, if the
noncognitive users follow a uniform standard for communication based on
a publicized codebook. Alternatively, they could broadcast their codebooks
periodically. A noncognitive user message might be obtained by decoding
the message at the cognitive receiver. However, the overlay model assumes
the noncognitive message is known at the cognitive transmitter when the
noncognitive user begins its transmission. While this is impractical for an
initial transmission, the assumption holds for a message retransmission
where the cognitive user hears the first transmission and decodes it, while
the intended receiver cannot decode the initial transmission due to fading or
interference. Alternatively, the noncognitive user may send its message to the
cognitive user (assumed to be close by) prior to its transmission. Knowledge
of a noncognitive user’s message and/or codebook can be exploited in a
variety of ways to either cancel or mitigate the interference seen at the
cognitive and noncognitive receivers. On the one hand, this information can
be used to completely cancel the interference due to the noncognitive signals
at the cognitive receiver by sophisticated techniques like dirty paper coding
[9]. On the other hand, the cognitive users can utilize this knowledge and
assign part of their power for their own communication and the remainder
of the power to assist (relay) the noncognitive transmissions. By careful
choice of the power split, the increase in the noncognitive user’s signal-to-
noise power ratio (SNR) due to the assistance from cognitive relaying can
be exactly offset by the decrease in the noncognitive user’s SNR due to the
interference caused by the remainder of the cognitive user’s transmit power
used for its own communication. This guarantees that the noncognitive
user’s rate remains unchanged while the cognitive user allocates part of its
power for its own transmissions. Note that the overlay paradigm can be
applied to either licensed or unlicensed band communications. In licensed
bands, cognitive users would be allowed to share the band with the licensed
users since they would not interfere with, and might even improve, their
communications. In unlicensed bands cognitive users would enable a higher
spectral efficiency by exploiting message and codebook knowledge to
reduce interference.
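A small numerical sketch of this power-splitting argument. It assumes the real Gaussian model used later in Eqs. (1)-(2) (Yp = Xp + aXs + Zp with unit noise variance) and the standard construction in which the cognitive user spends a fraction η of its power Ps on coherently relaying the primary message; the formula and parameter values are illustrative assumptions, not taken verbatim from the chapter.

```python
import numpy as np
from scipy.optimize import brentq

def pu_sinr(eta, pp, ps, a):
    """PU's SINR when the CR relays a fraction eta of its power Ps coherently.

    Model assumption: Yp = Xp + a*Xs + Zp with unit noise variance, where the CR
    splits Xs between a coherent copy of the PU's signal and its own message.
    """
    boosted_signal = (np.sqrt(pp) + a * np.sqrt(eta * ps)) ** 2
    residual_interference = a * a * (1.0 - eta) * ps
    return boosted_signal / (1.0 + residual_interference)

pp, ps, a = 10.0, 10.0, 0.4   # illustrative powers and cross gain
# Choose the split so the PU's SINR equals its interference-free SNR (= pp).
eta_star = brentq(lambda eta: pu_sinr(eta, pp, ps, a) - pp, 0.0, 1.0)
print(f"relaying fraction eta* = {eta_star:.3f}; "
      f"power left for the CR's own message = {(1 - eta_star) * ps:.2f}")
```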
Interweave Paradigm
The ‘interweave’ paradigm is based on the idea of opportunistic
communication, and was the original motivation for cognitive radio [10].
The idea came about after studies conducted by the FCC [8] and industry
[2] showed that a major part of the spectrum is not utilized most of the time.
In other words, there exist temporary space-time frequency voids, referred
to as spectrum holes, that are not in constant use in both the licensed and
unlicensed bands.
These gaps change with time and geographic location, and can be exploited
by cognitive users for their communication. Thus, the utilization of spectrum
is improved by opportunistic frequency reuse over the spectrum holes. The
interweave technique requires knowledge of the activity information of the
noncognitive (licensed or unlicensed) users in the spectrum. One could also
consider that all the users in a given band are cognitive, but existing users
become primary users, and new users become secondary users that cannot
interfere with communications already taking place between existing users.
To summarize, an interweave cognitive radio is an intelligent wireless
communication system that periodically monitors the radio spectrum,
intelligently detects occupancy in the different parts of the spectrum and
then opportunistically communicates over spectrum holes with minimal
interference to the active users. For a fascinating motivation and discussion
of the signal processing challenges faced in interweave cognitive radio, we
refer the reader to [11].
Table 1 [12] summarizes the differences between the underlay, overlay
and interweave cognitive radio approaches. While underlay and overlay
techniques permit concurrent cognitive and noncognitive communication,
avoiding simultaneous transmissions with noncognitive or existing users
is the main goal in the interweave technique. We also point out that the
cognitive radio approaches require different amounts of side information:
underlay systems require knowledge of the interference caused by the
cognitive transmitter to the noncognitive receiver(s), interweave systems
require considerable side information about the noncognitive or existing
user activity (which can be obtained from robust primary user sensing) and
overlay systems require a large amount of side information (non-causal
knowledge of the noncognitive user’s codebook and possibly its message).
Apart from device level power limits, the cognitive user’s transmit power
in the underlay and interweave approaches is decided by the interference
constraint and range of sensing, respectively. While underlay, overlay and
interweave are three distinct approaches to cognitive radio, hybrid schemes
can also be constructed that combine the advantages of different approaches.
For example, the overlay and interweave approaches are combined in [7].
Before launching into capacity results for these three cognitive radio
networks, we will first review capacity results for the interference channel.
Since cognitive radio networks are based on the notion of minimal
interference, the interference channel provides a fundamental building
block to the capacity as well as encoding and decoding strategies for these
networks.
Table 1. Comparison of underlay, overlay and interweave cognitive radio techniques.
INTERFERENCE-MITIGATING COGNITIVE BEHAVIOR: THE COGNITIVE RADIO CHANNEL
This discussion has been taken from Natasha Devroye’s paper and considers the simplest possible scenario in which a cognitive radio could be employed.
Assume there exists a primary transmitter and receiver pair (S1 – R1), as well as the cognitive secondary transmitter and receiver pair (S2 – R2). As shown in Fig. 1.1, there are three possibilities for transmitter cooperation in these two point-to-point channels. We have chosen to focus on transmitter cooperation
because such cooperation is often more insightful and general
than receiver-side cooperation [12, 13]. Thus assume that each receiver
decodes independently. Transmitter cooperation in this figure is denoted
by a directed double line. These three channels are simple examples of
the cognitive decomposition of wireless networks seen in [14]. The three
possible types of transmitter cooperation in this simplified scenario are:
• Competitive behavior: The two transmitters transmit
independent messages. There is no cooperation in sending the
messages, and thus the two users compete for the channel. This is
the same channel as the 2 sender, 2 receiver interference channel
[14, 15].
• Cognitive behavior: Asymmetric cooperation is possible
between the transmitters. This asymmetric cooperation is a result
of S2 knowing S1’s message, but not vice-versa. As a first step,
we idealize the concept of message knowledge: whenever the
cognitive node S2 is able to hear and decode the message of the
primary node S1, we assume it has full a priori knowledge. This
Yp = X p + aX s + Z p (1)
Ys = bX p + X s + Z s (2)
(3)
where α ∈ [0,1] and its value is determined using the following arguments:
(4)
Now for 0 < a < 1, using the Intermediate Value Theorem, this quadratic equation in α always has a unique root in [0, 1]:
(5)
It is to be noted that the above capacity expressions hold for any b∈R.
For detailed proofs the reader is referred to [8].
A few important points are worth mentioning here. Since the cognitive
radio knows both mp (the message to be transmitted by primary user) and
ms (the message to be transmitted by the cognitive device), it generates its
codeword Xns such that it also incorporates the codeword Xnp to be generated
by the primary user. By doing so, the cognitive device can implement the
concept of dirty paper coding that helps it mitigating the interference caused
by the primary user at the cognitive receiver. Thus the cognitive device
performs superposition coding as follows:
(6)
where the first term encodes ms and is generated by performing dirty paper coding.
(7)
where N is additive white Gaussian noise (AWGN) at the secondary receiver.
sr is either 1 or 0 as mentioned above. So when it is multiplied as done in
(7), it simply shows whether the secondary receiver has detected the primary
device or not.
If ss is only known to the transmitter and sr is only available to the
receiver, the situation corresponds to communication with partial side
information. Determination of capacity with partial side information
involves input distribution maximization that is difficult to solve. This is not
done and instead a tight capacity upper and lower bound is obtained for this
communication channel. A capacity upperbound is determined by assuming
that the receiver knows both ss and sr whereas, the transmitter only knows
ss. The expression of capacity Css* of secondary user for this case is [23]:
Css* = e^(−λ[2Rr²(π − arccos(d/(2Rr))) + dRr√(1 − d²/(4Rr²))]) log(1 + P) (8)
where P is the secondary transmitter power constraint. For the capacity
lowerbound, [23] uses the results of [25]. For this a genie g argument is
used. It should be noted that utilization of genie concept represents the
notion that either the sender or receiver is provided with some information
noncausally. To determine the capacity lower bound the genie is supposed
to provide some side information to the receiver.
So if the genie provides some information it must have an associated
entropy rate. The results in [25] suggest that the improvement in capacity
due to this genie information cannot exceed the entropy rate of the genie
information itself. Using this argument and that the genie provides
information to the receiver every T channel uses, it is easy to establish that
the capacity lower bound approaches the upper bound even for very highly
dynamic environments.
It is assumed that the locations of primary users in the system follow a Poisson point process with a density of λ nodes per unit area, and that the primary user detection at the secondary transmitter and receiver is perfect.
The capacity expression in (8), as given in [23], is evaluated to be:
C = e^(−λ[2Rr²(π − arccos(d/(2Rr))) + dRr√(1 − d²/(4Rr²))]) log(1 + P) (9)
where, again, P is the secondary power constraint.
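The structure of these expressions can be checked numerically. Under the stated assumptions (primary users scattered as a Poisson point process and perfectly detected by both secondary terminals), and assuming the sensing region is a disk of radius Rr around each terminal with the terminals a distance d apart, the secondary link transmits only when that combined region is empty, so its average rate is the void probability times the point-to-point rate log(1 + P). The Monte Carlo sketch below estimates this; all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def two_switch_rate(lam, d, r_sense, power, trials=20000, box=30.0):
    """Monte Carlo estimate of the secondary user's average rate (bits/channel use).

    Primary users are drawn as a Poisson process of density `lam` over a large box
    centred on the secondary link; the link transmits only when no primary user
    lies within `r_sense` of either secondary terminal (two-switch assumption).
    """
    tx = np.array([-d / 2.0, 0.0])
    rx = np.array([d / 2.0, 0.0])
    on = 0
    for _ in range(trials):
        n = rng.poisson(lam * box * box)
        pts = rng.uniform(-box / 2, box / 2, size=(n, 2))
        near_tx = np.hypot(pts[:, 0] - tx[0], pts[:, 1] - tx[1]) < r_sense
        near_rx = np.hypot(pts[:, 0] - rx[0], pts[:, 1] - rx[1]) < r_sense
        if not (near_tx.any() or near_rx.any()):
            on += 1
    return (on / trials) * np.log2(1.0 + power)

rate = two_switch_rate(lam=0.02, d=1.0, r_sense=2.0, power=10.0)
print(f"estimated secondary rate: {rate:.3f} bits/channel use")
```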
(11)
where Nz and Ny represent complex AWGN with variance 1/2, Hs is the
fading matrix between the source and destination nodes and Hr is the fading
matrix between the source and relay nodes. In the collaborating phase:
(12)
where Hc is the channel matrix that contains Hs as a submatrix. It is well
known that a Multiple Input Multiple Output (MIMO) system with Gaussian
codebook and rate R bits/channel can reliably communicate over any channel
with transfer matrix H such that where I denotes
the identity matrix and H† represents the conjugate transpose of H. Before
providing an explicit formula for rates, a little explanation is in order here.
During the first phase, relay listens for an amount of time n1 and since
it knows Hr, it results in nR ≤ n1C(Hr). On the other hand, the destination
node receives information at the rate of C(Hs) bits/channel during the first
phase and at the rate of C(Hc) bits/channel during the second phase. Thus it
may reliably decode the message provided that nR ≤ n1C(Hs)+(n-n1)C(Hc).
Taking the limit n → ∞, the ratio n1/n tends to a fraction f such that the code of rate R for the set of channels (Hr, Hc) satisfies:
(13)
(14)
where
Similarly [23] presents a corollary which suggests that if the cognitive
user has no message of its own, it can aid the primary system because it
knows the primary’s message, resulting in an improvement of primary
user’s data rates.
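The two-phase time-sharing constraints above (nR ≤ n1C(Hr) and nR ≤ n1C(Hs) + (n − n1)C(Hc)) can be explored numerically. The sketch below assumes the standard Gaussian MIMO mutual-information form C(H) = log2 det(I + P·HH†), which the chapter alludes to but does not spell out, and scans the listening fraction f = n1/n for randomly drawn Rayleigh-fading matrices; the matrix sizes and power are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def mimo_rate(h, power):
    """Assumed MIMO mutual information log2 det(I + P * H H†) in bits/channel use."""
    hh = h @ h.conj().T
    return float(np.log2(np.linalg.det(np.eye(hh.shape[0]) + power * hh).real))

def rayleigh(rows, cols):
    """Unit-variance complex Gaussian channel matrix."""
    return (rng.standard_normal((rows, cols))
            + 1j * rng.standard_normal((rows, cols))) / np.sqrt(2)

Hs = rayleigh(2, 2)                    # source -> destination (first phase)
Hr = rayleigh(2, 2)                    # source -> relay (first phase)
Hc = np.hstack([Hs, rayleigh(2, 2)])   # collaborating phase: contains Hs as a submatrix

def best_rate(c_hr, c_hs, c_hc, grid=1001):
    """Maximise R over the listening fraction f = n1/n subject to
       R <= f*C(Hr)  and  R <= f*C(Hs) + (1 - f)*C(Hc)."""
    f = np.linspace(0.0, 1.0, grid)
    return float(np.max(np.minimum(f * c_hr, f * c_hs + (1 - f) * c_hc)))

P = 10.0
c_hr, c_hs, c_hc = mimo_rate(Hr, P), mimo_rate(Hs, P), mimo_rate(Hc, P)
print(f"C(Hr)={c_hr:.2f}  C(Hs)={c_hs:.2f}  C(Hc)={c_hc:.2f}  "
      f"best R={best_rate(c_hr, c_hs, c_hc):.2f}")
```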
New outer bounds to the individual rates and the conditions under which
these bounds become tight for the symmetric Gaussian cognitive radio (CR)
channel in the low interference gain regime are presented in [27]. The CR
transmitter is assumed to use dirty paper coding while deriving the outer
bounds. The capacity of the CR channel in the low interference scenario is
known when the CR employs “polite” approach by devoting some portion
of its power to transmit primary user’s (PU’s) message. However, this
approach does not guarantee any quality of service for the CR users. Hence,
focus will be on the scenario when the CR goes for the “rude” approach,
does not relay PU’s message and tries to maximize its own rates only. It
is shown that when both CR and the PU operate in low interference gain
regime, then treating interference as additive noise at the PU receiver and
doing dirty paper coding at the CR is nearly optimal.
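A small numerical sketch of this closing statement for the channel of Eqs. (1)-(2), assuming real Gaussian noise of unit variance and transmit powers Pp and Ps (all values illustrative): with the CR's signal treated as noise, the PU gets Rp = ½·log2(1 + Pp/(1 + a²Ps)), while ideal dirty paper coding lets the CR cancel the PU's interference on its own link, giving Rs = ½·log2(1 + Ps).

```python
import numpy as np

def rude_cr_rates(pp, ps, a):
    """Rates (bits/channel use) when the PU treats the CR's signal as noise and the
    CR dirty-paper codes against the PU's signal (real AWGN, unit noise variance;
    an illustrative model following Eqs. (1)-(2))."""
    r_primary = 0.5 * np.log2(1.0 + pp / (1.0 + a * a * ps))
    r_cognitive = 0.5 * np.log2(1.0 + ps)      # PU interference removed by DPC
    return r_primary, r_cognitive

for a in (0.1, 0.3, 0.6):                      # low interference gains
    rp, rs = rude_cr_rates(pp=10.0, ps=10.0, a=a)
    print(f"a = {a}: R_p = {rp:.2f}, R_s = {rs:.2f}")
```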
COMPARISONS
The channel model presented in the first section uses complex coding
techniques to mitigate channel interference that naturally results in higher
throughput than that of the channel model of the second section. But there is
one serious constraint in the first channel model: the information throughput
of the cognitive user is highly dependent upon the distance between the
primary transmitter and the cognitive transmitter. If this distance is large,
secondary transmitter spends considerable time in obtaining the primary
user’s message. After obtaining and decoding the message, the cognitive
device dirty paper codes it and sends it to its receiver. This message transfer
from the primary to the cognitive transmitter results in a lower number of bits transmitted per second by the cognitive radio and hence in reduced data rates.
The capacity of the two switch model is independent of the distance
between the two transmitters as the secondary transmitter refrains from
sending data when it finds the primary user busy. Thus the benefit of
the channel interference knowledge in the first channel model quickly
disappears as the distance between the primary and secondary transmitter
tends to increase. Accurate estimation of the primary system’s message and
the interference channel coefficient needed in the first channel model for
dirty paper coding purposes is itself a problem. Inaccurate estimation of
these parameters will result in a decrease in the rates of the cognitive radio
because the dirty paper code, based on the knowledge of primary user’s
message and channel interference coefficient, will not be able to completely
mitigate the primary user’s interference. At the same time determination
of channel interference coefficient requires a handshaking protocol to be
devised which is a serious overhead and may result in poor performance.
On the other hand, the interference avoiding channel cannot overcome
the hidden terminal problem. This problem naturally arises as the cognitive
user would not be able to detect the presence of distant primary devices.
The degraded signals from the primary users due to multipath fading and
shadowing effects would further aggravate this problem.
This requires the secondary systems to be equipped with extremely
sensitive detectors. But very sensitive detectors have prohibitively long sensing times. Thus a protocol needs to be devised by virtue of which the sensed
information is shared between the cognitive devices.
Finally, the role of CR as a relay in the third channel model restricts the
real usefulness of the concept of dynamic spectrum utilization. Although
limited, significant gains in data rates of the existing licensed user system can be obtained by restricting a CR device to relaying the PU’s message only.
REFERENCES
1. Federal Communications Commission Spectrum Policy Task Force,
“Report of the Spectrum Efficiency Working Group,” Technical Report
02-135, no. November, 2002.
2. Shared Spectrum Company, “Comprehensive Spectrum occupancy
measurements over six different locations,”August 2005.
3. N. Devroye, P. Mitran, and V. Tarokh, “Achievable rates in cognitive
radio channels,” IEEE Transactions on Information Theory, vol. 52,
no. 5, pp. 1813-1827, May 2006.
4. T. Han and K. Kobayashi, “A new achievable rate region for the
interference channel,” IEEE Transactions on Information Theory, vol.
27, no. 1, pp. 49-60,1981.
5. S. I. Gel’fand and M. S. Pinsker, “Coding for channel with random
parameters,” Problems of Control and Information Theory, vol. 9, no.
1, pp. 19-31, 1980.
6. F. G. Awan and M. F. Hanif, “A Unified View of Information-Theoretic
Aspects of Cognitive Radio,” in Proc. International Conference on
Information Technology: New Generations, pp. 327-331, April 2008.
7. S. Srinivasa and S. A. Jafar, “The throughput potential of cognitive
radio - a theoretical perspective,” IEEE Communications Magazine,
vol. 45, no. 5, pp. 73-79,2007.
8. A. Jovicic and P. Viswanath, “Cognitive radio: An information theoretic
perspective,” 2006 IEEE International Symposium on Information
Theory, July 2006.
9. M. H. M. Costa, “Writing on dirty paper,” IEEE Transactions on
Information Theory, vol. 29, no. 3, pp. 439-441, May 1983.
10. Joseph Mitola, “Cognitive Radio: An Integrated Agent Architecture
for Software Defined Radio,” PhD Dissertation, KTH, Stockholm,
Sweden, December 2000.
11. Paul J. Kolodzy, “Cognitive Radio Fundamentals,” SDR Forum,
Singapore, April 2005.
12. C. T. K. Ng and A. Goldsmith, “Capacity gain from transmitter and
receiver cooperation,” in Proc. IEEE Int. Symp. Inf. Theory, Sept.
2005.
13. N. Devroye, P. Mitran, and V. Tarokh, “Cognitive decomposition of
wireless networks,” in Proceedings of CROWNCOM, Mar. 2006.
14. A. B. Carleial, “Interference channels,” IEEE Trans. Inf. Theory, vol. IT-24,
no. 1, pp. 60-70, Jan. 1978.
15. N. Devroye, P. Mitran, and V. Tarokh, “Achievable rates in cognitive
radio channels,” IEEE Trans. Inf. Theory, vol. 52, no. 5, pp. 1813-
1827, May 2006.
16. W. Wu, S. Vishwanath, and A. Arapostathis, “On the capacity of the
interference channel with degraded message sets,” IEEE Trans. Inf.
Theory, June 2006.
17. H. Weingarten, Y. Steinberg, and S. Shamai, “The capacity region of
the Gaussian MIMO broadcast channel,” IEEE Trans. Inf. Theory, vol.
52, no. 9, pp. 3936-3964, Sept. 2006.
18. N. Devroye, P. Mitran, and V. Tarokh, “Achievable rates in cognitive
networks,” in 2005 IEEE International Symposium on Information
Theory, Sept. 2005.
19. E. C. van der Meulen, “Three-terminal communication channels,”
Adv. Appl. Prob., vol. 3, pp. 120-154, 1971.
20. I. Maric, R. Yates, and G. Kramer, “The strong interference channel with
unidirectional cooperation,” in Information Theory and Applications
ITA Inaugural Workshop, Feb. 2006.
21. N. Devroye, P. Mitran, and V. Tarokh, “Achievable rates in cognitive
radio channels,” in 39th Annual Conf. on Information Sciences and
Systems (CISS), Mar. 2005.
22. S. Jafar and S. Srinivasa, “Capacity limits of cognitive radio with
distributed dynamic spectral activity,” in Proc. of ICC, June 2006.
23. S. Srinivasa and S. A. Jafar, “The throughput potential of cognitive
radio: A theoretical perspective,” in Fortieth Asilomar Conference on
Signals, Systems and Computers, 2006., Oct. 2006.
24. P. Cheng, G. Yu, Z. Zhang, H.-H. Chen, and P. Qiu, “On the achievable
rate region of gaussian cognitive multiple access channel,” IEEE
Communications Letters, vol. 11, no.5, pp. 384-386, May. 2007.
25. S. A. Jafar, “Capacity with causal and non-causal side information-a
unified view,” IEEE Transactions on Information Theory, vol. 52, no.
12, pp. 5468-5474, Dec. 2006.
26. P. Mitran, H. Ochiai, and V. Tarokh, “Space-time diversity
enhancements using collaborative communication,” IEEE Transactions
on Information Theory, vol. 51, no.6, pp. 2041-2057, June 2005.
Chapter 2
INFORMATION THEORY AND ENTROPIES FOR QUANTIZED OPTICAL WAVES IN COMPLEX TIME-VARYING MEDIA
INTRODUCTION
An important physical intuition that led to the Copenhagen interpretation of
quantum mechanics is the Heisenberg uncertainty relation (HUR) which is a
consequence of the noncommutativity between two conjugate observables.
Our ability of observation is intrinsically limited by the HUR, quantifying
an amount of inevitable and uncontrollable disturbance on measurements
(Ozawa, 2004).
Citation: Jeong Ryeol Choi, “Information Theory and Entropies for Quantized Optical
Waves in Complex Time-Varying Media”, Open Systems, Entanglement and Quantum
Optics, 2013, http://dx.doi.org/10.5772/56529.
Copyright: © 2013 The Author(s) and IntechOpen. This chapter is distributed under
the terms of the Creative Commons Attribution 3.0 License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original work is
properly cited.
Though the HUR is one of the most fundamental results of the whole
quantum mechanics, some drawbacks concerning its quantitative formulation
are reported. As the expectation value of the commutator between two
arbitrary noncommuting operators, the value of the HUR is not a fixed lower
bound and varies depending on quantum state (Deutsch, 1983). Moreover,
in some cases, the ordinary measure of uncertainty, i.e., the variance of
canonical variables, based on the Heisenberg-type formulation is divergent
(Abe et al., 2002).
These shortcomings are highly nontrivial issues in the context of information sciences. Thereby, the theory of informational entropy has been proposed as an alternative, optimal measure of uncertainty. The adequacy of
the entropic uncertainty relations (EURs) as an uncertainty measure is owing
to the fact that they only regard the probabilities of the different outcomes
of a measurement, whereas the HUR regards the variances of the measured values themselves (Werner, 2004). According to Khinchin’s axioms (Ash, 1990) for
the requirements of common information measures, information measures
should be dependent exclusively on a probability distribution (Pennini
& Plastino, 2007). Thanks to active research and technological progress associated with quantum information theory (Nielsen & Chuang, 2000; Choi et al., 2011), the entropic uncertainty band has now become a new concept in quantum physics.
Information theory, proposed by Shannon (Shannon, 1948a; Shannon, 1948b), provides important information-theoretic uncertainty measures not only in quantum physics but also in other areas such as signal and/or image processing. The essential unity of overall statistical information for a system
can be demonstrated from the Shannon information, enabling us to know
how information could be quantified with absolute precision. Another good
measure of uncertainty or randomness is the Fisher information (Fisher,
1925) which appears as the basic ingredient in bounding entropy production.
The Fisher information is a measure of accuracy in statistical theory and
useful to estimate ultimate limits of quantum measurements.
Recently, quantum information theory besides the fundamental quantum
optics has aroused great interest due to its potential applicability in three
sub-regions which are quantum computation, quantum communication,
and quantum cryptography. Information theory has contributed to the
development of the modern quantum computation (Nielsen & Chuang, 2000)
and became a cornerstone in quantum mechanics. A remarkable ability of
quantum computers is that they can carry out certain computational tasks
exponentially faster than classical computers by utilizing entanglement and the superposition principle.
Stimulated by these recent trends, this chapter is devoted to the study of
information theory for optical waves in complex time-varying media with
emphasis on the quantal information measures and informational entropies.
Information theoretic uncertainty relations and the information measures
of Shannon and Fisher will be managed. The EUR of the system will also
be treated, quantifying its physically allowed minimum value using the
invariant operator theory established by Lewis and Riesenfeld (Lewis, 1967;
Lewis & Riesenfeld, 1969). Invariant operator theory is crucial for studying
quantum properties of complicated time-varying systems, since it, in general,
gives exact quantum solutions for a system described by a time-dependent Hamiltonian as long as its counterpart classical solutions are known.
(1)
Then, considering the fact that the fields and current density obey the
relations, D = ϵ(t)E, B = µ(t)H, and J = σ(t)E, in linear media, we derive
(2)
Here, the angular frequency (natural frequency) is given by ωl(t) =
c(t)kl where c(t) is the speed of light in media and kl (= |kl|) is the wave
number. Because the electromagnetic parameters vary with time, c(t) can be represented in a time-dependent form, i.e., c(t) = 1/[ϵ(t)µ(t)]^(1/2). However, kl (= |kl|) is constant since it is not affected by the time variance of the parameters.
The formula of mode function ul (r) depends on the geometrical boundary
condition in media (Choi & Yeon, 2005). For example, it is given by
(ν = 1, 2) for the fields propagating under
the periodic boundary condition, where V is the volume of the space, is
a unit vector in the direction of polarization designated by ν.
From Hamilton’s equations of motion, and
, the classical Hamiltonian that gives Eq. (3) can be easily
established. Then, by converting canonical variables, ql and pl, into quantum
operators, and , from the resultant classical Hamiltonian, we have the
quantum Hamiltonian such that (Choi et al., 2012)
(3)
where is an arbitrary time function, and
(4)
(5)
The complete Hamiltonian is obtained by summing all individual
Hamiltonians:
From now on, let us treat the wave of a particular mode and drop the subscript l for convenience. It is well known that quantum problems
of optical waves in nonstationary media are described in terms of classical
solutions of the system. Some researchers use real classical solutions (Choi,
2012; Pedrosa & Rosas, 2009) and others imaginary solutions (Angelow &
Trifonov, 2010; Malkin et al., 1970). In this chapter, real solutions of classical
equation of motion for q will be considered. Since Eq. (2) is a second order
differential equation, there are two linearly independent classical solutions.
Let us denote them as s1(t) and s2(t), respectively. Then, we can define a Wronskian of the form
(6)
This will be used later, considering only the case Ω > 0 for convenience.
When we study the quantum problem of a system that is described by a time-dependent Hamiltonian such as Eq. (3), it is very convenient to introduce an invariant operator of the system. This idea (the invariant operator method) was first devised by Lewis and Riesenfeld (Lewis, 1967; Lewis & Riesenfeld, 1969) for the time-dependent harmonic oscillator, as mentioned in the introductory part, and has now become one of the standard tools for investigating quantum characteristics of time-dependent Hamiltonian systems. By solving
the Liouville-von Neumann equation of the form
(7)
we obtain the invariant operator of the system as (Choi, 2004)
(8)
where Ω is chosen to be positive from Eq. (6) and and are annihilation
and creation operators, respectively, that are given by
(9)
(10)
with
(11)
Since the system is somewhat complicated, let us develop our theory with
b(t) = 0 from now on. Then, Eq. (5) just reduces to . Since the
formula of Eq. (8) is very similar to the familiar Hamiltonian of the simple
harmonic oscillator, we can solve its eigenvalue equation via the well-known method:
28 Information and Coding Theory in Computer Science
(12)
where and Hn are Hermite polynomials.
According to the theory of Lewis-Riesenfeld invariant, the wave
functions that satisfy the Schrödinger equation are given in terms of ϕn(q, t):
(13)
where θn(t) are time-dependent phases of the wave functions. By substituting
Eq. (13) with Eq. (3) into the Schrödinger equation, we derive the phases to
be θn(t) = − (n + 1/2) η(t) where (Choi, 2012)
(14)
The probability densities in both q and p spaces are given by the
square of wave functions, i.e., ρn(q) = |ψn(q, t)|2 and ,
respectively. From Eq. (13) and its Fourier component, we see that
(15)
(16)
where
(17)
The wave functions and the probability densities derived here will be
used in subsequent sections in order to develop the information theory of
the system.
(18)
Considering the fact that invariant operator given in Eq. (8) is also
established via the Liouville-von Neumann equation, we can easily construct
density operator as a function of the invariant operator. This implies that the
Hamiltonian in the density operator of the simple harmonic oscillator
should be replaced by a function of the invariant operator where
is inserted for the purpose of dimensional
consideration. Thus we have the density operator in the form
(19)
where W = y(0)Ω, Z is a partition function, β = 1/(kbT), and kb is Boltzmann’s
constant. If we consider the Fock state expression, the above equation can be expanded as
(20)
while the partition function becomes
(21)
If we consider that the coherent state is the most classical-like quantum
state, a semiclassical distribution function associated with the coherent state
may be useful for the description of information measures. As is well known,
the coherent state is obtained by solving the eigenvalue equation of :
(22)
Now we introduce the semiclassical distribution function related
with the density operator via the equation (Anderson & Halliwell, 1993)
(23)
This is sometimes referred to as the Husimi distribution function (Husimi,
1940) and appears frequently in the study relevant to the Wigner distribution
function for thermalized quantum systems. The Wigner distribution function
is regarded as a quasi-distribution function because some parts of it are not
positive but negative. In spite of the ambiguity in interpreting this negative
value as a distribution function, the Wigner distribution function meets
all requirements of both quantum and statistical mechanics, i.e., it gives
correct expectation values of quantum mechanical observables. In fact,
the Husimi distribution function can also be constructed from the Wigner
distribution function through a mathematical procedure known as “Gaussian
smearing” (Anderson & Halliwell, 1993). Since this smearing washes out
the negative part, the negativity problem is resolved by Husimi’s work.
But it is interesting to note that new drawbacks are raised in that case, which
state that the probabilities of different samplings of q and p, relevant to the
Husimi distribution function, cannot be represented by mutually exclusive
states due to the lack of orthogonality of coherent states used for example
in Eq. (23) (Anderson & Halliwell, 1993; Nasiri, 2005). This weak point is
however almost fully negligible in the actual study of the system, allowing
us to use the Husimi distribution function as a powerful means in the realm
of semiclassical statistical physics.
Notice that coherent state can be rewritten in the form
(24)
(25)
Hence, the coherent state is expanded in terms of Fock state wave
functions. Now using Eqs. (20) and (25), we can evaluate Eq. (23) to be
(26)
Here, we used a well known relation in photon statistics, which is
(27)
As you can see, the Husimi distribution function is strongly related to
the coherent state and it provides necessary concepts for establishment of
both the Shannon and the Fisher informations. If we consider Eqs. (9) and
(22), α (with b(t) = 0) can be written as
(28)
Hence there is an innumerable number of α-samples that correspond to different pairs of (q, p), which need to be examined for measurement.
A natural measure of uncertainty in information theory is the Shannon
information as mentioned earlier. The Shannon information is defined as
(Anderson & Halliwell, 1993)
(29)
where d α = dRe(α) dIm(α). With the use of Eq. (26), we easily derive it:
2
(30)
This is independent of time and, in the limiting case of fields propagating
in time-independent media that have no conductivity, W becomes natural
(31)
In fact, there are many different scenarios of this information depending
on the choice of x. For a more general definition of the Fisher information,
you can refer to Ref. (Pennini & Plastino, 2004).
If we take and x = β, the Fisher’s information measure
can be written as (Pennini & Plastino, 2004)
(32)
Since β is the parameter to be estimated here, Iβ reflects the change of
according to the variation of temperature. A straightforward calculation
yields
(33)
This is independent of time and, of course, agrees in the limit of the simple harmonic oscillator case with the well-known formula of Pennini and Plastino
(Pennini & Plastino, 2004). Hence, the change of the electromagnetic parameters with time does not affect the value of IF,β, which reduces to zero at absolute zero temperature (T → 0), leading to agreement with the third law of thermodynamics (Pennini & Plastino, 2007).
Another typical form of the Fisher information worth considering is the one obtained with the choice of and x = {q, p} (Pennini et al., 1998):
(34)
where σqq,α and σpp,α are variances of q and p in the Glauber coherent state,
respectively. Notice that σqq,α and σpp,α are inserted here in order to consider
the weight of two independent terms in Eq. (34). As you can see, this
information is jointly determined by means of canonical variables q and p.
To evaluate this, we need
(35)
(36)
It may be favorable to represent in terms of at this
stage. They are easily obtained from the inverse representation of Eqs. (9)
and (10) to be
(37)
(38)
Thus with the use of these, Eqs. (35) and (36) become
(39)
(40)
A little evaluation after substituting these quantities into Eq. (34) leads
to
(41)
Notice that this varies depending on time. In case the time dependence of every electromagnetic parameter vanishes and σ → 0, this reduces to that of the simple harmonic oscillator limit, where the natural frequency ω is constant, which exactly agrees with the result of Pennini and Plastino (Pennini & Plastino, 2004).
(42)
(43)
(44)
where with an arbitrary operator is the expectation value
relevant to the Husimi distribution function and can be evaluated from
(45)
While and for l = 1, the rigorous algebra for higher orders gives
(46)
(47)
(48)
where
(49)
Thus we readily have
(50)
(51)
Like other types of uncertainties in physics, the relationship between σµ,qq and σµ,pp is rather unique, i.e., if one of them becomes large the other becomes small, and there is nothing whatever one can do about it.
We can represent the uncertainty product σµ and the generalized
uncertainty product Σµ in the form
(52)
(53)
Through the use of Eqs. (50) and (51), we get
(54)
(55)
Notice that σµ varies depending on time, while Σµ does not and has a simpler form. The relationship between σµ and the usual thermal uncertainty relation σ obtained using the method of thermofield dynamics (Choi, 2010b; Leplae et al., 1974) is given by σµ = r(β)σ where
.
spread when the curve of the distribution involves only a single hump, such as the Gaussian case. However, the HUR is known to be inadequate when the distribution of the statistical data is somewhat complicated or reveals two or more humps (Bialynicki-Birula, 1984; Majernik & Richterek, 1997).
For this reason, the EUR was suggested as an alternative to the HUR by Bialynicki-Birula and Mycielski (Bialynicki-Birula & Mycielski, 1975). To study the EUR, we start from the entropies of q and p associated with Shannon's information theory:
(56)
(57)
By executing some algebra after inserting Eqs. (15) and (16) into the
above equations, we get
(58)
(59)
where E(Hn) are entropies of Hermite polynomials of the form (Dehesa et
al, 2001)
(60)
By adding Eqs. (58) and (59) together,
(61)
we obtain the alternative uncertainty relation, the so-called EUR, such that
(62)
(63)
where µ0[= µ(0)], h, and ω0 are real constants and |h| ≪ 1. Then, the classical
solutions of Eq. (2) are given by
(64)
(65)
where s0 is a real constant, Ceν and Seν are the Mathieu cosine and sine elliptic functions, respectively, and . Figure 1 shows information measures for this system, plotted as functions of time. Whereas IS and IF,β do not vary with time, IF,{q,p} oscillates as time goes by.
Figure 1. The time evolution of IF,{q,p}. The values of (k, h) used here are (1, 0.1)
for solid red line, (3, 0.1) for long dashed blue line, and (3, 0.2) for short dashed
green line. Other parameters are taken to be ϵ0 = 1, µ0 = 1, β = 1, ħ = 1, ω0 = 5,
and s0 = 1.
In the case h → 0, the natural frequency in Eq. (2) becomes constant and W → ω. Then, Eqs. (64) and (65) become s1 = s0 cos ωt and s2 = s0 sin ωt, respectively. We can confirm in this situation that our principal results, Eqs. (30), (33), (41), (54), and (62), reduce to those of the wave described by the simple harmonic oscillator, as expected.
Figure 2. The Uncertainty product σµ (thick solid red line) together with σµ,qq
(long dashed blue line) and σµ,pp (short dashed green line). The values of (k, h)
used here are (1, 0.1) for (a) and (3, 0.1) for (b). Other parameters are taken to
be ϵ0 = 1, µ0 = 1, β = 1, ħ = 1, ω0 = 5, and s0 = 1.
Figure 3. The EUR UE (thick solid red line) together with S(ρn) (long dashed
blue line) and (short dashed green line). The values of (k, h) used here are
(1, 0.1) for (a), (3, 0.1) for (b), and (3, 0.2) for (c). Other parameters are taken
to be ϵ0 = 1, µ0 = 1, ħ = 1, ω0 = 5, n = 0, and s0 = 1.
Two kinds of uncertainty products relevant to the Husimi distribution
function are considered: one is the usual uncertainty product σµ and the other
REFERENCES
1. Abe, S.; Martinez, S.; Pennini, F. & Plastino, A. (2002). The EUR for
power-law wave packets. Phys. Lett. A, Vol. 295, Nos. 2-3, pp. 74-77.
2. Anderson, A. & Halliwell, J. J. (1993). Information-theoretic measure
of uncertainty due to quantum and thermal fluctuations. Phys. Rev. D,
Vol. 48, No. 6, pp. 2753-2765.
3. Angelow, A. K. & Trifonov, D. A. (2010). Dynamical invariants and
Robertson-Schrödinger correlated states of electromagnetic field in
nonstationary linear media. arXiv:quant-ph/1009.5593v1.
4. Ash, R. B. (1990). Information Theory. Dover Publications, New York,
USA.
5. Bialynicki-Birula, I. (1984). Entropic uncertainty relations in quantum
mechanics. In: L. Accardi and W. von Waldenfels (Editors), Quantum
Probability and Applications, Lecture Notes in Mathematics 1136,
Springer, Berlin.
6. Bialynicki-Birula, I. & Mycielski, J. (1975). Uncertainty relations for
information entropy in wave mechanics. Commun. Math. Phys., Vol.
44, No. 2, pp. 129-132.
7. Bohr, N. (1928). Como Lectures. In: J. Kalckar (Editor), Niels Bohr
Collected Works, Vol. 6, North-Holland Publishing, Amsterdam, 1985.
8. Choi, J. R. (2004). Coherent states of general time-dependent harmonic
oscillator. Pramana-J. Phys., Vol. 62, No. 1, pp. 13-29.
9. Choi, J. R. (2010a). Interpreting quantum states of electromagnetic
field in time-dependent linear media. Phys. Rev. A, Vol. 82, No. 5, pp.
055803(1-4).
10. Choi, J. R. (2010b). Thermofield dynamics for two-dimensional
dissipative mesoscopic circuit coupled to a power source. J. Phys. Soc.
Japan, Vol. 79, No. 4, pp. 044402(1-6).
11. Choi, J. R. (2012). Nonclassical properties of superpositions of coherent
and squeezed states for electromagnetic fields in time-varying media.
In: S. Lyagushyn (Editor), Quantum Optics and Laser Experiments, pp.
25-48, InTech, Rijeka.
12. Choi, J. R.; Kim, M.-S.; Kim, D.; Maamache, M.; Menouar, S. &
Nahm, I. H. (2011). Information theories for time-dependent harmonic
oscillator. Ann. Phys.(N.Y.), Vol. 326, No. 6, pp. 1381-1393.
SOME INEQUALITIES IN
INFORMATION THEORY
USING TSALLIS ENTROPY
3
ABSTRACT
We present a relation between Tsallis' entropy and generalized Kerridge inaccuracy, called the generalized Shannon inequality, which is a well-known generalization in information theory, and then give its application in coding theory. The objective of the paper is to establish a result on the noiseless coding theorem for the proposed mean code length in terms of the generalized information measure of order ξ.
INTRODUCTION
Throughout the paper N denotes the set of the natural numbers and for n ∈
N we set
(1)
where n = 2, 3, 4, . . . denotes the set of all n-component complete and incomplete discrete probability distributions.
For we define a
nonadditive measure of inaccuracy, denoted by as
(2)
If then reduces to nonadditive entropy.
(3)
Entropy (3) was first characterized by Havrda and Charvát [1]. Later on, Daróczy [2] and Behara and Nath [3] studied this entropy. Vajda [4] also characterized this entropy for finite discrete generalized probability distributions. Sharma and Mittal [5] generalized this measure, which is known as the entropy of order α and type β. Pereira and Gur Dial [6] and Gur Dial [7] also studied the Sharma and Mittal entropy for a generalization of the Shannon inequality and gave its applications in coding theory. Kumar and Choudhary [8] also gave its application in coding theory. Recently, Wondie and Kumar [9] gave a joint representation of Renyi's and Tsallis' entropy. Tsallis [10] gave its applications in physics; for , it reduces to the Shannon [11] entropy
(4)
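Since Eq. (3) is not reproduced here, the following minimal sketch assumes the standard Havrda-Charvát/Tsallis form of the entropy of order ξ, namely (1 − Σ p_i^ξ)/(ξ − 1), together with its ξ → 1 limit, the Shannon entropy; the function names are illustrative only.

```python
import math

def tsallis_entropy(probs, xi):
    """Entropy of order xi (xi != 1), assuming the Havrda-Charvat/Tsallis form
    (1 - sum_i p_i**xi) / (xi - 1)."""
    return (1.0 - sum(p ** xi for p in probs)) / (xi - 1.0)

def shannon_entropy(probs):
    """Shannon entropy in nats, the xi -> 1 limit of the form above."""
    return -sum(p * math.log(p) for p in probs if p > 0)

P = [0.5, 0.25, 0.25]
print(tsallis_entropy(P, 2.0))       # order-2 value
print(tsallis_entropy(P, 1.0001))    # close to the Shannon value below
print(shannon_entropy(P))            # about 1.0397 nats
```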
Inequality (6) has been generalized in the case of Renyi’s entropy.
(5)
Equality holds if and only if A = B. In other words, Shannon’s entropy is
the minimum value of Kerridge’s inaccuracy. If then (5)
is no longer necessarily true. Also, the corresponding inequality
(6)
is not necessarily true even for generalized probability distributions. Hence, it is natural to ask the following question: “For generalized probability distributions, what are the quantities whose minimum values are ?” We give below an answer to the above question separately for by dividing the discussion into two parts, (i) and (ii) . Also, we shall assume that because the problem is trivial for n = 1.
Case 1. Let . If , then, as remarked earlier, (5) is true. For it can easily be seen by using Jensen's inequality that (5) is true if , with equality in (5) holding if and only if
(7)
(8)
such that and equality holds if and only if
Since by reverse Hölder inequality, that is, if
and are positive real numbers, then
(9)
Let and
Putting these values into (9), we get
(10)
where we used (8), too. This implies however that
(11)
or
(12)
Using (12) and the fact that ξ > 1, we get (6).
Particular Case. If ξ = 1, then (6) becomes
(13)
which is Kerridge’s inaccuracy [12].
(14)
Let a finite set of n input symbols
(15)
be encoded using an alphabet of D symbols; then it has been shown by Feinstein [13] that there is a uniquely decipherable code with lengths if and only if the Kraft inequality holds; that is,
(16)
where D is the size of the code alphabet.
Furthermore, if
(17)
is the average codeword length, then for a code satisfying (16), the inequality
(18)
is also fulfilled and equality holds if and only if
(19)
and that, by suitably encoding long sequences into code words, the average length can be made arbitrarily close to H(A) (see Feinstein [13]). This is Shannon's noiseless coding theorem.
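As a small illustration of the Kraft inequality (16) and the average codeword length (17), the sketch below checks Σ_k D^(−l_k) ≤ 1 and compares L with H(A), which Eq. (18) presumably bounds from below; the helper names and the worked example are assumptions made here, not part of the original paper.

```python
import math

def kraft_sum(lengths, D):
    """Left-hand side of the Kraft inequality: sum_k D**(-l_k)."""
    return sum(D ** (-l) for l in lengths)

def average_length(probs, lengths):
    """Average codeword length L = sum_k p_k * l_k, as in Eq. (17)."""
    return sum(p * l for p, l in zip(probs, lengths))

def entropy(probs, D=2):
    """Entropy H(A) in base-D units."""
    return -sum(p * math.log(p, D) for p in probs if p > 0)

# A binary (D = 2) example with four symbols; lengths equal -log2(p), so L = H(A).
probs   = [0.5, 0.25, 0.125, 0.125]
lengths = [1, 2, 3, 3]
print(kraft_sum(lengths, 2))                           # 1.0, so a uniquely decipherable code exists
print(entropy(probs), average_length(probs, lengths))  # 1.75 and 1.75
```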
By considering Renyi’s entropy (see, e.g., [14]), a coding theorem,
analogous to the above noiseless coding theorem, has been established by
Campbell [15], and the authors obtained bounds for it in terms of
(20)
Kieffer [16] defined a class of rules and showed that Hξ(A) is the best decision rule for deciding which of the two sources can be coded with expected cost of sequences of length N when N → ∞, where the cost of
(21)
where and are positive
integers so that
(22)
Since (22) reduces to the Kraft inequality when ξ = 1, it is called the generalized Kraft inequality, and codes obtained under this generalized inequality are called personal codes.
(23)
holds under condition (22) and equality holds if and only if
(24)
where and are given by (3) and (21), respectively.
Proof. First of all we shall prove the lower bound of .
By reverse Hölder inequality, that is, if and
are positive real numbers then
(25)
Let and
Putting these values into (25), we get
(26)
where we used (22), too. This implies however that
(27)
(28)
Using (28) and the fact that ξ > 1, we get
(29)
From (24) and after simplification, we get
(30)
This implies
(31)
which gives . Then the equality sign holds in (29).
Now we will prove inequality (23) for upper bound of .
We choose the codeword lengths in such a way that
(32)
is fulfilled for all
From the left inequality of (32), we have
(33)
Multiplying both sides by and then taking the sum over k, we get the generalized inequality (22). So there exists a generalized code with code lengths
(34)
(35)
Since for ξ > 1, we get from (35) the inequality (23).
Particular Cases. For ξ → 1, (23) becomes
(36)
which is the Shannon [11] classical noiseless coding theorem
(37)
where H(A) and L are given by (4) and (17), respectively.
CONCLUSION
In this paper we prove a generalization of Shannon's inequality for the case of entropy of order ξ with the help of the Hölder inequality. A noiseless coding theorem is also proved. Considering Theorem 2, we remark that the optimal code length depends on ξ, in contrast with the optimal code lengths of Shannon, which do not depend on a parameter. However, it is possible to prove a coding theorem with respect to (3) such that the optimal code lengths are identical to those of Shannon.
REFERENCES
1. J. Havrda and F. S. Charvát, “Quantification method of classification
processes. Concept of structural α-entropy,” Kybernetika, vol. 3, pp.
30–35, 1967.
2. Z. Daróczy, “Generalized information functions,” Information and
Control, vol. 16, no. 1, pp. 36–51, 1970.
3. M. Behara and P. Nath, “Additive and non-additive entropies of finite
measurable partitions,” in Probability and Information Theory II, pp.
102–138, Springer-Verlag, 1970.
4. I. Vajda, “Axioms for α-entropy of a generalized probability scheme,”
Kybernetika, vol. 4, pp. 105–112, 1968.
5. B. D. Sharma and D. P. Mittal, “New nonadditive measures of entropy
for discrete probability distributions,” Journal of Mathematical
Sciences, vol. 10, pp. 28–40, 1975.
6. R. Pereira and Gur Dial, “Pseudogeneralization of Shannon inequality
for Mittal’s entropy and its application in coding theory,” Kybernetika,
vol. 20, no. 1, pp. 73–77, 1984.
7. Gur Dial, “On a coding theorems connected with entropy of order α
and type β,” Information Sciences, vol. 30, no. 1, pp. 55–65, 1983.
8. S. Kumar and A. Choudhary, “Some coding theorems on generalized
Havrda-Charvat and Tsallis’s entropy,” Tamkang Journal of
Mathematics, vol. 43, no. 3, pp. 437–444, 2012.
9. L. Wondie and S. Kumar, “A joint representation of Renyi’s and
Tsallis' entropy with application in coding theory,” International
Journal of Mathematics and Mathematical Sciences, vol. 2017, Article
ID 2683293, 5 pages, 2017.
10. C. Tsallis, “Possible generalization of Boltzmann-Gibbs statistics,”
Journal of Statistical Physics, vol. 52, no. 1-2, pp. 479–487, 1988.
11. C. E. Shannon, “A mathematical theory of communication,” Bell
System Technical Journal, vol. 27, no. 4, pp. 623–656, 1948.
12. D. F. Kerridge, “Inaccuracy and inference,” Journal of the Royal
Statistical Society. Series B (Methodological), vol. 23, pp. 184–194,
1961.
13. A. Feinstein, Foundations of Information Theory, McGraw-Hill, New
York, NY, USA, 1956.
THE COMPUTATIONAL
THEORY OF INTELLIGENCE:
INFORMATION ENTROPY
4
Daniel Kovach
Kovach Technologies, San Jose, CA, USA
ABSTRACT
This paper presents an information theoretic approach to the concept of intelligence in the computational sense. We introduce a probabilistic framework from which computational intelligence is shown to be an entropy minimizing process at the local level. Using this new scheme, we develop a simple data-driven clustering example and discuss its applications.
INTRODUCTION
This paper attempts to introduce a computational approach to the study
of intelligence that the researcher has accumulated for many years. This
approach takes into account data from Psychology, Neurology, Artificial
Intelligence, Machine Learning, and Mathematics.
Central to this framework is the fact that the goal of any intelligent
agent is to reduce the randomness in its environment in some meaningful
ways. Of course, formal definitions in the context of this paper for terms like
“intelligence”, “environment”, and “agent” will follow.
The approach draws from multidisciplinary research and has many
applications. We will utilize the construct in discussions at the end of the
paper. Other applications will follow in future works. Implementations
of this framework can apply to many fields of study including General
Artificial Intelligence (GAI), Machine Learning, Optimization, Information
Gathering, Clustering, and Big Data, and extend outside of the applied mathematics and computer science realm to areas including Sociology, Psychology, Neurology, and even Philosophy.
Definitions
One cannot begin a discussion about the philosophy of artificial intelligence
without a definition of the word “intelligence” in the first place. With the
panoply of definitions available, it is understandable that there may be
some disagreement, but typically each school of thought generally shares a
common thread. The following are three different definitions of intelligence
from respectable sources:
• “The aggregate or global capacity of the individual to act
purposefully, to think rationally, and to deal effectively with his
environment.” [1].
• “A process that entails a set of skills of problem solving enabling
the individual to resolve genuine problems or difficulties that he
or she encounters and, when appropriate, to create an effective
product and must also entail the potential for finding or creating
problems and thereby providing the foundation for the acquisition
of new knowledge.” [2].
• “Goal-directed adaptive behavior.” [3].
Vernon's hierarchical model of intelligence from the 1950s [1], and Hawkins' On Intelligence from 2004 [4], are some other great resources on this topic. Consider the following working definition of this paper, with
regard to information theory and computation: Computational Intelligence
(CI) is an information processing algorithm that
• Records data or events into some type of store, or memory.
• Draws from the events recorded in memory, to make assertions,
or predictions about future events.
• Using the disparity between the predicted events and the new incoming events, the memory structure in step 1 can be updated such that the predictions of step 2 are optimized.
The mapping in step 3 is called learning and is endemic to CI. Any entity facilitating the CI process we will refer to as an agent, in particular when the connotation is that the entity is autonomous. The surrounding infrastructure that encapsulates the agent together with the ensuing events is called the environment.
Structure
The paper is organized as follows. In Section 2 we provide a brief summary
of the concept of information entropy as it is used for our purposes. In
Section 3, we provide a mathematical framework for intelligence and show
discuss its relation to entropy. Section 4 discusses the global ramifications
of local entropy minimization. In Section 5 we present a simple application
of the framework to data analytics, which is available for free download.
Sections 6 and 7 discuss relevant related research, and future work.
ENTROPY
A key concept of information theory is that of entropy, which amounts to the uncertainty in a given random variable [5]. It is essentially a measure of unpredictability (among other interpretations). The concept of entropy is a much deeper principle of nature that penetrates to the deepest core of physical reality and is central to physics and cosmological models [6]-[8].
Mathematical Representation
Although terms like Shannon entropy are pervasive in the field of information
theory, it will be insightful to review the formulation in our context. To arrive
at the definition of entropy, we must first recall what is meant by information
(1)
where ℙ[X] is the probability of X. The entropy of X, denoted 𝔼[X], is then
defined as the expectation value of the information content.
(2)
Expanding using the definition of the expectation value, we have
(3)
(4)
where pi denotes the probability of each microstate, or configuration of the system, and kb is the Boltzmann constant, which serves to map the value of the summation to physical reality in terms of quantity and units.
The connection between the thermodynamic and information theoretic
versions of entropy relate to the information needed to detail the exact state
of the system, specifically, the amount of further Shannon information
needed to define the microscopic state of the system that remains ambiguous
when given its macroscopic definition in terms of the variables of Classical
Thermodynamics. The Boltzmann constant serves as a constant of
proportionality.
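As a small numerical illustration of Eqs. (1)-(3), the sketch below computes the information content of an outcome and the Shannon entropy of a discrete distribution; the base-2 logarithm (bits) and the function names are choices made here for illustration.

```python
import math

def information_content(p):
    """Self-information of an outcome with probability p, in bits: -log2(p)."""
    return -math.log2(p)

def entropy(probs):
    """Expectation of the information content over the distribution (Shannon entropy, bits)."""
    return sum(p * information_content(p) for p in probs if p > 0)

print(information_content(0.5))     # 1.0 bit for a fair coin flip
print(entropy([0.5, 0.25, 0.25]))   # 1.5 bits
```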
Renyi Entropy
We can extend the logic of the beginning of this section to a more general
formulation called the Renyi entropy of order α, where α ≥ 0 and α ≠ 1
defined as
(5)
Under this convention we can apply the concept of entropy more generally, extending its utility to a variety of applications. It is important to note that this formulation approaches the Shannon entropy in the limit as α → 1.
Although the discussions of this paper were inspired by Shannon entropy,
we wish to present a much more general definition and a bolder proposition.
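Assuming the standard form of the Renyi entropy for Eq. (5), (1/(1 − α)) log Σ_i p_i^α, the following sketch checks numerically that it approaches the Shannon entropy as α → 1; names and the example distribution are illustrative.

```python
import math

def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha (alpha >= 0, alpha != 1), in bits."""
    return math.log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)

def shannon_entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

P = [0.5, 0.25, 0.25]
print(renyi_entropy(P, 2.0))      # order-2 (collision) entropy
print(renyi_entropy(P, 1.001))    # approaches the Shannon value below as alpha -> 1
print(shannon_entropy(P))         # 1.5 bits
```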
(6)
reflect the fact that 𝕀 maps one element from S to one element in O, each tagged by the identifier i ∈ ℕ, which is bounded by the cardinality of the input set. The cardinalities of these two sets need not match, nor does the mapping 𝕀 need to be bijective, or even surjective. This is an iterative process, as denoted by the index t. Finally, let Ot represent the collection of .
Over time, the mapping should converge to the intended element, oi ∈ O,
as is reflected in notation by
(7)
Introduce the function
(8)
which in practice is usually some type of error or fitness measuring
function. Assuming that 𝕃t is continuous and differentiable, let the updating
mechanism at some particular t for 𝕀 be
(9)
In other words, 𝕀 iteratively updates itself with respect to the gradient
of some function, 𝕃. Additionally, 𝕃 must satisfy the following partial
differential equation
(10)
where the function d is some measure of the distance between O and
Ot, assuming such a function exists, and ρ is called the learning rate. A
generalization of this process to abstract topological spaces where such a
distance function is a commodity is an open question.
Finally, for this to qualify as an intelligence process, we must have
(11)
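A toy sketch of the update rule in Eq. (9) is given below: the parameters of the mapping 𝕀 move against the gradient of an error function 𝕃, scaled by the learning rate ρ, so the disparity d(O, Ot) of Eq. (10) shrinks over iterations, as Eq. (11) requires. The quadratic error and all names are assumptions made for illustration, not the author's implementation.

```python
def predict(w, s):
    return w * s                      # a toy linear mapping I_t : S -> O

def grad_L(w, samples):
    # gradient of L(w) = 0.5 * sum (predict(w, s) - o)**2 with respect to w
    return sum((predict(w, s) - o) * s for s, o in samples)

samples = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]   # pairs (s, intended o)
w, rho = 0.0, 0.01                               # initial mapping parameter, learning rate
for t in range(200):
    w -= rho * grad_L(w, samples)                # I_{t+1} = I_t - rho * grad L_t, as in Eq. (9)
print(w)   # converges near 2, so the predictions approach the intended outputs
```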
Overtraining
Further, we should note that just because we have some function : S → O satisfying the definitions and assumptions of this section does not mean that this mapping is necessarily meaningful. After all, we could make a completely arbitrary but consistent mapping via the prescription above, and although it would satisfy all the definitions and assumptions, it would amount to complete memorization on the part of the agent. This, in fact, is exactly the definition of overtraining, a common pitfall in the training stage of machine learning which one must be very diligent to avoid.
Entropy Minimization
One final part of the framework remains, and that is to show that entropy is
minimized, as was stated at the beginning of this section. To show that, we
consider 𝕀 as a probabilistic mapping, with
(12)
indicating the probability that 𝕀 maps si ∈ S to some particular oj ∈ O.
From this, we can calculate the entropy in the mapping from S to O, at each
iteration t. If the projection [si] has N possible outcomes, then the Shannon
entropy of each si ∈ S is given by
(13)
The total entropy is simply the sum of 𝔼t[si] over i. Let , then
for the purposes of standardization across varying cardinalities, it may be
insightful to speak of the normalized entropy 𝔼t[S],
(14)
(15)
Therefore
(16)
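As a hedged illustration of Eqs. (12)-(14), the sketch below treats the probabilistic mapping as a row-stochastic matrix, computes the Shannon entropy of each row, and normalizes the total; the particular normalizer |S| log2 N is an assumption standing in for Eq. (14).

```python
import math

def row_entropy(row):
    """Shannon entropy (bits) of one row P[i][.], as in Eq. (13)."""
    return -sum(p * math.log2(p) for p in row if p > 0)

def total_entropy(P):
    """Sum of the row entropies over all s_i in S."""
    return sum(row_entropy(row) for row in P)

def normalized_entropy(P):
    """Total entropy divided by its maximum |S| * log2(N); this normalizer is assumed."""
    n_rows, n_out = len(P), len(P[0])
    return total_entropy(P) / (n_rows * math.log2(n_out))

# An early, uncertain mapping versus a later, nearly deterministic one.
P_early = [[0.5, 0.5], [0.5, 0.5]]
P_late  = [[0.99, 0.01], [0.02, 0.98]]
print(normalized_entropy(P_early), normalized_entropy(P_late))   # entropy decreases
```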
Further, using the definition for Renyi entropy in (5) for each t and i
(17)
To show that the Renyi entropy is also minimized, we can use an identity
involving the p-norm
(18)
and show that the log function is maximized as t → ∞ for α > 1, and minimized for . The case α → 1 was shown above with the definition of Shannon entropy. To continue, note that
(19)
since the summation is taken over all possible states oj ∈ O. But from (15), we have
(20)
for finite t, and thus the log function is minimized only as t → ∞. To show that the Renyi entropy is also minimized for , we repeat the above logic but note that, with the sign reversal of , the quantity is minimized as t → ∞.
Finally, we can take a normalized sum over all i to obtain the total Renyi
entropy of . By this definition, then the total entropy is minimized
along with its components.
(21)
and for every s ∈ S, there is a unique σ ∈ Σ such that s ∈ σ. That is, every element of S has one and only one element of Σ containing it. The term entropic self-organization refers to finding the Σ ⊂ P(S) such that ℍα[σ] is minimized over all σ satisfying (21)
(22)
GLOBAL EFFECTS
In nature, whenever a system is taken from a state of higher entropy to a
state of lower entropy, there is always some amount of energy involved in
this transition, and an increase in the entropy of the rest of the environment
greater than or equal to that of the entropy loss [6] . In other words, consider
a system S composed of two subsystems, s1 and s2, then
(23)
Now, consider that system in equilibrium at times t1 and t2, with t2 > t1, and denote the state at each time by S1 and S2, respectively. Then, due to the second law of thermodynamics,
. (24)
Therefore
(25)
Now, suppose one of the subsystems, say, s1 decreases in entropy by
some amount, Δs between t1, and t2, i.e. . Then to preserve the
inequality, the entropy of the rest of the system must be such that
. (26)
So the entropy of the rest of the system has to increase by an amount
greater than or equal to the loss of entropy in s1. This will require some
amount of energy, ΔE.
Observe that all we have done thus far is follow the natural consequences of the Second Law of Thermodynamics with respect to our considerations of intelligence. While the second law has been verified time and again in virtually all areas of physics, few have extended it as a more general principle in the context of information theory. In fact, we will conclude this section with a postulate about intelligence:
Computational intelligence is a process that locally minimizes and
globally maximizes Renyi entropy.
APPLICATIONS
Here, we apply the discussions of this paper to practical examples. First, we consider a simple example of unsupervised learning: a clustering algorithm based on Shannon entropy minimization. Next we look at some simple behavior of an intelligent agent as it acts to maximize global entropy in its environment.
We can visualize the progression of the algorithm and the final results, respectively, in Figure 1. For simplicity, only the first two (non-noise) dimensions are plotted. The clustering algorithm achieved an 8.3% error rate in 10,000 iterations, with an average simulation time of 480.1 seconds.
Observe that although there are a few “blemishes” in the final clustering results, with a proper choice of parameters, including the maximum computational epochs, the clustering algorithm will eventually succeed with 100% accuracy.
Also pictured in Figure 2 are the results of the clustering algorithm applied to a data set containing four additional fields of pseudo-randomly generated noise, each in the interval [−1,1]. The performance of this trial was worse than the last in terms of speed, but had about the same classification accuracy: a 6.0% error rate in 10,000 iterations, with an average simulation time of 1013.1 seconds.
Entropy Maximization
In our next set of examples, consider a virtual agent confined to move about a “terrain”, represented by a three-dimensional surface given by one of the two following equations, each of which is plotted in Figure 3 and defined by the following functions, respectively:
(27)
and
(28)
From this simple set of rules, we see an emergent desire for parsimony with respect to position on the surface, even in the less probable partitions of z, as z → 1. As our simulation continues to run, ℙH tends to a uniform distribution.
Figure 3 depicts a random walk on surface 1 and surface 2, respectively, where the top and bottom right figures show surface traversal using an entropic search algorithm.
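The entropic search algorithm is likewise not reproduced here; the sketch below is an assumed illustration only, in which an agent on a small grid always moves to the neighbor that most increases the Shannon entropy of its visit histogram, so the empirical distribution tends toward uniform, mirroring the behavior described for ℙH.

```python
import math, random
from collections import Counter

def hist_entropy(counts):
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

random.seed(1)
size = 8                               # an 8 x 8 grid standing in for the surface
pos = (0, 0)
visits = Counter([pos])

for step in range(2000):
    moves = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    candidates = [(pos[0] + dx, pos[1] + dy) for dx, dy in moves
                  if 0 <= pos[0] + dx < size and 0 <= pos[1] + dy < size]
    def gain(c):                       # entropy of the histogram if cell c were visited next
        trial = visits.copy()
        trial[c] += 1
        return hist_entropy(trial)
    pos = max(candidates, key=gain)    # greedy entropy-maximizing move
    visits[pos] += 1

print(round(hist_entropy(visits), 3), "vs uniform", round(math.log2(size * size), 3))
```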
RELATED WORKS
Although there are many approaches to intelligence from the angle of
cognitive science, few have been proposed from the computational side.
However, as of late, some great work in this area is underway.
CONCLUSIONS
The purpose of this paper was to lay the groundwork for a generalization of
the concept of intelligence in the computational sense. We discussed how
entropy minimization can be utilized to facilitate the intelligence process,
and how the disparities between the agent’s prediction and the reality of
the training set can be used to optimize the agent’s performance. We also
showed how such a concept could be used to produce a meaningful, albeit
simplified, practical demonstration.
Some future work includes applying the principals of this paper to data
analysis, specifically in the presence of noise or sparse data. We will discuss
some of these applications in the next paper.
More future work includes discussing the underlying principals under
which data can be collected hierarchically, discussing how computational
processes can implement the discussions in this paper to evolve and work
together to form processes of greater complexity, and discussing the
relevance of these contributions to abstract concepts like consciousness and
self awareness.
In the following paper we will examine how information can aggregate
together to form more complicated structures, the roles these structures can
play.
More concepts, examples, and applications will follow in future works.
REFERENCES
1. Wechsler, D. and Matarazzo, J.D. (1972) Wechsler’s Measurement and
Appraisal of Adult Intelligence. Oxford UP, New York.
2. Gardner, H. (1993) Frames of the Mind: The Theory of Multiple
Intelligences. Basic, New York.
3. Sternberg, R.J. (1982) Handbook of Human Intelligence. Cambridge
UP, Cambridge Cambridgeshire.
4. Hawkins, J. and Blakeslee, S. (2004) On Intelligence. Times Books, New York.
5. Ihara, S. (1993) Information Theory for Continuous Systems. World
Scientific, Singapore.
6. Schroeder, D.V. (2000) An Introduction to Thermal Physics. Addison
Wesley, San Francisco.
7. Penrose, R. (2011) Cycles of Time: An Extraordinary New View of the
Universe. Alfred A. Knopf, New York.
8. Hawking, S.W. (1998) A Brief History of Time. Bantam, New York.
9. Jones, M.T. (2008) Artificial Intelligence: A Systems Approach. Infinity
Science, Hingham.
10. Russell, S.J. and Norvig, P. (2003) Artificial Intelligence: A Modern Approach. Prentice Hall/Pearson Education, Upper Saddle River.
11. (2013) Download Python. N.p., n.d. Web. 17 August 2013. http://www.
python.org/getit
12. Marr, D. and Poggio, T. (1979) A Computational Theory of Human
Stereo Vision. Proceedings of the Royal Society B: Biological Sciences,
204, 301-328. http://dx.doi.org/10.1098/rspb.1979.0029
13. Meyer, D.E. and Kieras, D.E. (1997) A Computational Theory of
Executive Cognitive Processes and Multiple-Task Performance: Part
I. Basic Mechanisms. Psychological Review, 104, 3-65. http://dx.doi.
org/10.1098/rspb.1979.0029
14. Wissner-Gross, A. and Freer, C. (2013) Causal Entropic Forces.
Physical Review Letters, 110, Article ID: 168702. http://dx.doi.
org/10.1103/PhysRevLett.110.168702
15. (2013) Entropica. N.p., n.d. Web. 17 August 2013. http://www.
entropica.com
SECTION 2: BLOCK AND STREAM CODING
BLOCK-SPLIT ARRAY CODING ALGORITHM FOR LONG-STREAM DATA COMPRESSION
5
Guangdong, China
Zhaoqing Branch, China Telecom Co., Ltd., Guangdong, China
ABSTRACT
With the advent of IR (Industrial Revolution) 4.0, the spread of sensors in
IoT (Internet of Things) may generate massive data, which will challenge
the limited sensor storage and network bandwidth. Hence, the study of
big data compression is valuable in the field of sensors. A problem is how
to compress the long-stream data efficiently with the finite memory of a
Citation: Qin Jiancheng, Lu Yiqin, Zhong Yu, “Block-Split Array Coding Algorithm for
Long-Stream Data Compression”, Journal of Sensors, vol. 2020, Article ID 5726527,
22 pages, 2020. https://doi.org/10.1155/2020/5726527.
Copyright: © 2020 by Authors. This is an open access article distributed under the
Creative Commons Attribution License, which permits unrestricted use, distribution,
and reproduction in any medium, provided the original work is properly cited.
INTRODUCTION
With the advent of IR (Industrial Revolution) 4.0 and the subsequent rapid expansion of IoT (Internet of Things), many sensors are available in various fields, which will generate massive data. The soul of IR 4.0 and IoT with intelligent decision and control relies on these valuable data. But the spreading sensors' data also bring problems to the smart systems, especially in the WSN (wireless sensor network) with its precious bandwidth. Due to the limited storage capacity and network bandwidth, GBs or TBs of data in IoT pose an enormous challenge to the sensors.
Data compression is a desirable way to reduce storage usage and speed
up network transportation. In practice, stream data are widely used to support
the large data volume which exceeds the maximum storage of a sensor. For
example, a sensor with an HD (high-definition) camera can deal with its
video data as a long stream, despite its small memory. And in most cases,
a lot of sensors in the same zone may generate similar data, which can be
gathered and compressed as a stream, and then transmitted to the back-end
cloud platform.
This paper focuses on the sensors' compression, which has strict demands for low computing consumption, fast encoding, and energy saving. These special demands exclude most of the unfit compression algorithms. And we pay attention to lossless compression because it is general. Even a lossy compression system usually contains an entropy encoder as the terminal unit, which depends on lossless compression. For example, in the image
(1)
Dzip and D are the volumes of the compressed and original data,
respectively. If the original data are not compressed, R = 0. If the compressed
data are larger than the original data, R < 0. Always R < 1.
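The stated properties of R (R = 0 for uncompressed data, R < 0 for expansion, and always R < 1) are consistent with the ratio R = 1 − Dzip/D, which the following short sketch assumes for Eq. (1).

```python
def compression_ratio(d_zip, d_orig):
    """Compression ratio R = 1 - Dzip / D, consistent with the stated properties of Eq. (1)."""
    return 1.0 - d_zip / d_orig

print(compression_ratio(100, 100))   # 0.0  : not compressed at all
print(compression_ratio(120, 100))   # -0.2 : compressed data larger than the original
print(compression_ratio(25, 100))    # 0.75 : always below 1
```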
The compression algorithms can merge similar data fragments within a
data window. Thus, a larger data window can see more data and have more
merging opportunities. But the data window size is limited by the capacity
of RAM. Although in Figure 1 the cloud platforms have plenty of RAM for
large data windows, a typical heavy-node sensor has only MBs of RAM at
present. Using the VM (virtual memory) to enlarge the data window is not
good enough, because the flash memory is much slower than RAM.
Moreover, a heavy node may gather data streams from different
lightweight nodes, and the streams may have similar data fragments. But
how can a data window see more than one stream to get more merging
opportunities?
For the second problem, the compression speed is important for
the sensors, especially the heavy nodes. GBs of stream data have to be
compressed in time, while the sensors’ computing resources are finite.
Cutting down the data window size to accelerate the computations is not a good way, because the compression ratio will drop noticeably, which runs into the first problem again.
To solve the problems, we need to review the main related works around
sensors and stream data compression.
In [3] and [4], we have discussed that current mathematical models and methods of lossless compression can be divided into 3 classes:
• The compression based on probabilities and statistics: typical
algorithms in this class are Huffman and arithmetic coding [7].
The data window size can hardly influence the speed of such a
class of compression
• To maintain the statistical data for compression, the time complexity
of Huffman coding is O(lbM) and that of traditional arithmetic
coding is O(M). M is the amount of coding symbols, such as 256
characters and the index code-words in some LZ algorithms.
O(M) is not fast enough, but current arithmetic coding algorithms
Figure 2 shows these focuses within the platform ComZip. We can see that the figures in each paper show some different appearances of ComZip because different problems are considered. Typically, this paper focuses on array coding, so the green arrows in Figure 2 point to the ComZip details which are invisible in the figures of [3, 4, 6]. Besides, this paper compares the similar structures in Focus C and D: the Matching Link Builder (MLB). Although the MLBs in CZ-BWT [6] and CZ-Array have different functions, we cover them with a similar structure, which can simplify the hardware design and save cost.
CZ-ARRAY CODING
Concepts of CZ-Array
The compression software ComZip uses the parallel pipeline named “CZ
pipeline.” We have introduced the framework of the CZ encoding pipeline
in [4, 6], and the reverse framework is the CZ decoding pipeline. Figure 3 is
the same encoding framework, and the difference is that CZ-Array units are
embedded, including
• The BAB (block array builder), which can mix multiple data
streams into a united array stream and enlarge the data window
scale
• The MLB (matching link builder), which can improve the speed
of LZ77 and BWT encoding
BAB Coding
The BAB is inspired by RAID, although they differ in detail. Figure 6 shows the simplest scenes of RAID-0 and BAB coding. In
Figure 6(a), we suppose the serial file is written to the empty RAID-0. This
file data stream is split into blocks and then written into different disks in
the RAID. In Figure 6(b), the blocks are read from these disks and then re-
arranged into a serial file.
for the data blocks. But this paper just cares about the compression ratio and
speed, so we focus on the simplest block array, like RAID-0.
In Figure 6, the RAID and BAB coding algorithms can easily determine
the proper order of data blocks because the cases are simple and the block
sizes are the same. But in practice, the data streams are more complex, such as multiple streams with different lengths, suddenly ending streams, and newly starting streams. Thus, the BAB coding algorithm has to handle these situations.
In the BAB encoding process, we define n as the row amount of the data
block array; B as the size of a block; A as the amount of original data streams
(to be encoded); Stream[i] as the data stream with ID i (i = 0, 1, ⋯, A − 1);
Stream[i].length is the length of Stream[i] (i = 0, 1, ⋯, A − 1).
Figure 6 shows the simplest situation, n = A. But in practice, the data window size is limited, so this n cannot be too large. Otherwise, the data in the window may be overdispersed, which decreases the compression ratio. Hence, n < A is usual, and each stream (to be encoded/decoded) has 2 statuses: active and inactive. A stream under BAB coding (especially in the block array rows) is active, while a stream in the queue (outside of the rows) is inactive.
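As a simplified sketch of the block-array idea in Figure 6 (not the actual ComZip code), the snippet below splits several streams into B-byte blocks and interleaves them round-robin, RAID-0 style; the inactive-stream queue and the seamless block connection described below for Case 1 are omitted, and the names are illustrative.

```python
def bab_encode(streams, B):
    """Interleave multiple byte streams into one array stream, RAID-0 style."""
    out, offset = bytearray(), 0
    while any(offset < len(s) for s in streams):
        for s in streams:
            out += s[offset:offset + B]   # an exhausted stream contributes nothing
        offset += B
    return bytes(out)

def bab_decode(data, lengths, B):
    """Split the array stream back into the original streams, given their lengths."""
    streams = [bytearray() for _ in lengths]
    pos, offset = 0, 0
    while any(offset < l for l in lengths):
        for i, l in enumerate(lengths):
            take = min(B, l - offset) if offset < l else 0
            streams[i] += data[pos:pos + take]
            pos += take
        offset += B
    return [bytes(s) for s in streams]

a, b = b"AAAAAAAAAA", b"BBBB"
mixed = bab_encode([a, b], B=3)
print(mixed)                                    # b'AAABBBAAABAAAA'
print(bab_decode(mixed, [len(a), len(b)], 3))   # [b'AAAAAAAAAA', b'BBBB']
```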
As shown in Figure 7, we divide the situations into 2 cases to simplify
the algorithm design:
• Case 1: Stream[i].length and A are already known before encoding.
For example, ComZip generates an XML (Extensible Markup
Language) document at the beginning of the compression. Each
file is regarded as a data stream, and the XML document stores
the files’ catalog
As shown in Figure 7(a), this case indicates all the streams are ready and
none is absent during the encoding process. So the algorithm can use a fixed
amount n to encode the streams. When an active stream ends, a new inactive
stream in the queue will connect this end. It is a seamless connection in the
same block. And in the decompression process, streams can be separated by
Stream[i].length.
• Case 2: Stream[i].length and A are unknown before encoding.
For example, the sensors send video streams occasionally. If a
sensor stops the current stream and then continues, we regard the
continued stream as a new one
As shown in Figure 7(b), this case indicates the real-time stream amount
A is dynamic, and each length is unknown until Stream[i] ends. So the
In Figure 8(a), the data stream and its matching links can be endless. But
in practice, the data window is limited by the RAM of the sensor. While the
data stream passes through, the data window slides in the stream buffer, and
the parts of data out of the buffer will lose information so that we cannot
follow the matching links beyond the data window. Cutting the links is slow,
but we may simply distinguish whether the link pointer has exceeded the
window bound.
To locate the data window sliding on the stream buffer, we define the
“straight” string s in the window as follows:
(2)
where buf[0...M − 1] is the stream buffer (M = 2N), pos is the current position of the data window on the stream buffer, s[i...i + 2] is a 24b integer (i = 0, 1, ⋯, N − 1), and buf[i...i + 2] is also a 24b integer (i = 0, 1, ⋯, M − 3).
In [6], we have built the matching links for CZ-BWT, which are also
called “bucket sorting links.” Now, we build the matching links for LZ77 on
buf. Figure 8 shows the example of the “XYZ” link, and we see 3 matching points in it. The bucket array has 256^3 link headers, and all links are endless. We define the structure of the links and their headers as follows:
(3)
(4)
k is the fixed match length of each matching point in the links. In the example of Figure 5, k = 3 for “ACO” and “XYZ”. i is the plain position in Equations (3) and (4). We can see that link[i] stores the distance (i.e., relative position) of the next matching point. Be aware that if the next matching point is out of the window bound, link[i] stores a useless value. The algorithm can distinguish this condition: let the value null = M; if i − link[i] < 0, link[i] is expired.
Figure 8(b) shows the fast string matching by following the matching
links. We can see the reversed string “ZYXCBA…” starting in s[N] gets
2 matching points: reversed “ZYXC” and “ZYXCB,” with the matching
lengths 4 and 5.
The data stream may be very long or endless, while the stream buffer is
finite; thus, the data window sliding on this buffer will reach the edge buf[M
− 1]. We can move the data window back to buf[0], but it is slow. So, we
assume this buffer is a “cycle” string buf[0...M − 1], which has the following
feature:
(5)
Then, Equation (2) is changed and extended into a congruent modulus
equation:
(6)
As M = 2N, the data window always occupies half of the stream buffer,
and the fresh stream data fill the other half. To simplify the algorithm description, this paper hides the implementation details of Equation (6) but takes for granted that Equation (5) holds.
Following the matching links is not fast enough and needs further
acceleration. Seeking all the matching points along a link for the maximum
matching length consumes too much time, yet most of the attempts fail to
increase the matching length. So, we propose a configurable parameter sight to control the matching times, and we observe that LZ77 encoding speeds up evidently while the compression ratio decreases only slightly.
Algorithm 3 shows the MLB coding, including the matching link
building and LZ77 encoding. And we can use a common LZ77 decoding
algorithm without matching links.
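Algorithm 3 itself is not reproduced here, so the following is only a hedged sketch of the MLB idea: link[i] records the distance back to the previous occurrence of the same k = 3 byte string (a Python dict stands in for the bucket array of 256^3 headers), and the LZ77 match search follows at most sight matching points. The circular buffer of Equations (5) and (6) and the real ComZip data structures are omitted.

```python
def build_links(buf, k=3):
    """For each position i, link[i] = distance back to the previous position holding
    the same k-byte string, or None if there is no earlier occurrence."""
    link, last = [None] * len(buf), {}
    for i in range(len(buf) - k + 1):
        key = buf[i:i + k]
        if key in last:
            link[i] = i - last[key]
        last[key] = i
    return link

def best_match(buf, i, link, sight=10, k=3):
    """Follow at most `sight` matching points and return (distance, length) of the
    longest match found for the string starting at position i."""
    best, j, hops = (0, 0), i, 0
    while link[j] is not None and hops < sight:
        j -= link[j]
        hops += 1
        length = 0
        while i + length < len(buf) and buf[j + length] == buf[i + length]:
            length += 1
        if length > best[1]:
            best = (i - j, length)
    return best

data = b"XYZABC...XYZABCD...XYZABCDE"
link = build_links(data)
print(best_match(data, 19, link))   # (10, 7): the 7-byte string "XYZABCD" matches 10 bytes back
```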
Time Complexities
BAB Coding
In practice, RAID-0 is very fast, and its algorithm is simple. The key factors
of its performance are N and n. N is the input/output data stream length,
and n is the disk amount of RAID-0. We regard n as a constant. So the time
complexity of RAID-0 encoding/decoding is O(N/n) = O(N).
As shown in Algorithms 1 and 2 and Figure 6, BAB coding is similar to
RAID-0. The difference is BAB coding has to treat multiple streams with
different lengths, but this operation is also fast and does not influence the
performance. So the time complexity of BAB coding is also O(N). We can
expect practical BAB coding to be as fast as RAID.
binary trees for LZSS encoding. Seeking a binary tree is faster than tracing a
matching link, so the time complexity of 7-zip (LZSS) encoding is O(NlbN).
We have a faster way in Phase (b) of Algorithm 3. A configurable
parameter sight is used to control the amount of the applicable matching
points in each link. Then we need not trace the whole matching link. Take
sight = 10 for example, each time only the first 10 matching points in the link
are inspected to find their maximum matching length. In practice, we regard
sight as a constant, so the time complexity of ComZip (LZ77) encoding is
O(sight N) = O(N).
As we have investigated, lz4 is one of the fastest parallel compression
software of the LZ series. It has the top speed because each time it just
inspects a single matching point, and it makes full use of the CPU cache.
Then, we may roughly regard lz4 as a fast LZ77 with the short matching
links of sight = 1. The time complexity of lz4 encoding is also O(N).
In [6], we have discussed that bzip2 (BWT) encoding has the time complexity O(N^2 lbN).
Space Complexities
BAB Coding
The memory usage is important for the sensors. As shown in Algorithms 1
and 2, BAB coding needs a block buffer and a set of arrays. The block buffer
has space complexity O(B). B is the block size. The stream information
arrays have space complexity O(n). n is the row amount of the array. But n
is very small, so the space complexity of BAB coding is O(B). B < N.
(LZSS) encoding needs the RAM 9.5 N (mainly for the binary tree). So, the
space complexity of 7-zip (LZSS) encoding is O(pN).
ComZip needs the RAM for the window buffer, matching links, and link headers. In Figure 8, the stream buffer is a dual window buffer and needs the RAM 2N, but in ComZip we have optimized the dynamic RAM usage, which needs 1N-2N. The matching links need the RAM 4N (supporting a 2 GB data window) or 5N (supporting 512 GB). The link headers form a bucket array with 256^k elements. If k = 3, a bucket array needs the RAM 64 MB (supporting a 2 GB data window) or 80 MB (supporting 512 GB), which can be ignored if the data window is larger than 16 MB. In summary, ComZip (LZ77) encoding needs the RAM 5N-7N, so it has the space complexity O(N).
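A quick arithmetic check of the figures above, under the stated assumptions of 5N-7N for ComZip and about 9.5N per thread for 7-zip; the helper names are illustrative.

```python
def comzip_ram(N, low=5, high=7):
    """ComZip (LZ77) needs roughly 5N-7N of RAM for window, links, and headers."""
    return (low * N, high * N)

def p7zip_ram(N, p):
    """7-zip (LZSS) needs about 9.5N per thread, with p independent windows."""
    return 9.5 * N * p

MB, GB = 1 << 20, 1 << 30
lo, hi = comzip_ram(512 * MB)
print(lo / GB, hi / GB)                 # about 2.5 to 3.5 GB for a shared 512 MB window
print(p7zip_ram(512 * MB, 16) / GB)     # about 76 GB for 16 independent 512 MB windows
```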
Compression Ratios
The LZ series compression is based on string matching. So the compression
ratio depends on the matching opportunities and maximum matching length
of each matching point. If a matching point goes out of the data window, it
cannot contribute to the compression ratio.
BAB coding rebuilds the data stream so that the multiple streams can
be seen in the same data window. If the streams are similar, many matching
points will benefit the compression ratio. But if the streams are completely
different, the compression ratio will slightly decrease, like the effect of
reducing the data window size. These results were observed in the data
experiments, and we may infer an experimental rule: a closer matching point
may have a larger matching length.
The data in practice are complex, so we need to do various data
experiments for this rule modeling. This paper just takes a simple example
to explain these BAB effects. According to Equation (1) and Figure 4, we
assume Dzip has the following trend:
(7)
N is the data window size. When BAB mixes n streams, the data window
size for each stream is N/n. Then, Equation (7) becomes
(8)
We regard N as a constant in this example. Then, Equation (8) becomes
(9)
Equation (9) infers that the compression ratio will slightly decrease if
the streams in the block array are completely different. But if the streams are
similar, as a simple example we may assume Dzip has the following trend:
(10)
When the effects of Equations (9) and (10) are combined, the results will show the main factor, Equation (10), which implies that the compression ratio will increase.
We may still use Equation (7) as an example to estimate the compression
ratio limited by the data window size, but we should also consider the
influence of the compression format. The 32 KB window of gzip is far away
from the 1 GB data window of 7-zip, but the length/index code-word of gzip
is short, which helps to improve the compression ratio. After all, the size
of 32 KB is too small, so we may expect the compression ratio of gzip to be the lowest in our experiments. But the results show lz4 is the lowest.
Lz4 is designed for high-speed parallel compression. Each compression
thread has an independent 64 KB data window. But to reach the time
complexity of O(N), each time in the LZ encoding it tries only one matching
point, which will lose the remainder of matching opportunities in the window.
As a contrast, gzip tries all the matching points in the 32 KB window, and it
also has Huffman coding, so its compression ratio is higher than lz4.
Although bzip2 uses BWT encoding instead of the LZ series, we may
still use the example of Equation (7), which is supported by the experiment
results. Because the data windows of gzip, lz4, and bzip2 are all small, their
compression ratios are uncompetitive. And if they use larger windows, their
compression speed will slow down.
7-zip and ComZip have the advantage of large data windows in the
level of GB, so their compression ratios are high. And both of them support
multithread parallel compression for high speed. But we ought to consider
the RAM consumption of the multiple threads. If the RAM of a sensor is
limited, can it still support the large window for a high compression ratio?
Like the behavior of lz4, 7-zip also uses independent data windows for the parallel threads. As mentioned above, 7-zip (LZSS) encoding has space complexity O(pN). For example, if it uses p = 16 threads and each thread uses an N = 512 MB window, the total window size is pN = 8 GB. On the other hand, if the RAM limits the total window size to pN = 512 MB, the independent window for each thread is only N = 32 MB. So parallel 7-zip encoding decreases the compression ratio by reducing N.
ComZip shares a whole data window for all parallel threads. If the total
window size is N = 512 MB, each (LZ77) encoding thread can make full use
of the 512 MB window. So the parallel encoding of ComZip can keep the
high compression ratio.
Moreover, ComZip (LZ77) encoding has the time complexity O(N), which implies it has the latent capacity to enlarge the data window for a higher compression ratio without obvious speed loss. As a contrast, 7-zip (LZSS) encoding has the time complexity O(NlbN).
ComZip uses sight to control the amount of the applicable matching
points for high speed and skips the matching link remainder. This will
decrease the compression ratio slightly, but we can enlarge either N or sight
to cover this loss.
Table 5 shows the comparison of CZ-Array with the LZ/BWT encodings of these other popular compression software.
From the hardware point of view, both Figures 9 and 10 are succinct and easy to optimize for hardware speed.
EXPERIMENT RESULTS
In [3, 4, 6], we have done some experiments to compare ComZip and other
compression software, but they are in different versions and different cases.
This paper focuses on CZ-Array and compares ComZip with p7zip (the
Linux version of 7-zip), bzip2, lz4, and gzip. These compression software
are popular, and a lot of other compression software have been compared with them. We may just compare ComZip with them because the comparison with others can be estimated.
In this paper, we use the parameter “-9” for gzip and lz4 to gain their
maximum compression ratio. For ComZip itself, we use n = 4 to test BAB
coding and then use n = 1 to simulate ComZip without BAB. This is workable
by modifying the parameter “column” in the configuration file “cz.ini”. We
also use sight = 20 and B = 1 MB by modifying the parameters “sight” and “cluster.”
We find 4 public files to simulate the data streams, and thank the
developers of the OpenWRT project. Table 6 shows these files. They can be
downloaded from http://downloads.openwrt.org/
Figure 11. Compressed file size and data window size in Table 7.
Table 8 and Figure 12 show the relationship between the compression/
decompression time and N. Figure 12 hides the decompression time because
we have not optimized the decompression of ComZip yet. The current
decompression program is just to verify the compression algorithms of
ComZip are correct, so it may be slower than other software. We focus on
the compression performance first, and the optimization of decompression
for ComZip is our future work.
0.5 MB — — — — — — 40 20 — — — —
0.6 MB — — — — — — 40 19 — — — —
0.7 MB — — — — — — 42 19 — — — —
0.8 MB — — — — — — 42 18 — — — —
0.9 MB — — — — — — 42 19 — — — —
1 MB 18 26 19 26 18 10 — — — — — —
2 MB 18 16 19 25 18 10 — — — — — —
4 MB 20 15 22 24 20 9 — — — — — —
8 MB 18 14 19 24 21 9 — — — — — —
16 MB 18 14 19 20 21 8 — — — — — —
32 MB 18 13 19 19 32 7 — — — — — —
64 MB 19 12 20 19 59 7 — — — — — —
128 MB 19 11 16 12 65 5 — — — — — —
256 MB 18 12 16 11 64 5 — — — — — —
512 MB 15 12 14 11 128 4 — — — — — —
16MB) because their RAM consumptions are near: ComZip (5N or 6N) vs
p7zip (16∗9.5N).
So, we may consider that ComZip with BAB has the best compression ratio among these software, and lz4 has the worst R owing to its small 64 KB window and the single-matching-point algorithm. We also observe that the compression ratio of ComZip with BAB is obviously higher than that without BAB. But when N = 128 MB or larger, ComZip gains almost the same compression ratio whether or not it uses BAB. The reason is that the window is large enough to see 2 data streams or more, which has the same effect as BAB coding. Another special point is at N = 1 MB, where ComZip also gains almost the same compression ratio, because B = N makes the blocks invisible to each other and renders BAB coding useless.
According to the compression speed shown in Figure 12 and Table 8,
ComZip without BAB (N = 512 MB) is the fastest, while ComZip with BAB
(N = 512 MB) and lz4 are very close to it. Both lz4 and ComZip have the
advantages of compression speed because they have the time complexity
O(N). But we observe abnormal curves which show ComZip with a small
N is slower than itself with a large N. The reason is the large N brings more
matching opportunities and decreases the total encoding operations.
The curve of p7zip is complex. When N < 16 MB, p7zip compression is
also fast. But when N is between 16 and 64 MB, its speed decreases rapidly
and fits the time complexity O(NlbN). When N = 128 MB, the window can
see 2 data streams and brings lots of matching opportunities, which changes
the trend of O(NlbN). When N = 512 MB, the speed decreases rapidly again because the memory requirement for data windows (16*9.5N) exceeds the RAM (64 GB), and the VM (virtual memory) is used. Fortunately, the VM in an SSD is faster than that in an HDD (hard disk drive). As a contrast, ComZip can use 64 GB RAM to build a 10 GB data window (6N).
The compression ratios and speeds of gzip and bzip2 are all uncompetitive,
so this paper simply provides their results.
Above all, the experiment results on this x86/64 platform show that CZ-Array in ComZip has the following advantages when using large data windows: high compression ratio and speed, a shared window, and efficient RAM consumption. ComZip with BAB can have the highest compression ratio and speed among these software, which is practical for long-stream data compression.
Moreover, the experimental data are virtual router image files of
OpenWRT, and the results infer that CZ-Array in ComZip has the latent
Figure 13. Compressed file size and data window size in Table 9.
Figure 14. Compression time and data window size in Table 10.
This experiment is limited by the platform hardware, especially the 1 GB
RAM. When N = 96 MB, ComZip aborts for insufficient RAM. But other
software are worse. Only p7zip can use N = 64 MB (p = 1). p is the number
of parallel compression threads. And p7zip can only support N = 32 MB
(p = 2) and N = 16 MB (p = 4), and the larger N cannot work.
Figure 13 and Table 9 show that in this experiment p7zip cannot reach N = 128 MB, so its compression ratio does not keep up with that of ComZip with BAB. Thus, ComZip with BAB has the best compression ratio, and lz4 has the worst.
In Figure 14 and Table 10, we observe that lz4 is the fastest and ComZip with BAB is the second fastest. We estimate the reason is that the small CPU cache of the ARM platform suits lz4's small 64 KB data window, letting it perform well, while for ComZip the 4-core CPU limits the performance of the parallel CZ encoding pipeline.
The curve of p7zip is complex. When N is between 2 and 8 MB, the
larger N brings more matching opportunities and benefits the speed, and
CONCLUSIONS
The rapid expansion of the IoT leads to numerous sensors, which generate massive data and bring challenges for data transmission and storage. A desirable way to meet this requirement is stream data compression, which can handle long-stream data within the limited memory of a sensor.
But the problems of long-stream compression in sensors still exist. Owing to the limited computational resources of each sensor, enlarging the
ACKNOWLEDGMENTS
This paper is supported by the R&D Program in Key Areas of Guangdong
Province (2018B010113001, 2019B010137001), Guangzhou Science and
Technology Foundation of China (201802010023, 201902010061), and
Fundamental Research Funds for the Central Universities.
Chapter 6
BIT-ERROR AWARE LOSSLESS IMAGE COMPRESSION WITH 2D-LAYER-BLOCK CODING
ABSTRACT
With IoT development, it has become increasingly common for image data to be transmitted via wireless communication systems. If bit errors occur during transmission, the recovered image can become useless. To solve this problem, a bit-error aware lossless image compression based on bi-level
Citation: Jungan Chen, Jean Jiang, Xinnian Guo, Lizhe Tan, “Bit-Error Aware Loss-
less Image Compression with 2D-Layer-Block Coding”, Journal of Sensors, vol. 2021,
Article ID 7331459, 18 pages, 2021. https://doi.org/10.1155/2021/7331459.
Copyright: © 2021 by Authors. This is an open access article distributed under the
Creative Commons Attribution License, which permits unrestricted use, distribution,
and reproduction in any medium, provided the original work is properly cited.
coding is proposed for gray images. However, bi-level coding does not consider the inherent statistical correlation in a 2D context region. To address this shortcoming, a novel variable-size 2D-block extraction and encoding method with built-in bi-level coding is developed for color images, decreasing the entropy of the information and improving the compression ratio. A lossless color transformation from RGB to the YCrCb color space is used to decorrelate the color components. In particular, a layer-extraction method is proposed to preserve the Laplacian distribution of the data in 2D blocks, which is suitable for bi-level coding. In addition, optimization of the 2D-block start bits is used to improve performance. To evaluate the performance of our proposed method, many experiments are conducted, including comparison with state-of-the-art methods and the effects of different color spaces. The comparison experiments under a bit-error environment show that the average compression rate of our method is better than that of bi-level, Jpeg2000, WebP, FLIF, and L3C (a deep learning method) with Hamming code. Our method also achieves the same image quality as the bi-level method. Other experiments illustrate the positive effect of built-in bi-level encoding and of encoding with zero-mean values, which can maintain high image quality. Finally, the results on the decrease of entropy and the procedure of our method are given and discussed.
INTRODUCTION
With cloud computing and Internet of Things (IoT) development, the
requirement for data transmission and storage is increasing. Fast and efficient
compression of data plays a very important role in many applications. For
instance, image data compression has been used in many areas such as
medical, satellite remote sensing, and multimedia.
There are many methods to compress image data including prediction-
based, transformation-based, and other methods such as fractal image
compression and deep learning with Auto Encoder (AE) [1, 2], Recurrent
Neural Network (RNN), Convolutional Neural Network (CNN) [3], and
Residual Neural Network (ResNet) [4]. The transformation-based method
includes Discrete Cosine Transform (DCT), Karhunen-Loeve Transform
(KLT), Hadamard transform, Slant transform, Haar transform, and singular
value decomposition [5]. Usually, transformation-based or deep learning
methods are used in lossy compression while prediction-based methods are
used for lossless compression.
for the efficiency of bi-level coding is that it uses the sparsity property of the data, which requires fewer encoding bits.
In this work, we use bi-level coding [7] for natural images with red (R), green (G), and blue (B) components. As R, G, and B are highly correlated, a linear transformation is applied to map RGB to another color space and achieve a better CR [17, 18]. As discussed above, a spatial structure-based method with a 2D context region is an effective way to improve CR. Therefore, the image is split into many 2D blocks that have the sparsity property and are suitable for bi-level coding. Finally, a novel variable-size 2D-block extraction and encoding method with built-in bi-level coding is proposed to improve CR for color images while remaining robust in a bit-error environment. A key 2D-layer-block extraction method splits the image into many 2D blocks with similar colors and keeps the data in each 2D block Laplacian or Gaussian distributed, which provides the sparsity property.
The contributions of this paper are summarized as follows:
• For color image compression, a lossless color transformation from RGB to the YCrCb color space is used to decorrelate the color components, and a prediction-based method is used to remove data correlation and produce the residue sequence
• To keep the data distribution sparse and suitable for bi-level coding, a novel 2D-layer-block extraction method is proposed to preserve the Laplacian or Gaussian distribution of the data in 2D blocks. Furthermore, by rearranging the order in which data are encoded, the extraction method decreases the entropy of the data and improves CR
• A novel variable-size 2D-block encoding method with built-in bi-level coding is proposed to improve CR while remaining as robust to a bit-error environment as the bi-level coding method. The mean or min value of each 2D block and the key information bits of the built-in bi-level coding are protected with a Hamming code, so the image can still be recovered and remain useful (a minimal Hamming (7,4) sketch follows this list)
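The authors' Hamming-code implementation is not given in this excerpt. The following is a minimal sketch of the standard Hamming (7,4) code of the kind that could protect the block mean values and key bi-level information bits; the function names and the particular systematic generator/check matrices are illustrative assumptions, not the paper's code.

```python
import numpy as np

# Standard Hamming (7,4): 4 data bits -> 7 code bits, corrects any single bit error.
G = np.array([[1, 0, 0, 0, 1, 1, 0],   # generator matrix, systematic form [I | P]
              [0, 1, 0, 0, 1, 0, 1],
              [0, 0, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
H = np.array([[1, 1, 0, 1, 1, 0, 0],   # parity-check matrix [P^T | I]
              [1, 0, 1, 1, 0, 1, 0],
              [0, 1, 1, 1, 0, 0, 1]])

def hamming74_encode(data4):
    """Encode 4 data bits into a 7-bit codeword."""
    return np.mod(np.array(data4) @ G, 2)

def hamming74_decode(code7):
    """Correct up to one bit error and return the 4 data bits."""
    code7 = np.array(code7).copy()
    syndrome = np.mod(H @ code7, 2)
    if syndrome.any():                          # nonzero syndrome: locate the flipped bit
        pos = np.where((H.T == syndrome).all(axis=1))[0]
        if pos.size:
            code7[pos[0]] ^= 1
    return code7[:4]                            # systematic code: first 4 bits are data
```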
The rest of this paper is organized as follows. In Section 2, related
works on lossless compression are discussed. In Section 3, the details of the
proposed method are briefly introduced. In Section 4, several experiments
including comparison and analysis of the basic principles are conducted. Finally, the conclusion and directions for future research are given in Section 5.
Prediction-Based Methods
The context-based adaptive prediction method is based on a static predictor
which is usually a switching predictor able to adapt to several types of
contexts, like horizontal edge, vertical edge, or smooth area. Many static
predictors can be found in [6, 19]. Median edge detector (MED) used in
LOCO-I uses only three causal pixels to determine a type of pixel area which
is currently predicted [20]. LOCO-I is further improved and standardized
as the JPEG-LS lossless compression algorithm, which has eight different
predictive schemes including three one-dimensional and four two-dimensional predictors [21]. To detect edges, the Gradient Adjusted Predictor
(GAP) embedded in the CALIC algorithm uses local gradient estimation
and three heuristic-defined thresholds [22]. Gradient edge detection (GED)
predictor combines simplicity of MED and efficiency of GAP [23]. In [19],
the prediction errors are encoded using codes adaptively selected from the
modified Golomb-Rice code family. To enable processing of images with
higher bit depths, a simple context-based entropy coder is presented [6].
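As a concrete illustration of the static predictors described above, the following is a minimal sketch of the MED predictor used in LOCO-I/JPEG-LS; the neighbor naming (a = left, b = above, c = upper-left) follows the usual convention and the function name is ours, not taken from this paper.

```python
def med_predict(a, b, c):
    """Median Edge Detector (MED) predictor from LOCO-I / JPEG-LS.

    a: left neighbor, b: upper neighbor, c: upper-left neighbor.
    Picks min(a, b) or max(a, b) when an edge is detected, otherwise
    uses the planar prediction a + b - c.
    """
    if c >= max(a, b):
        return min(a, b)
    elif c <= min(a, b):
        return max(a, b)
    return a + b - c
```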
LS-based optimization is proposed as an approach to accommodate
varying statistics of coding images. To reduce computational complexity,
edge-directed prediction (EDP) initiates the LS optimization process only
when the prediction error is beyond a preselected threshold [24]. In [25],
the LS optimization is processed only when the coding pixel is around an
edge or when the prediction error is large. And a switching coding scheme
is further proposed that combines the advantages of both run-length and
adaptive linear predictive coding [26]. Minimum Mean Square Error
(MMSE) predictor uses least mean square principle to adapt k-order linear
predictor coefficients for optimal prediction of the current pixel, from a fixed
number of m causal neighbors [27]. The paper [28] presents a lossless coding
method based on blending approach with a set of 20 blended predictors, such
as recursive least squares (RLS) predictors and Context-Based Adaptive
Linear Prediction (CoBALP+).
Although pixel-wise prediction is favored, it destroys the morphology of the 2D context region, and the inherent statistical correlation within the correlated region becomes obscured. As an alternative, spatial structure has been considered to complement the pixel-wise prediction [29]. In [14], quadtree-based variable block-size partitioning is introduced into the adaptive prediction technique to remove spatial redundancy in a given
image and the resulting prediction errors are encoded using context-adaptive
arithmetic coding. Inspired by the success of prediction by partial matching
(PPM) in sequential compression, the paper [30] introduces the probabilistic
modeling of the encoding symbol based on its previous context occurrences.
In [15], superspatial structure prediction is proposed to find an optimal
prediction of the structure components, e.g., edges, patterns, and textures,
within the previously encoded image regions instead of the spatial causal
neighborhood. The paper [17] presents a lossless color image compression
algorithm based on the hierarchical prediction and context-adaptive
arithmetic coding. By exploiting the decomposition and combinatorial
structure of the local prediction task and making the conditional prediction
with multiple max-margin estimation in a correlated region, a structured
set prediction model with max-margin Markov networks is proposed [29].
In [16], the image data is treated as an interleaved sequence generated by
multiple sources and a new linear prediction technique combined with
template-matching prediction and predictor blending method is proposed.
Our method uses a variable-size 2D-block extraction and encoding method with built-in bi-level coding to improve the compression rate.
(1)
(2)
(3)
n bits are required for encoding in the extracted blocks, and the remaining data not belonging to any block are left to the next layer for extraction. The 2D-block encoding method with built-in bi-level coding is used to improve CR while staying robust to bit errors. The built-in bi-level procedure splits the 2D block into many one-dimensional signals, and each signal is encoded separately, because the bi-level method has a maximum encoding length, normally equal to the width of the image.
(4)
Let us consider all the data x in the block to be governed by a probability density f(x); the entropy is then calculated by (6) [33].
(5)
(6)
By inserting (5) into (6), the entropy for a Gaussian distribution is
expressed as
(7)
Since a Gaussian-distributed residue sequence has the maximum entropy, the following inequality holds in general.
(8)
According to (4), the standard deviation σ is less than (2^n − 1)/2, and (9) can be deduced when we assume μ = 0; L is the sample size of all the data x in one block. By substituting (9) into (8), equation (10) is obtained. When blocks for n bits are found starting from n = 1 up to 8, the entropy of the data in the blocks increases with n according to (10). So the entropy is determined by n, and it is possible to improve the compression ratio with this method.
(9)
(10)
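The paper's equations (5)-(10) are not reproduced in this excerpt. For reference, the standard expressions that the discussion appears to rely on are sketched below; the exact forms used by the authors may differ.

```latex
% Gaussian density (cf. (5)) and its differential entropy (cf. (6)-(7))
f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\,
       \exp\!\left(-\frac{(x-\mu)^{2}}{2\sigma^{2}}\right),
\qquad
h = -\int_{-\infty}^{\infty} f(x)\,\log_{2} f(x)\,dx
  = \frac{1}{2}\,\log_{2}\!\left(2\pi e\,\sigma^{2}\right).

% With \mu = 0 and the block data bounded by |x| \le (2^{n}-1)/2 (cf. (4)),
% the variance satisfies \sigma^{2} \le \bigl((2^{n}-1)/2\bigr)^{2},
% so the entropy bound grows with n, matching the statement around (9)-(10).
```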
According to the discussion above, we assume μ = 0. After performing prediction and making the data zero-mean by removing the average, many residue values are close to zero and the residues follow a Laplacian distribution, as shown in Figure 4(a). That is, all the data in one of these encoded 2D blocks satisfy (11). Note that the sample size L in the block is above a threshold value thn, and the data in the block approximately follow a Laplacian or Gaussian distribution.
(11)
Figure 4. Data distribution. (a) data distribution in blocks; (b) data distribution
after 2D-block extraction; (c) data in one block.
To proceed, once all the 2D blocks with n = 1 bit are found, they are extracted from the residue data. The rest of the residue data consists of three portions: the first has values larger than (2^n − 1)/2, the second has values smaller than −(2^n − 1)/2, and the third contains data whose sample size is smaller than thn. Note that after the residues in [−(2^n − 1)/2, (2^n − 1)/2] shown in Figure 4(a) are extracted, the remaining residue data still nearly keep the Laplacian distribution. When the extraction is repeated from n = 1 to 8, the Laplacian distribution of the remaining residue data changes, with the probability density around zero decreasing as shown in Figure 4(b). In addition, the Laplacian or Gaussian distribution in these 2D blocks becomes flatter, as depicted in Figure 4(c), because of the increasing value of (2^n − 1). In this paper, this procedure is called layer extraction.
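To make the layer-extraction idea concrete, here is a simplified, hypothetical one-dimensional sketch: the actual method extracts rectangular 2D blocks of similar values, while this code only peels off, layer by layer, the residues representable with n bits. The threshold thn follows the paper's notation; the function name and array handling are assumptions.

```python
import numpy as np

def layer_extract(residues, max_bits=8, thn=128):
    """Peel off residues layer by layer: layer n keeps the values that fit
    in n bits (|x| <= (2**n - 1) / 2); everything else waits for later layers."""
    remaining = np.asarray(residues, dtype=np.int32)
    layers = []
    for n in range(1, max_bits + 1):
        bound = (2 ** n - 1) / 2
        mask = np.abs(remaining) <= bound
        if mask.sum() >= thn:                 # extract only if enough samples for a layer
            layers.append((n, remaining[mask]))
            remaining = remaining[~mask]
    return layers, remaining
```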
Bilevel Coding
As most of the data in each 2D block have the sparse distribution discussed above, the bi-level coding scheme proposed in our previous works [7, 34] and shown in Figure 7 can be applied.
(12)
For a given 2D block for n bits, N0 = n and the original total length is N0 ∗ ns. When bi-level block coding is applied, the compression ratio is improved according to (12).
(13)
2D-Block Encoding
Figures 9(a)–9(c) show the details of the encoding scheme. When a color image is given, the three channels are encoded separately, and the head information, including the color space, the predictor information, and their Hamming coding, is encoded as in (a). In each channel, 2D blocks are extracted layer by layer with the extraction method, so encoding is also implemented recursively layer by layer, and each layer is encoded separately. In each layer, shown in (b), head data and image data are encoded separately. (c) shows the encoding scheme of the head data. The size of M is the width of the parent matrix times the height of the image, and the length of B is the length of the remaining data in Figures 5 and 6. Every block has a start position (x, y), its size (w, h), the mean value of the data in the block, the maximum number of bits used by each datum in the block, and the key information of the built-in bi-level coding, including N0, N1, nb, the number of blocks ns/nb, and the bitstream of block types. In particular, the mean value of the data in a block has two functions. One is to improve robustness to bit errors, because the mean value keeps the key information of the block. The other is to ensure that the block data are zero-mean, which is a feature of bi-level coding.
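One way to picture the per-block header just described is the sketch below; the field names and types are illustrative assumptions and do not reflect the authors' actual bitstream layout or bit widths.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class BlockHeader:
    """Illustrative per-2D-block header for one layer, mirroring the fields
    listed in the text (start position, size, mean, max bits, bi-level keys)."""
    x: int                  # start position (column)
    y: int                  # start position (row)
    w: int                  # block width
    h: int                  # block height
    mean: int               # mean value of the block data (Hamming-protected)
    max_bits: int           # maximum bits used by each datum in the block
    N0: int                 # built-in bi-level key information
    N1: int
    nb: int
    num_blocks: int         # ns / nb
    block_types: List[int] = field(default_factory=list)  # bitstream of block types
```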
Figure 9. 2D-block encoding scheme: (a) data encoding of one image; (b) data
encoding in one channel of image; (c) header data encoding of one layer.
EXPERIMENTS
To validate our proposed algorithm, Open Images from http://data.
vision.ee.ethz.ch/mentzerf/validation_sets_lossless/val_oi_500_r.tar.gz,
CLIC mobile dataset from https://data.vision.ee.ethz.ch/cvl/clic/mobile_
valid_2020.zip shown in Table 1, and many classic images from http://sipi.
usc.edu/database/ and http://homepages.cae.wisc.edu/~ece533/images/ are
used.
(14)
(15)
(16)
(17)
(18)
Comparison
In this experiment, our proposed method is compared with several state-of-the-art methods from Refs. [16, 17], engineered lossless compression algorithms including PNG, Jpeg2000, WebP, and FLIF, and the deep learning-based lossless compression algorithm L3C. As none of these methods is directly suitable for a bit-error situation, each of them is combined with the Hamming (7,4) code as a solution assumed to be robust to a bit-error environment. The results are given in Tables 2 and 3 and Figure 10.
Figure 10. Comparison of recovered image quality: (a) PSNR; (b) other assess-
ment methods.
In Table 2, the results are taken from Refs. [15, 16], and the best results of CR with Hamming code are listed in the second-to-last column. The average CR of our method is 1.31296, better than 1.116592. In Table 3, the results of Jpeg2000, WebP, and FLIF are obtained with the compression tools OpenJpeg, WebP from Google, and FLIF from Cloudinary, respectively. As a deep learning method, the result of L3C is obtained by using the neural model trained on the Open Images dataset to compress the images. The average CR of our method is 1.682933, better than the others such as bi-level (1.66655), FLIF (1.429898), and L3C (1.396234). In addition, it is noticeable that the CR of L3C on images (5) and (9) from the CLIC dataset gives the worst results, 1.098277 and 0.918406. One reason for this is that the model used for compression was trained on the Open Images dataset, and L3C does not perform well on images from the different CLIC dataset.
Figure 10 shows the image quality assessment results. PSNR, SSIM, and MSSSIM_Y better reflect the bit-error channel situation. Therefore, only these three assessment results are discussed in the later sections.
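For reference (the paper's own metric definitions in (14)-(18) are not reproduced in this excerpt), the standard forms of PSNR and SSIM for 8-bit images, which these experiments presumably follow, are:

```latex
\mathrm{PSNR} = 10\,\log_{10}\!\frac{255^{2}}{\mathrm{MSE}},
\qquad
\mathrm{SSIM}(x,y) =
\frac{\left(2\mu_x\mu_y + C_1\right)\left(2\sigma_{xy} + C_2\right)}
     {\left(\mu_x^{2}+\mu_y^{2}+C_1\right)\left(\sigma_x^{2}+\sigma_y^{2}+C_2\right)} .
```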
According to the comparison results in Tables 2 and 3 and Figure 10, the compression ratio of our proposed method is higher than that of bi-level coding, even though 2D-block encoding requires more header bits to encode the position and size of each block, and the image quality in Figure 10 remains similar to that of bi-level coding. The reason is that the 2D-layer-block extraction method rearranges the data order to decrease the entropy, and the data distribution of one-layer blocks remains nearly Laplacian, which is suitable for bi-level coding, as discussed before. This analysis is discussed further in Sections 4.6 and 4.7.
Figure 14. Effect of zero-mean on PSNR (BER = 0.001, thn = 128, no bi-level).
As discussed in Section 4.2, the compression ratio with built-in bi-level coding is higher when zero-mean values are used. Without bi-level coding, however, the compression ratio with positive integer values (obtained by removing the min value) is higher than with zero-mean values, as shown in Figure 13, because encoding with zero-mean values requires a sign bit. On the other hand, using the sign bit means that zero-mean encoding has a smaller amplitude around the mean value, so encoding with zero-mean values affects PSNR and SSIM less than encoding with positive integer values. Consequently, as the PSNR and SSIM shown in Figures 14(a) and 14(b) indicate, encoding with zero-mean values achieves better PSNR and SSIM than encoding with positive integers, which is consistent with Section 3.3.
Figure 15 shows the reconstructed images for the different methods. According to the results for image (9) in the last row, encoding with "positive integer" values shows worse image quality than the others. But in Figure 14(c), the MSSSIM value of "positive integer" is higher than that of "zero-mean." Therefore, the MSSSIM result is not always consistent with the real image quality.
Figure 17. Comparison with different color spaces and predictors (BER = 0.001, thn = 128).
Figure 18. The entropy of 2D blocks for n bits (entropy ∗ number of samples,
thn = 128). Left is the entropy of blocks extracted.
Discussion
Through these experiments, the comparison results show that our method performs better than state-of-the-art methods, engineered lossless compression algorithms, and deep learning methods under bit-error conditions. There are four main reasons.
First, the 2D-block extraction method extracts the data that can be encoded with fewer bits layer by layer; thus, the entropy is decreased, as Section 4.6 shows.
Second, edge data always cause a poor compression rate, but the 2D-block extraction method changes the edge data distribution, and the data distribution of each layer's blocks remains nearly Laplacian, which is suitable for bi-level coding, as Figure 19 of Section 4.7 shows.
Third, built-in bi-level coding with zero-mean values preserves high image quality in a bit-error environment, as discussed in Sections 4.2 and 4.3.
Finally, the optimization of the 2D-block start bits and of the color space used in the "2D block" is an important mechanism for improving the compression rate, as discussed in Sections 4.4 and 4.5.
CONCLUSIONS
When image data are transferred through wireless communication systems, bit errors may occur and corrupt the image data. To reduce the bit-error effect, a bit-error aware lossless image compression algorithm based on bi-level coding can be applied. But bi-level coding is a one-dimensional coding method and does not consider the inherent statistical correlation in a 2D context region. So, to resolve this shortcoming,
ACKNOWLEDGMENTS
This work was supported in part by the National Natural Science Foundation
of China (No. 61502423), Zhejiang Provincial Natural Science Foundation
(No. LY16G020012), and Zhejiang Province Public Welfare Technology
Application Research Project (Nos. LGF19F010002, LGN20F010001,
LGF20F010004, and LGG21F030014).
REFERENCES
1. J. Ballé, D. Minnen, S. Singh, S. J. Hwang, and N. Johnston,
“Variational image compression with a scale hyperprior,” 2018, arXiv
preprint arXiv:1802.01436.
2. J. Lee, S. Cho, and M. Kim, An End-to-End Joint Learning Scheme of
Image Compression and Quality Enhancement with Improved Entropy
Minimization, 2019, https://arxiv.org/pdf/1912.12817.
3. F. Jiang, W. Tao, S. Liu, J. Ren, X. Guo, and D. Zhao, “An end-to-
end compression framework based on convolutional neural networks,”
IEEE Transactions on Circuits and Systems for Video Technology, vol.
28, no. 10, pp. 3007–3018, 2018.
4. Z. Cheng, H. Sun, M. Takeuchi, and J. Katto, “Learned image
compression with discretized Gaussian mixture likelihoods and
attention modules,” in 2020 IEEE/CVF Conference on Computer
Vision and Pattern Recognition (CVPR), Seattle, WA, USA, June 2020.
5. K. N. Shruthi, B. M. Shashank, Y. S. Saketh et al., “Comparison analysis
of a biomedical image for compression using various transform coding
techniques,” in 2016 IEEE 6th International Conference on Advanced
Computing (IACC), pp. 297–303, Bhimavaram, India, February 2016.
6. A. Avramović and G. Banjac, “On predictive-based lossless
compression of images with higher bit depths,” Telfor Journal, vol. 4,
no. 2, pp. 122–127, 2012.
7. L. Tan and L. Wang, “Bit-error aware lossless image compression,”
International Journal of Modern Engineering, vol. 11, no. 2, pp. 54–
59, 2011.
8. “WebP image format,” https://developers.google.com/speed/webp/.
9. J. Sneyers and P. Wuille, “FLIF: free lossless image format based on
MANIAC compression,” in 2016 IEEE International Conference on
Image Processing (ICIP), Phoenix, AZ, USA, September 2016.
10. J. Townsend, T. Bird, and D. Barber, “Practical lossless compression
with latent variables using bits back coding,” The International
Conference on Learning Representations, Louisiana, USA, 2019,
arXiv preprint arXiv:1901.04866.
11. F. Kingma, P. Abbeel, and J. Ho, “Bit-swap: recursive bits-back
coding for lossless compression with hierarchical latent variables,”
in International Conference on Machine Learning, pp. 3408–3417,
California, USA, 2019.
Chapter 7
BEAM PATTERN SCANNING (BPS) VERSUS SPACE-TIME BLOCK CODING (STBC) AND SPACE-TIME TRELLIS CODING (STTC)
ABSTRACT
In this paper, Beam Pattern Scanning (BPS), a transmit diversity technique,
is compared with two well-known transmit diversity techniques, space-time block coding (STBC) and space-time trellis coding (STTC). In BPS (also called beam pattern oscillation), controlled time-varying weight vectors
are applied to the antenna array elements mounted at the base station (BS).
This creates a small movement in the antenna array pattern directed toward
Citation: P. Keong Teh and S. Zekavat, “Beam Pattern Scanning (BPS) versus Space-
Time Block Coding (STBC) and Space-Time Trellis Coding (STTC),” International
Journal of Communications, Network and System Sciences, Vol. 2, No. 6, 2009, pp.
469-479. doi: 10.4236/ijcns.2009.26051.
Copyright: © 2009 by authors and Scientific Research Publishing Inc. This work is
licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0.
the desired user. In rich scattering environments, this small beam pattern
movement creates an artificial fast fading channel. The receiver is designed to
exploit time diversity benefits of the fast fading channel. Via the application
of simple combining techniques, BPS improves the probability-of-error
performance and network capacity with minimal cost and complexity. In this work, to highlight the potential of BPS, we compare BPS and Space-Time
Coding (i.e., STBC and STTC) schemes. The comparisons are in terms of
their complexity, system physical dimension, network capacity, probability-
of-error performance, and spectrum efficiency. It is shown that BPS leads to
higher network capacity and performance with a smaller antenna dimension
and complexity with minimal loss in spectrum efficiency. This identifies
BPS as a promising scheme for future wireless communications with smart
antennas.
INTRODUCTION
Transmit diversity schemes use arrays of antennas at the transmitter to
create diversity at the receiver. Different transmit diversity techniques have
been introduced to mitigate fading effects in wireless communications [1–
5]. Examples are space-time block coding [1–3], space-time trellis coding
[3–5], antenna hopping [6] and delay diversity [6,7].
In Space-Time Block Coding (STBC), data is encoded by a channel
coder and the encoded data is split into N unique streams, simultaneously
transmitted over N antenna array elements. At the receiver, the symbols
are decoded using a maximum likelihood decoder. This scheme combines
the benefits of channel coding and diversity transmission, providing BER
performance gains. However, receiver complexity increases as a function of bandwidth efficiency [3], and a high number of antennas is required to achieve high diversity orders. Moreover, antenna elements should be located far enough apart to achieve space diversity, and when antenna arrays at the base station (BS) are used in this fashion, directionality benefits are no longer available [1–3]. This reduces the network capacity of wireless systems in terms of the number of users.
In Space-Time Trellis Coding (STTC) information symbols are encoded
by a unique space-time channel coder and the encoded information symbols
The results confirm that the BPS scheme leads to higher network capacity, better BER/FER performance, and lower complexity. However, the relative spectral efficiency of the BPS technique is lower than that of STBC and STTC, on the order of 5%. In other words, the BPS technique offers higher quality-of-service and network capacity at a minimal cost in spectrum efficiency. This introduces BPS as a powerful scheme for future generations of wireless communications with smart antenna arrays.
Section 2 introduces the STBC, STTC, and BPS schemes. Section 3 compares their characteristics, and Section 4 presents and compares their capacity and BER/FER performance simulations. Section 5 concludes the paper.
STBC
STBC is a transmit diversity technique capable of creating diversity at the
receiver to improve the performance of communications systems. STBC
utilizes N transmit antennas separated far apart to ensure independent fades
[1,2]. At a given symbol period, N signals are transmitted simultaneously
from N antennas. The signal transmitted from each antenna has a unique
structure that allows the signal to be combined and recovered at the receiver.
For simplicity in presentation, we only consider STBC with 2 transmit
antennas (N = 2) (see Figure 1).
We consider s0 and s1, two consecutive signals generated at two consecutive times t0 and t1 = t0 + Ts, respectively. The signal transmitted from antenna zero is denoted by s0 and the one from antenna one by s1. At the next symbol period, the transmitted signals from antennas zero and one are
(1)
respectively, where Ts is the symbol duration and ai, qi, i ∈ {0,1} are the Rayleigh fading gain and phase, respectively. The received signals at times t and t + Ts correspond to
(2)
respectively. Here, nt and are complex random variables representing receiver noise and interference at times t and t + Ts, respectively.
In the STBC receiver, Maximal Ratio Combining (MRC) leads to estimates of s0 and s1, corresponding to
(3)
respectively (note: rt = r(t)). Substituting (1) and (2) into (3), we obtain
(4)
In other words, a maximum likelihood receiver leads to the removal of the s1- and s0-dependent terms in ŝ0 and ŝ1, respectively. This yields good probability-of-error performance at the receiver.
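The combining in (3)-(4) follows the standard Alamouti scheme [1]. As a minimal numerical sketch (with the flat-fading gains assumed constant over the two symbol periods, and variable names chosen here for illustration):

```python
import numpy as np

def alamouti_encode(s0, s1):
    """Alamouti encoding: rows are the two symbol periods, columns the two antennas."""
    return np.array([[s0,           s1],
                     [-np.conj(s1), np.conj(s0)]])

def alamouti_combine(r0, r1, h0, h1):
    """MRC-style combining of the two received samples r0, r1 given channel gains h0, h1."""
    s0_hat = np.conj(h0) * r0 + h1 * np.conj(r1)
    s1_hat = np.conj(h1) * r0 - h0 * np.conj(r1)
    return s0_hat, s1_hat

# Example with hypothetical channel values:
h0, h1 = 0.8 + 0.3j, -0.5 + 0.9j
X = alamouti_encode(1 + 1j, 1 - 1j)
r0 = X[0, 0] * h0 + X[0, 1] * h1          # received at time t (noise omitted)
r1 = X[1, 0] * h0 + X[1, 1] * h1          # received at time t + Ts
print(alamouti_combine(r0, r1, h0, h1))   # ~ (|h0|^2 + |h1|^2) * (s0, s1)
```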
STTC Technique
STTC is a transmit diversity technique that combines space diversity and
coding gain to improve the performance of communication systems [3,5,8].
STTC utilizes N transmit antennas separated far apart to ensure independent
channels. At a given symbol period, N signals are transmitted simultaneously
from N antennas. The signal transmitted from each antenna has a unique
structure with inherent error-correction capability that allows the signal to be recovered and corrected at the receiver [8]. In this paper, we only consider the simulation scenario presented in [3], that is, π/4-QPSK, 4-state, 2 b/s/Hz STTC (hereafter denoted STTC-QPSK), which utilizes two transmit antennas and one receive antenna.
The trellis structure of STTC-QPSK is shown in Figure 2(a) and
the constellation mapping in Figure 2(b). In STTC-QPSK, information
symbols are encoded using a channel coder by mapping input symbols to
a vector of output (codewords) based on a trellis structure (Figure 2(a)).
Here, information symbols are encoded based on the current state of the
encoder and the current information symbols. Thus, the encoded codewords
are correlated in time.
At the left of the trellis structure (Figure 2(a)) are the STTC codewords (s1,s2), s1,s2 ∈ {0,1,2,3}. In Figure 2(a), there are four emerging branches from each trellis state, because there are four possible QPSK symbols, namely {0,1,2,3}. For example, consider the space-time trellis coder that starts at state (q1,q2) = (0,0) (represented by 00). When the information symbol is 10, the coder transition from state 00 to 10 produces the output codewords (s1,s2) of (0,2). When the next information symbol is 11, the coder transition from state 10 to 11 produces the output codeword (2,3). The channel coder continues to change from its current state to a new state based on the incoming information symbols. By design, the channel coder resets to state 0 after completing the coding of a frame (e.g., 130 symbols). The output codewords of the encoder are then mapped onto a π/4-QPSK constellation (Figure 2(b)). The mapping results in two information symbols, each of which is transmitted simultaneously on one antenna. Through this encoding scheme, redundancy is introduced into the system, but at the same time the symbols are transmitted over two antennas, so the coding redundancy does not impact the throughput. In order to achieve SDMA and improve network capacity, each STTC-QPSK antenna element is replaced with an antenna array [9] to generate two static beams directed toward the desired users (Figure 3).
Figure 2. (a) STTC-QPSK trellis structure, and (b) constellation mapping using Gray code.
Figure 3. STTC far located antenna elements are replaced by antenna arrays to
support SDMA.
(6)
where si(t) is the transmitted symbol and n(t) is a complex random variable representing receiver noise at time t. The receiver is designed using the Viterbi algorithm. The branch metric for a transition labeled q1(t) q2(t) corresponds to [3]
(7)
where P is the number of transmit antennas. The Viterbi algorithm is used to compute the path with the lowest accumulated metric [3].
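Equation (7) itself is not reproduced in this excerpt; for a single receive antenna, the usual maximum-likelihood branch metric from [3] takes the form below, which is assumed here to correspond to the paper's expression.

```latex
m\bigl(q_1(t)\,q_2(t)\bigr) \;=\;
\Bigl|\, r(t) \;-\; \sum_{i=1}^{P} \alpha_i\, q_i(t) \Bigr|^{2}
```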
BPS
BPS is a new transmit diversity technique utilizing an antenna array to
support directionality and transmit diversity via carefully controlled time
varying phase shifts applied to each antenna element. This creates a slight
motion of the beam pattern directed toward the desired users [10]. Beam
pattern movement creates an artificial fast fading environment that leads to
time diversity exploitable by the BPS receiver [11]. Beam pattern movement
is created by applying a time-varying phase θ(t) to the elements of the antenna array (see Figure 4).
selecting the phase offset θ(t) leads to a movement of the antenna beam pattern that ensures: 1) constant large-scale fading over Ts, and 2) the generation of L independent fades within each Ts.
1) Achieving constant large-scale fading: In order to ensure constant
large-scale fading over each symbol period Ts, the mobile must
remain within the antenna array’s HPBW at all times. This
corresponds to
(8)
where β is the HPBW (half-power beamwidth), φ is the azimuth angle, dφ/dt is the rate of antenna pattern movement, and Ts·(dφ/dt) is the amount of antenna pattern movement within Ts. The received antenna pattern amplitude is ensured to remain within the HPBW for the entire symbol duration Ts by the control parameter k, 0 < k < 1.
2) Achieving L independent fades within each Ts: Using (8), the
phase offset applied to the antenna array is found to be (see
[3,6,7]):
(9)
where λ is the wavelength of the carrier and d is the distance between adjacent antenna elements.
The sweeping of the beam pattern creates an artificial fast fading channel
with a coherence time that may lead to L independent fades over Ts. This is a
direct result of the departure and the arrival of scatterers within the antenna
array beam pattern window. Simulation results in [10] and [11], assuming a medium-size city center and 0.0005 < k < 0.05, reveal that time diversity gains as high as L = 7 are achievable using the BPS scheme.
Assuming BPSK modulation, the transmitted signal can be represented
as
(10)
where b0 ∈ {–1,+1} is the transmitted bit, fo is the carrier frequency, and
gTs(t) is the pulse shape (e.g., a rectangular waveform with unity height over
0 to Ts). The normalized signal received at the mobile receiver input corresponds to:
(11)
where m ∈ {0,1,2,…, M–1} is the mth antenna array element (Figure 2),
nl(t) is an additive white Gaussian noise (AWGN), which is considered
independent for different time slots (l), al is the fade amplitude in the lth time
slot, and xl is its phase offset (hereafter, this phase offset is assumed to be
tracked and removed). Moreover, in (11),
(12)
where (2πd/λ)·cosφ is the phase offset caused by the difference in distance between the antenna array elements and the mobile (assuming the antenna array is mounted horizontally), and θ(t) is introduced in Equation (9). Applying the summation over m, Equation (11) corresponds to
(13)
Here,
(14)
is the antenna array factor. Assuming the mobile is located at φ = π/2, (12) can be approximated by g(t, φ) = g(t) = −θ(t). Moreover, assuming that the antenna array's peak is directed toward the intended mobile at time 0 and that the movement of the antenna array pattern over Ts is small, i.e., k in Equation (9) is small, the array factor is well approximated by AF(t, φ) ≈ 1.
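The array factor in (14) is not reproduced in this excerpt. For a uniform linear array of M elements with spacing d and time-varying phase θ(t), a standard form consistent with (12) and (13) is the following (the 1/M normalization is an assumption):

```latex
AF(t,\varphi) \;=\; \frac{1}{M}\sum_{m=0}^{M-1}
\exp\!\Bigl[\, j\, m\Bigl(\tfrac{2\pi d}{\lambda}\cos\varphi \;+\; \theta(t)\Bigr)\Bigr]
```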
The time-varying phase of (9) in (12) and (13) leads to a spectrum expansion of the transmitted (and the received) signal. Because the parameter k in (9) is small (e.g., k = 0.05), this expansion is minimal (see Subsection 3.2). After returning the signal to baseband, the received signal corresponds to:
(15)
(16)
where (B.W.)BPS = bandwidth needed with BPS and (B.W.)withoutBPS =
bandwidth needed without BPS. Considering (13) and using (12) and (9),
the expansion factor fexp. corresponds to
(17)
Hence, with constant Ts, λ, β, d, and M for both the BPS and STBC systems, the relative reduction in bandwidth efficiency due to BPS corresponds to
(18)
Considering d = λ/2, typical values of β (e.g., β = 0.5 rad), and M = 6, (18) can be approximated by
(19)
With this definition, the relative reduction in BPS spectrum efficiency is determined by the control parameter k. For example, for k = 0.05 (an antenna sweep equivalent to 5% of the HPBW), ηR = 95%. On the other hand, with a constant bandwidth available to BPS, STBC, and STTC, the throughput of BPS is less than that of the STC techniques by the factor fexp (e.g., by a factor of less than 5%). This disadvantage of BPS is minimal compared with the advantages of the BPS technique discussed in this paper.
SIMULATIONS
• Signal power to noise power ratio is SNR = 10 dB for STBC and SNR = 12 dB for STTC.
(20)
Here, AF(t, φc) is the array factor introduced in (14), nl(t) is an additive white Gaussian noise (AWGN), which is considered independent for different time slots (l), bc,k ∈ {+1,−1} is the cth cell's kth user's transmitted bit, is the Hadamard-Walsh spreading code for the kth user and nth sub-carrier in the cth cell, ψcn is the long code of the nth sub-carrier for the cth cell, is the Rayleigh fade amplitude on the nth sub-carrier in the lth time slot in the cth cell, and is its phase (which is assumed to be tracked and removed). is assumed independent over the time components, l, and correlated over the frequency components, n [14]. Kc represents the number of users that effectively interfere with the desired user.
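The Hadamard-Walsh spreading codes used above can be generated with the standard Sylvester construction; the following minimal sketch (function name ours) builds one code per row:

```python
import numpy as np

def walsh_hadamard(n_codes):
    """Sylvester construction of Hadamard-Walsh codes.
    n_codes must be a power of two; each row is one user's +/-1 spreading code."""
    H = np.array([[1]])
    while H.shape[0] < n_codes:
        H = np.block([[H, H], [H, -H]])
    return H

print(walsh_hadamard(4))   # 4 mutually orthogonal length-4 codes
```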
In the neighboring cells, these users are located at the antenna pattern
(sector) with directions shown in Figure 7. Considering assumptions (f) and
(i)
(21)
where E(·) denotes the expectation and K is the number of users available in each cell. In (20), 1/(Rc)a represents the long-term path loss of the signal received by the mobile (MS) in cell 0. This signal is transmitted by the BSs of neighboring cells to the users located in those cells, in the directions that interfere with the intended mobile (see Figure 7).
In Figure 7, D is the cell radius. Assuming the intended mobile is located at and approximating the coverage area by a triangle, represents the approximate center of mass of the users in the beam pattern coverage area. Rc represents the distance between the BS of cell c, c ∈ {0,1,2,…,6}, and the intended mobile in cell 0. From the geometry in Figure 7, the vector R formed by the elements Rc, c ∈ {0,1,2,…,6}, corresponds to [12]
(22)
where R0 is normalized to one and the others are normalized with respect to this value.
In (20), the power factor a is a function of the user location, the BS antenna height, and the environment. In urban areas, the parameter a changes with the carrier frequency and the BS antenna height: a = 1 if Rc < Dmax and a = 2 if Rc > Dmax, where Dmax = D(fo, ha) is a function of the carrier frequency fo and the antenna height ha. Considering fo = 900 MHz and a BS height ha > 25 m, Dmax ≈ 1000 m (see [15]). Assuming a cell of radius D ≈ 500 m and referring to [15], we find that a = 2 for cells 1, 2, and 6, whereas a = 1 for cells 3, 4, and 5. Thus, in the simulations we ignore the interference from cells 1, 2, and 6 and only consider inter-cell interference from cells 3, 4, and 5, with little loss in accuracy.
With the model introduced in (20), the received STBC/MC-CDMA
signal corresponds to
(23)
where bc,k[i] and bc,k[i+1], i ∈ {0,2,4,…}, are the kth user's information bits in the cth cell for STBC, and are the Rayleigh fade amplitudes due to antenna 0 and antenna 1 on the nth sub-carrier in the cth cell, and and are their phases, respectively; is the Hadamard-Walsh spreading code for the kth user and nth sub-carrier, ψcn is the long code of the nth sub-carrier in the cth cell, 1/(Rc)a characterizes the long-term path loss, and n(t) is an additive white Gaussian noise (AWGN).
Figure 8(a) represents network capacity simulation results generated
considering MRC across time components in BPS and across space
components in STBC (see [3] and [4]) and EGC across frequency components
in both BPS and STBC. It is observed that a higher network capacity is
achievable with BPS/MC-CDMA. For example, at a probability-of-error of 10^−2, BPS/MC-CDMA offers up to two-fold higher capacity. It is also observed that STBC/MC-CDMA offers better performance than traditional MC-CDMA without diversity when the number of users in the cell is less than 80. However, as the number of users in the cell
increases beyond 80, the performance of STBC/MC-CDMA becomes even
worse than traditional MC-CDMA (i.e., MC-CDMA with antenna array but
without diversity benefits). This is because the STBC scheme discussed in this paper (see [1]) is designed to utilize MRC. It has been shown that MRC is the optimal combining scheme when only one user is present, while in a multiple-access environment MRC enhances the Multiple Access Interference (MAI) and therefore degrades the performance of the system [16].
Figure 8. Capacity performance (a) STBC and BPS, and (b) STTC and BPS.
Considering STTC-QPSK, with assumption (d), the STTC-QPSK/MC-CDMA received signal corresponds to
(24)
Here, s0,k and s1,k are the kth user's information bits transmitted from antenna
CONCLUSIONS
A comparison was performed between the STBC, STTC-QPSK, and BPS transmit diversity techniques in terms of network capacity, BER/FER performance, spectrum efficiency, complexity, and antenna dimensions. BER performance and network capacity simulations were generated for the BPS, STBC, and STTC schemes. This comparison shows that the BPS transmit diversity scheme is much superior to both the STBC and STTC-QPSK schemes: a) the BS physical antenna dimensions of BPS are much smaller than those of the STC techniques, and b) the BER/FER performance and network capacity of BPS are much higher than those of the STC schemes. The complexity of the BPS system is minimal because the complexity is mainly located at the BS, and the receiver complexity is low because all the diversity components enter the receiver serially in time. In terms of spectrum efficiency, both STC schemes outperform the BPS scheme by a very small percentage (e.g., on the order of 5%). The BPS scheme introduces a small bandwidth expansion due to the movement of the beam pattern, which eventually results in a lower throughput per bandwidth.
REFERENCES
1. S. M. Alamouti, “A simple transmit diversity technique for wireless
communications,” IEEE Journal on Selected areas in Communications,
Vol. 16, No. 8, pp. 1451–1458, 1998.
2. V. Tarokh, H. Jafarkhani, and A. R. Calderbank, “Space-time block
codes from orthogonal designs,” IEEE Transactions on Information
Theory, Vol. 45, No. 5, pp. 1456–1467, July 1999.
3. V. Tarokh, N. Seshadri, and A. R. Calderbank, “Space-time codes for
high data rate wireless communication: Performance criterion and code
construction,” IEEE Transactions on Information Theory, Vol. 44, pp.
744–765, March 1998.
4. V. Tarokh, A. F. Naguib, N. Seshadri, and A. Calderbank, “Space-
time codes for high data rate wireless communications: Performance
criteria in the presence of channel estimation errors, mobility, and
multiple paths,” IEEE Transactions on Communications, Vol. 47, No.
2, February 1999.
5. A. F. Naguib, V. Tarokh, N. Seshadri, and A. R. Calderbank, “A space-
time coding modem for high-data-rate wireless communications,”
IEEE Journal on Selected Areas in Communications, Vol. 16, No. 8,
October 1998.
6. N. Seshadri and J. H. Winters, “Two signaling schemes for improving
the error performance of frequency division-duplex transmission
system using transmitter antenna diversity,” International Journal
Wireless Information Networks, Vol. 1, No. 1, pp. 49–60, January
1994.
7. J. H. Winters, “The diversity gain of transmit diversity in wireless
systems with Rayleigh fading,” in Proceedings of the 1994 ICC/
SUPERCOMM, New Orleans, Vol. 2, pp. 1121–1125, May 1994.
8. R. W. Heath, S. Sandhu, and A. J. Paulraj, “Space-time block coding
versus space-time trellis codes,” Proceedings of IEEE International
Conference on Communications, Helsinki, Finland, June 11–14, 2001.
9. V. Tarokh, A. Naguib, N. Seshadri, and A. R. Calderbank, “Combined
array processing and space-time coding,” IEEE Transactions on
Information Theory, Vol. 45, No. 4, pp. 1121–1128, May 1999.
10. S. A. Zekavat and C. R. Nassar, “Antenna arrays with oscillating
beam patterns: Characterization of transmit diversity using semi-
elliptic coverage geometric-based stochastic channel modeling,” IEEE
Chapter 8
PARTIAL FEEDBACK BASED ORTHOGONAL SPACE-TIME BLOCK CODING WITH FLEXIBLE FEEDBACK BITS
ABSTRACT
The conventional orthogonal space-time block code (OSTBC) with limited feedback has a fixed number of p−1 feedback bits for a specific nTp transmit antennas. A new partial feedback based OSTBC that provides flexible feedback bits is proposed in this paper. The proposed scheme inherits the simple decoder and the full diversity of OSTBC and, moreover, preserves the full data rate. Simulation results show that for nTp transmit antennas, the proposed scheme has performance similar to the conventional one
Citation: Wang, L. and Chen, Z. (2013), “Partial Feedback Based Orthogonal Space-
Time Block Coding With Flexible Feedback Bits”. Communications and Network, 5,
127-131. doi: 10.4236/cn.2013.53B2024.
Copyright: © 2013 by authors and Scientific Research Publishing Inc. This work is
licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0
when using p−1 feedback bits, and better performance with more feedback bits.
INTRODUCTION
Orthogonal space-time block coding (OSTBC) is a simple and effective
transmission paradigm for MIMO system, due to achieving full diversity
with low complexity [1]. One of the most effective OSTBC schemes is the
Alamouti code [2] for two transmit antennas, which has been adopted as the
open-loop transmit diversity scheme by current 3GPP standards. However,
the Alamouti code is the only rate-one OSTBC scheme [3]. With a higher number of transmit antennas, OSTBC for complex constellations suffers a rate loss.
To address this drawback, open-loop solutions have been presented, such as the quasi-OSTBC (QOSTBC) [4] with rate one for four transmit antennas, and other STBC schemes [5,6] with full rate and full diversity. Alternatively, closed-loop solutions have been designed to improve the performance of OSTBC by exploiting limited channel information feedback at the transmitter. In this paper, we focus on the closed-loop scheme.
Based on the group-coherent code, the p−1 bit feedback based OSTBC for nTp transmit antennas has been constructed in [7], and generalized to an arbitrary number of receive antennas in [8]. The partial feedback based schemes in [7,8] exhibit a higher diversity order while preserving low decoding complexity. However, these schemes for nTp transmit antennas require a fixed number of p−1 feedback bits. That is to say, for such a scheme, improving the performance by increasing the feedback bits implies that the number of transmit antennas nTp must be increased at the same time. Therefore, the scheme is inflexible in balancing the performance against the feedback overhead.
In this paper, by multiplying a well-designed feedback vector with each signal to be transmitted from each antenna, we propose a novel partial feedback based OSTBC scheme with flexible feedback bits. In this scheme, the OSTBC can be directly extended to more than two antennas. Importantly, we show that the proposed scheme preserves the simple decoding structure of OSTBC, full diversity, and full data rate.
Notation: Throughout this paper, (⋅)T and (⋅)H represent transpose and Hermitian transpose, respectively. Re(a) denotes the real part of a complex number a, and .
(1)
where is the 1 × nTp feedback vector for the mth antenna, defined as , where ⊗ denotes the Kronecker product, is the mth row of the identity matrix , and the 1 × p vector bl is given by
(2)
where . The feedback vector at the mth antenna is taken from a subset of all Q^(p−1) possible feedback vectors, i.e., .
With the transmission of the T × nTp code matrix , the T × nR received signal can be written as
(3)
where is the nTp × nR channel matrix, and
is the T × nR complex Gaussian noise matrix. The entries of H and N are
independent samples of a zero-mean complex Gaussian random variable
with variance 1 and nTp/ρ respectively, where ρ is the average signal-to-
noise ratio (SNR) at each receive antenna.
(4)
where the nT × nTp matrix bl is composed of nT feedback vectors, and can be
expressed in a stacked form given by
(5)
(6)
For convenience, we will use the Alamouti code as the basic OSTBC matrix in the rest of this paper, and the results can be directly extended to other OSTBCs. For the received signal in (4), after performing the conjugate operation on the second entry of yi, the received signal yi can be equivalently expressed as
(7)
where is the equivalent channel matrix corresponding to the entries of and their conjugates, and has a pair of symbols in the Ala-
(8)
where the matrices Ck and Dk specifying the Alamouti code are defined in
[9]. Since matched filtering is the first step in the detection process, left-
multiplying by will yield
(9)
where . Due to the properties of Ck and Dk for the Alamouti
code, we get
(10)
where denotes the equivalent channel gain for receive antenna i. It is clear that is a diagonal matrix; therefore, the simple decoder of OSTBC can be directly applied to (7), and s1 and s2 can be decoded independently.
(11)
where
and
Summing over all the nR receive antennas, the total channel gain is given by
(12)
It is clear that, in order to improve the system performance, we must feed back the specific l with (p−1)logQ bits, which provides the
(13)
where gik(n) denotes the nth element in gik, and
(14)
Thus, the (p−1)logQ feedback bits are selected as
(15)
In this way, we can choose the optimal feedback vector bl and further construct for the mth transmit antenna.
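The optimization in (13)-(15) amounts to choosing, out of all candidate feedback vectors, the one that maximizes the equivalent channel gain. The sketch below is a hypothetical illustration of that exhaustive search (the search strategy used in the simulations); the exact form of bl from the paper's Eq. (2) is not reproduced here, so a unit-magnitude phase vector is assumed, as are the function name and the Q-ary phase alphabet.

```python
import numpy as np
from itertools import product

def best_feedback_index(H, nT=2, p=2, Q=2):
    """Exhaustive search over all Q**(p-1) candidate feedback vectors b_l,
    picking the one that maximizes the effective channel gain ||W(b_l) H||_F^2.
    H is the (nT*p) x nR channel matrix."""
    best_l, best_gain = None, -np.inf
    for phases in product(range(Q), repeat=p - 1):
        b = np.array([1.0] + [np.exp(2j * np.pi * q / Q) for q in phases]) / np.sqrt(p)
        W = np.kron(np.eye(nT), b)            # stacks the per-antenna vectors e_m (x) b_l
        gain = np.linalg.norm(W @ H) ** 2
        if gain > best_gain:
            best_l, best_gain = phases, gain
    return best_l, best_gain
```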
Diversity Analysis
The key property of the proposed partial feedback based OSTBC scheme is
proved in the following.
(16)
(17)
(18)
(19)
Since the lower bound of the channel gain provides the full diversity of nTpnR, the proposed scheme can certainly guarantee full diversity.
Furthermore, the proposed scheme supports a flexible number of feedback bits. For a specific p, the scheme uses (p−1)log Q feedback bits. However, for a number of feedback bits not equal to (p−1)log Q, the vector bl in (2) can be rewritten accordingly.
BER Analysis
Assuming the power of each symbol in x = [s1 s2]T is normalized to unity, i.e., for i = 1, 2, we can obtain the average SNR per bit in the form
(20)
By using (16), the upper bound on the conditional BER can be formulated as
(21)
Using the moment generating function (MGF) technique [10], the average BER can be expressed as
(22)
(23)
Using the result of (5A.4) in [10], this definite integral has the closed form
(24)
where
SIMULATION RESULTS
In all simulations, we consider QPSK symbols in Alamouti code, and
a single receive antenna with nR = 1, where the channels are assumed to
be independent and identically distributed (i.i.d.) quasi-static Rayleigh flat-fading channels. In Figure 1, we plot the bit error rate (BER) performance of the generalized partial feedback based OSTBC scheme in [7,8] (“GPF” for short) and the proposed flexible feedback bits scheme (“FFB” for short) with nTp = 4 transmit antennas. For this case p = 2, and the GPF scheme can only feed back 1 bit, whereas the proposed scheme can feed back more bits to improve the system performance. For comparison, Figure 1 also gives the BER curves of the complex orthogonal code for four transmit antennas [11] and the numerical results of the upper bound (24) for the proposed scheme. Figure 1 shows that with 1 feedback bit the GPF and FFB schemes have close performance, whereas the FFB scheme performs better with more feedback bits. In comparison with the complex orthogonal code, both schemes perform better.
Figure 1. BER performance of the two schemes with nTp = 4 transmit antennas.
In Figure 2, the BER performance of the two schemes with nTp = 8
transmit antennas is depicted. For this case p = 4, and the GPF scheme can
only feedback 3 bits, whereas the proposed FFB scheme can feedback more
bits. We can observe that with the same 3 feedback bits the two schemes have very similar performance, and that with more feedback bits the proposed FFB scheme further improves the performance. In the simulations of both schemes, an exhaustive search over all possible feedback vectors is used.
Figure 2. BER performance of the two schemes with nTp = 8 transmit antennas.
CONCLUSIONS
In this paper, we proposed a partial feedback based OSTBC scheme with
flexible feedback bits. The new scheme inherits the OSTBC properties of full diversity, low decoding complexity, and full rate. Moreover, compared with the conventional partial feedback based OSTBC schemes, the new scheme supports a flexible number of feedback bits and can improve the system performance with more feedback bits.
REFERENCES
1. V. Tarokh, H. Jafarkhani and A. R. Calderbank, “Space-time Block
Codes from Orthogonal Designs,” IEEE Transactions on Information
Theory, Vol. 45, No. 5, 1999, pp. 1456-1467. doi:10.1109/18.771146
2. S. M. Alamouti, “A Simple Transmitter Diversity Scheme for Wireless
Communications,” IEEE Journal on Selected Areas in Communications,
Vol. 16, No. 8, 1998, pp. 1451-1458. doi:10.1109/49.730453
3. S. Sandhu and A. J. Paulraj, “Space-time Block Codes: A Capacity
Perspective,” IEEE, Communications Letters, Vol. 4, No. 12, 2000, pp.
384-386. doi:10.1109/4234.898716
4. H. Jafarkhani, “A Quasi-orthogonal Space-time Block Code,” IEEE
Transactions on Communications, Vol. 49, No. 1, 2001, pp. 1-4.
doi:10.1109/26.898239
5. W. Su and X. G. Xia, “Signal Constellations for Quasi- orthogonal
Space-time Block Codes with Full Diversity,” IEEE Transactions
on Information Theory, Vol. 50, 2004, pp. 2331-2347. doi:10.1109/
TIT.2004.834740
6. X. L. Ma and G. B. Giannakis, “Full-diversity Full-rate Complex-field
Space-time Coding,” IEEE Transactions on Signal Processing, Vol. 51,
No. 11, 2003, pp. 2917-2930. doi:10.1109/TSP.2003.818206
7. J. Akhtar and D. Gesbert, “Extending Orthogonal Block Codes with
Partial Feedback,” IEEE Transactions on Wireless Communications,
Vol. 3, No. 6, 2004, pp. 1959-1962. doi:10.1109/TWC.2004.837469
8. A. Sezgin, G. Altay and A. Paulraj, “Generalized Partial Feedback
Based Orthogonal Space-time Block Coding,” IEEE Transactions
on Wireless Communications, Vol. 8, No. 6, 2009, pp. 2771-2775.
doi:10.1109/TWC.2009.080352
9. B. Hassibi and B. M. Hochwald, “High-rate Codes That Are Linear in
Space and Time,” IEEE Transactions on Information Theory, Vol. 48,
No. 7, 2002, pp. 1804-1824. doi:10.1109/TIT.2002.1013127
10. M. K. Simon and M. S. Alouini, “Digital Communication over Fading
Channels,” John Wiley & Sons Inc., 2000.
11. G. Ganesan and P. Stoica, “Space-time Block Codes: A Maximum SNR Approach,” IEEE Transactions on Information Theory, Vol. 47, No. 4, 2001, pp. 1650-1656. doi:10.1109/18.923754
Chapter 9
RATELESS SPACE-TIME BLOCK CODES FOR 5G WIRELESS COMMUNICATION SYSTEMS
Ali Alqahtani
College of Applied Engineering, King Saud University, Riyadh, Saudi Arabia
ABSTRACT
This chapter presents a rateless space-time block code (RSTBC) for massive
multiple-input multiple-output (MIMO) wireless communication systems.
We discuss the principles of rateless coding compared to the fixed-rate
channel codes. A literature review of rateless codes (RCs) is also addressed.
Furthermore, the chapter illustrates the basis of RSTBC deployments
in massive MIMO transmissions over lossy wireless channels. In such
channels, data may be lost or may not be decodable at the receiver end due to a variety of factors such as channel losses or pilot contamination. Massive
Citation: Ali Alqahtani, “Rateless Space-Time Block Codes for 5G Wireless Commu-
nication Systems”, Intech Open - The Fifth Generation (5G) of Wireless Communica-
tion, 2018, DOI: 10.5772/intechopen.74561.
Copyright: © 2018 the Author(s) and IntechOpen. This chapter is distributed under the
terms of the Creative Commons Attribution 3.0 License, which permits unrestricted use,
distribution, and reproduction in any medium, provided the original work is properly
cited.
INTRODUCTION
In practice, the data transmitted over a channel are usually affected by noise, interference, and fading. Several channel models, such as the additive white Gaussian noise (AWGN) channel, the binary symmetric channel (BSC), the binary erasure channel (BEC), the wireless fading channel, and the lossy (or erasure) channel, have been introduced, for which an error (or loss) control technique is required to reduce the errors (or losses) caused by such channel impairments [1].
This technique is called channel coding, which is a main part of the
digital communication theory. Historical perspective on channel coding is
given in [2]. Generally speaking, channel coding, characterized by a code rate, is designed by adding controlled redundancy to the data to detect and/or correct errors and, hence, achieve reliable delivery of digital data over unreliable communication channels. Error correction may generally be realized by two different error control techniques, namely forward error correction (FEC) and backward error correction (BEC). The former omits the need for data retransmission, while the latter is widely known as automatic repeat request (or sometimes automatic retransmission query) (ARQ).
For a large amount of data, a large number of errors will occur, and it is therefore difficult for FEC to work well. The ARQ technique, in such conditions, requires more retransmissions, which cause a significant growth in power consumption. Moreover, these processes sustain additional overhead that includes data retransmission and redundancy added to the original data, and they cannot correctly decode the source data when the packet loss rate is high [3]. Therefore, it is of significant interest to design a simple channel coding scheme with a flexible code rate and capacity-approaching behavior to achieve robust and reliable transmission over universal lossy channels. Rateless codes constitute such a class of schemes. We describe the concept of rateless coding in the next section.
Figure 2. Automatic repeat request (ARQ) [6]. (a) Stop and wait ARQ (half
duplex); (b) continuous ARQ with pullback (full duplex); (c) continuous ARQ
with selective repeat (full duplex).
These protocols reside in the data link or transport layers of the open
systems interconnection (OSI) model. This is one difference between the
to the strong theory behind these codes mostly for erasure channels. Most
of the available works in the rateless codes literature are extensions of the
fountain codes over erasure channels [10]. The name “fountain” comes from the analogy to a water supply capable of providing an unlimited number of water drops. For this reason, rateless codes are also referred to as fountain codes. They were initially developed to achieve efficient transmission over erasure channels, to which the early work on rateless codes was mainly limited, with the primary application being multimedia video streaming [10].
The first practical class of rateless codes is the Luby Transform (LT) code, which was originally intended for recovering packets that are lost (erased) during transmission over computer networks. The fundamentals of LT codes are introduced in [11]; the code is rateless since the number of encoded packets that can be generated from the original packets is potentially limitless. Figure 3 illustrates the block diagram of the LT encoder. The original packets can be recovered from a slightly larger number of encoded packets. Although the encoding process of LT is quite simple, LT requires a properly designed degree distribution (based on the Soliton distribution), which significantly affects the performance of the code.
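To make the encoding step concrete, here is a minimal sketch of how one LT-encoded packet could be produced; the degree distribution is passed in as a parameter because, as noted above, its design (robust Soliton in practice) is what drives performance, and the function names are illustrative only.

import random

def lt_encode_symbol(source_packets, degree_dist, rng=random):
    # source_packets : list of k equal-length byte strings
    # degree_dist    : probabilities of degrees 1..k (e.g. a robust Soliton distribution)
    k = len(source_packets)
    # 1) sample a degree d from the degree distribution
    d = rng.choices(range(1, k + 1), weights=degree_dist, k=1)[0]
    # 2) choose d distinct source packets uniformly at random
    neighbours = rng.sample(range(k), d)
    # 3) the encoded packet is the bitwise XOR of the chosen packets
    encoded = bytes(source_packets[neighbours[0]])
    for idx in neighbours[1:]:
        encoded = bytes(a ^ b for a, b in zip(encoded, source_packets[idx]))
    return neighbours, encoded

Calling this function repeatedly yields the potentially limitless stream of encoded packets that makes the code rateless; the receiver only needs slightly more than k of them to recover the originals.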
simple rateless codes that are DMT-optimal for a SISO channel have also been examined. However, [24] considered the whole MIMO channel as parallel sub-channels, in which each sub-channel is itself a MIMO channel. Furthermore, for each block, the construction of the symbols within the redundant block is not discussed. Hence, more investigation of other performance metrics for the scheme proposed in [24], under different channel scenarios, is required. In [27], a cognitive radio network employs rateless coding along with queuing theory to maximize the capacity of the secondary user while meeting the primary users' delay requirement. Furthermore, [28] presents a novel framework of opportunistic beamforming employing rateless codes in the multiuser MIMO downlink to provide faster and higher quality of service (QoS) wireless services.
station (BS), T is the number of time slots, and L is the number of required
blocks at the receiver to recover the transmitted block. Let ml denote the
measured mutual information after receiving the codeword block . If ml
≤ M, the receiver waits for further blocks, else if ml > M, the receiver sends
a simple feedback to the transmitter to stop transmitting the remaining part
of the encoded packets and remove them from the BS buffer. This process
frequency resources where each user device has Nr receive antennas. The
overall channel matrix can be written as
(2)
where is the channel matrix corresponding to
the kth user. To eliminate the effects of multiuser interference (MUI) at the intended receiving users, a precoding technique is applied at the BS transmitter:
(3)
where β is a normalization factor.
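The chapter does not reproduce the precoder of (3) here, so the following sketch simply assumes a zero-forcing precoder, one common choice for suppressing MUI in the massive MU-MIMO downlink, with β playing the role of the power-normalization factor; names and dimensions are illustrative.

import numpy as np

def zf_precode(H, x):
    # H : (K*Nr, Nt) overall downlink channel matrix (Nt >> K*Nr assumed)
    # x : (K*Nr,)   stacked data symbols for all users
    # Pseudo-inverse precoder: H @ W = I, so MUI is cancelled at the users.
    W = H.conj().T @ np.linalg.inv(H @ H.conj().T)
    s = W @ x
    # beta normalises the transmit power to that of the unprecoded symbols
    beta = np.sqrt(x.size / np.sum(np.abs(s) ** 2))
    return beta * s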
In this system, channel reciprocity is exploited to estimate the downlink channels via uplink training, since the resulting overhead grows linearly with the number of users rather than with the number of BS antennas [46].
For a single-cell MU-MIMO system, the received signal at the kth user at time instant t can be expressed as
(4)
where the leading factor corresponds to the average SNR per user (Ex is the symbol energy and N0 is the noise power at the receiver); L is the maximum number of required RSTBC blocks at the user; the remaining terms are the channel coefficient from the nth transmit antenna to the kth user and the (n, l)th element of the matrix Dl; and wk is the noise at the kth user's receiver.
It has been demonstrated in [42, 43, 44, 45] that RSTBC is able to
compensate for data losses. For more details, the reader is referred to these
references. Here are some sample simulation results. The averaged symbol-
error-rate (SER) performance when RSTBC is applied for Nt = 100 with
QPSK is shown in Figure 8, where the loss rate is assumed to be 25%.
Figure 8. SER curves for massive MU-MIMO system with 25%-rate loss and
Nt = 100, K = 10 users, with QPSK, when RSTBC is applied.
It is inferred from Figure 8 that, for small values of L, the averaged SER approaches a fixed level at high SNR because RSTBC, with that number of blocks, is no longer able to compensate for further losses. Therefore, L must be increased to obtain further improvement until the loss effects are eliminated. As shown, for RSTBC with L = 32 for instance, the flooring in the SER curves has vanished thanks to the diversity gain achieved by RSTBC (the slopes of the SER curves increase), so that the effect of losses is largely eliminated. This demonstrates the potential of employing RSTBC to combat losses in massive MU-MIMO systems.
Furthermore, Figure 9 shows the cumulative distribution function (CDF)
of the averaged downlink SINR (in dB) in the target cell for simulation
and analytical results for a multi-cell massive MU-MIMO system with Nt
= 100, K = 10 users, QPSK, and pilot reuse factor = 3/7, when RSTBC
is applied with L = 4, 8, 16, 32, where lossy channel of 25% loss rate is
assumed. Notably, RSTBC helps the system alleviate the effects of pilot contamination by increasing the downlink SINR. Simulation and analytical results show good agreement. It is also evident that the improvements in SINR are linear in the number of RSTBC blocks L. The simulation parameters are tabulated in Table 1.
Table 1. Simulation parameters.
Parameter                                Value
Cell radius                              500 m
Reference distance from the BS           100 m
Path loss exponent                       3.8
Carrier frequency                        28 GHz
Shadow fading standard deviation         8 dB
CONCLUSION
In this chapter, we have considered the rateless space-time block code
(RSTBC) for massive MIMO wireless communication systems. Unlike
the fixed-rate codes, RSTBC adapts the amount of redundancy over time
and space for transmitting a message based on the instantaneous channel
conditions. RSTBC can be used to protect data transmission in lossy systems and still guarantee the reliability of the system when transmitting big data.
It is concluded that, using RSTBC with very large MIMO dimensions, it
is possible to recover the original data from a certain amount of encoded
data even when the losses are high. Moreover, RSTBC can be employed
in a multi-cell massive MIMO system at the BS to mitigate the downlink
inter-cell interference (resulting from pilot contamination) by improving
the downlink SINR. These results strongly support RSTBC as a candidate for the upcoming 5G wireless communication systems.
REFERENCES
1. Abdullah A, Abbasi M, Fisal N. Review of rateless-network-
coding based packet protection in wireless sensor networks. Mobile
Information Systems. 2015;2015:1-13
2. Liew T, Hanzo L. Space–time codes and concatenated channel codes for
wireless communications. Proceedings of the IEEE. 2002;90(2):187-
219
3. Huang J-W, Yang K-C, Hsieh H-Y, Wang J-S. Transmission control
for fast recovery of rateless codes. International Journal of Advanced
Computer Science and Applications (IJACSA). 2013;4(3):26-30
4. Bonello N, Yang Y, Aissa S, Hanzo L. Myths and realities of rateless
coding. IEEE Communications Magazine. 2011;49(8):143-151
5. Bonello N, Zhang R, Chen S, Hanzo L. Reconfigurable rateless codes.
In: IEEE 69th Vehicular Technology Conference, 2009, VTC Spring
2009; IEEE. 2009, pp. 1-5
6. Bernard S. Digital Communications Fundamentals and Applications.
USA: Prentice Hall; 2001
7. Mehran F, Nikitopoulos K, Xiao P, Chen Q. Rateless wireless systems:
Gains, approaches, and challenges. In: 2015 IEEE China Summit and
International Conference on Signal and Information Processing (Chi-
naSIP). IEEE; 2015. pp. 751-755
8. Wang X, Chen W, Cao Z. ARQ versus rateless coding: from a point
of view of redundancy. In: 2012 IEEE International Conference on
Communications (ICC); IEEE. 2012. pp. 3931-3935
9. Wang P. Finite length analysis of rateless codes and their application
in wireless networks [PhD dissertation]. University of Sydney; 2015
10. Byers JW, Luby M, Mitzenmacher M, Rege A. A digital fountain
approach to reliable distribution of bulk data. ACM SIGCOMM
Computer Communication Review. 1998;28(4):56-67
11. Luby M. LT codes. In: The 43rd Annual IEEE Symposium on
Foundations of Computer Science, 2002. Proceedings. 2002. pp. 271-
280
12. Shokrollahi A. Raptor codes. IEEE Transactions on Information
Theory. 2006;52(6):2551-2567
13. Maymounkov P. Online codes. Technical report. New York University;
2002
Chapter 10
LOSSLESS IMAGE COMPRESSION TECHNIQUE USING COMBINATION METHODS
ABSTRACT
The development of multimedia and digital imaging has led to a high quantity of data being required to represent modern imagery. This requires large disk space for storage and long transmission times over computer networks, both of which are relatively expensive. These factors prove the need for image compression.
INTRODUCTION
Image applications are widely used, driven by recent advances in the
technology and breakthroughs in the price and performance of the hardware
and the firmware. This leads to an enormous increase in the storage space
and the transmitting time required for images. This emphasizes the need to
provide efficient and effective image compression techniques.
In this paper we provide a method that is capable of compressing images without degrading their quality. This is achieved by minimizing the number of bits required to represent each pixel. This, in turn, reduces the amount of memory required to store images and allows them to be transmitted in less time.
Image compression techniques fall into two categories: lossless and lossy. The choice between these two categories depends on the application and on the degree of compression required [1,2].
Lossless image compression is used to compress images in critical applications, as it allows the exact original image to be reconstructed from the compressed one without any loss of image data. Lossy image compression, on the other hand, suffers from the loss of some data; thus, repeatedly compressing and decompressing an image results in poor image quality.
LITERATURE REVIEW
A large number of data compression algorithms have been developed and used throughout the years. Some of them are of general use, i.e., they can be used to compress files of different types (e.g., text files, image files, video files, etc.); others are developed to compress a particular type of file efficiently. Compression methods also differ according to the representation form of the data on which the compression process is performed. Some of the literature in this field is reviewed below.
In [10], the authors present lossless image compression with four modular components: pixel sequence, prediction, error modeling, and coding. They use two methods that clearly separate the four modular components, called the Multi-Level Progressive Method (MLP) and the Partial Precision Matching Method (PPMM), for lossless compression. Both involve linear prediction, modeling of prediction errors by estimating the variance of a Laplace distribution (symmetric exponential), and coding using arithmetic coding applied to pre-computed distributions [10].
In [11], a composite modeling method (a hybrid compression algorithm for binary images) is used to reduce the amount of data coded by arithmetic coding: uniform areas are coded with less computation, and arithmetic coding is applied to the remaining areas. Image blocks are classified into three categories (all-white, all-black, and mixed), and the image is processed 16 rows at a time in two stages, global and local [11].
In [12], the authors propose an algorithm that works by applying a reversible transformation to the fourteen commonly used files of the Calgary Compression Corpus. It does not process its input sequentially; instead, it processes a block of text as a single unit to form a new block that contains the same characters but is easier to compress with simple compression algorithms, grouping characters together based on their contexts. The technique uses the context on only one side of each character, so that the probability of finding a character close to another instance of the same character is increased substantially. The transformation does not itself compress the data, but reorders it so that it is easy to compress with simple algorithms such as move-to-front coding in combination with Huffman or arithmetic coding [12].
median edge detector (MED) to reduce the entropy rate of f. The gray levels
of two adjacent pixels in an image are usually similar. A base-switching
transformation approach is then used to reduce the spatial redundancy of the
image. The gray levels of some pixels in an image are more common than
those of others. Finally, the arithmetic encoding method is applied to reduce
the coding redundancy of the image [17].
In [18], a lossless method of image compression and decompression is proposed. It uses a simple coding technique called Huffman coding. A software algorithm was developed and implemented to compress and decompress the given image using Huffman coding in a MATLAB platform. The authors are concerned with compressing images by reducing the number of bits per pixel required to represent them and with decreasing the transmission time for images. The image is reconstructed by decoding it using the Huffman codes [18].
This paper uses the adaptive bit-level text compression schema based on Hamming-code data compression (HCDC) used in [19]. Our schema consists of six steps that are repeated to increase the image compression rate. The overall compression ratio is found by multiplying the compression ratios of the individual loops, and the schema is referred to as HCDC(K), where K represents the number of repetitions [19].
In [20], the authors presented a lossless image compression method based on BCH codes combined with the Huffman algorithm [20].
The system in this paper benefits from this feature. The BCH method converts blocks of size k to size n by adding parity bits; a message of size k is encoded into a codeword of length n. The proposed method is shown below in Figure 1.
Lempel-Ziv-Welch (LZW)
The compression system improves the compression of the image through the implementation of the LZW algorithm. First, the input image is converted to gray scale and then from decimal to binary so that it is in a suitable form to be compressed. The algorithm builds a data dictionary (also called a translation table or string table) of patterns occurring in the uncompressed data stream. Patterns of data are identified in the data stream and matched to entries in the dictionary. If a pattern is not present in the dictionary, a code phrase is created based on the data content of that pattern and stored in the dictionary, and the phrase is written to the compressed output stream. When a reoccurrence of a pattern is identified in the data, the code of the pattern already stored in the dictionary is written to the output.
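A minimal sketch of this dictionary-building step is shown below (plain LZW over a byte stream; in the proposed system the input would be the binarized gray-scale image, and the variable names are illustrative).

def lzw_encode(data: bytes):
    # start with the 256 single-byte entries
    table = {bytes([i]): i for i in range(256)}
    next_code = 256
    w = b""
    out = []
    for b in data:
        wc = w + bytes([b])
        if wc in table:
            w = wc                 # pattern already known: keep extending it
        else:
            out.append(table[w])   # emit the code of the longest known prefix
            table[wc] = next_code  # add the new pattern to the dictionary
            next_code += 1
            w = bytes([b])
    if w:
        out.append(table[w])
    return out                     # list of dictionary codes written to the stream

The decoder rebuilds the same dictionary on the fly, so no table has to be transmitted with the compressed stream.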
The BCH decoder then checks each 7-bit block of the LZW output to determine whether or not it is a valid codeword. The BCH decoder converts each valid block to 4 bits. The proposed method adds a 1, as an indicator of a valid codeword, to an extra file called the map; otherwise, if the block is not a codeword, it remains 7 bits long and a 0 is added to the same file. The benefit of the extra map file is that it is used as the key for image decompression, in order to distinguish between compressed blocks and uncompressed ones (codeword or not).
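The check itself can be illustrated with a small sketch. The (7,4) code below is a Hamming-type construction written out explicitly for illustration; the paper relies on MATLAB's BCH encoder/decoder, whose 16 codewords form an equivalent but not necessarily identical set, so the matrix here is an assumption.

import numpy as np

# parity-check matrix of a systematic (7,4) single-error-correcting code
# (message assumed to occupy the first 4 bit positions)
P = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]])
H = np.hstack([P.T, np.eye(3, dtype=int)])   # valid codeword c  <=>  H c^T = 0 (mod 2)

def compress_block(block7):
    # block7 : sequence of 7 bits (0/1)
    # Returns (bits_kept, map_bit): map_bit = 1 means the block was one of the
    # 16 valid codewords and is stored as its 4 message bits; 0 means it is
    # stored unchanged as 7 bits.
    c = np.asarray(block7, dtype=int)
    if np.all((H @ c) % 2 == 0):             # zero syndrome: valid codeword
        return c[:4].tolist(), 1
    return c.tolist(), 0

Running this over every 7-bit block produces exactly the two outputs the text describes: the mixed stream of 4-bit and 7-bit blocks, and the binary map file that later drives decompression.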
After the image is compressed, the map file is compressed by RLE to decrease its size, and it is then attached to the header of the image. This step is iterated three times: the BCH decoding is repeated three times to improve the compression ratio. We stopped at three repetitions after experiments showed that decoding further degrades the other performance factors: it increases the time needed for compression, and the map file grows each time we decode by BCH, which increases the image size and thus opposes the objective of this paper, namely to reduce the image size. Below is an example of the compressed image:
Example
Next is an example of the compression stage of the proposed system, demonstrated on a segment of the image. First of all, the decimal values are converted into binary, compressed by LZW, and then divided into blocks of 7 bits.
A = Original
After dividing the image into blocks of 7 bits, the system applies the BCH code, which checks whether each block is a codeword or not by matching the block against the 16 standard codewords of the BCH code. The first iteration shows that four codewords were found. These blocks are compressed using the BCH algorithm, each being converted to a block of 4 bits.
SET N = 7, k = 4
round = 1
WHILE (a codeword was found) AND (round <= 3)
    xxx  = size of the input stream (Out1)
    remd = xxx MOD N                      // leftover bits that do not fill a 7-bit block
    div  = xxx / N                        // number of complete blocks
    FOR i = 1 TO xxx - remd STEP N
        msg = Out1[i .. i + (N - 1)]      // take one 7-bit block
        c2 = convert msg to a Galois field vector
        origin = c2
        d2 = decode by BCH decoder: bchdec(c2, N, k)        // tentative 4-bit message
        c2 = re-encode by BCH encoder for test: bchenc(d2, N, k)
        IF c2 == origin THEN              // block is one of the 16 valid codewords
            INCREMENT test (number of codewords found) BY 1
            append the compressed 4-bit block d2 to CmprsImg
            append 1 to map[round]
        ELSE
            append the original 7-bit block (origin) to CmprsImg
            append 0 to map[round]
        ENDIF
    END FOR i
    pad the remaining remd bits and append them to CmprsImg
    final map file = map[round], to be reused in the next iteration
    FOR stp = 1 TO 3
        map_RLE[stp] = RLE(map[stp])      // compress each map file with RLE
    END FOR stp
    INCREMENT round BY 1
ENDWHILE
END
Decompression
Decompression reverses the steps of the compression stage to reconstruct the image. First, the system decompresses the attached map file with the RLE decoder, because its values indicate which blocks of the compressed image are codewords to be decompressed. If the value in the map file is 1, the system reads a 4-bit block from the compressed image (it is a codeword) and expands it with the BCH encoder. If the value is 0, it reads a 7-bit block from the compressed image, which is not a codeword. This operation is repeated three times; after that, the output of the BCH stage is decompressed using the LZW algorithm. The example below explains these steps.
Example
Read the map file after decompressing it with the RLE algorithm. Depending on the values of the map file: at positions 4 and 7 the value is 1, which means that the system reads 4 bits from the compressed image; this is a codeword, and re-encoding it with BCH reconstructs the matching 7-bit codeword among the 16 valid BCH codewords. Where the value in the file is 0, the block is not a codeword, and 7 bits are read from the compressed image. The compressed image is:
Here q is the number of bits representing each pixel in the uncompressed image, S0 is the size of the original data, and Sc is the size of the compressed data. We also compare the results with standard compression techniques and, finally, report the test value (the number of codewords found in the image) compared with the original size of the image in bits.
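For reference, and assuming the omitted expressions are the usual definitions, these quantities are computed as: compression ratio CR = S0 / Sc, and bits per pixel bpp = q × Sc / S0 = q / CR.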
The results above show that the compression achieved by the proposed system is the best, compared with the results of compressing the images using the RLE, LZW, or Huffman algorithms.
In Figure 4, we illustrate the comparison based on compression ratio between the proposed algorithm (BCH and LZW) and the standard image compression algorithms (RLE, Huffman, and LZW), which can be distinguished by color. Figure 5 shows the size of the original image compared with the size of the image after compression by the standard image compression algorithms and by the proposed method. Table 2 shows the results of the compression in terms of bits per pixel for the proposed method and the standard compression algorithms.
Figure 4. Comparing the proposed method with (RLE, LZW and Huffman)
based on compression ratio.
Figure 5. Comparing the proposed method with (RLE, LZW and Huffman)
based on image size.
Figure 6 explains the result of the above Table 2.
Figure 6. Comparing the proposed method with (RLE, LZW and Huffman)
based on bit per pixel.
Discussion
In this section we show the efficiency of the proposed system, which was implemented in MATLAB. In order to demonstrate the compression performance of the proposed method, we compared it with some representative lossless image compression techniques on the set of ISO test images that were made available to us; these images are listed in the first column of all the tables.
Table 1 lists the compression ratio results for the tested images, calculated as the ratio of the size of the original image to the size of the image after compression. The second column of the table lists the compression ratios obtained with the RLE algorithm; columns three and four list the compression ratios obtained with the LZW and Huffman algorithms, respectively, while the last column lists the compression ratio achieved by the proposed method. The average compression ratios of the methods over all tested images are: RLE 1.2017, LZW 1.4808, Huffman 1.195782, and the proposed BCH and LZW method 1.676091. The average compression ratio of the proposed method is the best achieved; this means that the image size is reduced more when compressed using the combination of LZW and BCH than with the standard lossless compression algorithms. Figure 2 makes clear that the proposed method has a higher compression ratio than RLE, LZW, and Huffman. Figure 3 displays the original image size and the size of the image after compression by RLE, LZW, Huffman, and the proposed method; the proposed method yields the smallest image size, which achieves the goal of this paper to reduce the storage needed for the image and, therefore, the transmission time.
The second comparison, shown in Table 2, is based on bits per pixel. The goal of image compression is to reduce the size as much as possible while maintaining image quality. Smaller files use less storage space, so it is better to need fewer bits to represent each pixel. The table uses the same image sets and shows that the proposed method needs fewer bits per pixel than the other standard image compression methods; the average bits per pixel over all tested images are 6.904287, 5.656522, 6.774273, and 5.01157 for RLE, LZW, Huffman, and the proposed method, respectively.
CONCLUSIONS
This paper was motivated by the desire to improve the effectiveness of lossless image compression by combining the BCH and LZW algorithms.
FUTURE WORK
In this paper, we developed a method for improving image compression based on BCH and LZW. For future work, we suggest using BCH with another compression method that allows the compression to be repeated more than three times, investigating how to provide a higher compression ratio for given images, and finding an algorithm that decreases the size of the map file. The experimental dataset in this paper was somewhat limited, so applying the developed methods to a larger dataset could be a subject for future research. Finally, extending the work to video compression is also very interesting. Video data is basically a three-dimensional array of color pixels that contains spatial and temporal redundancy. Similarities can thus be encoded by registering differences within a frame (spatial) and/or between frames (temporal), where a frame is the set of all pixels that correspond to a single time moment; basically, a frame is the same as a still picture.
Spatial encoding in video compression takes advantage of the fact that the human eye is unable to distinguish small differences in color as easily as it can perceive changes in brightness, so that very similar areas of color can be “averaged out” in a similar way to JPEG images. With temporal compression, only the changes from one frame to the next are encoded, as a large number of pixels will often be the same over a series of frames.
REFERENCES
1. R. C. Gonzalez, R. E. Woods and S. L. Eddins, “Digital Image
Processing Using MATLAB,” Pearson Prentice Hall, USA, 2003.
2. K. D. Sonal, “Study of Various Image Compression Techniques,”
Proceedings of COIT, RIMT Institute of Engineering & Technology,
Pacific, 2000, pp. 799-803.
3. M. Rabbani and W. P. Jones, “Digital Image Compression Techniques,”
SPIE, Washington. doi:10.1117/3.34917
4. D. Shapira and A. Daptardar, “Adapting the Knuth-Morris-Pratt
Algorithm for Pattern Matching in Huffman Encoded Texts,”
Information Processing and Management, Vol. 42, No. 2, 2006, pp.
429-439. doi:10.1016/j.ipm.2005.02.003
5. H. Zha, “Progressive Lossless Image Compression Using Image
Decomposition and Context Quantization,” Master Thesis, University
of Waterloo, Waterloo.
6. W. Walczak, “Fractal Compression of Medical Images,” Master Thesis,
School of Engineering Blekinge Institute of Technology, Sweden.
7. R. Rajeswari and R. Rajesh, “WBMP Compression,” International
Journal of Wisdom Based Computing, Vol. 1, No. 2, 2011. doi:10.1109/
ICIIP.2011.6108930
8. M. Poolakkaparambil, J. Mathew, A. M. Jabir, D. K. Pradhan and
S. P. Mohanty, “BCH Code Based Multiple Bit Error Correction in
Finite Field Multiplier Circuits,” Proceedings of the 12th International
Symposium on Quality Electronic Design (ISQED), Santa Clara, 14-
16 March 2011, pp. 1-6. doi:10.1109/ISQED.2011.5770792
9. B. Ranjan, “Information Theory, Coding and Cryptography,” 2nd
Edition, McGraw-Hill Book Company, India, 2008.
10. P. G. Howard and V. J. Scott, “New Method for Lossless Image
Compression Using Arithmetic Coding,” Information Processing &
Management, Vol. 28, No. 6, 1992, pp. 749-763. doi:10.1016/0306-
4573(92)90066-9
11. P. Franti, “A Fast and Efficient Compression Method for Binary
Image,” 1993.
12. M. Burrows and D. J. Wheeler, “A Block-Sorting Lossless Data
Compression Algorithm,” Systems Research Center, Vol. 22, No. 5,
1994.
13. B. Meyer and P. Tischer, “TMW—a New Method for Lossless Image Compression,” Australia, 1997.
14. M. F. Talu and I. Türkoglu, “Hybrid Lossless Compression Method for Binary Images,” University of Firat, Elazig, Turkey, 2003.
15. L. Zhou, “A New Highly Efficient Algorithm for Lossless Binary Image Compression,” Master Thesis, University of Northern British Columbia, Canada, 2004.
16. N. J. Brittain and M. R. El-Sakka, “Grayscale True Two-Dimensional Dictionary-Based Image Compression,” Journal of Visual Communication and Image Representation, Vol. 18, No. 1, pp. 35-44.
17. R.-C. Chen, P.-Y. Pai, Y.-K. Chan and C.-C. Chang, “Lossless Image Compression Based on Multiple-Tables Arithmetic Coding,” Mathematical Problems in Engineering, Vol. 2009, 2009, Article ID: 128317. doi:10.1155/2009/128317
18. J. H. Pujar and L. M. Kadlaskar, “A New Lossless Method of Image Compression and Decompression Using Huffman Coding Technique,” Journal of Theoretical and Applied Information Technology, Vol. 15, No. 1, 2010.
19. H. Bahadili and A. Rababa’a, “A Bit-Level Text Compression Scheme Based on the HCDC Algorithm,” International Journal of Computers and Applications, Vol. 32, No. 3, 2010.
20. R. Al-Hashemi and I. Kamal, “A New Lossless Image Compression Technique Based on Bose,” International Journal of Software Engineering and Its Applications, Vol. 5, No. 3, 2011, pp. 15-22.
Chapter 11
NEW RESULTS IN PERCEPTUALLY LOSSLESS COMPRESSION OF HYPERSPECTRAL IMAGES
ABSTRACT
Hyperspectral images (HSI) have hundreds of bands, which impose
heavy burden on data storage and transmission bandwidth. Quite a few
compression techniques have been explored for HSI in the past decades.
One high performing technique is the combination of principal component
analysis (PCA) and JPEG-2000 (J2K). However, since there are several
new compression codecs developed after J2K in the past 15 years, it is
worthwhile to revisit this research area and investigate if there are better
techniques for HSI compression. In this paper, we present some new
Citation: Kwan, C. and Larkin, J. (2019), “New Results in Perceptually Lossless Com-
pression of Hyperspectral Images”. Journal of Signal and Information Processing, 10,
96-124. doi: 10.4236/jsip.2019.103007.
Copyright: © 2019 by authors and Scientific Research Publishing Inc. This work is
licensed under the Creative Commons Attribution International License (CC BY).
http://creativecommons.org/licenses/by/4.0
INTRODUCTION
Hyperspectral images (HSI) have found a wide range of applications,
including remote chemical monitoring [1] , target detection [2] , anomaly
and change detection [3] [4] [5], etc. Due to the presence of hundreds of bands in HSI, however, a heavy burden is placed on data storage and transmission bandwidth.
For many practical applications, it is unnecessary to compress data
losslessly because lossless compression can achieve only two to three times
of compression. Instead, it will be more practical to apply perceptually
lossless compression [6] [7] [8] [9] . A simple rule of thumb is that if the
peak-signal-to-noise ratio (PSNR) or human visual system (HVS) inspired
metric is above 40 dBs, then the decompressed image is considered as
“near perceptually lossless” [10] . In several recent papers, we have applied
perceptually lossless compression to maritime images [10] , sonar images
[10] , and Mastcam images [11] [12] [13] .
In the past few decades, there are some alternative techniques for
compressing HSI. In [14] , a tensor approach was proposed to compress
the HSI. In [15] , a missing data approach was presented to compress HSI.
Another simple and straightforward approach is to apply PCA directly
to HSI. For instance, in [3] , the authors have used 10 PCA compressed
bands for anomaly detection. There are also some conventional, simple, and
somewhat naïve approaches, to compressing HSI. One idea known as split
band (SB) is to split the hundreds of HSI bands into groups of 3-band images
and then compress each 3-band image separately. Another idea known as the
video approach (Video) is to treat the 3-band images as video frames and
compress the frames as a video. The SB and Video approaches have been
used for multispectral images [13] and were observed to achieve reasonable
performance.
One powerful approach to HSI compression is the combination of PCA
and J2K [16]. The idea is to first apply PCA to decorrelate the hundreds of bands and then apply a J2K codec to compress the few PCA bands.
In the compression literature, there are a lot of new developments after
J2K [17] in the past 15 years. X264 [18] , a fast implementation of H264
standard, has been widely used on YouTube and many other social media platforms. X265 [19], a fast implementation of H265, is a newer codec that will succeed X264. Moreover, a free video codec known as Daala emerged recently [20]. In light of these new codecs, it is timely and worthwhile to revisit the HSI compression problem.
In this paper, we summarize our study in this area. Our aim is to achieve
perceptually lossless compression of HSI at 100 to 1 compression. The key
idea is to compare several combinations of PCA and video/image codecs.
Three representative HSI data cubes such as the Pavia and AVIRIS datasets
were used in our studies. Four video/image codecs, including J2K, X264,
X265, and Daala, have been investigated and four performance metrics were
used in our comparative studies. Moreover, some alternative techniques
such as video, split band, and PCA only approaches were also compared.
It was observed that the combination of PCA and X264 yielded the best
performance in terms of compression performance (rate-distortion curves)
and computational complexity. In the Pavia data case, the PCA + X264 combination achieved more than 3 dBs higher PSNR than the PCA + J2K combination. Most
importantly, our investigations showed that the PCA + X264 combination
can achieve more than 40 dBs of PSNR at 100 to 1 compression. This means
that perceptually lossless compression of HSI is achievable even at 100 to
1 compression.
The key contributions are as follows. First, we revisited the hyperspectral
image compression problem and extensively compared several approaches:
PCA only, Video approach, Split Band approach, and a two-step approach.
Second, for the two-step approach, we compared four variants: PCA + J2K,
PCA + X264, PCA + X265, and PCA + Daala. We observed that the two-
step approach is better than PCA only, Video, and Split Band approaches, as
perceptually lossless compression can be achieved at 100 to 1 ratio. Third,
within the two-step approach, our experiments showed that the PCA +
X264 combination is better than other variants in terms of performance and
computational complexity. To the best of our knowledge, we have not seen
such a study in the literature.
Our paper is organized as follows. Section 2 summarizes the HSI data,
the technical approach, the various algorithms, and performance metrics.
In Section 3, we focus on the experimental results, including the PCA only
results, video approach, split band approach, and two-step approach (PCA
+ video codecs). Four performance metrics were used to compare different
algorithms. Finally, some concluding remarks are included in Section 4.
Data
We have used several representative HSI data in this paper. The Pavia and
AVIRIS image cubes were collected using airborne sensors and the Air
Force image was collected on the ground. The numbers of bands in the three
data sets vary from one hundred to more than two hundred.
Image 1: Pavia [21]
The first image we had tested was the Pavia data with a 610 × 340 × 103
image cube. The image was taken with a Reflective Optics System Imaging
Spectrometer (ROSIS) sensor during a flight over northern Italy. Figure 1
shows the RGB bands of the Pavia image cube.
Image 2: AF image
The second image was the image cube used in [3] and it consists of 124
bands and has a height of 267 pixels and a width of 342 pixels. The RGB
image of this data set is shown in Figure 2.
Image 3: AVIRIS
The third image was taken from NASA’s Airborne Visible Infrared
Imaging Spectrometer (AVIRIS). There are 213 bands with wavelengths
from 380 nm to 2500 nm. The image size is 300 × 300 × 213. Figure 3 shows
the RGB image of the data cube.
Compression Approaches
Here, we first present the various work flows of several representative
compression approaches for HSI. We then include some background
materials for several video/image codecs in the literature. We will also
mention two conventional performance metrics and two other metrics
motivated by human visual systems (HVS).
PCA Only
PCA is also known as Karhunen-Loève transform (KLT). Comparing with
discrete cosine transform (DCT) and wavelet transform, PCA is optimal
because it is data-dependent whereas the DCT and WT are independent of
input data. The work flow is shown in Figure 4. After some preprocessing
steps, PCA compresses the raw HSI data cube (N bands) into a pre-defined
number of bands (r bands) and those r bands will be saved or transmitted.
At the receiving end, an inverse PCA will be performed to reconstruct the
HSI image cube.
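As a concrete illustration of this work flow, the following sketch compresses the spectral dimension of an HSI cube with PCA and inverts it at the receiver; array shapes and function names are illustrative, not the authors' implementation.

import numpy as np

def pca_compress(cube, r):
    # cube : (rows, cols, N) hyperspectral cube with N bands
    # r    : number of principal-component bands to keep
    rows, cols, N = cube.shape
    X = cube.reshape(-1, N).astype(np.float64)       # one spectrum per pixel
    mean = X.mean(axis=0)
    Xc = X - mean
    cov = Xc.T @ Xc / (Xc.shape[0] - 1)              # N x N band covariance
    evals, V = np.linalg.eigh(cov)
    V = V[:, ::-1][:, :r]                            # top-r principal directions
    scores = Xc @ V                                  # the r "PCA bands" to store or transmit
    return scores.reshape(rows, cols, r), mean, V

def pca_reconstruct(scores, mean, V):
    # inverse PCA at the receiving end
    rows, cols, r = scores.shape
    X_hat = scores.reshape(-1, r) @ V.T + mean
    return X_hat.reshape(rows, cols, V.shape[0])

In the two-step approach described later, the scores returned here are exactly what is handed to the video/image codec.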
Video Approach
This approach is similar to the SB approach. Here, the 3-band images are
treated as video frames and then a video codec is then applied. Details can
be found in Figure 5.
We include some details for some of the blocks.
Pre-processing
The preprocessing has a few components. First, it is important to ensure
the input image dimensions to have even numbers because some codecs
may crash if the image size has odd dimensions. Second, the input image
is normalized to double precision with values between 0 and 1. Third, the
different bands are saved into tiff format. Fourth, all the bands are written
into YUV444 and Y4M formats.
Codecs
Different codecs have different requirements. For J2K, we used Matlab’s
video writer to create a J2K format with certain quality parameters. We then
used Matlab’s data reader to decode the compressed data and the individual
frames will be retrieved. For X264 and X265, the videos are encoded using
the respective encoders with certain quality parameters. The video decoding
was done within FFMPEG. For Daala, we directly used the Daala’s functions
for encoding and decoding.
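As a rough sketch of how the X264/X265 step might be driven (the study's exact quality parameters are not listed in the text, so the constant-rate-factor value below is only a placeholder), FFmpeg can be invoked on the Y4M sequence produced during pre-processing:

import subprocess

def encode_with_ffmpeg(y4m_path, out_path, codec="libx264", crf=28):
    # codec: "libx264" or "libx265"; crf controls the quality/size trade-off
    cmd = ["ffmpeg", "-y", "-i", y4m_path, "-c:v", codec, "-crf", str(crf), out_path]
    subprocess.run(cmd, check=True)

# e.g. encode_with_ffmpeg("pavia_pca6.y4m", "pavia_pca6_x265.mkv", codec="libx265")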
Performance Evaluation
In the evaluation part, each frame is reconstructed and compared to the
original input band. Four performance metrics have been used.
Daala [20]
Recently, there is a parallel activity at xiph.org foundation, which implements
a compression codec called Daala [20]. It is based on DCT. There are pre-
and post-filters to increase energy compaction and remove block artifacts.
Daala borrows ideas from [26].
The block-coding framework in Daala can be illustrated in Figure 7.
In this study, we compared Daala with X264, X265, and J2K in our
experiments.
Wavelet-based Algorithms
J2K is a wavelet [17] [27] [28] [29] based compression standard. It has
better performance than JPEG. However, J2K requires the use of the whole
image for coding and hence is not suitable for real-time applications. In
addition, motion-J2K for video compression is not popular in the market.
Performance Metrics
In almost all compression systems, researchers use the peak signal-to-noise ratio (PSNR) or the structural similarity index (SSIM) to evaluate compression algorithms. Given a fixed compression ratio, algorithms that yield higher PSNR or SSIM values are considered better.
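For completeness, PSNR is simple enough to state in a few lines; a minimal sketch (with the peak value taken from the reference band, an assumption on our part) is:

import numpy as np

def psnr(ref, rec, peak=None):
    # PSNR in dB between a reference band and its reconstruction
    ref = np.asarray(ref, dtype=np.float64)
    rec = np.asarray(rec, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if peak is None:
        peak = ref.max()
    return 10.0 * np.log10(peak ** 2 / mse)

The HVS and HVSm metrics used in the figures are PSNR-like measures that additionally weight errors by a model of the human visual system.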
EXPERIMENTAL RESULTS
Here, we briefly describe the experimental settings. In PCA only approach, a
program was written for PCA. The input is one hyperspectral image and the
number of principal components to be used in the compression. The outputs
are the PCA bands. The performance metrics are generated by comparing
the original hyperspectral image with the inverse-PCA outputs.
In the Video only approach, we used ffmpeg to call X264 and X265. For
Daala, we used the latest open-source code in Daala’s website. For J2K, we
used the built-in J2K function in Matlab.
PCA Only
Here, we applied PCA directly to compress the 103 bands to 3, 6, and 9
bands, which we denote as PCA3, PCA6, and PCA9, respectively. From
Figure 9, one can see that PCA3 achieved 33 times of compression with
44.75 dB of PSNR. The other metrics are also high. Similarly, PCA6 and
PCA9 also attained high values in performance metrics. This means that
PCA alone can achieve reasonable compression performance. However, if
our goal is to achieve 100 to 1 compression with higher than 40 dBs of
PSNR, then the PCA only approach may be insufficient.
Video Approach
As mentioned earlier, the video approach treats the HSI data cube as a video
where each frame takes 3 bands out of the data cube. There are 35 frames
in total in the video for the Pavia data. We then applied four video codecs
(J2K, X264, X265, and Daala) to the video. Four performance metrics were
generated as shown in Figure 10. If one compares the metrics in Figure 9 and
Figure 10, one can see that Video approach is slightly better than the PCA
only approach. For instance, at 0.03 compression ratio, PCA3 yielded 38.2
dBs and the Video approach yielded more than 40 dBs in terms of PSNR.
X265 performed better than others at compression ratios less than 0.1.
Two-step approach
The two-step approach first compresses the HSI cube, using PCA, to a small number of bands (3, 6, 9, etc.). The second step applies a video codec to compress the PCA bands. We have five case studies below.
Figure 9. Performance of PCA only: (a) PSNR in dB for Pavia; (b) SSIM for
Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.
Figure 10. Performance of video approach: (a) PSNR in dB for Pavia; (b) SSIM
for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.
Figure 11. Performance of SB approach: (a) PSNR in dB for Pavia; (b) SSIM
for Pavia; (c) HVS in dB for Pavia; (d) HVSm in dB for Pavia.
40 dBs of PSNR. The other metrics are also high. Daala has better visual performance (HVS and HVSm) than the others. We can also notice that the PCA3
+ Video approach can attain much higher compression than PCA only, SB,
and Video approaches. That is, the compression ratio can be more than 100
times compression with close to 40 dBs of HVSm in the two-step approach
whereas the SB and Video approach cannot achieve 100 to 1 compression
with the same performance metrics (40 dBs).
PCA6 + Video
Figure 13 summarizes the PCA6 + Video results. At 0.01 compression ratio,
the PCA6 + Video approach appears to be slightly better than that of PCA3
+ Video. X264 is better than others in three out of four metrics. In particular,
at 0.01 compression, X264 has 45 dBs in terms of HVSm. This value is very
high and can be considered as perceptually lossless.
PCA12 + Video
As shown in Figure 15, the performance of PCA12 + Video is somewhat
similar to PCA9 + Video.
PCA15 + Video
As shown in Figure 16, the performance of PCA15 + Video is somewhat
similar to PCA12 + Video.
Figure 17. Performance of PCA only: (a) PSNR in dB for AF image cube; (b)
SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in dB
for AF image cube.
Video Approach
Comparing the performance of video approach (Figure 18) with the PCA
only approach (Figure 17), one can immediately notice that the Video
approach allows higher compression ratios to be achieved. For instance,
at 0.01 compression ratio, X265 achieved about 38 dBs in PSNR. X265
performs well for small ratios (high compression).
Figure 18. Performance of Video approach: (a) PSNR in dB for AF image cube;
(b) SSIM for AF image cube; (c) HVS in dB for AF image cube; (d) HVSm in
dB for AF image cube.
SB Approach
Comparing the SB approach in Figure 19 with the Video approach in Figure
18, the Video approach is better. For instance, if one looks at the PSNR
values at 0.05 compression ratio, one can see that the X265 codec in the
Video approach has a value of 44 dBs whereas the best codec (J2K) has a
value of 41.5 dBs.
Two-step Approach
Here, the PCA is combined with the Video approach. That is, the PCA is
first applied to the 124 bands to obtain 3, 6, 9, 12, and 15 bands. After that,
a video codec is applied to further compress the PCA bands.
PCA3 + Video
From Figure 20, we can see that the PCA3 + Video can achieve 0.01
compression ratio with more than 40 dBs of PSNR. Hence, the performance
is better than the earlier approaches (PCA only, Video, and SB). Daala has
better performance in terms of HVS and HVSm.
PCA6 + Video
From Figure 21 and Figure 20, we can see that PCA6 + Video is better than
PCA3 + Video. For example, at 0.01 compression ratio, Daala has 44 dBs
(HVSm) for PCA6 + Video whereas Daala only has 34.75 dB for PCA3 +
Video. X264 has better metrics in PSNR and SSIM, but Daala has better
performance in terms of HVS and HVSm.
PCA9 + Video
Comparing Figure 21 and Figure 22, PCA9 + Video is worse than that of
PCA6 + Video. For instance, at 0.01 compression ratio, PCA9 + Video has
42 dBs (PSNR) and PCA6 + Video has slightly over 44 dBs of PSNR. Daala
has better scores in HVS and HVSm, but X264 has higher values in PSNR
and SSIM.
PCA12 + Video
As shown in Figure 23, the performance of PCA12 + Video is worse than
some of the earlier combinations. For example, Daala’s HVSm value is 40
dBs at 0.01 compression ratio and this is lower than PCA6 + Video (Figure
21) and PCA9 + Video (Figure 22). PCA12 + Video is better than PCA3 +
Video (Figure 20).
PCA15 + Video
From Figure 24, we can see that PCA15 + Video is similar to PCA3 + Video
(Figure 20), but worse than the other PCA + Video combinations (Figure 21,
Figure 22, Figure 23).
Figure 25. Performance of PCA only: (a) PSNR in dB for AVIRIS image cube;
(b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube; (d)
HVSm in dB for AVIRIS image cube.
Video Approach
Here, the 213 bands are divided into groups of 3 bands. As a result, there are 71 groups, which are then treated as 71 frames in a video. After that, different
video codecs are applied. The performance metrics are shown in Figure 26.
Comparing with PCA only approach, the video approach is slightly inferior.
For instance, PCA6 has PSNR of 44 dBs at a compression ratio of 0.028
whereas the Video only approach has about 42.5 dBs at 0.028 ratio.
Figure 26. Performance of Video approach: (a) PSNR in dB for AVIRIS image
cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube;
(d) HVSm in dB for AVIRIS image cube.
SB Approach
Here the 71 groups of 3-band images are compressed separately. The results
shown in Figure 27 are worse than the video approach. This is understandable
as the correlations between frames were not taken into account in the SB
approach.
Two-step Approach
We have the following five case studies based on the number of PCA bands
coming out of the first step.
PCA3 + Video
From Figure 28, the performance metrics appear to flatten out after a
compression ratio of 0.005. The maximum PSNR value is below 40 dBs.
Other metrics are also not very high. Comparing with the PCA only, Video,
and SB approaches, PCA3 + Video does not show any advantages.
Figure 28. Performance of PCA3 + Video approach: (a) PSNR in dB for AVIRIS
image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS im-
age cube; (d) HVSm in dB for AVIRIS image cube.
PCA6 + Video
From Figure 29, we can see that PCA6 + Video has much better performance
than PCA3 + Video as well as PCA only, Video, and SB approaches. At
0.01 compression ratio, the PSNR values reached more than 42 dBs. Other
metrics also performed well. Daala has higher scores in HVS and HVSm.
X265 is slightly better in PSNR and SSIM.
Figure 29. Performance of PCA6 + Video approach: (a) PSNR in dB for AVIRIS
image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS im-
age cube; (d) HVSm in dB for AVIRIS image cube.
PCA9 + Video
Comparing Figure 29 and Figure 30, we can see that PCA9 + Video has
better metrics than that of PCA6 + Video. For instance, at 0.01 compression
ratio, PCA9 + Video has achieved 40 dBs of HVSm (Daala), but PCA6 +
Video has 38.5 dBs.
Figure 30. Performance of PCA9 + Video approach: (a) PSNR in dB for AVIRIS image cube; (b) SSIM for AVIRIS image cube; (c) HVS in dB for AVIRIS image cube; (d) HVSm in dB for AVIRIS image cube.
PCA12 + Video
Comparing Figure 30 and Figure 31, we can see that PCA12 + Video is
slightly worse than PCA9 + Video.
PCA15 + Video
Comparing Figure 31 and Figure 32, it can be seen that PCA15 + Video is
slightly worse than PCA12 + Video.
CONCLUSION
In this paper, we summarize some new results for HSI compression. The
key idea is to revisit a two-step approach to HSI data compression. The first
step adopts PCA to compress the HSI data spectrally. That is, the number of
bands is greatly reduced to a few bands via PCA. The second step applies
the latest video/image codecs to further compress the few PCA bands. Four
well-known codecs (J2K, X264, X265, and Daala) were used in the second
step. Three HSI data sets with diversely varying numbers of bands were used
in our studies. Four performance metrics were utilized in our experiments.
We have several key observations. First, we observed that compressing the
HSI data to six bands gives the best overall performance on all three
HSI data sets. This is different from the observation in [16], where more
PCA bands were included in the J2K step. Second, the X264 codec gave the
best results in terms of both compression performance and computational
complexity. Third, the PCA6 + X264 combination can be 3 dBs better than
the PCA6 + J2K combination in the Pavia data at 100 to 1 compression and
this is quite significant. Fourth, even at 100 to 1 compression, the PCA6 +
X264 combination can attain better than 40 dBs in PSNR for all three
data sets. This means the compression performance is perceptually lossless
at 100 to 1 compression.
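A minimal sketch of the first (spectral) step of this two-step scheme, using scikit-learn's PCA on a NumPy cube; the codec step (X264/X265/Daala/J2K) is only indicated by a comment, and all names and shapes here are illustrative rather than the authors' code.

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_spectral_reduce(cube, n_components=6):
    """Step 1: compress the HSI cube (bands, H, W) spectrally with PCA (sketch)."""
    bands, H, W = cube.shape
    pixels = cube.reshape(bands, -1).T           # one spectrum per row: (H*W, bands)
    pca = PCA(n_components=n_components)
    scores = pca.fit_transform(pixels)           # (H*W, n_components)
    return scores.T.reshape(n_components, H, W), pca

def pca_spectral_restore(comp_images, pca):
    """Inverse of step 1: rebuild an approximate cube from the PCA bands."""
    n, H, W = comp_images.shape
    pixels = pca.inverse_transform(comp_images.reshape(n, -1).T)
    return pixels.T.reshape(-1, H, W)

# The few PCA bands would then be grouped into 3-band frames and passed to a
# video/image codec (step 2), and the inverse transform applied after decoding.
```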
ACKNOWLEDGEMENTS
This research was supported by NASA Jet Propulsion Laboratory under
contract # 80NSSC17C0035. The views, opinions and/or findings expressed
are those of the author(s) and should not be interpreted as representing the
official views or policies of NASA or the U.S. Government.
REFERENCES
1. Ayhan, B., Kwan, C. and Jensen, J.O. (2019) Remote Vapor Detection
and Classification Using Hyperspectral Images. Proceedings SPIE,
Chemical, Biological, Radiological, Nuclear, and Explosives (CBRNE)
Sensing XX, Vol. 11010, 110100U. https://doi.org/10.1117/12.2518500
2. Zhou, J., Kwan, C. and Ayhan, B. (2017) Improved Target Detection
for Hyperspectral Images Using Hybrid In-Scene Calibration. Journal
of Applied Remote Sensing, 11, Article ID: 035010. https://doi.
org/10.1117/1.JRS.11.035010
3. Zhou, J., Kwan, C., Ayhan, B. and Eismann, M. (2016) A Novel Cluster
Kernel RX Algorithm for Anomaly and Change Detection Using
Hyperspectral Images. IEEE Transactions on Geoscience and Remote
Sensing, 54, 6497-6504. https://doi.org/10.1109/TGRS.2016.2585495
4. Zhou, J., Kwan, C. and Budavari, B. (2016) Hyperspectral Image Super-
Resolution: A Hybrid Color Mapping Approach. Journal of Applied
Remote Sensing, 10, Article ID: 035024. https://doi.org/10.1117/1.
JRS.10.035024
5. Qu, Y., Wang, W., Guo, R., Ayhan, B., Kwan, C., Vance, S. and Qi, H.
(2018) Hyperspectral Anomaly Detection through Spectral Unmixing
and Dictionary Based Low Rank Decomposition. IEEE Transactions
on Geoscience and Remote Sensing, 56, 4391-4405. https://doi.
org/10.1109/TGRS.2018.2818159
6. Wu, H.R., Reibman, A., Lin, W., Pereira, F. and Hemami, S.
(2013) Perceptual Visual Signal Compression and Transmission.
Proceedings of the IEEE, 101, 2025-2043. https://doi.org/10.1109/
JPROC.2013.2262911
7. Wu, D., Tan, D.M., Baird, M., DeCampo, J., White, C. and Wu,
H.R. (2006) Perceptually Lossless Coding of Medical Images. IEEE
Transactions on Medical Imaging, 25, 335-344. https://doi.org/10.1109/
TMI.2006.870483
8. Oh, H., Bilgin, A. and Marcellin, M.W. (2013) Visually Lossless
Encoding for JPEG 2000. IEEE Transactions on Image Processing, 22,
189-201. https://doi.org/10.1109/TIP.2012.2215616
9. Tan, D.M. and Wu, D. (2016) Perceptually Lossless and Perceptually
Enhanced Image Compression System & Method. U.S. Patent
9,516,315.
10. Kwan, C., Larkin, J., Budavari, B., Chou, B., Shang, E. and Tran, T.D.
(2019) A Comparison of Compression Codecs for Maritime and Sonar
Images in Bandwidth Constrained Applications. Computers, 8, 32.
https://doi.org/10.3390/computers8020032
11. Kwan, C., Larkin, J., Budavari, B. and Chou, B. (2019) Compression
Algorithm Selection for Multispectral Mastcam Images. Signal &
Image Processing: An International Journal, 10, 1-14. https://doi.
org/10.5121/sipij.2019.10101
12. Kwan, C. and Larkin, J. (2018) Perceptually Lossless Compression for
Mastcam Images. IEEE Ubiquitous Computing, Electronics & Mobile
Communication Conference, New York, 8-10 November 2018, 559-
565. https://doi.org/10.1109/UEMCON.2018.8796824
13. Kwan, C., Larkin, J. and Chou, B. (2019) Perceptually Lossless
Compression of Mastcam Images with Error Recovery. Proceedings
SPIE, Signal Processing, Sensor/Information Fusion, and Target
Recognition XXVIII, Vol. 11018. https://doi.org/10.1117/12.2518482
14. Li, N. and Li, B. (2010) Tensor Completion for On-Board Compression
of Hyperspectral Images. IEEE International Conference on
Image Processing, Hong Kong, 517-520. https://doi.org/10.1109/
ICIP.2010.5651225
15. Zhou, J., Kwan, C. and Ayhan, B. (2012) A High Performance
Missing Pixel Reconstruction Algorithm for Hyperspectral Images.
2nd International Conference on Applied and Theoretical Information
Systems Research, Taipei, 27-29 December 2012.
16. Du, Q. and Fowler, J.E. (2007) Hyperspectral Image Compression
Using JPEG2000 and Principal Component Analysis. IEEE Geoscience
and Remote Sensing Letters, 4, 201-205. https://doi.org/10.1109/
LGRS.2006.888109
17. JPEG-2000. https://en.wikipedia.org/wiki/JPEG_2000
18. X264. http://www.videolan.org/developers/x264.html
19. X265. https://www.videolan.org/developers/x265.html
20. Daala. https://xiph.org/daala/
21. http://lesun.weebly.com/hyperspectral-data-set.html
22. JPEG. https://en.wikipedia.org/wiki/JPEG
23. JPEG-XR. https://en.wikipedia.org/wiki/JPEG_XR
24. VP8. https://en.wikipedia.org/wiki/VP8
Chapter 12
LOSSLESS COMPRESSION OF DIGITAL MAMMOGRAPHY USING BASE SWITCHING METHOD
ABSTRACT
Mammography is a specific type of imaging that uses a low-dose x-ray
system to examine breasts. It is an efficient means of early detection
of breast cancer. Archiving and retaining these data for at least three
years is expensive and difficult, and requires sophisticated data compression
techniques. We propose a lossless compression method that makes use of
the smoothness property of the images. In the first step, de-correlation of the
given image is done using two efficient predictors. The two residue images
are partitioned into non-overlapping sub-images of size 4x4. At every instant,
one of the sub-images is selected and sent for coding. The sub-images with
all zero pixels are identified using a one-bit code. The remaining sub-images
are coded using the base switching method. Special techniques are used to
save the overhead information. Experimental results indicate an average
compression ratio of 6.44 for the selected database.
INTRODUCTION
Breast cancer is the most frequent cancer in women worldwide, with 1.05
million new cases every year, and represents over 20% of all malignancies
among females. In India, 80,000 women were affected by breast cancer in
2002. In the US alone, more than 40,000 women died of breast cancer in
2002. 98% of women survive breast cancer if the tumor is smaller than 2 cm
[1]. One of the effective methods of early diagnosis of this type of cancer is
non-palpable, non-invasive mammography. Through mammogram analysis,
radiologists have a detection rate of 76% to 94%, which is considerably
higher than the 57% to 70% detection rate for a clinical breast examination [2].
Mammography is a low dose x-ray technique to acquire an image of
the breast. Digital image format is required in computer aided diagnosis
(CAD) schemes to assist the radiologists in the detection of radiological
features that could point to different pathologies. However, the usefulness
of the CAD technique mainly depends on two parameters of importance:
the spatial and grey level resolutions. They must provide a diagnostic
accuracy in digital images equivalent to that of conventional films. Both
pixel size and pixel depth are factors that critically affect the visibility of
small low contrast objects or signals, which often are relevant information
for diagnosis [3]. Therefore, digital image recording systems for medical
imaging must provide high spatial resolution and high contrast sensitivity.
Due to this, mammography images commonly have a spatial resolution of
1024x1024, 2048x2048 or 4096x4096 and use 16, 12 or 8 bits/pixel. Figure
1 shows a mammography image of size 1024x1024 which uses 8 bits/pixel.
BASE-SWITCHING ALGORITHM
The BS method divides the original image (gray-level data) into non-
overlapping sub-images of size n × n. Given an n × n sub-image A, whose
N gray values are g0, g1, …, gN−1, define the “minimum” m, the “base” b and the
“modified sub-image” AI, whose N gray values are g0I, g1I, …, gN−1I, by

m = min{g0, g1, …, gN−1}, (1)

b = max{g0, g1, …, gN−1} − m + 1, (2)

giI = gi − m, i = 0, 1, …, N − 1. (3)

Also,

AI = A − m · I, (4)
where N = n × n and each of the elements of I is 1. The value of ‘b’ is
related to smoothness of the sub-image where smoothness is measured as
the difference between maximum and minimum pixel values in the sub-
image.
The number of bits required to code each gray value is

⌈log2 b⌉. (5)

Then, the total number of bits required for the whole sub-image is

ZA = N ⌈log2 b⌉. (6)
For example, for the sub-image of Figure 2, n = 4, N = 16, m = 95 & b = 9.
Modified sub-image of Figure 3 is obtained by subtracting 95 from every
gray values of A.
(7)
The image can be treated as an N digit number
in the base b number system. An integer value function f
can be defined such that f (AI, b) = decimal integer equivalent to the base-b
number.
(8)
(9)
Then, the number of bits required to store the integer f(AI, b) is

ZB = ⌈log2(b^N)⌉ = ⌈N log2 b⌉. (10)
Reconstruction of AI is done by switching the binary (base 2) number to
a base b number. Therefore, reconstruction of A needs the value of m and b.
The format of representation of a sub-image is as shown below.
(11)
Always,

⌈N log2 b⌉ ≤ N ⌈log2 b⌉. (12)

This verifies that

ZB ≤ ZA. (13)
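A short numerical sketch of the arithmetic behind Eqs. (1)-(13), using a block similar to the Figure 2 example (m = 95, b = 9); the bit counts Z_A and Z_B follow Eqs. (6) and (10) as written above, and the function name is illustrative.

```python
import numpy as np

def bs_encode_block(block):
    """Base-switching view of one n x n sub-image (sketch).

    Returns the minimum m, the base b, the integer obtained by reading the
    modified sub-image as an N-digit base-b number, and the bit counts
    Z_A (plain fixed-length coding) and Z_B (base-switched coding).
    """
    g = block.astype(np.int64).ravel()
    N = g.size
    m = int(g.min())                               # Eq. (1)
    b = int(g.max()) - m + 1                       # Eq. (2)
    value = 0
    for d in (g - m):                              # Eq. (3): modified gray values
        value = value * b + int(d)                 # read AI as a base-b number
    Z_A = N * int(np.ceil(np.log2(b)))             # Eq. (6)
    Z_B = int(np.ceil(N * np.log2(b)))             # Eq. (10)
    return m, b, value, Z_A, Z_B

# A block similar to the Figure 2 example: minimum 95, base 9
block = np.array([[ 97,  95, 100,  98],
                  [ 99, 103,  96,  97],
                  [101,  98,  95, 102],
                  [ 96, 100,  99,  97]])
m, b, _, Z_A, Z_B = bs_encode_block(block)
print(m, b, Z_A, Z_B)    # 95 9 64 51, so Z_B <= Z_A as in Eq. (13)
```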
Format 1
If b ∈ {1, 2, …, 11}, then the coding format is
Format 2
If b ∈ {12, 13, …, 128}, then the coding format is
Format 3
If b ∈ {129, 130, …, 256}, then the coding format is
Here, c stands for the category bit. If c is 0, then the block is encoded
using Formats 1 or 2; otherwise Format 3 is used.
PROPOSED METHOD
In the proposed method, we made the following modifications to the basic BS
method.
• Prediction
• Increasing the block size from 3x3 to 4x4
• All-zero block removal
• Coding the minimum value and base value
Prediction
After reviewing the BS method, it is found that the number of bits required for a
sub-image is decided by the value of the base ‘b’. If ‘b’ is reduced, the number
of bits required for a sub-image is also reduced. In the proposed method,
prediction is used to reduce the value of ‘b’ significantly. A predictor
generates, at each pixel of the input image, the anticipated value of that
pixel based on some of the past inputs. The output of the predictor is then
rounded to the nearest integer, denoted x̂n, and used to form the difference
or prediction error

en = xn − x̂n. (14)
This prediction error is coded by the proposed entropy coder. The
decoder reconstructs en from the received code words and performs the reverse
operation

xn = x̂n + en. (15)

The quality of the prediction for each pixel directly affects how efficiently
the prediction error can be encoded. The better the prediction, the less
information must be carried by the prediction error signal. This, in turn,
leads to fewer bits. One way to improve the prediction used for each pixel
is to make a number of predictions and then choose the one which comes closest
to the actual value [21]. This method, also called switched prediction, has
the major disadvantage that the choice of prediction for each pixel must
be sent as overhead information. The proposed prediction scheme uses two
predictors, and one of them is chosen for every block of pixels of size 4x4.
Thus, the choice of prediction is to be made only once for the entire 4x4
block. This reduces the amount of overhead. The two predictions are given
in Eqs. 16 and 17, where Pr1 is the popular MED predictor used in the JPEG-LS
standard and Pr2 is the one used by [5] for the compression of mammography
images.
For all the pixels of a block of size 4 × 4, one of the two predictions is
chosen depending on the smoothness property of the predicted blocks. Here,
smoothness is measured as the difference between maximum and minimum
pixel values in the block. The predictor that gives the lowest difference
value will be selected for that block. The advantage here is that the overhead
required for each block is only one bit.
(16)
(17)
(18)
Figure 5 illustrates the prediction technique. It shows two error images
which are obtained by using two predictors Pr1 and Pr2 respectively. The BS
algorithm divides them into 4x4 sub-images and computes the difference
between maximum and minimum pixel values for all the four sub-images.
For the first sub-image, the difference ‘d1’ is 6 and ‘d2’ is 8, where d1 and d2
are the differences of the sub-images corresponding to the predictors Pr1 and
Pr2, respectively. Now, since d1 < d2, the prediction stage selects the 4x4 sub-
image of predictor Pr1 for further processing. This procedure is repeated for
all the other sub-images. The resulting error image is shown in Figure 6.
We use a separate file predict to store the choice of predictor made at every
sub-image.
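A rough sketch of this switched-prediction step in Python, assuming the image dimensions are multiples of 4; Pr1 is implemented as the standard MED predictor, while Pr2 from [5] is not reproduced here, so the second error image is simply taken as an input.

```python
import numpy as np

def med_predict(img):
    """Pr1: the MED predictor of JPEG-LS (sketch; missing neighbours are taken as 0)."""
    x = img.astype(np.int32)
    pred = np.zeros_like(x)
    H, W = x.shape
    for i in range(H):
        for j in range(W):
            a = x[i, j - 1] if j > 0 else 0                 # left neighbour
            b = x[i - 1, j] if i > 0 else 0                 # upper neighbour
            c = x[i - 1, j - 1] if i > 0 and j > 0 else 0   # upper-left neighbour
            if c >= max(a, b):
                pred[i, j] = min(a, b)
            elif c <= min(a, b):
                pred[i, j] = max(a, b)
            else:
                pred[i, j] = a + b - c
    return pred

def choose_per_block(err1, err2, block=4):
    """Keep, for every block, the error sub-image with the smaller max-min range.

    Returns the merged error image and one selection bit per block
    (the contents of the 'predict' file)."""
    merged = err1.copy()
    H, W = err1.shape
    choice = np.zeros((H // block, W // block), dtype=np.uint8)
    for bi in range(0, H, block):
        for bj in range(0, W, block):
            s1 = err1[bi:bi + block, bj:bj + block]
            s2 = err2[bi:bi + block, bj:bj + block]
            if (s2.max() - s2.min()) < (s1.max() - s1.min()):
                merged[bi:bi + block, bj:bj + block] = s2
                choice[bi // block, bj // block] = 1
    return merged, choice
```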
For these images, the total number of bits required for marking the presence or absence
of all-zero blocks is 65536, since there are 65536 sub-images in total. The error
image will have both negative and positive pixel values, since the prediction
has changed the range of the pixel values from [0, 255] to [-255, 255].
Therefore, 9 bits are required to record the pixel values. The approximate
average compression ratio obtained by all-zero block removal for the 50
MIAS images considered in Table 2 can be estimated by the following
formula.
Here the value 65536 indicates the overhead bits required for marking
the presence or absence of all-zero blocks. The numerator gives total bits
used by the original uncompressed image. This computation clearly shows
System Overview
As shown in Figure 7, we first divide the error image into sub-images of size
4x4. The sub-images are then processed one by one. For each sub-image, we
have to determine whether it belongs to the all-zero category and, if so, it is
removed. The remaining blocks are retained for further processing.
Decoding a Sub-Image
Following are the decoding steps:
• The files min, base, zero and predict are reconstructed.
• The decoding algorithm first checks whether the block is an
all-zero block or not. If it is all-zero, then a 4x4 block of zeros is
generated.
• Otherwise, the base value and the minimum value are first obtained
by using the files min and base. The modified image AI is
reconstructed as explained in Section 2, and the 4x4 error image is
obtained by adding the min value to it.
• The type of prediction used is read from the predict file. The prediction
rule is applied to the 4x4 error image and the original 4x4 sub-
image is reconstructed.
RESULTS
We evaluated the performance of the proposed scheme using the 50 gray-
scale images of the MIAS dataset, which include three varieties of images:
normal, benign and malignant. MIAS is a European society that
researches mammograms and supplies over 11 countries with real-world
clinical data. Films taken from the UK National Breast Screening Program
have been digitized at a 50-micron pixel edge, with each pixel represented by
an 8-bit word. MATLAB is the tool used for simulation. All simulations
were conducted on a 1.7 GHz processor and were supplied with the same set of
50 mammography images. Each mammogram has a resolution of 1024x1024
and uses 8 bits/pixel. Results of the proposed method are compared with
those of the popular methods. A comparison of compression results for the 50
MIAS images is shown in Table 4. This set includes all three varieties of
images, namely normal, benign and malignant.
Figure 8 and Figure 9 show the two images mdb040.pgm and mdb025.pgm,
which give the best and the worst compression ratios, respectively.
CONCLUSIONS
Several techniques have been used for the lossless compression of
mammography images; none of them has used the smoothness property
of the images. Our study has shown that there is a very large number of zero
blocks present in mammography images. We have taken the concept of the
base switching transformation and successfully optimized and applied it,
in conjunction with other existing compression methods, to digitized high-
resolution mammography images. A comparison with other approaches is
given for a set of 50 high-resolution digital mammograms comprising
normal, benign and malignant images. Compared with the PNG method, one
of the international standards, JBIG, performs better by 36%. The transformation-
based JPEG2000, another international compression standard, when
used in lossless mode, performs slightly better than JBIG, by 7%. The
latest standard for lossless image compression, JPEG-LS, which is based on a
prediction-based method, performs best among the four international standards of
lossless coding techniques. It gives a compression ratio of 6.39, which is
1.5% better than JPEG 2000. Finally, for these images, the proposed
method performs better than PNG, JBIG, JPEG2000 and JPEG-LS by 50%,
9.5%, 2.4% and approximately 1%, respectively. The success of our method
is primarily due to its zero-block removal procedure, the compression of the
overheads and the switched prediction used. It should also be noted that the
speed of the BST method is comparable with the speed of other
standard methods, as verified by [18]. Further investigation on improving
the performance of our method is under way, by developing more suitable
prediction methods. Motivated by the results obtained here, our next study
will carry out the compression of a larger database of natural images and
medical images obtained by other modalities.
REFERENCES
1. Saxena, S., Rekhi, B., Bansal, A., Bagya, A., Chintamani, and Murthy,
N. S., (2005) Clinico-morphological patterns of breast cancer including
family history in a New Delhi hospital, India—a cross sectional study,
World Journal of Surgical Oncology, 1–8.
2. Cahoon, T. C., Sulton, M. A, and Bezdek, J. C. (2000), Breast
cancer detection using image processing techniques, The 9th IEEE
International Conference on Fuzzy Systems, 2, 973–976.
3. Penedo, M., Pearlman, W. A., Tahoces, P. G., Souto, M., and Vidal,
J. J., (2003) Region-based wavelet coding methods for digital
mammography, IEEE Transactions on Medical Imaging, 22, 1288–
1296.
4. Wu, X. L., (1997), Efficient lossless compression of continuous-tone
images via context selection and quantization, IEEE Transactions on
Image Processing, 6, 656–664.
5. Ratakonda, K. and Ahuja, N. (2002), Lossless image compression with
multi-scale segmentation, IEEE Transactions on Image Processing,
11, 1228–1237.
6. Weinberger, M. J., Rissanen, J., and Arps, R., (1996) Application
of universal context modeling to lossless compression of gray scale
images, IEEE Transactions on Image Processing, 5, 575–586.
7. Grecos, C., Jiang, J., and Edirisinghe, E. A., (2001) Two Low cost
algorithms for improved edge detection in JPEG-LS, IEEE Transactions
on Consumer Electronics, 47, 466–472.
8. Weinberger, M. J., Seroussi, G., and Sapiro, G., (2000) The LOCO-I
lossless image compression algorithm: Principles and standardization
into JPEG-LS, IEEE Transactions on Image processing, 9, 1309–1324.
9. Sung, M. M., Kim, H.-J., Kim, E.-K., Kwak, J.-Y., Kyung, J., and
Yoo, H.-S., (2002) Clinical evaluation of JPEG 2000 compression for
digital mammography, IEEE Transactions on Nuclear Science, 49,
827–832.
10. Neekabadi, A., Samavi, S., and Karimi, N., (2007) Lossless compression
of mammographic images by chronological sifting of prediction
errors, IEEE Pacific Rim Conference on Communications, Computers
& Signal Processing, 58–61.
11. Li, X., Krishnan, S., and Marwan, N. W., (2004) A novel way of lossless
compression of digital mammograms using grammar codes, Canadian
Conference on Electrical and Computer Engineering, 4, 2085–2088.
12. da Silva, L. S. and Scharcanski, J., (2005) A lossless compression
approach for mammographic digital images based on the Delaunay
triangulation, International Conference on Image Processing, 2, 758–761.
13. Khademi, A. and Krishnan, S., (2005) Comparison of JPEG2000
and other lossless compression schemes for digital mammograms,
Proceedings of the IEEE Engineering in Medicine and Biology
Conference, 3771–3774.
14. Shen, L. and Rangayyan, R. M., (1997) A segmentation based lossless
image coding method for high-resolution medical image compression,
IEEE Transactions on Medical Imaging, 16, 301–307.
15. Ranganathan, N., Romaniuk, S. G., and Namuduri, K. R., (1995)
A lossless image compression algorithm using variable block
segmentation, IEEE Transactions on Image Processing, 4, 1396–1406.
16. Namuduri, K. R., Ranganathan, N., and Rashedi, H., (1996) SVBS: A
high-resolution medical image compression algorithm using slicing
with variable block size segmentation, IEEE Proceedings of ICPR,
919–923.
17. Alsaiegh, M. Y. and Krishnan, S. (2001), Fixed block-based lossless
compression of digital mammograms, Canadian Conference on
Electrical and Computer Engineering, 2, 937–942.
18. Chuang, T.-J. and Lin, J. C., (1998) A new algorithm for lossless still
image compression, Pattern Recognition, 31, 1343–1352.
19. Chang, C.-C., Hsieh, C.-P., and Hsiao, J.-Y., (2003) A new approach to
lossless image compression, Proceedings of ICCT’03, 1734–38.
20. Ravikumar, M. S., Koliwad, S., and Dwarakish, G. S., (2008) Lossless
compression of digital mammography using fixed block segmentation
and pixel grouping, Proceedings of IEEE 6th Indian Conference on
Computer Vision, Graphics and Image Processing, 201–206.
21. Sayood, K., (2003) Lossless compression handbook, First edition,
Academic Press, USA, 207–223.
Chapter 13
LOSSLESS IMAGE COMPRESSION BASED ON MULTIPLE-TABLES ARITHMETIC CODING
Citation: Rung-Ching Chen, Pei-Yan Pai, Yung-Kuan Chan, Chin-Chen Chang, “Lossless Image Compression Based on Multiple-Tables Arithmetic Coding”, Mathematical Problems in Engineering, vol. 2009, Article ID 128317, 13 pages, 2009. https://doi.org/10.1155/2009/128317.
Copyright: © 2020 by Authors. This is an open access article distributed under the
Creative Commons Attribution License, which permits unrestricted use, distribution,
and reproduction in any medium, provided the original work is properly cited.
ABSTRACT
This paper presents a lossless image compression method based
on multiple-tables arithmetic coding (MTAC) to encode a gray-level
image f. First, the MTAC method employs a median edge detector (MED)
to reduce the entropy rate of f. The gray levels of two adjacent pixels in
an image are usually similar. A base-switching transformation approach is
then used to reduce the spatial redundancy of the image. The gray levels of
some pixels in an image are more common than those of others. Finally, the
arithmetic encoding method is applied to reduce the coding redundancy of
the image. To promote high performance of the arithmetic encoding method,
the MTAC method first classifies the data and then encodes each cluster
of data using a distinct code table. The experimental results show that, in
most cases, the MTAC method provides a higher efficiency in use of storage
space than the lossless JPEG2000 does.
INTRODUCTION
With the rapid development of image processing and Internet technologies,
a great number of digital images are being created every moment. Therefore,
it is necessary to develop an effective image-compression method to reduce
the storage space required to hold image data and to speed up image
transmission over the Internet [1–16].
Image compression reduces the amount of data required to describe a
digital image by removing the redundant data in the image. Lossless image
compression deals with reducing coding redundancy and spatial redundancy.
Coding redundancy consists in using variable-length codewords selected to
match the statistics of the original source. The gray levels of some pixels
in an image are more common than those of others (that is, different gray
levels occur with different probabilities), so coding redundancy reduction
uses shorter codewords for the more common gray levels and longer
codewords for the less common gray levels. We call this process variable-
length coding. This type of coding is always reversible and is usually
implemented using look-up tables. Examples of image coding schemes that
explore coding redundancy are the Huffman coding [4, 5, 7] and arithmetic
coding techniques [8, 9].
There exists a significant correlation among neighboring pixels in an
image, which may result in spatial redundancy in the data. Spatial redundancy
reduction exploits the fact that the gray levels of the pixels in an image
region are usually the same or almost the same. Methods such as the LZ77,
LZ78, and LZW methods exploit the spatial redundancy in several ways,
one of which is to predict the gray level of a pixel through the gray levels of
its neighboring pixels [14].
To encode an image effectively, a statistical-model-based compression
method needs to predict precisely the occurrence probabilities of the data
patterns in the image. This paper proposes a lossless image compression
method based on multiple-tables arithmetic coding (MTAC) method to
encode a gray-level image.
A statistical-model-based compression method generally creates a code
table to hold the probabilities of occurrence of all data patterns. The type of
data pattern significantly affects the encoding efficiency when minimizing
storage space. When the data come from different sources, it is difficult to
find an appropriate code table to describe all the data. Therefore, this MTAC
method categorizes the data and adopts distinct code tables that record the
frequencies with which the data patterns occur in different clusters.
(2.1)
It is impossible to encode the data set, in a lossless manner, with a bit
rate lower than E. The bit rate is defined as the ratio of the
number of bits holding the compression data to the number of pixels in the
compressed image. The higher the entropy rate, the less one can compress it
using a statistical-model-based compression method.
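For concreteness, a small helper that estimates this entropy rate empirically from an image's gray-level histogram; it is a sketch that assumes base-2 logarithms and treats pixels as independent samples.

```python
import numpy as np

def entropy_rate(img):
    """Empirical entropy rate (bits/pixel) of a gray-level image (sketch)."""
    _, counts = np.unique(img, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))
```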
(2.2)
In addition, for i = 1 to H, and j = 1 to W:
(2.3)
Here, Max( g(i, j − 1), g(i − 1, j)) and Min(g(i, j − 1), g(i − 1, j)) are the
maximum and the minimum between g(i, j − 1) and g(i − 1, j), respectively.
If g(i − 1, j − 1) ≥ Max(g(i, j − 1), g(i − 1, j)), MED considers that a horizontal edge passes through
P(i, j) or some pixels above P(i, j). When g(i − 1, j − 1) ≤ Min(g(i, j − 1), g(i − 1, j)), MED
perceives that one vertical edge passes through P(i, j) or some pixel on the
left of P(i, j).
(2.4)
Figure 3. Two gray-level images, Airplane and Baboon, and their color histograms.
According to formula (2.1), the entropy rates of Airplane and Baboon,
in theory, reach 6.5 and 7.2 bits/pixel, respectively. From Shannon’s limit
[11, 12], with such an entropy rate, the minimum number of bits required to
describe a pixel in Airplane (resp., Baboon) is 6.5 (resp., 7.2) bits/pixel. Since
the numbers of bits are over the acceptable maximum, the MTAC method
utilizes MED [10] to decrease the entropy rate of f before encoding f. Figure
4 demonstrates the error images of Airplane and Baboon shown in Figure 3,
and the gray-level histograms of the error images. The gray levels of most
pixels in the error images are close to 0; the entropy rates of the error images
of Airplane and Baboon are 3.6 and 5.2 bits/pixel, respectively, which are far
lower than the entropy rates of the original Airplane and Baboon.
Figure 4. The difference images and their histograms of Airplane and Baboon.
Figure 5 shows the sign bit images of Airplane and Baboon. It is clear that
both sign bit images are messy, so it is difficult to find a method with
which to encode them effectively. To deal with this problem, the MTAC
method transforms the error image and sign bit image into a difference image
and an MSB (most significant bit) image, respectively. The MTAC method
pulls out the MSB of all the | e(i, j) |s to create an H × W binary image fMSB,
where the MSB of | e(i, j) | is given to the pixel located at the coordinates
(i, j) of fMSB. We call the binary image the MSB image fMSB of f. Meanwhile,
the MTAC method concatenates the sign bit b of e(i, j) and the remaining
| e(i, j) |, whose MSB has been drawn out, by appending b to the rightmost
bit of the remaining | e(i, j) | in order to generate another gray-level image.
We name the gray-level image the difference image of f. Figure 6 illustrates
these actions.
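A compact sketch of this splitting step, assuming the magnitudes |e(i, j)| fit in 8 bits; the function and variable names are illustrative.

```python
import numpy as np

def split_error_image(err):
    """Split a signed error image into the MSB image and the difference image (sketch).

    err: integer array of prediction errors e(i, j).
    The MSB of |e| goes to the binary MSB image; the remaining 7 bits of |e|
    are shifted left and the sign bit is appended as the rightmost bit,
    giving the gray-level difference image.
    """
    mag = np.abs(err).astype(np.uint16)
    sign = (err < 0).astype(np.uint16)      # sign bit b of e(i, j)
    msb = (mag >> 7) & 1                    # most significant bit of |e|
    rest = mag & 0x7F                       # |e| with its MSB drawn out
    diff = (rest << 1) | sign               # append b to the rightmost bit
    return msb.astype(np.uint8), diff.astype(np.uint8)
```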
Figure 5. The sign bit images of the images Airplane and Baboon.
Figure 8. The difference images and color histograms of Airplane and Baboon.
(2.5)
For each image block, the BST algorithm needs to hold only gmin, S, and
the gray-level differences between gmin and the gray levels of all the pixels
in B. We call the difference between the gray level of a pixel P and gmin the
gray-level difference of P. Figure 9 shows a 4 × 4 image block B; 16 × 8
= 128 bits of memory space are required to store B. However, in the BST
algorithm, gmax, gmin, and S of B are 137, 122, and 3, respectively. The BST
algorithm uses 8 bits, 3 bits, and 4 × 16 bits to hold gmin, S, and the gray-level
differences of all the pixels in B; hence, the BST algorithm requires only a
total of 75 bits to store B.
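The storage arithmetic of this example can be checked with a few lines of code; the per-difference bit width is derived here directly from the block's spread, which is an assumption, since the precise definition of S is given by Eq. (2.5).

```python
import numpy as np

def bst_block_cost(block):
    """Bits used by the BST layout for one block (sketch): 8 bits for g_min,
    a 3-bit field for S, and a fixed number of bits per gray-level difference."""
    g = block.astype(np.int64)
    spread = int(g.max() - g.min())                        # g_max - g_min
    bits_per_diff = max(int(np.ceil(np.log2(spread + 1))), 1)
    return 8 + 3 + g.size * bits_per_diff

# Figure 9 numbers: a 4 x 4 block with g_max = 137 and g_min = 122
block = np.full((4, 4), 122, dtype=np.int64)
block[0, 0] = 137
print(bst_block_cost(block))   # 8 + 3 + 16*4 = 75 bits, versus 16*8 = 128
```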
Image Decompression
In the decompression phase, the MTAC method first draws W, H, and
StringCODE TABLE from the compression data. The bit length of each data pattern
in the MSB image is 8. Hence, the MTAC method can reconstruct the MSB
image based on StringMSB by using the arithmetic decoding method. Since
f consists of H × W/9 image blocks, the MTAC method will decompress
the (H × W/9) gmin values from Stringgmin using the arithmetic decoding method,
where the bit length of a data pattern is 8. Similarly, it can decode the (H × W/9)
S values from StringS, where each data pattern is described by 3 bits. The number of
data patterns in each GDS can easily be computed from the S values. Hence, each
GDS can be decoded as well.
EXPERIMENTS
The purpose of this section is to investigate the performance of the MTAC
method by experiments. In these experiments, ten 256 × 256 gray-level
images Airplane, Lena, Baboon, Gold, Sailboat, Boat, Toy, Barb, Pepper,
and Girl, shown in Figure 10, are used as test images. The first experiment
explores the effect of the MED processing approach on reducing the entropy
rate of the compressed image. Table 1 lists the entropy rates of the ten original
test images and the entropy rates of their error images. The experimental
results show that most of the entropy rates of the error images are close to
half those of the original test images.
Table 2. The size of the sign bit images and their compression data.
Table 4. Bit rates (bits/pixel) obtained by the MTAC and lossless JPEG 2000.

Image       MTAC    Lossless JPEG2000
Airplane    4.14    4.35
Lena        4.20    4.25
Baboon      6.10    6.11
Gold        4.71    4.90
Sailboat    4.88    5.10
Boat        4.24    4.44
Toy         3.90    4.16
Barb        5.28    5.14
Pepper      4.34    4.43
Girl        4.63    4.73
Figure 11. The difference image of Barb and its partial image.
CONCLUSIONS
This paper proposes the MTAC method to encode a gray-level image f. The
MTAC method contains the MED processing, BST, and statistical-model-
based compressing approaches. The MED processing approach reduces the
entropy rate of f. The BST approach decreases the spatial redundancy of
the difference image of f based on the similarity among adjacent pixels.
The statistical-model-based compressing approach further compresses the
data generated in the MED processing and BST approaches, based on their
coding redundancy. The data patterns of the data produced by the MED
processing approach and the BST approach have different bit lengths and
distinct occurrence frequencies. Hence, the MTAC method first classifies
the data into clusters before compressing the data in each cluster using the
arithmetic coding algorithm via separate code tables.
The experimental results reveal that the MTAC method usually gives a
better bit rate than the lossless JPEG2000 does, particularly for the images
with small gray-level variations among adjacent pixels. However, when the
gray-level variations among adjacent pixels in an image are very large, the
MTAC method performs worse in terms of bit rate.
REFERENCES
1. T. J. Chuang and J. C. Lin, “On the lossless compression of still
image,” in Proceedings of the International Computer Symposium-On
Image Processing and Character Recognition (ICS ‘96), pp. 121–128,
Kaohsiung, Taiwan, 1996.
2. T. J. Chuang and J. C. Lin, “New approach to image encryption,”
Journal of Electronic Imaging, vol. 7, no. 2, pp. 350–356, 1998.
3. S. C. Diego, G. Raphaël, and E. Touradj, “JPEG 2000 performance
evaluation and assessment,” Signal Processing: Image Communication,
vol. 17, no. 1, pp. 113–130, 2002.
4. Y. C. Hu and C.-C. Chang, “A new lossless compression scheme based
on Huffman coding scheme for image compression,” Signal Processing:
Image Communication, vol. 16, no. 4, pp. 367–372, 2000.
5. D. A. Huffman, “A method for the construction of minimum redundancy
codes,” in Proceedings of the IRE, vol. 40, pp. 1098–1101, 1952.
6. ISO/IEC FCD 15444-1, Information Technology-JPEG 2000 Image
Coding System, 2000.
7. D. E. Knuth, “Dynamic Huffman coding,” Journal of Algorithms, vol.
6, no. 2, pp. 163–180, 1985.
8. G. G. Langdon Jr., “An introduction to arithmetic coding,” IBM Journal
of Research and Development, vol. 28, no. 2, pp. 135–149, 1984.
9. J. Rissanen and G. G. Langdon Jr., “Arithmetic coding,” IBM Journal
of Research and Development, vol. 23, no. 2, pp. 149–162, 1979.
10. S. A. Martucci, “Reversible compression of HDTV images using
median adaptive prediction and arithmetic coding,” in Proceedings of
the IEEE International Symposium on Circuits and Systems, vol. 2, pp.
1310–1313, New York, NY, USA, 1990.
11. C. E. Shannon, “A mathematical theory of communication,” Bell
System Technical Journal, vol. 27, pp. 379–423, 1948.
12. C. E. Shannon, “Prediction and entropy of printed English,” Bell
System Technical Journal, vol. 30, pp. 50–64, 1951.
13. A. N. Skodras, C. A. Christopoulos, and T. Ebrahimi, “JPEG 2000:
the upcoming still image compression standard,” Pattern Recognition
Letters, pp. 1337–1345, 2001.
Chapter 14
ENTROPY—A UNIVERSAL CONCEPT IN SCIENCES
Vladimír Majerník
Mathematical Institute, Slovak Academy of Sciences, Bratislava, Slovakia
ABSTRACT
Entropy represents a universal concept in science suitable for quantifying
the uncertainty of a series of random events. We define and describe
this notion in an appropriate manner for physicists. We start with a brief
recapitulation of the basic concepts of probability theory that are useful
for the determination of the concept of entropy. The history of how this
concept came into its present exact form is sketched. We show that the
Shannon entropy represents the most adequate measure of the probabilistic
uncertainty of a random object. Though the notion of entropy has been
introduced in classical thermodynamics as a thermodynamic state variable
INTRODUCTION
At the most fundamental level, all our further considerations rely on the
concept of probability. Although there is a well-defined mathematical
theory of probability, there is no universal agreement about the meaning of
probability. Thus, for example, there is the view that probability is an objective
property of a system and another view that it describes a subjective state of
belief of a person. Then there is the frequentist view that the probability of an
event is the relative frequency of its occurrence in a long or infinite sequence
of trials. This latter interpretation is often employed in mathematical
statistics and statistical physics. In everyday life, probability means the
degree of ignorance about the outcome of a random trial. This is why
probability is commonly interpreted as the degree of the subjective expectation
of an outcome of a random trial. Both subjective and statistical probability
are “normed”: the degree of expectation that an outcome of a
random trial occurs and the degree of the “complementary” expectation
that it does not always add up to one [1] 1.
S S1 S2 S3 S4 S5 S6
P 1/6 1/6 1/6 1/6 1/6 1/6
x 1 2 3 4 5 6
and
respectively.
The statistical moments of a random variable are often used as the
uncertainty measures of the random trial, especially in experimental
physics, where, e.g., the standard deviation of measured quantities
characterizes the accuracy of a physical measurement. The moment
uncertainty measures of a random variable are also used in formulating the
uncertainty relations in quantum mechanics [7].
(ii) The probabilistic or entropic measures of uncertainty of a random
trial contain in their expressions only the components of the
probability distribution of a random trial.
To determine the notion of entropy, we consider quantities called
partial uncertainties, which are assigned to the individual probabilities.
We denote a partial uncertainty by the symbol Hi. In any
probabilistic uncertainty measure, a partial uncertainty is a function only of
(1)
where Pi and Pj are the probabilities of the i-th and j-th outcomes, respectively;
(iii) .
It was shown that the only function which satisfies these requirements
has the form [8]
(2)
The quantity is called information-theoretical or Shannon entropy.
We denote it by symbol S. Shannon entropy is a real and positive number.
It is a function only of the components of the probability distribution
assigned to the set of outcomes of a random trial.
Shannon entropy satisfies the following demands (see Appendix):
(i) If the probability distribution contains only one non-zero component, e.g.
P1 = 1, and the rest of the components are equal to zero,
then S = 0. In this case, there is no uncertainty in a random
trial because an outcome is realized with certainty.
(ii) The more spread out the probability distribution P is, the larger
the entropy S becomes.
(iii) For a uniform probability distribution, S becomes
maximal. In this case, the probabilities of all outcomes are equal,
and therefore the mean uncertainty of such a random trial becomes
maximum (see the short numerical check below).
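These three demands are easy to check numerically; the following sketch uses base-2 logarithms (the choice of base only rescales S).

```python
import numpy as np

def shannon_entropy(p):
    """Shannon entropy (in bits) of a discrete distribution, with 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))

print(shannon_entropy([1, 0, 0, 0, 0, 0]))                 # (i)  certain outcome: 0.0
print(shannon_entropy([0.5, 0.3, 0.1, 0.05, 0.03, 0.02]))  # (ii) moderately spread
print(shannon_entropy([1/6] * 6))                          # (iii) uniform die: log2(6)
```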
One uses a random scheme for the characterization of a random trial. If
X is a discrete random variable assigned to a random trial, then its random
scheme has the form

S    S1    S2    …    Sn
X    x1    x2    …    xn
the Boltzmann constant4. The Boltzmann law represents the solution to the
functional equation between St and W. Let us consider a set of isolated
thermodynamic systems. According to Clausius, the total
entropy of this system is an additive function of the entropies of its parts,
i.e., it holds that

St = St1 + St2 + … + Stn. (3)

On the other side, the joint “thermodynamic” probability of the system in (3)
is

W = W1 W2 … Wn. (4)

To obtain the homomorphism between Equations (3) and (4), it is
sufficient that

St = k log W, (5)

which is just the Boltzmann law [2].
We give some remarks regarding the relationship between the Clausius,
Boltzmann and Shannon entropies:
(i) The thermodynamic probability W in the Boltzmann law is given
by the number of possibilities of how to distribute N particles in
n cells having different energies,

W = N! / (N1! N2! … Nn!). (6)

The probability Pi that a particle of the statistical ensemble has the i-th
value of energy is given by the ratio Ni/N. Inserting the probabilities

Pi = Ni / N (7)
into Boltzmann’s entropy formula we have
(8)
Supposing that the number of particles in a statistical ensemble is very
large, we can use the asymptotic formula

log N! ≈ N log N − N. (9)
For very large N, the second term in Equation (9) can be neglected and
we find
(10)
subject to given constraints. For example, by taking the mean energy
per particle as the constraint at the extremizing procedure, we obtain the
following probability distribution for the particle energy
(11)
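The closeness of the Boltzmann counting (6) and the Shannon form obtained via the asymptotic formula (9) can be checked numerically; the sketch below evaluates log W exactly through the log-Gamma function and compares it with −N Σ Pi log Pi (natural logarithms are used throughout, and the occupation numbers are illustrative).

```python
import numpy as np
from math import lgamma

def log_multiplicity(counts):
    """log W for W = N! / (N1! N2! ... Nn!), via the log-Gamma function."""
    N = sum(counts)
    return lgamma(N + 1) - sum(lgamma(c + 1) for c in counts)

counts = [50_000, 30_000, 15_000, 5_000]      # occupation numbers N_i
N = sum(counts)
P = np.array(counts) / N                      # P_i = N_i / N
print(log_multiplicity(counts))               # exact log W
print(-N * np.sum(P * np.log(P)))             # Stirling approximation; nearly equal
```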
Since Shannon introduced his entropy, several other classes of
probabilistic uncertainty measures (entropies) have been described in the
literature (see, e.g., [16]). We can broadly divide them into two classes:
(i) The Shannon-like uncertainty measures which, for a certain value
of the corresponding parameters, converge towards the Shannon
entropy, e.g., Rényi’s entropy;
(12)
(ii) The uncertainty measures having no direct connection to Shannon
entropy, e.g., information “energy” defined in information theory
as [16]
Σi Pi^2, (13)
and called Hilbert-Schmidt norm in quantum physics. The most important
uncertainty measures of the first class are:
(i) The Rényi entropy defined as follows [17]
Sα = (1/(1 − α)) log Σi Pi^α, α > 0, α ≠ 1. (14)
(ii) The Havrda-Charvat entropy (or α-entropy)5 is defined as [18]
(15)
For the sake of completeness, we list some other entropy-like uncertainty
measures presented in the literature [19] :
(i) The trigonometric entropy is defined as [10]
(16)
There are six properties which are usually considered desirable for an
uncertainty measure of a random trial: (i) symmetry, (ii) expansibility, (iii) subadditivity,
(iv) additivity, (v) normalization, and (vi) continuity. The only uncertainty
measure which satisfies all these requirements is the Shannon entropy. Each
of the other entropies violates at least one of them; e.g., Rényi’s entropy
violates only the subadditivity property, the Havrda-Charvat entropy violates
the additivity property, and the R-norm entropies violate both subadditivity and
additivity. More details about the properties of each entropy can be found
elsewhere (e.g., [15]). The Shannon entropy satisfies all the above requirements
put on an uncertainty measure, and it exactly matches the properties of physical
entropy6. All these classes of entropies represent probabilistic uncertainty
measures which have similar mathematical properties to the Shannon entropy.
The best known Shannon-like probabilistic uncertainty measure is the
Havrda-Charvat entropy [18], which is more general than the Shannon
measure and much simpler than Rényi’s measure. It depends on a parameter
α which is taken from the interval . As such, it represents a family of
uncertainty measures which includes the information entropy as a limiting case
when α → 1. We note that in physics the Havrda-Charvat entropy is known
as the Tsallis entropy [20]. All the mentioned entropic measures of uncertainty
are functions of the components of the probability distribution of a random
variable and they have three important properties: (i) They assume their
maximal values for the uniform probability distribution. (ii) They
become zero for probability distributions having only one non-zero component.
(iii) They express a measure of the spread of a probability distribution: the
larger this spread becomes, the larger the values they assume. These properties
qualify them for being measures of uncertainty (inaccuracy) in the
physical theory of measurement.
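A small numerical check of the limiting behaviour mentioned above; the Havrda-Charvat (Tsallis) entropy is written here in one common normalization, which may differ by a constant factor from the form used in Eq. (15).

```python
import numpy as np

def shannon(p):
    p = np.asarray(p, float); p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

def renyi(p, alpha):
    p = np.asarray(p, float)
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def havrda_charvat(p, alpha):
    # Tsallis-style normalization of the alpha-entropy (assumed form)
    p = np.asarray(p, float)
    return float((1.0 - np.sum(p ** alpha)) / (alpha - 1.0))

p = [0.5, 0.25, 0.15, 0.1]
for a in (0.9, 0.99, 1.01, 1.1):
    print(a, renyi(p, a), havrda_charvat(p, a))
print("Shannon:", shannon(p))   # both families approach this value as alpha -> 1
```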
The entropic uncertainty measures for a discrete random variable are,
in the frame of theory of probability, exactly defined. The transition from
the discrete to the continuous entropic uncertainty measures is, however,
not always unique and has still many open problems. A continuous random
with both terms H(1)(p) and H(2)(p) always diverges. Usually, one
“renormalizes” by taking only the term H(1)(x) (called differential
entropy or the Shannon entropy functional H(x)) for the entropic uncertainty
measure of a continuous random variable. This functional is well known
to play an important role in probability and statistics. We refer to [15] for
applications of the Shannon entropy functional to the theory of probability
and statistics.
As is well known, the Shannon entropy functionals of some continuous
variables represent complicated integrals which are often difficult to
compute analytically or even numerically. Everybody who has tried to calculate
analytically the differential entropies of continuous variables is
aware of how difficult it may be. From the purely mathematical point of view,
the differential entropy can be taken as a formula for expressing the spread
of any standard single-valued function (the probability density belongs to
this class of functions). Generally, the Shannon entropy functional assigns
to a probability density function (belonging to the class of functions L2(R1))
a real number H through a mapping L2(R1) → H. H is a monotonously
increasing function of the degree of “spreading” of p(x), i.e., the larger H
becomes, the more spread out p(x) is.
The Shannon entropy functional was studied just at the beginning of
information theory [17] . Since that time, besides the Shannon entropy
functional, several other entropy functionals were introduced and studied
in the probability theory. The majority of them are dependent on certain
parameters. As such, they form a whole family of different functionals
(17)
(ii) The Havrda-Charvat entropic functional [18]
(18)
(iii) The trigonometric entropic functional [10]
(19)
CONCLUSIONS
From what has been said so far it follows:
(i) The concept of entropy is inherently connected with the probability
distribution of outcomes of a random trial. The entropy quantifies
the probability uncertainty of a general random trial.
(ii) There are two ways to express the uncertainty of a random
trial:
the moment measure and the probabilistic measure. The former includes in
its definition both the values assigned to trial outcomes and their probabilities.
The latter contains in its definition only the corresponding
probabilities. The moment uncertainty measures are given, as a rule, by the
higher statistical moments of a random variable, whereas the probabilistic
measure is expressed by means of entropy. The most important probabilistic
uncertainty measure is the Shannon entropy, defined by the formula S = −Σi Pi log Pi.
APPENDIX
(A1)
(iv) (A2)
are and
respectively, then
(A3)
(A4)
The equality in (A4) is valid if and only if
(A5)
where
and
(A6)
where
and
Here
which is the so-called “uncertainty balance”, the only conservation law for
entropy.
Finally, property (vi) shows that some data on can only decrease the
uncertainty on , namely
(A7)
with equality if and only if and are independent. From (A5) and (A7)
we get
(A8)
or, equivalently,
is the distance between the random variables and , with the two random
variables considered identical if either one completely determines the other,
NOTES
1
The concept of probability was mathematically clarified and rigorously
determined about sixty years ago. The probability is interpreted as a
complete measure on the σ-algebra γ of the subsets S1, S2,···Sn of the set of
the elementary random events B. The probability measure P fulfils following
relations:
(i)
(iv) P(B) = 1.
The σ-algebra, on which the set function P is defined, is called the
Kolmogorov probability algebra. The triplet [B, γ, P] denotes the probability
On the other side, the thermodynamical entropy has the physical dimension
where
and
REFERENCES
1. Feller, W. (1968) An Introduction to Probability Theory and Its
Applications. Volume I., John Wiley and Sons, New York
2. Boltzmann, L. (1896) Vorlesungen über Gastheorie. J. A. Barth,
Leipzig.
3. Shannon, C.E. (1948) A Mathematical Theory of Communication.
The Bell System Technical Journal, 27, 53-75. http://dx.doi.
org/10.1002/j.1538-7305.1948.tb00917.x
4. Nyquist, H. (1924) Certain Factors Affecting Telegraph Speed.
Bell System Technical Journal, 3, 324-346. http://dx.doi.
org/10.1002/j.1538-7305.1924.tb01361.x
5. Khinchin, A.I. (1957) Mathematical Foundation of Information Theory.
Dover Publications, New York.
6. Aczcél, J. and Daróczy, Z. (1975) On Measures of Information and
Their Characterization. Academic Press, New York.
7. Merzbacher, E. (1967) Quantum Physics. 7th Edition, John Wiley and
Sons, New York.
8. Faddejew, D.K. (1957) Der Begriff der Entropie in der
Wahrscheinlichkeitstheorie. In: Arbeiten zur Informationstheorie I.
DVdW, Berlin.
9. Watanabe, S. (1969) Knowing and Guessing. John Wiley and Sons,
New York.
10. Majerník, V. (2001) Elementary Theory of Organization. Palacký
University Press, Olomouc.
11. Haken, H. (1983) Advanced Synergetics. Springer-Verlag, Berlin.
12. Ke-Hsuch, L. (2000) Physics of Open Systems. Physics Reports, 165,
1-101.
13. Jaynes, E.T. (1957) Information Theory and Statistical Mechanics.
Physical Review, 106, 620-630. http://dx.doi.org/10.1103/
PhysRev.106.620
14. Jaynes, E.T. (1967) Foundations of Probability Theory and Statistical
Mechanics. In: Bunge, M., Ed., Delavare Seminar in the Foundation
of Physics, Springer, New York. http://dx.doi.org/10.1007/978-3-642-
86102-4_6
15. Ang, A.H. and Tang, W.H. (2004) Probability Concepts in Engineering.
Planning, 1, 3-5.
Chapter 15
SHANNON ENTROPY: AXIOMATIC CHARACTERIZATION AND APPLICATION

Department of Mathematics, Heritage Institute of Technology, Chowbaga Road, Anandapur, India
INTRODUCTION
Shannon entropy is the key concept of information theory [12]. It has found
wide applications in different fields of science and technology [3, 4, 5, 7]. It is
a characteristic of a probability distribution, providing a measure of the uncertainty
associated with that distribution. There are different approaches
to the derivation of Shannon entropy based on different postulates or axioms
[1, 8].
The object of the present paper is to stress the importance of the properties of
additivity and concavity in the determination of the functional form of Shannon
entropy and its generalization. The main content of the paper is divided into
three sections. In Section 2, we have provided an axiomatic derivation of
Shannon entropy on the basis of the properties of additivity and concavity
of the entropy function. In Section 3, we have generalized Shannon entropy
and introduced the notion of total entropy to take account of observational
uncertainty. The entropy of a continuous distribution, called the differential
entropy, has been obtained as a limiting value. In Section 4, the differential
entropy along with the quantum uncertainty relation has been used to derive
the expression of classical entropy in statistical mechanics.
(2.1)
In other words, P may be considered as a random experiment having
n possible outcomes with probabilities (p1, p2,..., pn). There is uncertainty
associated with the probability distribution P and there are different measures
of uncertainty depending on different postulates or conditions. In general,
the uncertainty associated with the random experiment P is a mapping [9]
H : ∆n → ℝ, (2.2)
where ℝ is the set of real numbers. It can be shown that (2.2) is a reasonable
measure of uncertainty if and only if it is Schur-concave on ∆n [9]. A general
class of uncertainty measures is given by
H(P) = Σi φ(pi), (2.3)
where φ : [0,1] → ℝ is a concave function. By taking different concave
functions defined on [0,1], we get different measures of uncertainty or
entropy. For example, if we take φ(pi) = −pi log pi, we get the Shannon entropy
[12]
H(P) = −k Σi pi log pi, (2.4)
where 0log0 = 0 by convention and k is a constant depending on the unit of
measurement of entropy. There are different axiomatic characterizations of
Shannon entropy based on different set of axioms [1, 8]. In the following,
we will present a different approach depending on the concavity character of
entropy function. We set the following axiom to be satisfied by the entropy
function H(P) = H(p1, p2,..., pn).
Axiom 1. We assume that the entropy H(P) is nonnegative, that is, for all
P = (p1, p2, ..., pn), H(P) ≥ 0. This is essential for a measure.
Axiom 2. We assume that generalized form of entropy function (2.3) is
(2.5)
Axiom 3. We assume that the function φ is a continuous concave function
of its arguments.
Axiom 4. We assume the additivity of entropy, that is, for any two
statistically independent experiments P = (p1, p2,..., pn) and Q = (q1,q2,...,qm),
(2.6)
Then we have the following theorem.
Theorem 2.1. If the entropy function H(P) satisfies Axioms 1 to 4, then
H(P) is given by
H(P) = −k Σi pi log pi, (2.7)
where k is a positive constant depending on the unit of measurement of
entropy.
(2.8)
Then according to the axiom of additivity of entropy (2.6), we have
(2.9)
Let us now make small changes of the probabilities pk and pj of the
probability distribution P = (p1, p2,..., pj,...,pk,..., pn) leaving others undisturbed
and keeping the normalization condition fixed. By the axiom of continuity
of φ, the relation (2.9) can be reduced to the form
(2.10)
The right-hand side of (2.10) is independent of qα and the relation (2.10)
is satisfied independently of p’s if
(2.11)
The above leads to the Cauchy functional equation
(2.12)
The solution of the functional equation (2.12) is given by
(2.13)
or
(2.14)
where A, B, and C are all constants. The condition of concavity (Axiom 3)
requires A < 0 and let us take A = −k where k(> 0) is positive constant by
Axiom 1. The generalized entropy (2.5) then reduces to the form
(2.15)
or
(2.16)
where constants (B − A) and C have been omitted without changing the
character of the entropy function. This proves the theorem.
(3.1)
Let us now generalize the above definition to take account of an
additional uncertainty due to the observer himself, irrespective of the
definition of random experiment. Let X denote a discrete random variable
which takes the values x1,x2,...,xn with probabilities p1, p2,..., pn. We
decompose the practical observation of X into two stages. First, we assume
that X ∈ L(xi) with probability pi, where L(xi) denotes the ith interval of the
set {L(x1),L(x2),...,L(xn)} of intervals indexed by xi. The Shannon entropy
of this experiment is H(X). Second, given that X is known to be in the ith
interval, we determine its exact position in L(xi) and we assume that the
entropy of this experiment is U(xi). Then the global entropy associated with
the random variable X is given by
HT(X) = H(X) + Σi pi U(xi) (3.2)
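The two-stage decomposition just described can be sketched in code. The snippet below is a minimal illustration, assuming the additive form of (3.2), HT(X) = H(X) + Σi pi U(xi), with U(xi) the entropy of locating X exactly within the ith interval; the numerical values are made up.

import math

def shannon(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def total_entropy(p, local_entropies):
    # Assumed form of (3.2): H_T(X) = H(X) + sum_i p_i * U(x_i).
    return shannon(p) + sum(pi * u for pi, u in zip(p, local_entropies))

p = [0.5, 0.3, 0.2]   # probability that X falls in the interval L(x_i)
U = [0.9, 1.4, 0.4]   # entropy of pinning X down inside each interval (illustrative values)
print(total_entropy(p, U))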
Let hi denote the length of the ith interval L(xi), (i = 1,2,...,n), and define
(3.3)
We have then
(3.4)
(3.6)
Integrating (3.6) with respect to pj and using the boundary condition
(3.5), we have
(3.7)
so that the generalized entropy (2.3) reduces to the form
(3.8)
where we have taken A = −k < 0 for the same unit of measurement of entropy
and the negative sign to take account of Axiom 1. The constants appearing in
(3.8) have been neglected without any loss of characteristic properties. The
expression (3.8) is the required expression of total entropy obtained earlier.
Let us now see how to obtain the entropy of a continuous probability
distribution as a limiting value of the total entropy HT(X) defined above. For
this let us first define the differential entropy HC(X) of a continuous random
variable X.
Definition 3.3. The differential entropy HC(X) of a continuous random
variable with probability density f (x) is defined by [2]
HC(X) = −∫R f(x) log f(x) dx (3.9)
where R is the support set of the random variable X. We divide the range
of X into bins of length (or width) h. Let us assume that the density f(x)
is continuous within the bins. Then, by the mean-value theorem, there exists a
value xi within each bin such that
f(xi)h = ∫ f(x) dx, the integral being taken over the ith bin (3.10)
We define the quantized or discrete probability distribution (p1, p2,..., pn)
by
pi = f(xi)h (3.11)
so that we have then
Σi pi = Σi f(xi)h = ∫R f(x) dx = 1 (3.12)
The total entropy HT(X) defined for hi = h (i = 1,2,...,n),
(3.13)
then reduces to the form
(3.14)
Let h → 0; then, by the definition of the Riemann integral, we have HT(X) →
HC(X) as h → 0, that is,
limh→0 HT(X) = −∫R f(x) log f(x) dx = HC(X) (3.15)
Thus we have the following theorem.
Theorem 3.4. The total entropy HT(X) defined by (3.13) approaches to
the differential entropy HC(X) in the limiting case when the length of each
bin tends to zero.
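Theorem 3.4 can be checked numerically. The sketch below is illustrative only; it assumes that, for equal bin widths hi = h, the total entropy of (3.13) takes the form HT = −Σi pi log pi + log h. It quantizes a standard exponential density, whose differential entropy is 1 nat, and shows HT approaching that value as h shrinks.

import math

def total_entropy_exponential(h, x_max=40.0):
    # H_T = -sum_i p_i log p_i + log h for an Exp(1) density quantized into bins of width h.
    n_bins = int(x_max / h)
    H = 0.0
    for i in range(n_bins):
        p = math.exp(-i * h) - math.exp(-(i + 1) * h)  # probability mass of the i-th bin
        if p > 0:
            H -= p * math.log(p)
    return H + math.log(h)

for h in [1.0, 0.1, 0.01, 0.001]:
    print(h, total_entropy_exponential(h))  # tends to H_C(X) = 1 nat as h -> 0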
−Σi pi log pi ≈ −∫R f(x) log f(x) dx − log h (4.1)
showing that when h → 0, that is, when the length of the bins h is very small,
the quantized entropy given by the left-hand side of (4.1) approaches not
the differential entropy HC(X) defined in (3.9) but the form given by the
right-hand side of (4.1), which we call the modified differential entropy. This
relation has important physical significance in statistical mechanics. As an
application of this relation, we now find the expression of classical entropy
as a limiting case of quantized entropy.
Let us consider an isolated system with configuration space volume V
and a fixed number of particles N, which is constrained to the energy shell
R = (E,E + ∆E). We consider the energy shell rather than just the energy
surface because the Heisenberg uncertainty principle tells us that we can
never determine the energy E exactly; we can, however, make ΔE as small as we like.
Let f(XN) be the probability density of microstates defined on the phase
space Γ = {XN = (q1,q2,...,q2N ; p1, p2,..., p2N)}. The normalization condition is
(4.2)
where
(4.3)
Following (4.1), we define the entropy of the system as
(4.4)
The constant CN appearing in (4.4) is to be determined later on.
(4.5)
(4.6)
The constant CN has the same unit as Ω(E,V,N) and cannot be determined
(4.7)
The classical entropy that follows as a limiting case of von Neumann
entropy is given by [14]
(4.8)
This is, however, different from the one given by (4.7) and it does not
lead to the form of Boltzmann entropy (4.6).
CONCLUSION
The literature on the axiomatic derivation of Shannon entropy is vast [1,
8]. The present approach is, however, different. This is based mainly on
the postulates of additivity and concavity of the entropy function. These are, in
fact, variant forms of the additivity and nondecreasing character of entropy in
thermodynamics. The concept of additivity is dormant in many axiomatic
derivations of Shannon entropy. It plays a vital role in the foundation of
Shannon information theory [15]. Nonadditive entropies like Rényi entropy
and Tsallis entropy need a different formulation and lead to different
physical phenomena [11, 13]. In the present paper, we have also provided
a new axiomatic derivation of Shannon total entropy which in the limiting
case reduces to the expression of modified differential entropy (4.1). The
modified differential entropy together with quantum uncertainty relation
provides a mathematically strong approach to the derivation of the expression
of classical entropy.
REFERENCES
1. J. Aczél and Z. Daróczy, On Measures of Information and Their
Characterizations, Mathematics in Science and Engineering, vol. 115,
Academic Press, New York, 1975.
2. T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley
Series in Telecommunications, John Wiley & Sons, New York, 1991.
3. E. T. Jaynes, Information theory and statistical mechanics, Phys. Rev.
(2) 106 (1957), 620–630.
4. G. Jumarie, Relative Information. Theories and Applications, Springer
Series in Synergetics, vol. 47, Springer, Berlin, 1990.
5. J. N. Kapur, Measures of Information and Their Applications, John
Wiley & Sons, New York, 1994.
6. L. D. Landau and E. M. Lifshitz, Statistical Physics, Pergamon Press,
Oxford, 1969.
7. V. Majernik, Elementary Theory of Organization, Palacky University
Press, Olomouc, 2001.
8. A. Mathai and R. N. Rathie, Information Theory and Statistics, Wiley
Eastern, New Delhi, 1974.
9. D. Morales, L. Pardo, and I. Vajda, Uncertainty of discrete stochastic
systems: general theory and statistical inference, IEEE Trans. Syst.,
Man, Cybern. A 26 (1996), no. 6, 681–697.
10. L. E. Reichl, A Modern Course in Statistical Physics, University of
Texas Press, Texas, 1980.
11. A. Rényi, Probability Theory, North-Holland Publishing, Amsterdam,
1970.
12. C. E. Shannon and W. Weaver, The Mathematical Theory of
Communication, The University of Illinois Press, Illinois, 1949.
13. C. Tsallis, Possible generalization of Boltzmann-Gibbs statistics, J.
Statist. Phys. 52 (1988), no. 1-2, 479–487.
14. A. Wehrl, On the relation between classical and quantum-mechanical
entropy, Rep. Math. Phys. 16 (1979), no. 3, 353–358.
15. T. Yamano, A possible extension of Shannon’s information theory,
Entropy 3 (2001), no. 4, 280– 292.
Chapter 16
SHANNON ENTROPY IN DISTRIBUTED SCIENTIFIC CALCULATIONS ON MOBILE AD-HOC NETWORKS (MANETS)
ABSTRACT
This paper addresses the problem of giving a formal metric to estimate
uncertainty at the moment of starting a distributed scientific calculation on
clients working over mobile ad-hoc networks (MANETs). Measuring the
uncertainty related to the successful completion of a distributed computation
on the aforementioned network infrastructure is based on the Dempster-
Shafer Theory of Evidence (DST). Shannon Entropy will be the formal
mechanism by which the conflict in the scenarios proposed in this paper
will be estimated. This paper will begin with a description of the procedure
INTRODUCTION
Mobile computing has been established as the de facto standard for Web
access, owing to users preferring it to other connection alternatives.
Mobile ad-hoc networks, or MANETs, are currently the focus of attention
in mobile computing, as they are the most flexible and adaptable network
technology in existence today [1]. These qualities are particularly desirable
in the development of applications meant for this kind of infrastructure—a
number of American government projects, such as the military investment
in resources for the development of this technology, bear witness to this fact.
As previously mentioned, ad-hoc mobile networks are the most flexible
and adaptable communication architecture currently in existence. These
wireless networks are comprised of interconnected autonomous nodes.
They are self-organized and self-generated, which eliminates the need for a
centralized infrastructure.
The use of this type of network as a new alternative for the implementation
of distributed computing systems is closely related to the capability to begin
the calculation, assign its parts, and collect the results once computation is finished.
Due to the intrinsic nature of this kind of network, there is no certainty that
all the stages involved in this kind of calculation can be completed, which
makes estimating the uncertainty in these scenarios a vital capability.
(1)
(2)
time intervals can be considered to be randomly distributed, whereby the
previously presented function turns into a stochastic process. Consequently,
in dynamical environments, Connectivity Probability is defined as follows:
(3)
where E[·] denotes the expected value, as long as it exists. It can be seen that P+ is
time-dependent: P+ = P+(t). For stationary stochastic processes P+ = const. If
the stationary process is ergodic, then (3) can be substituted by:
(4)
This equality is equivalent to:
(5)
where the mes(·) function is used to measure the total length of
the interval. The problem of whether the network is connected is
thus reduced to determining the existence and estimation of the expected
value (3), and if the mobility model is stationary and ergodic, (5) can be used
to estimate connectivity [2].
(6)
where Π is the phase space, x is a set of coordinates in Π (usually position and
speed), and the dot denotes differentiation with respect to time. Let n be the number
of nodes and let x1, x2, ..., xn be their phase coordinates; then these coordinates satisfy
the following differential equation:
(7)
(8)
where
(9)
is the measure of the entire phase space [2].
Let f be the characteristic function of a measurable domain D:
(10)
Since f is bounded and D is measurable, f(x(t)) is integrable. In this
case, the left side of (12) is equivalent to the time interval 0 ≤ t ≤ T during which
x(t) resides in the domain D.
Thus the Connectivity Probability of an ad-hoc mobile network will be
equivalent to the right side of (12):
(11)
This approximation can be interpreted in terms of the theory of stochastic
processes in the phase space Π. The probability of finding the system in a measurable
domain D is determined by formula (11). Let f(x) be the characteristic function
of domain D and x(t) the solution of system (6). Thus the function f(x(t))
can be interpreted as a stochastic process. Let E[f(t)] be the expected value
of the function f(t) at time t. If the right side of Equation (6) is not time-
dependent, then the stochastic process is stationary. In particular, this means
that E[f(t)] does not depend on t. If the system is also ergodic, the expected
value can be calculated using formula (11):
(12)
Therefore, the problem of calculating expected value (3) is reduced to a
geometric problem in which we must determine the volume of the domains
in a phase space if the process is ergodic [2].
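The time-average estimate suggested by the ergodic argument can be sketched as a simple simulation. The Python snippet below is illustrative only and is not the mobility model used in [2]: nodes perform a bounded random walk inside a square region, and the connectivity probability is estimated as the fraction of time steps in which the graph of nodes within radio range r is connected.

import random

def connected(positions, r):
    # Depth-first search over the graph whose edges join nodes closer than r.
    n = len(positions)
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        xi, yi = positions[i]
        for j in range(n):
            if j not in seen:
                xj, yj = positions[j]
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= r * r:
                    seen.add(j)
                    stack.append(j)
    return len(seen) == n

def connectivity_probability(n_nodes=10, side=100.0, r=30.0, steps=5000, step_size=2.0):
    # Fraction of time the ad-hoc network is connected (time average, assuming ergodicity).
    pos = [(random.uniform(0, side), random.uniform(0, side)) for _ in range(n_nodes)]
    connected_steps = 0
    for _ in range(steps):
        pos = [(min(max(x + random.uniform(-step_size, step_size), 0.0), side),
                min(max(y + random.uniform(-step_size, step_size), 0.0), side))
               for x, y in pos]
        if connected(pos, r):
            connected_steps += 1
    return connected_steps / steps

print(connectivity_probability())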
(13)
Shannon Entropy will be the formal mechanism by which the conflict
will be estimated in this document. This measure of uncertainty stems from a
probability distribution obtained from observing the results of an experiment
or any other research mechanism. Probability distribution p has the form p =
〈p(x) ∣ x ∈ X〉, where X is the domain of discourse. Additionally, a function
of the incidence probability, called anticipatory uncertainty, is defined; it must be a
continuous, monotonically decreasing mapping, and it must be additive and normalized.
This yields that the anticipatory uncertainty of a result x is −log2 p(x).
Thus, Shannon Entropy, which provides the expected value of the
anticipatory uncertainties for each element of the domain of discourse [3],
takes the following form:
S(p) = −Σx∈X p(x) log2 p(x) (14)
The normalized version of (14) takes the following form:
(15)
and is the one used to calculate uncertainty in the simulations performed.
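For the two-outcome case of interest here, the computation either can or cannot be completed, with probabilities p and 1 − p, so the entropy reduces to the binary form, whose maximum (at p = 0.5) is 1 bit and which is therefore already normalized. A minimal Python sketch, under that assumption:

import math

def normalized_entropy(p):
    # Binary Shannon entropy S(p) = -p log2 p - (1 - p) log2(1 - p); its maximum is 1 bit.
    if p in (0.0, 1.0):
        return 0.0  # total certainty about the outcome
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in [0.0, 0.1, 0.5, 0.9, 1.0]:
    print(p, normalized_entropy(p))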
SIMULATION
In the following, we present an adjustment to the previously obtained theoretical
results, in order to reach a simulation method that is consistent with them. A
description of the scenarios posed and the results obtained will follow.
S(p) = −p log2 p − (1 − p) log2(1 − p) (16)
As shown, S(p) is zero only when p reaches the value zero or one; otherwise,
there is always some uncertainty about the result. Another interesting fact is
that the formula of Shannon Entropy which results from this is normalized,
since:
(17)
The results of the simulation scenarios are shown in Figure 1.
(18)
(19)
(20)
(21)
(22)
(23)
and the extreme values of its possible results are: 0 (no relationship) and ±1
(maximal relationship) [7]. The variables analyzed by means of this method,
p and s, must fulfill certain requirements. The two that are the most relevant
to this study are as follows:
• Variables p and s must be continuous.
• The relationship between p and s must be linear.
Of the two previous conditions, the most relevant to our observations,
or the most restrictive of them, as the reader prefers, is the second, since it would require
the relationship between Connectivity Probability and Shannon Entropy
to be linear, which is not the case. Therefore, it is evident that Pearson’s
Coefficient cannot be used as proposed, since the relationship under analysis
is curvilinear. For this reason, this behavior will be analyzed by dividing
the variable p into two segments, which will result in two study groups: the
first, which we will call g1, where p ∈ (0, 0.5), and the second, g2, where p ∈ [0.5, 1).
Correlation will therefore be calculated separately for each group.
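A hedged sketch of the group-wise analysis follows; the data points are synthetic, generated from the binary entropy curve purely for illustration. The observations (p, s) are split at p = 0.5 and Pearson's coefficient is computed separately in each group.

import math

def pearson(x, y):
    # Sample Pearson correlation coefficient of two equally long sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)

def s(p):
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

pairs = [(k / 20, s(k / 20)) for k in range(1, 20)]        # synthetic (p, s) observations
g1 = [(p, sp) for p, sp in pairs if p < 0.5]               # group g1: p in (0, 0.5)
g2 = [(p, sp) for p, sp in pairs if p >= 0.5]              # group g2: p in [0.5, 1)

print(pearson([p for p, _ in g1], [sp for _, sp in g1]))   # positive: direct dependence
print(pearson([p for p, _ in g2], [sp for _, sp in g2]))   # negative: inverse dependence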
The results obtained and detailed in Table 1, show that:
• g1 exhibits direct (positive) dependence between p and s, i.e., for
large values of p there will be large values of s.
• g2 shows that the relationship between p and s is inverse (negative)
dependence, i.e., for large values of p the values of s will be small.
Based on these results, we can conclude the following for MANETs that
operate on surfaces of:
CONCLUSION
The development of this work duly evidenced and documented that the
uncertainty existing at the beginning of a distributed computation on a
MANET will depend directly on the number of nodes participating in it and
on the surface involved. This statement is based on the results obtained from
the simulations detailed in this document, which allowed us to conclude
that uncertainty begins to decrease once node density has reached a certain
threshold, and that this threshold takes different values for different surfaces.
Works oriented towards correctly identifying the amount of uncertainty
existing at the time the results of a distributed calculation on ad-hoc mobile
networks are collected bring a potential benefit: they can be used
to develop more intelligent workload distribution strategies that take into
account the amount of uncertainty they will have to deal with, which will
necessarily result in more efficient computations. In this sense, and based
on recent studies oriented towards providing more reliable mechanisms for
the conservation of power in the devices that comprise a MANET [8], as well as
on equally relevant studies focused on achieving the greatest possible cooperation
between the nodes of an ad-hoc mobile network [9], thus mitigating
their intrinsic egotism, the benefits of having an uncertainty measure that can
either indicate that calculation completion is not assured
or ensure its success are twofold. In the first of the aforementioned
two fields of study, preventing workload distribution in situations where
calculation completion is not assured will have a direct repercussion on the
conservation of power in devices, which will result in longer operational
periods than would be possible without the ability to identify the aforementioned scenarios. The second
research field seeks to maximize cooperation among the nodes. With
this in mind, in scenarios where completion certainty is medium or low,
one possible distribution strategy oriented toward collaboration could be
assigning workload only to the most collaborative nodes, to avoid the risk
of assigning load to un-collaborative nodes, which, in the event of result
collection failure, may take a more selfish or conservationist attitude toward
their resources (such as power) and leave the MANET. In a scheme of
mobile distributed calculation where all participants offer their collaboration
to find the answer to a problem of common interest, such as the SETI@Home
program [10], measuring uncertainty can be used as a function to
grant credit to collaborators: when a participant is notified that there is
a medium to high level of uncertainty regarding computation success and
they decide to participate nonetheless, more credits can be granted than in
scenarios where total certainty of success exists. If more credits mean more
benefits for the participant in some way, for example, publicity for the most
committed participant in the calculation environment, then we would have
a psychological mechanism of positive reinforcement that would promote
node collaboration, which would yield a network made up of more
collaborative and satisfied participants that are less egotistic.
All potential strategies of distributed computing over MANETs
presented in this document and others that can emerge from an intelligent
use of uncertainty measures will bring with them new types of applications
that will harness the full power of the underlying network infrastructure.
REFERENCES
1. P. Mohapatra and S. Krishnamurthy, “Ad Hoc Networks Technologies
and Protocols,” 2004.
2. T. K. Madsen, F. H. P. Fitzek, R. Prasad and G. Schulte, “Connectivity
Probability of Wireless Ad Hoc Networks: Definition, Evaluation,
Comparison,” Vol. 35, 2005, ISSN: 0929-6212, pp. 135-151.
3. C. S. Duque, “Medidas Para Estimar la Incertidumbre Basada en
Información Deficiente,” 2012.
4. F. Bai and A. Helmy, “A Survey of Mobility Models in Wireless Adhoc
Networks,” 2008.
5. C. Bettstetter, H. Hartenstein and X. Perez-Costa, “Stochastic
Properties of the Random Waypoint Mobility Model in ACM/Kluwer
Wireless Networks,” Special Issue on Modeling and Analysis of
Mobile Networks, Vol. 10, No. 5, 2004.
6. F. Xu, “Correlation-Aware Statistical Methods for Sampling-Based
Group by Estimates,” Doctoral Thesis, University of Florida, 2009.
7. P. M. Vallejo, Correlación y Covarianza. Revision, 30 de Octubre de
2007.
8. S. Prakash, J. P. Saini and S. C. Gupta, “A Vision of Energy Management
Schemes in Wireless Ad Hoc Networks,” ISSN: 2278-7844, 2012.
9. A. E. Hilal and A. B. Mackenzie, “Mitigating the Effect of Mobility
on Cooperation in Wireless Ad Hoc Networks,” 8th IEEE International
Conference on Wireless and Mobile Computing, Networking and
Communications (WiMob), 2012.
10. SETI@home. http://setiathome.berkeley.edu/
Chapter 17
ADVANCING SHANNON ENTROPY FOR MEASURING DIVERSITY IN SYSTEMS
Department of Sociology, Kent State University, 3300 Lake Rd. West, Ashtabula, OH, USA
School of Social and Health Sciences, Abertay University, Dundee DD1 1HG, UK
ABSTRACT
From economic inequality and species diversity to power laws and the
analysis of multiple trends and trajectories, diversity within systems
is a major issue for science. Part of the challenge is measuring it.
Shannon entropy H has been used to rethink diversity within probability
distributions, based on the notion of information. However, there are
INTRODUCTION
Statistical distributions play an important role in any branch of science that
studies systems comprised of many similar or identical particles, objects,
or actors, whether material or immaterial, human or nonhuman. One of
the key features that determines the characteristics and range of potential
behaviors of such systems is the degree and distribution of diversity, that is,
the extent to which the components of the system occupy states with similar
or different features.
As Page outlined in a series of inquiries [1, 2], including The Difference
and Diversity and Complexity, diversity within systems is an important
concern for science, be it making sense of economic inequality, expanding
the trade portfolio of countries, measuring the collapse of species diversity
in various ecosystems, or determining the optimal utility/robustness of
a network. However, an important major challenge in the literature on
diversity and complexity, which Page also points out [1, 2], remains: the
issue of measurement. Although statistical distributions that directly reflect
the spread of key parameters (such as mass, age, wealth, or energy) provide
descriptions of this diversity, it can be difficult to compare the diversity
of different distributions or even the same distribution under different
conditions, mostly because of differences in scales and parameters. Also,
(1)
The problem, however, with the Shannon entropy index 𝐻, as we identified
in our abstract and Introduction, is that while being useful for studying the
diversity of a single system, it cannot be used to compare the diversity
across probability distributions. In other words, 𝐻 is not multiplicative: a
doubling of the value of 𝐻 does not mean that the actual diversity has doubled.
To address this problem, we turned to the true diversity measure 𝐷 [3, 11,
12], which gives the range of equiprobable values of 𝑥 that gives the same
value of 𝐻:
D = exp(H) (2)
The utility of 𝐷 for comparing the diversity across probability
distributions is that, in 𝐷, a doubling of the value means that the number
of equiprobable ranges of values of 𝑥 has doubled as well. 𝐷 calculates
the range of such equiprobable values of 𝑥 that will give the same value of
Shannon entropy 𝐻 as observed in the distribution of 𝑥. We say that two
(3)
The value of 𝐷𝑐 for a given value of cumulative probability 𝑐 is the
number of Shannon-equivalent equiprobable energy states (or of values
of the variable in the 𝑥-axis in general) that are required to explain the
information up to a cumulative probability of 𝑐 within the distribution. If
𝑐 = 1, then 𝐷𝑐 = 𝐷 is the number of such Shannon-equivalent equiprobable
energy states for the entire distribution itself.
We can then simply calculate the fractional diversity contribution or
case-based entropy as
Cc = Dc / D (4)
It is at this point that the renormalization (𝐶𝑐 as a function of 𝑐) becomes
scale independent as both axes range between values of 0 and 1 with the
graph of 𝐶𝑐 versus 𝑐 passing through (0, 0) and (1, 1). Hence, irrespective
of the range and scale of the original distributions, all distributions can be
plotted on the same graph and their diversity contributions can be compared
in a scale-free manner.
To check the validity of our formalism, we calculate 𝐷𝑐 for the simple
case of a uniform distribution given by u(x) = χ[0,L](x) on the interval 𝑋 = [0,
𝐿]. Intuitively, if we choose 𝑋𝑐 = [0, 𝑐], then, owing to the uniformity of the
distribution, we expect 𝐷𝑐 = 𝑐 itself. In other words, the diversity of the part
[0, 𝑐] is simply equal to 𝑐, that is, the length of the interval [0, 𝑐], and hence
the 𝐶𝑐 versus 𝑐 curve will simply be the straight line with slope equal to 1.
This can be shown as follows:
(5)
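The construction can be reproduced numerically for a discrete (binned) distribution. The Python sketch below is illustrative and assumes the definitions used above: Dc = exp(Hc), where Hc is the Shannon entropy of the distribution renormalized up to cumulative probability c, and Cc = Dc/D as in (4). The first call checks the uniform case, for which the (c, Cc) points lie on the diagonal.

import math

def shannon(p):
    # Shannon entropy (natural logarithm) of a discrete distribution.
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def case_based_entropy_curve(p):
    # Return (c, C_c) pairs, assuming D_c = exp(H_c) of the renormalized head and C_c = D_c / D.
    D = math.exp(shannon(p))
    curve, c = [], 0.0
    for k in range(1, len(p) + 1):
        c += p[k - 1]
        head = [pi / c for pi in p[:k]]  # conditional distribution up to cumulative probability c
        Dc = math.exp(shannon(head))     # Shannon-equivalent number of equiprobable states
        curve.append((round(c, 3), round(Dc / D, 3)))
    return curve

print(case_based_entropy_curve([0.25, 0.25, 0.25, 0.25]))  # uniform: C_c = c (the diagonal)
print(case_based_entropy_curve([0.6, 0.2, 0.1, 0.1]))      # skewed: the curve departs from the diagonal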
With our formulation of 𝐶𝑐 complete, we turn to the energy distributions
for particles governed by Boltzmann, Bose-Einstein, and Fermi-Dirac
statistics.
RESULTS
pB,1D(E) = β exp(−βE) (6)
where 𝑘𝐵 is the Boltzmann constant and 𝛽 = (1/𝑘𝐵𝑇).
The entropy of 𝑝𝐵,1𝐷(𝐸) can be shown to be 𝐻𝐵 = 1− ln(𝛽), and hence the
true diversity of energy in the range [0,∞) is given by
370 Information and Coding Theory in Computer Science
D = exp(HB) = exp(1 − ln β) = e/β (7)
The cumulative probability 𝑐 from 𝐸 = 0 to 𝐸 = 𝑘 is then given by
c = ∫[0,k] β exp(−βE) dE = 1 − exp(−βk) (8)
Hence, 𝑘 can be computed in terms of 𝑐 as
k = −(1/β) ln(1 − c) (9)
Equation (9) is useful for the one-dimensional Boltzmann case to
eliminate the parameter 𝑘 altogether in (11) to obtain an explicit relationship
between 𝐶𝑐 and 𝑐. It is to be noted that, in most cases, both 𝐶𝑐 and 𝑐 can
only be parametrically related through 𝑘. The other quantities introduced in
Section 3 can then be calculated as follows:
(10)
(11)
(12)
(13)
We note that, in (13), the temperature factor 𝛽 cancels out, indicating that
the distribution of diversity for an ideal gas in one dimension is independent of temperature.
(14)
where the additional factor of √E accounts for the density of states.
The cumulative probability 𝑐 from 𝐸 = 0 to 𝐸 = 𝑘 can be computed as
follows:
(15)
As we would hope, (15) has the property that as 𝑘 → ∞, the cumulative
probability 𝑐 → 1.
However, it is difficult to solve (15) for 𝑘 directly in terms of 𝑐. We
therefore compute 𝐶𝑐 in parametric form with 𝑘 being the parameter. Also,
analytical forms are not possible, so Matlab was used to compute 𝐻𝑐, 𝐷𝑐, and
𝐶𝑐, respectively:
(16)
Thus, 𝐶𝑐 can also only be computed in parametric form with parameter
𝑘 that varies from 0 to ∞. Figure 2 shows the 𝐶𝑐 curve thus calculated for the
Boltzmann distribution in three dimensions.
Hence, the current result will still hold true for gas molecules with higher
degrees of freedom; that is, the distribution of diversity is always exactly the
same for an ideal gas, whether monoatomic or polyatomic.
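The temperature independence claimed above can be probed numerically. The sketch below is illustrative only: it relies on the same assumed definitions, Dc = exp(Hc) of the renormalized distribution and Cc = Dc/D, and it replaces the analytical integrals with a fine binning of the one-dimensional density β exp(−βE). The bin width cancels in the ratio Cc = Dc/D, and the Cc values at matching cumulative probabilities come out the same at both temperatures.

import math

def shannon(p):
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cc_boltzmann_1d(beta, targets=(0.2, 0.5, 0.8), n_bins=20000, e_max_factor=30.0):
    # C_c at selected cumulative probabilities for p(E) = beta * exp(-beta * E), by fine binning.
    e_max = e_max_factor / beta
    dE = e_max / n_bins
    p = [beta * math.exp(-beta * (i + 0.5) * dE) * dE for i in range(n_bins)]
    total = sum(p)
    p = [pi / total for pi in p]  # renormalize the truncated tail
    D = math.exp(shannon(p))
    out, c, k = {}, 0.0, 0
    for t in targets:
        while c < t and k < n_bins:
            c += p[k]
            k += 1
        head = [pi / c for pi in p[:k]]
        out[t] = round(math.exp(shannon(head)) / D, 4)
    return out

print(cc_boltzmann_1d(beta=1.0))  # one temperature
print(cc_boltzmann_1d(beta=5.0))  # a different temperature: the C_c values coincide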
(17)
where 𝐶 is a normalization constant and
(18)
where 𝜁 is the Riemann zeta function. In the following calculations, we use
the Bose temperature for helium, 𝑇𝐵 = 3.14 K.
For massless bosons such as photons, the energy probability density
function is [13]
(19)
It is important to note that the “density of states” factors shown in (17)
and (19) result in different energy distributions, despite the two types of
boson obeying the same statistics.
The conditional probabilities, conditional entropies, true diversities, and
case-based entropies for these distributions cannot be calculated analytically
but can be calculated numerically. The results of such calculations, using the
software Matlab, are shown in Figure 3.
Figure 3. 𝐶𝑐 versus 𝑐 for Helium-4 and for photons. Note: the results of calculations carried out at 𝑇 = 50 K, 500 K, and 5000 K are overlaid.
As with the Boltzmann distributions, we find that the distributions
of diversity for the two boson systems are independent of temperature.
Although the curves for the two types of boson are very similar, it is evident
that the distributions of diversity do differ to some extent. For helium-4
bosons, a slightly larger fraction of particles are contained in lower diversity
energy states than is the case for photons, with 60% of atoms contained
in approximately the lowest-diversity 37% of states, as compared to
approximately 42% for photons. In other words, using 𝐶𝑐, we are able to
identify, even in such instances where intuition might suggest it to be true,
common patterns within and across these different energy systems, as
well as their variations. With this point made, we move to our final energy
distribution.
(20)
CONCLUSION
As we have hopefully shown in this paper, while Shannon entropy 𝐻 has
been used to rethink probability distributions in terms of diversity, it suffers
from two major limitations. First, it cannot be used to compare distributions
that have different levels of scale. Second, it cannot be used to compare
parts of distributions to the whole.
To address these limitations, we introduced a renormalization of
probability distributions based on the notion of case-based entropy 𝐶𝑐 (as
a function of the cumulative probability 𝑐). We began with an explanation
of why we rethink probability distributions in terms of diversity, based on
a Shannon-equivalent uniform distribution, which comes from the work of
Jost and others on the notion of true diversity in ecology and evolutionary
biology [4, 9, 10]. With this approach established, we then reviewed our
construction of case-based entropy 𝐶𝑐. Given a probability density f(x), 𝐶𝑐
measures the diversity of the distribution up to a cumulative probability of c.
ACKNOWLEDGMENTS
The authors would like to thank the following colleagues at Kent State
University: (1) Dean Susan Stocker, (2) Kevin Acierno and Michael Ball
(Computer Services), and (3) the Complexity in Health and Infrastructure
Group for their support. They also wish to thank Emma Uprichard and
David Byrne and the ESRC Seminar Series on Complexity and Method in
the Social Sciences (Centre for Interdisciplinary Methodologies, University
of Warwick, UK) for the chance to work through the initial framing of these
ideas.
REFERENCES
1. S. E. Page, The Difference: How the Power of Diversity Creates Better
Groups, Firms, Schools, and Societies, Princeton University Press,
2008.
2. S. E. Page, Diversity and Complexity, Princeton University Press,
2008.
3. M. O. Hill, “Diversity and evenness: a unifying notation and its
consequences,” Ecology, vol. 54, no. 2, pp. 427–432, 1973.
4. L. Jost, “Entropy and diversity,” Oikos, vol. 113, no. 2, pp. 363–375,
2006.
5. R. Rajaram and B. Castellani, “An entropy based measure for comparing
distributions of complexity,” Physica A. Statistical Mechanics and Its
Applications, vol. 453, pp. 35–43, 2016.
6. B. Castellani and R. Rajaram, “Past the power law: complex systems
and the limiting law of restricted diversity,” Complexity, vol. 21, no. 2,
pp. 99–112, 2016.
7. M. E. J. Newman, “Power laws, Pareto distributions and Zipf’s law,”
Contemporary Physics, vol. 46, no. 5, pp. 323–351, 2005.
8. M. C. Mackey, Time’s Arrow: The Origins of Thermodynamic Behavior,
Springer Verlag, Germany, 1992.
9. T. Leinster and C. A. Cobbold, “Measuring diversity: the importance
of species similarity,” Ecology, vol. 93, no. 3, pp. 477–489, 2012.
10. J. Beck and W. Schwanghart, “Comparing measures of species
diversity from incomplete inventories: an update,” Methods in Ecology
and Evolution, vol. 1, no. 1, pp. 38–44, 2010.
11. R. H. MacArthur, “Patterns of species diversity,” Biological Reviews,
vol. 40, pp. 510–533, 1965.
12. R. Peet, “The measurement of species diversity,” Annual Review of
Ecology and Systematics, vol. 5, pp. 285–307, 1974.
13. C. H. Tien and J. H. Lienhard, Statistical Thermodynamics, Hemisphere,
1979.
INDEX
A
acknowledgments (ACK) 192
adaptive modulation and coding (AMC) 192
additive white Gaussian noise (AWGN) 14, 158, 165, 167, 169, 188
Airborne Visible Infrared Imaging Spectrometer (AVIRIS) 236
antenna array elements 149, 150, 151, 158, 159
arithmetic coding 290, 291, 298, 299, 301, 304, 305
ARM (Advanced RISC Machines) 78
Artificial Intelligence 57, 58, 72
ASIC (application-specific integrated circuit) 83
augmented reality (AR) video streams 196
Automatic repeat request (ARQ) 191
autonomous differential equations 348
autonomous nodes 346

B
BAB (block array builder) 86
backward error correction (BEC) 188
base station (BS) 149, 150, 198
Base switching (BS) 273
base-switching transformation (BST) algorithm 297
BBM (Bialynicki-Birula and Mycielski) 37
Beam Pattern Scanning (BPS) 149
Big Data 58
binary erasure channel (BECH) 188
binary symmetrical channel (BSC) 188
bit error rate (BER) 183
bits back with asymmetric numeral systems (BB-ANS) 120
Boltzmann distribution 369, 371, 372, 377, 379
Boltzmann law 315, 316
Bose, Chaudhuri and Hocquenghem (BCH) 212, 217
Breast cancer 270, 286
BWT (Burrows-Wheeler transform) 81