IMAGE QUALITY STATISTICS AND THEIR USE IN STEGANALYSIS AND COMPRESSION

by

İsmail Avcıbaş

B.S. in E.E., Uludağ University, 1992
M.S. in E.E., Uludağ University, 1994

Boğaziçi University
2001
APPROVED BY:

DATE OF APPROVAL:
TABLE OF CONTENTS

ACKNOWLEDGMENTS
ABSTRACT
ÖZET
LIST OF FIGURES
LIST OF TABLES
LIST OF SYMBOLS / ABBREVIATIONS
1. INTRODUCTION
   1.1. Motivation
   1.2. Approaches
   1.3. Contributions
   1.4. Outline
2. STATISTICAL EVALUATION OF IMAGE QUALITY MEASURES
   2.1. Introduction
   2.2. Image Quality Measures
      2.2.1. Measures Based on Pixel Differences
         2.2.1.1. Minkowsky Metrics
         2.2.1.2. MSE in Lab Space
         2.2.1.3. Difference over a Neighborhood
         2.2.1.4. Multiresolution Distance Measure
      2.2.2. Correlation-Based Measures
         2.2.2.1. Image Correlation Measures
         2.2.2.2. Moments of the Angles
      2.2.3. Edge Quality Measures
         2.2.3.1. Pratt Measure
         2.2.3.2. Edge Stability Measure
      2.2.4. Spectral Distance Measures
      2.2.5. Context Measures
         2.2.5.1. Rate-Distortion Based Distortion Measure
         2.2.5.2. f-divergences
         2.2.5.3. Local Histogram Distances
ACKNOWLEDGMENTS
ABSTRACT
We classify image quality measures and statistically evaluate their sensitivity and consistency behavior; the mutual relationships between the measures are visualized via Kohonen maps. Measures that give consistent scores across an image class and that are sensitive to distortions and coding artifacts are pointed out.
We present techniques for steganalysis of images that have been potentially subjected to watermarking or steganographic algorithms. Our hypothesis is that watermarking and steganographic schemes leave statistical evidence that can be exploited for detection with the aid of image quality features and multivariate regression analysis.
The steganalyzer is built using multivariate regression on the selected quality metrics. In the absence of ground truth, a common reference image is obtained by blurring.
Simulation results with the chosen feature set and well-known watermarking and
steganographic techniques indicate that our approach is able to reasonably accurately
distinguish between marked and unmarked images.
We also present a technique that provides progressive transmission and near-lossless
compression in one single framework. The proposed technique produces a bitstream that
results in progressive reconstruction of the image just like what one can obtain with a
reversible wavelet codec. In addition, the proposed scheme provides near-lossless
reconstruction with respect to a given bound after each layer of the successively refinable
bitstream is decoded. Experimental results for both lossless and near-lossless cases are
presented, which are competitive with the state-of-the-art compression schemes.
ÖZET

We classified image quality measures in detail and extended the measures defined for gray-level images to multiband images. We statistically compared these measures for various distortions, including the degradations that arise in image coding and watermarking applications. Based on analyses of variance, measures that respond consistently and sensitively to image coding, watermarking and other distortions were identified.

Methods for detecting the presence of a watermark in watermarked images are presented. Starting from the assumption that a given watermarking method leaves statistical and structural traces in the image, it is shown that these traces can be used to detect the presence of a watermark via the selection of appropriate features and multivariate regression analysis. The relatively high correct-detection rates obtained with well-known watermarking methods over a rich image set demonstrate that the proposed methods are successful.

A new lossless coding method was developed in which the losses are guaranteed to remain within desired bounds, hence a coding method termed near-lossless. Another important property of our method is that near-lossless reconstruction is available after each layer of the bitstream, a functionality that existing coders do not provide. Experimental results show that the performance of the proposed method is as good as that of the best known coders, and even better for some images.
LIST OF FIGURES

Figure 2.1. Box plots of quality measure scores: (a) good measure, (b) moderate measure, (c) poor measure. The F-scores as well as the significance level p are given.
Figure 2.2. SOM of distortion measures for JPEG and SPIHT.
Figure 3.1. Schematic descriptions of (a) watermarking or stegoing, (b) filtering an un-marked image, (c) filtering a marked image.
Figure 3.2. Scatter plots of the three image quality measures (M3: Czekanowski measure, M5: image fidelity, M6: normalized cross-correlation).
Figure 3.3. Schematic description of (a) training and (b) testing.
Figure 4.1. Ordering of the causal prediction neighbors of the current pixel i, N = 6.
Figure 4.2. The context pixels used in the covariance estimation of the current pixel. The number of context pixels is K = 40.
Figure 4.3. Causal and non-causal neighbors of the current pixel used for probability mass estimation in the second and higher passes.
Figure 4.4. Schematic description of the overall compression scheme.
Figure 4.5. Details of the encoder block used in Figure 4.4, with the length of the interval (Lm, Rm].
Figure 4.6. The decoder is a replica of the encoder.
LIST OF TABLES

Table 2.1. List of symbols and equation numbers of the quality metrics.
Table 2.2. ANOVA results (F-scores) for the JPEG and SPIHT compression distortions as well as additive noise and blur artifacts. For each distortion type the variation due to image set is also established.
Table 2.3. One-way ANOVA results for each image class and two-way ANOVA results for the distortions on the combined set and image set independence.
Table 2.4. ANOVA scores for the bit rate variability (combined JPEG and SPIHT scores) and coder variation.
Table 3.1. One-way ANOVA tests for watermarking, steganography and pooled watermarking and steganography.
Table 3.2. Training and test samples for DIGIMARC and PGS for experiment 1.
Table 3.3. Training and test samples for COX for experiment 1.
Table 3.4. Training and test samples for pooled watermarking algorithms for experiment 2 (L1: level 1, etc.).
Table 3.5. Training and test samples for experiment 3: train on DIGIMARC, test on PGS and COX.
Table 3.6. Training and test samples for Stools for experiment 4.
Table 3.7. Training and test samples for Jsteg for experiment 4.
Table 3.8. Training and test samples for Steganos for experiment 4.
Table 3.9. Training and test samples for pooled stego algorithms for experiment 5.
Table 3.10. Training and test samples for experiment 6: train on Steganos and Stools, test on Jsteg.
Table 3.11. Training and test samples for pooled watermarking and steganography algorithms for experiment 7.
Table 3.12. Training and test samples for experiment 8: train on Steganos, Stools and DIGIMARC, test on Jsteg and COX.
Table 3.13. Performance of the steganalyzer for all the experiments.
Table 4.1. Comparison of lossless compression results: proposed method versus CALIC.
LIST OF SYMBOLS / ABBREVIATIONS

C              Multispectral image
Ck             kth band of C
Ck(i, j)       Pixel at position (i, j) of the kth band
C(i, j)        Multispectral pixel vector at position (i, j)
Ckl            lth block of the kth band of C
Cσm(i, j)      Output of the Derivative of Gaussian operator at the mth scale
Cw(i, j)       Pixels in the w x w neighborhood of C(i, j)
D(p ‖ q)       Relative entropy between the p.m.f.s p and q
d(·, ·)        Distance metric
drk            Multiresolution distance in the kth band at resolution r
E(i, j, σm)    Edge map at scale σm
Ep             Expectation with respect to p
Gσm(x, y)      Derivative of Gaussian kernel at the mth scale
H(·)           Entropy function
H0             Null hypothesis
HA             Alternative hypothesis
H(ρ)           Transfer function of the HVS model
(Lm, Rm]       Interval used by the encoder
M(u, v)        Magnitude spectrum
Q(i, j)        Edge stability map
R(p̂)           Average bit rate under the p.m.f. p̂
μ              Mean
σ²             Variance
φ(u, v)        Phase spectrum
Γk(u, v)       DFT of the kth band
Γkl(u, v)      DFT of the lth block of the kth band

ANOVA          ANalysis Of VAriance
C1             Normalized Cross-Correlation
C2             Image Fidelity
C3             Czekanowski Correlation
C4             Mean Angle Similarity
C5             Mean Angle-Magnitude Similarity
CALIC          Context-based Adaptive Lossless Image Codec
CIE            Commission Internationale de l'Eclairage
D1             Mean Square Error
D2             Mean Absolute Error
D3             Modified Infinity Norm
D4             L*a*b* Perceptual Error
D5             Neighborhood Error
D6             Multiresolution Error
DCT            Discrete Cosine Transform
DFT            Discrete Fourier Transform
E1             Pratt Edge Measure
E2             Edge Stability Measure
H1             HVS Absolute Norm
H2             HVS L2 Norm
H3             Browsing Similarity
H4             DCTune
HVS            Human Visual System
i.i.d.         Independent, identically distributed
IQM            Image Quality Measure
ITU            International Telecommunication Union
JND            Just Noticeable Difference
JPEG           Joint Photographic Experts Group
KS             Kolmogorov-Smirnov
LSB            Least Significant Bit
MAP            Maximum A Posteriori
ML             Maximum Likelihood
MSE            Mean Square Error
p.d.f.         Probability density function
p.m.f.         Probability mass function
PSNR           Peak Signal to Noise Ratio
RGB            Red, Green, Blue
S1             Spectral Phase Error
S2             Spectral Phase-Magnitude Error
S3             Block Spectral Magnitude Error
S4             Block Spectral Phase Error
S5             Block Spectral Phase-Magnitude Error
SNR            Signal to Noise Ratio
SOM            Self-Organizing Map
SPIHT          Set Partitioning in Hierarchical Trees
SRC            Spearman Rank Correlation
VQEG           Video Quality Experts Group
Z1             Rate Distortion Measure
Z2             Hellinger Distance
Z3             Generalized Matusita Distance
Z4             Spearman Rank Correlation
1. INTRODUCTION
1.1. Motivation
There has been an explosive growth in multimedia technology and applications in the past several years. Efficient representation for storage and transmission, retrieval, and security are some of the biggest challenges faced.
The first concern addressed in this thesis is the efficient compression of data. Visual information is one of the richest but also most bandwidth-consuming modes of communication. To meet the requirements of new applications such as mobile multimedia and interactive databases (encyclopedias, electronic newspapers, travel information, and so on), powerful data compression techniques are needed to reduce the bit rate, even in the presence of growing communications channels offering increased bandwidth. Other applications are in remote sensing, education and entertainment.
Beyond reducing the bit rate, functionalities such as progressive transmission, that is, progressive decoding of the bit stream, have become important features of compression schemes. A typical application is data browsing: a user may want to visualize a picture at lower quality to save transmission time. Another application is tele-radiology, where a physician can request portions of an image at increased quality (including lossless reconstruction) while accepting unimportant portions at much lower quality, thereby reducing the overall bandwidth required for transmitting an image.
The second concern is image quality: compression, transmission and sensor artifacts all degrade image quality. Identifying the image quality measures that have the highest sensitivity to these distortions would help the systematic design of coding, communication and imaging systems, and the improvement or optimization of picture quality for a desired quality of service.
Given the proliferation of digital images, and given the high degree of redundancy present in a digital representation of an image (despite compression), there has been an increased interest in using digital images as cover objects for the purpose of data hiding. Since an unlimited number of copies of an original can be easily distributed or forged, the protection and enforcement of intellectual property rights is another important issue.
A digital watermark, imperceptibly embedded in the content, is one means of addressing this issue. Whatever the robustness of a given watermarking scheme, we note that it would be desirable to first detect the possible presence of a watermark before trying to remove or manipulate it.
Schemes for digital elections and digital cash make use of anonymous
communication techniques.
1.2. Approaches
The focus of this thesis is on three challenges of visual communications. One is the
efficient representation of image data for storage and transmission, which is the art and
science of identifying models for different types of structures existing in image data to
obtain compact representations. Second is image quality and the third is multimedia
security.
These are diverse research areas, but they are also integral parts of visual communications as a whole, and findings in one field are readily used in the others. For example, incorporating a good image model in a compression scheme decreases the bit rate needed to represent the image; the same image model can be used in image data hiding to get a true idea of the data-hiding capacity. Good image models, together with quality metrics incorporating the human visual system, are indispensable in the design of both image coding and watermarking systems, as we want visually pleasing, compactly represented images that are robust against various attacks.
Our approach differs from companion studies in that a set of objective image quality measures has been statistically analyzed to identify the ones most sensitive and discriminative to compression, watermarking, blurring and noise distortions.
The embedding of a message alters signal properties and introduces some form of degradation, no matter how small. These degradations can act as signatures that could be used to reveal the existence of a hidden message.
of a hidden message. For example, in the context of digital watermarking, the general
underlying idea is to create a watermarked signal that is perceptually identical but
statistically different from the host signal. A decoder uses this statistical difference in
order to detect the watermark. However, the very same statistical difference that is created
can potentially be exploited to determine if a given image is watermarked or not. We show
that addition of a watermark or message leaves unique artifacts, which can be detected
using the most discriminative image quality measures identified from their statistical
analysis and multivariate regression analysis. These measures form a multidimensional feature space whose points cluster well enough to permit a classification of marked and non-marked images.
Another pillar of this thesis is the proposal of a novel image compression scheme
that provides progressive transmission and near-lossless compression in one single
framework. We formulate the image data compression problem as one of asking the
optimal questions to determine, respectively, the value or the interval of the pixel,
depending on whether one is interested in lossless or near-lossless compression. New
prediction methods based on the nature of the data at a given pass are presented and links
to the existing methods are explored. The trade-off between non-causal prediction and
data precision is discussed within the context of successive refinement. Context selection
for prediction in different passes is addressed. Experimental results for both lossless and
near-lossless cases are presented, which are competitive with the state-of-the-art
compression schemes.
1.3. Contributions

The contributions of this thesis can be summarized as follows:

A detailed statistical evaluation of objective image quality measures has been carried out. The correlation between the various measures has been depicted via Kohonen's Self-Organizing Map; the placement of the measures in the two-dimensional map has been in agreement with one's intuitive grouping.

We developed steganalysis techniques for steganography, which can be used in the context of a passive warden model, and for watermarking, which can be used to embed secret messages in the context of an active warden. The techniques we present are novel and, to the best of our knowledge, the first attempt at designing general-purpose tools for steganalysis.

We proposed a novel technique that unifies progressive transmission and near-lossless compression in one single bit stream. The proposed technique produces a bitstream that results in progressive reconstruction of the image, just like what one can obtain with a reversible wavelet codec. In addition, the proposed scheme provides near-lossless reconstruction with respect to a given bound after each layer of the successively refinable bitstream is decoded. Furthermore, the experimental results for both the lossless and near-lossless cases are competitive with the state-of-the-art compression schemes.
1.4. Outline

In the second section we present a set of image quality measures and analyze them statistically with respect to coding, blurring and noise distortions. The measures are categorized into pixel difference-based, correlation-based, edge-based, spectral-based, context-based and HVS-based (Human Visual System-based) measures. We conduct a statistical analysis of the sensitivity and consistency behavior of objective image quality measures. The mutual relationships between the measures are visualized by plotting their Kohonen maps. Their consistency and sensitivity to coding as well as additive noise and blur are investigated via analysis of variance of their scores.

In the third section we present the framework within which we build the steganalyzer by using a small subset of image quality measures and multivariate regression analysis. Later, we give the design principles and extensively describe the experiments that test the performance of the steganalyzer with a variety of best-known watermarking and most-cited steganographic algorithms on a rich image set. The fourth section develops the proposed compression scheme that unifies progressive transmission and near-lossless compression. Section five concludes the thesis and explores directions for future work.
2. STATISTICAL EVALUATION OF IMAGE QUALITY MEASURES

2.1. Introduction
Image quality measures are figures of merit used for the evaluation of imaging
systems or of coding/processing techniques. We consider several image quality metrics and
study their statistical behavior when measuring various compression and/or sensor
artifacts.
A good objective quality measure should well reflect the distortion of the image due to, for example, blurring, noise, compression or sensor inadequacy. One expects that such measures could be instrumental in predicting the performance of vision-based algorithms such as feature extraction, image-based measurement, detection, tracking and segmentation. Our approach differs from companion studies in the literature focused on subjective image quality criteria, such as [1, 2, 3]. In the subjective assessment of quality, the characteristics of human perception become paramount, and image quality is correlated with the preference of an observer or the performance of an operator on some specific task.
In the image coding and computer vision literature, the raw error measures based on
deviations between the original and the coded images are overwhelmingly used [4, 5, 6],
Mean Square Error (MSE) or alternatively Signal to Noise Ratio (SNR) varieties being the
most common measures. The reason for their widespread choice is their mathematical tractability, together with the fact that it is often straightforward to design systems that minimize the MSE. Raw error measures such as MSE quantify the error in mathematical terms, and they are at their best with additive noise contamination, but they do not necessarily correspond to all aspects of the observer's visual perception of the errors [7, 8], nor do they correctly reflect structural coding artifacts.
For multimedia applications and very low bit rate coding, quality measures based on
human perception are being more frequently used [9, 10, 11, 12, 13, 14]. Since a human
observer is the end user in multimedia applications, an image quality measure that is based
on a human vision model seems more appropriate for predicting user acceptance and for system optimization. This class of distortion measures in general gives a numerical value that quantifies the dissatisfaction of the viewer in observing the reproduced image in place of the original (though Daly's VPD map [13] is a counterexample). The alternative is subjective tests, in which subjects view a series of reproduced images and rate them based on the visibility of artifacts [15, 16]. Subjective tests are tedious and time consuming, their results depend on various factors such as the observers' background and motivation, and, furthermore, only the display quality is actually being assessed. Therefore an objective measure that accurately predicts the subjective rating would be a useful guide when optimizing image compression algorithms.
Ideally, one would also like to infer the type and amount of distortion (the blur, noise, etc. amount in a distortion space) that could have resulted in the measured metric values.
Thus in this study we aim to study objective measures of image quality and to investigate their statistical performance. Their statistical behavior is evaluated, first, in terms of how discriminating they are with respect to distortion artifacts when tested on a variety of images, using the Analysis of Variance (ANOVA) method. Second, the measures are investigated in terms of their mutual correlation or similarity, this being put into evidence by means of Kohonen maps.
Some 26 image quality metrics are described and summarized in Table 2.1. These quality metrics are categorized into six groups according to the type of information they use:

- pixel difference-based measures;
- correlation-based measures;
- edge quality measures;
- spectral distance measures;
- context measures;
- Human Visual System-based measures, that is, measures based either on HVS-weighted spectral distortion or on (dis)similarity criteria used in image database browsing functions.

We define several distortion measures in each category. The specific measures are named D1, D2, ... in the pixel difference category, C1, C2, ... in the correlation category, etc., for ease of reference in the results and discussion sections.
Table 2.1. List of symbols and equation numbers of the quality metrics

SYMBOL  DESCRIPTION                            EQUATION
D1      Mean Square Error                      2.3
D2      Mean Absolute Error                    2.1
D3      Modified Infinity Norm                 2.5
D4      L*a*b* Perceptual Error                2.9
D5      Neighborhood Error                     2.10
D6      Multiresolution Error                  2.13
C1      Normalized Cross-Correlation           2.15
C2      Image Fidelity                         2.16
C3      Czekanowski Correlation                2.17
C4      Mean Angle Similarity                  2.19
C5      Mean Angle-Magnitude Similarity        2.20
E1      Pratt Edge Measure                     2.21
E2      Edge Stability Measure                 2.25
S1      Spectral Phase Error                   2.30
S2      Spectral Phase-Magnitude Error         2.31
S3      Block Spectral Magnitude Error         2.37
S4      Block Spectral Phase Error             2.38
S5      Block Spectral Phase-Magnitude Error   2.39
Z1      Rate Distortion Measure                2.43
Z2      Hellinger Distance                     2.45
Z3      Generalized Matusita Distance          2.46
Z4      Spearman Rank Correlation              2.47
H1      HVS Absolute Norm                      2.50
H2      HVS L2 Norm                            2.51
H3      Browsing Similarity                    2.53
H4      DCTune                                 -
2.26
2.2. Image Quality Measures

We now define and describe the multitude of image quality measures considered. In these definitions the pixel lattices of images A and B will be referred to as A(i, j) and B(i, j), i, j = 1, ..., N, as the lattices are assumed to have dimensions N x N. The pixels can take values from the set {0, ..., G} in any spectral band. The actual color images we considered had G = 255 in each band. Similarly, we denote the multispectral component of an image at pixel position (i, j) in band k as Ck(i, j), where k = 1, ..., K. The boldface symbols C(i, j) and Ĉ(i, j) will indicate the multispectral pixel vectors of the original and distorted images at position (i, j). For example, for color images in the RGB representation one has C(i, j) = [R(i, j) G(i, j) B(i, j)]^T. These definitions are summarized as follows:
- C: a multispectral image
- Ck: the kth band of C
- Ck(i, j): the pixel at position (i, j) of the kth band
- C(i, j): the multispectral pixel vector at position (i, j)
- ΔCk = Ck − Ĉk: the error over all the pixels in the kth band of a multispectral image C

Thus, for example, the power in the kth band can be calculated as

$$\sigma_k^2 = \sum_{i,j=0}^{N-1} C_k(i,j)^2 .$$

All the K error components at a given pixel position (i, j) are collected in the vector

$$\Delta\mathbf{C}(i,j) = \mathbf{C}(i,j) - \hat{\mathbf{C}}(i,j) = \left[ C_k(i,j) - \hat{C}_k(i,j) \right]_{k=1}^{K} .$$

Similarly, the error expression in the last row of the above list expands as

$$\sigma_{\Delta k}^2 = \sum_{i=1}^{N}\sum_{j=1}^{N}\left[ C_k(i,j) - \hat{C}_k(i,j) \right]^2 .$$

In the specific case of RGB color images, K = 3.
2.2.1. Measures Based on Pixel Differences

These measures calculate the distortion between two images on the basis of their pixelwise differences or certain moments of the difference (error) image.

2.2.1.1. Minkowsky Metrics. The Minkowsky average of order γ can be calculated by taking the average of the pixel differences spatially and then chromatically (that is, over the bands):

$$\epsilon^{\gamma} = \frac{1}{K}\sum_{k=1}^{K}\left\{ \frac{1}{N^2}\sum_{i,j=0}^{N-1}\left| C_k(i,j) - \hat{C}_k(i,j) \right|^{\gamma} \right\}^{1/\gamma} \tag{2.1}$$

Alternately, the Minkowsky average can first be carried out over the bands and then spatially, as in the following expression:

$$\epsilon^{\gamma} = \frac{1}{N^2}\sum_{i,j=0}^{N-1}\left\{ \frac{1}{K}\sum_{k=1}^{K}\left| C_k(i,j) - \hat{C}_k(i,j) \right|^{\gamma} \right\}^{1/\gamma} \tag{2.2}$$

In what follows we have used the pixel-wise difference in the Minkowsky sum as given in Eq. (2.1). For γ = 2 one obtains the well-known Mean Square Error (MSE), denoted as D1:

$$D1 = \frac{1}{K N^2}\sum_{i,j=0}^{N-1}\left\| \mathbf{C}(i,j) - \hat{\mathbf{C}}(i,j) \right\|^2 = \frac{1}{K}\sum_{k=1}^{K}\sigma_{\Delta k}^2 \tag{2.3}$$

For γ = 1 one obtains the Mean Absolute Error, denoted as D2.
For γ → ∞ the maximum difference

$$D = \max_{i,j}\left\{ \max_{k=1,\dots,K}\left| C_k(i,j) - \hat{C}_k(i,j) \right| \right\} = \max_{i,j}\left\| \mathbf{C}(i,j) - \hat{\mathbf{C}}(i,j) \right\|_{\infty} \tag{2.4}$$

is obtained. Recall that in signal and image processing the maximum difference, or infinity norm, is very commonly used [6]. However, given the noise-prone nature of the maximum difference, this metric can be made more robust by considering the ranked list of pixel differences. Here Δm(C − Ĉ) denotes the mth largest deviation; averaging over the r largest deviations yields the modified infinity norm

$$D3 = \frac{1}{r}\sum_{m=1}^{r}\Delta_m^2\!\left( C - \hat{C} \right) \tag{2.5}$$
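To make the pixel-difference family concrete, the following minimal NumPy sketch computes the Minkowsky average of Eq. (2.1), the MSE (D1) and the rank-order robustified D3. The (K, N, N) array layout and the function names are our own illustrative choices, not the thesis's implementation.

```python
import numpy as np

def minkowsky(C, C_hat, gamma=2.0):
    """Eq. (2.1): Minkowsky average taken spatially, then over the K bands."""
    diff = np.abs(C.astype(float) - C_hat.astype(float))   # shape (K, N, N)
    per_band = ((diff ** gamma).mean(axis=(1, 2))) ** (1.0 / gamma)
    return per_band.mean()

def mse(C, C_hat):
    """D1, Eq. (2.3): the plain mean square error over all bands and pixels."""
    diff = C.astype(float) - C_hat.astype(float)
    return (diff ** 2).mean()

def modified_infinity_norm(C, C_hat, r=10):
    """D3, Eq. (2.5): average of the r largest squared pixel deviations,
    a rank-order robustification of the maximum difference."""
    dev = np.sort(np.abs(C.astype(float) - C_hat.astype(float)).ravel())
    return (dev[-r:] ** 2).mean()
```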
2.2.1.2. MSE in Lab Space. The choice of the color space for an image similarity metric is important: the color space should be uniform, so that the intensity difference between two colors is consistent with the color difference estimated by a human observer. Since the RGB color space is not well suited for this task, two color spaces were defined: the 1976 CIE L*u*v* and the 1976 CIE L*a*b* color spaces [26]. One recommended color-difference equation for the L*a*b* color space is the Euclidean distance [27]. Let

$$\Delta L^*(i,j) = L^*(i,j) - \hat{L}^*(i,j) \tag{2.6}$$
$$\Delta a^*(i,j) = a^*(i,j) - \hat{a}^*(i,j) \tag{2.7}$$
$$\Delta b^*(i,j) = b^*(i,j) - \hat{b}^*(i,j) \tag{2.8}$$

denote the color component differences in L*a*b* space. Then the Euclidean distance is

$$D4 = \frac{1}{N^2}\sum_{i,j=0}^{N-1}\sqrt{\left[\Delta L^*(i,j)\right]^2 + \left[\Delta a^*(i,j)\right]^2 + \left[\Delta b^*(i,j)\right]^2} \tag{2.9}$$

Note that (2.9) is intended to yield perceptually uniform spacing of colors that exhibit color differences greater than the JND threshold but smaller than those in the Munsell book of color [27]. This measure obviously applies to color images only and cannot be generalized to arbitrary multispectral images. Therefore it has been used only for the face and texture images, and not for the satellite images.
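A minimal sketch of D4 using scikit-image's color conversion follows; the helper library and the (N, N, 3) input layout are our assumptions, as the thesis does not specify an implementation.

```python
import numpy as np
from skimage.color import rgb2lab  # assumes scikit-image is installed

def lab_perceptual_error(rgb, rgb_hat):
    """D4, Eq. (2.9): mean Euclidean (Delta E 1976) distance in CIE L*a*b*.
    Inputs are (N, N, 3) RGB images, uint8 or float in [0, 1]."""
    delta = rgb2lab(rgb) - rgb2lab(rgb_hat)
    return np.sqrt((delta ** 2).sum(axis=2)).mean()
```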
2.2.1.3. Difference over a Neighborhood. Image distortion on a pixel level can arise from
differences in the gray level of the pixels and/or from the displacements of the pixel. A
distortion measure that penalizes in a graduated way spatial displacements in addition to
gray level differences, and that allows therefore some tolerance for pixel shifts can be
defined as follows [28, 29]:
$$D5 = \frac{1}{2(N-w)^2}\sum_{i,j=w/2}^{N-w/2}\left\{ \left[ \min_{(l,m)\in w_{i,j}} d\!\left(\mathbf{C}(i,j), \hat{\mathbf{C}}(l,m)\right) \right]^2 + \left[ \min_{(l,m)\in w_{i,j}} d\!\left(\hat{\mathbf{C}}(i,j), \mathbf{C}(l,m)\right) \right]^2 \right\} \tag{2.10}$$

where d(·,·) is some appropriate distance metric. Notice that for w = 1 this metric reduces to the mean square error, as in D1. Thus, for any given pixel C(i, j), we search for the best-matching pixel, in the d-distance sense, in the w x w neighborhood Ĉw(i, j) of the pixel Ĉ(i, j). The size of the neighborhood is typically small, e.g., 3x3 or 5x5, and one can consider a square or a cross-shaped support. Similarly, one calculates the distance from Ĉ(i, j) to Cw(i, j), where Cw(i, j) denotes the pixels in the w x w neighborhood of C(i, j). As for the distance measure d(·,·), the city-block metric or the chessboard metric can be used. For example, the city-block metric becomes

$$d_{\text{city}}\!\left(\mathbf{C}(i,j), \hat{\mathbf{C}}(l,m)\right) = \frac{|i-l| + |j-m|}{N} + \frac{\left\| \mathbf{C}(i,j) - \hat{\mathbf{C}}(l,m) \right\|}{G} \tag{2.11}$$
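The neighborhood error admits a direct, if slow, implementation. The sketch below handles a single band and uses the plain gray-level difference rather than the combined city-block metric of Eq. (2.11), purely for brevity; these simplifications are ours.

```python
import numpy as np

def neighborhood_error(C, C_hat, w=3):
    """D5 sketch, Eq. (2.10), for one band: each pixel is matched against
    the best pixel in the w x w window of the other image, both ways."""
    C, C_hat = C.astype(float), C_hat.astype(float)
    N, h = C.shape[0], w // 2
    total = 0.0
    for i in range(h, N - h):
        for j in range(h, N - h):
            win = C[i - h:i + h + 1, j - h:j + h + 1]
            win_hat = C_hat[i - h:i + h + 1, j - h:j + h + 1]
            total += np.min(np.abs(C[i, j] - win_hat)) ** 2
            total += np.min(np.abs(C_hat[i, j] - win)) ** 2
    return total / (2 * (N - w) ** 2)
```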
2.2.1.4. Multiresolution Distance Measure. A multiresolution distance can be designed to match the human visual system more closely by assigning larger weights to low resolutions and smaller weights to the detail image [30]. Such measures are also more realistic for machine vision tasks, which often use local information only.

Consider the various levels of resolution denoted by r ≥ 1. For each value of r the image is split into blocks b1 to bn, where n depends on the scale r. For r = 1, at the lowest resolution, a single block covers the whole image, characterized by its average gray level g. For r = 2 one has four blocks, each of size (N/2 x N/2), with average gray levels g11, g12, g21 and g22. At the rth resolution level one thus has 2^(2r−2) blocks of size (N/2^(r−1) x N/2^(r−1)), characterized by the block average gray levels gij, i, j = 1, ..., 2^(r−1). For each block bij of the image C, take gij as its average gray level and let ĝij be its counterpart in the image Ĉ (for simplicity, a third index denoting the resolution level has been omitted). The average difference in gray level at resolution r is given weight 1/2^r:

$$d_r = \frac{1}{2^r}\,\frac{1}{2^{2r-2}}\sum_{i,j=1}^{2^{r-1}}\left| g_{ij} - \hat{g}_{ij} \right| \tag{2.12}$$

where 2^(r−1) is the number of blocks along either the i or the j index. If one considers a total of R resolution levels, then a distance function can simply be found by summing over all resolution levels; R is set by the initial resolution of the digital image, e.g., R = 9 for 512x512 images. Finally, for multispectral images one can extend this definition in two ways. In the straightforward extension, one sums the multiresolution distances d_r^k over the bands:

$$D6 = \frac{1}{K}\sum_{k=1}^{K}\sum_{r=1}^{R} d_r^k \tag{2.13}$$

where d_r^k is the multiresolution distance in the kth band. This is the multiresolution distance definition that we used in the experiments. (The Burt pyramid was also tried as an alternative multiresolution representation, but in the tests it did not perform as well as the pyramid described in [30].)
In the second, color-specific extension, the block average color components are compared via the Euclidean norm:

$$D\!\left(C, \hat{C}\right) = \sum_{r=1}^{R} d_r, \qquad d_r = \frac{1}{2^r}\,\frac{1}{2^{2r-2}}\sum_{i,j=1}^{2^{r-1}}\sqrt{ \left( g_{ij}^{R} - \hat{g}_{ij}^{R} \right)^2 + \left( g_{ij}^{G} - \hat{g}_{ij}^{G} \right)^2 + \left( g_{ij}^{B} - \hat{g}_{ij}^{B} \right)^2 } \tag{2.14}$$

where, for example, g_ij^R is the average gray level of the ijth block in the "red" component of the image at the (implicit) resolution level r. Notice that in the latter equation the Euclidean norm of the differences of the block average color components R, G, B has been utilized.

Notice also that the last two measures, that is, the neighborhood distance measure and the multiresolution distance measure, have not previously been used in evaluating compressed images.
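For an N x N band with N a power of two, the multiresolution distance of Eqs. (2.12)-(2.13) reduces to a few lines of block averaging. This sketch assumes a single band; the function names are our own.

```python
import numpy as np

def block_means(img, n):
    """Average gray level of each block in an n x n tiling of the image."""
    b = img.shape[0] // n
    return img.reshape(n, b, n, b).mean(axis=(1, 3))

def multiresolution_distance(C, C_hat):
    """D6 for one band, Eqs. (2.12)-(2.13): level r uses 2^(r-1) x 2^(r-1)
    blocks and weight 1/2^r; R = 9 for a 512 x 512 image."""
    C, C_hat = C.astype(float), C_hat.astype(float)
    R = int(np.log2(C.shape[0]))
    return sum(np.abs(block_means(C, 2 ** (r - 1)) -
                      block_means(C_hat, 2 ** (r - 1))).mean() / 2 ** r
               for r in range(1, R + 1))
```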
2.2.2. Correlation-Based Measures

2.2.2.1. Image Correlation Measures. The closeness between two digital images can also be quantified in terms of correlation functions [5]. These measures gauge the similarity between two images and, in this sense, are complementary to the difference-based measures. Some correlation-based measures are as follows.

Structural content:

$$C1 = \frac{1}{K}\sum_{k=1}^{K}\frac{\displaystyle\sum_{i,j=0}^{N-1} C_k(i,j)^2}{\displaystyle\sum_{i,j=0}^{N-1} \hat{C}_k(i,j)^2} \tag{2.15}$$

A second measure normalizes the cross-products by the energy of the original image:

$$C2 = \frac{1}{K}\sum_{k=1}^{K}\frac{\displaystyle\sum_{i,j=0}^{N-1} C_k(i,j)\,\hat{C}_k(i,j)}{\displaystyle\sum_{i,j=0}^{N-1} C_k(i,j)^2} \tag{2.16}$$

A metric useful for comparing vectors with strictly non-negative components, as in the case of images, is given by the Czekanowski distance [31]:

$$C3 = \frac{1}{N^2}\sum_{i,j=0}^{N-1}\left[ 1 - \frac{2\displaystyle\sum_{k=1}^{K}\min\!\left( C_k(i,j), \hat{C}_k(i,j) \right)}{\displaystyle\sum_{k=1}^{K}\left( C_k(i,j) + \hat{C}_k(i,j) \right)} \right] \tag{2.17}$$

The Czekanowski coefficient (also called the percentage similarity) measures the similarity between different samples, communities and quadrates.

Obviously, as the difference between two images tends to zero, Δ = C − Ĉ → 0, all the correlation-based measures tend to one, while as Δ² → G² they tend to zero. Recall also that distance measures and correlation measures are complementary, so that under certain conditions minimizing distance measures is tantamount to maximizing the correlation measures [32].
2.2.2.2. Moments of the Angles. Considering the pixels as vectors in the K-dimensional spectral space, the angle between two such vectors can serve as a similarity measure. A combined angular and magnitude similarity between the pixel vectors C(i, j) and Ĉ(i, j) can be defined as

$$\chi_{ij} = 1 - \left[ 1 - \frac{2}{\pi}\cos^{-1}\frac{\left\langle \mathbf{C}(i,j), \hat{\mathbf{C}}(i,j) \right\rangle}{\left\| \mathbf{C}(i,j) \right\| \left\| \hat{\mathbf{C}}(i,j) \right\|} \right]\left[ 1 - \frac{\left\| \mathbf{C}(i,j) - \hat{\mathbf{C}}(i,j) \right\|}{\sqrt{3\cdot 255^2}} \right] \tag{2.18}$$

We can use the moments of the spectral (chromatic) vector differences as distortion measures. To this effect we have used the mean of the angle difference (C4) and the mean of the combined angle-magnitude difference (C5), as in the following two measures:

$$C4 = 1 - \frac{1}{N^2}\sum_{i,j=1}^{N}\frac{2}{\pi}\cos^{-1}\frac{\left\langle \mathbf{C}(i,j), \hat{\mathbf{C}}(i,j) \right\rangle}{\left\| \mathbf{C}(i,j) \right\| \left\| \hat{\mathbf{C}}(i,j) \right\|} \tag{2.19}$$

$$C5 = \frac{1}{N^2}\sum_{i,j=1}^{N}\chi_{ij} \tag{2.20}$$

These moments have previously been used for the assessment of directional correlation between color vectors.
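A sketch of the Czekanowski and angular measures on a (K, N, N) image pair follows; the small epsilon guarding against all-zero pixel vectors is our own addition.

```python
import numpy as np

def czekanowski(C, C_hat, eps=1e-12):
    """C3, Eq. (2.17): Czekanowski distance averaged over all pixels."""
    C, C_hat = C.astype(float), C_hat.astype(float)
    num = 2 * np.minimum(C, C_hat).sum(axis=0)
    den = (C + C_hat).sum(axis=0) + eps
    return (1 - num / den).mean()

def mean_angle_similarity(C, C_hat, eps=1e-12):
    """C4, Eq. (2.19): one minus the mean normalized angle between the
    spectral vectors of the two images."""
    C, C_hat = C.astype(float), C_hat.astype(float)
    cos = ((C * C_hat).sum(axis=0) /
           (np.linalg.norm(C, axis=0) * np.linalg.norm(C_hat, axis=0) + eps))
    return 1 - (2 / np.pi * np.arccos(np.clip(cos, -1.0, 1.0))).mean()
```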
2.2.3. Edge Quality Measures

According to the contour-texture paradigm of images, the edges form the most informative part of images. For example, in the perception of scene content by the human visual system, edges play the major role. Similarly, machine vision algorithms often rely on feature maps obtained from the edges. Thus task performance in vision, whether by humans or machines, is highly dependent on the quality of the edges and of other two-dimensional features such as corners [9, 34, 35]. Some examples of edge degradations are discontinuities in the edge, decrease of edge sharpness by smoothing effects, offset of edge position, missing edge points and falsely detected edge points [32]. Notice, however, that not all of the above degradations are necessarily observed, as edge and corner information in images is rather well preserved by most compression algorithms.

Since we do not possess a ground-truthed edge map, we have used the edge map obtained from the original uncompressed images as the reference. Thus, to obtain edge-based quality measures, we have generated edge fields from both the original and compressed images using the Canny detector [36]. We have not used any multiband edge detector; instead, a separate edge map has been obtained from each band. The outputs of the derivatives of Gaussians of each band are averaged, and the resulting average output is interpolated, thresholded and thinned in a manner similar to that in [12]. The parameters are set as in [36].¹

¹ http://robotics.eecs.berkeley.edu/~sastry/ee20/cacode.html
In summary, for each band k = 1, ..., K, the horizontal and vertical gradients and their norms are computed; the hysteresis thresholds are then derived from the gradient-norm statistics over the bands, the upper threshold from the band-averaged maximum of the gradient norm and the lower threshold from the band-averaged minimum, with a scaling factor of 0.1.
2.2.3.1. Pratt Measure. A measure introduced by Pratt [32] considers both edge location accuracy and missing or false-alarm edge elements. It requires knowledge of an ideal reference edge map, where the reference edges should preferably have a width of one pixel. The figure of merit is defined as

$$E1 = \frac{1}{\max\{n_d, n_t\}}\sum_{i=1}^{n_d}\frac{1}{1 + a\,d_i^2} \tag{2.21}$$

where n_d and n_t are the numbers of detected and ground-truth edge points, respectively, and d_i is the distance to the closest edge candidate for the ith detected edge pixel. In our study the binary edge field obtained from the uncompressed image is considered as the ground truth, or reference, edge field. The factor max{n_d, n_t} penalizes the number of false-alarm edges or, conversely, missing edges. This scaling factor provides a relative weighting between smeared edges and thin but offset edges, while the summed terms penalize shifts from the correct edge positions. In summary, the smearing and offset effects are all included in the Pratt measure, which provides an impression of overall quality.
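Given two binary edge maps, the Pratt figure of merit can be sketched with a distance transform; SciPy's Euclidean distance transform and the customary scaling a = 1/9 are our assumptions, not choices stated in the thesis.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def pratt_fom(detected, reference, a=1.0 / 9):
    """E1, Eq. (2.21): detected and reference are boolean edge maps; d_i is
    the distance from each detected edge pixel to the nearest reference
    edge pixel."""
    d = distance_transform_edt(~reference)  # distance to nearest reference edge
    di = d[detected]
    n_d, n_t = int(detected.sum()), int(reference.sum())
    return float(np.sum(1.0 / (1.0 + a * di ** 2)) / max(n_d, n_t))
```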
2.2.3.2. Edge Stability Measure. Edge stability is defined as the consistency of edge evidences across different scales in both the original and coded images [37]. Edge maps at different scales have been obtained from the images by using the Canny [36] operator with the standard deviation of the Gaussian filter assuming the values σm = 1.19, 1.44, 1.68, 2.0, 2.38. The gradient-norm output in each band is thresholded with Tσm, where Tσm = 0.1 (Cmax − Cmin) + Cmin; in this expression Cmax and Cmin denote the maximum and minimum values of the norm of the gradient output in that band. Thus the edge map at scale σm of the image C is obtained as

$$E(i,j,\sigma_m) = \begin{cases} 1 & C_{\sigma_m}(i,j) > T_{\sigma_m} \\ 0 & \text{otherwise} \end{cases} \tag{2.22}$$

where Cσm(i, j) is the output of the Derivative of Gaussian operator at the mth scale. In other words, using a continuous-function notation, one has Cσm(x, y) = C(x, y) ** Gσm(x, y), where

$$G_{\sigma_m}(x,y) = \frac{1}{2\pi\sigma_m^4}\,xy\,\exp\!\left( -\frac{x^2 + y^2}{2\sigma_m^2} \right) \tag{2.23}$$

and ** denotes two-dimensional convolution. An edge stability map Q(i, j) is obtained by considering the longest subsequence E(i, j, σm), ..., E(i, j, σm+l−1) of edge images such that

$$Q(i,j) = l \quad \text{with} \quad \bigwedge_{m \le k \le m+l-1} E(i,j,\sigma_k) = 1 \tag{2.24}$$

The edge stability index calculated from the distorted image at pixel position (i, j) will be denoted by Q̂(i, j). We have used five scales to obtain the edge maps of five band-pass filtered images. An edge stability mean square error (ESMSE) can then be calculated by summing the differences of the edge stability indices over all edge pixel positions n_d, that is, the edge pixels of the ground-truth (undistorted) image at full resolution:

$$E2 = \frac{1}{n_d}\sum_{i,j}\left( Q(i,j) - \hat{Q}(i,j) \right)^2 \tag{2.25}$$

For multispectral images the index in (2.25) can simply be averaged over the bands. Alternatively, a single edge field can be obtained from the multiband image [38, 39] and the resulting edge discrepancies measured as in Eq. (2.25).
2.2.4. Spectral Distance Measures

In this category we consider the distortion penalty functions obtained from the complex Fourier spectra of images [10]. Let the Discrete Fourier Transforms (DFT) of the kth band of the original and coded images be denoted by Γk(u, v) and Γ̂k(u, v), respectively. The spectra are defined as

$$\Gamma_k(u,v) = \sum_{m,n=0}^{N-1} C_k(m,n)\,\exp\!\left( -2\pi i\,\frac{um + vn}{N} \right) \tag{2.26}$$
23
Spectral distortion measures, using difference metrics as for example given in (2.12.3) can be extended to multispectral images. To this effect considering the phase and
magnitude spectra, that is
(u , v) = arctan((u, v) ) ,
(2.27)
M (u , v) = (u , v) ,
(2.28)
the distortion occurring in the phase and magnitude spectra can be separately calculated
and weighted. Thus one can define the spectral magnitude distortion
1
S= 2
N
N 1
M (u, v ) M (u, v )
(2.29)
u ,v = 0
S1 =
1
N2
N 1
(u, v ) (u, v )
(2.30)
u ,v = 0
1
S2 = 2
N
2
2
N 1
N 1
(u, v ) (u , v ) + (1 ) M (u , v ) M (u , v )
u ,v = 0
u ,v = 0
(2.31)
where is to be judiciously chosen e.g., to reflect quality judgment. These ideas can be
extended in a straightforward manner to multiple band images, by summing over all band
distortions. In the following computations, is chosen so as to render the contributions of
the magnitude and phase terms commensurate, that is = 2.5x10 5 .
Due to the localized nature of distortions and/or the non-stationarity of the image field, Minkowsky averaging of block spectral distortions may be more advantageous. Thus an image can be divided into L non-overlapping or overlapping blocks of size b x b, say 16x16, and blockwise spectral distortions as in (2.29)-(2.31) can be computed. Let the DFT of the lth block of the kth band image, C_k^l(m, n), be Γ_k^l(u, v):

$$\Gamma_k^l(u,v) = \sum_{m,n=0}^{b-1} C_k^l(m,n)\,\exp\!\left( -2\pi i\,\frac{um + vn}{b} \right) \tag{2.32}$$

where u, v = −b/2, ..., b/2 and l = 1, ..., L, or, in the magnitude-phase form,

$$\Gamma_k^l(u,v) = \left| \Gamma_k^l(u,v) \right| e^{j\varphi_k^l(u,v)} = M_k^l(u,v)\,e^{j\varphi_k^l(u,v)} \tag{2.33}$$

Then the following measures can be defined in the transform domain over the lth block:

$$J_M^l = \frac{1}{K}\sum_{k=1}^{K}\sum_{u,v=0}^{b-1}\left| M_k^l(u,v) - \hat{M}_k^l(u,v) \right|^{\gamma} \tag{2.34}$$

$$J_{\varphi}^l = \frac{1}{K}\sum_{k=1}^{K}\sum_{u,v=0}^{b-1}\left| \varphi_k^l(u,v) - \hat{\varphi}_k^l(u,v) \right|^{\gamma} \tag{2.35}$$

$$J^l = \lambda J_M^l + (1-\lambda)J_{\varphi}^l \tag{2.36}$$

with λ the relative weighting factor of the magnitude and phase spectra. Obviously, the measures (2.29)-(2.31) are special cases of the above definitions for a block size b covering the whole image. Various rank-order operations on the block spectral differences J_M^l and/or J_φ^l can prove useful. Thus, let J^(1) ≤ ... ≤ J^(L) be the rank-ordered block distortions; one can consider, for example, the median block distortion, (J^(L/2) + J^(L/2+1))/2, or the average block distortion, (1/L) Σ J^(i). We have found that the median of the block distortions is the most effective averaging of rank-ordered block spectral distortions, and we have thus used

$$S3 = \mathrm{median}_l\; J_M^l \tag{2.37}$$
$$S4 = \mathrm{median}_l\; J_{\varphi}^l \tag{2.38}$$
$$S5 = \mathrm{median}_l\; J^l \tag{2.39}$$

In this study we have averaged the block spectra with γ = 2 and, as for the choice of block size, we have found that block sizes of 32 and 64 yield better results than sizes in the lower or higher ranges.
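A single-band sketch of the global spectral measures S1 and S2 follows. Note that the naive phase difference below, like Eq. (2.30) read literally, does not unwrap phase; the function name and layout are our own.

```python
import numpy as np

def spectral_distortions(C, C_hat, lam=2.5e-5):
    """S1 and S2, Eqs. (2.30)-(2.31), for a single band via the global DFT."""
    F = np.fft.fft2(C.astype(float))
    F_hat = np.fft.fft2(C_hat.astype(float))
    phase_err = np.abs(np.angle(F) - np.angle(F_hat)) ** 2
    mag_err = np.abs(np.abs(F) - np.abs(F_hat)) ** 2
    S1 = phase_err.mean()
    S2 = lam * phase_err.mean() + (1 - lam) * mag_err.mean()
    return S1, S2
```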
2.2.5. Context Measures

Most compression algorithms and computer vision tasks are based on the neighborhood information of the pixels. In this sense, any loss of information in the pixel neighborhoods, that is, damage to pixel contexts, could be a good measure of overall image distortion. Since such statistical information lies in the context probabilities, that is, the joint probability mass function (p.m.f.) of pixel neighborhoods, changes in the context probabilities should be indicative of image distortions. In what follows we have used the causal neighborhood of pixels, i.e., Ck(i, j), Ck(i−1, j), Ck(i, j−1), Ck(i−1, j−1), k = 1, 2, 3.
2.2.5.1. Rate-Distortion Based Distortion Measure. The relative entropy between the context p.m.f.s of the original and distorted images is

$$D(p \,\|\, \hat{p}) = \sum_{x \in \mathcal{X}^s} p(x)\log\frac{p(x)}{\hat{p}(x)} \tag{2.40}$$

where the sum runs over the s-dimensional context alphabet, and p and p̂ are the p.m.f.s of the original image contexts and of the distorted (e.g., blurred, noisy, compressed) image contexts, respectively. The relative entropy is directly related to efficiency in compression and to error rate in classification. Recall also that the optimal average bit rate is the entropy of x:

$$H(X) = -\sum_{X \in \mathcal{X}^s} p(X)\log p(X) \tag{2.41}$$

If, instead of the true probability, a perturbed version p̂, that is, the p.m.f. of the distorted image, is used, then the average bit rate R(p̂) becomes

$$R(\hat{p}) = -\sum_{X \in \mathcal{X}^s} p(X)\log\hat{p}(X) = H(X) + D(p \,\|\, \hat{p}) \tag{2.42}$$

The increase in the entropy rate is thus indicative of how much the context probability differs from the original due to coding artifacts. However, we do not know the true p.m.f. p, hence its rate. We can bypass this problem by comparing two competing compression algorithms in terms of the resulting context probabilities p̂1 and p̂2. If p̂1 and p̂2 are the p.m.f.s resulting from the two compressed images, then their difference in relative entropy

$$Z1 = D(p \,\|\, \hat{p}_1) - D(p \,\|\, \hat{p}_2) = R(\hat{p}_1) - R(\hat{p}_2) \tag{2.43}$$

is easily and reliably estimated from a moderate-size sample by subtracting the sample average of log p̂2 from that of log p̂1 [44]. The same approach applies to more than two images compressed to different bit rates, comparing them two by two, since the unknown entropy term is common to all of them.

As a quality measure for images, we have calculated Z1 for each image when it was compressed at two consecutive bit rates, for example, R(p̂1) at the bit rate of quality factor 90 and R(p̂2) at the bit rate of quality factor 70, etc. Alternatively, the distortion was calculated between an original image and its blurred or noise-contaminated version.
2.2.5.2. f-divergences. More general information-theoretic distortion measures [45] can also be used. Most of these measures can be expressed in the following general form:

$$d(p, \hat{p}) = g\!\left( E_p\!\left[ f\!\left( \frac{\hat{p}}{p} \right) \right] \right) \tag{2.44}$$

where p̂/p is the likelihood ratio between p̂, the context p.m.f. of the distorted image, and p, the p.m.f. of the original image, and E_p is the expectation with respect to p. Some examples are as follows.

The Hellinger distance, with f(x) = (√x − 1)² and g(x) = x/2:

$$Z2 = \frac{1}{2}\int \left( \sqrt{\hat{p}} - \sqrt{p} \right)^2 d\mu \tag{2.45}$$

The generalized Matusita distance, with f(x) = |1 − x^{1/r}|^r and g(x) = x^{1/r}:

$$Z3 = \left( \int \left| p^{1/r} - \hat{p}^{1/r} \right|^r d\mu \right)^{1/r}, \quad r \ge 1 \tag{2.46}$$

Notice that the integrations in (2.45)-(2.46) are carried out over the s-dimensional context space. We have also found, according to the ANOVA analyses, that the choice of r = 5 in the Matusita distance yields good results. Despite the fact that the p.m.f.s do not directly reflect the structural content or the geometrical features of an image, they perform sufficiently well in differentiating artifacts between the original and test images.
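For p.m.f.s estimated from pixel contexts (e.g., normalized histograms over the context alphabet), the Hellinger and Matusita distances are one-liners; the flat array representation is our assumption.

```python
import numpy as np

def hellinger(p, p_hat):
    """Z2, Eq. (2.45): p and p_hat are 1-D arrays of probabilities that
    sum to one over the same context alphabet."""
    return 0.5 * np.sum((np.sqrt(p_hat) - np.sqrt(p)) ** 2)

def matusita(p, p_hat, r=5):
    """Z3, Eq. (2.46); r = 5 gave good ANOVA results in this study."""
    return np.sum(np.abs(p ** (1 / r) - p_hat ** (1 / r)) ** r) ** (1 / r)
```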
2.2.5.3. Local Histogram Distances. In order to reflect the differences between two images at a local level, we calculated the histograms of the original and distorted images on the basis of 16x16 blocks. To this effect we considered both the Kolmogorov-Smirnov (KS) distance and the Spearman Rank Correlation (SRC).

For the KS distance we calculated the maximum deviation between the respective cumulative distributions. For each of the 16x16 blocks of the image, the maximum of the KS distances over the K spectral components was found, and these local figures were summed over all b blocks of the image. However, the KS distance did not turn out to be effective in the ANOVA tests; the SRC measure had a better performance. We again considered the SRC on a 16x16 block basis and took the maximum over the three spectral bands. The block SRC measure was computed by calculating the rank scores of the gray levels in each band and then correlating the block ranks of the original and distorted images:

$$Z4 = \sum_{u=1}^{b}\max_{k=1,\dots,K} SRC_{uk} \tag{2.47}$$

where SRC_uk denotes the Spearman Rank Correlation for the uth block and the kth spectral band.
2.2.6. Human Visual System-Based Measures

Despite the quest for objective image distortion measures, it is intriguing to find out the role of HVS-based measures. The HVS is too complex to be fully understood with present psychophysical means, but the incorporation of even a simplified HVS model into objective measures reportedly [7, 46, 10, 14] leads to a better correlation with subjective ratings. It is conjectured, therefore, that they may have some relevance in machine vision tasks as well.
2.2.6.1. HVS-Modified Spectral Distortion. In order to obtain a closer relation with the assessment by the human visual system, both the original and coded images can be preprocessed via filters that simulate the HVS. One model of the human visual system is a band-pass filter with a transfer function in polar coordinates [46]:

$$H(\rho) = \begin{cases} 0.05\,e^{\rho^{0.554}} & \rho < 7 \\ e^{-9\left[\left|\log_{10}\rho - \log_{10}9\right|\right]^{2.3}} & \rho \ge 7 \end{cases} \tag{2.48}$$

where ρ = (u² + v²)^{1/2}. The image processed through this HVS model is obtained as

$$U\{C(i,j)\} = \mathrm{DCT}^{-1}\!\left\{ H\!\left( \sqrt{u^2 + v^2} \right)\Gamma(u,v) \right\} \tag{2.49}$$

where Γ(u, v) denotes the 2-D Discrete Cosine Transform (DCT) of the image and DCT⁻¹ the inverse transform. Some possible measures [5, 47, 48, 49] for the K-component multispectral image are the normalized absolute error

$$H1 = \frac{1}{K}\sum_{k=1}^{K}\left[ \sum_{i,j=0}^{N-1}\left| U\{C_k(i,j)\} - U\{\hat{C}_k(i,j)\} \right| \right] \Big/ \left[ \sum_{i,j=0}^{N-1} U\{C_k(i,j)\} \right] \tag{2.50}$$

the L2 norm

$$H2 = \frac{1}{K}\sum_{k=1}^{K}\left[ \frac{1}{N^2}\sum_{i,j=0}^{N-1}\left( U\{C_k(i,j)\} - U\{\hat{C}_k(i,j)\} \right)^2 \right]^{1/2} \tag{2.51}$$

and a normalized variant

$$H = \frac{1}{K}\sum_{k=1}^{K}\sum_{i,j=0}^{N-1}\left[ U\{C_k(i,j)\} - U\{\hat{C}_k(i,j)\} \right] \Big/ \left[ U\{C_k(i,j)\} \right] \tag{2.52}$$
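The HVS-weighted norms can be sketched by filtering in the DCT domain. The raw-index frequency normalization below is a simplifying assumption of ours, as the thesis's exact cycles-per-degree mapping is not reproduced here.

```python
import numpy as np
from scipy.fft import dctn, idctn

def hvs_filter(img):
    """Apply the band-pass model of Eq. (2.48) to the 2-D DCT of the image."""
    N = img.shape[0]
    u, v = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
    rho = np.sqrt(u ** 2 + v ** 2)
    H = np.where(rho < 7,
                 0.05 * np.exp(rho ** 0.554),
                 np.exp(-9.0 * np.abs(np.log10(rho + 1e-12)
                                      - np.log10(9.0)) ** 2.3))
    return idctn(H * dctn(img.astype(float), norm="ortho"), norm="ortho")

def hvs_l2(C, C_hat):
    """H2, Eq. (2.51), for a single band: RMS error of the filtered images."""
    return np.sqrt(((hvs_filter(C) - hvs_filter(C_hat)) ** 2).mean())
```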
2.2.6.2. A Distance Metric for Database Browsing. The metric proposed in [14, 50], based on a multiscale model of the human visual system, actually has the function of bringing forth the similarities between image objects for database search and browsing purposes. This multiscale model includes channels that account for perceptual phenomena such as color, contrast, color-contrast and orientation selectivity. From these channels, features are extracted, and an aggregate measure of similarity is then formed as a weighted linear combination of the feature differences, with the choice of features and weights made to maximize consistency with similarity judgments.

More specifically, we exploit a vision system designed for image database browsing and object identification to measure image distortion. The image similarity metric in [14] uses 102 feature vectors extracted at different scales and orientations, both in the luminance and the color channels. The final (dis)similarity metric is

$$H3 = \sum_{i=1}^{102}\alpha_i d_i \tag{2.53}$$

where the αi are the weights as attributed in [50] and the di are the individual feature discrepancies. We call this metric the browsing metric, for lack of a better name. For example, the color-contrast distortion at scale l is given by

$$d_l = \frac{1}{N_l \times N_l}\sum_{i,j=0}^{N_l}\left( K(i,j) - \hat{K}(i,j) \right)^2 \tag{2.54}$$

where N_l x N_l is the size of the image at scale l, and K(i, j) and K̂(i, j) denote any color or contrast channel of the original and of the coded image at level l. The lengthy details of the algorithm and its adaptation to our problem are summarized in [14, 50]. Finally, note that this measure was used only for color images, and not in the case of the satellite three-band images.

2.2.6.3. DCTune. The last quality measure we used that reflects the properties of the human visual system is the DCTune algorithm [56]. DCTune is in fact a technique for optimizing JPEG still-image compression: it calculates the best JPEG quantization matrices to achieve the maximum possible compression for a specified perceptual error, given a particular image and a particular set of viewing conditions. DCTune also allows the user to compute the perceptual error between a reference image and a test image in units of JND (just-noticeable differences). This figure was used as the last metric (H4) in Table 2.1.
2.3. Goals and Methods
Objective video quality model attributes have been studied in [17, 18]. These attributes can be directly translated to still-image quality measures for multimedia and computer vision applications. The desired characteristics reflect on the box plots and the F-scores of the quality metrics, as detailed in the sequel.
All the image quality measures are calculated in their multiband version. In the study of the quality measures under image compression, we used two well-known compression algorithms: the DCT-based JPEG standard and a wavelet-based method, Set Partitioning in Hierarchical Trees (SPIHT), due to Said and Pearlman [52]. The other types of image distortion were generated by the use of blurring filters with various support sizes and by the addition of white Gaussian noise at various levels.

The rate selection scheme was based on the accepted rate ranges of JPEG. It is well known that a JPEG quality factor Q between 80 and 100 corresponds to visually imperceptible impairment; Q between 60 and 80 to perceptible but not annoying distortion; Q between 40 and 60 to slightly annoying distortion; Q between 20 and 40 to annoying distortion; and, finally, 0-20 is the Q range where the quality is very annoying. The images were compressed with five JPEG Q factors: 90, 70, 50, 30 and 10. For each class, the average length of the compressed files was calculated, and the corresponding bit rate (bits/pixel) was accepted as the class rate. The same rates as obtained from the JPEG experiment were also used for the SPIHT algorithm.
The test material consisted of the following image sets: (1) ten three-band remote sensing images, which contained a fair amount of variety, i.e., edges, textures, plateaus and contrast range; (2) ten color face images from the Purdue University Face Images database [53]; (3) ten texture images from the MIT Texture Database (VISTEX).²

² http://www-white.media.edu/vismod/imagery/VisionTexture/vistex.html
The Analysis of Variance (ANOVA) [54] was used as a statistical tool to put into evidence the merits of the quality measures. In other words, ANOVA was used to show whether the variation in the data could be accounted for by the hypothesized factor, for example the factor of image compression type, the factor of image class, etc.
The purpose of a one-way ANOVA is to find out whether data from several groups
have a common mean. That is, to determine whether the groups are actually different in the
measured characteristic. In our case each "compression group" consists of quality scores
from various images at a certain bit rate, and there are k = 5 groups corresponding to the 5
bit rates tested. Each group had 30 sample vectors since there were 30 multispectral test
images (10 remote sensing, 10 faces, 10 textures). Similarly, three "blur groups" were created by low-pass filtering the images with 2-D Gaussian-shaped filters of increasing support. Finally, three "noise groups" were created by contaminating the images with Gaussian noise of increasing variance, that is, σ² = 200, 600, 1700. This range of noise values spans the noisy-image quality from just-noticeable distortion to annoying degradation.
One-way and two-way ANOVA differ in that the groups in two-way ANOVA have two categories of defining characteristics instead of one. Since we have two coders (the JPEG and SPIHT algorithms), two-way ANOVA is appropriate. The hypotheses for the comparison of independent groups are:

H01: μ11 = μ12 = ... = μ1k    HA1: μ1i ≠ μ1j for at least one pair (i, j)
H02: μ21 = μ22 = ... = μ2k    HA2: μ2i ≠ μ2j for at least one pair (i, j)
It should be noted that the test statistic is an F test with k-1 and N-k degrees of
freedom, where N is the total number of compressed images. ANOVA returns the p-value
for the null hypothesis that the means of the groups are equal [54]. A low p-value (high F
value) for this test indicates evidence to reject the null hypothesis in favor of the
alternative. In other words, there is evidence that at least one pair of means are not equal.
We have opted to carry out the multiple comparison tests at a significance level of 0.05.
Thus any test resulting in a p-value under 0.05 would be significant, and therefore, one
would reject the null hypothesis in favor of the alternative hypothesis. This is to assert that
the difference in the quality metric arises from the image coding artifacts and not from
random fluctuations in the image content.
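In code, the per-metric test amounts to a one-way F test over the rate groups. The sketch below uses SciPy with synthetic scores purely as a stand-in for the 30 per-group quality scores; the data layout is our own illustration.

```python
import numpy as np
from scipy.stats import f_oneway

# Stand-in data: scores[r] holds one metric's scores for 30 test images
# compressed at bit rate r; there are k = 5 rate groups.
rng = np.random.default_rng(0)
scores = [rng.normal(loc=r, scale=0.5, size=30) for r in range(5)]

F, p = f_oneway(*scores)   # F statistic with k-1 and N-k degrees of freedom
print(f"F = {F:.1f}, p = {p:.3g}")
if p < 0.05:
    print("Reject H0: the metric discriminates between the rate groups.")
```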
To find out whether the variability of the metric scores arises predominantly from the
image quality, and not from the image set, we considered the interaction between image set
and the distortion artifacts (i.e., compression bit rate, blur etc.).
To this effect we considered the F-scores with respect to the image set as well. As discussed in sub-section 2.3.1 and shown in Tables 2.2-2.3, metrics that were sensitive to distortion artifacts were naturally sensitive to image set variation as well. However, for the good measures identified, the sensitivity to image set variation was always inferior to the distortion sensitivity.
A discrimination measure between consecutive rate groups can also be defined:

$$Q_{r,r+1} = \frac{\left| \mu_r - \mu_{r+1} \right|}{\sigma_r + \sigma_{r+1}}, \quad r = 1, \dots, k-1 \tag{2.55}$$

$$Q = \mathrm{Ave}\left\{ Q_{r,r+1} \right\} \tag{2.56}$$

where μr denotes the mean value of the image quality measure for the images compressed at rate r, σr is the corresponding standard deviation, and k is the number of different bit rates at which the quality measures are calculated. A good image quality measure should have a high Q value, which implies little overlap between groups and/or large jumps between them, hence high discriminative power. It should be noted that the Q values and the F-scores yielded totally parallel results in our experiments. In Figure 2.1 we give box-plot examples of a good, a moderate and a poor measure. For the box-plot visualization the data has been appropriately scaled without any loss of information; the horizontal axis corresponds to the bit rate variation and the vertical axis to the normalized IQM scores.
Since we would like to visualize the quality metrics data, we organize them as vectors and feed them to a SOM (Self-Organizing Map) algorithm. The elements of the vectors are the corresponding quality scores. For example, consider the MSE (D1) for a specific compression algorithm (e.g., JPEG) at a specific rate. The corresponding vector D1 is M-dimensional, where M is the number of images, and it reads as

$$\mathbf{D1}(\text{JPEG, bitrate}) = \left[ D1(1 \mid \text{JPEG, bitrate}) \;\cdots\; D1(M \mid \text{JPEG, bitrate}) \right]^T \tag{2.57}$$

There will be five such vectors, one for each bit rate considered. Overall, for the training of the SOM we utilize 30 images x 5 bit rates x 2 compressors x 26 metrics = 7800 vectors.

Recall that the self-organizing map (SOM) is a tool for the visualization of high-dimensional data. The image of an input vector X on the SOM array is defined by the decoder function d(X, m_i), where d(·,·) is a general distance measure; the input vector is mapped to the array index c defined as c = arg min_i d(X, m_i). A critical part of the algorithm is to define the m_i in such a way that the mapping is ordered and descriptive of the distribution of X. Finding a set of values that minimize the distance measure resembles, in fact, the standard VQ problem; with arbitrary indexing of these values, however, the mapping remains unordered. If the minimization of the objective functional based on the distance function is implemented under the conditions described in [55], then one can obtain ordered values of m_i, almost as if the m_i were lying at the nodes of an elastic net. With the elastic-net analogy in mind, the SOM algorithm can be constructed as

$$\mathbf{m}_i(t+1) = \mathbf{m}_i(t) + \alpha(t)\left[ \mathbf{X}(t) - \mathbf{m}_i(t) \right] \tag{2.58}$$

where α(t) is a small scalar if the distance between units c and i in the array is smaller than or equal to a specified limit (radius), and α(t) = 0 otherwise. During the course of the ordering process, α(t) is decreased from 0.05 to 0.02, while the radius of the neighborhood is decreased from 10 to 3. Furthermore, the scores are normalized with respect to their range. The component planes j of the SOM, i.e., the arrays of scalar values μij representing the jth components of the weight vectors m_i and having the same format as the SOM array, are displayed as shades of gray.
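A minimal SOM training loop implementing the update rule (2.58) might look as follows. The grid size and the linear schedules are illustrative; only the 0.05 -> 0.02 and 10 -> 3 endpoints are taken from the text.

```python
import numpy as np

def train_som(X, rows=10, cols=10, iters=5000, seed=0):
    """SOM sketch, Eq. (2.58): find the winner c, then move every unit
    within the current radius of c toward the sample X(t)."""
    rng = np.random.default_rng(seed)
    m = rng.standard_normal((rows * cols, X.shape[1]))
    grid = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    for t in range(iters):
        frac = t / iters
        alpha = 0.05 + (0.02 - 0.05) * frac   # decreases 0.05 -> 0.02
        radius = 10 + (3 - 10) * frac         # decreases 10 -> 3
        x = X[rng.integers(len(X))]
        c = np.argmin(((m - x) ** 2).sum(axis=1))        # winner unit
        near = np.linalg.norm(grid - grid[c], axis=1) <= radius
        m[near] += alpha * (x - m[near])
    return m.reshape(rows, cols, -1)
```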
Our first goal is to identify the quality measures that are most discriminating with respect to compression artifacts as well as such distortion effects as blur and noise. Our second goal is to establish how the various quality measures are related to each other and to show the degree to which the measures respond to the different distortion types.
The two-way ANOVA results of the image quality measures for the data obtained from all image classes (Fabrics, Faces, Remote Sensing) are listed in Table 2.2. The symbols of the quality measures (D1, D2, ..., H3, H4) are listed in the first column, while the F-scores for JPEG compression, SPIHT compression, blur and noise distortions are given in the succeeding columns. The first factor tested is the bit rate variation and the second factor is the image set variation.
The metric that responds most strongly to one distortion type is called the
fundamental metric of that distortion type [24]. Similarly the metric that responds to all
sorts of distortion effects is denoted as the global metric. One can notice that:
The fundamental metrics for JPEG compression are H2, H1, S2, E2, that is, HVS
L2 norm, HVS absolute norm, spectral phase-magnitude, and edge stability
measures. These measures are listed in decreasing order of F-score.
The fundamental metrics for SPIHT compression are E2, S2, S5, H2, that is, edge
stability, spectral phase-magnitude, block spectral phase-magnitude, and HVS L2
norm.
The fundamental metrics for the BLUR effect are S1, E2, S2, H1, that is, spectral
phase, edge stability, spectral phase-magnitude, HVS absolute norm. Notice the
similarity of metrics between SPIHT and blur due, in fact, to the blurring artifact
encountered in wavelet-based compression.
The fundamental metric for the NOISE effect is, as expected, D1, the mean
square error.
Finally, the image quality metrics that are sensitive to all distortion artifacts are, in order, E2, H1, S2, H2 and S5, that is, edge stability, HVS absolute norm, spectral phase-magnitude, HVS L2 norm and block spectral phase-magnitude.
Table 2.2. ANOVA results (F-scores) for the JPEG and SPIHT compression distortions as well as additive noise and blur artifacts. For each distortion type the variation due to image set is also established.

Metric | JPEG bitrate F | JPEG image set F | SPIHT bitrate F | SPIHT image set F | Blur F | Blur image set F | Noise F | Noise image set F
D1 | 104.6 | 42.59 | 39.23 | 13.28 | 43.69 | 2.06 | 9880 | 17.32
D2 | 108.5 | 67.45 | 29.56 | 15.93 | 33.94 | 17.76 | 6239 | 20.4
D3 | 63.35 | 29.37 | 53.31 | 48.53 | 38.55 | 24.13 | 1625 | 11.15
D4 | 89.93 | 1.99 | 13.75 | 3.71 | 27.87 | 0.96 | 166.4 | 9.88
D5 | 20.26 | 80.71 | 14.09 | 68.22 | 6.32 | 55.11 | 1981 | 43.51
D6 | 76.73 | 5.94 | 37.52 | 11.22 | 412.9 | 45.53 | 44.61 | 4.38
C1 | 1.35 | 124.6 | 12.05 | 325.5 | 5.61 | 107.2 | 3.82 | 6.17
C2 | 12.26 | 93.83 | 15.18 | 82.87 | 11.19 | 39.77 | 58.04 | 45.63
C3 | 82.87 | 83.06 | 24.96 | 22.42 | 30.92 | 1.71 | 567.5 | 52.01
C4 | 45.65 | 47.36 | 7.91 | 5.94 | 16.48 | 0.77 | 198.8 | 19.03
C5 | 91.42 | 38.17 | 27.51 | 5.28 | 52.57 | 2.44 | 704 | 10.8
E1 | 26.24 | 3.64 | 77.86 | 137 | 125.8 | 21.09 | 87.76 | 27.87
E2 | 176.3 | 92.75 | 212.5 | 200.4 | 768.7 | 23.41 | 158.5 | 24.84
S1 | 150.5 | 102.2 | 104 | 68.17 | 1128 | 60.04 | 47.29 | 38.42
S2 | 191.3 | 98.42 | 161 | 101.8 | 572.2 | 17.95 | 107.1 | 4.83
S3 | 145.6 | 56.39 | 38.58 | 26.97 | 24.28 | 6.39 | 2803 | 8.59
S4 | 129.1 | 63.26 | 128 | 46.85 | 215 | 11.17 | 56.04 | 55.1
S5 | 146.1 | 71.03 | 144.1 | 61.65 | 333.6 | 27.84 | 78.04 | 26.53
Z1 | 1.69 | 141.8 | 21.36 | 14 | 35.9 | 62.5 | 44.89 | 110.9
Z2 | 7.73 | 114.7 | 11.41 | 77.68 | 10.17 | 1.80 | 3.03 | 11.36
Z3 | 17.63 | 223 | 23.22 | 181.4 | 17.26 | 8.31 | 14.71 | 21.12
Z4 | 9.4 | 23.58 | 9.84 | 32.41 | 8.45 | 14.74 | 24.99 | 3.31
H1 | 371.9 | 0.09 | 107.2 | 40.05 | 525.6 | 69.98 | 230.7 | 19.57
H2 | 2291 | 5.46 | 132.9 | 22.82 | 47.28 | 101.7 | 624.3 | 21.32
H3 | 123 | 1.2 | 27.45 | 7.6 | 67.31 | 6.77 | 117.3 | 0.50
H4 | 78.83 | 7.14 | 25.2 | 95.72 | 12.55 | 2.11 | 29.06 | 6.69
To establish the global metrics, we gave rank numbers from 1 to 26 to each metric under the four types of distortion in Table 2.2. For example, for JPEG the metrics are ordered as H2, H1, S2, E2, etc., according to their F-scores. We then summed the rank numbers, and the metrics for which the sum of ranks was smallest were declared global metrics, that is, the ones that qualify well in all discrimination tests.
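As an illustration, this rank-sum aggregation can be sketched in a few lines of Python; the F-score array below is a hypothetical stand-in for the full 26-by-4 data of Table 2.2:

    import numpy as np

    # Hypothetical stand-in: f_scores[i, j] is the F-score of metric i under
    # distortion type j (JPEG, SPIHT, blur, noise), as in Table 2.2.
    metrics = ["D1", "D2", "D3"]  # ... the thesis uses all 26 metrics D1..H4
    f_scores = np.array([[104.6, 39.23, 43.69, 9880.0],
                         [108.5, 29.56, 33.94, 6239.0],
                         [63.35, 53.31, 38.55, 1625.0]])

    ranks = np.empty_like(f_scores, dtype=int)
    for j in range(f_scores.shape[1]):
        order = np.argsort(-f_scores[:, j])          # rank 1 = highest F-score
        ranks[order, j] = np.arange(1, len(metrics) + 1)

    rank_sums = ranks.sum(axis=1)                    # smallest sum = global metric
    print([metrics[i] for i in np.argsort(rank_sums)])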
The metrics that were least sensitive to image set variation are D4, H3, C4, C5, D6, etc. However, it can be observed that these metrics show in general poor performance in discriminating distortion effects. On the other hand, for the distortion-sensitive metrics, even though their image set dependence is higher than that of the so-called image-independent metrics, more of the score variability is due to distortion than to image set change. This can be observed from the higher F-scores for distortion effects as compared to the image-set-related F-scores.
ANOVA results are given for each image class (Fabrics, Faces, Remote Sensing)
separately, and two-way ANOVA results are presented for the combined set. In the two
bottom rows of Table 2.3. the metrics that are least sensitive to the coder type and to the
image set are given.
Table 2.3. One-way ANOVA results for each image class and two-way ANOVA results for the distortions on the combined set, together with image set and coder type independence
                                     JPEG          SPIHT         BLUR          NOISE
1-way    Fabrics                     H4,H2,E2,S4   E1,S1,E2,S2   S1,S5,E2,S4   D1,D2,D5,D3
ANOVA    Faces                       H2,D1,S3,H1   H4,D3,H2,C1   S2,H1,S1,E2   D1,S3,D2,D3
         Remote Sensing              H2,H4,S4,S5   S2,S5,S4,S1   D6,S5,S4,S1   D1,D2,C3,C5
2-way    Combined Set                H2,H1,S2,E2   E2,S2,S5,H2   S1,E2,S2,H1   D1,D2,S3,D5
ANOVA    Image Set Independence      H1,H3         D4,C5         C4,D4         H3,Z4
         Coder Type Independence     D2,D1,Z4,D3 (JPEG and SPIHT pooled)
We also investigated the metrics with respect to their ability to respond to bit rate and coder type. More specifically, the first factor tested was bitrate variation and the second was coder type variation. For this analysis the scores of the JPEG and SPIHT compressors were combined. It can be observed from Table 2.4. that:
- The metrics that were capable of discriminating the coder type (JPEG versus SPIHT) were quite similar, that is, D6, H2, H4 and H1 (multiresolution error, HVS L2 norm, DCTune, HVS L1 norm).
- Finally, the metrics that were most sensitive to distortion artifacts but, at the same time, least sensitive to image set variation were C5, D1, D3, S3, D2, C4, ... (mean angle-magnitude similarity, mean square error, modified infinity norm, block spectral magnitude error, mean absolute error, mean angle similarity, ...).
These metrics were identified by summing two rank scores for each metric, one ranking the metrics by their distortion sensitivity and the other by their insensitivity to the image set. Interestingly enough, almost all of them are varieties of the mean square error. Despite the many criticisms it has received, this may explain why the mean square error and signal-to-noise ratio measures have proven so resilient over time.
Table 2.4. ANOVA scores for the bit rate variability (combined JPEG and SPIHT scores)
and coder variation
           JPEG+SPIHT
Metric     Bitrate F    Coder F
D1         89.79        0.75
D2         74.98        2.72
D3         71.55        1.21
D4         70.52        43.85
D5         17.07        0.0005
D6         85.22        118.8
C1         2.66         45.47
C2         12.28        18.27
C3         56.48        1.56
C4         31.3         2.43
C5         78.98        2.23
E1         42.69        11.61
E2         122.4        26.28
S1         99.12        5.29
S2         140.1        12.37
S3         92.99        9.27
S4         115.5        39.1
S5         124.8        43.09
Z1         4.28         41.6
Z2         9.54         0.83
Z3         12.87        0.56
Z4         9.39         6.64
H1         278.6        52.87
H2         493          87.21
H3         97.94        16.19
H4         21.13        57.72
As expected, the metrics that are responsive to distortions are almost always responsive to the image set as well. Conversely, the metrics that do not respond to image set variation are also not very discriminating with respect to the distortion types. The fact that the metrics are sensitive, as should be expected, to both the image content and the distortion artifacts does not eclipse their potential as quality metrics. Indeed, when the metrics were tested within more homogeneous image sets (that is, only within Face images or Remote Sensing images, etc.), the same high-performance metrics scored consistently higher. Furthermore, when one compares the F-scores of the metrics with respect to bit rate variation and image set variation, even though there is a non-negligible interaction factor, one can notice that the F-score due to bit rate is always larger than the F-score due to image sets.
The Self-Organizing Map (SOM) [55] is a pictorial method to display similarities and differences between statistical variables, such as quality measures. We have therefore obtained a spatial organization of these measures via Kohonen's self-organizing map algorithm. The input to the SOM algorithm was vectors whose elements are the scores of one of the measures, e.g., D1, under a certain compression algorithm, e.g., JPEG, computed over different images. The instances of this vector are 60-dimensional, one component for each of the images in the set.
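A minimal, self-contained sketch of such a Kohonen map is given below; the 70 x 70 grid size is taken from the text, while the iteration count, learning rate and neighborhood schedules are assumptions of the sketch:

    import numpy as np

    def train_som(data, grid=(70, 70), iters=5000, lr0=0.5, sigma0=10.0, seed=0):
        # data: one 60-dimensional score vector per quality measure
        rng = np.random.default_rng(seed)
        h, w = grid
        weights = rng.random((h, w, data.shape[1]))
        ys, xs = np.mgrid[0:h, 0:w]
        for t in range(iters):
            x = data[rng.integers(len(data))]
            d = np.linalg.norm(weights - x, axis=2)           # distance to each cell
            by, bx = np.unravel_index(np.argmin(d), d.shape)  # best-matching unit
            lr = lr0 * np.exp(-t / iters)                     # shrinking learning rate
            sigma = sigma0 * np.exp(-t / iters)               # shrinking neighborhood
            g = np.exp(-((ys - by) ** 2 + (xs - bx) ** 2) / (2 * sigma ** 2))
            weights += lr * g[..., None] * (x - weights)
        return weights

Measures whose score vectors are mapped to nearby cells respond similarly to artifacts, which is how the clusters discussed below arise.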
The SOM organization of the measures in the 2-D space for pooled data from the JPEG and SPIHT coders is shown in Figure 2.2. The map consists of 70 x 70 cells. These maps are useful for the visual assessment of possible correlations present in the measures. One would expect that measures with similar trends, which respond in similar ways to artifacts, would cluster together spatially. The main conclusions from the observation of the SOM and the correlation matrix are as follows:
- The clustering tendency of the pixel difference based measures (D1, D2, D4, D5) and the spectral magnitude based measure (S3) is obvious in the center portion of the map, a reflection of the Parseval relationship, that is, distortion in image energy in the spatial domain is mirrored in the frequency domain. Notice, however, that the spectral phase measures (S2, S5) stay distinctly apart from these measures.
- The human visual system based measures (H2, H3, H4), the multiresolution pixel-difference measure (D6), E2 (edge stability measure) and C5 (mean angle-magnitude measure) are clustered in the right side of the map. The correlation of the multiresolution distance measure D6 with the HVS based measures (H2, H3, H4) is not surprising, since the idea behind this measure is to mimic image comparison by eye more closely, by assigning larger weights to low resolution components and smaller weights to the detailed high frequency components.
- The three correlation based measures (C1, C2, C3) are together in the lower part of the map, while the two spectral phase error measures (S2, S5) are concentrated separately in the upper part of the map.
- It is interesting to note that all the context-based measures (Z1, Z2, Z3, Z4) are grouped in the upper left region of the map, together with H1 (HVS filtered absolute error).
- The proximity between the Pratt measure (E1) and the maximum difference measure (D3) is meaningful, since the maximum distortions in reconstructed images are near the edges. Hence maximum distance measures can be used in codec designs to preserve two-dimensional features, such as edges, in reconstructed images.
2.5. Conclusions
The Kohonen map of the measures has been useful in depicting similar ones and in identifying the ones that are possibly sensitive to different distortion artifacts in compressed images. The correlation between the various measures has been depicted via Kohonen's Self-Organizing Map, and the placement of the measures in the two-dimensional map has been in agreement with one's intuitive grouping.
Future work will address subjective experiments and the prediction of subjective image quality using the salient measures identified above. Another possible avenue is to combine various fundamental metrics for better performance prediction.
3.1. Introduction
In the classical formulation of the steganography problem, two parties, Alice and Bob, wish to communicate covertly. However, all communication between them is examined by the warden, Wendy, who will put them in solitary confinement at the slightest suspicion of trouble. Specifically, in the general model for steganography, Alice wishes to send a secret message m to Bob. In order to do so, she embeds m into a cover-object c to obtain the stego-object s. The stego-object s is then sent through the public channel.
The warden Wendy, who is free to examine all messages exchanged between Alice and Bob, can be passive or active. A passive warden simply examines the message and tries to determine if it potentially contains a hidden message. If it appears that it does, she takes appropriate action; otherwise, she lets the message through without any action. An active warden, on the other hand, can alter messages deliberately, even if she does not see any trace of a hidden message, in order to foil any secret communication that may nevertheless be occurring between Alice and Bob. The amount of change the warden is allowed to make depends on the model being used and on the cover-objects being employed. For example, with images, it would make sense that the warden is allowed to make changes as long as she does not significantly alter the subjective visual quality of a suspected stego-image.
For an active warden it is not even necessary to detect a hidden message: just disabling it and rendering it useless will defeat the very purpose of steganography. In this thesis, we present a steganalysis technique for detecting stego-images, i.e., still images containing hidden messages, using image quality metrics. Although we focus on images, the general techniques we discuss would also be applicable to audio and video data.
Given the proliferation of digital images, and given the high degree of redundancy
present in a digital representation of an image (despite compression), there has been an
increased interest in using digital images as cover-objects for the purpose of
steganography. The simplest of such techniques essentially embed the message in a subset
of the LSB (least significant bit) plane of the image, possibly after encryption [60]. It is
well known that an image is generally not visually affected when its least significant bit
plane is changed. Popular steganographic tools based on LSB like embedding vary in their
approach for hiding information. Methods like Steganos and Stools use LSB embedding in
the spatial domain, while others like Jsteg embed in the frequency domain.
techniques include the use of quantization and dithering.
steganography techniques, the reader is referred to [60].
Other
techniques is that they assume a passive warden framework. That is they assume the
warden Wendy will not alter the image. We collectively refer to these techniques as
conventional steganography techniques or for brevity, more simply as steganography
techniques.
In this thesis, we develop steganalysis techniques both for conventional LSB-like embedding used in the context of a passive warden model and for watermarking, which can be used to embed secret messages in the context of an active warden. In order to distinguish between these two models, we use the term watermark when the embedded signal is in the context of watermarking and the term message in the context of conventional steganography. Furthermore, we simply use the terms marking or embedding when the context of discussion is general and includes both watermarking and steganography.
The techniques we present are novel and, to the best of our knowledge, the first attempt at designing general purpose tools for steganalysis. General detection techniques as applied to steganography have not been devised, and methods beyond visual inspection and specific statistical tests for individual techniques [64, 65, 66, 67] are not present in the literature. Since far too many images would have to be inspected visually to sense hidden messages, a technique that automates the detection process will be very valuable to the steganalyst.
Our approach is based on the fact that hiding information in digital media requires
alterations of the signal properties that introduce some form of degradation, no matter how
small. These degradations can act as signatures that could be used to reveal the existence
of a hidden message. For example, in the context of digital watermarking, the general
underlying idea is to create a watermarked signal that is perceptually identical but
statistically different from the host signal. A decoder uses this statistical difference in
order to detect the watermark. However, the very same statistical difference that is created
could potentially be exploited to determine if a given image is watermarked or not. In this
thesis, we show that addition of a watermark or message leaves unique artifacts, which can
be detected using Image Quality Measures [68, 69, 70, 71, 72].
The rest of this chapter is organized as follows. In Section 3.2. we discuss the selection of the image quality measures to be used in the steganalysis and the rationale for utilizing more than one quality measure concurrently. We then show that the image quality metric based distance between an unmarked image and its filtered version differs from the distance between a marked image and its filtered version. Section 3.3. describes the regression analysis that we use to build a composite measure of quality to indicate the presence or absence of a mark. Statistical tests and experiments are given in Section 3.4. and, finally, conclusions are drawn in Section 3.5.
3.2. Choice of Image Quality Measures
As stated in the introduction, the main goal of this chapter is to develop a discriminator for message or watermark presence in still images, using an appropriate set of IQMs. Image quality measurement continues to be the subject of intensive research and experimentation, the aim being to develop measures, some functional of which correlates well with subjective judgment, that is, with the degree of (dis)satisfaction of an observer [13]. The interest in developing objective measures for assessing multimedia data lies in the fact that subjective measurements are costly, time-consuming and not easily reproducible. Objective measures have also been used for the performance prediction of vision algorithms against quality loss due to sensor inadequacy or compression artifacts [24]. In this work, however, we want to exploit image quality measures not as predictors of subjective image quality or algorithmic performance, but as steganalysis tools, that is, as detection features of watermarks or hidden messages.
The filtered version of an image is obtained with a Gaussian low-pass filter of the form

H(m, n) = K g(m, n)    (3.1)

where

g(m, n) = (1 / 2 pi sigma^2) exp{ -(m^2 + n^2) / 2 sigma^2 }    (3.2)

and K is a normalizing constant ensuring that the filter coefficients sum to unity,

K = 1 / SUM_{m,n} g(m, n).    (3.3)
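As an illustration, such a blur-based reference and a distance to it can be sketched as follows; the 3 x 3 kernel size and sigma = 0.5 are assumptions of the sketch, not the settings of the thesis:

    import numpy as np
    from scipy.ndimage import convolve

    def gaussian_kernel(size=3, sigma=0.5):
        # g(m, n) of (3.2), normalized so the coefficients sum to one (K of (3.1))
        half = size // 2
        m, n = np.mgrid[-half:half + 1, -half:half + 1]
        g = np.exp(-(m ** 2 + n ** 2) / (2 * sigma ** 2))
        return g / g.sum()

    def mse_to_blurred(image, size=3, sigma=0.5):
        # one IQM-style feature: distance between an image and its blurred version
        blurred = convolve(image.astype(float), gaussian_kernel(size, sigma))
        return float(np.mean((image.astype(float) - blurred) ** 2))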
Figure 3.1. Schematic descriptions of (a) watermarking or stegoing the cover signal f with the watermark signal w to obtain the stego signal g = f + w, (b) filtering an un-marked image, (c) filtering a marked image.
As for the selection of quality measures, we have gleaned out the ones that serve the purpose of our steganalysis well. The rationale for using several quality measures is that different measures respond with differing sensitivities to artifacts and distortions. For example, measures like mean square error respond more to additive noise, others such as spectral phase or mean square HVS-weighted (Human Visual System) error are more sensitive to pure blur, while the gradient measure reacts to distortions concentrated around edges and textures. Recall that some watermarking algorithms inject noise into block DCT coefficients, others into a narrow band of global DCT or Fourier coefficients, while still others operate in selected localities in the spatial domain. Since we want our steganalyzer to be able to work with a variety of watermarking and steganography algorithms, a multitude of quality features is needed, so that the steganalyzer has the chance to probe all the features in an image that are significantly impacted by the watermark or message embedding process.
In order to identify the specific quality measures that are useful in steganalysis, we use the ANOVA test [54]. ANOVA helped us to distinguish the measures that are most consistent and accurate vis-a-vis the effects of watermarking and of steganography. The various quality measures are subjected to a statistical test to determine whether the fluctuations of the measures result from image variety or whether they arise due to treatment effects, that is, watermarking or stegoing.

Figure 3.2. Scatter plots of three image quality measures for marked and unmarked images (M3: Czekanowski measure, M5: Image Fidelity, M6: Normalized Cross-Correlation).
We performed three different ANOVA tests: the first was for watermarking, the second for steganography, and the last one for both watermarking and steganography.

For watermarking, the first group consisted of the IQM scores computed from plain images and their filtered versions. The remaining three groups consisted of the IQM scores computed from images watermarked by the Digimarc [75], PGS [76] and COX [77] techniques, respectively, and their filtered versions. The data given to the ANOVA algorithm consisted of four vectors, each of dimension N, where N = 12 is the number of images used in the test from the training set. More specifically, consider a typical quality measure, say M(i), where the parametric dependence upon the watermarking algorithm is shown with i, i = 0, ..., 3, for plain images and the Digimarc, PGS and COX techniques, respectively. The N-dimensional vector M(i) reads as
M(i) = [M(1 | i), ..., M(N | i)]^T.    (3.4)
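A sketch of one such four-group test, using SciPy's one-way ANOVA on hypothetical stand-in score vectors (the means and spreads below are invented for illustration), is:

    import numpy as np
    from scipy.stats import f_oneway

    rng = np.random.default_rng(0)
    plain    = rng.normal(0.000, 0.010, 12)   # stand-ins for the N = 12 scores
    digimarc = rng.normal(0.020, 0.010, 12)   # of one IQM per treatment group
    pgs      = rng.normal(0.015, 0.010, 12)
    cox      = rng.normal(0.010, 0.010, 12)

    F, p = f_oneway(plain, digimarc, pgs, cox)
    # A small p-value indicates that the measure fluctuates with the treatment
    # (embedding) rather than with image variety, so it is retained.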
For steganography, the first group consisted of the IQM scores computed from plain (non-marked) images and their filtered versions. The remaining three groups consisted of the IQM scores computed from images stegoed by Steganos [78], S-Tools [79] and Jsteg [80], respectively, and their filtered versions.

For the joint watermarking and steganography analysis, the first group consisted of the IQM scores computed from plain images and their filtered versions. The remaining six groups consisted of the IQM scores computed from images watermarked by the Digimarc, PGS and COX techniques and images stegoed by Steganos, S-Tools and Jsteg, respectively, and their filtered versions.
The implications of the results are twofold. One is that, using these features, a steganalysis tool can be designed to detect watermarked or steganographically marked images, as we show in Section 3.3., using multivariate regression analysis. The other is that current watermarking or steganographic algorithms should exercise more care on the statistically significant image features in order to eschew detection. For instance, the relative orderings of the statistically significant IQMs for the watermarking and the steganographic algorithms are different. While the Minkowsky measures were not statistically significant for the steganographic algorithms, they were for the watermarking algorithms. Minimizing the Mean Square Error (MSE) or the Kullback-Leibler distance between the original (cover) image and the stego image is not necessarily enough to achieve covert communication, as the evidence can still be caught by other measures, such as the spectral measures. The subsets of image quality measures selected for the design of the steganalyzer with respect to their statistical significance were as follows:
Table 3.1. One-way ANOVA tests for watermarking, steganography and pooled watermarking and steganography

[Table body: the F-statistic and p-value of each image quality measure under the watermarking, steganography, and pooled watermarking-and-steganography tests; the per-measure row labels were lost in extraction.]
Watermarking: Mean Absolute Error D2, Mean Square Error D1, Czekanowski Correlation Measure C3, Image Fidelity C2, Cross Correlation C1, Spectral Magnitude Distance S, Normalized Mean Square HVS Error H.

Pooled Watermarking and Steganography: Mean Absolute Error D2, Mean Square Error D1, Czekanowski Correlation Measure C3, Angle Mean C4, Spectral Magnitude Distance S, Median Block Spectral Phase Distance S4, Median Block Weighted Spectral Distance S5, Normalized Mean Square HVS Error H. We denote this feature set as {D1, D2, C3, C4, S, S4, S5, H}.
In the design phase of the steganalyzer, we regressed the normalized IQM scores to -1 and +1, depending upon whether an image did not or did contain a message, respectively. Similarly, IQM scores were calculated between the original images and their filtered versions. In the regression model [54], we expressed each decision label y in a sample of n observations as a linear function of the IQM scores x plus a random error e:

y_1 = b_1 x_11 + b_2 x_12 + ... + b_q x_1q + e_1
y_2 = b_1 x_21 + b_2 x_22 + ... + b_q x_2q + e_2
...
y_n = b_1 x_n1 + b_2 x_n2 + ... + b_q x_nq + e_n    (3.5)
In this expression, x_ij denotes the IQM score, where the first index indicates the i'th image and the second the quality measure. The total number of quality measures considered is denoted by q, and the b's denote the regression coefficients. The complete statement of the standard linear model is

y = X_(n x q) b + e    such that    rank(X) = q,  E[e] = 0,  Cov[e] = s^2 I.    (3.6)

The least squares estimate of the regression coefficients is

b = (X^T X)^(-1) (X^T y).    (3.7)
Once the prediction coefficients are obtained in the training phase, these coefficients can be used in the testing phase. Given an image in the test phase, it is first filtered, and the q IQM scores are obtained using the image and its filtered version. Using the prediction coefficients, these scores are regressed to the output value. If the output exceeds the threshold 0, then the decision is that the image is embedded; otherwise the decision is that the image is not embedded. That is, the image is declared embedded if

y = b_1 x_1 + b_2 x_2 + ... + b_q x_q >= 0.    (3.8)
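A minimal sketch of the training and testing phases under this model could read as follows; np.linalg.lstsq computes the same least-squares solution as (3.7) in a numerically safer way:

    import numpy as np

    def train_steganalyzer(X, y):
        # X: n-by-q matrix of IQM scores (image vs. its filtered version)
        # y: labels, -1 for unmarked and +1 for marked images
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        return beta

    def is_marked(x, beta, threshold=0.0):
        # decision rule (3.8): declare "embedded" if the output exceeds 0
        return float(x @ beta) >= threshold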
The watermarking tools tested were Digimarc [75], PGS [76] and the method of Cox et al. [77], while the steganographic tools were Steganos [78], S-Tools [79] and Jsteg [80]. These tools were among the most cited ones for their good results in watermarking and steganographic applications. The image database used for the simulations contained a variety of images, including computer generated images, images with bright colors, images with reduced and dark colors, images with textures and fine details like lines and edges, and well-known images like Lena, peppers, etc. We performed eight experiments.
The first three experiments involved watermarking only, namely: 1) the steganalysis of the individual watermarking algorithms, Digimarc, PGS and Cox et al., for admissible watermark strengths; 2) the steganalysis of the pooled watermarking algorithms at admissible watermark strengths; 3) in the third experiment, the steganalyzer was trained on images watermarked by Digimarc and tested on images watermarked by PGS and Cox et al.
[Figure: block diagrams of (a) the training phase, in which the regression coefficients b = (X^T X)^(-1)(X^T y) are estimated from the IQM score vectors X_i = [x_i1 ... x_iq] and the labels y_i, and (b) the testing phase, in which the output y = b_1 x_1 + ... + b_q x_q is computed from the scores x = [x_1 ... x_q].]
The final two experiments involved both the watermarking and the steganography algorithms: 7) the seventh experiment was the steganalysis of the pooled three steganographic and three watermarking algorithms, for admissible levels of watermark strength and for different message lengths; 8) in the last, eighth, experiment the steganalyzer was trained on images embedded with Steganos or S-Tools, or watermarked by Digimarc or PGS, and tested on images embedded with Jsteg or watermarked by Cox et al. The aim of experiments 3, 6 and 8 was to see the generalizing ability of the steganalyzer in case the image to be analyzed was marked by an algorithm unknown to it in the learning phase. In experiments 1, 2 and 3 the feature set was the one selected for watermarking, for experiments 4, 5 and 6 it was the one selected for steganography, while the pooled watermarking-and-steganography feature set was used for the remaining experiments 7 and 8. In the training phase of every experiment, the feature sets were regressed to -1 and +1.
The organization of the training and testing samples for the experiments is given in Tables 3.2.-3.12. The images in the training and test sets are denoted by numbers; more specifically, the training set is {1, ..., 12} and the test set is {13, ..., 22}. There were four levels of watermark strength for Digimarc and PGS. We used the original settings of Cox's technique and modified the 1000 most significant coefficients in the spectral domain. The embedded message sizes were 1/10 and 1/40 of the cover image size for Steganos and S-Tools, while the message size was 1/100 of the cover image size for Jsteg.
Table 3.2. Training and test samples for Digimarc and PGS for experiment 1

                    Level 1     Level 2     Level 3     Level 4
Training samples    1,2,3       4,5,6       7,8,9       10,11,12
Test samples        13,14,15    16,17       18,19,20    21,22

Table 3.3. Training and test samples for Cox for experiment 1

                     Training samples    Test samples
1000 coefficients    1-12                13-22
Table 3.4. Training and test samples for the pooled watermarking algorithms for experiment 2 (L1: Level 1, etc.)

          Digimarc              PGS                   COX
Levels    L1   L2   L3   L4     L1   L2   L3   L4
Train     1    2    3    4      5    6    7    8     9,10,11,12
Test      13   14   15   16     17   18   19   20    21,22
Table 3.5. Training and test samples for experiment 3: train on Digimarc, test on PGS and COX

Training     WM Levels    Training samples    Testing    WM Levels    Test samples
Digimarc     L1           1-3                 PGS        L1           13-15
             L2           4-6                            L2           16-18
             L3           7-9                            L3           19,20
             L4           10-12               COX                     21,22
Table 3.6. Training and test samples for S-Tools for experiment 4

Message size          Training samples    Test samples

Table 3.7. Training and test samples for Jsteg for experiment 4

Message size          Training samples    Test samples
1/100 of image size   1-12                13-22

Table 3.8. Training and test samples for Steganos for experiment 4 (note: for certain images Steganos did not allow messages to be embedded, regardless of their size)

Message size          Training samples    Test samples
1/40 of image size    2,4,8               15,17
Table 3.9. Training and test samples for the pooled steganography algorithms for experiment 5

            Message size    Training samples    Test samples
Steganos    1/40            2,4                 13,15
            1/10            8,10                17,19
Stools      1/40            1,3                 14,16
            1/10            5,6                 18,20
Jsteg       1/100           7,9,11,12           21,22
Table 3.10. Training and test samples for experiment 6: train on Steganos and S-Tools, test on Jsteg

Training    Msg. size    Training samples    Testing    Msg. size    Test samples
Steganos    1/40         2,4,8               Jsteg      1/100        13-22
            1/10         10,11
Stools      1/40         1,3,5,6
            1/10         7,9,12
Table 3.11. Training and test samples for the pooled watermarking and steganography algorithms for experiment 7

                      Digimarc     PGS          COX         Steganos      Stools        Jsteg
Level or msg. size    L2    L3     L2    L3     1000 cof    1/40   1/10   1/40   1/10   1/100
Training samples      7     8      9     10     11,12       2      4      1      3      5,6
Test samples          18    19     20    21     22          13     15     14     16     17
Table 3.12. Training and test samples for experiment 8: train on Digimarc, PGS, Steganos and S-Tools, test on Cox and Jsteg

Training            Digimarc     PGS          Steganos      Stools
Level, msg. size    L2    L3     L2    L3     1/40   1/10   1/40   1/10
Train samples       7     9      11    12     2,4    8,10   1,3    5,6

Testing             COX              Jsteg
Level, msg. size    1000 cof         1/100
Test samples        13-17            18-22
Experiment                                           False Alarm    Miss    Correct
                                                     Rate           Rate    Detection
1.a. Digimarc                                        2/10           2/10    16/20
1.b. PGS                                             2/10           1/10    17/20
1.c. Cox                                             4/10           2/10    14/20
2. Pooled watermarking                               3/10           3/10    14/20
3. Train on Digimarc, test on PGS and Cox            5/10           2/10    13/20
4.a. Steganos                                        2/5            1/5     7/10
4.b. Stools                                          4/10           1/10    15/20
4.c. Jsteg                                           3/10           3/10    14/20
5. Pooled steganography                              5/10           0/10    15/20
6. Train on Steganos and Stools, test on Jsteg       3/10           3/10    14/20
7. Pooled watermarking and steganography             5/10           1/10    14/20
8. Train on Digimarc, PGS, Steganos, Stools;         4/10           3/10    13/20
   test on Cox and Jsteg
We have shown that the image quality metric based distance between an unmarked image and its filtered version differs from the distance between a marked image and its filtered version. We used image quality metrics as the feature set to distinguish between marked and non-marked images. To identify the specific quality measures that provide the best discriminative power, we used the ANOVA technique. We have also pointed out the image features that should be taken more seriously into account in the design of more successful watermarking or steganographic techniques to eschew detection.
4.1. Introduction
Near-lossless compression methods give quantitative guarantees on the nature of the loss that is introduced. Typically, most of the near-lossless compression techniques proposed in the literature provide a guarantee that no pixel difference between the original and the compressed image is above a given value [81]. Near-lossless compression is potentially useful in remote sensing, medical imaging, space imaging and image archiving applications, where the huge data size could require lossy compression for efficient storage or transmission. However, the need to preserve the validity of subsequent image analysis performed on the data set to derive information of scientific or clinical value puts strict constraints on the error between the compressed image pixel values and their originals. In such cases, near-lossless compression can be used, as it yields significantly higher compression ratios than lossless compression while the quantitative guarantees it provides on the nature of the loss introduced by the compression process are more desirable than the uncertainties faced when using lossy compression.
Another way to deal with the lossy-lossless dilemma faced in applications such as medical imaging and remote sensing is to use a successively refinable compression technique that provides a bitstream that leads to a progressive reconstruction of the image. The increasingly popular wavelet-based image compression techniques, for example, provide an embedded bitstream from which various levels of rate and distortion can be obtained. With reversible integer wavelets, one gets a progressive transmission capability all the way to lossless reconstruction. Hence such techniques have been widely cited for potential use in applications like tele-radiology, where a physician can request portions of an image at increased quality (including lossless reconstruction) while accepting unimportant portions at much lower quality, thereby reducing the overall bandwidth required for transmitting an image [82, 83]. Indeed, the new still image compression standard, JPEG 2000, provides such features in its extended forms [84].
We propose a technique that unifies the above two approaches. The proposed technique produces a bitstream that results in a progressive reconstruction of the image, just like what one can obtain with a reversible wavelet codec. In addition, the proposed scheme provides near-lossless reconstruction with respect to a given bound after each layer of the successively refinable bitstream is decoded (note, however, that these bounds need to be pre-decided at compression time and cannot be changed during decompression). Furthermore, the compression performance provided by the proposed technique is superior or comparable to the best-known lossless and near-lossless techniques proposed in the literature [86, 87, 88]. We review successive refinement, density estimation and the data model in Section 4.2. The compression method is described in Section 4.3. In Section 4.4. we give experimental results, and conclusions are drawn in Section 4.5.
The key problem in lossless compression involves estimating the p.m.f. (probability mass function) of the current pixel based on previously known pixels (or previously received information). With this in mind, the problem of successive refinement can be viewed as the process of obtaining improved estimates of the p.m.f.s with each pass of the image. If we also restrict the "support" of the p.m.f. to a given length, we integrate near-lossless compression and successive refinement with lossless compression in one single framework. The proposed scheme guarantees near-lossless reconstruction after each pass, in the sense that each pixel is within plus or minus d of its original value. The length of the interval, L = 2d + 1, in which the pixel is known to lie decreases with successive passes, and in the final pass we have lossless reconstruction, L = 1 and d = 0.
In order to design a compression technique with these properties, we regard image data compression as asking the optimal questions to determine the exact value or the interval of the pixel, depending on whether we are interested in lossless or near-lossless compression, respectively. Our aim is to find the minimum description length of every pixel based on the knowledge we have about its neighbors. We know from the Kraft inequality that a code length is just another way to express a probability distribution. Massey [89] observed that the average number of guesses needed to determine the value of a random variable is minimized by a strategy that guesses the possible values of the random variable in decreasing order of probability. Our strategy is to estimate the probability density of the current pixel using previous information and, based on this density, to determine the interval of the pixel by questioning the most probable interval where the pixel may lie.
In the first pass, we assume that the data, at a coarse level, is stationary and Gaussian in a small neighborhood, and we hence use linear prediction. We fit a Gaussian density for the current pixel, with the linear prediction value taken as the optimal estimate of its mean, and the linear prediction error as its variance. We divide the support of the current pixel's p.m.f., [0, 255], into intervals of equal length 2d + 1, d being an integer. The intervals are sorted with respect to their probability mass. If the pixel is found to lie in the interval with the highest probability mass, the probability mass outside the interval is zeroed out and the event 1 is fed to the entropy coder; otherwise, the next question is asked, namely whether it lies in the interval with the next highest probability. Every time one receives a negative answer, the probability mass within the given interval is zeroed out and the event 0 is fed to the entropy coder, until the right interval is found. At the end of the first pass we have a "crude" approximation of the image, with the maximum error in the reconstructed image bounded by d.
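A sketch of this per-pixel guessing loop, using SciPy's normal c.d.f. for the interval masses (the symbols follow the text; a Python list of events stands in for the entropy coder), is:

    import numpy as np
    from scipy.stats import norm

    def first_pass_events(pixel, mu, sigma, delta=3):
        # partition [0, 255] into intervals of length 2*delta + 1
        width = 2 * delta + 1
        edges = np.arange(0, 256 + width, width)
        # probability mass of each interval under N(mu, sigma^2)
        mass = norm.cdf(edges[1:], mu, sigma) - norm.cdf(edges[:-1], mu, sigma)
        events = []
        for k in np.argsort(-mass):                  # most probable interval first
            if edges[k] <= pixel < edges[k + 1]:
                events.append(1)                     # success: correct interval
                return events, (edges[k], edges[k + 1])
            events.append(0)                         # failure: zero out, ask next
        raise ValueError("pixel outside [0, 255]")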
In the remaining passes we refine the p.m.f. of each pixel by narrowing the size of the interval in which it is now known to lie. The key problem here is how to refine the p.m.f. of each pixel based on the p.m.f.s of its neighboring pixels. Note that the causal pixels already have a refined p.m.f., but the non-causal pixels do not. The non-causal pixels now give us vital information (like the presence of edges and texture patterns) which can be used to get better estimates of the refined p.m.f. However, care must be taken, as the available information is less redundant than in the first pass with respect to the current p.m.f. estimation. That is, we know in which interval the current pixel lies, and we have more precise information about the causal pixels and less precise information about the non-causal pixels. We need to estimate/refine the current p.m.f. within the constraint of its support, and the refinement should take all of this into account. The p.m.f. estimation method for the second and remaining passes, outlined in the next section, which is simply a causal and non-causal p.m.f. summation over the current p.m.f.'s support, successfully takes all this information into account. Once the p.m.f. is estimated/refined for the current pixel, the same strategy, guessing the correct interval or value in decreasing order of probability, is applied to constrain the pixel to narrower intervals or to its exact value.
In the following sub-sections we review some key concepts and results from known
literature and show how we propose to use these in order to develop the proposed
technique for successively refinable lossless and near-lossless image compression.
4.2.1. Successive Refinement
Successive refinement from coarse to fine descriptions is possible if and only if the individual solutions of the rate distortion problems can be written as a Markov chain. There are examples of signals along with distortion measures for which successive refinement is possible; e.g., if the source is Gaussian and MSE (Mean Square Error) is used as the distortion measure, the source is successively refinable. Massey [89] considered the problem of guessing the value of a realization of a random variable X by asking questions of the form "Is X equal to x?" until the answer is "Yes". It is observed that the average number of guesses is minimized by a guessing strategy that guesses the possible values of X in decreasing order of probability. In near-lossless compression we are interested in the interval where the pixel lies rather than in its exact value, so the optimal strategy for minimizing the average number of guesses is to guess the intervals in decreasing order of the probability masses contained in them. In either case, we first need to construct probability mass estimates in order to use this strategy. In what follows, we describe the probability mass estimation for the different passes.
4.2.2. P.m.f. Estimation in the First Pass
The Gaussian Model: Natural images in general do not satisfy Gaussianity or stationarity assumptions. But at a coarse level, in a reasonably sized neighborhood, the statistics can be assumed not to deviate from these assumptions, and the results of the Gauss-Markov property can be used. We use linear prediction in the first pass, assuming the data in a small neighborhood to be stationary and Gaussian. We fit a Gaussian density for the current pixel, with the linear prediction value taken as the optimal estimate of its mean.
We use the causal pixels to predict the current pixel via the normal linear regression model. Suppose X_(i_1), X_(i_2), ..., X_(i_N) are random variables representing the causal neighbors of the current pixel X_i, shown in Figure 4.1. Let x_(i_1)k, x_(i_2)k, ..., x_(i_N)k, k = 1, ..., K, denote their realizations. We assume a discrete-time scalar-valued random process {X_i} that satisfies the Nth-order linear prediction equation

X_i = SUM_{j=1..N} b_j x_(i_j) + e_i    (4.1)
where {e_i} is a sequence of i.i.d. random variables having a Gaussian density with zero mean and variance s^2. The optimal MMSE (minimum mean square error) linear prediction for an Nth-order stationary Gauss-Markov process {X_i} can be formulated as

E[X_i | X_(i_1), X_(i_2), ..., X_(i_N)] = SUM_{j=1..N} b_j x_(i_j).    (4.2)
For this standard linear model, according to the Gauss-Markov theorem, the minimum variance linear unbiased estimator b = [b_1 ... b_N]^T is the least squares solution of (4.2) and is given by [91, 54]

b = (X^T X)^(-1) (X^T y)    (4.3)

where y = [X_(i_1), X_(i_2), ..., X_(i_K)]^T denotes the K context pixels given in Figure 4.2., while the rows of the K x N data matrix

X = [ x_(i_1)1 ... x_(i_1)N ; ... ; x_(i_K)1 ... x_(i_K)N ]

consist of the N prediction neighbors of each element of y. The expected value of X_i is given by (4.2), and an unbiased estimator of the prediction error variance s^2 can be obtained [54] as

s^2 = (y^T y - b^T X^T y) / (K - N - 1).    (4.4)
Based on the principle that the mean-square prediction for a normal random variable is its mean value, the density of X_i conditioned on its causal neighbors is given by

f(x_i | x_(i_1), x_(i_2), ..., x_(i_N)) = (1 / sqrt(2 pi s^2)) exp{ -(1 / 2 s^2) (x_i - SUM_{j=1..N} b_j x_(i_j))^2 }.    (4.5)

Figure 4.1. Ordering of the causal prediction neighbors of the current pixel i, N = 6.
Figure 4.2. The context pixels used in the covariance estimation of the current pixel. The number of context pixels is K = 40.
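A compact sketch of the estimators (4.3) and (4.4), assuming the K-by-N matrix of prediction neighbors and the K context values have already been gathered as in Figures 4.1 and 4.2, is:

    import numpy as np

    def fit_prediction(X, y):
        # X: K-by-N prediction-neighbor matrix, y: the K context pixel values
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # least squares (4.3)
        K, N = X.shape
        resid = y - X @ beta
        var = float(resid @ resid) / (K - N - 1)       # unbiased variance (4.4)
        return beta, var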
4.2.3. P.m.f. Estimations in the Second Pass
L2 Norm Minimizing Probability Estimation: At the finer quantal resolutions reached after the first pass, we have to leave aside the Gaussian assumption, since the image data at finer resolutions behaves more randomly, lacking correlation. We thus assume the data to be independent and update the estimate of the current pixel density using the neighboring densities, that is, by minimizing the L2 norm between the current density and the causal and non-causal densities. This stems from the fact that in the first pass the interval is most of the time guessed correctly in one question, so Gaussian distributions fit the pixels well at low resolution (large d). In later passes the data become more independent of each other, as more of the redundancy is removed after each pass, resulting in decreased correlation. At this stage we can use the non-causal densities as well, which are the densities of the non-causal neighborhood of the pixel from the previous pass. Several probability mass update methods are presented for the second and higher passes.
The neighbors used for the probability mass estimation in the second and higher passes are given in Figure 4.3. Note that we now have the chance to use the non-causal pixels for prediction.
Figure 4.3. Causal and non-causal neighbors of the current pixel, used for the probability mass estimation in the second and higher passes.
Let p_i(x) denote the probability mass function to be estimated, given the causal and non-causal distributions {p_(i_j)}, j = 1, ..., N. We minimize

J(p_i) = SUM_{1<=j<=N} INT (p_i(x) - p_(i_j)(x))^2 dx + lambda INT p_i(x) dx    (4.6)

subject to the constraint that p_i(x) be a probability mass function, lambda being the Lagrange multiplier. Taking the variational derivative with respect to p_i(x), one finds the distribution to be of the form

p_i*(x) = (1/N) SUM_{j=1..N} p_(i_j)(x).    (4.7)
This method has some desirable properties. If the neighboring interval-censored p.m.f.s do not overlap with the current one, then they have no negative effect on the estimate. If there is some overlap, then an evidence gathering from the causal and non-causal neighbors occurs for the indices of the current interval, as they give rise to higher accumulated probabilities for some indices in the interval. Notice that this method of summing neighboring densities automatically gives more importance to the more precise information residing in the causal neighbor p.m.f.s, concentrated in narrower intervals, than to the less precise information in the non-causal ones.
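A sketch of the update (4.7) is given below; re-normalizing the averaged mass over the known support is an added assumption of the sketch:

    import numpy as np

    def refine_pmf(neighbor_pmfs, support):
        # neighbor_pmfs: list of interval-censored p.m.f. arrays (length 256)
        # support: indices of the interval the current pixel is known to lie in
        p = np.mean(neighbor_pmfs, axis=0)            # (1/N) sum, as in (4.7)
        mask = np.zeros_like(p)
        mask[support] = 1.0
        p = p * mask                                  # constrain to the support
        return p / p.sum() if p.sum() > 0 else mask / mask.sum()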
An alternative probability mass update, combining the current p.m.f. with the evidence of each neighbor, is

p[s] = (1/(N-1)) SUM_{1<=i<=N} (p_i[s] p[s]) / (SUM_{1<=l<=255} p_i[l] p[l]).    (4.8)
Recall that if a code is designed for the distribution q while the random variable is actually distributed according to p, one needs H(p) + D(p||q) bits on the average to describe the random variable. The squared Hellinger norm between distributions with densities p and q is defined as

H^2(p, q) = INT (sqrt(p(x)) - sqrt(q(x)))^2 dx.    (4.9)
Many, if not all, smooth function classes satisfy the equivalence between the Kullback-Leibler divergence D and the squared Hellinger distance H^2, which appears in connection with the minimum description length principle, stochastic complexity, etc. Taking advantage of the equivalence between D and H^2, we can use one for the other in the derivation of the optimal p_i*(x).
We choose p_i(x) to minimize the total extra bits, so as to obtain the shortest description length on the average:

J(p_i) = SUM_{1<=j<=N} INT (sqrt(p_i(x)) - sqrt(p_(i_j)(x)))^2 dx + lambda INT p_i(x) dx    (4.10)

where lambda is the Lagrange multiplier. Again taking the variational derivative with respect to p_i(x) and setting it equal to zero, we get

p_i*(x) = T ( SUM_{1<=j<=N} sqrt(p_(i_j)(x)) )^2    (4.11)

where T is a normalizing constant.
We can treat the problem of estimating the interval in which the current pixel value lies within the framework of multi-hypothesis testing [94]. Let H_1, ..., H_M denote the M hypotheses, where every hypothesis m is associated with an interval (L_m, R_m] of length 2d + 1. The random variable X has a probability mass under each hypothesis H = m; denote this probability mass by

p_(X|H)(x | m) = SUM_{i in (L_m, R_m]} p(i).    (4.12)

When each hypothesis has an a priori probability, p_m = Pr{H = m}, the joint probability mass of H = m and X = x is p_(X|H)(x | m) p_m. The a posteriori probability of H = m conditional on X = x is

p_(H|X)(m | x) = p_(X|H)(x | m) p_m / SUM_{m=1..M} p_(X|H)(x | m) p_m.    (4.13)

The rule that maximizes the probability of being correct, so as to minimize the number of guesses in finding the correct interval, is to choose the m for which p_(H|X)(m | x) is largest,

H = arg max_m p_(H|X)(m | x),    (4.14)

known as the maximum a posteriori (MAP) rule. For equal prior probabilities this becomes the maximum likelihood (ML) rule, where we simply choose the hypothesis with the largest likelihood,

H = arg max_m p_(X|H)(x | m).    (4.15)
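A sketch of this interval-wise selection over a discrete p.m.f. follows; with priors=None it reduces to the ML rule (4.15) used in the first pass:

    import numpy as np

    def map_interval(pmf, delta, priors=None):
        # group the p.m.f. into intervals of length 2*delta + 1 and pick the
        # hypothesis maximizing (4.14), or (4.15) under uniform priors
        width = 2 * delta + 1
        n = int(np.ceil(len(pmf) / width))
        mass = np.array([pmf[m * width:(m + 1) * width].sum() for m in range(n)])
        post = mass if priors is None else mass * np.asarray(priors)
        m = int(np.argmax(post))
        return m, (m * width, min((m + 1) * width, len(pmf)))   # (L_m, R_m]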
We use the ML rule in the first pass, while the MAP rule is used in the following passes, since we then have the a priori probability masses from the previous passes. Defining the indicator function as

x_m = I{ x in (L_m, R_m] }    (4.16)

where, after the hypothesis test, (L_m, R_m] is the interval with the highest probability mass, the entropy coder is fed with a one or a zero according as the indicator function is one or zero.
[Figure 4.4: flowchart of the encoder. Depending on the successive-mode and lossless-mode decisions, each branch performs a p.m.f. estimate (Gaussian in the first pass) and a hypothesis test (ML or MAP) and feeds the entropy coder to produce an embedded bitstream.]

Figure 4.5. Details of the encoder block used in Figure 4.4. Here L is the length of the interval (L_m, R_m].
The probability mass within an interval is zeroed out every time we fail to make the correct guess, and the arithmetic encoder is fed with a zero until the correct interval is found. The whole probability mass lies within a fixed-length interval by the time we proceed to the next pixel.
In successive mode, for both lossless and near-lossless compression, the interval of the current pixel in the second or following passes can be narrowed in two ways: one is to split it into two intervals and perform a binary hypothesis test; the other is to split the current interval into more than two equal-length intervals and perform a multiple hypothesis test.
[Figure 4.6: flowchart of the decoder. The entropy decoder emits events z in {0, 1}; a p.m.f. estimate and hypothesis test propose candidate intervals, the probability mass of each rejected interval (z = 0) is zeroed out, and when z = 1 the reconstruction is taken as the midvalue of the accepted interval.]
The decoder, given in Figure 4.6., is similar to the encoder. Based on the estimated p.m.f. and the decoded sequence of successes and failures, the p.m.f. support for the current pixel is found. Within the support of the estimated p.m.f., the intervals are sorted with respect to their probability mass, and the correct interval is found using the decoded failure and success events. Recall that the decoded events are the outcomes of the tests of whether the current pixel lies in a given interval or not. Whenever the decoded event z is not a success for the given interval, that interval is discarded, until the correct interval is tested with success. The reconstructed value of the current pixel is taken as the midvalue of the successfully tested interval, which guarantees that the error in the reconstructed value is no more than half of the interval length minus one, that is, |e| <= (L - 1)/2 = d.
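A one-line sketch of this midvalue reconstruction: for an interval (L, R] of length 2*delta + 1, the worst-case error is delta.

    def reconstruct(L, R):
        # midvalue of (L, R]; the error is at most (R - L - 1) // 2, i.e. delta
        return (L + 1 + R) // 2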
4.4. Experimental Results
Successive and direct mode lossless compression results, obtained with three passes, are given in Table 4.1. In the successive mode, the interval lengths of the pixel values are taken as 8 in the first pass, 4 in the second pass and 1 in the final, third pass. One-pass results are also given in the same table. Near-lossless results obtained with one pass for d = 1, d = 3 and d = 7 are given in Tables 4.2., 4.3. and 4.4., respectively.
Table 4.1. Comparison of lossless compression results (d = 0, bits/pixel): proposed method versus CALIC.

Image        Proposed 3-pass    Proposed 1-pass    CALIC [85]
BARB         4.18               4.21               4.45
LENA         4.07               4.07               4.11
ZELDA        3.81               3.79               3.86
GOLDHILL     4.65               4.69               4.63
BOAT         4.06               4.08               4.15
MANDRILL     5.89               5.96               5.88
PEPPERS      4.41               4.37               4.38
MEAN         4.44               4.45               4.49
Table 4.2. Near-lossless compression results for d = 1 (bpp, PSNR in dB, maximum error e).

Image         SPIHT [52]            CALIC [85]           CALIC [95]           Proposed
              bpp    PSNR    e      bpp    PSNR    e     bpp    PSNR    e     bpp    PSNR    e
Hotel         2.70   76.93   6      2.76   49.95   1     2.70   50.07   1     2.84   49.92   1
Ultrasound    1.60   46.33   8      1.99   52.36   1     1.60   51.76   1     1.70   49.84   1
Cafe          3.22   47.09   7      3.30   49.97   1     3.22   50.09   1     3.44   49.85   1
Barbara       -      -       -      2.94   49.91   1     2.65   49.88   1     -      -       -
Finger        3.82   47.27   5      3.90   49.89   1     3.82   49.98   1     3.80   49.89   1
Table 4.3. Near-lossless compression results for d = 3 (bpp, PSNR in dB, maximum error e).

Image         SPIHT [52]            CALIC [85]           CALIC [95]           Proposed
              bpp    PSNR    e      bpp    PSNR    e     bpp    PSNR    e     bpp    PSNR    e
Hotel         1.69   41.77   11     1.74   42.31   3     1.67   42.37   3     1.75   42.30   3
Ultrasound    1.09   41.42   16     1.51   45.04   3     1.09   44.86   3     1.30   42.22   3
Cafe          2.19   41.42   16     2.27   42.41   3     2.19   42.49   3     2.31   42.21   3
Barbara       -      -       -      1.92   42.23   3     1.61   42.27   3     -      -       -
Finger        2.67   40.71   15     2.73   42.11   3     2.67   42.18   3     2.63   42.10   3
Table 4.4. Near-lossless compression results for d = 7 (bpp, PSNR in dB, maximum error e).

Image         SPIHT [52]            CALIC [85]           CALIC [95]           Proposed
              bpp    PSNR    e      bpp    PSNR    e     bpp    PSNR    e     bpp    PSNR    e
Hotel         0.95   37.76   20     0.97   36.50   7     0.95   36.75   7     0.96   36.42   7
Ultrasound    0.72   37.17   25     0.99   37.19   7     0.72   38.55   7     0.78   36.48   7
Cafe          1.43   36.49   30     1.50   36.31   7     1.43   36.45   7     1.48   35.80   7
Barbara       -      -       -      1.19   36.21   7     0.91   36.27   7     -      -       -
Finger        1.77   35.90   21     1.80   35.43   7     1.77   35.59   7     1.73   35.43   7
Table 4.5. Comparison of bit/pixel efficiency and peak signal to noise ratio (dB) of the proposed algorithm versus the CALIC [85] algorithm.

                  d=1      d=2     d=3     d=4     d=5     d=6     d=7
bit/pixel gain    0.05     0.07    0.08    0.09    0.10    0.11    0.10
PSNR gain         -0.01    0.01    0.03    0.05    0.07    0.07    0.08
Notice that the lossless compression results in bits per pixel obtained with one pass and with three passes are almost the same, contrary to one's expectation that more than one pass would yield better performance, since non-causal information becomes available in the later passes. This is because the non-causal information is less precise than the causal information.
4.5. Conclusion
The originality of the method consists in viewing image data compression as the problem of asking the optimal questions to determine the interval in which the current pixel lies. With successive passes over the image, the length of this interval is gradually narrowed until it becomes of length one, in which case we have lossless reconstruction. Stopping the process at any intermediate stage gives near-lossless compression. Although our experimental results show that the proposed method brings in only modest gains in dB measure or bit per pixel efficiency, we believe that there is room for improvement. Our future work will explore different avenues for improving upon the results given in this thesis. For example, we have no mechanism for learning global or repeated patterns in the image. Context-based techniques like CALIC keep a history of past events in a suitably quantized form and use it to better model subsequent events. We believe that such mechanisms, when incorporated within our framework, will give additional improvements.

The proposed technique provides a flexible framework, and many variations of the basic method are possible. For example, the quality of reconstruction, as defined by the near-lossless parameter d, can be made to vary from region to region or even from pixel to pixel, based on image content or other requirements. Thus different regions in the image can be refined to different desired precisions using HVS properties. To this effect, flat regions, where compression artifact visibility is higher, can be refined more accurately, thereby achieving perceptually lossless compression. In addition, it would be interesting to extend the technique to multispectral images.
The statistical evaluation of the image quality measures using ANOVA analyses has revealed that the local phase-magnitude measures (S2 and S5), the HVS-filtered L1 and L2 norms, and the edge stability measure are the most sensitive to coding and blur artifacts, while the mean square error (D1) remains the measure of choice for additive noise. One can state that the combined spectral phase-magnitude measures and the HVS-filtered error norms should be paid more attention in the design of coding algorithms and in sensor evaluation. On the other hand, the pixel-difference based measures remain the measures most responsive to distortions while being least affected by image variety.
The Kohonen map of the measures has been useful in depicting similar ones and in identifying the ones that are possibly sensitive to different distortion artifacts in compressed images. The correlation between the various measures has been depicted via Kohonen's Self-Organizing Map, and the placement of the measures in the two-dimensional map has been in agreement with one's intuitive grouping.
Future work will address subjective experiments and the prediction of subjective image quality using the salient measures identified above. Another possible avenue is to combine various fundamental metrics for better performance prediction.
We have pointed out the image features that should be taken more seriously into account in the design of more successful watermarking or steganographic techniques to eschew detection.

Although our experimental results show that the proposed compression method brings in only modest gains in dB measure or bit per pixel efficiency, we believe that there is room for improvement.
Our future work will explore different avenues for improving upon the results given in this thesis. For example, we have no mechanism for learning global or repeated patterns in the image. Context-based techniques like CALIC keep a history of past events in a suitably quantized form and use it to better model subsequent events. We believe that such mechanisms, when incorporated within our framework, will give additional improvements.

The proposed technique provides a flexible framework, and many variations of the basic method are possible. For example, the quality of reconstruction, as defined by the near-lossless parameter d, can be made to vary from region to region or even from pixel to pixel, based on image content or other requirements. Thus different regions in the image can be refined to different desired precisions using HVS properties. To this effect, flat regions, where compression artifact visibility is higher, can be refined more accurately, thereby achieving perceptually lossless compression. It would be interesting to extend the technique to multispectral images.
Finally, a multidimensional p.m.f. estimation and optimal questioning approach may kill several birds with one stone.
REFERENCES
1.
Perlmutter, S.M., P.C. Cosman, R.M. Gray, R.A. Olshen, D. Ikeda, C.N. Adams, B.J.
Betts, M.B. Williams, K.O. Perlmutter, J. Li, A. Aiyer, L. Fajardo, R. Birdwell and
B.L. Daniel, Image Quality in Lossy Compressed Digital Mammograms, Signal
Processing, Vol. 59, pp. 189-210, 1997.
2.
Lambrecht, C. B., Ed., Special Issue on Image and Video Quality Metrics, Signal
Processing, Vol. 70, 1998.
3.
4.
5.
6.
Ridder, H., Minkowsky Metrics as a Combination Rule for Digital Image Coding Impairments, in Proceedings SPIE 1666: Human Vision, Visual Processing, and Digital Display III, pp. 17-27, 1992.
7.
Watson, A. B. (Ed.), Digital Images and Human Vision, Cambridge, MA, MIT Press,
1993.
8.
Girod, B., What's Wrong with Mean-squared Error?, in A. B. Watson (Ed.), Digital Images and Human Vision, Chapter 15, Cambridge, MA, MIT Press, 1993.
9.
Miyahara, M., K. Kotani and V. R. Algazi, Objective Picture Quality Scale (PQS) for Image Coding, IEEE Transactions on Communications, Vol. 46, No. 9, pp. 1213-1226, 1998.
10. Nill, N. B. and B. H. Bouzas, Objective Image Quality Measure Derived From
Digital Image Power Spectra, Optical Engineering, Vol. 31, No. 4, pp. 813-825,
1992.
11. Franti, P., Blockwise Distortion Measure for Statistical and Structural Errors in Digital Images, Signal Processing: Image Communication, Vol. 13, pp. 89-98, 1998.
12. Winkler, S., A Perceptual Distortion Metric for Digital Color Images, in Proceedings of the Fifth International Conference on Image Processing, Vol. 3, pp. 399-403, Chicago, Illinois, October 4-7, 1998.
13. Daly, S., The visible differences predictor: An algorithm for the assessment of image
fidelity, in A. B.Watson (Ed.), Digital Images and Human Vision, Cambridge, MA,
MIT Press, pp. 179-205, 1993.
14. Frese, T., C. A. Bouman and J. P. Allebach, Methodology for Designing Image
Similarity Metrics Based on Human Visual System Models, Proceedings of
SPIE/IS&T Conference on Human Vision and Electronic Imaging II, Vol. 3016, pp.
472-483, San Jose, CA, 1997.
15. CCIR, Rec. 500-2, Method for the Subjective Assessment of the Quality of Television
Pictures, 1986.
16. Van Dijk, M. and J. B. Martens, Subjective Quality Assessment of Compressed
Images, Signal Processing, Vol. 58, pp. 235-252, 1997.
17. Rohaly, A.M., P. Corriveau, J. Libert, A. Webster, V. Baroncini, J. Beerends, J.L Blin,
L. Contin, T. Hamada, D. Harrison, A. Hekstra, J. Lubin, Y. Nishida, R. Nishihara, J.
Pearson, A. F. Pessoa, N. Pickford, A. Schertz, M. Visca, A. B. Watson and S.
Winkler: "Video Quality Experts Group: Current results and future directions."
Proceedings SPIE Visual Communications and Image Processing, Vol. 4067, Perth,
Australia, June 21-23, 2000.
18. Corriveau, P. and A. Webster, "VQEG Evaluation of Objective Methods of Video
Quality Assessment", SMPTE Journal, Vol. 108, pp. 645-648, 1999.
19. Kanugo, T. and R. M. Haralick, A Methodology for Quantitative Performance
Evaluation of Detection Algorithms, IEEE Transactions on Image Processing, Vol.
4, No. 12, pp. 1667-1673, 1995.
20. Matrik, R., M. Petrou and J. Kittler, Error-Sensitivity Assessment of Vision
Algorithms, IEE Proceedings-Vis. Image Signal Processing, Vol. 145, No. 2, pp.
124-130, 1998.
21. Grim, M. and H. Szu, Video Compression Quality Metrics Correlation with Aided
Target Recognition (ATR) Applications, Journal of Electronic Imaging, Vol. 7, No.
4, pp. 740-745, 1998.
22. Barrett, H. H., Objective Assessment of Image Quality: Effects of Quantum Noise
and Object Variability, Journal of Optical Society of America, Vol. A, No. 7, pp.
1261-1278, 1990.
23. Barrett, H. H., J. L. Denny, R. F. Wagner and K. J. Myers, Objective Assessment of Image Quality II: Fisher Information, Fourier Crosstalk, and Figures of Merit for Task Performance, Journal of the Optical Society of America A, Vol. 12, pp. 834-852, 1995.
24. Halford, C.E., K.A. Krapels, R.G. Driggers and E.E. Burroughs, Developing
Operational Performance Metrics Using Image Comparison Metrics and the Concept
of Degradation Space, Optical Engineering, Vol. 38, No. 5, pp. 836-844, 1999.
Watson (Ed.), Digital Images and Human Vision, Cambridge, MA, MIT Press, pp.
109-138, 1993.
35. Rajan, P. K. and J. M. Davidson, Evaluation of Corner Detection Algorithms,
Proceedings of Twenty-First Southeastern Symposium on System Theory, pp. 29-33,
1989.
36. Canny, J., A Computational Approach to Edge Detection, IEEE Transactions on
Pattern Analysis and Machine Intelligence, Vol. 8, No. 6, pp. 679-698, 1986.
37. Carevic, D. and T. Caelli, Region Based Coding of Color Images Using KLT,
Graphical Models and Image Processing, Vol. 59, No. 1, pp. 27-38, 1997.
38. Tao, H. and T. Huang, Color Image Edge Detection Using Cluster Analysis, IEEE International Conference on Image Processing, pp. 834-836, California, 1997.
39. Trahanias, P. E. and A. N. Venetsanopoulos, Vector Order Statistics Operators as
Color Edge Detectors, IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 26, No. 1, pp. 135-143, 1996.
40. Lipschutz, M. M., Theory and Problems of Differential Geometry, McGraw-Hill Inc.,
1969.
41. McIvor, A. M. and R. J. Valkenburg, A Comparison of Local Surface Geometry Estimation Methods, Machine Vision and Applications, Vol. 10, pp. 17-26, 1997.
42. Avcıbaş, İ., B. Sankur and K. Sayood, New Distortion Measures for Color Image Vector Quantization, ISAS'98: World Multiconference on Systemics, Cybernetics and Informatics II, pp. 496-503, July 12-16, Orlando, USA, 1998.
43. Duda, R. O. and P. E. Hart, Pattern Classification and Scene Analysis, Wiley, New York, 1973.
44. Popat, K. and R. Picard, Cluster-Based Probability Model and Its Application to Image and Texture Processing, IEEE Transactions on Image Processing, Vol. 6, No. 2, pp. 268-284, 1997.
45. Basseville, M., Distance Measures for Signal Processing and Pattern Recognition,
Signal Processing, Vol. 18, pp. 349-369, 1989.
46. Nill, N. B., A Visual Model Weighted Cosine Transform for Image Compression and
Quality Assessment, IEEE Transactions on Communications, Vol. 33, No. 6, pp.
551-557, 1985.
47. Avcıbaş, İ. and B. Sankur, Statistical Analysis of Image Quality Measures, in Proceedings EUSIPCO'2000: 10th European Signal Processing Conference, pp. 2181-2184, 5-8 September, Tampere, Finland, 2000.
48. Avcıbaş, İ. and B. Sankur, Çokbantlı İmgelerde Nitelik Ölçülerinin İstatistiksel Davranışı (Statistical Behavior of Quality Measures in Multiband Images), SIU'99: IEEE 1999 Sinyal İşleme ve Uygulamaları Kurultayı, Bilkent, Ankara, 1999.
49. Avcıbaş, İ. and B. Sankur, Kodlama Algoritmalarının Bilgi İçeriğini Koruma Başarımları (Information-Content Preservation Performance of Coding Algorithms), SIU'98: 6. Sinyal İşleme ve Uygulamaları Kurultayı, pp. 248-253, Kızılcahamam, Ankara, 1998.
50. Frese, T., C. A. Bouman and J. P. Allebach, A Methodology for Designing Image
Similarity Metrics Based on Human Visual System Models, Tech. Rep. TR-ECE 97-2,
Purdue University, West Lafayette, IN, 1997.
51. Wallace, G. K., The JPEG Still Picture Compression Standard, IEEE Transactions
on Consumer Electronics, Vol. 38, No. 1, pp. 18-34, 1992.
52. Said, A. and W. A. Pearlman, A New Fast and Efficient Image Codec Based on Set Partitioning in Hierarchical Trees, IEEE Transactions on Circuits and Systems for Video Technology, Vol. 6, No. 3, pp. 243-250, 1996.
53. Martinez, A.M. and R. Benavente, The AR Face Database, CVC Technical Report No.
24, June 1998.
54. Rencher, A. C., Methods of Multivariate Analysis, John Wiley, New York, 1995.
55. Kohonen, T., Self-Organizing Maps, Springer-Verlag, Heidelberg, 1995.
56. Watson, A. B., DCTune: A Technique for Visual Optimization of DCT Quantization
Matrices for Individual Images, Society for Information Display Digest of Technical
Papers, Vol. 24, pp. 946-949, 1993.
57. Simmons, G. J., The Prisoners' Problem and the Subliminal Channel, in Advances in Cryptology: Proceedings of CRYPTO'83, pp. 51-67, Santa Barbara, CA, 1983.
58. Simmons, G. J., The History of Subliminal Channels, IEEE Journal on Selected Areas in Communications, Vol. 16, pp. 452-462, May 1998.
59. Petitcolas, F. A. P., R. J. Anderson and M. G. Kuhn, Information Hiding: A Survey, Proceedings of the IEEE, Vol. 87, pp. 1062-1078, July 1999.
60. Johnson, N. F. and S. Katzenbeisser, A Survey of Steganographic Techniques, in S.
Katzenbeisser and F. Petitcolas (Eds.): Information Hiding, pp. 43-78. Artech House,
Norwood, MA, 2000.
61. Dugelay, J.-L. and S. Roche, A Survey of Current Watermarking Techniques, in S.
Katzenbeisser and F. Petitcolas (Eds.): Information Hiding, pp. 43-78. Artech House,
Norwood, MA, 2000.
62. Langelaar, G. C., I. Setyawan and R. L. Lagendijk, Watermarking Digital Image and Video Data: A State-of-the-Art Overview, IEEE Signal Processing Magazine, pp. 20-46, September 2000.
92. Turnbull, B. W., The Empirical Distribution Function with Arbitrary Grouped, Censored and Truncated Data, Journal of the Royal Statistical Society, Series B, Vol. 38, pp. 290-295, 1976.
93. Barron, A., J. Rissanen and B. Yu, The Minimum Description Length Principle in
Coding and Modeling, IEEE Transactions on Information Theory, Vol. 44, No. 6, pp.
2743-2760, October 1998.
94. Van Trees, H. L., Detection, Estimation, and Modulation Theory, Part I, Wiley, New York, 1968.
95. Wu, X. and P. Bao, L∞ Constrained High-Fidelity Image Compression via Adaptive Context Modeling, IEEE Transactions on Image Processing, Vol. 9, No. 4, pp. 536-542, April 2000.