An Information Fidelity Criterion For Image Quality Assessment Using Natural Scene Statistics

Abstract—Measurement of visual quality is of fundamental importance to numerous image and video processing applications. The goal of quality assessment (QA) research is to design algorithms that can automatically assess the quality of images or videos in a perceptually consistent manner. Traditionally, image QA algorithms interpret image quality as fidelity or similarity with a "reference" or "perfect" image in some perceptual space. Such "full-reference" QA methods attempt to achieve consistency in quality prediction by modeling salient physiological and psychovisual features of the human visual system (HVS), or by arbitrary signal fidelity criteria. In this paper, we approach the problem of image QA by proposing a novel information fidelity criterion that is based on natural scene statistics. QA systems are invariably involved with judging the visual quality of images and videos that are meant for "human consumption." Researchers have developed sophisticated models to capture the statistics of natural signals, that is, pictures and videos of the visual environment. Using these statistical models in an information-theoretic setting, we derive a novel QA algorithm that provides clear advantages over the traditional approaches. In particular, it is parameterless and outperforms current methods in our testing. We validate the performance of our algorithm with an extensive subjective study involving 779 images. We also show that, although our approach distinctly departs from traditional HVS-based methods, it is functionally similar to them under certain conditions, yet it outperforms them due to improved modeling. The code and the data from the subjective study are available at [1].

Index Terms—Image information, image quality assessment (QA), information fidelity, natural scene statistics (NSS).

I. INTRODUCTION

QA systems are invariably involved with judging the visual quality of images or videos that come from a system running under a given configuration. The obvious way of measuring quality is to solicit the opinion of human observers. However, such subjective evaluations are not only cumbersome and expensive, but they also cannot be incorporated into automatic systems that adjust themselves in real time based on the feedback of output quality. The goal of quality assessment (QA) research is, therefore, to design algorithms for objective evaluation of quality in a way that is consistent with subjective human evaluation. Such QA methods would prove invaluable for testing, optimizing, benchmarking, and monitoring applications.

Traditionally, researchers have focused on measuring signal fidelity as a means of assessing visual quality. Signal fidelity is measured with respect to a reference signal that is assumed to have "perfect" quality. During the design or evaluation of a system, the reference signal is typically processed to yield a distorted (or test) image, which can then be compared against the reference using so-called full-reference (FR) QA methods. Typically, this comparison involves measuring the "distance" between the two signals in a perceptually meaningful way. This paper presents an FR QA method for images.

A simple and widely used fidelity measure is the peak signal-to-noise ratio (PSNR), or the corresponding distortion metric, the mean-squared error (MSE). The MSE is the normalized squared ℓ₂ norm of the arithmetic difference between the reference and the test signals. It is an attractive measure for the (loss of) image quality due to its simplicity and mathematical convenience.
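As a concrete illustration of these two measures, the following minimal sketch computes the MSE and the PSNR of a test image against a reference. The function names are ours, and an 8-bit peak value of 255 is assumed:

```python
import numpy as np

def mse(reference: np.ndarray, test: np.ndarray) -> float:
    """Mean-squared error: the average squared difference between signals."""
    diff = reference.astype(np.float64) - test.astype(np.float64)
    return float(np.mean(diff ** 2))

def psnr(reference: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB for a signal with the given peak value."""
    err = mse(reference, test)
    if err == 0:
        return float("inf")  # identical signals: no distortion
    return 10.0 * np.log10(peak ** 2 / err)

ref = np.array([[100, 110], [120, 130]], dtype=np.float64)
tst = ref + 10.0               # uniform error of 10 gray levels
print(mse(ref, tst))           # 100.0
print(round(psnr(ref, tst), 2))  # 28.13
```

Its simplicity is evident: the measure depends only on the pixel-wise error, with no perceptual modeling at all, which is precisely the limitation that motivates the methods discussed next.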
Images and videos of the three-dimensional (3-D) visual environment come from a common class: the class of natural scenes. Natural scenes form a tiny subspace in the space of all possible signals, and researchers have developed sophisticated models to characterize these statistics. Most real-world distortion processes disturb these statistics and make the image or video signals unnatural. We propose to use natural scene models in conjunction with distortion models to quantify the statistical information shared between the test and the reference images, and we posit that this shared information is an aspect of fidelity that relates well with visual quality.

The approaches discussed above describe three ways in which one could look at the image QA problem. One viewpoint is structural, from the image-content perspective, in which images are considered to be projections of objects in the 3-D environment that could come from a wide variety of lighting conditions. Such variations constitute nonstructural distortions that should be treated differently from structural ones, e.g., blurring or blocking, that could hamper cognition. The second viewpoint is psychovisual, from the human-visual-receiver perspective, in which researchers simulate the processing of images by the HVS and predict the perceptual significance of errors. The third viewpoint, the one that we take in this paper, is the statistical viewpoint, which considers natural images to be signals with certain statistical properties. These three views are fundamentally connected with each other by the following hypothesis: the physics of image formation in the natural 3-D visual environment leads to certain statistical properties of the visual stimulus, in response to which the visual system has evolved over eons. However, different aspects of each of these views may have different complexities when it comes to analysis and modeling. In this paper, we show that the statistical approach to image QA requires few assumptions, is simple and methodical to derive, and yet is competitive with the other two approaches in that it outperforms them in our testing. Also, we show that the statistical approach to QA is a dual of the psychovisual approach to the same problem; we demonstrate this duality toward the end of this paper.

Section II presents some background work in the field of FR QA algorithms as well as an introduction to NSS models. Section III presents our development of the information fidelity criterion (IFC). Implementation and subjective validation details are provided in Sections IV and V, while the results are discussed in Section VI. In Section VII, we compare and contrast our method with HVS-based methods, and we conclude the paper in Section VIII.

II. BACKGROUND

FR QA techniques proposed in the literature can be divided into two major groups: those based on the HVS and those based on arbitrary signal fidelity criteria (a detailed review of the research on FR QA methods can be found in [2]–[5]).

A. HVS Error-Based QA Methods

HVS-based QA methods come in different flavors based on tradeoffs between accuracy in modeling the HVS and computational feasibility. A detailed discussion of these methods can be found in [3]–[5]. A number of HVS-based methods have been proposed in the literature. Some representative methods include [6]–[13].

B. Arbitrary Signal Fidelity Criteria

Researchers have also attempted to use arbitrary signal fidelity criteria in the hope that they would correlate well with perceptual quality. In [14] and [15], a number of these are evaluated for the purpose of QA. In [16], a structural similarity metric (SSIM) was proposed to capture the loss of image structure. SSIM was derived by considering hypothetically what constitutes a loss in signal structure. It was claimed that distortions in an image that come from variations in lighting, such as contrast or brightness changes, are nonstructural distortions, and that these should be treated differently from structural ones. It was claimed that one could capture image quality with three aspects of information loss that are complementary to each other: correlation distortion, contrast distortion, and luminance distortion.

C. Limitations

A number of limitations of HVS-based methods are discussed in [16]. In summary, these have to do with the extrapolation of the vision models that have been proposed in the visual psychology literature to image processing problems. In [16], it was claimed that structural QA methods avoid some of the limitations of HVS-based methods, since they are not based on threshold psychophysics or the HVS models derived thereof. However, they have some limitations of their own. Specifically, although the structural paradigm for QA is an ambitious paradigm, there is no widely accepted way of defining structure and structural distortion in a perceptually meaningful manner. In [16], the SSIM was constructed by hypothesizing the functional forms of structural and nonstructural distortions and the interaction between them. In this paper, we take a new approach to the QA problem. As mentioned in the Introduction, the third alternative to QA, apart from HVS-based and structural approaches, is the statistical approach, which we use in an information-theoretic setting. Needless to say, even our approach will make certain assumptions, but once assumptions regarding the source and distortion models and the suitability of mutual information as a valid measure of perceptual information fidelity are made, the components of our algorithm and their interactions follow through without resorting to arbitrary formulations.

Due to the importance of the QA problem to researchers and developers in the image and video processing community, a consortium of experts, the Video Quality Experts Group (VQEG), was formed in 1997 to develop, validate, and recommend objective video QA methods [17]. VQEG Phase I testing reported that all of the proponent methods tested, which included some of the most sophisticated video QA methods of the time, were statistically indistinguishable from PSNR under their testing conditions [18]. Phase II of the testing, which consisted of new proponents under different testing configurations, is also complete, and the final report has recommended an FR QA method, although it has been reported that none of the methods tested was comparable to the "null model," a hypothetical model that predicts quality exactly [19], meaning that QA methods need to be improved further.
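To make the three-component idea reviewed in Section II-B concrete, here is a deliberately simplified, single-window sketch of an SSIM-style score. It is our illustration of the luminance/contrast/structure decomposition, not the windowed SSIM implementation of [16]; the stabilizing constants are the commonly used values for 8-bit data, (0.01·255)² and (0.03·255)²:

```python
import numpy as np

def simplified_ssim(x: np.ndarray, y: np.ndarray,
                    c1: float = 6.5025, c2: float = 58.5225) -> float:
    """Global (single-window) SSIM-style score built from a luminance
    comparison and a combined contrast/structure (correlation) comparison.
    SSIM proper applies these comparisons in local windows and pools them."""
    x = x.astype(np.float64).ravel()
    y = y.astype(np.float64).ravel()
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    luminance = (2 * mx * my + c1) / (mx**2 + my**2 + c1)
    contrast_structure = (2 * cov + c2) / (vx + vy + c2)
    return float(luminance * contrast_structure)

img = np.tile(np.arange(0.0, 256.0, 4.0), (8, 1))       # toy "image"
assert simplified_ssim(img, img) == 1.0                  # identical signals
noisy = img + np.random.default_rng(0).normal(0, 10, img.shape)
print(simplified_ssim(img, noisy))                       # below 1 when distorted
```

The sketch makes the structural claim tangible: a pure brightness or contrast change moves only the first factor, while added noise decorrelates the signals and lowers the second.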
SHEIKH et al.: INFORMATION FIDELITY CRITERION FOR IMAGE QUALITY ASSESSMENT 2119
III. INFORMATION FIDELITY CRITERION FOR IMAGE QUALITY ASSESSMENT

In this paper, we propose to approach the QA problem as an information fidelity problem, where a natural image source communicates with a receiver through a channel. The channel imposes fundamental limits on how much information can flow from the source (the reference image), through the channel (the image distortion process), to the receiver (the human observer). Fig. 1 shows the scenario graphically. A standard way

2120 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 14, NO. 12, DECEMBER 2005

B. Distortion Model

The distortion model that we use in this paper is also described in the wavelet domain. It is a simple signal attenuation and additive Gaussian noise model in each subband

    D = G·C + V    (3)

where C = {C_k : k ∈ I} denotes the RF from a subband in the reference signal, D = {D_k : k ∈ I} denotes the RF from the corresponding subband from the test (distorted) signal, G = {g_k : k ∈ I} is a deterministic scalar attenuation field, and V = {V_k : k ∈ I} is a stationary additive zero-mean Gaussian noise RF with variance σ_V². The RF V is white and is independent of S and U. This model captures two important, and complementary, distortion types: blur and additive noise. We will assume that most distortion types that are prevalent in real-world systems can be roughly described locally by a combination of these two. In our model, the attenuation factors g_k can capture the loss of signal energy in a subband due to the blur distortion, while the process V can capture additive noise separately. Additionally, changes in image contrast that result from variations in ambient lighting are not modeled as noise, since they too can be incorporated into the attenuation field G.

The choice of a proper distortion model is crucial for image fidelity assessments that are expected to reflect perceptual quality. In essence, we want the distortion model to characterize what the HVS perceives as distortion. Based on our experience with different distortion models, we are inclined to hypothesize that the visual system has evolved over time to optimally estimate natural signals embedded in natural distortions: blur, white noise, and brightness and contrast stretches due to changes in ambient lighting. The visual stimulus that is encoded by the human eyes is blurred by the optics of the eye as well as the spatially varying sampling in the retina. It is therefore natural to expect evolution to have worked toward near-optimal processing of blurry signals, say for controlling the focus of the lens, or guiding visual fixations. Similarly, white noise arising due to photon noise or internal neuron noise (especially in low-light conditions) affects all visual signals. Adaptation in the HVS to changes in ambient lighting has been known to exist for a long time [34]. Thus, HVS signal estimators would have evolved in response to natural signals corrupted by natural distortions, and would be near-optimal for them, but suboptimal for other distortion types (such as blocking or colored noise) or signal sources. Hence, "over-modeling" the signal source or the distortion process is likely to fail for QA purposes, since it imposes assumptions of the existence of near-optimal estimators in the HVS for the chosen signal and distortion models, which may not be true. In essence, distortion modeling combined with NSS source modeling is a dual of HVS signal-estimator modeling.

Another hypothesis is that the field G could account for the case when the additive noise is linearly correlated with C. Previously, researchers have noted that as the correlation of the noise with the reference signal increases, MSE becomes poorer at predicting perceptual quality [35]. While the second hypothesis could be a corollary to the first, we feel that both of these hypotheses (and perhaps more) need to be investigated further with psychovisual experiments, so that the exact contribution of a distortion model to the quality prediction problem can be understood properly. For the purpose of image QA presented in this paper, the distortion model of (3) is adequate and works well in our simulations.

C. Information Fidelity Criterion

Given a statistical model for the source and the distortion (channel), the obvious IFC is the mutual information between the source and the distorted images. We first derive the mutual information for one subband and later generalize to multiple subbands.

Let C^N = (C_1, …, C_N) denote N elements from C. In this section, we will assume that the underlying RF U is uncorrelated (and, hence, that C is an RF with conditionally independent elements given S), and that the distortion model parameters G and σ_V² are known a priori. Let D^N = (D_1, …, D_N) denote the corresponding elements from D. The mutual information between these is denoted as I(C^N; D^N).

Due to the nonlinear dependence among the C_k by way of S, it is much easier to analyze the mutual information assuming S is known. This conditioning "tunes" the GSM model for the particular reference image and, thus, models the source more specifically. Thus, the IFC that we propose in this paper is the conditional mutual information I(C^N; D^N | S^N = s^N), where S^N = (S_1, …, S_N) are the corresponding elements of S, and s^N denotes a realization of S^N. In this paper, we will denote I(C^N; D^N | S^N = s^N) as I(C^N; D^N | s^N). With the stated assumptions on C and the distortion model (3), one can show

    I(C^N; D^N | s^N) = Σ_k I(C_k; D_k | s_k)    (4)
                      = Σ_k [ h(D_k | s_k) − h(D_k | C_k, s_k) ]    (5)
                      = Σ_k [ h(D_k | s_k) − h(V_k) ]    (6)

where we get (4) by the chain rule [36], and (5) and (6) by the conditional independence of the C_k given s^N, the independence of the noise V, and the fact that, under the distortion model (3), each D_k depends only on C_k and V_k. Using the fact that the D_k are Gaussian given s_k, and that the V_k are also Gaussian with variance σ_V², we get

    h(D_k | s_k) = (1/2) log( 2πe (g_k² s_k² σ_U² + σ_V²) )    (7)
    h(V_k) = (1/2) log( 2πe σ_V² )    (8)
    I(C^N; D^N | s^N) = (1/2) Σ_k log( (g_k² s_k² σ_U² + σ_V²) / σ_V² )    (9)
                      = (1/2) Σ_k log( 1 + g_k² s_k² σ_U² / σ_V² )    (10)

where h(X) denotes the differential entropy of a continuous random variable X, and h(X) = (1/2) log(2πe σ²) for X distributed as N(0, σ²) [36].

Equation (10) was derived for one subband. It is straightforward to use separate GSM RFs for modeling each subband of interest in the image. We will denote the RF modeling the wavelet coefficients of the reference image in the jth subband as C_j, and in the test (distorted) image as D_j, and assume that the C_j are independent of each other. We will further assume that each subband
is distorted independently. Thus, the RFs D_j are also independent of each other. The IFC is then obtained by summing over all subbands

    IFC = Σ_j I(C_j^{N_j}; D_j^{N_j} | s_j^{N_j})    (11)

where C_j^{N_j} denotes N_j coefficients from the RF C_j of the jth subband, and similarly for D_j^{N_j} and s_j^{N_j}.

Equation (11) is our IFC that quantifies the statistical information that is shared between the source and the distorted images. An attractive feature of our criterion is that, like MSE and some other mathematical fidelity metrics, it does not involve parameters associated with display device physics, data from visual psychology experiments, viewing configuration information, or stabilizing constants, which dictate the accuracy of HVS-based FR QA methods (and some structural ones, too). The IFC does not require training data either. However, some implementation parameters will obviously arise once (11) is implemented. We will discuss implementation in the next section.

The IFC is not a distortion metric, but a fidelity criterion. It theoretically ranges from zero (no fidelity) to infinity (perfect fidelity within a nonzero multiplicative constant in the absence of noise).¹ Perfect fidelity within a multiplicative constant is something that contrasts with the approach in SSIM [16], in which contrast distortion (a multiplicative constant) was one of the three attributes of distortion regarded as a visual degradation, albeit one that has a different (and "orthogonal") contribution toward perceptual fidelity than noise and local-luminance distortions. In this paper, we view multiplicative constants (contrast stretches) as signal gains or attenuations interacting with additive noise. Thus, with this approach, the same noise variance would be perceptually less annoying if it were added to a contrast-stretched image than if it were added to a contrast-attenuated image. Since each subband has its own multiplicative constant, blur distortion can also be captured by this model, as the finer-scale subbands would be attenuated more than the coarser-scale subbands.

¹Differential entropy is invariant to translation, and so the IFC is infinite for perfect fidelity within an additive constant in the absence of noise as well. However, since we are applying the IFC in the wavelet domain on "AC" subbands only, to which the GSM model applies, the zero-mean assumptions on U and V imply that this case will not happen.

IV. IMPLEMENTATION ISSUES

In order to implement the fidelity criterion in (11), a number of assumptions are required about the source and the distortion models. We outline them in this section.

A. Assumptions About the Source Model

Note that mutual information (and, hence, the IFC) can only be calculated between RFs and not their realizations, that is, a particular reference and test image under consideration. We will assume ergodicity of the RFs, and that reasonable estimates for the statistics of the RFs can be obtained from their realizations. We then quantify the mutual information between the RFs having statistics obtained from particular realizations.

For the scalar GSM model, estimates of the s_k² can be obtained by localized sample variance estimation, since for natural images S is known to be a spatially correlated field, and σ_U² can be assumed to be unity without loss of generality.

B. Assumptions About the Distortion Model

The IFC assumes that the distortion model parameters G and σ_V² are known a priori, but these would need to be estimated in practice. We propose to partition the subbands into blocks and assume that the field G is constant over such blocks, as are the noise statistics σ_V². The value of the field G over block l, which we denote as g_l, and the variance of the RF V over block l, which we denote as σ_{V,l}², are fairly easy to estimate (by linear regression), since both the input (the reference signal C) as well as the output (the test signal D) of the system (3) are available

    g_l = Cov(C, D) / Cov(C, C)    (12)
    σ_{V,l}² = Cov(D, D) − g_l · Cov(C, D)    (13)

where the covariances are approximated by sample estimates using sample points from the corresponding blocks in the reference and test signals.

C. Wavelet Bases and Inter-Coefficient Correlations

The derivation leading to (10) assumes that U is uncorrelated and, hence, that C is independent given S. In practice, if the wavelet decomposition is orthogonal, the underlying U could be approximately uncorrelated. In such cases, one could use (10) for computing the IFC. However, real Cartesian-separable orthogonal wavelets are not good for image analysis, since they have poor orientation selectivity and are not shift-invariant. In our implementation, we chose the steerable pyramid decomposition with six orientations [37]. This gives better orientation selectivity than is possible with real Cartesian-separable wavelets. However, the steerable pyramid decomposition is overcomplete, and neighboring coefficients from the same subband are linearly correlated. In order to deal with such correlated coefficients, we propose two simple approximations that work well for QA purposes.

1) Vector GSM: Our first approximation is to partition the subband into nonoverlapping block-neighborhoods and assume that the neighborhoods are uncorrelated with each other. One could then use a vector form of the IFC by modeling each neighborhood as a vector random variable. This "blocking" of coefficients results in an upper bound

    I(C^N; D^N | s^N) ≤ Σ_k I(C̄_k; D̄_k | s_k)

where C̄_k = (C_{k,1}, …, C_{k,M})ᵀ is a vector of M wavelet coefficients that form the kth neighborhood. All such vectors, associated with nonoverlapping neighborhoods, are assumed to be uncorrelated with each other. We now model the wavelet coefficient neighborhood as a vector GSM. Thus, the vector RF C̄ = {C̄_k : k ∈ I} on a lattice I is a product of a scalar RF S = {S_k : k ∈ I} and a zero-mean Gaussian vector RF Ū = {Ū_k : k ∈ I} of covariance C_U. The noise V̄ is also a zero-mean vector Gaussian RF of the same dimensionality as Ū, and has covariance C_V. If we
assume that V̄ is independent of S and Ū, it is quite easy to show (by using differential entropy for Gaussian vectors) that

    h(D̄_k | s_k) = (1/2) log( (2πe)^M |g_k² s_k² C_U + C_V| )    (14)
    h(V̄_k) = (1/2) log( (2πe)^M |C_V| )    (15)

where h(X̄), the differential entropy of a continuous random vector X̄ distributed as a multivariate Gaussian with covariance C_X, is (1/2) log((2πe)^M |C_X|), where |·| denotes the determinant, and M is the dimension of X̄ [36]. Recalling that C_U is symmetric and can be factorized as QΛQᵀ, with Q orthonormal and eigenvalues λ_l, and that for a distortion model where C_V = σ_V²·I, the IFC simplifies as follows:²

    I(C^N; D^N | s^N) ≤ Σ_k I(C̄_k; D̄_k | s_k)    (16)
                      = Σ_k [ h(D̄_k | s_k) − h(V̄_k) ]    (17)
                      = Σ_k (1/2) log( |g_k² s_k² C_U + σ_V²·I| / |σ_V²·I| )    (18)
                      = Σ_k (1/2) log( Π_{l=1}^{M} (g_k² s_k² λ_l + σ_V²) / σ_V^{2M} )    (19)
                      = (1/2) Σ_k Σ_{l=1}^{M} log( 1 + g_k² s_k² λ_l / σ_V² )    (20)

where the numerator term inside the logarithm of (19) is the determinant of a diagonal matrix and, hence, equals the product of the diagonal terms. The bound in (16) shrinks as M increases. In our simulations, we use vectors from 3×3 spatial neighborhoods and achieve good performance. Equation (20) is the form that is used for implementation.

For the vector GSM model, the maximum-likelihood estimate of s_k² can be found as follows [38]:

    ŝ_k² = C̄_kᵀ C_U⁻¹ C̄_k / M    (21)

where M is the dimensionality of Ū. Estimation of the covariance matrix C_U is also straightforward from the reference image wavelet coefficients [38]

    Ĉ_U = (1/N) Σ_k C̄_k C̄_kᵀ    (22)

In (21) and (22), E[S²] is assumed to be unity without loss of generality [38].

²Utilizing the structure of C_U and C_V helps in faster implementations through matrix factorizations.

2) Downsampling: Our second approximation is to use a subset of the coefficients by downsampling the subband. Downsampling reduces the correlation between coefficients. We will assume that the downsampled subband is approximately uncorrelated, and then use (10) for the scalar GSM on the downsampled subband. The underlying assumption in the downsampling approach is that the quality prediction from the downsampled subbands should be approximately the same as the prediction from the complete subband. This downsampling approach has the additional advantage that it makes it possible to substantially reduce the complexity of computing the wavelet decomposition, since only a fraction of the subband coefficients need to be computed. In our simulations, we discovered that the wavelet decomposition is the most computationally expensive step. Significant speedups are possible with the typical downsampling factors of twelve or fifteen used in our simulations. We downsample a subband along and across the principal orientations of the respective filters. In our simulations, the downsampling was done using nearest-neighbor interpolation.

Further specifics of the estimation methods used in our testing are given in Section VI.

V. SUBJECTIVE EXPERIMENTS FOR VALIDATION

In order to calibrate and test the algorithm, an extensive psychometric study was conducted. In these experiments, a number of human subjects were asked to assign each image a score indicating their assessment of the quality of that image, defined as the extent to which the artifacts were visible and annoying. Twenty-nine high-resolution 24-bits/pixel RGB color images (typically 768×512) were distorted using five distortion types: JPEG2000, JPEG, white noise in the RGB components, Gaussian blur, and transmission errors in the JPEG2000 bit stream using a fast-fading Rayleigh channel model. A database was derived from the 29 images such that each image had test versions with each distortion type, and for each distortion type the perceptual quality roughly covered the entire quality range. Observers were asked to provide their perception of quality on a continuous linear scale that was divided into five equal regions marked with the adjectives "Bad," "Poor," "Fair," "Good," and "Excellent," which was mapped linearly onto a 1–100 range. About 20–25 human observers rated each image. Each distortion type was evaluated by different subjects in different experiments using the same equipment and viewing conditions. In this way, a total of 982 images, out of which 203 were the reference images, were evaluated by human subjects in seven experiments. The raw scores were converted to difference scores (between the test and the reference) [18], then converted to Z-scores [39], scaled back to the 1–100 range, and finally averaged into a difference mean opinion score (DMOS) for each distorted image. The average RMSE for the DMOS was 5.92, with an average 95% confidence interval of width 5.48. The database is available at [1].
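The score-processing pipeline just described (raw scores, difference scores, per-subject Z-scores, rescaling, and averaging into a DMOS) can be sketched as follows. The array layout, the rescaling convention, and the function name are our assumptions for illustration, not the study's actual procedure:

```python
import numpy as np

def dmos_from_raw(raw_test: np.ndarray, raw_ref: np.ndarray) -> np.ndarray:
    """Convert raw quality ratings into difference mean opinion scores.

    raw_test: (subjects, images) ratings of the distorted images.
    raw_ref:  (subjects, images) ratings of the corresponding references.
    Steps: difference scores per subject, per-subject Z-scoring to remove
    subject bias, rescaling to a nominal 0-100 range, averaging over subjects.
    """
    diff = raw_ref - raw_test                      # higher = more annoying
    mu = diff.mean(axis=1, keepdims=True)          # per-subject mean
    sd = diff.std(axis=1, ddof=1, keepdims=True)   # per-subject spread
    z = (diff - mu) / sd                           # Z-scores
    scaled = (z + 3.0) * 100.0 / 6.0               # map roughly [-3, 3] -> [0, 100]
    return scaled.mean(axis=0)                     # DMOS per image

rng = np.random.default_rng(1)
ref_scores = rng.uniform(80, 100, size=(20, 5))    # 20 subjects, 5 images
test_scores = ref_scores - rng.uniform(5, 40, size=(20, 5))
print(dmos_from_raw(test_scores, ref_scores))
```

The Z-score step is the key design choice: it normalizes away each subject's individual use of the rating scale before scores are pooled across observers.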
Fig. 2. Scatter plots of the quality predictions by the four methods after compensating for quality calibration: PSNR, Sarnoff's JNDmetrix, MSSIM, and the IFC for the vector GSM. The IFC shown here uses only the horizontal and vertical subbands at the finest scale, and only the smallest eigenvalue in (20). The distortion types are: (×) JPEG2000, (+) JPEG, (○) white noise in RGB space, (□) Gaussian blur, and (◊) transmission errors in a JPEG2000 stream over a fast-fading Rayleigh channel.
Fig. 3. HVS-based quality measurement system. We show that this HVS model is the dual of the scalar GSM-based IFC of (11).
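The scalar GSM-based IFC of (11) referenced in the caption above reduces, per subband, to the closed form of (10). A minimal sketch follows, assuming the variance field s², the gain field g, and the noise variance σ_V² have already been estimated, with σ_U² normalized to unity and information measured in bits (the log base only rescales the criterion):

```python
import numpy as np

def scalar_ifc_subband(s_sq: np.ndarray, g: np.ndarray, sigma_v_sq: float) -> float:
    """Eq. (10): 0.5 * sum_k log2(1 + g_k^2 * s_k^2 / sigma_v^2),
    the information shared between the reference and distorted subbands
    under the scalar GSM source model (sigma_u^2 = 1)."""
    snr = (g ** 2) * s_sq / sigma_v_sq
    return float(0.5 * np.sum(np.log2(1.0 + snr)))

def ifc(subbands) -> float:
    """Eq. (11): sum the per-subband information over all subbands used.
    `subbands` is an iterable of (s_sq, g, sigma_v_sq) triples."""
    return sum(scalar_ifc_subband(s_sq, g, nv) for s_sq, g, nv in subbands)

# Toy check: attenuation (g < 1, e.g., blur) or stronger noise lowers the criterion.
s_sq = np.full(64, 4.0)                          # estimated local variance field
clean = scalar_ifc_subband(s_sq, np.ones(64), 0.1)
blurred = scalar_ifc_subband(s_sq, 0.5 * np.ones(64), 0.1)
noisy = scalar_ifc_subband(s_sq, np.ones(64), 1.0)
assert blurred < clean and noisy < clean
```

Note how both distortion types of the model in (3) reduce the same quantity: attenuation shrinks the effective signal-to-noise ratio through g, and noise raises its denominator.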
time. We noted that about 40% to 50% of the time is needed for to be a particular HVS-based QA algorithm, the perceptual dis-
the computation of the wavelet decomposition. tortion criterion (PDC), within multiplicative and additive con-
We would like to point out the most salient feature of the IFC: stants that could be absorbed into the calibration curve
It does not require any parameters from the HVS or viewing
configuration, training data or stabilizing constants. In contrast,
the JND-metrix requires a number of parameters for calibration
such as viewing distance, display resolution, screen phosphor (26)
type, ambient lighting conditions, etc. [40], and even SSIM re- (27)
quires two hand-optimized stabilizing constants. Despite being
parameterless, the IFC outperforms both of these methods. It is where denotes the index of the th subband, and is the
reasonable to say that the performance of the IFC could improve number of subbands used in the computation.
further if these parameters, which are known to affect percep- We can make the following observations regarding PDC of
tual quality, were incorporated as well. (26), which is the HVS dual of the IFC (using the scalar GSM
model), in comparison with other HVS-based FR QA methods.
VII. SIMILARITIES WITH HVS BASED QA METHODS • Some components of the HVS are not modeled in Fig. 3
and (27), such as the optical point spread function and the
We will now compare and contrast IFC with HVS-based QA contrast sensitivity function.
methods. Fig. 3 shows an HVS-based quality measurement • The masking effect is modeled differently from some
system that computes the error signal between the processed HVS-based methods. While the divisive normalization
reference and test signals, and then processes the error signal mechanism for masking effect modeling has been em-
before computing the final perceptual distortion measure. A ployed by some QA methods [11]–[13], most methods
number of key similarities with most HVS-based QA methods divisively normalize the error signal with visibility
are immediately evident. These include a scale-space-orien- thresholds that are dependent on neighborhood signal
tation channel decomposition, response exponent, masking strength.
effect modeling, localized error pooling, suprathreshold effect • Minkowski error pooling occurs in two stages. First, a
modeling, and a final pooling into a quality score. localized pooling in the computation of the localized
In the Appendix we show the following relationship between MSE (with exponent 2), and then a global pooling after
the scalar version of the IFC in (10) and the HVS model of Fig. 3 the suprathreshold modeling with an exponent of unity.
for one subband Thus, the perceptual error calculation is different from
most methods, in that it happens in two stages with
(25) suprathreshold effects in between.
• In (26), the nonlinearity that maps the MSE to a
where and are as shown in Fig. 3. The MSE compu- suprathreshold-MSE is a logarithmic nonlinearity and
tation in Fig. 3 and (25) is a localized error strength measure. it maps the MSE to a suprathreshold distortion that is
The logarithm term can be considered to be modeling of the later pooled into a quality score. Watson et al. have used
suprathreshold effect. Suprathreshold effect is the name given to threshold power functions to map objective distortion
the fact that the same amount of distortion becomes perceptually into subjective JND by use of two-alternative forced
less significant as the overall distortion level increases. Thus, a choice experiments [41]. However, their method applies
change in MSE of, say, 1.0 to 2.0 would be more annoying than the supratreshold nonlinearity after pooling, as if the
the same change from 10.0 to 11.0. Researchers have previously suprathreshold effect only comes into play at the global
modeled suprathreshold effects using visual impairment scales quality judgement level. The formulation in (26) suggests
that map error strength measures through concave nonlineari- that the suprathreshold modeling should come before a
ties, qualitatively similar to the logarithm mapping, so that they global pooling stage but after localized pooling, and that
emphasize the error at higher quality [41]. Also, the pooling in it affects visual quality at a local level.
(25) can be seen to be Minkowski pooling with exponent 1.0. • One significant difference is that the IFC using the scalar
Hence, with the stated components, the IFC can be considered GSM model, or the PDC of (26), which are duals of each
2126 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 14, NO. 12, DECEMBER 2005
other, is notably inferior to the vector GSM-based IFC. We believe that this is primarily due to the underlying assumption about the uncorrelated nature of the wavelet coefficients being inaccurate. This dependence of perceptual quality on the correlation among coefficients is hard to investigate or model using HVS error sensitivities, but the task is greatly simplified by approaching the same problem with NSS modeling. Thus, we feel that HVS-based QA methods need to account for the fact that natural scenes are correlated within subbands, and that this inter-coefficient correlation in the reference signal affects human perception of quality.4
• Another significant difference between the IFC/PDC and other HVS-based methods is the distinct modeling of signal attenuation. Other HVS-based methods ignore signal gains and attenuations, constraining the gain to be unity, and treat such variations as additive signal errors as well. In contrast, a generalized gain factor in the IFC/PDC ensures that signal gains are handled differently from additive noise components.
• One could conjecture that the conditioning on the scale field in the IFC is paralleled in the HVS by the computation of the local variance and divisive normalization. Note that the high degree of self-correlation present in the scale field enables its adequate estimation from the reference signal by local variance estimation. Since this divisive normalization occurs quite early in the HVS model5 and since the visual signal is passed to the rest of the HVS after it has been conditioned by divisive normalization by the estimated scale field, we could hypothesize that the rest of the HVS analyzes the visual signal conditioned on the prior knowledge of the scale field, just as the IFC analyzes the mutual information between the test and the reference conditioned on the prior knowledge of the scale field.
• One question that should arise when one compares the IFC against the HVS error model regards the HVS model parameters. Specifically, one should notice that while functionally the IFC captures HVS sensitivities, it does so without using actual HVS model parameters. We believe that some of the HVS model parameters were either incorporated into the calibration curve, or they did not affect performance significantly enough under the testing and validation experiments reported in this paper. Parameters such as the characteristics of the display devices or viewing configuration information could easily be understood to have an approximately similar effect on all images for all subjects, since the experimental conditions were approximately the same. Other parameters and model components, such as the optical point spread function or the contrast sensitivity function, which depend on viewing configuration parameters as well, are perhaps less significant for the scope and range of quality of our validation experiments. It is also reasonable to say that incorporating these parameters could further enhance the performance of the IFC. We are continuing efforts into developing an IFC for a unified model that consists of source, distortion, and HVS models, and we feel that deeper insights into perception of quality would be gained.
• We would like to remind the readers at this point that although the IFC is similar to an HVS-based distortion measure, it has not been derived using any HVS knowledge, and its derivation is completely independent. The similarities exist due to the similarities between NSS and HVS models. The difference is subtle, but profound!

4. Equation (20) suggests that the same noise variance would cause a greater loss of information fidelity if the wavelet coefficients of the reference image were correlated than if they were uncorrelated.
5. Divisive normalization has been discovered to be operational in the HVS [21].

VIII. CONCLUSIONS AND FUTURE WORK

In this paper, we presented an IFC for image QA using NSS. We showed that using signal source and distortion models, one could quantify the mutual information between the reference and the test images, and that this quantification, the IFC, quantifies perceptual quality. In our testing, the IFC was demonstrated to be better than a state-of-the-art HVS-based method, Sarnoff's JNDmetrix, as well as a state-of-the-art structural fidelity criterion, the SSIM index. We showed that despite its competitive performance, the IFC is parameterless. We also showed that the IFC, under certain conditions, is quantitatively similar to an HVS-based QA method, and we compared and contrasted the two approaches and hypothesized directions in which HVS-based methods could be refined and improved.
We are continuing efforts into improving the IFC by combining HVS models with distortion and signal source models, and by incorporating color statistics and inter-subband correlations. We are hopeful that this new approach will give new insights into visual perception of quality.

APPENDIX

In this Appendix, we shall quantify the similarities between the scalar GSM version of the IFC of (10) and the HVS-based QA method shown in Fig. 3. The model in Fig. 3 is based on calculating MSE in the perceptual space and then processing it further to yield the final perceptual distortion measure. Here, we will only deal with coefficients in one subband and a scalar GSM model.
We start by giving the formulation for the divisive normalization stage, which divides the input by its localized average. Considering the input to the squaring block, this turns out to be normalization by the estimated local variance of the input of the squaring block

(28)

(29)

Here, we have assumed that the scale field is approximately constant over the pixel neighborhood. Also note that the term inside the parentheses is an estimate of the conditional local
SHEIKH et al.: INFORMATION FIDELITY CRITERION FOR IMAGE QUALITY ASSESSMENT 2127
variance of the reference (or the test) signal given the scale field, which could be approximated by the actual value. We have also assumed, without loss of generality, that the underlying Gaussian component has unit variance, since any nonunity variance could be absorbed into the scale field. The MSE between the normalized reference and test signals, given the scale field, can now be analyzed

(30)

(31)

(32)

where we have used the source and distortion models conditioned on the scale field. Expanding the above expression and taking expectations, using the independence between the signal and the noise, the fact that the variables involved are all zero-mean, and a standard moment identity for zero-mean Gaussian variables, we get

(33)

The goal of this derivation is to compare the IFC of (10) with the HVS-based MSE criterion

(34)

(35)

(36)

Hence, we have an approximate relation between the IFC and the HVS-based MSE

(37)

where the remaining parameters are constants.

ACKNOWLEDGMENT

The authors would like to thank Dr. E. Simoncelli and Dr. Z. Wang at the Center for Neural Science, New York University, for insightful comments.

REFERENCES

[1] H. R. Sheikh, Z. Wang, L. Cormack, and A. C. Bovik. (2005). LIVE Image Quality Assessment Database, Release 2. [Online]. Available: http://live.ece.utexas.edu/research/quality
[2] M. P. Eckert and A. P. Bradley, "Perceptual quality metrics applied to still image compression," Signal Process., vol. 70, no. 3, pp. 177–200, Nov. 1998.
[3] A. Bovik, Ed., Handbook of Image and Video Processing. New York: Academic, 2000.
[4] S. Winkler, "Issues in vision modeling for perceptual video quality assessment," Signal Process., vol. 78, pp. 231–252, 1999.
[5] Z. Wang, H. R. Sheikh, and A. C. Bovik, "Objective video quality assessment," in The Handbook of Video Databases: Design and Applications, B. Furht and O. Marques, Eds. Boca Raton, FL: CRC, 2003.
[6] S. Daly, "The visible difference predictor: An algorithm for the assessment of image fidelity," Proc. SPIE, vol. 1616, pp. 2–15, 1992.
[7] J. Lubin, "A visual discrimination model for image system design and evaluation," in Visual Models for Target Detection and Recognition, E. Peli, Ed. Singapore: World Scientific, 1995, pp. 207–220.
[8] A. B. Watson, "DCTune: A technique for visual optimization of DCT quantization matrices for individual images," Soc. Inf. Display Dig. Tech. Papers, vol. XXIV, pp. 946–949, 1993.
[9] A. P. Bradley, "A wavelet visible difference predictor," IEEE Trans. Image Process., vol. 8, no. 5, pp. 717–730, May 1999.
[10] Y. K. Lai and C.-C. J. Kuo, "A Haar wavelet approach to compressed image quality measurement," J. Vis. Commun. Image Represen., vol. 11, pp. 17–40, Mar. 2000.
[11] P. C. Teo and D. J. Heeger, "Perceptual image distortion," Proc. SPIE, vol. 2179, pp. 127–141, 1994.
[12] D. J. Heeger and P. C. Teo, "A model of perceptual image fidelity," in Proc. IEEE Int. Conf. Image Processing, 1995, pp. 343–345.
[13] A. M. Pons, J. Malo, J. M. Artigas, and P. Capilla, "Image quality metric based on multidimensional contrast perception models," Displays, vol. 20, pp. 93–110, 1999.
[14] A. M. Eskicioglu and P. S. Fisher, "Image quality measures and their performance," IEEE Trans. Commun., vol. 43, no. 12, pp. 2959–2965, Dec. 1995.
[15] I. Avcibaş, B. Sankur, and K. Sayood, "Statistical evaluation of image quality measures," J. Electron. Imag., vol. 11, no. 2, pp. 206–223, Apr. 2002.
[16] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE Trans. Image Process., vol. 13, no. 4, pp. 600–612, Apr. 2004.
[17] The Video Quality Experts Group. [Online]. Available: http://www.vqeg.org/
[18] A. M. Rohaly et al., "Video quality experts group: Current results and future directions," Proc. SPIE Visual Commun. Image Process., vol. 4067, pp. 742–753, Jun. 2000.
[19] Final Report From the Video Quality Experts Group on the Validation of Objective Models of Video Quality Assessment, Phase II (2003, Aug.). [Online]. Available: ftp://ftp.its.bldrdoc.gov/dist/ituvidq/frtv2_final_report/VQEGII_Final_Report.pdf
[20] A. Srivastava, A. B. Lee, E. P. Simoncelli, and S.-C. Zhu, "On advances in statistical modeling of natural images," J. Math. Imag. Vis., vol. 18, pp. 17–33, 2003.
[21] E. P. Simoncelli and B. A. Olshausen, "Natural image statistics and neural representation," Annu. Rev. Neurosci., vol. 24, pp. 1193–1216, May 2001.
[22] J. M. Shapiro, "Embedded image coding using zerotrees of wavelet coefficients," IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3445–3462, Dec. 1993.
[23] A. Said and W. A. Pearlman, "A new, fast, and efficient image codec based on set partitioning in hierarchical trees," IEEE Trans. Circuits Syst. Video Technol., vol. 6, no. 3, pp. 243–250, Jun. 1996.
[24] D. S. Taubman and M. W. Marcellin, JPEG2000: Image Compression Fundamentals, Standards, and Practice. Norwell, MA: Kluwer, 2001.
[25] R. W. Buccigrossi and E. P. Simoncelli, "Image compression via joint statistical characterization in the wavelet domain," IEEE Trans. Image Process., vol. 8, no. 12, pp. 1688–1701, Dec. 1999.
[26] M. K. Mihçak, I. Kozintsev, K. Ramchandran, and P. Moulin, "Low-complexity image denoising based on statistical modeling of wavelet coefficients," IEEE Signal Process. Lett., vol. 6, no. 12, pp. 300–303, Dec. 1999.
[27] J. K. Romberg, H. Choi, and R. G. Baraniuk, "Bayesian tree-structured image modeling using wavelet-domain hidden Markov models," IEEE Trans. Image Process., vol. 10, no. 7, pp. 1056–1068, Jul. 2001.
[28] M. J. Wainwright, E. P. Simoncelli, and A. S. Willsky, "Random cascades on wavelet trees and their use in analyzing and modeling natural images," Appl. Comput. Harmon. Anal., vol. 11, pp. 89–123, 2001.
[29] E. Y. Lam and J. W. Goodman, "A mathematical analysis of the DCT coefficient distributions for images," IEEE Trans. Image Process., vol. 9, no. 10, pp. 1661–1666, Oct. 2000.
[30] H. Choi and R. G. Baraniuk, "Multiscale image segmentation using wavelet-domain hidden Markov models," IEEE Trans. Image Process., vol. 10, no. 9, pp. 1309–1321, Sep. 2001.
[31] J. Portilla and E. P. Simoncelli, "A parametric texture model based on joint statistics of complex wavelet coefficients," Int. J. Comput. Vis., vol. 40, no. 1, pp. 49–71, 2000.
[32] H. R. Sheikh, A. C. Bovik, and L. Cormack, "No-reference quality assessment using natural scene statistics: JPEG2000," IEEE Trans. Image Process., vol. 14, no. 11, pp. 1918–1927, Nov. 2005.
[33] E. P. Simoncelli, "Modeling the joint statistics of images in the wavelet domain," Proc. SPIE, vol. 3813, pp. 188–195, Jul. 1999.
[34] B. A. Wandell, Foundations of Vision. Sunderland, MA: Sinauer, 1995.
[35] N. Damera-Venkata, T. D. Kite, W. S. Geisler, B. L. Evans, and A. C. Bovik, "Image quality assessment based on a degradation model," IEEE Trans. Image Process., vol. 9, no. 4, pp. 636–650, Apr. 2000.
[36] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: Wiley, 1991.
[37] E. P. Simoncelli and W. T. Freeman, "The steerable pyramid: A flexible architecture for multi-scale derivative computation," in Proc. IEEE Int. Conf. Image Processing, Oct. 1995, pp. 444–447.
[38] V. Strela, J. Portilla, and E. Simoncelli, "Image denoising using a local Gaussian scale mixture model in the wavelet domain," Proc. SPIE, vol. 4119, pp. 363–371, 2000.
[39] A. M. van Dijk, J. B. Martens, and A. B. Watson, "Quality assessment of coded images using numerical category scaling," Proc. SPIE, vol. 2451, pp. 90–101, Mar. 1995.
[40] JNDmetrix Technology (2003). [Online]. Available: http://www.sarnoff.com/productsservices/videovision/jndmetrix/downloads.asp
[41] A. B. Watson and L. Kreslake, "Measurement of visual impairment scales for digital video," Proc. SPIE Human Vis., Vis. Process., Digit. Display, vol. 4299, pp. 79–89, 2001.

Alan Conrad Bovik (S'80–M'81–SM'89–F'96) received the B.S., M.S., and Ph.D. degrees in electrical and computer engineering from the University of Illinois, Urbana-Champaign, in 1980, 1982, and 1984, respectively.
He is currently the Curry/Cullen Trust Endowed Chair in the Department of Electrical and Computer Engineering, The University of Texas, Austin, where he is the Director of the Laboratory for Image and Video Engineering (LIVE) in the Center for Perceptual Systems. During the Spring of 1992, he held a visiting position in the Division of Applied Sciences, Harvard University, Cambridge, MA. He is the editor/author of the Handbook of Image and Video Processing (New York: Academic, 2000). His research interests include digital video, image processing, and computational aspects of visual perception, and he has published over 350 technical articles in these areas and holds two U.S. patents.
Dr. Bovik was named Distinguished Lecturer of the IEEE Signal Processing Society in 2000, received the IEEE Signal Processing Society Meritorious Service Award in 1998, the IEEE Third Millennium Medal in 2000, and the University of Texas Engineering Foundation Halliburton Award in 1991, and is a two-time Honorable Mention winner of the International Pattern Recognition Society Award for Outstanding Contribution (1988 and 1993). He was named a Dean's Fellow in the College of Engineering in 2001. He is a Fellow of the IEEE and has been involved in numerous professional society activities, including: Board of Governors, IEEE Signal Processing Society, 1996–1998; Editor-in-Chief, IEEE TRANSACTIONS ON IMAGE PROCESSING, 1996–2002; Editorial Board, THE PROCEEDINGS OF THE IEEE, 1998–present; and Founding General Chairman, First IEEE International Conference on Image Processing, held in Austin in November 1994. He is a Registered Professional Engineer in the State of Texas and is a frequent consultant to legal, industrial, and academic institutions.
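The Appendix argues that, for a scalar GSM in a single subband, the IFC is approximately monotonically related to an MSE computed after divisive normalization. The following numerical sketch (not part of the original paper) illustrates that relationship under assumed standard forms: reference coefficients modeled as a fixed scale times a unit-variance Gaussian, test coefficients as an attenuated reference plus Gaussian noise, a per-coefficient information term of the form 0.5*log2(1 + g^2*s^2/sigma_v^2), and an MSE between the unit-variance-normalized reference and test signals. All function names, parameter values, and the exact functional forms here are illustrative assumptions, not the paper's equations (10) and (28)-(37).

```python
import numpy as np

rng = np.random.default_rng(0)

def ifc_term(g, s, sigma_v):
    # Assumed scalar-GSM information term for one coefficient:
    # 0.5 * log2(1 + g^2 s^2 / sigma_v^2), with unit-variance Gaussian component.
    return 0.5 * np.log2(1.0 + (g * s) ** 2 / sigma_v ** 2)

def normalized_mse(g, s, sigma_v, n=200_000):
    # Monte-Carlo MSE between divisively normalized reference and test signals.
    u = rng.standard_normal(n)
    c = s * u                                       # reference: fixed scale times Gaussian
    d = g * c + sigma_v * rng.standard_normal(n)    # test: attenuation plus additive noise
    c_n = c / np.sqrt(np.mean(c ** 2))              # divisive normalization to unit variance
    d_n = d / np.sqrt(np.mean(d ** 2))
    return np.mean((c_n - d_n) ** 2)

# Sweep the noise level: as sigma_v grows, the information term falls and the
# normalized MSE rises, i.e., the two measures are monotonically (inversely) related.
sigmas = [0.1, 0.3, 1.0, 3.0]
ifc = [ifc_term(g=0.9, s=1.5, sigma_v=sv) for sv in sigmas]
mse = [normalized_mse(g=0.9, s=1.5, sigma_v=sv) for sv in sigmas]
assert all(a > b for a, b in zip(ifc, ifc[1:]))   # information term decreases with noise
assert all(a < b for a, b in zip(mse, mse[1:]))   # normalized MSE increases with noise
```

Analytically, the normalized MSE here equals 2(1 - rho), where rho is the correlation between reference and test coefficients, so both measures are monotone functions of the noise variance, consistent with the approximate IFC-MSE relation in (37).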