Quantization Noise
Gray
1 Introduction
The heart of a ΔΣ modulator and any other analog-to-digital converter (ADC) is a quantizer, a device which maps real numbers into a finite set of possible representative values, often as few as two. Any analysis of the behavior of a ΔΣ modulator must include consideration of the behavior of the quantizer. The quantization operation is inherently nonlinear and hence rigorous analysis is complicated even in the simplest of systems. When quantizers are incorporated into linear systems with feedback, such as ΔΣ modulators and bang-bang control systems, the analysis becomes even more difficult. Simulations cannot capture all aspects of possible system behavior and are not always reproducible, as different random number generators are used and care is not always taken to ensure that sample functions are long enough for sample averages to be close to expectations with high probability. As a result, various methods based on approximations have been widely used, even in some applications where they were known to give misleading or outright incorrect results. Often, however, approximate methods have quite successfully predicted some aspects of system behavior, as many of the other chapters of this book will attest. These approximations are usually implicitly or explicitly based on either the asymptotic results of Bennett [1] or on the exact results of Widrow [2] as extended by Sripad and Snyder [3]. We shall see, however, that the underlying conditions assumed by these results can be and usually are
invalid in typical ΔΣ modulators, and there are no good guidelines for determining when the approximations might nonetheless yield good results. A common reason for using the approximation methods in spite of their shortcomings is that simulations are inadequate and that exact analysis is thought to be either impossible or prohibitively difficult. Successful applications of the approximations do not lessen the puzzlement of engineers who use such methods to quantify the behavior of a system, and then find that the system exhibits bizarre artifacts not suggested by the theory. They often suspect the system rather than the theory of being flawed. A classical example is the ability of approximate methods to correctly predict the signal-to-noise ratio of a single-loop ΔΣ modulator while producing a completely incorrect prediction of the quantization error spectrum that fails to include audible objectionable tones in the final output.
The goal of this chapter is to discuss in some detail the most common approximations, their underlying justification in quantization systems, and the common errors made in their application. The hope is to give practicing engineers a degree of skepticism in their use of these methods and to prepare them for unpleasant surprises. We also consider a variety of examples where the approximations are not needed and exact descriptions of system behavior can be found by combining linear systems methods with a few nonlinear system techniques. The mathematical methods are not particularly deep; most come from Fourier analysis and probability theory, but much of the algebra and calculus is not particularly pretty and is left to the references. By "exact" it is meant that exact solutions are found to the nonlinear difference equations modeling ΔΣ modulators. Certainly real systems will have many variations that are not yet included in the basic equations, but the given nonlinear equations will be solved without recourse to linearizing approximations.
Unfortunately the exact solutions do not extend to all architectures of practical interest, but they do provide a rich collection of types of behavior and of important attributes of systems which determine that behavior. These examples can provide useful insight into more complex systems. The theory also provides some surprising results, including the fact that some of the common approximations hold exactly in some systems even though the underlying conditions usually assumed for those approximations are violated.
Because the quantizer plays the key role in ΔΣ modulation, we begin by looking at quantization error in a simple quantizer that is used without feedback or linear filtering. This leads in several steps to the more complicated example of ΔΣ modulation.
2 Uniform Quantization
The basic common component of most analog-to-digital converters is a uniform quantizer. It is assumed that the quantizer has an even number, say M, of levels and that the distance between the output levels (the bin width) is Δ. The special case of M = 2 is common in ΔΣ modulators, but the theory here and later holds for any even M.
The M quantization levels are equally spaced in the interval B = [a, a + MΔ), where often the region is chosen to be symmetric around the origin, i.e., a = -MΔ/2, and each level y_k is the center of its quantization cell R_k = [a + kΔ, a + (k+1)Δ), k = 0, 1, ..., M - 1. Any input u in this range will map into the quantized value q(u) = y_k if u ∈ R_k, i.e., the quantizer mapping is a minimum distance (nearest neighbor) mapping. For a given number of levels M, there are only two free parameters in a uniform quantizer: the offset a and the bin width Δ.
The quantizer error is defined as ε = q(u) - u. If u is in the region B, then the maximum error magnitude is Δ/2. Outside this region the input is mapped into the nearest quantization level, but the error magnitude is greater than Δ/2 and the quantizer is said to overload or saturate. We will refer to the interval B as the no-overload region of the quantizer. Some of the methods described hold only when u is in the no-overload region with probability 1.
If the input is described by a probability distribution, then the performance of a quantizer is often measured by the mean squared error, the expected error energy E(ε²). A uniform quantizer is said to be optimal for an input probability distribution if for the given M the parameters a and Δ are chosen to minimize the mean squared error.
If the input is a sequence of samples u_n, then we will be interested in the error sequence ε_n = q(u_n) - u_n, and we will wish to see if it resembles random noise in some way. A trivial rewriting of this formula yields the so-called "additive noise model" of quantization

    q_n = q(u_n) = u_n + ε_n,

expressing the output of the quantizer as its input plus a "noise" term. There is no genuine modeling here; this is simply a convenient definition of the quantization error such that the output can be written as the sum of the input and a "noise" term. The modeling enters when assumptions are made about the statistical behavior of ε and its dependence on the input signal, as will be seen.
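To make the definitions above concrete, here is a minimal sketch of the uniform quantizer just described (the function name and the particular M and Δ are our illustrative choices, not the chapter's): levels y_k = a + (k + 1/2)Δ at cell centers, nearest-level saturation outside B, and the error ε = q(u) - u.

```python
import numpy as np

def uniform_quantize(u, M=8, delta=0.25, a=None):
    """M-level uniform quantizer with bin width delta on B = [a, a + M*delta).

    Output levels y_k = a + (k + 0.5)*delta are the centers of the cells
    R_k = [a + k*delta, a + (k+1)*delta); inputs outside B saturate to the
    nearest level (overload).
    """
    if a is None:
        a = -M * delta / 2  # symmetric no-overload region
    k = np.floor((np.asarray(u, dtype=float) - a) / delta)
    k = np.clip(k, 0, M - 1)        # saturate outside the no-overload region
    return a + (k + 0.5) * delta

# The "additive noise model" q(u) = u + eps is just the definition of eps:
rng = np.random.default_rng(0)
u = rng.uniform(-1.0, 1.0, 10_000)      # inputs inside B = [-1, 1)
q = uniform_quantize(u)
eps = q - u
print(np.max(np.abs(eps)))              # no overload, so |eps| <= delta/2
print(np.mean(eps**2))                  # near delta**2/12 for this input
```

For this uniform input the sample mean squared error lands near Δ²/12 ≈ 0.0052 with Δ = 0.25, the ubiquitous figure discussed in the next section.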
3 The Additive White Noise Approximation
Many of the original results and insights into the behavior of quantization error are due to Bennett [1], and much of the work since then has its origins in that classic paper. Bennett first developed conditions under which quantization noise could be reasonably modeled as additive white noise. Unfortunately, much of the literature assumes more than was proved by Bennett and often uses Bennett's approximations when his results do not apply. Subsequently Sripad and Snyder [3] extended Widrow's approach [2] and found necessary and sufficient conditions for certain aspects of the approximation to hold exactly. We here explore the approximations and these underlying conditions in order to consider their suitability for use in analyzing ΔΣ modulators.
A common statement of the approximation is that the quantization error ε_n has the following properties, which we refer to collectively as the "input-independent additive white noise approximation":
Property 1: ε_n is statistically independent of the input signal u_k for all n, k (strong version) or ε_n is uncorrelated with the input signal u_n (weak version),
Property 2: ε_n is uniformly distributed on [-Δ/2, Δ/2],
Property 3: ε_n is an independent identically distributed (i.i.d.) sequence (strong version) or ε_n has a flat power spectral density (it is "white") (weak version).
These approximations enormously simplify system analysis because they replace a deterministic nonlinearity by a stochastic linear system, thereby permitting the use of linear systems methods to analyze an otherwise linear system containing a quantizer. These properties have been used in the vast majority of published analyses of systems containing quantizers in the communications, control, and signal processing literature. Most often the approximation is made without reference and with little (if any) mention of its possible limitations. The natural questions that arise are
Question 1: Are the approximations good under ordinary conditions; that is, do they
accurately model the true behavior of quantization error?
Question 2: If the approximations are not good, is it still possible for them to yield good
predictions of actual system behavior?
It is easy to demonstrate a negative answer to Question 1 if the strong form of the approximation is considered: the quantizer error is a deterministic function of the input and hence cannot be statistically independent of the input. There is hope, however, that a weaker form of independence, uncorrelation (or linear independence), holds in the sense that E[u_n ε_k] = E[u_n]E[ε_k] for all n, k, where E denotes expectation or probabilistic average. This property is sufficient to ensure that second order analysis involving output correlations and spectra can be carried out without the complexity of cross-terms. This allows the common "noise shaping" interpretation of ΔΣ modulators because it implies that the quantization noise can be filtered without thereby also changing the input signal. This approximation is almost universally made in the analysis of oversampled ADCs, yet, as we shall see, it can be incorrect.
Question 2 is not so easily answered. Engineering mathematics often uses ideas such as impulses and flicker (1/f) noise that are physically impossible, yet yield perfectly good predictions of real system behavior when carefully used and suitably interpreted. The answer to this question will vary depending on the system, and a goal of this chapter is to provide a feel for examples where answers based on the white noise approximation can be trusted and where they cannot be.
If we back off on the complete input-independent additive noise model by eliminating the first property, Bennett's theory provides a motivation for approximating quantization noise by Properties 2 and 3, which will be seen to hold under specific conditions (first proved by Bennett). When these properties hold approximately, we shall refer to the approximation as the "additive white noise approximation," dropping the "input-independent" modifier. We shall also discuss the non-asymptotic results of Sripad and Snyder which provide conditions under which several of the common approximations hold in an exact sense. Unfortunately, we shall see that the conditions of both Bennett and of Sripad and Snyder do not hold in typical ΔΣ modulators and hence the additive white noise approximation is not justified mathematically. Perhaps surprisingly, we shall later find that in spite of this fact, the additive white noise approximation in fact holds exactly for ideal multistage and higher order ΔΣ architectures provided the quantizers do not overload.
Property 2 might be true if the quantizer cannot overload, but it is clearly false if overload occurs with nonzero probability. Property 3 is also plausible, but must be demonstrated for a particular system. A simple but important example where both Properties 2 and 3 hold exactly is the case where the input signal is itself i.i.d. and is uniformly distributed over the no-overload range. In this example it is easy to see that the quantization error is uniformly distributed over (-Δ/2, Δ/2) and has zero mean and variance Δ²/12, the ubiquitous result for the mean squared error of a uniform quantizer with bin width Δ.
It should be pointed out that even in the case of a uniformly distributed input signal, Property 1 is not true even in the weak sense, i.e., the input signal and quantization error are not uncorrelated. It is straightforward to show that E(u_n ε_k) = -(Δ²/12) δ_{n-k}, where δ_l is the Kronecker delta function, 1 if l = 0 and 0 otherwise. In words, the correlation between input and error at a common sample time is as large in magnitude as the error variance; it is not 0! In contrast, if the input probability density is a triangle on the no-overload range (the convolution of two uniform densities covering half the no-overload range), then again the error is uniformly distributed and white, but now E(u_n ε_k) = 0 for all n and k, i.e., the input and the quantizer error are indeed uncorrelated. The point is that the weak version of Property 1 might or might not hold, depending on the input density and the particular quantizer.
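Both claims are easy to check numerically. The sketch below (our own construction; the particular M and Δ are arbitrary) estimates E(u_n ε_n) for an i.i.d. input that is uniform over the no-overload range, where the estimate should land near -Δ²/12, and for the triangular density built by summing two uniforms covering half the range, where it should land near 0.

```python
import numpy as np

rng = np.random.default_rng(1)
M, delta = 16, 1.0 / 16
a = -M * delta / 2                 # no-overload range [-1/2, 1/2)

def quantize(u):
    k = np.clip(np.floor((u - a) / delta), 0, M - 1)
    return a + (k + 0.5) * delta   # cell-center output levels

N = 200_000
u_unif = rng.uniform(a, a + M * delta, N)   # uniform over the no-overload range
# triangle = sum of two uniforms, each covering half the no-overload range
u_tri = (rng.uniform(a / 2, (a + M * delta) / 2, N)
         + rng.uniform(a / 2, (a + M * delta) / 2, N))

e_unif = quantize(u_unif) - u_unif
e_tri = quantize(u_tri) - u_tri
print(np.mean(u_unif * e_unif))    # near -delta**2/12
print(np.mean(u_tri * e_tri))      # near 0
```

The uniform input reproduces the correlation -Δ²/12 noted above, while the triangular input gives an input and error that are (sample-)uncorrelated.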
Bennett argued more generally that Properties 2 and 3 are approximately true (along with some other properties) if certain underlying conditions hold. The uniform white quantization noise assumption subsequently gained wide popularity, largely due to the work of Widrow [2], who provided a sufficient condition for the quantizer error to be uniformly distributed. For completeness we quote the basic results of Bennett and sketch their proof.
Bennett's Theorem
Suppose that the following conditions (Bennett's conditions) hold:
1. The input is in the no-overload region,
2. M is asymptotically large,
3. Δ is asymptotically small, and
4. the joint probability density function (pdf) of the input signal at different sample times is smooth.
Then the error sequence has the following properties:
1. The sequence {ε_n} is approximately uniformly distributed, that is, has marginal pdf

    f_ε(ε) ≈ 1/Δ  if ε ∈ (-Δ/2, Δ/2),  and 0 otherwise.            (1)

This in turn implies that E[ε_n] ≈ 0,

    σ²_{ε_n} ≈ Δ²/12,                                              (2)

and

    E[q_n] ≈ E[u_n],                                               (3)

i.e., the expectation of the quantized output is approximately the same as that of the input. In statistical terms, the quantized value is an approximately unbiased estimator of the input.
2. The sequence {ε_n} is approximately an independent identically distributed (i.i.d.) random process.
Sketch of proof of Bennett's theorem: First consider the marginal distribution of the error. Define as usual the cumulative distribution function (cdf) F_{ε_n}(ε) = Pr(ε_n ≤ ε), ε ∈ (-Δ/2, Δ/2), and the pdf f_{ε_n}(ε) = dF_{ε_n}(ε)/dε. Referring to the definitions of q and ε we can write

    F_{ε_n}(ε) = Σ_{k=0}^{M-1} Pr(ε_n ≤ ε and u_n ∈ R_k).

Since the pdf is assumed to be smooth, the mean value theorem of calculus implies that

    Pr(ε_n ≤ ε and u_n ∈ R_k) = ∫_{y_k - ε}^{Δ(k - M/2 + 1)} f_{u_n}(α) dα ≈ f_{u_n}(y_k) (ε + Δ/2).

Using the Riemann sum approximation to an integral yields

    F_{ε_n}(ε) ≈ Σ_{k=0}^{M-1} f_{u_n}(y_k) (ε + Δ/2) ≈ ((ε + Δ/2)/Δ) ∫_{-MΔ/2}^{MΔ/2} f_{u_n}(u) du,

and hence

    f_{ε_n}(ε) ≈ 1/Δ  for ε ∈ (-Δ/2, Δ/2).

To prove that the error sequence is approximately memoryless, a similar idea is applied to vectors of error samples:

    Pr(ε_l ≤ τ_l, l = n, ..., n+k-1) = Σ_{i_1,...,i_k} Pr(ε_l ≤ τ_l and u_l ∈ R_{i_l}, l = n, ..., n+k-1).

For τ_l ∈ (-Δ/2, Δ/2), l = 1, ..., k,

    Pr(ε_l ≤ τ_l and u_l ∈ R_{i_l}, l = n, ..., n+k-1)
        = ∫ ... ∫ f_{u_n,...,u_{n+k-1}}(α_1, ..., α_k) dα_1 ... dα_k
        ≈ f_{u_n,...,u_{n+k-1}}(y_{i_1}, ..., y_{i_k}) (τ_1 + Δ/2) ... (τ_k + Δ/2),

whence

    f_{ε_n,...,ε_{n+k-1}}(τ_1, ..., τ_k) ≈ 1/Δ^k.

This immediately implies that

    R_ε(n, k) = E[ε_n ε_k] ≈ (Δ²/12) δ_{n-k}.
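Bennett's conclusions are easy to observe numerically. The sketch below (our own; the Gaussian input and all parameter values are illustrative assumptions) uses many small cells and an input whose overload probability is negligible, and checks that the error histogram is nearly flat at 1/Δ, that the second moment is near Δ²/12, and that adjacent errors are nearly uncorrelated.

```python
import numpy as np

rng = np.random.default_rng(2)
M, delta = 256, 1.0 / 32
a = -M * delta / 2                     # no-overload region [-4, 4)

u = rng.normal(0.0, 0.8, 500_000)      # smooth pdf; Pr(|u| > 4) is negligible
k = np.clip(np.floor((u - a) / delta), 0, M - 1)
e = (a + (k + 0.5) * delta) - u        # quantization error

hist, _ = np.histogram(e, bins=20, range=(-delta/2, delta/2), density=True)
print(hist)                            # each bin near 1/delta = 32
print(np.mean(e**2))                   # near delta**2/12
print(np.mean(e[:-1] * e[1:]))         # near 0 (approximately i.i.d. errors)
```

With M large and Δ small, all three of Bennett's conclusions show up clearly even for a modest sample size.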
Bennett did not explicitly treat the issue of the correlation of quantizer error and input, but his basic method of calculus approximations can be applied to the task. The analysis is somewhat more delicate, however, since higher order terms can make a difference, as pointed out by Kollar [4]. It is easy to argue that R_{q,ε}(n, k) ≈ 0 when n ≠ k, so we focus on the case of n = k. Following [4], we can approximate the input density in the quantization bin R_l for output y_l by a Taylor series expansion as f_u(y_l + ε) ≈ f_u(y_l) + f'_u(y_l) ε, where the higher order terms can be shown to be negligible in comparison. This leads to the approximation

    R_{q,ε}(n, n) = E[q(u_n) ε_n] = Σ_{l=0}^{M-1} y_l E[ε_n | q(u_n) = y_l] Pr(q(u_n) = y_l)
        ≈ Σ_{l=0}^{M-1} y_l ∫_{-Δ/2}^{Δ/2} (-ε)[f_u(y_l) + f'_u(y_l) ε] dε
        = -(Δ³/12) Σ_{l=0}^{M-1} y_l f'_u(y_l)
        ≈ -(Δ²/12) ∫_a^{a+MΔ} y f'_u(y) dy.                          (4)

Integrating by parts then yields

    R_{q,ε}(n, n) ≈ (Δ²/12) (1 - (a + MΔ) f_u(a + MΔ) + a f_u(a)).   (5)

The behavior thus depends on the behavior of the input density near the borders of the no-overload range [a, a + MΔ]. If the density is zero at the edges of the no-overload range, then R_{q,ε}(n, n) ≈ Δ²/12 = R_ε(n, n), which implies in turn that

    R_{u,ε}(n, k) = E[u_n ε_k] = E[(q(u_n) - ε_n) ε_k] = E[q(u_n) ε_k] - R_ε(n, k) ≈ 0,   (6)

i.e., the input and quantization error are uncorrelated. If, however, the density is not 0 at the borders of the no-overload zone (as is the case with a uniform density on the no-overload region), then the signal and quantizer error are not uncorrelated. Bucklew and Gallagher [5] have shown that if the uniform quantizer is optimal, that is, if a and Δ are chosen to minimize the mean squared quantization error, then one will have exactly R_{q,ε}(n, k) = 0 and hence R_{u,ε}(n, k) = -σ_ε² δ_{n-k}, and the input and quantizer error have correlation equal to minus the quantizer error energy. Optimal choice of Δ will involve a shrinking of the bin width and a resulting overload of the quantizer, but the Bennett approximations still hold [5]. We note that this same result holds if the Lloyd-Max optimal nonuniform quantizer is used (see, e.g., [6] and pp. 180-181 of [8]).
An alternative to the asymptotic (large number of quantizer levels) analysis of Bennett is the exact approach of Sripad and Snyder [3], which is a variation of the characteristic function method that will be used here. Sripad and Snyder demonstrated necessary and sufficient conditions for the various properties to hold. The conditions are stated in terms of the characteristic function

    Φ_u(ζ) = E[e^{jζu_n}]                                          (7)

of the input random variable and the joint characteristic function Φ_{u_n,u_k}(ζ, ξ) = E[e^{j(ζu_n + ξu_k)}]. The input process {u_n} is assumed to be stationary so that the characteristic function does not depend on n.
A necessary and sufficient condition for the quantizer error to be uniformly distributed on [-Δ/2, Δ/2] is that

    Φ_u(2πk/Δ) = 0,  k = 1, 2, ....                                (8)

A necessary and sufficient condition for ε_n and ε_k to be independent and uniformly distributed on [-Δ/2, Δ/2] is that Φ_{u_n,u_k}(2πl/Δ, 2πm/Δ) = 0 for all l, m for which (l, m) ≠ (0, 0).
A sufficient condition for ε_n and u_n to be uncorrelated is that

    Φ_u(2πk/Δ) = Φ'_u(2πk/Δ) = 0,  k = 1, 2, ...,                  (9)

where Φ'_u is the derivative of Φ_u with respect to its argument.
The condition that Φ_u have zero value at all integral multiples of 2π/Δ is satisfied by a random variable u having a uniform density on [-Δ/2, Δ/2], as well as by any density formed by adding an independent random variable x to u to form x + u, since the resulting product of characteristic functions will inherit the zeros of Φ_u. It is also satisfied by a uniform density on [-A, A] if 2A/Δ is an integer M. The sufficient condition for uncorrelated signal and quantization error is satisfied, for example, by a triangular density on [-A, A] if 2A/Δ is an integer M. It should be pointed out that in this case the uniform quantizer is not optimal, so that this result is consistent with Kollar's development but differs from that of Bucklew and Gallagher.
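These zero conditions are simple to verify numerically. The characteristic function of a uniform density of width L is sin(ζL/2)/(ζL/2), and a triangular density (the convolution of two such uniforms) squares it, producing double zeros; the sketch below (ours, with an arbitrary Δ) evaluates both at the Sripad and Snyder test points 2πk/Δ.

```python
import numpy as np

delta = 0.5

def phi_uniform(zeta, width):
    """Characteristic function of a uniform density on [-width/2, width/2]."""
    x = zeta * width / 2.0
    return np.sinc(x / np.pi)          # np.sinc(t) = sin(pi t)/(pi t)

def phi_triangular(zeta, width):
    """Triangular density on [-width, width]: convolution of two uniforms."""
    return phi_uniform(zeta, width) ** 2

zeros = 2 * np.pi * np.arange(1, 6) / delta   # test points 2*pi*k/delta
print(phi_uniform(zeros, delta))              # ~0: quantizer error is uniform
print(phi_triangular(zeros, delta))           # ~0, and these are double zeros

# Double zeros kill the derivative too, giving sufficient condition (9):
h = 1e-6
dphi = (phi_triangular(zeros + h, delta)
        - phi_triangular(zeros - h, delta)) / (2 * h)
print(dphi)                                   # ~0: error also uncorrelated with input
```

The uniform density satisfies (8) but not (9), while the triangular density satisfies both, matching the correlation behavior seen earlier.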
It should be noted that the Sripad and Snyder conditions yield exact results rather than approximations, but the bin width Δ is essentially assumed to be fixed. Furthermore, because the derivation involves a Fourier series expansion of the quantizer error probability density function on [-Δ/2, Δ/2], the derivation implicitly assumes that the quantizer does not overload, i.e., that all of the nonzero probability density resides in the no-overload region.
We have now seen a variety of conditions under which various aspects of the white noise approximation hold approximately or exactly. The question now is whether these conditions are relevant to ΔΣ modulators. First consider the Bennett conditions.
1) Is the input in the no-overload region?
Often this is not known, a problem especially acute if the quantizer is inside a feedback loop. This must be verified for a particular ΔΣ modulator architecture and is in fact known only for a few.
2) Is M asymptotically large?
This is almost never true in ΔΣ modulators, where typically M = 2, which is not large by any stretch of the imagination.
3) Is Δ asymptotically small?
This is almost never true in ΔΣ modulators, where typically Δ is as large as the allowed input signal range.
4) Is the joint density of the input signal at different sample times smooth?
This is never true in ΔΣ modulators, where the input to the quantizer includes a discrete component due to the feedback from the quantizer. Thus the pdf of the quantizer input has a continuous component and an impulsive component, violating the smoothness condition used to prove the Bennett theorem.
If the Bennett theorem cannot be made to apply, the next alternative is to test the Sripad and Snyder conditions. This is not easily done, however, because in a ΔΣ modulator the input signal to the quantizer includes both an original signal and a fed back output of the quantizer, as well as linear filtering. Hence the Sripad and Snyder conditions cannot be tested without solving for the quantizer input density. This is in fact close to the method that will be used to approach the problem.
Given the above observations, the white noise approximation is at best suspicious and at worst simply wrong in ΔΣ analysis. Not surprisingly, it was found early on in some oversampled ADCs (such as the simple single loop ΔΣ modulator) that the quantizer noise was not at all white and that the noise contained discrete spikes whose amplitude and frequency depended on the input [9, 10]. Perhaps surprisingly, however, simulations and actual circuits of higher order ΔΣ modulators and of interpolative coders often exhibited quantizer noise that appeared to be approximately white. Unfortunately, these systems also often exhibited unstable behavior not predicted by the white noise analysis.
The approximation can be modified to attempt to better approximate quantization error. Common approaches involve replacing the quantizer by a linear gain using describing function analysis. (See, for example, [11, 12, 13, 14, 15].) The remaining error can then be approximated by input-independent white noise, an approach introduced by Booton [16, 17] and applied to ΔΣ modulators (with additional linear filtering permitted in the loops) as in Ardalan and Paulos [18].
One can improve the approximations by using higher order terms in various expansions of the quantizer nonlinearity, but this approach has not been noticeably successful in the ADC application, primarily because of its difficulty. In addition, traditional power series expansions are not well suited to the discontinuous nonlinearities of quantizers. (See Arnstein [19] and Slepian [20] for series expansion solutions for quantization noise in delta modulators and DPCM.)
Another approach to the analysis of quantization error is to modify the system by adding a small random signal or dither at the input of the quantizer. By suitably choosing the dither signal, one can in some cases force the quantization error to satisfy all aspects of the white noise approximation, but at the possible cost of corrupting the signal and reducing the allowed no-overload range [21, 22, 23, 24].
The approach taken here is a variation of the classical characteristic function method of Rice [25] and the transform method of Davenport and Root [26], who represented memoryless nonlinearities using Fourier or Laplace transforms. A similar application of Fourier analysis to quantization noise was made by Clavier, Panter, and Grieg [27, 28], whose work was contemporary with Bennett's but did not have the impact of the latter. They provided an exact analysis of the quantizer noise resulting when a uniform quantizer is driven by one or two sinusoids and thereby demonstrated both that quantization noise could behave quite unlike the predictions of the Bennett theory and that in some cases the behavior of such noise could be exactly quantified.
Subsequently the characteristic function method was applied to the study of quantization noise by Widrow [29], and his formulation has been used in the subsequent development of conditions under which the quantization noise is white, in particular by Sripad and Snyder [3] and in the classic work of Iwersen on delta modulation [31].
Combining the characteristic function method with solutions to nonlinear difference equations and some basic results from nonlinear dynamical systems theory, we can obtain an exact analysis of several interesting ΔΣ modulators.
For some specific examples of sequences u_n, (13) can be used to obtain a form of Fourier series representation directly for the sequence e_n. We take a more direct route and focus on second order properties (mean, correlation, spectra) rather than a complete characterization of the sequence. Here the primary interest is the long term average behavior of the error sequence e_n. In particular we look at the average mean, second moment, and autocorrelation function. As we also wish to consider probabilistic expectations when dealing with random inputs such as dithered inputs, it is useful to consider averages that include both time and probabilistic averages. A useful formalism for simultaneously considering such averages is the class of quasi-stationary processes considered by Ljung [32, 62]. A discrete time process e_n is said to be quasi-stationary if there is a finite constant C such that |E(e_n)| ≤ C for all n and |R_e(n, k)| ≤ C for all n, k, where R_e(n, k) = E(e_n e_k), and if for each k the limit

    lim_{N→∞} (1/N) Σ_{n=1}^N R_e(n, n+k)                          (14)

exists, in which case the limit is defined as R̄_e(k). Following Ljung we introduce some notation: given a process x_n, define

    Ē{x_n} = lim_{N→∞} (1/N) Σ_{n=1}^N E(x_n),                     (15)

if the limit exists. Thus for a quasi-stationary process {e_n} the autocorrelation is given by

    R̄_e(k) = Ē{e_n e_{n+k}},                                       (16)

the mean is defined by

    m_e = Ē{e_n},                                                  (17)

and the average power is given by

    R̄_e(0) = Ē{e_n²}.                                              (18)

Other moments are similarly defined. These moments reduce to the corresponding time averages or probabilistic averages in the special cases of deterministic or random processes, respectively.
The power spectrum of the process is defined in the general case as the discrete time Fourier transform of the autocorrelation:

    S_e(f) = Σ_{n=-∞}^{∞} R̄_e(n) e^{-2πjfn},                       (19)

where the frequency f is normalized to lie in [0, 1]. The usual linear system input/output relations hold for this general definition of spectrum (see Chapter 2 of Ljung [32]). In fact, the class of quasi-stationary processes can be viewed as the most general class for which the familiar formulas of ordinary linear system second-order correlation and spectral analysis remain valid.
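For a finite record the barred moments can only be estimated by truncating the limits in (14)-(18); the small helper below (our own naming and parameter choices) does exactly that. For an i.i.d. uniform sequence on [-1/2, 1/2] the estimates should land near m_e = 0, R̄_e(0) = 1/12, and R̄_e(k) = 0 for k ≠ 0.

```python
import numpy as np

def time_avg_moments(e, max_lag=8):
    """Estimate m_e = bar{E}{e_n} and bar{R}_e(k) = bar{E}{e_n e_{n+k}}
    from a finite sample path by truncating the defining limits."""
    e = np.asarray(e, dtype=float)
    N = len(e)
    m = e.mean()
    R = np.array([np.dot(e[:N - k], e[k:]) / (N - k) for k in range(max_lag + 1)])
    return m, R     # mean and bar{R}_e(0..max_lag); R[0] is the average power

e = np.random.default_rng(3).uniform(-0.5, 0.5, 100_000)
m, R = time_avg_moments(e)
print(m, R[0])      # near 0 and 1/12
print(R[1:])        # near 0 at nonzero lags (white)
```

For deterministic inputs the same estimator approximates the pure time averages, which is how the formulas of the next sections can be checked by simulation.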
We now proceed to apply the basic formulas (13) and (12) to find an expression for the moments. Plugging (13) into (17) and (12) into (18) and assuming that the limits can be interchanged results in

    Ē{e_n} = lim_{N→∞} (1/N) Σ_{n=1}^N Σ_{l≠0} (1/(2πjl)) e^{2πjl u_n} = Σ_{l≠0} (1/(2πjl)) Ē{e^{2πjl u_n}},

    Ē{e_n²} = 1/12 + Σ_{l≠0} (1/(2πl)²) Ē{e^{2πjl u_n}},

and for k ≠ 0

    R̄_e(k) = Σ_{i≠0} Σ_{l≠0} (j/(2πi)) (j/(2πl)) Ē{e^{2πj(i u_n + l u_{n+k})}}.

These expressions can be most easily given in terms of the one-dimensional characteristic function

    Φ_u(l) = Ē{e^{2πjl u_n}}                                       (20)

and a two-dimensional characteristic function

    Φ_u^{(k)}(i, l) = Ē{e^{2πj(i u_n + l u_{n+k})}},  k ≠ 0,        (21)

as

    Ē{e_n} = Σ_{l≠0} (1/(2πjl)) Φ_u(l),                            (22)

    Ē{e_n²} = 1/12 + Σ_{l≠0} (1/(2πl)²) Φ_u(l),                    (23)

and for k ≠ 0

    R̄_e(k) = -Σ_{i≠0} Σ_{l≠0} (1/(2πi)) (1/(2πl)) Φ_u^{(k)}(i, l).  (24)

The interchange of the limits is an important technical point that must be justified in any particular application.
If the characteristic functions of (20)-(21) can be evaluated, then the moments and spectrum of the process can be computed from (22)-(24).
5 PCM Quantization Noise
First consider a purely deterministic input to a simple quantizer with M levels, a simple pulse coded modulation (PCM) system with no feedback. We will not consider the example of a dc input to an ordinary uniform quantizer because the results are trivial. We consider a more interesting (and active) input, a sinusoid u_n = A sin(nω_0 + φ) with a fixed initial phase φ. We assume that A ≤ ΔM/2 so that the quantizer is not overloaded. Define also f_0 = ω_0/2π.
For the given purely deterministic example, the one-dimensional characteristic function can be expressed as

    Φ_u(l) = lim_{N→∞} (1/N) Σ_{n=1}^N e^{2πjlβ sin(2πf_0 n + φ)}.   (25)

If f_0 is irrational, then for any Riemann integrable g,

    lim_{N→∞} (1/N) Σ_{n=1}^N g(<an + b>) = ∫_0^1 g(u) du,           (26)

where <x> denotes the fractional part of x. This remarkable result follows since the sequence of numbers <an + b> uniformly fills the unit interval and hence the sums approach an integral in the limit. Applying (26) to (25) yields

    Φ_u(l) = ∫_0^1 du e^{2πjlβ sin(2πu)} = J_0(2πlβ),                (27)

where β = A/Δ and J_m is the ordinary Bessel function of order m.
The mean and second moment of the quantizer noise can then be found using the fact that J_0(r) = J_0(-r):

    Ē{e_n} = Σ_{l≠0} (1/(2πjl)) J_0(2πlβ) = 0,                      (28)

    Ē{e_n²} = 1/12 + (1/(2π²)) Σ_{l=1}^∞ (1/l²) J_0(2πlβ).          (29)

Note that the result does not depend on the frequency of the input sinusoid (provided the frequency is irrational) and that the time average mean is 0, which agrees with that predicted by the assumption that ε_n is uniformly distributed on [-Δ/2, Δ/2]. The second moment, however, differs from the value of 1/12 predicted by the uniform assumption by the right hand sum of weighted Bessel functions. Note that if β = A/Δ becomes large (which with A held fixed and the no-overload assumption means that the number of quantization levels is becoming large), then J_0(2πlβ) → 0 and hence the second moment converges to 1/12 in the limit.
To compute the autocorrelation of the quantization noise, we use similar steps to find the joint characteristic function Φ_u^{(k)}(i, l), which yields

    R̄_e(k) = Σ_{n=-∞}^{∞} S_n e^{2πjλ_n k},                        (30)

where λ_n = <(2n - 1)ω_0/2π> are normalized frequencies in [0, 1) and

    S_n = (Σ_{l=1}^∞ (1/(πl)) J_{2n-1}(2πlβ))²                      (31)

are the spectral components at the frequencies λ_n. Thus

    S_e(f) = Σ_n S_n δ(f - λ_n),                                    (32)

where δ(f) denotes a Dirac delta function.
The spectrum of the quantizer error therefore is purely discrete and consists of all odd harmonics of the fundamental frequency of the input sinusoid. The energy at each harmonic depends in a very complicated way on the amplitude of the input sinusoid. In particular, the quantizer noise is decidedly not white, since it has a discrete spectrum and since the spectral energies are not flat. Thus here the white noise approximation of Bennett and of the describing function approach is invalid, even if M is large. Claasen and Jongepier [30] argued that if the spectrum analyzer has limited resolution, then one can make assumptions about the statistical behavior of the coefficients in the spectrum which lead to an approximately white spectrum. Indeed a short term FFT will look somewhat white, but higher resolution will clearly show the discrete nature of the error. In the ΔΣ case to be considered, even short term FFTs clearly show the spikes, and the corresponding tones can be heard in audio reconstructions.
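The tonal structure is easy to see in a simulation. The sketch below (ours; Δ = 1, an unbounded cell-centered quantizer, and arbitrary amplitude and irrational-valued frequency) compares how concentrated the error's periodogram is in its largest bins against a white reference: the quantizer error piles its power into a few discrete lines while white noise spreads it evenly.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1 << 14
n = np.arange(N)
u = 3.2 * np.sin(2 * np.pi * (np.sqrt(2) / 100) * n)   # amplitude in units of delta = 1
e = 0.5 - np.mod(u, 1.0)          # quantization error, cell-centered unit quantizer

def top_bin_power_fraction(x, frac=0.05):
    """Fraction of periodogram power held by the largest `frac` of bins."""
    p = np.sort(np.abs(np.fft.rfft(x)) ** 2)[::-1]
    k = max(1, int(frac * len(p)))
    return p[:k].sum() / p.sum()

tonal = top_bin_power_fraction(e)
white = top_bin_power_fraction(rng.uniform(-0.5, 0.5, N))
print(tonal, white)   # the error is far more concentrated than white noise
```

Even with a modest FFT length the concentration ratio makes the discrete spikes unmistakable.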
The crosscorrelation is handled in a similar fashion to obtain

    R̄_{ue}(k) = Ē{u_n e_{n+k}} = (A/π) cos(kω_0) Σ_{l=1}^∞ (1/l) J_1(2πlβ),   (33)

which is not in general equal to the product of the means (namely 0, since the error mean is 0). Thus the error and the input are not asymptotically uncorrelated.
The basic procedure used above of computing characteristic functions which in turn
yield the quantization error moments and spectra can be used with more complicated input
signals to obtain exact formulas which can be evaluated numerically.
6 Dithered PCM
We next consider a quantizer input process of the form the form un = xn + wn , where
xn is the possibly nonstationary original system input (such as the deterministic sinusoid
previously considered) and wn is an i.i.d. random process which is called a dither process.
A key attribute of the dither process is that it is independent of the xn process, that is, xn
is independent of wk for all times n and k. We still require that the quantizer input un be in
the no-overload region. This has the e ect of reducing the allowed input dynamic range and
hence limiting the overall SQNR. Dithering has long been used as a means of improving the
subjective quality of quantized speech and images (see Jayant and Noll, Section 4.8, and the
references therein [35]). The principal theoretical property of dithering was developed by
Schuchman [22], who proved that if the quantizer does not overload and the characteristic
function of the marginal probability density function of the dither signal is 0 at integral
multiples of 2π/Δ, then the quantizer error ε_n = q(x_n + w_n) − (x_n + w_n) is independent
of the original input signal xn . It follows from Sripad and Snyder [3] that under these
conditions the quantization error is also white. See, for example, [35, 24]. It is not true,
however, that the quantization noise q(x_n + w_n) − x_n is independent of the signal or white (a
common misconception that is still found in some texts, see [36, 24] for a discussion).
Given a stationary random process w_n, recall the definition (7) of Φ_w(α) = E(e^{jα w_n}),
the ordinary characteristic function of w_n. Because of the independence of the processes,
the one-dimensional characteristic function of (20) becomes
Thus in this example we have from (22)–(24) that e_n has zero mean, a second moment of
1/12, and an autocorrelation function R_e(k) = 0 when k ≠ 0; that is, the quantization
error is indeed white when Schuchman's condition is satisfied. This is true for a general
quasi-stationary input, including the sinusoid previously considered. Observe that (38) is
in fact a Sripad and Snyder condition for the generalized characteristic function of the
quantizer input process. Since it is obtained by multiplying the characteristic function of
the input by those for i.i.d. uniform random variables, the resulting dithered signal yields
uniform white quantization error. If the input sequence is A sin(nω₀) as before and the
dither sequence is an i.i.d. sequence of uniform random variables on [−Δ/2, Δ/2], then the
overload condition becomes A/Δ ≤ (M − 1)/2, which effectively reduces the allowable input
amplitude by Δ/2.
A similar exercise shows that the error and input are uncorrelated. Additional effort is
needed to prove independence (see, e.g., [24]).
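Schuchman's condition is simple to exercise in simulation. The sketch below (illustrative parameters; an unbounded mid-riser quantizer avoids the overload issue, as the theorem requires) applies full-step uniform dither to a slowly varying signal and estimates the mean, variance, and lag-1 correlation of the total error.

```python
import numpy as np

rng = np.random.default_rng(0)
delta = 0.5
N = 400_000
x = 0.8 * delta * np.sin(0.01 * np.arange(N))     # slowly varying input signal
w = rng.uniform(-delta / 2, delta / 2, N)          # dither satisfying Schuchman's condition
u = x + w

# unbounded uniform mid-riser quantizer (no overload)
eps = delta * (np.floor(u / delta) + 0.5) - u      # error q(u) - u

m_eps = float(np.mean(eps))
v_eps = float(np.var(eps))
r1 = float(np.mean(eps[:-1] * eps[1:]))            # lag-1 correlation, should be ~0
print(m_eps, v_eps, delta ** 2 / 12, r1)
```

The sample moments match the uniform white model (zero mean, variance Δ²/12, vanishing lag correlation) to within sampling error.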
Although dithering yields a quantizer error with nice statistical properties, it corrupts
the signal (unless subtractive dither is used) and reduces the SQNR achievable with a given
quantizer since the input amplitude must be reduced enough so that the original signal plus
the dither stays within the no-overload region. This loss may be acceptable (and small)
when the number of quantization levels is large. It is significant if there are only a few
quantization levels. For example, if M = 2, then a uniform dither on [−Δ/2, Δ/2] can only
avoid overload if the signal is confined to have magnitude less than Δ/2.
7 Single Loop ΔΣ
The basic ΔΣ modulator can be motivated by an intuitive argument based on the dithering
idea. Suppose that instead of adding an i.i.d. random process to the signal before quantiza-
tion, the quantization noise itself is used as a dither signal, that is, i.i.d. signal-independent
noise is replaced by deterministic signal-dependent noise which (hopefully) approximates a
white signal-independent process. Reversing the noise sign for convenience and inserting a
delay in the forward path of the feedback loop (to reflect the physical delay inherent in a
quantizer) yields the system described by the nonlinear difference equation
u_n = x_{n−1} − ε_{n−1} = u_{n−1} + x_{n−1} − q(u_{n−1}),  n = 1, 2, …  (39)
This difference equation is equivalent to the traditional discrete time form
of a single-loop ΔΣ modulator, which can therefore be thought of as a deterministically
dithered PCM system, an idea introduced in 1960 by C. C. Cutler in Figure 2 of [37],
where he referred to the system as a quantizer "with a single step of error compensation."
The name "Delta-Sigma" modulator was introduced by Inose and Yasuda in 1963 [38], who
provided the first published description of its basic properties. The name was intended to
reflect the fact that the system first took a difference (Delta) and then integrated (Sigma).
The modern popularity of these systems, much of the original analysis, and the alternative
name "Sigma-Delta" modulator is due to Candy and his colleagues [39, 40, 41, 42, 10]. The
latter name reflects the fact that the system can also be represented as the cascade of an
integrator (Sigma) and a Delta-modulator. In the author's opinion, this is a better name
because the system does not really form a difference of successive samples of the input
signal as the Delta-Sigma name suggests; it forms the difference between the input
and a digital approximation of the previous input that is fed back. The Delta-Sigma name
does not incorporate the key attribute of quantization in the system; the reverse order does.
The author bows to the majority of coauthors, however, and adopts the older name.
Given the interpretation of the system as a deterministically dithered quantizer, one
might hope that the deterministic dither might indeed yield a white quantization noise
process, but unfortunately this circular argument does not hold for the simple single-loop
system, as will be seen.
Since u_n = q(u_n) − ε_n, (39) yields the difference equation

q(u_n) = x_{n−1} + ε_n − ε_{n−1},  (40)
which has the intuitive interpretation that the quantizer output can be written as the input
signal (delayed) plus a difference (or discrete time derivative) of an error signal. The hope is
that this difference will be a high frequency term which can be removed by low pass filtering
to obtain the original signal. For convenience we assume that u_0 = 0; we normalize the
above terms by Δ and use the definition of ε_n to write

e_n = ε_n/Δ = (q(u_n) − u_n)/Δ = (q(x_{n−1} − ε_{n−1}) − x_{n−1})/Δ + e_{n−1},  n = 1, 2, …  (41)

Since u_0 = 0, ε_0 = Δ/2.
We shall assume that the input range is [−b, b), that is, −b ≤ x_n < b for all n. Intuitively,
we would like to make Δ small in order to keep the quantizer error small; but we dare not
make it too small or the quantizer may overload (M is considered fixed). An easy induction
argument in [43, 44] shows that the smallest value of Δ as a function of b for which overload
never occurs is Δ = 2b/(M − 1). In the most common case of a binary quantizer, Δ = 2b.
This implies that for Δ chosen in this manner, the Bennett condition of not overloading the
quantizer is met for the simple single loop ΔΣ modulator.
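The induction bound can be exercised numerically. The sketch below (a hypothetical test signal; a clipped mid-riser quantizer, assuming M even for simplicity) simulates the recursion and records the worst-case |u_n|, which never leaves the no-overload region |u| ≤ MΔ/2 when Δ = 2b/(M − 1).

```python
import math

def simulate(M, b, N=100_000):
    # single-loop recursion u_{n+1} = u_n + x_n - q(u_n) with Delta = 2b/(M-1)
    delta = 2 * b / (M - 1)
    def q(u):                          # M-level (M even) uniform mid-riser quantizer
        k = math.floor(u / delta)
        k = max(min(k, M // 2 - 1), -(M // 2))
        return delta * (k + 0.5)
    u, worst = 0.0, 0.0
    for n in range(N):
        x = b * math.sin(2.2 * n)      # arbitrary input with |x_n| <= b
        u = u + x - q(u)
        worst = max(worst, abs(u))
    return worst, M * delta / 2        # worst |u_n| vs. edge of no-overload region

worst, edge = simulate(M=2, b=1.0)     # binary case: Delta = 2b
worst4, edge4 = simulate(M=4, b=1.0)   # two-bit case: Delta = 2b/3
print(worst, edge, worst4, edge4)
```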
To find an explicit expression for e_n in terms of the x_n, sum equation (40) from k = 1
to n:

Σ_{k=1}^n q(u_k)/Δ = Σ_{k=1}^n x_{k−1}/Δ + Σ_{k=1}^n (e_k − e_{k−1}) = Σ_{k=1}^n x_{k−1}/Δ + e_n − 1/2.

Define 1(u) = 1 if u ≥ 0 and 0 otherwise, so that q(u)/Δ = 1(u) − 1/2, and we have for
n = 1, 2, …

y_n = 1/2 − e_n = Σ_{k=1}^n (x_{k−1}/Δ + 1/2) − Σ_{k=1}^n 1(u_k).

(y_n is more convenient to deal with than e_n.)
Taking the fractional part of both sides yields

<y_n> = < Σ_{k=1}^n (x_{k−1}/Δ + 1/2) >.
If −b ≤ x_n < b for all n, then y_n ∈ [0, 1) and hence <y_n> = y_n. Thus y_0 = 0 and

y_n = < Σ_{k=0}^{n−1} (1/2 + x_k/Δ) > = < n/2 + Σ_{k=0}^{n−1} x_k/Δ >,  n = 1, 2, …

and hence

e_n = 1/2 − < n/2 + Σ_{k=0}^{n−1} x_k/Δ >.  (42)

Compare this with the PCM case e_n = 1/2 − <u_n/Δ>.
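The closed form (42) can be checked directly against a simulation of the recursion (39). The sketch below assumes the binary case (M = 2, levels ±b, Δ = 2b) and a hypothetical bounded input sequence; the two computations agree to machine precision.

```python
import math

b = 1.0
delta = 2 * b                          # binary quantizer, levels ±b
q = lambda u: b if u >= 0 else -b      # q(u) = Delta*(1(u) - 1/2)

x = [0.3 + 0.2 * math.sin(1.7 * k) for k in range(2000)]   # inputs inside [-b, b)

# direct simulation of u_n = u_{n-1} + x_{n-1} - q(u_{n-1}), u_0 = 0
u, direct = 0.0, []
for n in range(1, len(x) + 1):
    u = u + x[n - 1] - q(u)
    direct.append((q(u) - u) / delta)            # e_n = (q(u_n) - u_n)/Delta

# closed form (42): e_n = 1/2 - < n/2 + (1/Delta) sum_{k<n} x_k >
frac = lambda t: t - math.floor(t)
s, closed = 0.0, []
for n in range(1, len(x) + 1):
    s += x[n - 1] / delta
    closed.append(0.5 - frac(n / 2 + s))

maxdiff = max(abs(a - c) for a, c in zip(direct, closed))
print(maxdiff)
```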
When the quantizer is put into a feedback loop with an integrator, the overall e ect is
to integrate the input plus a constant bias before taking the fractional part. The overall
nonlinear feedback loop therefore appears as an affine operation (linear plus a bias) on the
input followed by a memoryless nonlinearity.
The techniques used to find the time average moments for e_n in the memoryless quantizer
case can now be used by replacing u_n by the sum

s_n = Σ_{k=0}^{n−1} (1/2 + x_k/Δ),  (43)
evaluating the characteristic functions of (20)–(21) and applying (22)–(24) for s_n instead
of u_n/Δ. Thus

Φ_s(l) = E{ e^{jπln} e^{j2πl Σ_{i=0}^{n−1} x_i/Δ} },  (44)

Φ_s^{(k)}(i, l) = e^{jπlk} E{ e^{jπ(i+l)n} e^{j2π(i+l) Σ_{m=0}^{n−1} x_m/Δ} e^{j2πl Σ_{m=n}^{n+k−1} x_m/Δ} }.  (45)
To evaluate these limits it is necessary as in the PCM case to assume a particular form
for the input signal. A simple but important signal is a dc value: xk = x for all k, where
−b ≤ x < b is fixed. Although clearly of limited practical application, it can be considered
as an approximation to a very slowly varying input, that is, to the case where the ΔΣ
modulator has a large oversampling ratio, as it might for sensor measurements.
Analogous to the analysis for PCM with a sinusoidal input, there are two possible
assumptions on the dc value which lead to nice solutions, but which have fundamentally
different behavior and interpretation. The assumption that we shall make is that x/2b is an
irrational number. This assumption is physically and intuitively correct for an ADC because
any truly analog random signal will be describable by a probability density function and
hence with probability 1 will produce an irrational number. As in the PCM case, choosing
x/2b irrational will permit the evaluation of the above limits using asymptotic results from
ergodic theory such as Weyl's theorem (26). If on the other hand it is assumed that x/2b
is a rational number, then the limits become finite sums and the output and error signals
become periodic, that is, "tones" or "limit cycles" are produced. Much of the analysis
described here can be modified for rational inputs. For example, the analysis for the single
loop ΔΣ for rational dc inputs may be found in [45, 47]. We do not pursue the analysis
for rational inputs here because in the author's view it is of little interest in describing
the behavior of ADC systems to analyze carefully behavior resulting from zero probability
inputs. For further discussion on the issue of irrational vs. rational dc values see Iwersen
[46]. This presents a potential cause for confusion, however, because simulations of ADC
behavior on a digital computer will necessarily produce rational input signals and hence the
resulting periodic behavior will appear to disagree with the theory. The reconciliation of this
apparent paradox is to make sure that the simulations well approximate the assumptions
required by the theory. If a rational dc is selected with a modest denominator (e.g., a few
hundred), then the assumption of an irrational input is clearly violated and the resulting
signals will indeed be periodic and the various statistics poor approximations to the theory.
If one instead generates a random number using a uniform random number generator on a
digital computer, the resulting number will still be rational, but it will be \approximately"
irrational in that with high probability the fraction in lowest terms will have a denominator
that is extremely large (hundreds of thousands or millions). This means that the resulting
signals will be periodic, but with extremely long periods so that spectral analyzers will
not see the periodicities. All statistics computed will well match the theory in this case
(provided of course that the theory is correct).
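For the single-loop dc error sequence e_n = 1/2 − <nβ> with β = 1/2 + x/Δ, the period of the limit cycle for a rational input is just the denominator of β in lowest terms, which makes the contrast between a "modest" rational and a random-generator rational concrete. The specific fractions below are purely illustrative.

```python
from fractions import Fraction

def error_period(x_over_delta: Fraction) -> int:
    # e_n = 1/2 - <n*beta>, beta = 1/2 + x/Delta: for beta = p/q in lowest
    # terms the error sequence repeats with period q
    beta = Fraction(1, 2) + x_over_delta
    return beta.denominator

p1 = error_period(Fraction(3, 250))               # modest denominator: short limit cycle
p2 = error_period(Fraction(104_729, 1_299_709))   # large coprime integers: huge period
print(p1, p2)
```

A period of a few hundred samples produces visible (and audible) tones, while a period of millions of samples is indistinguishable from aperiodic behavior at any practical spectral resolution.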
One system where the assumption of a rational dc input signal is valid is a DAC, since by
definition a digital input signal can take on only rational values. Hence the results developed
here for irrational inputs do not apply to the analysis of ΔΣ modulators for DACs. The basic
methods can still be used, but the asymptotic results must be replaced by appropriate finite
sums and the answers will be different from those of the irrational case.
Assuming an irrational dc input x, we can replace u_n by s_n = nβ, where β = 1/2 + x/Δ,
in (20)–(21) and evaluate the characteristic functions using (26):

Φ_s(l) = ∫_0^1 e^{j2πlu} du = { 1, l = 0;  0, l = ±1, ±2, … }  (46)

Φ_s^{(k)}(i, l) = { e^{j2πlkβ}, i = −l;  0, otherwise.  (47)
Thus we have from (22)–(23) that E{e_n} = 0 and E{e_n²} = 1/12, which agrees with the
uniform noise approximation; that is, these are exactly the time average moments one
would expect with a sequence of uniform random variables. The second order properties,
however, are quite different. From (24) and 1.443.3 of [48],

R_e(k) = Σ_{l≠0} (1/(2πl))² e^{j2πlkβ} = (1/(2π²)) Σ_{l=1}^∞ cos(2πlkβ)/l² = 1/12 − <kβ>(1 − <kβ>)/2.  (48)
This does not correspond to a white process. The exponential expansion implies that the
spectrum is purely discrete, having amplitude

S_n = { 0, n = 0;  1/(2πn)², n ≠ 0 }  (49)

at frequencies <nβ> = <n(1/2 + x/Δ)>. Thus the locations, and hence the amplitudes, of the
spikes of the quantizer error spectrum depend strongly on the value of the input signal.
Thus as in the simple PCM case with a sinusoidal input, the Bennett and describing function
white noise approximations inaccurately predict the spectral nature of the quantizer noise
process, which is neither continuous nor white.
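Formula (48) can be verified against the time average of the closed-form error sequence e_n = 1/2 − <nβ>. The sketch below uses an arbitrary irrational β for illustration; by Weyl's theorem the empirical lag products converge to the stated values.

```python
import math

beta = 0.5 + math.sqrt(2) / 4           # beta = 1/2 + x/Delta with x/Delta irrational
N = 500_000
frac = lambda t: t - math.floor(t)
e = [0.5 - frac(n * beta) for n in range(N)]     # closed form (42) for a dc input

diffs = []
for k in range(5):
    empirical = sum(e[n] * e[n + k] for n in range(N - k)) / (N - k)
    kb = frac(k * beta)
    theory = 1 / 12 - kb * (1 - kb) / 2          # equation (48)
    diffs.append(abs(empirical - theory))
print(diffs)    # all near zero
```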
Next consider a more "active" input x_n = A cos(nω₀), where ω₀/2π is assumed to be
irrational and where we consider a full scale sinusoid with |A| = b as an example. Define
γ = (2A/Δ) sin(ω₀/2). The same general procedure with a lot more algebra [49] now results in
m_e = 0,  (50)

R_e(0) = 1/12 − Σ_{l=1}^∞ (1/(2πl))² (−1)^l J_0(4πγl),  (51)

R_y(k) = Σ_{m=−∞}^∞ S_m e^{j2πk f_m},  (52)

where

S_m = { (1/2)², m = 0;
        ( (1/π) Σ_{l=1}^∞ J_m(2πγ(2l−1)) (−1)^l/(2l−1) )², m even, m ≠ 0;
        ( (1/π) Σ_{l=1}^∞ J_m(4πγl) (−1)^l/(2l) )², m odd,  (53)

and

f_m = { <m ω₀/(2π) − 1/2>, m even;
        <m ω₀/(2π)>, m odd.  (54)
With a sinusoidal input, the input and quantizer error are not uncorrelated.
As in the PCM case, the spectrum of y_n is purely discrete and has amplitude S_m at the
frequency f_m. This spectrum is extremely non-white since it is not continuous and not flat.
The output frequencies depend on the input frequency ω₀ and comprise all harmonics of
the input frequency ω₀. It is interesting to observe that not only are all harmonics of the
input frequency contained in the output signal, but also all shifts of these harmonics by π
(when computed in radians). These shifted harmonics are not present in the PCM case.
e_n = ε_n/Δ = 1/2 − < n/2 + Σ_{k=0}^{n−1} p_k > = −1/2 + < Σ_{l=0}^{n−1} l(1/2 + x_{n−l}/Δ) >.  (59)
As in the ordinary ΔΣ modulator, we can modify (20)–(24) by replacing the quantizer input
u_n by a sum term

s_n = Σ_{i=0}^{n−1} Σ_{l=0}^{i−1} (1/2 + x_l/Δ) = Σ_{l=0}^{n−1} l(1/2 + x_{n−l}/Δ),  (60)
and then proceed exactly as before. We here illustrate the results only for the simple case
of an irrational dc input x_n = x. Define β = 1/2 + x/Δ as before and we have that
s_n = (β/2)n² − (β/2)n. The limits in the characteristic functions are evaluated using a form
of Weyl's theorem to obtain

Φ_s(l) = { 1, l = 0;  0, l ≠ 0 }  (61)

Φ_s^{(k)}(i, l) = { 1, i = l = 0;  0, otherwise.  (62)
These characteristic functions are identical to those of the dithered PCM case in (37) and
(38) and hence the conclusions are the same: the generalized Sripad and Snyder conditions
are met and the quantization error in the second stage is indeed white and its marginal first
and second moments agree with those of a uniform distribution! Since only the second stage
error appears in the nal reconstruction in an ideal system, the white noise approximation
can safely be used for SQNR analysis. It is perhaps surprising that such a purely deter-
ministic system with a fixed dc input can produce a sequence that appears to be uniformly
distributed white noise when its rst and second order moments are measured. A slight
variation on the foregoing analysis can be used to prove that the second stage quantizer
noise and the original input are asymptotically uncorrelated.
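The whiteness of the second-stage error can be checked numerically from the dc-input closed form (e_n = −1/2 + <βn(n − 1)/2>, as assumed from the derivation above); the particular β below is illustrative.

```python
import math

beta = 0.5 + math.sqrt(3) / 5         # 1/2 + x/Delta with an irrational dc input
N = 400_000
frac = lambda t: t - math.floor(t)
# second-stage error closed form for a dc input
e = [-0.5 + frac(beta * n * (n - 1) / 2) for n in range(N)]

mean = sum(e) / N
var = sum(v * v for v in e) / N
r1 = sum(e[n] * e[n + 1] for n in range(N - 1)) / (N - 1)
print(mean, var, r1)    # ~0, ~1/12, ~0: uniform white behavior
```

A purely deterministic quadratic sequence thus passes the first- and second-moment tests for uniform white noise, exactly as the theory predicts.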
The production of a deterministic signal that masquerades as white noise is reminiscent
of the theory of \chaos," the branch of nonlinear dynamical systems theory that focuses
on transformations on points that produce sequences that appear to be random. There is,
however, nothing chaotic about the quantization noise sequence. Technically, its Lyapunov
exponent is zero and hence it is not chaotic. Chaos can be made to occur in variations
on the basic ΔΣ architecture, e.g., by using an integrator in the feedback loop with gain
greater than 1. See, e.g., Feely and Chua [50, 51] and Schreier [52].
It should be reemphasized that the above analysis depended critically on the underlying
assumption of an irrational dc input. If the input were rational, then the asymptotic limits
would be replaced by finite sums and the behavior would be different. In particular, the
error and output sequences would be periodic and the system would exhibit "limit cycle"
behavior. As previously discussed, the periodic behavior would be evident if a rational input
signal with a modest denominator is chosen in a simulation. Choosing the input using a
good random number generator yields results that well approximate the asymptotic theory.
It is also important to note that the solutions hold for the idealized system represented by
the nonlinear difference equations. Real circuits would not have perfectly matched non-
leaky integrators, which would result in behavior departing from the theory. In particular,
one would expect to see discrete frequency components in such systems even for irrational
dc inputs (as one does for a single stage idealized system). The accuracy of the theory at
predicting actual behavior for simulated or physical circuits depends strongly on the degree
to which the simulations reflect the assumptions of the theory and the physical circuits
implement the commonly used nonlinear difference equations used to describe the systems.
The analysis can be extended to the case of a sinusoidal input, but it is much more
complicated; the noise is not white [53] and is not asymptotically uncorrelated with the
input.
a relation which bears a remarkable resemblance to (58) for the output of the two-stage
ΔΣ modulator (and hence is capable of the same interpretation).
A well known difficulty with the second (and higher) order ΔΣ modulators is their
potential for quantizer overload. In particular, if one uses a binary quantizer with levels ±b
in a second order system, then it is easy to find an input within the range [−b, b) which will
overload the quantizer and hence will be capable of producing large errors. The potential
overload also has the serious consequence for our purposes that it renders invalid a basic
technique of the approach used here. No application of the techniques of this chapter seems
possible for the case of a binary quantizer, but the techniques do apply if we permit a two
bit (or higher) quantizer. It can be shown that the smallest value of Δ for which no overload
occurs is given by [56] Δ = 2b/(M − 3). Clearly this result is useful only if M ≥ 4, that is,
if the quantizer has at least two bits. For the present we make this assumption and we can
then proceed as earlier.
As with the first order loop analysis, we normalize the error and then sum the difference
equation twice (since there is a second-order difference) to find

y_n = 1/2 − e_n = Σ_{l=1}^n l(1/2 + x_{n−l}/Δ) − Σ_{l=1}^n l·1(u_{n+1−l}).  (65)

In the special case of a dc input, (65) can be written as

y_n = Σ_{l=1}^n lβ − Σ_{l=1}^n l·1(u_{n+1−l}) = Σ_{l=1}^n l(β − 1(u_{n+1−l})) = Σ_{l=1}^n (n + 1 − l)(β − 1(u_l)),  (66)

where as before β = 1/2 + x/Δ.
As in the first-order case,

<y_n> = < Σ_{l=1}^n l(1/2 + x_{n−l}/Δ) >.  (67)
If there is no overload, then y_n = <y_n> and the solution is identical to that of He et
al. (and to that for the two-stage ΔΣ modulator [58]). The problem is that if the quantizer
has only one bit, then it is not in general true that y_n = <y_n> and hence (67) does not
provide a solution for the error. The analysis does characterize < yn > as being a uniform
white noise sequence, but this is the fractional part of the quantizer error and not in general
the quantizer error itself.
There is no simple solution to this problem. The failure of the analysis to apply to
the one-bit second order system does not detract from the usefulness or popularity of the
system; it only leaves open the issues of finding the properties of the quantizer error and of
comparing those properties to those predicted by the white noise approximation.
As will be considered in Chapter 5, exact results for the one bit second-order case have
been developed and reported by many researchers, including Wang [59], Hein and Zakhor
[60], and by Pinault and Lopresti [61]. Both Wang and Pinault and Lopresti used ideas from
dynamical systems theory to show that the two integrator states must eventually lie in
a compact set, demonstrating a form of stability for the system. The quantizer overloads,
but an absolute bound on the integrator states (and hence the quantizer error) can be found
as a function of the dc input. In particular their results provide a bound of the form
| (1/n) Σ_{k=1}^n (q(u_k) − x) | ≤ C/n,  (68)
where C = 5/4 using the normalizations adopted here (it can be tightened to 1). It can
further be shown that
lim_{n→∞} (1/n) Σ_{k=1}^n ε_k = 0,
but the evaluation of the variance, correlation, and spectra remains an open problem.
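Although the exact error statistics remain open, the O(1/n) convergence of the running output mean implied by (68) is easy to observe in simulation. The realization below is one common discrete-time one-bit double loop (an assumed form for illustration, not necessarily the precise system analyzed in [59, 61]): a binary quantizer with the output fed back to both integrators.

```python
def double_loop_mean(x, N=200_000):
    # one common double-loop realization (assumption): feedback to both integrators
    a = b = 0.0                          # the two integrator states
    total = 0.0
    for _ in range(N):
        y = 1.0 if b >= 0 else -1.0      # one-bit quantizer, levels ±1
        a += x - y                       # first integrator
        b += a - y                       # second integrator
        total += y
    return total / N

x = 0.35
m = double_loop_mean(x)
print(x, m)      # running mean of the binary output approaches the dc input
```

Since the integrator states stay in a bounded set, the discrepancy between the running mean and the dc input decays like a constant over n, consistent with the bound above.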
10 Some Extensions
Dithered Single-Loop ΔΣ
The methods can be applied to a dithered ΔΣ modulator with an input of the form
xn = v n + w n ; (69)
where wn is an i.i.d. process that is independent of the quasi-stationary process vn and
it is assumed that the input and dither are constrained within [?b; b) to avoid quantizer
overload. Again the characteristic functions can be found and used to show [62] that the
error sequence has 0 mean and variance 1/12, but that the spectrum of the error is not
flat. For a suitable dither sequence, however, it is smooth and tends to become increasingly
white as the dither signal is increased (and the input is correspondingly decreased to avoid
overload). In the limit of 0 signal the quantization error becomes white. As always, these
results depend strongly on the underlying assumptions. Here, for example, the conclusions
cannot be trusted if the sum of the dither and signal are allowed to cause quantizer overload.
Multistage and Higher Order
The two stage results can be extended to multiple stages with binary quantizers, where
it can be shown that dc inputs, sinusoidal inputs, and sums of sinusoidal inputs all yield
white quantization noise if the integrators are assumed to be ideal and the dc or frequencies
required to be irrational [63]. Similarly, dithering multistage (two stage or more) ΔΣ mod-
ulators with i.i.d. noise which does not cause overload also yields white quantization noise
[62]. He, Buzo, and Kuhlman found the spectra for multi-bit higher order ΔΣ modulators
[55, 57]. As with the second order case considered here, the quantizer must have sufficient
bits to avoid overload (specifically, if the number of loops is k, then k bits are needed). They
consider both dc and sinusoidal inputs. As with the two stage and second order systems
considered here, the k-stage (one bit per stage) and kth order (single k-bit quantizer) ΔΣ
modulators yield the same quantization noise spectra.
Leaky Integrating ΔΣ
All of the systems considered here so far have a key aspect in common that permitted
solution: the linear filtering within the loop consisted only of ideal discrete time integrators;
more complicated filters such as leaky integrators or integrators with non-unity gain were
not considered. While results for general filtering do not exist, the special case of leaky
integration and non-unity gain has been considered. Kieffer [64, 65, 66] has extended the
single-stage ΔΣ result to more general systems with dc inputs which include DPCM and
leaky integrating ΔΣs, but his techniques differ from those considered here and have not
been fully exploited for the application. Feely and Chua [50] used ideas from dynamical
systems theory to describe various properties of a leaky ΔΣ modulator, including the input-
output relation. The methods described here can be applied to the leaky integrator and
the integrator with non-unit gain [67]. The analysis is complicated, but the results can be
easily summarized.
For the special case of a dc input, the following properties hold:

- The quantizer sequence and error sequence are periodic, even for irrational dc inputs
  (unlike the ideal integrator case).
- The error sequence is no longer uniform and the mean is not the input dc. This means
  traditional decoders (low pass filters) give biased reproduction (unlike the ideal integrator
  case). In fact, it can be shown that for a dc input, the input/output relation resulting
  when a long comb filter is used for digital-to-analog conversion is a form of Cantor
  function or a "devil's staircase" and it is a complicated function to describe analytically.
  Even when an arbitrarily large number of bits are used to reconstruct the input, the
  output is biased.
- The quantizer errors are not white.
- The SQNR is reduced from the ideal integrator case.
Multibit Quantizer, Single-bit Feedback
The methods described here do not work for all popular architectures, but they do work
for a variety of systems. One such example of interest is the system introduced by Leslie
and Singh [68, 69]. Although the theoretical treatment of ΔΣ modulation given here permits
multibit quantization, such systems have the practical shortcoming of requiring extremely
accurate digital-to-analog conversion in the feedback loop (a problem which vanishes in the
binary quantizer case). Leslie and Singh proposed combining a multibit quantizer in the
forward loop with a single bit in the feedback loop. Analysis shows that the performance of
this system is identical to that of an ordinary multibit single stage ΔΣ having an additional
bit in the quantizer (and feedback loop). Hence the previous analysis applies [70].
Related Work
During recent years many efforts have been made to find and apply exact analysis methods
to ΔΣ modulators for the purpose of describing their behavior, predicting their performance,
and developing improved systems. These works have in common with this chapter the goal of
avoiding unjustified application of the white noise approximation, but the detailed methods
and applications are not constrained to those described here. Of particular relevance to
the issues considered here are the work of Delchamps on the behavior of control systems
containing quantizers inside of feedback loops [71, 72], the work of Galton demonstrating
the existence of stationary distributions for the error sequence for a general class of ΔΣ
modulators with random inputs such as Gaussian processes [58, 73, 74, 75], the work of
Kieffer [76, 64, 65] and Kieffer and Dunham [77] on the stability and convergence of one-
bit feedback quantizers, and the work of Thao and Vetterli [79, 78] and Hein and Zakhor
[60, 80] on optimal nonlinear decoders for ΔΣ modulators.
11 Final Thoughts
It might be said that the theory has failed to keep up with the fast pace of practice, that
the best ΔΣ modulators have been developed based on engineering insight and suspect
approximations, and that exact analysis has usually followed far behind, if at all. Nonetheless,
this chapter has argued for a proper appreciation of the common approximation techniques,
their origins, and their limitations, and has demonstrated several important examples where
exact analysis is possible. Many open problems remain, including the evaluation of the
second order properties of the basic one-bit second order ΔΣ modulator, as well as the
general stability properties and the first and second order properties of the wide variety
of hybrid, cascade, and higher order systems that have been proposed. Perhaps with time
some of these difficult problems may yet yield to solution, or interesting new systems may
be found by modifying successful systems to make them more amenable to exact analysis.
Acknowledgements
I would like to thank Ping-Wah Wong, Wu Chou, Sang Ju Park, and Rick VanderKam
for their cooperation on much of the research reported herein. I would also like to thank
A. H. Gray for first interesting me in quantization theory and pointing out the errors
common in applications of Bennett's theory, Gabor Temes and Jim Candy for the pleasure
of lecturing with them in several short courses on ΔΣ modulation, and Istvan Kollar for
helpful discussions on the differing properties of the approaches of Widrow, of Sripad and
Snyder, and of Bucklew and Gallagher. Much of the research leading to the results described
here was funded by the National Science Foundation under Grant No. MIP-9014335 and by
a Stanford University Center for Integrated Systems Seed Grant.
References
[1] W. R. Bennett, "Spectra of quantized signals," Bell System Technical Journal, vol. 27, pp. 446–472, July 1948.
[2] B. Widrow, "A study of rough amplitude quantization by means of Nyquist sampling theory," IRE Transactions on Circuit Theory, vol. CT-3, pp. 266–276, 1956.
[3] A. B. Sripad and D. L. Snyder, "A necessary and sufficient condition for quantization errors to be uniform and white," IEEE Trans. Acoust. Speech Signal Process., vol. ASSP-25, pp. 442–448, October 1977.
[4] Istvan Kollar, private communication.
[5] J. A. Bucklew and N. C. Gallagher, Jr., "Some properties of uniform step size quantizers," IEEE Transactions on Information Theory, vol. IT-26, pp. 610–613, 1980.
[6] J. A. Bucklew and N. C. Gallagher, Jr., "A note on optimum quantization," IEEE Transactions on Information Theory, vol. IT-25, pp. 365–366, 1979.
[7] J. A. Bucklew, "Two results on the asymptotic performance of quantizers," IEEE Transactions on Information Theory, vol. IT-30, pp. 341–348, 1984.
[8] A. Gersho and R. M. Gray, Vector Quantization and Signal Compression. Boston: Kluwer Academic Publishers, 1992.
[9] T. Misawa, J. E. Iwersen, L. J. Loporcaro, and J. G. Rush, "A single-chip CODEC with filters utilizing ΔΣ modulation," IEEE J. Solid State Circuits, vol. SC-16, pp. 333–341, August 1981.
[10] J. C. Candy and O. J. Benjamin, "The structure of quantization noise from sigma-delta modulation," IEEE Trans. Comm., vol. COM-29, pp. 1316–1323, Sept. 1981.
[11] M. Vidyasagar, Nonlinear Systems Analysis. Englewood Cliffs, New Jersey: Prentice Hall, 1978.
[12] A. Gelb and W. E. Vander Velde, Multiple-Input Describing Functions and Nonlinear Systems Design. New York: McGraw-Hill, 1968.
[13] D. P. Atherton, Stability of Nonlinear Systems. Chichester: Research Studies Press/Wiley, 1981.
[14] D. P. Atherton, Nonlinear Control Engineering. New York: Van Nostrand Reinhold, 1982.
[15] A. R. Bergen and R. L. Franks, "Justification of the describing function method," SIAM Journal of Control, vol. 9, pp. 568–589, 1971.
[16] R. C. Booton, Jr., "The analysis of nonlinear control systems with random inputs," in Proceedings of the Symposium on Nonlinear Circuit Analysis (Polytechnic Institute of Brooklyn), April 1953.
[17] R. C. Booton, Jr., "Nonlinear control systems with statistical inputs," tech. rep., Massachusetts Institute of Technology, Cambridge, Mass., March 1952.
[18] S. H. Ardalan and J. J. Paulos, "An analysis of nonlinear behavior in delta-sigma modulators," IEEE Trans. Circuits and Systems, vol. CAS-34, pp. 593–603, June 1987.
[19] D. S. Arnstein, "Quantization error in predictive coders," IEEE Trans. Comm., vol. COM-23, pp. 423–429, April 1975.
[20] D. Slepian, "On delta modulation," Bell Syst. Tech. J., vol. 51, pp. 2101–2136, 1972.
[21] L. G. Roberts, "Picture coding using pseudo-random noise," IRE Trans. on Information Theory, vol. IT-8, pp. 145–154, February 1962.
[22] L. Schuchman, "Dither signals and their effects on quantization noise," IEEE Transactions on Communication Technology, vol. COM-12, pp. 162–165, December 1964.
[23] J. Vanderkooy and S. P. Lipshitz, "Dither in digital audio," J. Audio Eng. Soc., vol. 35, pp. 966–975, December 1987.
[24] R. M. Gray and T. J. Stockham, Jr., "Dithered quantizers," IEEE Trans. Inform. Theory, vol. 38, pp. 805–812, May 1993.
[25] S. O. Rice, "Mathematical analysis of random noise," in Selected Papers on Noise and Stochastic Processes (N. Wax, ed.), pp. 133–294, New York, NY: Dover, 1954. Reprinted from Bell System Technical Journal, Vol. 23:282–332 (1944) and Vol. 24:46–156 (1945).
[26] W. B. Davenport and W. L. Root, An Introduction to the Theory of Random Signals and Noise. New York: McGraw-Hill, 1958.
[27] A. G. Clavier, P. F. Panter, and D. D. Grieg, "Distortion in a pulse count modulation system," AIEE Transactions, vol. 66, pp. 989–1005, 1947.
[28] A. G. Clavier, P. F. Panter, and D. D. Grieg, "PCM distortion analysis," Electrical Engineering, pp. 1110–1122, November 1947.
[29] B. Widrow, "Statistical analysis of amplitude quantized sampled data systems," Transactions Amer. Inst. Elec. Eng., Pt. II: Applications and Industry, vol. 79, pp. 555–568, 1960.
[30] T. A. C. M. Claasen and A. Jongepier, "Model for the power spectral density of quantization noise," IEEE Trans. Acoust. Speech Signal Process., vol. ASSP-29, pp. 914–917, August 1981.
[31] J. E. Iwersen, "Calculated quantizing noise of single-integration delta-modulation coders," Bell Syst. Tech. J., pp. 2359–2389, September 1969.
[32] L. Ljung, System Identification. Englewood Cliffs, NJ: Prentice-Hall, 1987.
[33] H. Bohr, Almost Periodic Functions. New York: Chelsea, 1947.
[34] K. Petersen, Ergodic Theory. Cambridge: Cambridge University Press, 1983.
[35] N. S. Jayant and P. Noll, Digital Coding of Waveforms. Englewood Cliffs, New Jersey: Prentice-Hall, 1984.
[36] S. P. Lipshitz, R. A. Wannamaker, J. Vanderkooy, and J. N. Wright, "Non-subtractive dither," IEEE Transactions on Signal Processing, 1993, to appear.
[37] C. C. Cutler, \Transmission systems employing quantization," 1960. U. S. Patent No. 2,927,962.
[38] H. Inose and Y. Yasuda, \A unity bit coding method by negative feedback," Proc. IEEE, vol. 51,
pp. 1524{1535, November 1963.
[39] J. C. Candy, \A use of limit cycle oscillations to obtain robust analog-to-digital converters," IEEE
Trans. Comm., vol. COM-22, pp. 298{305, March 1974.
[40] J. C. Candy, \A use of double integration in sigma delta modulation," IEEE Trans. Comm., vol. COM-
33, pp. 249{258, March 1985.
[41] J. C. Candy, \Decimation for sigma delta modulation," IEEE Trans. Comm., vol. COM-34, pp. 72{76,
January 1986.
[42] J. C. Candy, Y. C. Ching, and D. S. Alexander, \Using triangularly weighted interpolation to get 13-bit
PCM from a sigma delta modulator," IEEE Trans. Comm., pp. 1268{1275, November 1976.
[43] R. M. Gray, \Oversampled sigma-delta modulation," IEEE Trans. Comm., vol. COM-35, pp. 481{489,
April 1987.
[44] R. M. Gray, \Quantization noise spectra," IEEE Trans. Inform. Theory, vol. IT-36, pp. 1220{1244,
November 1990.
[45] V. Friedman, \Structure of the limit cycles in sigma delta modulation," IEEE Trans. Commun., vol. 36,
no. 8, , pp. 972{979, Aug 1988.
[46] J. E. Iwersen, \Comments on 'The structure of the limit cycles in sigma delta modulation'." IEEE
Transactions on Communications, vol.38, no.8, p. 1117, Aug. 1990.
[47] R.M. Gray, \Spectral Analysis of Quantization noise in single-loop sigma-delta modulation with dc
inputs," IEEE Transactions on Communications, pp. 588-599, June 1989.
[48] I. S. Gradshteyn and I. M. Ryzhik, Table of Integrals,Series,and Products. New York: Academic Press,
1965.
[49] R. M. Gray, W. Chou, and P. W. Wong, \Quantization noise in single-loop sigma-delta modulation
with sinusoidal inputs," IEEE Trans. Inform. Theory, vol. IT-35, pp. 956{968, 1989.
[50] O. Feely and L. Chua, \The e ect of integrator leak in ? modulation," IEEE Trans. Circuits and
Systems, vol. 38, pp. 1293{1305, November 1991.
[51] O. Feely and L. Chua, \Nonlinear dynamics of a class of analog-to-digital converters," International
Journal of Bifurcation and Chaos, Vol. 2, pp. 325{340, June 1992.
[52] R. Schreier, "Destabilizing limit cycles in Delta-Sigma modulators with chaos," in Proceedings of the 1993 International Symposium on Circuits and Systems, Chicago, IL, pp. 1369–1372, 1993.
[53] P. W. Wong and R. M. Gray, "Two stage sigma-delta modulation," IEEE Trans. Acoust. Speech Signal Process., pp. 1937–1952, November 1989.
[54] N. He, A. Buzo, and F. Kuhlmann, "A frequency domain waveform speech compression system based on product vector quantizers," in International Conference on Acoustics, Speech, and Signal Processing, (Tokyo, Japan), April 1986.
[55] N. He, A. Buzo, and F. Kuhlmann, "Multi-loop sigma-delta quantization: Spectral analysis," in International Conference on Acoustics, Speech, and Signal Processing, pp. 1870–1873, 1988.
[56] N. He, F. Kuhlmann, and A. Buzo, "Double-loop sigma-delta modulation with dc input," IEEE Trans. Comm., vol. COM-38, pp. 487–495, 1990.
[57] N. He, F. Kuhlmann, and A. Buzo, "Multi-loop sigma-delta quantization with dc input," IEEE Trans. Inform. Theory, vol. 38, pp. 1015–1028, 1993.
[58] P. W. Wong and R. M. Gray, "Sigma-delta modulation with i.i.d. Gaussian inputs," IEEE Trans. Inform. Theory, vol. IT-36, pp. 784–798, July 1990.
[59] H. Wang, "A geometric view of Σ∆ modulation," IEEE Transactions on Circuits and Systems-II, vol. 39, pp. 402–405, June 1992.
[60] S. Hein and A. Zakhor, "Stability and scaling of double loop Σ∆ modulators," in Proceedings 1992 ISCAS, pp. 1312–1315, IEEE, 1992.
[61] S. C. Pinault and P. V. Lopresti, "On the behavior of the double loop Sigma Delta modulator," IEEE Trans. Circuits and Systems, to appear.
[62] W. Chou and R. M. Gray, "Dithering and its effects on Sigma-Delta and multistage Sigma-Delta modulation," IEEE Trans. Inform. Theory, vol. 37, pp. 500–513, May 1991.
[63] W. Chou, P. W. Wong, and R. M. Gray, "Multi-stage Sigma-Delta modulation," IEEE Trans. Inform. Theory, vol. IT-35, pp. 784–796, 1989.
[64] J. C. Kieffer, "Sturmian minimal systems associated with the iterates of certain functions on an interval," in Proceedings of the Special Year on Dynamical Systems, Lecture Notes in Mathematics, Springer-Verlag, 1988.
[65] J. C. Kieffer, "Analysis of DC input response for a class of one-bit feedback encoders," IEEE Trans. Comm., vol. COM-38, pp. 337–340, 1990.
[66] J. C. Kieffer, "Note on 'Spectral analysis of quantization noise in a single-loop sigma-delta modulator with dc input'," IEEE Trans. Comm., vol. 38, pp. 337–340, March 1990.
[67] S. J. Park and R. M. Gray, "Sigma-Delta modulation with leaky integration and constant input," IEEE Trans. Inform. Theory, vol. 38, pp. 1512–1533, September 1992.
[68] T. Leslie and B. Singh, "An improved sigma-delta modulator architecture," in Proceedings 1990 IEEE International Symposium on Circuits and Systems, vol. 1, (New Orleans, LA), pp. 372–375, IEEE, May 1990.
[69] T. Leslie and B. Singh, "Sigma-delta modulators with multibit quantising elements and single-bit feedback," IEE Proceedings G (Circuits, Devices and Systems), vol. 139, pp. 356–362, June 1992.
[70] S. J. Park and R. M. Gray, "Sigma-Delta modulation with leaky integration and constant input," in Abstracts of the 1991 IEEE International Symposium on Information Theory, (Budapest, Hungary), p. 119, IEEE, June 1991.
[71] D. Delchamps, "Exact asymptotic statistics for sigma-delta quantization noise," in Proceedings Twenty-Eighth Annual Allerton Conference on Communication, Control and Computing, (Monticello, IL), October 1990.
[72] D. Delchamps, "Quantizer dynamics and their effect on the performance of digital feedback control systems," in Proceedings of the 1992 American Control Conference, vol. 3, (Chicago, IL), pp. 2498–2503, American Autom. Control Council, June 1992.
[73] I. Galton, "Granular quantization noise in the first-order Σ∆ modulator," IEEE Trans. Inform. Theory, vol. 39, pp. 1944–1956, November 1993.
[74] I. Galton, "Granular quantization noise in a class of Σ∆ modulators," IEEE Trans. Inform. Theory, submitted, 1993.
[75] T. Koski, "Statistics of the binary quantizer error in sigma delta modulation with i.i.d. input," IEEE Trans. Inform. Theory, to appear, 1994.
[76] J. C. Kieffer, "Stochastic stability for feedback quantization schemes," IEEE Trans. Inform. Theory, vol. IT-28, pp. 248–254, March 1982.
[77] J. C. Kieffer and J. G. Dunham, "On a type of stochastic stability for a class of encoding schemes," IEEE Trans. Inform. Theory, vol. IT-29, pp. 703–797, November 1983.
[78] N. Thao and M. Vetterli, "Optimal MSE signal reconstruction in oversampled A/D conversion using convexity," in Proceedings of the International Conference on Acoustics, Speech, and Signal Processing, IEEE, March 1992.
[79] N. Thao and M. Vetterli, "Oversampled A/D conversion using alternate projections," in Proceedings of the Twenty-fifth Annual Conference on Information Sciences and Systems, pp. 241–248, 1991.
[80] S. Hein and A. Zakhor, Sigma Delta Modulators: Nonlinear Decoding Algorithms and Stability Analysis. Boston: Kluwer Academic Publishers, 1993.