Pulse-code modulation (PCM) is a method used to digitally represent sampled analog signals.
PCM streams have two basic properties that determine their fidelity to the original analog
signal: the sampling rate, which is the number of times per second that samples are taken;
and the bit depth, which determines the number of possible digital values that each
sample can take.
Modulation
Sampling and quantization of a signal (red) for 4-bit PCM
In the diagram, a sine wave (red curve) is sampled and quantized for pulse code
modulation. The sine wave is sampled at regular intervals, shown as ticks on the x-axis.
For each sample, one of the available values (ticks on the y-axis) is chosen by some
algorithm. This produces a fully discrete representation of the input signal (shaded area)
that can be easily encoded as digital data for storage or manipulation. For the sine wave
example at right, we can verify that the quantized values at the sampling moments are 7,
9, 11, 12, 13, 14, 14, 15, 15, 15, 14, etc. Encoding these values as binary numbers would
result in the following set of nibbles: 0111 (2³×0 + 2²×1 + 2¹×1 + 2⁰×1 = 0+4+2+1 = 7), 1001,
1011, 1100, 1101, 1110, 1110, 1111, 1111, 1111, 1110, etc. These digital values could
then be further processed or analyzed by a purpose-specific digital signal processor or
general purpose DSP. Several Pulse Code Modulation streams could also be multiplexed
into a larger aggregate data stream, generally for transmission of multiple streams over a
single physical link. One technique is called time-division multiplexing (TDM), and is
widely used, notably in the modern public telephone system. Another is frequency-division
multiplexing (FDM), where each signal is assigned a frequency band within a shared
spectrum and is transmitted along with the other signals in that spectrum. Currently, TDM
is much more widely used than FDM because of its natural compatibility with digital
communication, and its generally lower bandwidth requirements.
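For illustration, the quantization step can be sketched in a few lines of Python. This is only a sketch: the sine wave's amplitude, frequency, and phase are assumptions here, so the printed codes will only resemble the figure's 7, 9, 11, ... sequence rather than match it exactly.

    import math

    LEVELS = 16                                    # 4-bit PCM: 16 possible codes
    def quantize(v):
        # Map v in [-1, 1] onto integer codes 0..15 (floor, clamped at the edges).
        code = int((v + 1.0) / 2.0 * (LEVELS - 1))
        return max(0, min(LEVELS - 1, code))

    samples = [math.sin(2 * math.pi * n / 22.0) for n in range(11)]  # assumed grid
    codes = [quantize(s) for s in samples]
    nibbles = [format(c, '04b') for c in codes]
    print(codes)     # e.g. [7, 9, 11, 13, ...]
    print(nibbles)   # e.g. ['0111', '1001', '1011', '1101', ...]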
There are many ways to implement a real device that performs this task. In real systems,
such a device is commonly implemented on a single integrated circuit that lacks only the
clock necessary for sampling; such a chip is generally referred to as an ADC (analog-to-
digital converter). When triggered by a clock signal, an ADC produces on its output a
binary representation of the input, which is then read by a processor of some sort.
Demodulation
To produce output from the sampled data, the procedure of modulation is applied in
reverse. After each sampling period has passed, the next value is read and a signal is
shifted to the new value. As a result of these transitions, the signal will have a significant
amount of high-frequency energy. To smooth out the signal and remove these undesirable
aliasing frequencies, the signal would be passed through analog filters that suppress
energy outside the expected frequency range (that is, greater than the Nyquist frequency
fs / 2). Some systems use digital filtering to remove some of the aliasing, converting the
signal from digital to analog at a higher sample rate such that the analog filter required
for anti-aliasing is much simpler. In some systems, no explicit filtering is done at all; as
it's impossible for any system to reproduce a signal with infinite bandwidth, inherent
losses in the system compensate for the artifacts — or the system simply does not require
much precision. The sampling theorem suggests that practical PCM devices, provided a
sampling frequency that is sufficiently greater than that of the input signal, can operate
without introducing significant distortions within their designed frequency bands.
The electronics involved in producing an accurate analog signal from the discrete data are
similar to those used for generating the digital signal. These devices are DACs (digital-to-
analog converters), and operate similarly to ADCs. They produce on their output a
voltage or current (depending on type) that represents the value presented on their inputs.
This output would then generally be filtered and amplified for use.
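A crude software model of this reconstruction chain can clarify the roles of the hold and the smoothing filter. This is only a sketch: the one-pole smoother stands in for a real analog reconstruction filter, and the upsampling factor and coefficient are arbitrary assumptions.

    def reconstruct(samples, hold=8, alpha=0.3):
        # Zero-order hold: repeat each sample value until the next sample instant.
        held = []
        for s in samples:
            held.extend([s] * hold)
        # One-pole low-pass smooths the staircase steps (a stand-in for the
        # analog filter that suppresses energy above the Nyquist frequency).
        out, y = [], held[0]
        for v in held:
            y += alpha * (v - y)
            out.append(y)
        return out

    print(reconstruct([0.0, 0.5, 1.0, 0.5, 0.0])[:10])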
Limitations
There are two sources of impairment implicit in any PCM system:
• Choosing a discrete value near the analog signal for each sample leads to
quantization error, which swings between −q/2 and q/2. In the ideal case (with a
fully linear ADC) it is uniformly distributed over this interval, with zero mean and
variance of q²/12 (a numerical check appears after this list).
• Between samples no measurement of the signal is made; the sampling theorem
guarantees non-ambiguous representation and recovery of the signal only if it has
no energy at frequency fs/2 or higher (one half the sampling frequency, known as
the Nyquist frequency); higher frequencies will generally not be correctly
represented or recovered.
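A quick numerical check of the q²/12 figure (a sketch; the step size q = 0.1 and the uniformly distributed test input are arbitrary assumptions):

    import random

    q = 0.1                                     # assumed quantization step
    errors = []
    for _ in range(200_000):
        x = random.uniform(-1.0, 1.0)           # input uncorrelated with the grid
        xq = q * round(x / q)                   # nearest quantization level
        errors.append(xq - x)                   # error lies in [-q/2, q/2]
    variance = sum(e * e for e in errors) / len(errors)   # mean is ~0
    print(variance, q * q / 12)                 # both approximately 8.33e-04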
A note on clocking: PCM data from a master whose clock frequency cannot be influenced
requires an exact clock at the decoding side to ensure that all the data is used in a
continuous stream without buffer underrun or buffer overflow. Any frequency difference
will be audible at the output, since the number of samples per time interval cannot then
be correct. The data speed of a compact disc can be steered by a servo that controls the
rotation speed of the disc; here the output clock is the master clock. For all "external
master" systems such as DAB, the output stream must be decoded with a regenerated,
exactly synchronous clock. When the desired output sample rate differs from that of the
incoming data stream, a sample-rate converter must be inserted in the chain to convert
the samples to the new clock domain.
Some forms of PCM combine signal processing with coding. Older versions of these
systems applied the processing in the analog domain as part of the A/D process; newer
implementations do so in the digital domain. These simple techniques have been largely
rendered obsolete by modern transform-based audio compression techniques.
• DPCM encodes the PCM values as the differences between the current and the
predicted value. An algorithm predicts the next sample based on the previous
samples, and the encoder stores only the difference between this prediction and
the actual value. If the prediction is reasonable, fewer bits can be used to represent
the same information. For audio, this type of encoding reduces the number of bits
required per sample by about 25% compared to PCM. (A minimal sketch follows this list.)
• Adaptive DPCM (ADPCM) is a variant of DPCM that varies the size of the
quantization step, to allow further reduction of the required bandwidth for a given
signal-to-noise ratio.
• Delta modulation is a form of DPCM which uses one bit per sample.
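The following is a minimal DPCM sketch using a previous-sample predictor; real codecs use more elaborate predictors and also quantize the stored differences, which this sketch omits.

    def dpcm_encode(samples):
        prev, residuals = 0, []
        for s in samples:
            residuals.append(s - prev)   # store difference from the prediction
            prev = s                     # predictor: the previous sample
        return residuals

    def dpcm_decode(residuals):
        prev, out = 0, []
        for d in residuals:
            prev += d                    # prediction plus stored difference
            out.append(prev)
        return out

    pcm = [3, 4, 6, 7, 7, 5]
    assert dpcm_decode(dpcm_encode(pcm)) == pcm   # lossless without quantization

Because neighbouring audio samples are correlated, the residuals are typically much smaller than the samples themselves, which is where the bit savings come from.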
In telephony, a standard audio signal for a single phone call is encoded as 8,000 analog
samples per second, of 8 bits each, giving a 64 kbit/s digital signal known as DS0. The
default signal compression encoding on a DS0 is either μ-law (mu-law) PCM (North
America and Japan) or A-law PCM (Europe and most of the rest of the world). These are
logarithmic compression systems where a 12 or 13-bit linear PCM sample number is
mapped into an 8-bit value. This system is described by international standard G.711. An
alternative proposal for a floating point representation, with 5-bit mantissa and 3-bit
radix, was abandoned.
Where circuit costs are high and loss of voice quality is acceptable, it sometimes makes
sense to compress the voice signal even further. An ADPCM algorithm is used to map a
series of 8-bit µ-law or A-law PCM samples into a series of 4-bit ADPCM samples. In
this way, the capacity of the line is doubled. The technique is detailed in the G.726
standard.
Later it was found that even further compression was possible and additional standards
were published. Some of these international standards describe systems and ideas which
are covered by privately owned patents and thus use of these standards requires payments
to the patent holders.
Some ADPCM techniques are used in Voice over IP communications.
Ones-density is often controlled using precoding techniques such as Run Length Limited
encoding, where the PCM code is expanded into a slightly longer code with a guaranteed
bound on ones-density before modulation into the channel. In other cases, extra framing
bits are added into the stream which guarantee at least occasional symbol transitions.
In other cases, the long term DC value of the modulated signal is important, as building
up a DC offset will tend to bias detector circuits out of their operating range. In this case
special measures are taken to keep a count of the cumulative DC offset, and to modify the
codes if necessary to make the DC offset always tend back to zero.
Many of these codes are bipolar codes, where the pulses can be positive, negative or
absent. In the typical alternate mark inversion code, non-zero pulses alternate between
being positive and negative. These rules may be violated to generate special symbols
used for framing or other special purposes.
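A sketch of alternate mark inversion, the bipolar code just described (framing violations are not modeled):

    def ami_encode(bits):
        out, polarity = [], +1
        for b in bits:
            if b:
                out.append(polarity)     # a "mark": send a pulse...
                polarity = -polarity     # ...and alternate its sign
            else:
                out.append(0)            # a "space": no pulse
        return out

    print(ami_encode([1, 0, 1, 1, 0, 1]))   # [1, 0, -1, 1, 0, -1]

Because successive marks alternate in sign, the long-term DC value of the line stays near zero, as the preceding paragraphs require.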
History
In the history of electrical communications, the earliest reason for sampling a signal was
to interlace samples from different telegraphy sources, and convey them over a single
telegraph cable. Telegraph time-division multiplexing (TDM) was achieved as early as
1853 by the American inventor Moses B. Farmer. The electrical engineer W. M. Miner,
in 1903, used an electro-mechanical commutator for time-division multiplexing of multiple
telegraph signals, and also applied this technology to telephony. He obtained intelligible
speech from channels sampled at rates above 3500–4300 Hz; lower rates proved
unsatisfactory. This was TDM, but pulse-amplitude modulation (PAM) rather than PCM.
In 1926, Paul M. Rainey of Western Electric patented a facsimile machine which
transmitted its signal using 5-bit PCM, encoded by an opto-mechanical analog-to-digital
converter.[3] The machine did not go into production. British engineer Alec Reeves,
unaware of previous work, conceived the use of PCM for voice communication in 1937
while working for International Telephone and Telegraph in France. He described the
theory and advantages, but no practical use resulted. Reeves filed for a French patent in
1938, and his U.S. patent was granted in 1943.
The first transmission of speech by digital techniques was the SIGSALY vocoder
encryption equipment used for high-level Allied communications during World War II
from 1943. In 1943, the Bell Labs researchers who designed the SIGSALY system
became aware of the use of PCM binary coding as already proposed by Alec Reeves. In
1949 for the Canadian Navy's DATAR system, Ferranti Canada built a working PCM
radio system that was able to transmit digitized radar data over long distances.[4]
PCM in the late 1940s and early 1950s used a cathode-ray coding tube with a plate
electrode having encoding perforations.[5][6] As in an oscilloscope, the beam was swept
horizontally at the sample rate while the vertical deflection was controlled by the input
analog signal, causing the beam to pass through higher or lower portions of the perforated
plate. The plate collected or passed the beam, producing current variations in binary code,
one bit at a time. Rather than natural binary, the grid of Goodall's later tube was
perforated to produce a glitch-free Gray code, and produced all bits simultaneously by
using a fan beam instead of a scanning beam.
The National Inventors Hall of Fame has honored Bernard M. Oliver[7] and Claude
Shannon[8] as the inventors of PCM,[9] as described in 'Communication System
Employing Pulse Code Modulation,' U.S. Patent 2,801,281 filed in 1946 and 1952,
granted in 1956. Another patent by the same title was filed by John R. Pierce in 1945, and
issued in 1948: U.S. Patent 2,437,707. The three of them published "The Philosophy of
PCM" in 1948.[10]
Pulse-code modulation (PCM) was used in Japan by Denon in 1972 for the mastering and
production of analogue phonograph records, using a 2-inch Quadruplex-format videotape
recorder for its transport, but this was not developed into a consumer product.
Nomenclature
The word pulse in the term Pulse-Code Modulation refers to the "pulses" to be found in
the transmission line. This perhaps is a natural consequence of this technique having
evolved alongside two analog methods, pulse width modulation and pulse position
modulation, in which the information to be encoded is in fact represented by discrete
signal pulses of varying width or position, respectively. In this respect, PCM bears little
resemblance to these other forms of signal encoding, except that all can be used in time
division multiplexing, and the binary numbers of the PCM codes are represented as
electrical pulses. The device that performs the coding and decoding function in a
telephone circuit is called a codec.
Companding
While the compression used in audio recording and the like depends on a variable-gain
amplifier, and so is a locally linear process (linear for short regions, but not globally),
companding is non-linear and takes place in the same way at all points in time. The
dynamic range of a signal is compressed before transmission and is expanded to the
original value at the receiver.
The electronic circuit that does this is called a compandor and works by compressing or
expanding the dynamic range of an analog electronic signal such as sound. One variety is
a triplet of amplifiers: a logarithmic amplifier, followed by a variable-gain linear
amplifier and an exponential amplifier. Such a triplet has the property that its output
voltage is proportional to the input voltage raised to an adjustable power. Compandors
are used in concert audio systems and in some noise reduction schemes such as dbx and
Dolby NR (all versions).
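The triplet's behaviour is easy to model in software. A minimal sketch, assuming a positive input signal and treating the variable-gain stage as a single multiplier g between the log and exp stages:

    import math

    def compand(x, g):
        # log amp -> linear gain g -> exp amp: output = x raised to the power g
        assert x > 0                       # the log stage needs a positive signal
        return math.exp(g * math.log(x))   # identical to x ** g

    print(compand(0.25, 0.5))   # 0.5  -- 2:1 compression of the dynamic range
    print(compand(0.5, 2.0))    # 0.25 -- the matching 1:2 expansion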
Companding can also refer to the use of compression, where gain is decreased when
levels rise above a certain threshold, and its complement, expansion, where gain is
increased when levels drop below a certain threshold.
The use of companding allows signals with a large dynamic range to be transmitted over
facilities that have a smaller dynamic range capability. For example, it is employed in
professional wireless microphones since the dynamic range of the microphone audio
signal itself is larger than the dynamic range provided by radio transmission.
Companding also reduces the noise and crosstalk levels at the receiver.
Many music equipment manufacturers (Roland, Yamaha, Korg) used companding for
data compression in their digital synthesizers. This dates back to the late 1980s, when
memory chips were often among the most costly components in an instrument.
Manufacturers usually express the amount of memory in its compressed form: the 24 MB
of waveform ROM in the Korg Trinity, for example, holds 48 MB of data, though the unit
still has only 24 MB of physical ROM. Roland SR-JV expansion boards were usually
advertised as 8 MB boards containing "16 MB-equivalent content"; careless copying of
this information, omitting the word "equivalent", often led to confusion.
History
The use of companding in an analog picture transmission system was patented by A. B.
Clark of AT&T in 1928 (filed in 1925):[1]
In the transmission of pictures by electric currents, the method which consists in sending
currents varied in a non-linear relation to the light values of the successive elements of
the picture to be transmitted, and at the receiving end exposing corresponding elements of
a sensitive surface to light varied in inverse non-linear relation to the received current.
—A. B. Clark patent
In 1942, Clark and his team completed the SIGSALY secure voice transmission system
that included the first use of companding in a PCM (digital) system.[2]
In 1953, B. Smith showed that a nonlinear DAC could result in the inverse nonlinearity in
a successive-approximation ADC configuration, simplifying the design of digital
companding systems.[3]
μ-law algorithm
The µ-law algorithm (often u-law, ulaw, mu-law, pronounced /ˈmjuː/) is a companding
algorithm, primarily used in the digital telecommunication systems of North America and
Japan. Companding algorithms reduce the dynamic range of an audio signal. In analog
systems, this can increase the signal-to-noise ratio (SNR) achieved during transmission,
and in the digital domain, it can reduce the quantization error (hence increasing signal to
quantization noise ratio). These SNR increases can be traded instead for reduced
bandwidth for equivalent SNR.
It is similar to the A-law algorithm used in regions where digital telecommunication
signals are carried on E-1 circuits, e.g. Europe.
Continuous
For an input x in the range −1 ≤ x ≤ 1, the μ-law compression function is

    F(x) = sgn(x) · ln(1 + μ|x|) / ln(1 + μ)

where μ = 255 (8 bits) in the North American and Japanese standards. It is important to
note that the range of this function is −1 to 1.
Discrete
G.711 is unclear about how the values at the limit of a range are coded (e.g., whether
+31 codes to 0xEF or 0xF0). However, G.191 provides example C code for a μ-law
encoder which gives the following encoding. Note the difference between the positive
and negative ranges: e.g., the negative range corresponding to +30 to +1 is −31 to −2.
This is accounted for by the use of 1's complement (simple bit inversion) rather than 2's
complement to convert a negative value to a positive value during encoding.
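A μ-law encoder in the spirit of the widely circulated g711.c routine is sketched below. It is an illustration, not the G.191 reference code: details such as the assumed 16-bit input word length and the clipping value differ between published variants.

    BIAS, CLIP = 0x84, 32635                 # assumed 16-bit-input variant

    def linear_to_ulaw(sample):              # sample: 16-bit signed linear PCM
        sign = 0x80 if sample < 0 else 0x00
        magnitude = min(abs(sample), CLIP) + BIAS
        exponent, mask = 7, 0x4000           # find the highest set bit (segment)
        while exponent > 0 and not (magnitude & mask):
            exponent -= 1
            mask >>= 1
        mantissa = (magnitude >> (exponent + 3)) & 0x0F
        # Final bit inversion gives the 1's-complement-style output noted above.
        return ~(sign | (exponent << 4) | mantissa) & 0xFF

    print(hex(linear_to_ulaw(0)))        # 0xff: zero maps to an all-ones code
    print(hex(linear_to_ulaw(32635)))    # 0x80: full-scale positive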
Implementation
There are three ways of implementing a μ-law algorithm:
Analog
Use an amplifier with non-linear gain to achieve companding entirely in the
analog domain.
Non-linear ADC
Use an Analog to Digital Converter with quantization levels which are unequally
spaced to match the μ-law algorithm.
Digital
Use the quantized digital version of the μ-law algorithm to convert data once it is
in the digital domain.
As the digital age dawned, it was noted that this pre-existing algorithm had the effect of
significantly reducing the number of bits needed to encode recognizable human voice.
Using μ-law, a sample could be effectively encoded in as few as 8 bits, a sample size that
conveniently matched the symbol size of most standard computers.
μ-law encoding effectively reduced the dynamic range of the signal, thereby increasing
the coding efficiency while biasing the signal in a way that results in a signal-to-
distortion ratio that is greater than that obtained by linear encoding for a given number of
bits. This is an early form of perceptual audio encoding.
The μ-law algorithm is also used in the .au format, which dates back at least to the
SPARCstation 1 as the native method used by Sun's /dev/audio interface, widely used as
a de facto standard for Unix sound. The .au format is also used in various common audio
APIs such as the classes in the sun.audio Java package in Java 1.1 and in some C#
methods.
A plot of the transfer curve illustrates how μ-law concentrates sampling in the smaller
(softer) values: with the 256 values of a μ-law byte on the horizontal axis and the 16-bit
linear decoded value on the vertical axis, the curve rises steeply near zero. Such a plot
can be generated with the Sun Microsystems C routine g711.c, commonly available on
the Internet.
Comparison with A-law
The µ-law algorithm provides a slightly larger dynamic range than the A-law at the cost
of worse proportional distortion for small signals. By convention, A-law is used for an
international connection if at least one country uses it.
A-law algorithm
Graph of μ-law & A-law algorithms
The A-law compression function is

    F(x) = sgn(x) · A|x| / (1 + ln A),             for |x| < 1/A
    F(x) = sgn(x) · (1 + ln(A|x|)) / (1 + ln A),   for 1/A ≤ |x| ≤ 1

where A is the compression parameter. In Europe, A = 87.7; the value 87.6 is also used.
The reason for this encoding is that the wide dynamic range of speech does not lend itself
well to efficient linear digital encoding. A-law encoding effectively reduces the dynamic
range of the signal, thereby increasing the coding efficiency and resulting in a signal-to-
distortion ratio that is superior to that obtained by linear encoding for a given number of
bits.
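A direct transcription of the continuous A-law curve (a sketch; x is assumed normalized to [−1, 1] and A = 87.6):

    import math

    def a_law_compress(x, A=87.6):
        ax = abs(x)
        if ax < 1.0 / A:
            y = A * ax / (1.0 + math.log(A))                    # linear segment
        else:
            y = (1.0 + math.log(A * ax)) / (1.0 + math.log(A))  # log segment
        return math.copysign(y, x)

    print(a_law_compress(0.01))   # ~0.16: small signals get proportionally more gain
    print(a_law_compress(1.0))    # 1.0: full scale maps to full scale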
Narrowband
In the study of wireless channels, narrowband implies that the channel under
consideration is sufficiently narrow that its frequency response can be considered flat.
The message bandwidth will therefore be less than the coherence bandwidth of the
channel. This is usually used as an idealizing assumption; no channel has perfectly flat
fading, but the analysis of many aspects of wireless systems is greatly simplified if flat
fading can be assumed.
Narrowband can also be used with the audio spectrum to describe sounds which occupy a
narrow range of frequencies.
Wideband
The term wideband audio (also termed HD voice or wideband voice) denotes a
telephone conversation using a wideband codec, which uses a greater frequency range of
the audio spectrum than conventional telephone calls, resulting in a clearer sound.
According to the United States Patent and Trademark Office, WIDEBAND is a registered
trademark[1] of WideBand Corporation, a US-based manufacturer of Gigabit Ethernet
managed switches, adapters, and networking equipment.[2]
Quantization (signal processing)
In electronics, adaptive quantization is a quantization process that varies the step size
based on the changes of the input signal, as a means of efficient compression. Two
approaches commonly used are forward adaptive quantization and backward adaptive
quantization.
In digital signal processing, quantization is the necessary and natural companion of the
sampling operation. It is necessary because, in practice, digital computers with general-
purpose CPUs are used to implement DSP algorithms, and since computers can only
process finite word-length (finite resolution/precision) quantities, any infinite-precision,
continuous-valued signal must be quantized to a finite resolution so that it can be
represented and stored in CPU registers and memory.
Note that it is not the continuousness of the analog function's values that prevents binary
encoding, but the existence of infinitely many such values, which follows from the
definition of continuity and would require infinitely many bits to represent. For example,
one could design a single-bit quantizer (just two levels) whose level encoded by 1 is
π = 3.14... and whose level encoded by 0 is e = 2.7183...; the quantized values are
infinite-precision irrational numbers, but since there are only two levels, the output of
the quantizer can still be represented by a binary symbol. What enables encoding with a
finite number of bits is therefore not the discreteness of the quantized values but their
finiteness.
In theory there is no relation between the quantization values and the binary code words
used to encode them (other than a table giving the corresponding mapping, as exemplified
above). In practice, however, we tend to use code words whose binary mathematical
values are related to the quantization levels they encode. Combining this with the
observations above: if the output of a quantizer is to be processed within a DSP/CPU
system (which is always the case), then the representation levels of the quantizer cannot
take on arbitrary values, but only a restricted range that fits in computer registers.
A quantizer is identified by its number of levels M, its decision boundaries {di} and
the corresponding representation values {ri}.
The output of a quantizer has two important properties: 1) a distortion resulting from the
approximation and 2) a bit rate resulting from binary encoding of its levels. Therefore the
quantizer design problem is a rate-distortion optimization problem. If only fixed-length
codes may be used to encode the output levels (the practical case), the problem reduces
to distortion minimization.
Designing a quantizer usually means finding the sets {di} and {ri} that satisfy a measure
of optimality, such as minimum mean squared quantization error (MMSEQ). Given the
number of levels M, the optimal quantizer which minimizes the MSQE with respect to
the given signal statistics is called the Max-Lloyd quantizer, which is in general of a
non-uniform type.
The most common quantizer type is the uniform one (a sketch follows). It is simple to
design and implement, and in most cases it suffices to give satisfactory results. Indeed,
by the very nature of the design process, a given quantizer only produces optimal results
for the assumed signal statistics. Since it is very difficult to predict these correctly in
advance, a static design will not produce optimal performance whenever the input
statistics deviate from the design assumption. The only solution is to use an adaptive
quantizer.
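For reference, a uniform quantizer with its sets {di} and {ri} can be sketched as follows (the input range [lo, hi] and level count M are parameters; clamping out-of-range inputs to the outermost cells is an assumption, not a standard behaviour):

    def uniform_quantizer(lo, hi, M):
        step = (hi - lo) / M
        boundaries = [lo + i * step for i in range(1, M)]       # the set {di}
        reps = [lo + (i + 0.5) * step for i in range(M)]        # the set {ri}
        def quantize(x):
            i = int((x - lo) / step)              # index of the cell containing x
            return reps[max(0, min(M - 1, i))]    # clamp to the outermost cells
        return boundaries, reps, quantize

    _, _, q4 = uniform_quantizer(-1.0, 1.0, 4)    # 4 levels over [-1, 1]
    print(q4(0.1), q4(-0.9))                      # 0.25 -0.75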
Quantization error
At lower levels the quantization error becomes dependent on the input signal, resulting in
distortion. This distortion is created after the anti-aliasing filter, and if these distortions
are above 1/2 the sample rate they will alias back into the audio band. In order to make
the quantization error independent of the input signal, noise with an amplitude of 2 least
significant bits is added to the signal. This slightly reduces signal to noise ratio, but,
ideally, completely eliminates the distortion. It is known as dither.
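A sketch of dithered quantization (the triangular noise distribution is a common choice, assumed here; q is the size of one least significant bit):

    import random

    def quantize_with_dither(x, q):
        # Noise spanning 2 LSB peak-to-peak, zero mean, triangular PDF.
        d = random.triangular(-q, q, 0.0)
        return q * round((x + d) / q)

    print(quantize_with_dither(0.37, 0.1))   # usually 0.3 or 0.4, chosen at random

Averaged over many conversions, the output now tracks the input linearly instead of sticking to the nearest level, which is how dither trades a little noise for the removal of distortion.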
Quantization noise for a 2-bit ADC. The difference between the blue and red signals in
the upper graph is the quantization error, which is "added" to the quantised
signal and is the source of noise.
For quantization noise that is uniformly distributed and independent of the signal, the
maximum signal-to-quantization-noise ratio of an N-bit converter is SQNR = 6.02·N dB.
The most common test signals that fulfil the uniform-distribution assumption are full-
amplitude triangle waves and sawtooth waves.
For example, a 16-bit ADC has a maximum signal-to-noise ratio of 6.02 × 16 = 96.3 dB.
When the input signal is a full-amplitude sine wave, the distribution of the signal is no
longer uniform, and the corresponding equation is instead

    SQNR = 6.02·N + 1.76 dB.
Here, the quantization noise is once again assumed to be uniformly distributed. When the
input signal has a high amplitude and a wide frequency spectrum, this is the case.[1] In this
case a 16-bit ADC has a maximum signal-to-noise ratio of 98.09 dB. The 1.76 dB
difference in signal-to-noise occurs only because the signal is a full-scale sine wave
instead of a triangle or sawtooth.
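The two figures can be reproduced with exact constants (a check, using 6.02 = 20·log10(2) per bit and 1.76 = 10·log10(1.5) for the sine-wave case):

    import math

    N = 16
    per_bit = 20 * math.log10(2)                  # 6.0206 dB per bit
    print(per_bit * N)                            # 96.33 dB: triangle/sawtooth case
    print(per_bit * N + 10 * math.log10(1.5))     # 98.09 dB: full-scale sine case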
(Typical real-life values are worse than this theoretical minimum, due to the addition of
dither to reduce the objectionable effects of quantization, and to imperfections of the
ADC circuitry. On the other hand, specifications often use A-weighted measurements to
hide the inaudible effects of noise shaping, which improves the measurement.)
For complex signals in high-resolution ADCs this is an accurate model. For low-
resolution ADCs, low-level signals in high-resolution ADCs, and for simple waveforms
the quantization noise is not uniformly distributed, making this model inaccurate.[2] In
these cases the quantization noise distribution is strongly affected by the exact amplitude
of the signal.
The calculations above, however, assume a completely filled input channel. If this is not
the case (if the input signal is small), the relative quantization distortion can be very
large. To circumvent this issue, analog compressors and expanders can be used, but these
introduce large amounts of distortion as well, especially if the compressor does not match
the expander.
Coherence bandwidth
The coherence bandwidth varies over cellular or PCS communications paths because the
multipath spread D varies from path to path.
Application
Frequencies within a coherence bandwidth of one another tend to all fade in a similar or
correlated fashion. One reason for designing the CDMA IS-95 waveform with a
bandwidth of approximately 1.25 MHz is because in many urban signaling environments
the coherence bandwidth Wc is significantly less than 1.25 MHz. Therefore, when fading
occurs it occurs only over a relatively small fraction of the total CDMA signal
bandwidth. The portion of the signal bandwidth over which fading does not occur
typically contains enough signal power to sustain reliable communications. The coherence
bandwidth is thus the bandwidth over which the channel transfer function remains
virtually constant.
The Nyquist–Shannon sampling theorem, after Harry Nyquist and Claude Shannon, is
a fundamental result in the field of information theory, in particular telecommunications
and signal processing. Sampling is the process of converting a signal (for example, a
function of continuous time or space) into a numeric sequence (a function of discrete time
or space). Shannon's version of the theorem states:[1]

    If a function x(t) contains no frequencies higher than B hertz, it is completely
    determined by giving its ordinates at a series of points spaced 1/(2B) seconds apart.
The theorem is commonly called the Nyquist sampling theorem; since it was also
discovered independently by E. T. Whittaker, by Vladimir Kotelnikov, and by others, it is
also known as Nyquist–Shannon–Kotelnikov, Whittaker–Shannon–Kotelnikov,
Whittaker–Nyquist–Kotelnikov–Shannon, WKS, etc., sampling theorem, as well as
the Cardinal Theorem of Interpolation Theory. It is often referred to simply as the
sampling theorem.
In essence, the theorem shows that a bandlimited analog signal that has been sampled can
be perfectly reconstructed from an infinite sequence of samples if the sampling rate
exceeds 2B samples per second, where B is the highest frequency in the original signal. If
a signal contains a component at exactly B hertz, then samples spaced at exactly 1/(2B)
seconds do not completely determine the signal, Shannon's statement notwithstanding.
This sufficient condition can be weakened, as discussed at Sampling of non-baseband
signals below.
More recent statements of the theorem are sometimes careful to exclude the equality
condition; that is, the condition is if x(t) contains no frequencies higher than or equal to
B; this condition is equivalent to Shannon's except when the function includes a steady
sinusoidal component at exactly frequency B.
The theorem also leads to a formula for reconstruction of the original signal. The
constructive proof of the theorem leads to an understanding of the aliasing that can occur
when a sampling system does not satisfy the conditions of the theorem.
The sampling theorem provides a sufficient condition, but not a necessary one, for perfect
reconstruction. The field of compressed sensing provides a stricter sampling condition
when the underlying signal is known to be sparse. Compressed sensing specifically yields
a sub-Nyquist sampling criterion.
Introduction
A signal or function is bandlimited if it contains no energy at frequencies higher than
some bandlimit or bandwidth B. A signal that is bandlimited is constrained in how
rapidly it changes in time, and therefore how much detail it can convey in an interval of
time. The sampling theorem asserts that the uniformly spaced discrete samples are a
complete representation of the signal if this bandwidth is less than half the sampling rate.
To formalize these concepts, let x(t) represent a continuous-time signal and X(f) be the
continuous Fourier transform of that signal:

    X(f) = ∫ x(t) e^(−i 2π f t) dt

The signal is bandlimited if X(f) = 0 for all |f| > B or, equivalently, supp(X) ⊆ [−B, B].[2]
Then the sufficient condition for exact reconstructability from samples at a uniform
sampling rate fs (in samples per unit time) is:

    fs > 2B
The quantity 2B is called the Nyquist rate and is a property of the bandlimited signal,
while fs/2 is called the Nyquist frequency and is a property of this sampling system.
The time interval between successive samples is referred to as the sampling interval
T = 1/fs, and the samples of x(t) are denoted x[n] = x(nT), where n is an integer. The
sampling theorem leads to a procedure for reconstructing the
original x(t) from the samples and states sufficient conditions for such a reconstruction to
be exact.
The continuous signal varies over time (or space in a digitized image, or another
independent variable in some other application) and the sampling process is performed by
measuring the continuous signal's value every T units of time (or space), which is called
the sampling interval. In practice, for signals that are a function of time, the sampling
interval is typically quite small, on the order of milliseconds, microseconds, or less. This
results in a sequence of numbers, called samples, to represent the original signal. Each
sample value is associated with the instant in time when it was measured. The reciprocal
of the sampling interval (1/T) is the sampling frequency denoted fs, which is measured in
samples per unit of time. If T is expressed in seconds, then fs is expressed in Hz.
Reconstruction
Reconstruction of the original signal is an interpolation process that mathematically
defines a continuous-time signal x(t) from the discrete samples x[n] and at times in
between the sample instants nT.
Fig.2: The normalized sinc function: sin(πx) / (πx) ... showing the central peak at x= 0,
and zero-crossings at the other integer values of x.
• The procedure: Each sample value is multiplied by the sinc function scaled so
that the zero-crossings of the sinc function occur at the sampling instants and that
the sinc function's central point is shifted to the time of that sample, nT. All of
these shifted and scaled functions are then added together to recover the original
signal. The scaled and time-shifted sinc functions are continuous making the sum
of these also continuous, so the result of this operation is a continuous signal. This
procedure is represented by the Whittaker–Shannon interpolation formula.
• The condition: The signal obtained from this reconstruction process can have no
frequencies higher than one-half the sampling frequency. According to the
theorem, the reconstructed signal will match the original signal provided that the
original signal contains no frequencies at or above this limit. This condition is
called the Nyquist criterion, or sometimes the Raabe condition.
If the original signal contains a frequency component equal to one-half the sampling rate,
the condition is not satisfied. The resulting reconstructed signal may have a component at
that frequency, but the amplitude and phase of that component generally will not match
the original component.
This reconstruction or interpolation using sinc functions is not the only interpolation
scheme. Indeed, it is impossible in practice because it requires summing an infinite
number of terms. However, it is the interpolation method that in theory exactly
reconstructs any given bandlimited x(t) with any bandlimit B < 1/(2T); any other method
that does so is formally equivalent to it.
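A sketch of the Whittaker–Shannon formula over a finite block of samples (the true formula sums over infinitely many samples, so values near the block edges are only approximate):

    import math

    def sinc(x):
        return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

    def reconstruct(samples, T, t):
        # x(t) = sum over n of x[n] * sinc((t - n*T) / T)
        return sum(s * sinc((t - n * T) / T) for n, s in enumerate(samples))

    T = 0.01                                                   # 100 Hz sampling
    samples = [math.sin(2 * math.pi * 3 * n * T) for n in range(100)]  # 3 Hz tone
    print(reconstruct(samples, T, 0.123))                      # ~ sin(2*pi*3*0.123)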
• If the highest frequency B in the original signal is known, the theorem gives the
lower bound on the sampling frequency for which perfect reconstruction can be
assured. This lower bound to the sampling frequency, 2B, is called the Nyquist
rate.
• If instead the sampling frequency is known, the theorem gives us an upper bound
for frequency components, B<fs/2, of the signal to allow for perfect
reconstruction. This upper bound is the Nyquist frequency, denoted fN.
• Both of these cases imply that the signal to be sampled must be bandlimited; that
is, any component of this signal which has a frequency above a certain bound
should be zero, or at least sufficiently close to zero to allow us to neglect its
influence on the resulting reconstruction. In the first case, the condition of
bandlimitation of the sampled signal can be accomplished by assuming a model of
the signal which can be analysed in terms of the frequency components it
contains; for example, sounds that are made by a speaking human normally
contain very small frequency components at or above 10 kHz and it is then
sufficient to sample such an audio signal with a sampling frequency of at least
20 kHz. For the second case, we have to assure that the sampled signal is
bandlimited such that frequency components at or above half of the sampling
frequency can be neglected. This is usually accomplished by means of a suitable
low-pass filter; for example, if it is desired to sample speech waveforms at 8 kHz,
the signals should first be lowpass filtered to below 4 kHz.
• The sampling theorem does not say what happens when the conditions and
procedures are not exactly met, but its proof suggests an analytical framework in
which the non-ideality can be studied. A designer of a system that deals with
sampling and reconstruction processes needs a thorough understanding of the
signal to be sampled, in particular its frequency content, the sampling frequency,
how the signal is reconstructed in terms of interpolation, and the requirement for
the total reconstruction error, including aliasing, sampling, interpolation and other
errors. These properties and parameters may need to be carefully tuned in order to
obtain a useful system.
Aliasing
Main article: Aliasing
The Poisson summation formula shows that the samples, x[n] = x(nT), of function x(t) are
sufficient to create a periodic summation of function X(f). The result is:

    Xs(f) = Σ_k X(f − k·fs),  k = ..., −2, −1, 0, 1, 2, ...          (Eq.1)
Fig.3: Hypothetical spectrum of a properly sampled bandlimited signal (blue) and images
(green) that do not overlap. A "brick-wall" low-pass filter can remove the images and
leave the original spectrum, thus recovering the original signal from the samples.
If the sampling condition is not satisfied, adjacent copies overlap, and it is not possible in
general to discern an unambiguous X(f). Any frequency component above fs/2 is
indistinguishable from a lower-frequency component, called an alias, associated with one
of the copies. The reconstruction technique described below produces the alias, rather
than the original component, in such cases.
Fig.4 Top: Hypothetical spectrum of an insufficiently sampled bandlimited signal (blue),
X(f), where the images (green) overlap. These overlapping edges or "tails" of the images
add, creating a spectrum unlike the original. Bottom: Hypothetical spectrum of a
marginally sufficiently sampled bandlimited signal (blue), XA(f), where the images
(green) narrowly do not overlap. But the overall sampled spectrum of XA(f) is identical to
the overall inadequately sampled spectrum of X(f) (top) because the sum of baseband and
images are the same in both cases. The discrete sampled signals xA[n] and x[n] are also
identical. It is not possible, just from examining the spectra (or the sampled signals), to
tell the two situations apart. If this were an audio signal, xA[n] and x[n] would sound the
same and the presumed "properly" sampled xA[n] would be the alias of x[n] since the
spectrum XA(f) masquerades as the spectrum X(f).
For a sinusoidal component of exactly half the sampling frequency, the component will in
general alias to another sinusoid of the same frequency, but with a different phase and
amplitude.
To prevent or reduce aliasing, two things can be done:
1. Increase the sampling rate, to above twice some or all of the frequencies that are
aliasing.
2. Introduce an anti-aliasing filter or make the anti-aliasing filter more stringent.
The purpose of the anti-aliasing filter is to restrict the bandwidth of the signal to satisfy
the condition for proper sampling. Such a restriction works in theory, but is not precisely
satisfiable in reality, because realizable filters will always allow some leakage of high
frequencies. However, the leakage energy can be made small enough that the aliasing
effects are negligible.
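The effect is easy to demonstrate numerically. Under the assumed rates below, a 7 kHz tone sampled at 10 kHz produces exactly the same samples as a 3 kHz tone, since 7 kHz folds about fs/2 = 5 kHz down to |fs − f| = 3 kHz:

    import math

    fs, f_in, f_alias = 10_000.0, 7_000.0, 3_000.0
    for n in range(5):
        t = n / fs
        a = math.cos(2 * math.pi * f_in * t)      # the tone actually sampled
        b = math.cos(2 * math.pi * f_alias * t)   # its alias below Nyquist
        print(round(a, 6), round(b, 6))           # identical sample values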
Color images typically consist of a composite of three separate grayscale images, one to
represent each of the three primary colors — red, green, and blue, or RGB for short.
Other colorspaces using 3-vectors for colors include HSV, LAB, XYZ, etc. Some
colorspaces such as cyan, magenta, yellow, and black (CMYK) may represent color by
four dimensions. All of these are treated as vector-valued functions over a two-
dimensional sampled domain.
Similar to one-dimensional discrete-time signals, images can also suffer from aliasing if
the sampling resolution, or pixel density, is inadequate. For example, a digital photograph
of a striped shirt with high frequencies (in other words, the distance between the stripes is
small), can cause aliasing of the shirt when it is sampled by the camera's image sensor.
The aliasing appears as a moiré pattern. The "solution" to higher sampling in the spatial
domain for this case would be to move closer to the shirt, use a higher resolution sensor,
or to optically blur the image before acquiring it with the sensor.
Another example is shown to the left in the brick patterns. The top image shows the
effects when the sampling theorem's condition is not satisfied. When software rescales an
image (the same process that creates the thumbnail shown in the lower image) it, in
effect, runs the image through a low-pass filter first and then downsamples the image to
result in a smaller image that does not exhibit the moiré pattern. The top image is what
happens when the image is downsampled without low-pass filtering: aliasing results.
The application of the sampling theorem to images should be made with care. For
example, the sampling process in any standard image sensor (CCD or CMOS camera) is
relatively far from the ideal sampling which would measure the image intensity at a
single point. Instead these devices have a relatively large sensor area at each sample point
in order to obtain sufficient amount of light. In other words, any detector has a finite-
width point spread function. The analog optical image intensity function which is
sampled by the sensor device is not in general bandlimited, and the non-ideal sampling is
itself a useful type of low-pass filter, though not always sufficient to remove enough high
frequencies to sufficiently reduce aliasing. When the area of the sampling spot (the size
of the pixel sensor) is not large enough to provide sufficient anti-aliasing, a separate anti-
aliasing filter (optical low-pass filter) is typically included in a camera system to further
blur the optical image. Despite images having these problems in relation to the sampling
theorem, the theorem can be used to describe the basics of down and up sampling of
images.
Downsampling
When a signal is downsampled, the sampling theorem can be invoked via the artifice of
resampling a hypothetical continuous-time reconstruction. The Nyquist criterion must
still be satisfied with respect to the new lower sampling frequency in order to avoid
aliasing. To meet the requirements of the theorem, the signal must usually pass through a
low-pass filter of appropriate cutoff frequency as part of the downsampling operation.
This low-pass filter, which prevents aliasing, is called an anti-aliasing filter.
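A sketch of decimation by a factor M (the length-M moving average used as the anti-aliasing filter here is a crude stand-in for a properly designed low-pass FIR):

    def decimate(samples, M):
        filtered = []
        for i in range(len(samples)):
            window = samples[max(0, i - M + 1): i + 1]   # trailing length-M window
            filtered.append(sum(window) / len(window))   # moving-average low-pass
        return filtered[::M]                             # keep every M-th sample

    # A tone at exactly fs/2 is averaged away before the rate is halved:
    print(decimate([0, 1, 0, 1, 0, 1, 0, 1], 2))         # [0, 0.5, 0.5, 0.5]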
Consider a sinusoid x(t) sampled at exactly its critical frequency, and a reconstruction
xA(t) obtained from those samples. For any phase θ of x(t) such that sin(θ) ≠ 0, x(t) and
xA(t) have different amplitudes and different phase. This and other ambiguities are the
reason for the strict inequality of the sampling theorem's condition.
Fig.8: Spectrum, Xs(f), of a properly sampled bandlimited signal (blue) and images
(green) that do not overlap. A brick-wall low-pass filter, H(f), removes the images, leaves
the original spectrum, X(f), and recovers the original signal from the samples.
From Figures 3 and 8, it is apparent that when there is no overlap of the copies (also
known as "images") of X(f), the k = 0 term of Xs(f) can be recovered by the product:

    X(f) = H(f) · Xs(f)

where:

    H(f) = 1 for |f| ≤ B, and H(f) = 0 for |f| ≥ fs − B.

H(f) need not be precisely defined in the region [B, fs − B] because Xs(f) is zero in that
region. However, the worst case is when B = fs/2, the Nyquist frequency. A function that
is sufficient for that and all less severe cases is the rectangle function H(f) = rect(f/fs),
which is 1 for |f| < fs/2 and 0 for |f| > fs/2. Therefore:

    X(f) = rect(f/fs) · Xs(f)

The original function that was sampled can then be recovered by an inverse Fourier
transform, which yields the Whittaker–Shannon interpolation formula

    x(t) = Σ_n x(nT) · sinc((t − nT)/T).[3]
On the left are values of f(t) at the sampling points. The integral on the right will
be recognized as essentially the nth coefficient in a Fourier-series expansion of
the function F(ω), taking the interval –W to W as a fundamental period. This
means that the values of the samples f(n / 2W) determine the Fourier coefficients
in the series expansion of F(ω). Thus they determine F(ω), since F(ω) is zero for
frequencies greater than W, and for lower frequencies F(ω) is determined if its
Fourier coefficients are determined. But F(ω) determines the original function f(t)
completely, since a function is determined if its spectrum is known. Therefore the
original samples determine the function f(t) completely.
Shannon's proof of the theorem is complete at that point, but he goes on to discuss
reconstruction via sinc functions, what we now call the Whittaker–Shannon interpolation
formula as discussed above. He does not derive or prove the properties of the sinc
function, but these would have been familiar to engineers reading his works at the time,
since the Fourier pair relationship between rect (the rectangular function) and sinc was
well known. Quoting Shannon:
Let xn be the nth sample. Then the function f(t) is represented by:

    f(t) = Σ_n xn · sin(π(2Wt − n)) / (π(2Wt − n))
As in the other proof, the existence of the Fourier transform of the original signal is
assumed, so the proof does not say whether the sampling theorem extends to bandlimited
stationary random processes.
A similar result is true if the band does not start at zero frequency but at
some higher value, and can be proved by a linear translation
(corresponding physically to single-sideband modulation) of the zero-
frequency case. In this case the elementary pulse is obtained from sin(x)/x
by single-side-band modulation.
That is, a sufficient no-loss condition for sampling signals that do not have baseband
components exists that involves the width of the non-zero frequency interval as opposed
to its highest frequency component. See Sampling (signal processing) for more details
and examples.
A bandpass condition is that X(f) = 0 for all nonnegative f outside the open band of
frequencies

    ( N·fs/2 , (N+1)·fs/2 )
for some nonnegative integer N. This formulation includes the normal baseband condition
as the case N=0.
A non-trivial example of exploiting extra assumptions about the signal is given by the
recent field of compressed sensing, which allows for full reconstruction with a sub-
Nyquist sampling rate. Specifically, this applies to signals that are sparse (or
compressible) in some domain. As an example, compressed sensing deals with signals
that may have a low over-all bandwidth (say, the effective bandwidth EB), but the
frequency components are spread out in the overall bandwidth B, rather than all together
in a single band, so that the passband technique doesn't apply. In other words, the
frequency spectrum is sparse. Traditionally, the necessary sampling rate is thus 2B.
Using compressed sensing techniques, the signal could be perfectly reconstructed if it is
sampled at a rate slightly greater than 2·EB. The downside of this approach is that
reconstruction is no longer given by a formula, but instead by the solution to a convex
optimization program which requires well-studied but nonlinear methods.
The sampling theorem, essentially a dual of Nyquist's result, was proved by Claude E.
Shannon in 1949 ("Communication in the presence of noise"). V. A. Kotelnikov
published similar results in 1933 ("On the transmission capacity of the 'ether' and of
cables in electrical communications", translation from the Russian), as did the
mathematician E. T. Whittaker in 1915 ("Expansions of the Interpolation-Theory",
"Theorie der Kardinalfunktionen"), J. M. Whittaker in 1935 ("Interpolatory function
theory"), and Gabor in 1946 ("Theory of communication").
Others who have independently discovered or played roles in the development of the
sampling theorem have been discussed in several historical articles, for example by
Jerri[6] and by Lüke.[7] For example, Lüke points out that H. Raabe, an assistant to
Küpfmüller, proved the theorem in his 1939 Ph.D. dissertation; the term Raabe condition
came to be associated with the criterion for unambiguous representation (sampling rate
greater than twice the bandwidth).
Meijering[8] mentions several other discoverers and names in a paragraph and pair of
footnotes:
As pointed out by Higgins [135], the sampling theorem should really be considered in
two parts, as done above: the first stating the fact that a bandlimited function is
completely determined by its samples, the second describing how to reconstruct the
function using its samples. Both parts of the sampling theorem were given in a somewhat
different form by J. M. Whittaker [350, 351, 353] and before him also by Ogura [241,
242]. They were probably not aware of the fact that the first part of the theorem had been
stated as early as 1897 by Borel [25].27 As we have seen, Borel also used around that time
what became known as the cardinal series. However, he appears not to have made the
link [135]. In later years it became known that the sampling theorem had been presented
before Shannon to the Russian communication community by Kotel'nikov [173]. In more
implicit, verbal form, it had also been described in the German literature by Raabe [257].
Several authors [33, 205] have mentioned that Someya [296] introduced the theorem in
the Japanese literature parallel to Shannon. In the English literature, Weston [347]
introduced it independently of Shannon around the same time.28
27. Several authors, following Black [16], have claimed that this first part of the sampling
theorem was stated even earlier by Cauchy, in a paper [41] published in 1841. However,
the paper of Cauchy does not contain such a statement, as has been pointed out by
Higgins [135].
28. As a consequence of the discovery of the several independent introductions of the
sampling theorem, people started to refer to the theorem by including the names of the
aforementioned authors, resulting in such catchphrases as “the Whittaker-Kotel’nikov-
Shannon (WKS) sampling theorem" [155] or even "the Whittaker-Kotel'nikov-Raabe-
Shannon-Someya sampling theorem" [33]. To avoid confusion, perhaps the best thing to
do is to refer to it as the sampling theorem, "rather than trying to find a title that does
justice to all claimants" [136].
In 1958, Blackman and Tukey[13] cited Nyquist's 1928 paper as a reference for the
sampling theorem of information theory, even though that paper does not treat sampling
and reconstruction of continuous signals as others did. Their glossary of terms includes
these entries:
When Shannon stated and proved the sampling theorem in his 1949 paper, according to
Meijering[8] "he referred to the critical sampling interval T = 1/(2W) as the Nyquist
interval corresponding to the band W, in recognition of Nyquist’s discovery of the
fundamental importance of this interval in connection with telegraphy." This explains
Nyquist's name on the critical interval, but not on the theorem.
Similarly, Nyquist's name was attached to Nyquist rate in 1953 by Harold S. Black:[14]
"If the essential frequency range is limited to B cycles per second, 2B was given
by Nyquist as the maximum number of code elements per second that could be
unambiguously resolved, assuming the peak interference is less than half a quantum
step. This rate is generally referred to as signaling at the Nyquist rate and 1/(2B)
has been termed a Nyquist interval." (bold added for emphasis; italics as in the
original)
According to the OED, this may be the origin of the term Nyquist rate. In Black's usage,
it is not a sampling rate, but a signaling rate.
Sampling (signal processing)
Signal sampling representation. The continuous signal is represented with a green color
whereas the discrete samples are in blue.
Theory
See also: Nyquist–Shannon sampling theorem
For convenience, we will discuss signals which vary with time. However, the same
results can be applied to signals varying in space or in any other dimension and similar
results are obtained in two or more dimensions.
Let x(t) be a continuous signal which is to be sampled, and suppose that sampling is
performed by measuring the value of the continuous signal every T seconds, which is
called the sampling interval. The sampled signal x[n] is then given by:

    x[n] = x(nT),   n = 0, 1, 2, 3, ...
The sampling frequency or sampling rate fs is defined as the number of samples obtained
in one second, or fs = 1/T. The sampling rate is measured in hertz or in samples per
second.
We can now ask: under what circumstances is it possible to reconstruct the original signal
completely and exactly (perfect reconstruction)?
The frequency equal to one-half of the sampling rate is therefore a bound on the highest
frequency that can be unambiguously represented by the sampled signal. This frequency
(half the sampling rate) is called the Nyquist frequency of the sampling system.
Frequencies above the Nyquist frequency fN can be observed in the sampled signal, but
their frequency is ambiguous. That is, a frequency component with frequency f cannot be
distinguished from other components with frequencies NfN + f and NfN – f for nonzero
integers N. This ambiguity is called aliasing. To handle this problem as gracefully as
possible, most analog signals are filtered with an anti-aliasing filter (usually a low-pass
filter with cutoff near the Nyquist frequency) before conversion to the sampled discrete
representation.
Observation period
The observation period is the span of time during which a series of data samples are
collected at regular intervals. More broadly, it can refer to any specific period during
which a set of data points is gathered, regardless of whether or not the data is periodic in
nature. Thus a researcher might study the incidence of earthquakes and tsunamis over a
particular time period, such as a year or a century.
The observation period is simply the span of time during which the data is studied,
regardless of whether data so gathered represents a set of discrete events having arbitrary
timing within the interval, or whether the samples are explicitly bound to specified sub-
intervals.
The conventional, practical digital-to-analog converter (DAC) does not output a sequence
of Dirac impulses (which, if ideally low-pass filtered, would result in the original signal
before sampling) but instead outputs a sequence of piecewise constant values or
rectangular pulses. This means that there is an inherent effect of the zero-order hold on
the effective frequency response of the DAC resulting in a mild roll-off of gain at the
higher frequencies (a 3.9224 dB loss at the Nyquist frequency). This zero-order hold
effect is a consequence of the hold action of the DAC and is not due to the sample and
hold that might precede a conventional ADC as is often misunderstood. The DAC can
also suffer errors from jitter, noise, slewing, and non-linear mapping of input value to
output voltage.
Jitter, noise, and quantization are often analyzed by modeling them as random errors
added to the sample values. Integration and zero-order hold effects can be analyzed as a
form of low-pass filtering. The non-linearities of either ADC or DAC are analyzed by
replacing the ideal linear function mapping with a proposed nonlinear function.
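The 3.9224 dB figure quoted above follows from the sinc-shaped frequency response of
the zero-order hold; a minimal sketch (Python/NumPy; the 48 kHz rate is an arbitrary
example) reproduces it:

import numpy as np

# An ideal zero-order hold has magnitude response |sinc(f/fs)|, so the gain
# at a frequency f, relative to DC, is 20*log10(|sinc(f/fs)|) dB.
def zoh_gain_db(f, fs):
    """Zero-order-hold gain at frequency f for sample rate fs, in dB."""
    return 20 * np.log10(np.abs(np.sinc(f / fs)))  # np.sinc(x) = sin(pi x)/(pi x)

fs = 48000.0
print(zoh_gain_db(fs / 2, fs))   # -3.9224 dB at the Nyquist frequency
print(zoh_gain_db(fs / 4, fs))   # about -0.91 dB at fs/4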
[edit] Applications
[edit] Audio sampling
Digital audio uses pulse-code modulation and digital signals for sound reproduction. This
includes analog-to-digital conversion (ADC), digital-to-analog conversion (DAC),
storage, and transmission. In effect, the system commonly called digital is a
discrete-time, discrete-level representation of the preceding analog electrical signal.
While modern systems can be quite subtle in their methods, the primary usefulness of a
digital system is the ability to store, retrieve, and transmit signals without any loss of
quality.
When it is necessary to capture audio covering the entire 20–20,000 Hz range of human
hearing, such as when recording music or many types of acoustic events, audio
waveforms are typically sampled at 44.1 kHz (CD), 48 kHz (professional audio), or
96 kHz. The approximately double-rate requirement is a consequence of the Nyquist
theorem.
There has been an industry trend towards sampling rates well beyond the basic
requirements; 96 kHz and even 192 kHz are available.[1] This is in contrast with
laboratory experiments, which have failed to show that ultrasonic frequencies are audible
to human observers; however, in some cases ultrasonic sounds do interact with and
modulate the audible part of the frequency spectrum (intermodulation distortion).
Notably, such intermodulation distortion is not present in the live audio, so it represents
an artificial coloration of the live sound.[2]
One advantage of higher sampling rates is that they can relax the low-pass filter design
requirements for ADCs and DACs, but with modern oversampling sigma-delta converters
this advantage is less important.
[edit] Bit depth (quantization)
Audio is typically recorded at 8-, 16-, and 20-bit depths, which yield theoretical
maximum signal-to-quantization-noise ratios (SQNR) for a pure sine wave of
approximately 49.93 dB, 98.09 dB, and 122.17 dB, respectively [3]. Eight-bit audio is
generally avoided because of prominent and inherent quantization noise (a low maximum
SQNR), although the A-law and μ-law 8-bit encodings pack more resolution into 8 bits at
the cost of increased total harmonic distortion. CD-quality audio is recorded at 16 bits. In
practice, not many
consumer stereos can produce more than about 90 dB of dynamic range, although some
can exceed 100 dB. Thermal noise limits the true number of bits that can be used in
quantization. Few analog systems have signal to noise ratios (SNR) exceeding 120 dB;
consequently, few situations will require more than 20-bit quantization.
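The quoted maxima follow from the rule of thumb SQNR ≈ 6.02ν + 1.76 dB for a ν-bit
uniform quantizer driven by a full-scale sine wave; a quick arithmetic check (Python):

# 6.02 = 20*log10(2) and 1.76 = 10*log10(1.5); see the SQNR section below.
for nu in (8, 16, 20):
    print(f"{nu}-bit: {6.0206 * nu + 1.7609:.2f} dB")
# 8-bit: 49.93 dB, 16-bit: 98.09 dB, 20-bit: 122.17 dB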
For playback (as opposed to recording) purposes, a proper analysis of typical programme
levels throughout an audio system reveals that the capabilities of well-engineered 16-bit
material far exceed those of the very best hi-fi systems, with microphone noise and
loudspeaker headroom being the real limiting factors[citation needed].
[edit] Speech sampling
Speech signals, i.e., signals intended to carry only human speech, can usually be sampled
at a much lower rate. For most phonemes, almost all of the energy is contained in the
5 Hz–4 kHz range, allowing a sampling rate of 8 kHz. This is the sampling rate used by
nearly all telephony systems, which use the G.711 sampling and quantization
specifications.
[edit] Video sampling
Standard-definition television (SDTV) uses either 720 by 480 pixels (US NTSC 525-line)
or 704 by 576 pixels (UK PAL 625-line) for the visible picture area.
[edit] Undersampling
Plot of sample rates (y axis) versus the upper edge frequency (x axis) for a band of width
1; gray areas are combinations that are "allowed" in the sense that no two frequencies in
the band alias to the same frequency. The darker gray areas correspond to undersampling
with the lowest allowable sample rate.
Main article: Undersampling
When one samples a bandpass signal at a rate lower than the Nyquist rate, the samples
are equal to samples of a low-frequency alias of the high-frequency signal; the original
signal will still be uniquely represented and recoverable if the spectrum of its alias does
not cross over half the sampling rate. Such undersampling is also known as bandpass
sampling, harmonic sampling, IF sampling, and direct IF to digital conversion.[4]
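A hedged sketch of the constraint behind the figure: for a band occupying [f_lo, f_hi],
sample rates with 2·f_hi/n ≤ fs ≤ 2·f_lo/(n − 1) for some positive integer n avoid in-band
aliasing. The function name and the 70–75 MHz example are illustrative, not from the
text (Python):

import math

def valid_bandpass_rates(f_lo, f_hi):
    """Return allowed (fs_min, fs_max) intervals for bandpass sampling."""
    bandwidth = f_hi - f_lo
    intervals = []
    # n = 1 corresponds to ordinary (baseband) sampling at fs >= 2*f_hi.
    for n in range(1, math.floor(f_hi / bandwidth) + 1):
        fs_min = 2 * f_hi / n
        fs_max = 2 * f_lo / (n - 1) if n > 1 else float("inf")
        if fs_min <= fs_max:
            intervals.append((fs_min, fs_max))
    return intervals

# Example: an IF signal occupying 70-75 MHz (bandwidth 5 MHz). The lowest
# allowable rate is 10 MHz, i.e. twice the bandwidth.
for lo, hi in valid_bandpass_rates(70e6, 75e6):
    print(f"fs from {lo/1e6:.2f} MHz up to {hi/1e6:.2f} MHz")  # 'inf' = unbounded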
[edit] Oversampling
Oversampling, i.e., sampling well above the Nyquist rate, is used in most modern
analog-to-digital and digital-to-analog converters. Among other benefits, it reduces the
distortion introduced by practical digital-to-analog reconstruction, such as a zero-order
hold used in place of idealizations like the Whittaker–Shannon interpolation formula.
[edit] Complex sampling
Although complex-valued samples can be obtained as described above, they are much
more commonly created by manipulating samples of a real-valued waveform. For
instance, the equivalent baseband waveform can be created without explicitly computing
the analytic signal, by processing the product sequence s(nT)·e^{−iπn/2} through a
digital lowpass filter whose cutoff frequency is B/2. Computing only every other
sample of the output sequence reduces the sample rate commensurate with the reduced
Nyquist rate. The result is half as many complex-valued samples as the original number
of real samples. No information is lost, and the original s(t) waveform can be recovered,
if necessary.
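A sketch of this procedure (Python/NumPy) for the case implied by the product sequence
above, a band centered on f0 = fs/4, where e^{−iπn/2} cycles through 1, −j, −1, j. The
filter design, lengths, and the test tone are assumptions for illustration:

import numpy as np

fs = 8000.0                     # sample rate of the real-valued input (assumed)
f0 = fs / 4                     # band center at one quarter of the sample rate
B = 800.0                       # assumed bandwidth of the signal, in Hz

n = np.arange(4096)
s = np.cos(2 * np.pi * (f0 + 150.0) * n / fs)   # test tone 150 Hz above f0

# Multiply by e^{-i*pi*n/2} to shift the band down to baseband.
mixed = s * np.exp(-1j * np.pi * n / 2)

# Windowed-sinc lowpass FIR with cutoff B/2, normalized to unity DC gain.
taps = 101
k = np.arange(taps) - (taps - 1) / 2
h = np.sinc(2 * (B / 2) / fs * k) * np.hamming(taps)
h /= h.sum()

baseband = np.convolve(mixed, h, mode="same")

# Keeping every other sample halves the rate: half as many complex samples
# as original real samples, with no loss for a band of width B.
complex_samples = baseband[::2]
print(complex_samples.shape, s.shape)           # (2048,) (4096,)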
Signal-to-quantization-noise ratio
The SQNR formula is derived from the general SNR (signal-to-noise ratio) formula for
the binary pulse-code modulated communication channel:

\mathrm{SNR} = \frac{3 \times 2^{2\nu}}{1 + 4 P_e \left(2^{2\nu} - 1\right)} \cdot \frac{\overline{m^2(t)}}{m_p^2}

where P_e is the probability of received bit error, m_p is the peak message signal level,
and \overline{m^2(t)} is the mean-square message power.
As SQNR applies to quantized signals, the formulae for SQNR refer to discrete-time
digital signals. Instead of m(t), we will use the digitized signal x(n). For N quantization
steps, each sample x requires ν = log2 N bits. The probability density function (pdf)
representing the distribution of values in x is denoted f(x). The maximum magnitude of
any x is denoted xmax.
As SQNR, like SNR, is a ratio of signal power to noise power, it can be calculated as:

\mathrm{SQNR} = \frac{P_\mathrm{signal}}{P_\mathrm{noise}} = \frac{E\!\left[x^2\right]}{E\!\left[\tilde{x}^2\right]}

The signal power is:

\overline{x^2} = E\!\left[x^2\right] = \int_{-\infty}^{\infty} x^2 f(x)\, dx

For a uniform quantizer, the quantization noise power is E[\tilde{x}^2] = x_{max}^2 / (3 \times 2^{2\nu}), giving:

\mathrm{SQNR} = \frac{3 \times 2^{2\nu}\, \overline{x^2}}{x_{max}^2}

When the SQNR is desired in terms of decibels (dB), a useful approximation to SQNR
is:

\mathrm{SQNR}\big|_\mathrm{dB} \approx 6.02\,\nu + 4.77 + 10 \log_{10}\!\left(\frac{\overline{x^2}}{x_{max}^2}\right)

where ν is the number of bits in a quantized sample, and \overline{x^2} is the signal power
calculated above. Note that for each bit added to a sample, the SQNR goes up by
approximately 6 dB (20 log10(2) ≈ 6.02 dB).
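As a numerical sanity check (not part of the original derivation), one can uniformly
quantize a full-scale sine wave, for which mean(x²)/x_max² = 1/2 and the approximation
reduces to 6.02ν + 1.76 dB (Python/NumPy):

import numpy as np

def measured_sqnr_db(nu, num_samples=200_000):
    """Quantize a full-scale sine with nu bits; return the measured SQNR in dB."""
    n = np.arange(num_samples)
    x = np.sin(2 * np.pi * 0.1234567 * n)   # full-scale test sine (x_max = 1)
    step = 2.0 / 2**nu                      # quantization step over [-1, 1]
    x_quant = np.round(x / step) * step     # mid-tread uniform quantizer
    noise = x_quant - x
    return 10 * np.log10(np.mean(x**2) / np.mean(noise**2))

for nu in (8, 16):
    print(nu, round(measured_sqnr_db(nu), 2), "vs", round(6.0206 * nu + 1.7609, 2))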
Instantaneous values of the input signal that are low, relative to the reference level, are
increased, and those that are high are decreased.
The original dynamic range of a compressed signal may be restored by a circuit called an
"expander".
This article incorporates public domain material from the General Services
Administration document "Federal Standard 1037C" (in support of MIL-STD-188).
Modified discrete cosine transform
The MDCT was proposed by Princen, Johnson, and Bradley in 1987, following earlier
(1986) work by Princen and Bradley to develop the MDCT's underlying principle of
time-domain aliasing cancellation (TDAC), described below. (There also exists an
analogous transform, the MDST, based on the discrete sine transform, as well as other,
rarely used, forms of the MDCT based on different types of DCT or DCT/DST
combinations.)
In MP3, the MDCT is not applied to the audio signal directly, but rather to the output of a
32-band polyphase quadrature filter (PQF) bank. The output of this MDCT is
postprocessed by an alias reduction formula to reduce the typical aliasing of the PQF
filter bank. Such a combination of a filter bank with an MDCT is called a hybrid filter
bank or a subband MDCT. AAC, on the other hand, normally uses a pure MDCT; only
the (rarely used) MPEG-4 AAC-SSR variant (by Sony) uses a four-band PQF bank
followed by an MDCT. Similar to MP3, ATRAC uses stacked quadrature mirror filters
(QMF) followed by an MDCT.
Contents
[hide]
• 1 Definition
o 1.1 Inverse transform
o 1.2 Computation
• 2 Window functions
• 3 Relationship to DCT-IV and Origin of TDAC
o 3.1 Origin of TDAC
o 3.2 TDAC for the windowed MDCT
• 4 See also
• 5 References
[edit] Definition
As a lapped transform, the MDCT is a bit unusual compared to other Fourier-related
transforms in that it has half as many outputs as inputs (instead of the same number). In
particular, it is a linear function F : \mathbb{R}^{2N} \to \mathbb{R}^N (where \mathbb{R} denotes the set of real
numbers). The 2N real numbers x0, ..., x2N−1 are transformed into the N real numbers X0, ...,
XN−1 according to the formula:

X_k = \sum_{n=0}^{2N-1} x_n \cos\!\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \qquad k = 0, \ldots, N-1
The inverse MDCT is known as the IMDCT. Because there are different numbers of
inputs and outputs, at first glance it might seem that the MDCT should not be invertible.
However, perfect invertibility is achieved by adding the overlapped IMDCTs of
subsequent overlapping blocks, causing the errors to cancel and the original data to be
retrieved; this technique is known as time-domain aliasing cancellation (TDAC).
The IMDCT transforms N real numbers X0, ..., XN−1 into 2N real numbers y0, ..., y2N−1
according to the formula:

y_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \cos\!\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \qquad n = 0, \ldots, 2N-1
(Like for the DCT-IV, an orthogonal transform, the inverse has the same form as the
forward transform.)
In the case of a windowed MDCT with the usual window normalization (see below), the
normalization coefficient in front of the IMDCT should be multiplied by 2 (i.e.,
becoming 2/N).
[edit] Computation
Although the direct application of the MDCT formula would require O(N2) operations, it
is possible to compute the same thing with only O(N log N) complexity by recursively
factorizing the computation, as in the fast Fourier transform (FFT). One can also compute
MDCTs via other transforms, typically a DFT (FFT) or a DCT, combined with O(N) pre-
and post-processing steps. Also, as described below, any algorithm for the DCT-IV
immediately provides a method to compute the MDCT and IMDCT of even size.
The transform remains invertible (that is, TDAC works) for a symmetric window wn =
w2N−1−n, as long as w satisfies the Princen-Bradley condition:

w_n^2 + w_{n+N}^2 = 1
Note that windows applied to the MDCT are different from windows used for other types
of signal analysis, since they must fulfill the Princen-Bradley condition. One of the
reasons for this difference is that MDCT windows are applied twice, for both the MDCT
(analysis) and the IMDCT (synthesis).
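For example, the commonly used sine window w_n = sin[(π/2N)(n + 1/2)] is symmetric
and satisfies the Princen-Bradley condition; a quick numerical check (Python/NumPy,
with an arbitrary N):

import numpy as np

N = 512
n = np.arange(2 * N)
w = np.sin(np.pi / (2 * N) * (n + 0.5))         # sine window of length 2N

print(np.allclose(w, w[::-1]))                  # symmetry: w_n = w_{2N-1-n}
print(np.allclose(w[:N]**2 + w[N:]**2, 1.0))    # Princen-Bradley: w_n^2 + w_{n+N}^2 = 1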
In order to define the precise relationship to the DCT-IV, one must realize that the DCT-
IV corresponds to alternating even/odd boundary conditions: even at its left boundary
(around n=–1/2), odd at its right boundary (around n=N–1/2), and so on (instead of
periodic boundaries as for a DFT). This follows from the identities

\cos\!\left[\frac{\pi}{N}\left(-n - 1 + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\right] = \cos\!\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\right]

and

\cos\!\left[\frac{\pi}{N}\left(2N - n - 1 + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\right] = -\cos\!\left[\frac{\pi}{N}\left(n + \frac{1}{2}\right)\left(k + \frac{1}{2}\right)\right].
Thus, if its inputs are an array x of length N, we can imagine extending this array to (x, –
xR, –x, xR, ...) and so on, where xR denotes x in reverse order.
Consider an MDCT with 2N inputs and N outputs, where we divide the inputs into four
blocks (a, b, c, d) each of size N/2. If we shift these by N/2 (from the +N/2 term in the
MDCT definition), then (b, c, d) extend past the end of the N DCT-IV inputs, so we must
"fold" them back according to the boundary conditions described above. Thus, the
MDCT of the 2N inputs (a, b, c, d) is exactly equivalent to a DCT-IV of the N inputs
(−cR − d, a − bR), where R denotes reversal.
(In this way, any algorithm to compute the DCT-IV can be trivially applied to the
MDCT.)
Similarly, the IMDCT formula above is precisely 1/2 of the DCT-IV (which is its own
inverse), where the output is shifted by N/2 and extended (via the boundary conditions) to
a length 2N. The inverse DCT-IV would simply give back the inputs (–cR–d, a–bR) from
above. When this is shifted and extended via the boundary conditions, one obtains:

\mathrm{IMDCT}(\mathrm{MDCT}(a, b, c, d)) = (a - b_R,\; b - a_R,\; c + d_R,\; c_R + d)\,/\,2
One can now understand how TDAC works. Suppose that one computes the MDCT of
the subsequent, 50% overlapped, 2N block (c, d, e, f). The IMDCT will then yield,
analogous to the above: (c–dR, d–cR, e+fR, eR+f) / 2. When this is added with the previous
IMDCT result in the overlapping half, the reversed terms cancel and one obtains simply
(c, d), recovering the original data.
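The following direct O(N²) transcription of the MDCT/IMDCT formulas above
(Python/NumPy; the block size and random data are illustrative) verifies TDAC
numerically:

import numpy as np

def mdct(x):
    """Direct MDCT: 2N inputs -> N outputs, per the formula above."""
    N = len(x) // 2
    n = np.arange(2 * N)
    k = np.arange(N)[:, None]
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5)) @ x

def imdct(X):
    """Direct IMDCT: N inputs -> 2N outputs, with the 1/N normalization."""
    N = len(X)
    n = np.arange(2 * N)[:, None]
    k = np.arange(N)
    return np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5)) @ X / N

rng = np.random.default_rng(1)
N = 8
a, b, c, d, e, f = (rng.standard_normal(N // 2) for _ in range(6))

y1 = imdct(mdct(np.concatenate([a, b, c, d])))   # first 2N-sample block
y2 = imdct(mdct(np.concatenate([c, d, e, f])))   # 50%-overlapped next block

# Adding the overlapping halves cancels the time-domain aliasing,
# recovering (c, d) exactly.
recovered = y1[N:] + y2[:N]
print(np.allclose(recovered, np.concatenate([c, d])))   # True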
The origin of the term "time-domain aliasing cancellation" is now clear. The use of input
data that extend beyond the boundaries of the logical DCT-IV causes the data to be
aliased in exactly the same way that frequencies beyond the Nyquist frequency are
aliased to lower frequencies, except that this aliasing occurs in the time domain instead of
the frequency domain. Hence the combinations c–dR and so on, which have precisely the
right signs for the combinations to cancel when they are added.
For odd N (which are rarely used in practice), N/2 is not an integer so the MDCT is not
simply a shift permutation of a DCT-IV. In this case, the additional shift by half a sample
means that the MDCT/IMDCT becomes equivalent to the DCT-III/II, and the analysis is
analogous to the above.
Above, the TDAC property was proved for the ordinary MDCT, showing that adding
IMDCTs of subsequent blocks in their overlapping half recovers the original data. The
derivation of this inverse property for the windowed MDCT is only slightly more
complicated.
Recall from above that when (a, b, c, d) and (c, d, e, f) are MDCTed, IMDCTed, and added
in their overlapping half, we obtain

(c + d_R,\; c_R + d)/2 + (c - d_R,\; d - c_R)/2 = (c, d),

the original data.
Now we suppose that we multiply both the MDCT inputs and the IMDCT outputs by a
window function of length 2N. As above, we assume a symmetric window function,
which is therefore of the form (w,z,zR,wR) where w and z are length-N/2 vectors and R
denotes reversal as before. Then the Princen-Bradley condition can be written

w^2 + z_R^2 = (1, 1, \ldots),

with the multiplications and additions performed elementwise, or equivalently

z^2 + w_R^2 = (1, 1, \ldots)

(reversing w and z).
(Note that we no longer have the multiplication by 1/2, because the IMDCT
normalization differs by a factor of 2 in the windowed case.)
Similarly, the windowed MDCT and IMDCT of (c,d,e,f) yields, in its first-N half:
Quadrature mirror filter
In digital signal processing, a quadrature mirror filter is a filter most commonly used
to implement a filter bank that splits an input signal into two bands. The resulting high-
pass and low-pass signals are often reduced by a factor of 2, giving a critically sampled
two-channel representation of the original signal.
|H_0(e^{j\Omega})|^2 + |H_1(e^{j\Omega})|^2 = 1

In other words, the power sum of the high-pass and low-pass filters is equal to 1. The
filter responses are symmetric about \Omega = \pi/2.
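A minimal numerical check (Python/NumPy) using the simplest pair satisfying this
condition, the normalized Haar filters h0 = [1/2, 1/2] and h1 = [1/2, −1/2] (an
illustrative choice, not from the text):

import numpy as np

omega = np.linspace(0, np.pi, 512)
z_inv = np.exp(-1j * omega)          # e^{-j*Omega}

H0 = (1 + z_inv) / 2                 # low-pass response
H1 = (1 - z_inv) / 2                 # high-pass (mirror) response

power_sum = np.abs(H0)**2 + np.abs(H1)**2
print(np.allclose(power_sum, 1.0))                  # |H0|^2 + |H1|^2 = 1
print(np.allclose(np.abs(H0), np.abs(H1)[::-1]))    # mirror symmetry about pi/2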