Noise Shaping Techniques For Analog and Time To Digital Converters Using Voltage Controlled Oscillators
by
Matthew A. Z. Straayer
Abstract
Advanced CMOS processes offer very fast switching speed and high transistor density
that can be utilized to implement analog signal processing functions in interesting
and unconventional ways, for example by leveraging time as a signal domain. In this
context, voltage controlled ring oscillators are circuit elements that are not only very
attractive due to their highly digital implementation which takes advantage of scaling,
but also due to their ability to amplify or integrate conventional voltage signals into
the time domain. In this work, we take advantage of voltage controlled oscillators
to implement analog- and time-to-digital converters with first-order quantization and
mismatch noise-shaping.
To implement a time-to-digital converter (TDC) with noise-shaping, we present an
oscillator that is enabled during the measurement of an input, and then disabled in
between measurements. By holding the state of the oscillator in between samples, the
quantization error is saved and transferred to the following sample, which can be seen
as first-order noise-shaping in the frequency domain. In order to achieve good noise-
shaping performance, we also present key details of a multi-path oscillator topology
that is able to reduce the effective delay per stage by a factor of 5 and accurately
preserve the quantization error from measurement to measurement.
An 11-bit, 50Msps prototype time-to-digital converter (TDC) using a multi-path
gated ring oscillator with 6ps of delay per stage demonstrates over 20dB of 1st-order
noise shaping. At frequencies below 1MHz, the TDC error integrates to 80fs rms for a
dynamic range of 95dB with no calibration of differential non-linearity required. The
157x258µm TDC is realized in 0.13µm CMOS and operates from a 1.5V supply.
The use of VCO-based quantization within continuous-time (CT) ΣΔ ADC structures
is also explored, with a custom prototype in 0.13µm CMOS showing measured
performance of 86/72dB SNR/SNDR with 10MHz bandwidth while consuming 40mW
from a 1.2V supply and occupying an active area of 640µm x 660µm. A key element
of the ADC structure is a 5-bit VCO-based quantizer clocked at 950 MHz which
we show achieves first-order noise-shaping of its quantization noise. The quantizer
structure allows the second order CT ΣΔ ADC topology to achieve third order noise
shaping, and direct connection of the VCO-based quantizer to the internal DACs of
the ADC provides intrinsic dynamic element matching (DEM) of the DAC elements.
Acknowledgments
I owe much to Michael Perrott, who has freely given his time to me and this work, and
who has pushed me to think hard about fundamentals, and to balance my instinct
with reason. Collaborating with him on this work has simply been a pleasure. Hae-
Sung Lee has helped guide this work in numerous ways with constant support, quite
literally from the very first day. My colleagues at MIT have provided wonderful
feedback, ideas, and friendship. I thank Belal Helal for his diligence in testing TDC
deadzones, and for first demonstrating the GRO-TDC in a system. Chun-Ming Hsu
provided many ideas for the GRO, and his excellent work on the digital PLL proved to
be a wonderful demonstration of the GRO-TDC at the system level. Matt Park and
Min Park provided invaluable feedback on the ADC, and Charlotte Lau and Kerwin
Johnson helped immeasurably with administering software.
The opportunity for me to work on this research was made possible by the generous
support from MIT Lincoln Laboratory, and for that support I am truly grateful. Mark
Gouker's leadership, vision, and mentorship throughout the process has been both
encouraging and insightful. I am thankful to the Lincoln Scholars Program and to
Dave Shaver for their commitment to fund this work, and to Tim Hancock, who
many times helpfully lent his ear as well as his constructive feedback. I thank Andy
Messier for his willingness to debug Verilog code with me, George Fitch for providing
GPIB code, and also Rick Slattery, Peter Murphy, and Lenny Johnson for support
with packaging.
Thanks are in order to Frequency Electronics, Inc. for providing access to high-
quality quartz oscillators for testing the fractional / integer digital PLL. In addition,
many people in the high-speed data converters group at Analog Devices, Inc. provided
helpful guidance and resources for testing the ΣΔ ADC.
1 Introduction
1.1 Area of focus
1.2 Primary contributions
1.3 Thesis overview
11 Conclusion
List of Figures
1-2 The basic concept of a VCO-based ADC and TDC in this work
2-5 A Vernier TDC that effectively amplifies the input time interval
2-6 A dual-step TDC that incorporates both the delay-chain and Vernier
techniques
3-16 Basic concept of using multiple inputs for each delay stage
3-17 Techniques to reduce effective delay by modifying the standard inverter
3-18 Example for optimizing multi-path oscillator resolution
3-19 Delay cell topology for the proposed gated ring oscillator
3-20 Schematic of the proposed multi-path GRO
3-21 Inverter delay cell layout for the prototype GRO
3-22 Delay cell layout floorplan for the prototype multi-path GRO
3-23 Simulated transient voltages of the multi-path delay element outputs
3-24 Concept of how the overlapping skew from positive and negative
transitions for a multi-path GRO significantly reduces the total skew
3-25 Multi-path GRO skew vs. phase for typical conditions
3-26 Multi-path GRO skew vs. phase for stepped disable widths
3-27 Multi-path GRO peak-to-peak skew vs. disable width and rise / fall
time
4-1 Using two counters for each output stage to keep track of the total
number of phase transitions
4-2 Double-counting transitions in the GRO measurement
4-3 Basic concept of calculating the GRO-TDC output by differentiating
phase
4-4 Chart showing the logical states of a standard 15-stage ring oscillator
for each of the 30 possible discrete phase states
4-5 Accommodating a counter with a limited range
4-6 A potential phase error when the oscillator state is determined by both
registers and counters
4-7 Combining register and latch functions into a single element
4-9 Overall block diagram of efficient and robust phase measurement
technique for an inverter-based GRO
4-10 Simulated transient voltages of the multi-path delay element outputs
when mismatch is included
4-11 Logical states of the 47-stage multi-path oscillator for each of the 94
possible quantized phase states
4-12 A geometric view of an example multi-path GRO state
4-13 Re-arranging the logical states of the multi-path GRO into groups that
correspond to the 7 measurement cells
4-14 Overall system block diagram for the proposed 47-stage multi-path
GRO-TDC
6-15 Measured -58dBc spurious performance from the MDLL prototype
6-16 Measured MDLL phase noise at 1.6GHz output frequency
8-2 Utilizing VCO for implicit barrel shift DEM of DAC elements
8-4 A model in discrete-time (a) and continuous-time (b) for the VCO-based
quantizer ΣΔ ADC with non-linearity error E_nl and quantization
error E_q
8-6 Model for the prototype ADC including excess loop delay and a minor
compensation loop
8-7 Behavioral simulation results of an example VCO-based quantizer ΣΔ
ADC with (a) 2nd order loop filter with NTF zeros at DC and (b) 4th
order loop filter with optimized zeros for F_b = 20MHz
Introduction
$\Phi_{out}(t) = 2\pi \int_0^t K_v V_{tune}(\tau)\, d\tau, \qquad K_v = \frac{dF_{out}}{dV_{tune}}$
and defines the slope of the curve, K_v [Hz/V], as the small-signal voltage-to-frequency
gain. Second, we also see that the VCO effectively behaves as a continuous-time (CT)
voltage-to-phase integrator. Since the output phase of an oscillating VCO accumu-
lates without end, the VCO voltage-to-phase integration is then ideal in the sense
that there is infinite DC gain. Finally, while the phase of the VCO output signal
changes continuously, its voltage output toggles between two discrete output levels:
high voltage and low voltage. Consequently, the VCO can seamlessly drive other
digital blocks with little additional signal conditioning or amplification.
It is well-known that a simple ADC can be formed with a VCO structure by simply
adding a frequency measurement capability as depicted in Figure 1-2(a). As we will
see, the measurement circuits can be implemented in a number of ways; for now, however,
we can conceptualize this circuit as simply counting the number of VCO periods in
each sampling clock period. The digital output of the measurement circuit will then
correspond proportionally to the input voltage through the K_v gain factor.
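As a rough illustration of this counting view, the following minimal sketch models an idealized VCO whose frequency is F_0 + K_v V_tune and counts the whole VCO periods completed in each clock period. The function name, the center frequency f0_hz, and the numeric values are assumptions chosen for illustration; they are not taken from the prototype.

```python
import math

def vco_adc(v_samples, kv_hz_per_v, f0_hz, fs_hz):
    """Idealized VCO-based ADC: count VCO periods per sampling clock period.

    Assumes the input is constant over each clock period and that the VCO
    frequency is f0_hz + kv_hz_per_v * v (hypothetical linear tuning curve).
    """
    phase_cycles = 0.0   # accumulated VCO phase, in cycles (never reset)
    prev_count = 0       # whole periods seen up to the previous clock edge
    out = []
    for v in v_samples:
        freq = f0_hz + kv_hz_per_v * v      # instantaneous VCO frequency
        phase_cycles += freq / fs_hz        # phase integrates over one clock period
        count = math.floor(phase_cycles)    # whole periods completed so far
        out.append(count - prev_count)      # periods counted in this clock period
        prev_count = count
    return out

# A constant 0.25 V input gives 8.5 VCO periods per clock period on average.
codes = vco_adc([0.25] * 4, kv_hz_per_v=100e6, f0_hz=400e6, fs_hz=50e6)
print(codes)  # [8, 9, 8, 9]
```

Because the phase is never reset, the fractional period left over at one clock edge carries into the next count, so the codes average to the ideal value of 8.5.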
Figure 1-2 The concept of VCO-based converters: (a) a simple analog-to-digital converter,
(b) a gated ring oscillator time-to-digital converter
binary values, 0 and the nominal oscillation frequency, and the analog input Tin is
now the length of time that the oscillator is enabled. The measurement circuit again
monitors the number of VCO periods or transitions that occur during the sample
clock period such that the converter output linearly corresponds with the width of
the input signal.
A very interesting aspect to both of these converter architectures is that, despite
a digital implementation, the analog quantization error for each sample can actually
be saved and passed along to the following measurement. If each sample corrects for
the error from the previous sample, then the average quantization error will improve
significantly by sampling the same input multiple times. In fact, we can say that
properly preserving and accounting for this error will result in first-order noise-shaping
in the frequency domain.
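The effect of preserving the quantization error can be seen in a small behavioral sketch. It compares an idealized gated oscillator, whose phase is held between measurements, with a converter that is reset before every measurement. Times are in picoseconds; the function names and the values of t_q and t_in are assumptions for illustration, and gating skew, noise, and mismatch are ignored.

```python
import math

def gro_tdc(t_in, t_q):
    """Gated oscillator model: the phase is held between measurements."""
    phase = 0.0          # accumulated phase, in units of one stage delay t_q
    prev = 0
    out = []
    for t in t_in:
        phase += t / t_q             # oscillator advances only while enabled
        count = math.floor(phase)    # transitions observed so far
        out.append(count - prev)     # transitions within this measurement
        prev = count
    return out

def resetting_tdc(t_in, t_q):
    """Reference model: the state is cleared before every measurement."""
    return [math.floor(t / t_q) for t in t_in]

t_q = 4.0                # hypothetical stage delay, ps
t_in = [13.0] * 4        # the same 13 ps input measured four times

print(gro_tdc(t_in, t_q))        # [3, 3, 3, 4]
print(resetting_tdc(t_in, t_q))  # [3, 3, 3, 3]
```

In this sketch the gated model averages to exactly 13 ps over the four samples, since each residue is carried into the next measurement, while the resetting model reports 12 ps every time and averaging does not help.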
Although first-order noise-shaping is well-known and can be achieved in a relatively
straightforward manner for the ADC of Figure 1-2(a), to our knowledge
noise-shaping for a TDC has not been previously demonstrated. In order to prac-
tically achieve good noise-shaping performance for the TDC of Figure 1-2(b), the
quantization error must be preserved during the time that the oscillator is disabled.
In fact, holding the phase state of a VCO represents a new concept outside of the
typical operating conditions for an oscillator. We therefore explore the key issues
in transferring this error, and present key details of a multi-path oscillator topology
that is able to significantly improve raw resolution and at the same time accurately
preserve the quantization error from measurement to measurement.
An 11-bit, 50Msps prototype time-to-digital converter (TDC) using a multi-path
gated ring oscillator with 6ps of delay per stage demonstrates over 20dB of 1st-order
noise shaping. At frequencies below 1MHz, the TDC error integrates to 80fs rms for a
dynamic range of 95dB with no calibration of differential non-linearity required. The
157x258µm TDC is realized in 0.13µm CMOS and operates from a 1.5V supply.
The use of VCO-based quantization within continuous-time (CT) ΣΔ ADC structures
is also demonstrated, with a custom prototype in 0.13µm CMOS showing measured
performance of 86/72dB SNR/SNDR with 10MHz bandwidth while consuming
40mW from a 1.2V supply and occupying an active area of 640µm x 660µm. A
key element of the ADC structure is a 5-bit VCO-based quantizer clocked at 950
MHz which we show achieves first-order noise-shaping of its quantization noise. The
quantizer structure allows the second order CT ΣΔ ADC topology to achieve third
order noise shaping, and direct connection of the VCO-based quantizer to the internal
DACs of the ADC provides intrinsic dynamic element matching (DEM) of the DAC
elements.
* The introduction of a gated ring oscillator topology that, when used in a time-
to-digital converter, can achieve first-order noise-shaping of quantization and
mismatch error
* The analysis of errors due to gating an oscillator that can fundamentally limit
noise-shaping performance
* The mitigation of these errors with a multi-path ring oscillator topology that
linearizes the gating operation and reduces the effective delay per stage to a
small fraction of an inverter delay
Background on Time-to-Digital Converters
2.1 Introduction
Accurate measurement of time has had a critical role in the development of science
throughout history, starting with the earliest examples of analog clocks based on solar
motion and water flow, and including the most accurate caesium resonators available
today. As a subset of time-keeping technology, time-to-digital converters (TDC), or
time-interval meters (TIM), allow for precise measurement of the time between two
events. Historically, TDC have had significant application in experimental physics.
For example, in the nuclear physics community, measurements of mean lifetime, par-
ticle identification, and time-of-flight require precise TDC, and many of the early
integrated circuit TDC addressed such needs [53]. Today, TDC continue to serve an
important role not only in experimental applications, but also in commercial time-of-flight
applications such as laser rangefinding and positron emission tomography (PET)
medical imaging technology [70].
A relatively new application for TDC that has emerged is closed-loop timing sys-
tems that are fully integrated in silicon technology. Since advanced CMOS processes
have begun to offer extremely compact, robust, and flexible processing power, many
applications have begun to replace traditional analog signal processing blocks with
digital signal processing. Such a shift in architectural design places a relatively in-
creased burden on the mixed-signal interface, especially in terms of converter perfor-
mance. For systems that require precise control or alignment of timing signals, such
as phase-locked loops (PLL), delay-locked loops (DLL), and clock and data recovery
(CDR) circuits, the TDC is a fundamental element that can bridge the gap between
the continuous-time analog domain and the discrete-time digital domain.
Considering that there is an extensive history of TDC prior to the development of
digital PLL, it is useful to understand how today's state-of-the-art TDC technology
relates to older ideas that have been around for some time. In fact, a review of the
historical developments of TDC over the past 50 years or more reveals that, while
technology has seen a tremendous change from vacuum tubes and ferrite pot-core
transformers to present-day advanced CMOS, the concepts and techniques for dividing
time into measurable intervals have remained remarkably the same. Given this
context, although it is possible to think of TDC architectures in terms of implementation
details, it is also instructive to think of the architectures in a conceptual manner.
In this way, we can both understand current practice and, at the same time, shape
the future efforts in TDC development by considering how these simple but powerful
ideas can best be used within a new, yet undefined, component technology.
We then examine Figure 2-1, which is a picture describing the general operation
of a TDC that can serve as an entry point into the discussion of many different
TDC architectures and ideas. The figure, while modified slightly for our purposes,
is basically equivalent to Figure 1 from Baron's 1957 original manuscript on the
Vernier technique [4].¹ From the figure we see that the input time interval,
T_in = t_stop - t_start, can be divided up into a number of smaller reference time intervals of
nominal length T_q. An estimate of T_in can be trivially calculated by counting the
number of intermediate reference pulses or events (i.e. T_out[k] = Out[k] T_q), although
there is an error to this method at both the beginning and end of the measurement.
Given these definitions, we can express the input and output relationship for a TDC
as
$T_{out}[k] = T_{in}[k] - T_{error}[k]$. (2.2)
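A minimal numerical sketch of this relationship follows. It assumes, purely for illustration, that reference events fall on integer multiples of T_q and that the converter counts the events lying between the start and stop edges; the function name and the example times (in picoseconds) are hypothetical.

```python
import math

def counting_tdc(t_start, t_stop, t_q):
    """Estimate T_in = t_stop - t_start by counting reference events.

    Assumes reference events occur at integer multiples of t_q.
    Returns (Out, T_out, T_error) with T_out = T_in - T_error.
    """
    out = math.floor(t_stop / t_q) - math.floor(t_start / t_q)  # events in (t_start, t_stop]
    t_in = t_stop - t_start
    t_out = out * t_q            # T_out[k] = Out[k] * T_q
    t_error = t_in - t_out       # error from the partial intervals at both ends
    return out, t_out, t_error

# The same 22 ps interval, placed differently relative to the reference events.
print(counting_tdc(3.0, 25.0, 10.0))   # (2, 20.0, 2.0)
print(counting_tdc(9.0, 31.0, 10.0))   # (3, 30.0, -8.0)
```

The two calls measure the same interval but return different outputs, which shows that the error depends on where both the start and the stop edges land within a reference interval.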
Since the raw TDC resolution is limited by Tq, it is not surprising that a great deal
of effort over the years has been made in reducing this value, either directly through
technology advancement, or effectively by using design techniques, a few examples of
which will be covered in the following section. While these efforts have made
significant progress in improving TDC resolution, applications continue to demand
the best resolution and/or range that can be achieved in a practical fashion.
For many early TDC applications, and especially for experimental applications,
the form factor of the TDC was less important than achieving high-resolution and
accuracy. As a result, many of the best TDC solutions in terms of resolution are large,
consume significant power, and require complex tuning or calibration. For example,
in the dual-conversion approach, classic voltage-domain analog-to-digital converters
can be utilized for a TDC by integrating current onto a reference capacitor for each
input sample [68], which converts time into voltage before digitization. Although this
approach may result in excellent resolution for a particular technology, the architecture
is analog-intensive, is not power efficient, and does not take advantage of the
ability to resolve digital edges in modern CMOS.
¹ We should note that within this manuscript we find that Baron "recognizes the fact that the
Hughes Research and Development Laboratories, prior to the work described in this report, had
fabricated a similar vernier measuring system."
In contrast, TDC constructed with digital CMOS technology have benefited greatly
from process feature scaling, since a more advanced process results in not only com-
pact and fully-integrated solutions, but also smaller CMOS gate delays and the
accompanying improvement in resolution. Figure 2-2(a) plots reported LSB size
for TDC implemented in CMOS over the last decade versus the CMOS technology
node (this work is shown with an x), and a best-fit line to the data is also
shown [8-10, 13, 18, 19, 27, 29, 30, 34, 37, 43, 44, 46, 48, 56, 66, 71]. We can clearly see
from this data evidence that CMOS scaling has indeed resulted in better TDC reso-
lution, and assuming that at least some new process developments are made in the
future, TDC resolution should continue to improve.
On the other hand, Figure 2-2(b) demonstrates that when the LSB size of various
TDC are normalized to the minimum transistor gate length in the process,² the
performance of TDC has been relatively flat. While advancements have certainly
been made in adapting TDC architectures for modern CMOS, improvements to the
fundamental relationship between gate delay and LSB step size have been difficult to
realize.
Certainly one way to interpret this data is to say that the best way to achieve
an optimal TDC resolution performance is to wait, i.e. to follow Moore's law until
scaling enables better performance with known TDC techniques. While this may be a
valid approach for some applications, it does not aid the TDC designer in optimizing
resolution performance for a given technology. Given the difficulty in improving the
raw resolution in a standard CMOS process, it then becomes important to fully
explore techniques such as oversampling to improve effective resolution performance,
which is a primary focus of this work.
Moreover, when considering future CMOS TDC and process scaling, it is well
known that transistor and parasitic mismatch has become a very real and significant
problem for the most advanced technologies [54]. Therefore, while intrinsic delay may
continue to decrease in the future, for traditional TDC architectures to benefit from
this we also require the accuracy of the delay to improve as well. We will later see
that mismatch can be a bottleneck for many TDC architectures. Therefore, achieving
high performance in the presence of large delay mismatch is a critical requirement for
future TDC that has so far seen little attention at the architectural level compared
with the relative efforts to improve raw resolution.
Since we have described some of the basic challenges facing TDC, in the next
section we will review some of the state-of-the-art TDC architectures along with
their associated performance tradeoffs. This review will lead into the focus of this
work, which is a CMOS gated ring oscillator (GRO) TDC. The GRO-TDC makes full
use of oversampling to address the issues of limited TDC resolution in the presence
of large mismatch, while at the same time achieving a large dynamic range, compact
area, and low power consumption.
² Gate propagation delay is often approximated to be proportional to transistor length [81,82],
and therefore normalizing to transistor gate length is a reasonable way to normalize fundamental
resolution.
representing the second event. A thermometer code is then generated at the register
output, which corresponds to the number of delay elements that have transitioned
within the measurement interval Tin[k]. The TDC output Out[k] is then simply calcu-
lated as the sum of the thermometer code, and is related to the input by Equation 2.3,
where the overall error can be described in this case as
Although the delay-chain architecture offers a simple TDC with moderate perfor-
mance, an important limitation to consider is the high cost for increasing its range.
Increasing the dynamic range of the delay-chain TDC requires a linear increase in
the number of delay elements, which similarly increases the power consumption and
decreases the maximum sampling rate.
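The thermometer-code readout and the linear cost of range can be sketched as follows. This is an idealized model with identical stage delays, in which the start edge aligns with the input of the first stage; the function name and the numbers (in picoseconds) are assumptions for illustration.

```python
def delay_chain_tdc(t_in, t_q, n_stages):
    """Ideal delay-chain TDC: returns (thermometer_code, Out).

    Stage i (1-based) is recorded as 1 if the start edge has passed through
    it before the stop edge samples the registers, i.e. if i * t_q <= t_in.
    """
    thermometer = [1 if (i + 1) * t_q <= t_in else 0 for i in range(n_stages)]
    return thermometer, sum(thermometer)

# Within range: a 47 ps input on an 8-stage chain with 10 ps per stage.
print(delay_chain_tdc(47.0, 10.0, 8))   # ([1, 1, 1, 1, 0, 0, 0, 0], 4)

# Out of range: the chain saturates at n_stages, so a 95 ps input reads 8.
print(delay_chain_tdc(95.0, 10.0, 8))   # ([1, 1, 1, 1, 1, 1, 1, 1], 8)
```

In this model the measurable range is n_stages * t_q, so doubling the range requires doubling the number of delay elements and registers.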
A simple solution to the limited range of the delay-chain TDC is to wrap or fold
the end of the chain back to the beginning through a multiplexer that is controlled
by digital logic, as shown in Figure 2-4. The multiplexer selects the start signal
during the beginning of each time interval, and after this start signal has occurred
then quickly switches to select the end of the delay-chain so that the subsequent edge
transitions rotate around the ring. This technique allows each of the delay elements
to be used multiple times per measurement, and the TDC output is simply found
by counting and summing all of the delay element transitions that occur during T_in.
Compared to the delay-chain TDC, the cyclic TDC core does not scale up at all with
larger range, and the counters will only scale with the logarithm of range.
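The scaling argument can be made concrete with a small sketch. As a simplification, it replaces the per-element counters with a single lap counter plus the position of the edge within the ring, which gives the same total for ideal, matched delays; the function names and numbers (in picoseconds) are assumptions for illustration.

```python
import math

def cyclic_tdc(t_in, t_q, n_stages):
    """Ideal cyclic TDC: returns (laps, position, Out)."""
    transitions = math.floor(t_in / t_q)             # total stage transitions during T_in
    laps, position = divmod(transitions, n_stages)   # full trips around the ring, remainder
    return laps, position, laps * n_stages + position

def counter_bits(max_t_in, t_q, n_stages):
    """Bits needed by the lap counter to cover inputs up to max_t_in."""
    max_laps = math.floor(max_t_in / t_q) // n_stages
    return max(1, max_laps.bit_length())

print(cyclic_tdc(950.0, 10.0, 8))      # (11, 7, 95)
print(counter_bits(950.0, 10.0, 8))    # 4
print(counter_bits(9500.0, 10.0, 8))   # 7
```

Here a tenfold increase in range leaves the 8-stage ring unchanged and adds only three counter bits, in contrast to the linear growth of the delay chain.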
Asymmetry in the delay-chain structure due to the multiplexer increases the mis-
match for that particular element, which degrades the differential non-linearity per-
formance. Techniques to match the multiplexer delay to that of a delay element can
be used, such as incorporating a multiplexer with fixed connections in each of the
delay elements [23]. In terms of integral non-linearity, the cyclic TDC has better
performance than the delay-chain TDC for large input signals due to the periodic use
of delay elements.
While the TDC range can be improved with the simple cyclic TDC, a more prob-
lematic issue that has not been addressed is the coarse resolution, which is limited
to a minimum inverter delay in the process. Although over time technology scaling
will improve the intrinsic delay, the mismatch of delay elements is expected to get
worse. Additionally, as mentioned in the preceding section, physical limitations due
to TDC thermal and 1/f noise will continue to be out-of-reach for resolutions limited
by a gate-delay. Therefore, an important problem to consider is how Tq of the simple
delay-chain architecture can be divided into smaller intervals in order to significantly
improve TDC resolution.
The Vernier delay technique [4] is one of the older techniques for time digitization that
has been adapted for improving the resolution of digital CMOS TDC [13, 55,57], and
has been widely documented in the literature. As shown in Figure 2-5, the concept
is to effectively stretch the input time interval Tin by delaying both the start and
stop signals with delay-chains. What defines the resolution in this case is not the
absolute rate of transitions (the gate delay being the inverse of the number of transitions
per second), but the relative rate of transitions. As a result, the effective resolution of
the Vernier TDC is found to be the difference of the two delays, or more specifically,
$T_q = \mathrm{Delay}_1 - \mathrm{Delay}_2.$
Figure 2-5 A Vernier TDC that effectively amplifies the input time interval
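A behavioral sketch of this relationship is given below. It assumes the start edge travels through stages of delay delay_1 and the stop edge through faster stages of delay delay_2, and it counts the stages at which the start edge still arrives no later than the stop edge. The function name and the delay values (in picoseconds) are hypothetical.

```python
def vernier_tdc(t_in, delay_1, delay_2, n_stages):
    """Ideal Vernier TDC with delay_1 > delay_2; returns Out."""
    t_start_edge = 0.0     # arrival time of the start edge at the current stage
    t_stop_edge = t_in     # arrival time of the stop edge at the current stage
    out = 0
    for _ in range(n_stages):
        t_start_edge += delay_1
        t_stop_edge += delay_2
        if t_start_edge <= t_stop_edge:   # start edge still leads the stop edge
            out += 1
        else:                             # stop edge has caught up
            break
    return out

# 25 ps and 20 ps stages give an effective resolution of 5 ps.
print(vernier_tdc(17.0, 25.0, 20.0, 16))   # 3, i.e. 15 ps
print(vernier_tdc(100.0, 25.0, 20.0, 4))   # 4, saturated at n_stages
```

The second call shows the range cost: with a 5 ps effective step, four stages cover only 20 ps, so the number of stages needed for a given range grows as the resolution improves.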
Given this result, the Vernier technique may appear to be able to substantially
improve TDC resolution. However, there are a number of issues to consider that
practically limit the resolution improvement to a factor of 4-10. Specifically, the
same issues that are found in the simple delay-chain TDC (e.g. range, sensitivity to
mismatch) are present in the Vernier TDC, except that, along with the resolution,
the magnitude of the problems has also been amplified. Although the Vernier delay
elements may be tuned to match a fixed offset and calibrated at the system level,
such techniques are both cumbersome and dependent on system-level architecture
design [76].
To reduce the size of practical Vernier TDC, various dual step architectures based
on Vernier techniques have been proposed [27,56,57], as shown in Figure 2-6. These
architectures often have a simple delay-chain TDC (Figure 2-3) as the first stage, and
then further refine the initial measurement by amplifying the residual error and then
passing it to a second, higher resolution Vernier TDC. Another dual step technique
that amplifies time error using the metastability property of digital gates has also
been proposed, and in this case a larger resolution improvement up to a factor of 20
is reported [34].
Figure 2-6 A dual-step TDC that incorporates both the delay-chain and Vernier techniques
Although the range for these architectures is larger than what would be achieved
for a single-step TDC using the same resolution improvement techniques, the funda-
mental range versus size tradeoff does not improve compared with the simple delay-
chain TDC discussed earlier. Interestingly, a cyclic architecture similar to Figure 2-4
may be used to significantly increase the range of the single or dual-step Vernier
TDC [57]. In this case, the decoding logic and calibration become more complicated
due to the many logical states that are supported.
Figure 2-7 An analog interpolating TDC that creates transitions with sub-gate-delay
spacing
Figure 2-8 A digital technique for creating transitions with sub-gate-delay spacing
From the examples described in the previous section, we clearly seek TDC implementations
not only with excellent resolution, but also with inherent robustness
to issues such as mismatch. It is in this context that we proceed to consider how
oversampling may be used to improve TDC performance.
Oversampling describes the quantization of a signal with fixed bandwidth (F_b) at
a speed F_s much faster than the Nyquist rate required to reconstruct the original signal
without aliasing. Because we often assume that the quantization error, T_error, is
random and uniformly distributed over the quantization step, linear system analysis
is commonly applied to compute the quantization noise power spectral density (PSD).
Figure 2-9 The DC transfer characteristics for (a) a completely deterministic TDC, (b)
a deterministic TDC with small jitter either due to thermal noise or the input, and (c) a
TDC with "white" quantization error due to inherent error scrambling or external dithering
Such standard analysis in the frequency domain assumes that the resulting quantiza-
tion error is spectrally white and that its PSD in discrete time ideally decreases with
sampling rate,
$\mathrm{PSD}_{error} = \frac{T_q^2}{12 F_s}.$ (2.5)
It is then expected that filtering of the converter output to remove the undesired
bandwidth will also remove a similar proportion of quantization noise, thus realizing
the improved signal-to-noise ratio that oversampling can ideally provide.
However, as just mentioned, such analysis depends on the quantization error being
random and uniformly distributed over the quantization step, which is not true in
general for quantizers with small input signals. As we saw earlier, an important
characteristic of the delay-chain TDC is that, since there is no error at the beginning
of the measurement (Equation 2.4), the output and error for each measurement are
deterministic functions of the input. As a result, the DC transfer characteristic of an
ideal delay-chain TDC shown in Figure 2-9(a) reveals a non-linear staircase function.
For this class of deterministic converters, there is no inherent scrambling of the TDC
error that generally can be used to improve effective resolution through oversampling.
In practice, even for deterministic TDC, there is a small amount of noise from
both the input signal and the TDC itself that will round off the edges of this staircase
function. As shown in Figure 2-9(b), the resulting DC transfer characteristic is now
smoothed somewhat, although the staircase non-linearity can still be evident. In fact,
a linear DC transfer characteristic (i.e. a random quantization error) can be achieved
in a deterministic quantizer only if the input signal is sufficiently large compared to
the quantization step size, which includes the situation where the input signal itself is
noisy, or if the physical noise internal to the converter is larger than the quantization
step size. This condition is illustrated in Figure 2-9(c).
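The difference between the staircase and the linearized characteristic can be reproduced with a short Monte Carlo sketch. It averages the output of an ideal deterministic quantizer for a DC input with added Gaussian noise. The function name, the noise level, the sample count, and the seed are assumptions for illustration, and all times are expressed in units of T_q.

```python
import math
import random

def averaged_output(t_in, t_q, sigma, n_samples=20000, seed=1):
    """Average output of a deterministic (floor) TDC for a noisy DC input.

    sigma is the rms value of Gaussian noise added to t_in before quantization.
    """
    rng = random.Random(seed)    # fixed seed so the sketch is repeatable
    total = 0
    for _ in range(n_samples):
        total += math.floor((t_in + rng.gauss(0.0, sigma)) / t_q)
    return total * t_q / n_samples

# Without noise, inputs of 0.2 and 0.7 LSB are indistinguishable after averaging.
print(averaged_output(0.2, 1.0, 0.0), averaged_output(0.7, 1.0, 0.0))  # 0.0 0.0

# With noise of about one LSB rms, the averaged outputs differ by roughly the
# 0.5 LSB input difference, apart from a constant offset.
step = averaged_output(0.7, 1.0, 1.0) - averaged_output(0.2, 1.0, 1.0)
print(round(step, 2))
```

The noiseless case corresponds to the staircase of Figure 2-9(a), where averaging recovers nothing below one step, while the noisy case corresponds to Figure 2-9(c).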
In a closed-loop system such as a PLL, there are certain conditions in which the
system may provide such scrambling of the TDC input, for example as it may in a
fractional-N ΣΔ PLL. However, there are many applications to be aware of that do not
provide such a dithering. For example, the TDC input for high-performance integer-N
PLL is limited to a very small range with very little deviation or noise, and a lack of
random error in deterministic TDC can be a significant problem. This situation can
be compared to the classic dead-zone in an analog phase detector, which is well-known
to cause erratic limit-cycle behavior in integer-N PLL.
One solution to this problem is to intentionally modulate the TDC input with a
sufficiently noisy signal in order to improve the randomness of the quantization error.
Of course, adding unknown noise to a TDC input is a rather poor way to linearize
the quantization. Instead, if the "noisy" signal is known and the gain of the TDC
is well-characterized, this "noise" can then be subtracted from the TDC output,
which ideally would result in a random error that can benefit from oversampling.
However, we note that the uncalibrated or residual non-linear quantization error due
to mismatch will not be corrected with averaging or filtering, since these errors will
already have folded in the sampling process to corrupt the bandwidth of interest.
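The dither-and-subtract idea described above can be illustrated with a short numerical sketch. The quantizer model (rounding to the nearest step), the uniform dither spanning exactly one step, the 5ps step, and the helper names are all assumptions made for illustration; the sketch also assumes ideal knowledge of the dither and of the TDC gain.

```python
import math
import random


def quantize(t, t_q):
    """Deterministic quantizer: round t to the nearest multiple of t_q."""
    return math.floor(t / t_q + 0.5) * t_q


def averaged_estimate(t_in, t_q, n_samples, use_dither, seed=1):
    """Average n_samples conversions of a constant input t_in."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        # Known dither spanning one quantization step (assumption).
        d = rng.uniform(-0.5 * t_q, 0.5 * t_q) if use_dither else 0.0
        # Quantize the dithered input, then subtract the known dither.
        total += quantize(t_in + d, t_q) - d
    return total / n_samples


T_Q = 5e-12        # illustrative quantization step
T_IN = 0.3 * T_Q   # small DC input, well inside one step

plain = averaged_estimate(T_IN, T_Q, 100_000, use_dither=False)
dithered = averaged_estimate(T_IN, T_Q, 100_000, use_dither=True)
print(f"no dither:   {plain / T_Q:.3f} Tq")
print(f"with dither: {dithered / T_Q:.3f} Tq")
```

Without dither the quantizer returns the same code for every sample, so averaging cannot recover the input. With the known dither removed, the residual error behaves like a random quantization error and the average approaches the true input. Mismatch is not modeled here.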
For example, let us consider a high-performance Vernier TDC running at 50Msps
that has been optimized at the system level to detect small input signals by modulating
Tin with a pseudo-random noise source. We can assume that Tq has been
improved by a factor of 4 from the raw gate delay of 20ps to reach 5ps resolution.
Further, a run-time calibration circuit has been designed that allows for compensation
of the pseudo-random input sequence and delay element mismatch. Through
this calibration, the effect of mismatch has been reduced from a delay error standard
deviation of 10% to an absolute error standard deviation of only 1%, an improvement
of 20dB. The overall rms TDC quantization error for a fixed 50kHz analog bandwidth
(typical bandwidth for a ΣΔ PLL) can then be estimated by the rms sum of
quantization noise and mismatch error as
$$T_{error,rms} = \sqrt{\frac{T_q^2 (2F_b)}{12 F_s} + (T_{mm,rms})^2} \tag{2.6}$$
$$= \sqrt{\frac{(5 \times 10^{-12})^2 (1 \times 10^{5})}{12 (50 \times 10^{6})} + (200 \times 10^{-15})^2} \tag{2.7}$$
$$= 210\,\text{fs} \tag{2.8}$$
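As a quick check of the arithmetic in this example, the estimate can be evaluated numerically. The function name and argument names below are ours, chosen for illustration; the formula is the rms sum of oversampled white quantization noise and a mismatch term that oversampling does not reduce.

```python
import math


def white_rms_error(t_q, f_b, f_s, t_mm_rms=0.0):
    """RMS error in bandwidth f_b for white quantization noise of step t_q
    sampled at f_s, plus a residual mismatch term t_mm_rms."""
    quant_power = (t_q ** 2 / 12.0) * (2.0 * f_b / f_s)
    return math.sqrt(quant_power + t_mm_rms ** 2)


# Vernier example: 5ps step, 50kHz bandwidth, 50Msps, 200fs residual mismatch
t_err = white_rms_error(t_q=5e-12, f_b=50e3, f_s=50e6, t_mm_rms=200e-15)
print(f"{t_err * 1e15:.0f} fs")  # prints "210 fs"
```

With these numbers the quantization term alone is roughly 65fs, so the total is dominated by the residual mismatch.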
Figure 2-10 illustrates the classical ring oscillator-based TDC composed of a ring of
delay elements [46, 59], which shares a number of characteristics with the cyclic TDC
from Figure 2-4. First, we note that for both topologies the oscillator transitions are
counted during the input time window Tin, here designated by the Enable signal.
Next, all counter outputs are summed together and stored as the TDC output before
being reset (during Enable low) to prepare for the next measurement. Finally, due
to the logarithmic scaling of the counter range, the oscillator-based TDC also has the
attribute of a large dynamic range with reasonable silicon area.
Figure 2-10 Classical oscillator-based TDC
A key difference between the two architectures, however, is found when examining
the overall quantization error for the oscillator-based TDC. We find that counting
the transitions of a free-running oscillator results in error equivalent to the funda-
mental expression given earlier by Equation 2.1 and repeated here for convenience,
Terror[k] = Tstop[k] - Tstart[k]. Compared with the delay-chain or cyclic TDC error
from Equation 2.4, we now include both Tstart and Tstop, which indicates that each
measurement of the oscillator-based TDC will have an additional error contribution
from Tstart. For our purposes, we can assume that the oscillator phase at the beginning
of each sample is random, and consequently Tstart is also random, having
uniform density on the interval [0, Tq]. By way of contrast, the cyclic TDC "phase"
is effectively set to 0 at the beginning of each measurement.
To benefit from oversampling, we thankfully do not require the overall TDC
error Terror to also be a random variable with uniform density, as in fact this criterion
is quite difficult to satisfy for small inputs. Rather, we require Terror to be a white
random variable with flat power spectral density (PSD) across all frequencies and for
all inputs, including zero frequency. In addition, we require Terror to be uncorrelated
with Tin. Discussion of the special cases, for example where Tin is exactly equal to an
integer multiple of Tq (i.e. Terror = 0 ∀ Tin = kTq), will be postponed until later, using
the justification for now that this special case ideally occurs with zero probability and
can therefore be ignored.
Due to the random properties of Tstart, the oscillator-based TDC satisfies the
above criteria for Terror. We can expect that the small penalty of larger error for the
inclusion of Tstart can be easily offset by the resolution improvement by oversampling.
Interestingly, the oversampling benefit in the oscillator-based TDC is not constrained
to simply improving the quantization error, but also extends to improving errors from
mismatch as well.
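The effect of a random starting phase can be sketched numerically. The model below is a simplification we assume for illustration: an ideal oscillator with uniform stage delay Tq and no mismatch, where Tstart is the time elapsed since the last transition when the window opens, so that the count satisfies count·Tq = Tin + Tstart - Tstop. The function names are hypothetical.

```python
import math
import random


def oscillator_tdc_count(t_in, t_q, t_start):
    """Transitions counted in a window of length t_in when the last
    transition occurred t_start before the window opened."""
    return math.floor((t_start + t_in) / t_q)


def mean_output(t_in, t_q, n_samples, seed=1):
    """Average TDC output (in seconds) over many independent measurements."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        t_start = rng.uniform(0.0, t_q)  # random oscillator phase at Enable
        total += oscillator_tdc_count(t_in, t_q, t_start)
    return total * t_q / n_samples


T_Q = 20e-12
for frac in (0.1, 0.5, 1.3):
    est = mean_output(frac * T_Q, T_Q, 200_000)
    print(f"Tin = {frac:.1f} Tq -> averaged output = {est / T_Q:.3f} Tq")
```

Even for inputs much smaller than Tq, the averaged output tracks the input, which is the linearizing behavior that a quantizer resetting its phase to zero at every measurement does not provide.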
To further explain how mismatch is also improved by oversampling, we first consider
an input Tin that is less than an oscillator period. As mentioned earlier, the
oscillator starting phase is random with uniform density, which implies that the delay
elements that transition during the Enable window are chosen with a white random
process that is independent of the input. Therefore, input intervals that are a fraction
of the oscillation period will have mismatch error with flat power spectral density.
Next, we can consider intervals of Tin that are longer than an oscillation period.
In this case, Tin can be seen as an interval composed of two parts: an integer number
of periods, which does not contribute mismatch error, and the residual fraction of a
period that does have mismatch contribution. The argument from the first case can
again be used on the residual part of the input with length of less than a period. As
a result, we can conclude that for inputs of any length, mismatch error is reduced
through oversampling and has no contribution towards integral non-linearity for the
oscillator-based TDC.
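The argument that mismatch does not contribute integral non-linearity can be checked with a small Monte Carlo sketch. The five-stage ring, the specific mismatch values (deviations of up to about 10% that sum to zero), and the function names are assumptions for illustration only. The starting position is drawn uniformly over one oscillation period, following the random-phase assumption in the text.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def edges_before(t, edge_times, period):
    """Number of delay-element transitions in (0, t] for a free-running ring
    whose transitions occur at edge_times within each period."""
    laps, rem = divmod(t, period)
    return int(laps) * len(edge_times) + bisect_right(edge_times, rem)


def mean_output(t_in, stage_delays, n_samples, seed=1):
    """Average TDC output (in seconds) with mismatched stage delays."""
    edge_times = list(accumulate(stage_delays))
    period = edge_times[-1]
    t_q = period / len(stage_delays)  # average delay per stage
    rng = random.Random(seed)
    total = 0
    for _ in range(n_samples):
        t0 = rng.uniform(0.0, period)  # random ring position at Enable
        total += (edges_before(t0 + t_in, edge_times, period)
                  - edges_before(t0, edge_times, period))
    return total * t_q / n_samples


T_Q = 20e-12
MISMATCH = [0.10, -0.05, 0.08, -0.12, -0.01]  # hypothetical relative errors
DELAYS = [T_Q * (1.0 + m) for m in MISMATCH]

for frac in (0.3, 2.7, 7.2):
    est = mean_output(frac * T_Q, DELAYS, 200_000)
    print(f"Tin = {frac:.1f} Tq -> averaged output = {est / T_Q:.3f} Tq")
```

The averaged output follows the input for intervals both shorter and longer than one oscillation period, even though individual measurements see different, mismatched delay elements.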
At this point another example is helpful to quantitatively compare a simple
oscillator-based TDC, with a raw resolution of one gate delay, against the sub-gate-delay
approaches discussed earlier. For this example, let us consider the same
sample rate of 50Msps, analog bandwidth of 50kHz, gate delay of 20ps, and mismatch
of 10%. Since we will rely on oversampling to reduce mismatch, we can also
assume that there is no calibration. With these parameters set, the overall rms TDC
quantization error is found to be
Comparing the two examples so far, the simple oscillator TDC achieves
resolution performance of the same order as the Vernier TDC, yet the result is
achieved with a much simpler implementation and without input dithering or calibration.
This demonstrates the benefits of oversampling, not only for improving raw
resolution, but also for mitigating the effect of mismatch. The error for the oscillator
TDC has raw delay and mismatch components that decrease together with oversampling,
while the Vernier error has a floor set by the ability to calibrate the mismatch
error.
Although oversampling with the oscillator-based TDC does offer improved
resolution, it comes at a fairly expensive penalty in terms of bandwidth and power.
In terms of bandwidth, to decrease the effective Tq by a factor of 2, the sample
rate would need to be increased by a factor of 4. Equivalently, a
doubling of the sample rate decreases the quantization error by 3dB, which
is a small though helpful improvement. When it comes to power efficiency, in many
applications the input signal Tin is quite small compared to the measurement period,
Ts, yet the ring oscillator continues to run freely regardless of the measurement state.
This results in wasted power that could otherwise be spent on improving the raw
delay resolution of the oscillator.
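The bandwidth penalty can be made concrete with a short calculation. The sketch below assumes white quantization noise with variance Tq²/12 and a fixed analog bandwidth; the function names and default parameter values are ours, taken from the running example.

```python
import math


def white_in_band_rms(t_q, f_b, f_s):
    """In-band rms error for white quantization noise of step t_q."""
    return t_q * math.sqrt((2.0 * f_b / f_s) / 12.0)


def improvement_db(f_s_old, f_s_new, t_q=20e-12, f_b=50e3):
    """Reduction of in-band rms error, in dB, when the sample rate changes."""
    old = white_in_band_rms(t_q, f_b, f_s_old)
    new = white_in_band_rms(t_q, f_b, f_s_new)
    return 20.0 * math.log10(old / new)


print(f"2x sample rate: {improvement_db(50e6, 100e6):.2f} dB")  # 3.01 dB
print(f"4x sample rate: {improvement_db(50e6, 200e6):.2f} dB")  # 6.02 dB
```

A 4x increase in sample rate is needed for a 6dB reduction, that is, for halving the rms error.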
the number of delay element transitions during a measurement interval. Also similar
is the ability of the GRO-TDC to achieve large range with a small number of delay
elements. However, the key innovation in the gated ring oscillator is that instead of
enabling the counters during the measurement window, the ring oscillator itself is
gated with the Enable signal, with the state of the oscillator preserved in between
measurements.
By preserving the oscillator state at the end of the measurement interval Tin[k-1],
the quantization error Tstop[k-1] from that measurement is also preserved. In fact,
when the following measurement of Tin[k] is initiated, the previous quantization error
is carried over as Tstart[k] = Tstop[k-1]. This results in first-order noise shaping of the
quantization error in the frequency domain, as evidenced by the first-order difference
operation on Tstop, since the measurement error is given by
$$T_{error}[k] = T_{stop}[k] - T_{stop}[k-1]$$
Figure 2-12 Barrel-shifting of GRO delay elements to achieve first-order shaping of mismatch error
shown in Figure 2-12. What is clearly evident in this figure is that the selection of
delay elements for a given input is equivalent to the well-known barrel-shift algorithm
for dynamic element matching. Similar to the transfer of quantization error, the
mismatch errors for one sample are also passed along to and subtracted from the
following sample. Therefore, we can expect that in the case of oversampling, the GRO-
TDC architecture ideally achieves high resolution without the need for calibration,
even in the presence of large mismatch.
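The carry-over of error from one sample to the next, Tstart[k] = Tstop[k-1], can be sketched with an idealized behavioral model. The model below assumes perfect gating (the oscillator resumes exactly where it stopped), a hypothetical five-stage ring with fixed mismatch, and a constant input; all names and values are illustrative.

```python
from bisect import bisect_right
from itertools import accumulate


def gro_tdc(t_in_seq, stage_delays):
    """Counts per measurement for a gated ring oscillator that holds its
    state between measurements (ideal gating, no skew)."""
    edge_times = list(accumulate(stage_delays))
    period = edge_times[-1]
    n = len(stage_delays)

    def edges_before(t):
        laps, rem = divmod(t, period)
        return int(laps) * n + bisect_right(edge_times, rem)

    counts = []
    enabled_time = 0.0  # total time the oscillator has been enabled
    for t_in in t_in_seq:
        before = edges_before(enabled_time)
        enabled_time += t_in  # the phase resumes where it stopped
        counts.append(edges_before(enabled_time) - before)
    return counts


T_Q = 20e-12
MISMATCH = [0.10, -0.05, 0.08, -0.12, -0.01]  # hypothetical relative errors
DELAYS = [T_Q * (1.0 + m) for m in MISMATCH]
T_IN = 37.37e-12
N_MEAS = 1000

counts = gro_tdc([T_IN] * N_MEAS, DELAYS)
total_error = sum(counts) * T_Q - N_MEAS * T_IN
print(f"accumulated error over {N_MEAS} samples: {total_error / T_Q:+.2f} Tq")
print(f"average error per sample: {total_error / N_MEAS * 1e15:+.1f} fs")
```

Because each sample's residue is handed to the next one, the errors telescope: the accumulated error stays bounded by roughly one stage delay no matter how many samples are taken, so the averaged error shrinks in proportion to the number of samples. A quantizer that restarts from zero phase would instead return the same count for every sample of this constant input.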
Now comparing the GRO-TDC to the oscillator-based TDC for a single-shot mea-
surement, the GRO-TDC will have the same additional quantization error penalty
found in Equation 2.1. However, when considering again the benefits from oversam-
pling, the GRO-TDC quantization error will ideally decrease by 9dB for a doubling
of the sample rate, which is a significant improvement compared to the 3dB possible
for the oscillator TDC. This relationship can be clearly seen in the expression for rms
TDC quantization error
$$T_{error,rms} = \sqrt{\frac{T_q^2}{12} \cdot \frac{1}{3\pi} \cdot \frac{(2\pi F_b)^3}{F_s^3}} \tag{2.13}$$
An example GRO-TDC using the same parameters as the previous oscillator example
will then ideally have rms TDC quantization error of only
While this ideal performance level is far below typical thermal and 1/f noise levels
for digital CMOS, even the potential to achieve TDC resolution that is limited by
physical processes in a simple architecture is very compelling. The combination of
oversampling with first-order quantization noise and mismatch shaping is quite pow-
erful and can result in very high resolution conversion. Moreover, as will be seen in
the following sections, the GRO-TDC requires only a modest level of complexity that
can be implemented with small area and power consumption.
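The 9dB-per-doubling behavior quoted above can be checked against the standard result for first-order shaped white quantization noise. The sketch below assumes that textbook expression (a uniform error of variance Tq²/12 filtered by 1 - z⁻¹, with the small-angle approximation valid for Fb much smaller than Fs); it is an estimate under those assumptions and ignores mismatch and all physical noise sources. The function name is ours.

```python
import math


def first_order_shaped_rms(t_q, f_b, f_s):
    """In-band rms of (1 - z^-1)-shaped white quantization noise of step t_q,
    using the small-angle approximation valid for f_b << f_s."""
    osr = f_s / (2.0 * f_b)  # oversampling ratio
    return t_q * math.sqrt((1.0 / 12.0) * (math.pi ** 2 / 3.0) / osr ** 3)


# Same parameters as the oscillator example: 20ps, 50kHz, 50Msps
base = first_order_shaped_rms(20e-12, 50e3, 50e6)
doubled = first_order_shaped_rms(20e-12, 50e3, 100e6)
print(f"ideal rms error: {base * 1e15:.2f} fs")
print(f"gain from doubling Fs: {20 * math.log10(base / doubled):.2f} dB")
```

Under these idealized assumptions the result is on the order of a femtosecond, which is consistent with the remark that the ideal level lies far below the thermal and 1/f noise of a practical circuit.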
Chapter 3
While first-order quantization noise shaping is very appealing for many applications,
it is not yet clear that preserving a ring oscillator state through the stop and start
operation is possible, and even less clear whether a simple circuit topology can
yield useful and practical results. Because the noise shaping we desire depends on
the accurate transfer of quantization error from one measurement to the next (i.e.
Tstart[k] = Tstop[k - 1]), it is important to consider how well this can be accomplished
with simple circuitry, and also how imprecise error transfer will affect noise-shaping.
Towards this end, we now consider a simple circuit topology to illustrate the key
design challenges of the gated ring oscillator.
Figure 3-1 illustrates one potential implementation for gating a ring oscillator by
using switches [21]. Starting from a classical inverter-based ring oscillator with an
odd number of stages, these switches are added in series to the positive and negative
power supply connections for each inverter, and all switches share a common state.
When the switches are closed, oscillation is enabled and the ring of inverters behaves
identically to a classical ring oscillator (Figure 3-1(a)). Conversely, when the switches
are open, the inverter delay element is unable to charge or discharge the parasitic
output capacitance, and as a result oscillation is suspended (Figure 3-1(b)). The
oscillator phase at the end of the enabled state is then held during the disabled state
by the charge stored on the parasitic capacitance of the delay elements.
[Figure 3-1: (a) enabled ring oscillator; (b) disabled ring oscillator]
The delay element switches of Figure 3-1 are well-suited for CMOS technology,
and can therefore be implemented for each element with complementary transistors
M1 and M4 as shown in Figure 3-2. For an odd number of stages, all of the NMOS
switches are controlled by an Enable signal, and likewise all of the PMOS switches
are controlled by the complementary Enable signal (for simplicity, Enable will be used
in reference to the differential signals).
We should note that there are many ring oscillator configurations that can be
gated to hold phase information, including differential implementations. In fact,
differential delay elements are used in most TDCs to achieve good differential non-linearity
performance, mitigating the mismatch between rising and falling edges. For
the GRO, however, the single-ended configuration shown in Figures 3-1 and 3-2 may
be preferable to a differential one. As explained earlier in Section 2.6, the error from
differential non-linearity is actually first-order shaped, and the single-ended topology
has half the power and area.
As mentioned earlier, perfectly preserving the GRO phase state is equivalent to setting
the initial quantization error Tstart[k] equal to the final error of the preceding sample,
Tstop[k-1], and is required to achieve ideal noise-shaping. In a practical implementation,
however, we can expect that the analog quantization error is not preserved
perfectly, and it is therefore important to understand the physical limitations as well
as the implications of practical quantization error transfer. With this goal in mind,
we begin by describing the issue of quantization error transfer in general terms, which
then will provide a context for evaluating specific GRO implementations.
When the output of a delay element is in transition, there are a number of dy-
namic mechanisms that determine the location and movement of charge within the
circuit. In Figure 3-3, for example, when the transition is interrupted by disabling the
oscillator, the dynamics of the transition are replaced by an entirely new and distinct
set of dynamics. For the interruption of a negative transition in Figure 3-3(a) or its
inverted positive transition in (b), the charge will redistribute to satisfy an equipo-
tential condition across the FET resistor that is left on, even in the disabled state.
Upon enabling the delay element once again, the transition resumes; however, we can
see that the charge distribution within the cell is not the same as it was during the
original transition. Moreover, it is also clear that the amount of charge redistribution
depends on the state of the oscillator when Enable transitions low.
Figure 3-3 Conceptual picture of a transition being interrupted with a disable window.
A negative transition is shown in (a), and the approximate inverse is shown in (b) for a
positive transition.
In addition to the charge redistribution within a delay element for transitioning
outputs, there is also some charge redistribution during the disable time for delay
elements that have an input in transition. As shown in Figure 3-4, both the switch
drain voltages Vdp and Vdn and the output voltage Vo will be pulled towards the input
voltage Vo-1 until the respective inverter core transistor turns off. Compared to the
case where the output is in transition, when the oscillator is enabled again most of
the redistributed charge here will quickly move back close to its original distribution
before the output begins to transition. While the charge redistribution for this case
is seen as secondary, it may also have a small effect on precise quantization error
transfer.
Figure 3-4 Conceptual illustration of how charge redistribution within a delay element
depends on the input level
Since we now understand that the analog state information at the beginning of
a measurement interval is not strictly equal to the final state of the GRO from the
previous measurement, we now need to introduce this error in our mathematical
model. To do this, for each measurement k we first define a variable θGRO[k] that
is equal to the GRO phase at the time when the negative Enable transition crosses
mid-supply. Second, we recognize that Tstart[k] will no longer be equal to the value of
Tstop[k-1], and we define another time error, Tskew, that is a function of the GRO phase
θGRO[k]. Tskew now models the corruption of the analog phase state as an unintended
consequence of gating the oscillator by the relation
Figure 3-5 Phase trajectory skew (error) due to the physical non-idealities of gating an oscillator
As defined in Figure 3-5, we can see that to account for this, Tskew should be
subtracted from the input measurement interval, Tin. On average, a positive value of
Tskew will pull the quantized output to be slightly smaller than it should be, and similarly
a negative value of Tskew will result in a slightly larger output. Mathematically,
this can be seen when the measured GRO output time is given by
$$T_{out}[k] = T_{in}[k] - T_{skew}(\theta_{GRO}[k]) - T_{stop}[k] + T_{start}[k]. \tag{3.2}$$
We can then continue to use Equation 2.2, stated again for convenience as
Before we make comments on how the gating skew error affects the overall GRO
output, first let us recall the discussion on oversampling considerations for classical
quantizers from the previous chapter in Section 2.4. The applicable part of this
discussion is that the non-linear DC transfer characteristic of a classical quantizer
can be made to appear linear only if the quantization error is adequately scrambled.
Without scrambling, the output of the classical quantizer should be expected to be
non-linear, especially for inputs that are small or that create distinct quantization
patterns.
Because the gating skew error is also non-linear, adequately scrambling Tskew by
randomizing θGRO can also linearize the TDC behavior in the same manner that the
classical quantizer can be linearized. In this linear approximation, we can expect
that the GRO-TDC will have two non-physical noise profiles, a first-order noise-
shaped quantization error in addition to a white noise floor due to the skew error.
The required scrambling action can be accomplished by a combination of methods,
including random physical processes such as 1/f and thermal noise, intentional ran-
domization of the input signal through dithering, and pseudo-random patterns such
as the shuffling of delay element mismatch.
However, a lack of scrambling will leave the non-linearity to cause complex effects
in the quantizer output, especially when the converter is placed in a feedback system.
One example of these effects is the appearance of a deadzone in the quantizer DC
transfer characteristic, which will be discussed in Section 3.1.4. Generally avoiding
these effects is a very difficult challenge, and moreover the noise-shaping benefit of
the GRO-TDC architecture may not be realized at all without a scrambled GRO
phase. Therefore, it is important to understand the root cause of this gating skew
error in more detail so that it may be minimized and appropriately scrambled by
design.
Figure 3-6 Concept of how the gating skew error for an inverter-based GRO is the sum
of the skew from the positive and negative transitions
[Panels, top to bottom: delay element output voltages, delay element skew error, and total skew error, each plotted against GRO phase state]
Next, in the center of Figure 3-6, we depict the contribution from each individual
delay element to the gating skew error, Tskew. While the actual shape and magnitude
of the error contributions shown here are conceptual, in practice we do know that
each delay element only contributes to the overall Tskew while its input or output
transitions between logic levels. We also can expect that the contribution from the
rising and falling transitions will be somewhat different from each other.
Last, on the bottom of Figure 3-6, we show that the overall skew error is simply the
combination of the individual contributions from each delay element, which reveals a
periodicity to the skew error of 2Tq due to the difference between the rising and falling
transitions, or alternately the difference between the NMOS and PMOS transistors.
For example, we can expect that since the PMOS switch transistors are twice as
large as the NMOS, the amount of charge injected from the PMOS will similarly be
twice the amount of charge injected from the NMOS. While we acknowledge that this
simplistic decomposition of GRO skew lacks precision, it does provide a backdrop for
understanding the complex features of the error.
[Figure 3-7: simulation testbench timing, with Enable and Output waveforms annotated with Trise/fall, To, Tdisable, and toutput]
To gain a more empirical view of gating skew for the inverter-based GRO, we
can simulate Tskew as a function of the GRO phase state θGRO at the transistor
level in Spectre (SPICE) for a variety of conditions using the testbench shown in
Figure 3-7. For each curve, θGRO is swept by stepping To, which successively moves
the falling edge of Enable across the GRO states. After a disable time Tdisable, the
oscillator is enabled again and allowed to reach steady-state. We then monitor the
time toutput at which a GRO delay element output transitions. Finally, the value of
Tskew is calculated from toutput by subtracting the disable time, and then comparing to
a reference time of toutput that is obtained from a simulation with no disable window.
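The post-processing step described above can be written as a small helper. The sign convention (positive skew meaning the monitored transition arrives later than expected) is our assumption, chosen so that a positive Tskew shortens the measured interval as in Equation 3.2. The function names and all numerical values are hypothetical placeholders, not simulation results.

```python
def gating_skew(t_output, t_disable, t_output_ref):
    """Skew for one sweep point: delay of the monitored transition beyond
    the disable time itself, relative to the ungated reference run."""
    return (t_output - t_disable) - t_output_ref


def peak_to_peak_normalized(skews, t_q):
    """Peak-to-peak skew over a sweep, normalized to the stage delay t_q."""
    return (max(skews) - min(skews)) / t_q


# Hypothetical sweep: (t_output, t_disable) pairs against one reference time
T_REF = 130.0e-12
sweep = [(1130.5e-12, 1000e-12), (1129.0e-12, 1000e-12), (1131.5e-12, 1000e-12)]
skews = [gating_skew(t_out, t_dis, T_REF) for t_out, t_dis in sweep]
print([f"{s * 1e12:+.2f} ps" for s in skews])
print(f"peak-to-peak: {peak_to_peak_normalized(skews, 20e-12):.3f} Tq")
```

Normalizing the peak-to-peak value to Tq removes the DC offset of the skew and gives a single figure of merit per sweep.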
The primary simulation parameters (excluding θGRO or To) are the length of the
disable time and the rise/fall times of the Enable signal. As shown in Figure 3-7,
Enable signals are constructed with piecewise linear voltage sources, and the length
of the disable time is taken from the 50% crossings of the supply. SPICE models for
a standard 0.13µm CMOS process with ideal matching are used throughout.
Figure 3-8 Gating skew Tskew as a function of θGRO for stepped values of disable width
(Tdisable) from (a) 0.1-30ps and (b) 30-4000ps. The Enable rise and fall times are held
constant at 0.5ps.
A first simulation to examine varies the width of the disable time with a very fast
rise/fall time that is held constant at 0.5ps, which is close to the ideal case of zero-
width rise and fall times but large enough to avoid convergence issues. Because many
charge transfer mechanisms occur with exponential time constants, the disable width
is stepped from 0.1 to 4000ps with approximately logarithmic increments. Figure 3-8
plots the results from this simulation, where (a) displays Tskew for short disable widths
of 0.1ps to 30ps, and (b) corresponds to the longer values of Tdisable, ranging from
30ps to 4,000ps.
Looking at these results, there are a few immediate observations to make. First, we
can see in Figure 3-8(a) that as the disable width decreases, Tskew
approaches zero, which is the same as the reference simulation with no disable
at all. Although this result is what intuition would suggest, it is satisfying to see that
the unrealistically fast Enable transients do not cause non-physical behavior in the
simulation. Second, we can verify that Tskew is indeed periodic with 2Tq, as predicted
by Figure 3-6. Last, as we can see clearly by the separation of Figure 3-8 into (a) and
(b), there are at least two time constants that dominate the motion of charge in the
inverter cell.
Figure 3-9 Schematic depicting two time constants present in the charge redistribution
within a delay element whose output is in transition at the disable time
To explain the presence of more than one time constant, consider that when
the transition is interrupted, either the top or bottom of the inverter is open, with
the schematics for both cases drawn earlier in Figure 3-3, and shown here again for
convenience. When a GRO transition is disabled, the switch transistors turn off
and the charge in the switch transistor channels quickly diffuses, approximately half
moving to the supply and the other half into the inverter core. The capacitance at the
inverter drain, Cd, will at first absorb this charge injection at a rate determined by
the first time constant, and then eventually the voltage across Rinv will settle to zero
at a rate determined by the second time constant. Additional error with long time
constants may arise from delay elements with interrupted transitions at the input,
since these transistors are very weakly on and can have very large impedances.
Next, we know that the rise and fall times of Enable are practically much larger
than 0.5ps, which means that the turn-on time of transistors will depend on interaction
between the voltages within the GRO core and the voltages of Enable. We show
in Figure 3-10 the results of another simulation, this time with varied rise and fall
time and a constant disable time. An interesting thing to note here is that longer rise
and fall times effectively smooth out the peaks of the skew function, yet maintain the
same overall shape with surprising consistency.
Figure 3-10 Tskew as a function of θGRO for stepped values of rise and fall time
(Trise/fall) from 0.5-100ps. The disable width is held constant at 1,000ps.
Figure 3-11 Peak-to-peak Tskew/Tq plotted vs. (a) disable width and (b) rise / fall time
Figure 3-12 Simulated deadzones in the DC GRO-TDC transfer curve caused by gating
skew
effects of skew are caused by variation of Tskew with GRO phase, the DC offset is
irrelevant and can be removed. In Figure 3-11(a), we again can see the significant
changes in error magnitude for small disable widths, which have been explained already
by the charge redistribution, and in (b) the slight decrease of Tskew with larger
rise / fall times can clearly be seen. Specifically, Tskew starts with a peak-to-peak
deviation of about 0.23Tq for fast rise / fall times, and then weakens to about 0.14Tq
for 100ps transitions. Thus, while a longer slope to Enable may be detrimental in
terms of jitter, in this case it actually contributes a small amount of "averaging" that
could be seen as helpful in terms of gating skew error.
Due to the shallow slope of Tskew versus disable width for large values of Tdisable
seen in (a), we can say that in a standard 0.13µm CMOS process there is relatively
little charge lost to switch leakage. In deep sub-micron process technologies, however,
it is possible that subthreshold and gate leakage will present another source of error
that will change the shape and dependence of Tskew as a function of θGRO and Tdisable.
3.1.4 Deadzone effects
As mentioned earlier, many complex and interesting non-linear effects in the TDC
output can be caused by the gating skew error if the GRO phase is not scrambled
adequately. One important effect of the non-linear quantization error transfer is that
deadzones can be found in the GRO-TDC DC transfer characteristic. Since this is
a very standard measure for converter accuracy, it is worthwhile to understand this
issue in more detail.
In this expression, we can again see that the gating skew will push and pull the
GRO phase with a magnitude and direction determined by the phase of the previous
measurement. However, we can also see here that if the magnitude of Tskew is larger
than εT, then the influence of Tskew on the TDC output is also larger than it is for
εT. Recall that in the ideal GRO, where Tskew = 0 ∀ θGRO, even very small values of
εT will slowly accumulate over time and eventually cause the TDC output to change.
With the presence of a large, unwanted error that is a periodic function of the GRO
phase, the GRO will be pulled until a steady-state is reached. We can expect that
the gating skew error in a steady-state deadzone will be given by
With this insight, a few comments can be made on the deadzones. Notice that in
Figure 3-12, the even integer values exhibit larger deadzones than the odd integers,
which is consistent with the periodicity of Tskew. However, if mismatch were to be
added, we would expect the period of Tskew to be equal to the GRO oscillation period,
2NTq. Therefore, practical deadzones are likely to be most severe when the GRO is
stopped on the exact same delay element transition for each measurement, which is
similar to injection-locking the GRO with the TDC sampling frequency. In this case,
we need to either provide a large amount of GRO phase scrambling, or reduce the
magnitude of Tskew far below that of random physical processes internal to the GRO.
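The deadzone mechanism can be reproduced with a simple behavioral sketch. Everything specific in it is an assumption for illustration: the skew is modeled as a sinusoid in the GRO phase with period 2Tq and amplitude 0.1Tq, it is evaluated at the phase where the oscillator was last stopped, there is no noise or mismatch, and the DC input is an even number of Tq plus a small offset eps. All quantities are normalized to Tq.

```python
import math


def average_count(eps, skew_amp, base_counts=2, n_samples=2000, settle=1000):
    """Average GRO-TDC output, in units of Tq, for a DC input of
    (base_counts + eps) * Tq with a hypothetical sinusoidal gating skew of
    amplitude skew_amp * Tq and period 2 * Tq."""
    phase = 0.3  # accumulated GRO phase in units of Tq; arbitrary start
    total = 0
    for k in range(n_samples):
        skew = skew_amp * math.sin(math.pi * phase)
        new_phase = phase + base_counts + eps - skew  # skew subtracts from Tin
        if k >= settle:  # discard the initial transient
            total += math.floor(new_phase) - math.floor(phase)
        phase = new_phase
    return total / (n_samples - settle)


for eps in (0.05, 0.15):
    ideal = average_count(eps, skew_amp=0.0)
    skewed = average_count(eps, skew_amp=0.1)
    print(f"eps = {eps:.2f} Tq: ideal {ideal:.3f} Tq, with skew {skewed:.3f} Tq")
```

When the offset is smaller than the skew amplitude, the phase settles where the skew exactly cancels the offset, the oscillator stops on the same transition every time, and the averaged output no longer responds to the input. Once the offset exceeds the skew amplitude the phase slips and the average begins to move again, although it is still distorted by the skew.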
Finally, let us consider the approach to reduce the magnitude of Tskew through
interpolation or averaging. We have seen in Figure 3-6 that the gating skew is com-
posed of contributions from alternating positive and negative transitions. If multiple
skew contributions can be averaged together, then it may be possible to scale both
the gating skew as well as the effective oscillator delay, Tq. Therefore, we proceed
to consider architectural modifications to the GRO of Figure 3-2 that can achieve
sub-gate-delay raw resolution.
In this section, we first explore the suitability of various sub-gate-delay ring oscillator
topologies for implementing a gated ring oscillator. We then identify the most promis-
ing of these architectures to be the multi-path oscillator, and follow with a detailed
analysis, considering especially the critical architectural issues and tradeoffs for use
as a gated ring oscillator. Next, we present a design methodology and circuit details
for use within a prototype GRO-TDC, and then revisit the issue of quantization error
transfer accuracy, or gating skew. We demonstrate through simulations the marked
improvements in gating skew error using the proposed multi-path oscillator compared
to the simpler inverter topology discussed previously in Section 3.1.3, and provide a
physical explanation for the improved skew performance.
Figure 3-13 Illustration of the problem in using resistive interpolation for the GRO
Earlier in Section 2.3, a few techniques for creating sub-gate-delay TDC resolution
were discussed. Due to geometric similarities, the approaches commonly used in cyclic
TDC are of particular interest for application to the GRO. For example, it is natural
to consider the interpolating technique implemented either with rings of resistors or
with transistors. Beyond oscillators within the TDC community, research on
precisely generating multi-phase signals for fixed-frequency phase- and delay-locked
loop applications has also investigated similar ideas that can be considered for the
GRO [11, 12, 16, 31, 35, 36, 38, 42, 63].
The resistor ring often used in multi-phase oscillator applications is able to achieve
very high resolution and low differential non-linearity; however, we can quickly see
that this particular topology has fundamental problems for the gated ring oscillator.
To explain, Figure 3-13 applies the concept of resistive interpolation to the gated ring
oscillator, with the assumption that a differential delay element structure would be
used in practice. Although the power and area penalty of the differential structure
can be tolerated for the GRO (as discussed earlier in Section 3.1.1), the main issue
here is that when the GRO is disabled, current will continue to flow in the resistor
ring. The effect of resistor averaging which is quite useful for dynamic phase interpo-
lation will actually destroy the analog phase information during the disabled state.
We can conclude that for the GRO, at least in the disabled state, each delay element
cell should be held in isolation so that charge does not escape.
Since resistors are problematic for the GRO, it may seem logical to replace the
resistors with digital gates as the interpolating elements (as in Figure 2-8 [16,63]). In
this case, digital gates can be isolated so that charge does not flow between stages.
However, this approach is also flawed for application to the GRO, since interpolation's
primary advantage of reducing the raw quantization error does not address the funda-
mental issue of gating skew. Recall that for the GRO, raw quantization and mismatch
error is noise-shaped, and therefore not a primary concern. Rather, the problem with
the GRO is that the gating skew error, Tskew, arises from the alternating sequence of
positive and negative transitions within the active core of the oscillator. Therefore,
significant improvement of Tskew by means of reducing Tq will only be possible if the
oscillator core itself is modified.
As shown in Figure 3-15, one possibility for modifying the oscillator core is to
couple together M oscillators, each with N stages. This architecture also
creates sub-gate-delay resolution, theoretically reducing the effective delay per stage
by a factor of M [36, 38]. One issue that must be carefully considered for a system
of coupled oscillators is the stability of oscillation within the primary mode, which
is defined by adjacent delay elements transitioning in sequence around the ring. A
large coupling factor between the M oscillators can ensure stability in the primary
mode; however, increased coupling also has the undesired effect of slowing down the
transitions.
Figure 3-15 Coupled oscillators used to reduce the effective delay per stage
Oscillators that operate continuously (e.g. for PLL and DLL applications) may
well be able to support an initial reset operation that establishes the primary mode.
However, the very premise of the GRO is that it will be stopped and started at the
same phase state with no intervention or reset operation. While we are concerned
with reducing the delay of each stage as much as possible, at the same time we need
to achieve a well-defined oscillation through the gating operation. Therefore, robust
oscillation in the primary mode is a critical requirement for the GRO design.
Another possibility for creating sub-gate-delay resolution that is quite suitable for
the GRO is shown in Figure 3-16. In this multi-path topology, each delay element uses
state information from more than one output stage to determine when to begin its
transition. Interestingly, the coupled oscillators we just discussed are a subset within
the category of multi-path oscillators, because the coupling requires contribution from
more than one element per node. However, we can optimize the multi-path oscillator
for the GRO application with more degrees of freedom than the coupled oscillator.
For example, the multi-path topology is not restricted to having M · N stages, and a
Figure 3-16 Basic concept of using multiple inputs for each delay stage (panels: single input, single output; multiple inputs, single output)
Figure 3-17 Techniques to reduce effective delay by modifying the standard inverter
By optimizing the number, placement, and weight of the connections, each stage
begins to transition before the full transition of the immediately preceding stage
is completed such that the effective delay through the stage is minimized. Stable
oscillation in the primary mode can also be assured through proper design, with a
reduction of the gate delay again reported to be a factor of 2.
We now consider that each of these two techniques in Figure 3-17(a) and (b) can be
combined together to result in an unrestricted set of transistor connections as shown
in Figure 3-17(c). In the proposed topology, K transistors connect to a set of output
stages $\{V_{o_{i-j_1}}, V_{o_{i-j_2}}, \ldots, V_{o_{i-j_K}}\}$, which gives the designer a much larger optimization
space compared with Figures 3-17(a) and (b). Specifically, the connection and size of
each transistor in the delay cell can be independently adjusted, and the overall design
can be fully optimized to decrease the effective delay while maintaining a stable,
robust oscillation.
To describe a particular oscillator design, let J be the set of integers $\{j_1, j_2, \ldots, j_K\}$
$$\bar{J} = \sum_{k=1}^{K} W_k \cdot j_k \tag{3.8}$$
To make use of these definitions, we first consider a standard ring oscillator with
$\bar{J} = 1$. Not coincidentally, we find that the oscillation period is equal to
$T_{osc} = 2NT_{inv}/\bar{J}$. Here $T_{inv}$ is the delay of a standard inverter, and N is the number of
stages. While this is convenient for the case of the standard inverter ring, to be
more general we say that Jeff is the effective weighted average of J, defined when the
period for any ring oscillator is given by
$$T_{osc} = \frac{2NT_{inv}}{J_{eff}}. \tag{3.9}$$
To continue, we can next consider a multi-path ring oscillator with accelerated transitions
that reduces the oscillation period to $T_{osc} = 2NT_q$. By combining with
Equation 3.9, we have
$$T_q = \frac{T_{inv}}{J_{eff}}. \tag{3.10}$$
From this result, the designer may be tempted to reduce Tq by increasing Jeff as
much as possible; however, there are a number of practical considerations that limit
its attainable value. First, consider that only connections to the previous (N - 1)/2
stages are useful for primary mode oscillation, and a more practical rule is to restrict
J to a maximum of N/3. Therefore, achieving a large value of Jeff requires a large
number of stages, which in turn requires larger area.
Second, we need to consider the stability of the oscillator. If J is large, and/or
concentrated heavily at N/3 without a distributed contribution over the entire range
of J, secondary oscillation modes become difficult to suppress, especially in the pres-
ence of mismatch. While using a prime number of stages can be helpful in this regard,
a conservative design should have at least one input connection with some weight for
every 4 stages to ensure that transitions occur in the proper sequence. We will later
see that this strategy is also helpful in reducing the gating skew.
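The rules of thumb above can be collected into a quick check of a candidate connection set. The sketch below is only illustrative: the function name and the returned keys are hypothetical, the connection set {1, 5, 9, 11, 13} and N = 47 are the values given later for the prototype in Table 3.1, and the "one connection for every 4 stages" guideline is interpreted here (as an assumption) to mean that no gap between consecutive connections exceeds 4.

```python
def check_connection_set(n_stages, connections, max_gap=4):
    """Check a multi-path connection set J against the rules of thumb in the text."""
    js = sorted(connections)
    # Only connections to the previous (N - 1)/2 stages are useful in the primary mode.
    within_half_ring = js[-1] <= (n_stages - 1) / 2
    # More practical rule from the text: restrict J to a maximum of N/3.
    within_third = js[-1] <= n_stages / 3
    # Assumed reading of "one connection for every 4 stages": no gap larger than
    # max_gap between consecutive connections, counting from the stage itself (0).
    gaps = [b - a for a, b in zip([0] + js, js)]
    distributed = max(gaps) <= max_gap
    # The text notes that a prime number of stages helps reject undesired modes.
    prime = n_stages > 1 and all(
        n_stages % d != 0 for d in range(2, int(n_stages ** 0.5) + 1)
    )
    return {
        "within_half_ring": within_half_ring,
        "within_third": within_third,
        "distributed": distributed,
        "prime_stage_count": prime,
    }


prototype = check_connection_set(47, [13, 11, 9, 5, 1])
alternative = check_connection_set(45, [13, 11, 9, 5, 1])
```

With these inputs every check passes for N = 47, while N = 45 fails only the prime stage count check.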
Finally, the larger values in J will typically add more parasitic wiring capacitance
to the delay element output, since these elements need to be placed further away.
Moreover, the parasitic capacitance will also become more important when multiple
connections with small weights are chosen. We then introduce η, an efficiency factor
that takes into consideration switching transients and wiring parasitics to result in
$$J_{eff} = \eta \bar{J} = \eta \sum_{k=1}^{K} W_k \cdot j_k. \tag{3.11}$$
Figure 3-18 Example tradespace for optimizing the resolution of a multi-path oscillator
operating in its primary mode by considering the weighted average of J (horizontal axis: weighted average of delay element connections, Σ Wk · jk)
The delay cell from Figure 3-17(c) can be easily modified to accommodate the gating
functionality by again placing appropriate switches above and below the inverter
core as shown in Figure 3-19. In the same manner as was described earlier in Section 3.1.1,
all impedances are high in magnitude during the disabled state, which will
approximately preserve the oscillator state in between measurements.
Figure 3-19 Delay cell topology for the proposed gated ring oscillator
With the delay cell building block now defined, let us consider the number of stages
N that is appropriate for use in the GRO-TDC application. Counting and measuring
the GRO outputs with standard digital logic places an upper bound on the oscillation
frequency of 2GHz, which is a period conservatively equal to ten inverter delays in
the 0.13µm CMOS process. For a minimum design goal of Jeff = 5, this implies that
the number of stages N ≈ 50. An upper bound on N is less strict, and is determined
primarily by practical limitations such as the number of connections per stage and
silicon area (for the same set J = {j1, ..., jK}, a larger N does not reduce Tq). Another
issue for choosing N is that a prime value inherently has better rejection of undesirable
modes than does a value of N with large odd factors, such as 45 = 3 · 15 = 5 · 9. As
a result of these considerations, we propose here that N = 47.
To set the delay cell transistor connections and sizes, we use a soft approach
based on empirical simulation results in combination with the desire to minimize
layout complexity and area. A useful metric for evaluating designs is the power-delay
product, which can achieve a local minimum for a well-designed multi-path oscillator.
Another useful indicator of stability, albeit somewhat qualitative, is the steady-state
start-up time of the oscillator when given a minor charge injection onto one of the
oscillator nodes.
Although simulation is used for final assignment of connections and weights, there
are a number of guidelines that are also useful to generate a first-pass design that
Transistor  Function  J    W     Total Width (µm)  Fingers  Finger Width (µm)
PMOS        Inverter  13   0.25  3.00              3        1.00
PMOS        Inverter  11   0.16  2.00              2        1.00
NMOS        Inverter  9    0.24  1.20              2        0.60
NMOS        Inverter  5    0.24  1.20              2        0.60
NMOS        Inverter  1    0.12  0.60              1        0.60
PMOS        Switch    N/A  N/A   7.90              5        1.58
NMOS        Switch    N/A  N/A   4.50              5        0.90
Table 3.1 Details of the prototype GRO inverter delay cell
Simulations without taking into consideration wiring parasitics indicate that this
design efficiently achieves J = 7.9, which, assuming a NMOS/PMOS strength of 2.4
1 2 3 6 7 10 11 14 15 18 19 20
X 47 4 5 8 9 12 13 16 17 22 21
45 46 41 40 37 36 33 32 29 28 23 24
44 43 42 39 38 35 34 31 30 27 26 25
Figure 3-22 Delay cell layout floorplan for the prototype multi-path GRO
To include the parasitics, we can estimate η equal to 0.5-0.7, which results in a value
of Jeff = 4-6. Compared with the prior work on multi-path oscillator architectures
discussed earlier with Jeff = 2, this is roughly a factor of 2-3 improvement. The
expected performance is confirmed with measurement results, which are discussed in
Chapter 5.
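As a rough numerical illustration of Equation 3.11, the sketch below evaluates the weighted average of the connection set from the rounded W and J columns of Table 3.1, and then applies the estimated efficiency range of 0.5-0.7. The function names are hypothetical, and renormalizing the rounded weights (which sum to slightly more than one) is an assumption of this sketch. Because it works only from rounded table entries, the weighted average it produces need not match the simulated value quoted above.

```python
# Rounded (W, j) pairs read from Table 3.1. The weights sum to slightly more
# than 1 because of rounding, so they are renormalized (assumption of this sketch).
TABLE_3_1 = [(0.25, 13), (0.16, 11), (0.24, 9), (0.24, 5), (0.12, 1)]


def weighted_average_j(pairs):
    """Weighted average of the connection indices, sum(W_k * j_k) / sum(W_k)."""
    total_weight = sum(w for w, _ in pairs)
    return sum(w * j for w, j in pairs) / total_weight


def effective_j(pairs, eta):
    """Equation 3.11: J_eff = eta * weighted average of J."""
    return eta * weighted_average_j(pairs)


def effective_delay(t_inv, j_eff):
    """Equation 3.10: T_q = T_inv / J_eff (same time unit as t_inv)."""
    return t_inv / j_eff


j_bar = weighted_average_j(TABLE_3_1)
j_eff_low = effective_j(TABLE_3_1, 0.5)
j_eff_high = effective_j(TABLE_3_1, 0.7)
```

Under these assumptions the two efficiency bounds give values of Jeff that fall inside the 4-6 range stated in the text.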
To minimize mismatch both in the delay elements and in the routing parasitic capacitance,
a serpentine arrangement of delay elements was used in the layout,
as shown in Figure 3-22. However, the routing was done by hand in a single pass, em-
ploying no special techniques to equalize the routing lengths or parasitic capacitances
of each delay element output.
With the raw resolution of the TDC much improved with the multi-path architecture,
we can now revisit the issue of reducing the magnitude of the quantization error
transfer non-linearity. Recall the hypothesis from before that the magnitude of Tskew
can be reduced by averaging the skew contributions from multiple elements that are
in transition at the same time. To get a sense of how the different transitions relate
to each other in the proposed multi-path GRO design, Figure 3-23 plots all of the
transient voltages on the same time axis. If we look closely at this figure and carefully
count the number of transitions active at any given time, we should not be surprised
to find about 13 overlapping transitions, since 13 is the maximum value of J for this
particular multi-path design.
Figure 3-23 Simulated transient voltages of the multi-path delay element outputs (horizontal axis: time)
With this picture in mind, we can then revise the cartoon depicting gating skew
error for the multi-path architecture as shown in Figure 3-24. At the top of the figure,
the alternating pattern of positive and negative transitions vs. GRO phase state are
the same waveforms as in Figure 3-23, except that here each delay element output is
presented individually. We also see that defining the GRO phase state is much more
ambiguous, which is an issue that is later discussed in more detail in Chapter 4.
Because the transitions of the multi-path oscillator delay elements are much wider
with respect to Tq than before, we can also expect that the gating skew contribution
from each transition will be much wider as well. In the center of Figure 3-24, we
now depict a gating skew error with the same conceptual shape and magnitude as
before in Figure 3-6, although here the width of the contribution has effectively been
stretched over a span of 13 Tq. Depicting the individual error contribution in this
way is a physically intuitive and reasonable thing to do, since we have already seen
that charge redistribution is the primary mechanism for skew error, and also that
any delay element in transition will observe some amount of charge redistribution.
In addition, the equivalent circuit schematic for each individual delay element during
the disable window has not significantly changed from the schematic shown earlier in
Figure 3-9.
Figure 3-24 Concept of how the overlapping skew from positive and negative transitions
for a multi-path GRO significantly reduces the total skew (panels, top to bottom: delay element output voltages, delay element skew error, total skew; horizontal axis: GRO phase state; annotation: transition width spans entire range of input connections)
Finally, we show at the bottom of Figure 3-24 that the overall skew error is the
"average" of the individual contributions from each delay element, with the result
being much smaller than any of the individual contributions. To consider how this
"averaging" relates to the physical oscillator, recall that the delay elements in a multi-
path oscillator are strongly coupled together. Thus, when a charge is unnaturally
injected into one of the transitioning delay elements, its influence will be mitigated
by the inertia of the other delay elements, since all of the elements must work together
in converging to a single phase state. This physical analogy provides some justification
for depicting the total skew as an "average" of individual contributions that we will
later verify through simulation and measurements.
Although Figure 3-24 demonstrates that summing the overlapping transition skew
contributions from many stages will result in a smoother skew function with decreased
variation, it is not clear how much improvement we can expect. In fact, the amount
of reduced variation that results from "averaging" multiple functions in this manner
strongly depends on the specific characteristics of the individual functions, as well as
the time offset that separates them. In addition, we have so far approximated the
oscillator state space with only two-dimensions (phase and time), which provides a
useful, albeit crude, tool for understanding the relevant issues of gating skew, but
does not model the complex intricacies within the GRO. Therefore, we again turn to
a simulation testbench similar to that in Figure 3-7 to gain a more quantitative sense
of the improved variation of Tskew in the multi-path GRO.
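Before turning to the circuit simulation, the dependence on pulse shape and offset can be illustrated with a toy calculation. The sketch below is not a model of the charge redistribution in the GRO: the raised-cosine pulse shape, the widths of 1 Tq and 13 Tq, the normalization by the pulse width, and all function names are assumptions made purely for illustration. It sums alternating-sign contributions spaced one Tq apart and compares the peak-to-peak variation of the total for narrow and wide pulses.

```python
import math


def bump(x, width):
    """Raised-cosine pulse of the given width centred on x = 0 (hypothetical shape)."""
    if abs(x) >= width / 2:
        return 0.0
    return 0.5 * (1.0 + math.cos(2.0 * math.pi * x / width))


def total_skew(phase, width, span=20):
    """Sum alternating-sign contributions spaced one Tq apart, scaled by 1/width."""
    total = 0.0
    for i in range(-span, span + 1):
        sign = -1.0 if i % 2 else 1.0  # positive and negative transitions alternate
        total += sign * bump(phase - i, width)
    # Dividing by the width turns the sum into a crude "average" over the
    # transitions that overlap at any one time (assumption of this sketch).
    return total / max(width, 1.0)


def peak_to_peak(width, points=400):
    """Peak-to-peak variation of the total over one period of 2 Tq."""
    samples = [total_skew(2.0 * k / points, width) for k in range(points)]
    return max(samples) - min(samples)


pp_narrow = peak_to_peak(1.0)   # non-overlapping contributions
pp_wide = peak_to_peak(13.0)    # contributions stretched over 13 Tq
```

In this toy setting the wide, overlapping pulses largely cancel, but a different pulse shape would give a different amount of cancellation, which is why a circuit-level simulation is needed for a quantitative answer.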
Figure 3-25 displays a single simulated curve for the proposed multi-path GRO
Tskew as a function of GRO phase, assuming typical operating conditions of a 1ns disable
width and 50ps rise / fall times, and also with the DC component of Tskew removed for
clarity. While the smooth, near sinusoidal shape and period of 2 Tq are as expected,
the aspect of this figure that is striking is the very small magnitude of the error.
Compared to the gating skew error simulated for the inverter-based GRO, the peak-
to-peak magnitude shown here is almost an order of magnitude smaller, and this result
is with respect to Tq. Thus when the reduction of Tq is also considered so that the
gating skew error is seen in units of time, the simulated peak-to-peak variation of the
proposed multi-path oscillator is smaller than the inverter-based GRO by significantly
more than a factor of 10.
Figure 3-25 Multi-path GRO Tskew as a function of GRO phase for typical conditions with a
Figure 3-26 Multi-path GRO gating skew Tskew as a function of GRO phase for stepped values
of disable width (Tdisable) from (a) 0.1-30ps and (b) 30-4000ps. The Enable rise and
fall times are held constant at 0.5ps. (Horizontal axis in both panels: GRO disable time normalized to Tq. Legend, disable width in ps: (a) 0.1, 0.5, 1.0, 2.0, 4.0, 7.5, 15, 30; (b) 30, 60, 125, 250, 500, 1000, 2000, 4000.)
To compare the multi-path GRO topology with the inverter-based approach simulated
earlier, the same set of simulation conditions is applied to trace Tskew as
a function of GRO phase. As a fair design comparison, both simulations have the same total
transistor widths within each delay element, with the multi-path transistor gates being
assigned to multiple delay elements according to the prototype design instead of
sharing a common connection. However, no attempt was made to scale the parasitic
capacitance for the multi-path design, since an accurate value is specific to the implementation
and difficult to estimate. To avoid artificially inflating the
performance improvement of the multi-path oscillator by neglecting this important
consideration, all results from both simulations are normalized to Tq.
Figure 3-27 Multi-path GRO peak-to-peak Tskew/Tq plotted vs. (a) disable width and
(b) rise / fall time
In Figure 3-26, the normalized value of Tskew for the multi-path GRO simulation is
plotted for a wide range of disable widths from 0.1-4000ps. To again visualize the two
time constants that are present in the multi-path oscillator, the figure is separated
according to shorter disable widths of 0.1-30ps on the left in (a), and longer disable
widths of 30-4,000ps on the right in (b). As mentioned earlier, the circuit schematics
during the disable window for both the inverter and multi-path delay elements are
virtually equivalent to each other, with only a modification needed for the value of
•n,,v which was defined for the inverter-based GRO in Figure 3-9. Therefore, it is
not surprising at all to see the same trends appear in the multi-path oscillator, and
we attribute the movement of Tskew to the same charge redistribution mechanisms
that were discussed earlier in Section 3.1.3. The peak-to-peak variation in Tskew as a
function of disable width is plotted in Figure 3-27(a).
The trend of multi-path GRO gating skew error versus the rise / fall time of
Enable also appears very similar to the inverter-based GRO results discussed earlier,
with a slight decrease of variation in Tskew for slower Enable transitions. This result is
seen most clearly in the plot of peak-to-peak variation in Tskew as shown in Figure 3-27(b).
To recapitulate an earlier comment, a slower Enable may suffer from increased
thermal and 1/f noise contributions, although it does seem to provide some benefit
in terms of smoothing out the gating skew error.
By providing a physical intuition as well as simulation results, we can conclude that
the gating skew error for the proposed multi-path GRO architecture is significantly
reduced compared to the inverter-based topology. We can also say that there are two
key features of the proposed multi-path oscillator design that enable such a marked
improvement in the gating skew error. First, the large number of delay elements in
transition at any given time means that the overall gating skew has the potential to be
influenced by more than one delay element. Second, the distributed set of connections
in this design, chosen originally to ensure oscillation in the primary mode, provides a
web that strongly couples the delay elements together. Together, these features enable
a very digital circuit structure to accurately preserve the analog state information that
is required for noise-shaping. We will later see in Chapters 5 and 6 that this level of
gating skew performance is inherently sufficient to achieve robust first-order shaping
of the quantization and mismatch error.
Chapter 4
Figure 4-1 Using two counters for each output stage to keep track of the total number
of phase transitions
counter input simply needs to be latched early enough to guarantee that the ripple
counter has properly settled from the time of the last possible input transition.
The second issue of dealing with counter inputs that are stopped at invalid logic
levels can be helped somewhat by a few obvious circuits; however, addressing the
fundamental issue is more complex. For example, buffering and using positive feedback
for delay stage outputs very close to a logical threshold can virtually eliminate
metastability problems, but this does not decrease the possibility of the delay stage
output moving across the threshold during the disabled window. As shown in Figure 4-2(a),
without additional measures in place, the counter can advance more than
one count for a single transition event.
One possible solution to address this double-counting of transitions is to de-glitch
the negative pulse with a carefully timed latch control signal. Figure 4-2(b) illustrates
that after the GRO is enabled, the delay element output in question will quickly
resolve itself to a logical state that it will hold for a relatively long time ( NT ).
By opening the latch slightly after the rising edge of Enable (but well before the next transition),
any potential glitches at the counter input will be removed.
When contemplating how phase measurement using only counters can be
applied to the multi-path oscillator, there are two primary areas of concern. First, while
30 counters operating at a relatively slow rate (<1GHz) do not consume much area
or power for the standard inverter GRO [21], the situation worsens considerably for
the multi-path oscillator, which would require 94 counters operating at approximately
2GHz. Second, de-glitching the counter inputs by means of careful timing is not a
robust technique that can easily be implemented with simple digital synthesis tech-
niques. Although it is understood that the GRO core has custom attributes, we would
prefer a more elegant solution that is robust even when implemented by relatively
crude automation.
In this section we describe an efficient and robust phase measurement technique that
is applicable to a wide range of ring oscillators. Compared to the previous approach,
which operates on each delay element output independently, the foundation for this
technique takes advantage of the predictable sequencing of oscillator transitions. In
this way, the motivation for both the multi-path oscillator topology and the phase
measurement approach is that the designer can anticipate known phase state patterns
to make decisions more intelligently. However, we will first continue to use the stan-
dard oscillator in this section to illustrate the concepts of this technique, and later
apply the concepts to the multi-path oscillator. We also present a robust de-glitch
circuit that does not require precision timing to avoid double-counting.
For the GRO, counting the delay element outputs is, by far, more expensive in terms
of area and power than is sampling the output with a digital register, since in general
the TDC sampling rate is much slower than the oscillation frequency. Yet, a single
counter provides a full record of its input transitions since its last reset, whereas an
undersampled single register appears to provide no transition information at all. In
fact, a single register only provides a crude sample of the phase of the oscillator.
However, we know that the GRO frequency and phase are related by
$$f_{GRO}[k] = \frac{\Delta\theta[k]}{T_s}. \tag{4.1}$$
We see that it is not only possible to estimate frequency indirectly via phase, but the
noise-shaping properties of the GRO-TDC are also preserved through Terror. There-
fore, calculating phase by using N registers appears very attractive compared to
tracking frequency with 2N counters.
Figure 4-3 illustrates the basic concept of calculating the GRO-TDC output by
quantizing the oscillator phase with registers, and then differentiating from sample to
sample. Of course, the problem with using only registers to measure the average GRO
frequency (or number of transitions) is that the oscillator phase value is calculated
modulo 2N (or 2π, depending on units), and without a means to keep track of the
number of phase wraps, the measurement output will be incorrect. To solve this
problem, we separate the phase into two components, a fine phase residual that is
calculated from the registers, and a coarse phase that accumulates 2N (in the figure
N = 15) each time the oscillator phase wraps around without a reset operation.
The coarse phase accumulation can simply be implemented by counting the positive
transitions of a chosen delay element.
Figure 4-3 Basic concept of calculating the GRO-TDC output by differentiating phase (example for a 15-stage gated ring oscillator: Phase[k] = 56, 63; Phase[k-1] = 50, 56; Out[k] = 6, 7)
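The coarse and fine phase decomposition described above can be sketched in a few lines. The function names are hypothetical, and the (wrap count, residue) pairs are illustrative values chosen only so that the total phase follows the sequence 50, 56, 63 from the example in Figure 4-3 with N = 15.

```python
def total_phase(wrap_count, residue, n_stages):
    """Combine the coarse wrap count and the fine residue into one phase value."""
    return 2 * n_stages * wrap_count + residue


def tdc_outputs(samples, n_stages):
    """First-order difference of successive phase samples (wrap_count, residue)."""
    phases = [total_phase(count, residue, n_stages) for count, residue in samples]
    return [now - prev for prev, now in zip(phases, phases[1:])]


# N = 15, so each wrap contributes 2N = 30. These pairs are assumed for
# illustration and reproduce total phases of 50, 56 and 63.
samples = [(1, 20), (1, 26), (2, 3)]
outputs = tdc_outputs(samples, n_stages=15)
```

Here the residue drops from 26 to 3 between the last two samples, yet the output remains correct because the coarse count has advanced by one wrap.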
To accurately calculate the phase residual we need to observe the entire oscillator
state, which means utilizing the outputs from all the delay elements. The key idea here
is to leverage a simple, predictable mapping between the sampled oscillator output
code and phase that is inherent in the fundamental operation of the oscillator. For
example, Figure 4-4 charts how the 30 possible phase states of the example 15-stage
ring oscillator are encoded in the delay element output values. The starting phase,
or equivalently the state mapping to zero residual, is determined by the polarity and
location of the counted delay element. To calculate a binary-coded phase residue for
each state, we then use a Karnaugh map for each of the phase residue bits, and to
determine the number of transitions for each measurement we implement a simple
first-order difference operation.
Figure 4-4 Chart showing the logical states of a standard 15-stage ring oscillator for
each of the 30 possible discrete phase states (horizontal axis: quantized GRO phase state, 0 to 29)
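To make the mapping between sampled register code and phase residue concrete, the sketch below enumerates the ideal logic states of a 15-stage inverter ring and inverts that map with a lookup table. It is only a behavioral illustration: the lookup table stands in for the Karnaugh-map logic, every stage is assumed to sit at a valid logic level, and the choice of which stage and polarity corresponds to zero residue (stage 0 about to toggle from a low level) is an assumption, as are the function names.

```python
def ring_state(phase, n_stages=15, first_level=0):
    """Ideal logic levels of an odd-length inverter ring at a quantized phase."""
    levels = []
    for i in range(n_stages):
        # At phase 0 the levels alternate, so stage 0 sees its own level at its input.
        level = first_level if i % 2 == 0 else 1 - first_level
        # Stage i toggles when the phase passes i, and again when it passes i + N.
        toggles = int(phase > i) + int(phase > i + n_stages)
        levels.append(level ^ (toggles % 2))
    return tuple(levels)


def build_decoder(n_stages=15):
    """Lookup table from sampled register code to phase residue (0 to 2N - 1)."""
    return {ring_state(p, n_stages): p for p in range(2 * n_stages)}


decoder = build_decoder(15)
residue = decoder[ring_state(17)]
```

All 30 codes are distinct in this idealized model, so the sampled code determines the residue uniquely.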
Although a counter is used to deal with the modulo 2N phase wrapping, ironically
the fundamental problem is not solved; instead, it is inherited by the counter. Therefore,
we use the overflow output of a standard ℓ-bit ripple counter to indicate that its
range has been exceeded (by design, overflow should happen at most once per measurement),
and to compensate we simply need to add 2^ℓ to the first-order difference
output (for that measurement only). Figure 4-5 extends the example of a 15-stage
oscillator, where the counter has a range of ℓ = 4 bits.
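The effect of the overflow compensation can be sketched with a modular difference. Note that this sketch swaps in a modulo operation in place of the explicit overflow flag described above, and it works in phase units, assuming the wrapped phase count spans 2^ℓ · 2N (480 for the 15-stage, 4-bit example). The function name and the sample values are hypothetical.

```python
def wrapped_difference(phase_now, phase_prev, n_stages, counter_bits):
    """First-order difference that tolerates one counter overflow per measurement."""
    # Range of the wrapped phase count: 2^4 * 30 = 480 in the 15-stage example.
    phase_range = (2 ** counter_bits) * 2 * n_stages
    # Python's % returns a non-negative result for a positive modulus, so a
    # negative raw difference (caused by overflow) is shifted up by phase_range.
    return (phase_now - phase_prev) % phase_range


no_overflow = wrapped_difference(63, 56, n_stages=15, counter_bits=4)
with_overflow = wrapped_difference(5, 454, n_stages=15, counter_bits=4)
```

The result is only valid if the true phase advance per measurement is smaller than the full range, which corresponds to the requirement that overflow happen at most once per measurement.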
Earlier in Section 4.1, we discussed how latching the counter inputs during the disable
window with careful timing could prevent double-counting transitions that would
destroy the quantization error-shaping. While any phase measurement error at all
is destructive, if one of the 2N counters in the first approach double counts a single
transition, the TDC output will be off by a single LSB. For the phase measurement
approach just discussed, the majority of delay elements are seen only by registers
which are much less sensitive to glitch events. However, if a double-counting error in
the counter is made in this topology, it is likely that the TDC output will be wrong
by at least 2N, since the counter output is amplified by this value. This magnitude of
error lacks noise-shaping and would likely be very disruptive at the application level.
The first step that we take to remove counting glitches is again to latch the counter
input, as seen in Figure 4-6. The delay element output to be counted, Vocount, is then
input to both a latch and a register, since its state information is required for
both the phase count and the phase residue. Putting aside the issue of double-counting
for now, we can see another potential source of phase measurement error:
the latch output is not guaranteed to be the same as the register output. The two
distinct samplers will undoubtedly have different offset and sampling instants, which
is problematic since the GRO output Vocount can be held near mid-supply during the
disable times. Although in noise-shaping applications it is likely that a latch/register
discrepancy would be corrected in the next sample, the TDC outputs for at least two
samples would be incorrect, which is generally unacceptable.
Figure 4-6 A potential phase error when the oscillator state is determined by both registers
and counters
Figure 4-7 Combining register and latch functions into a single element to resolve the
potential discrepancy between register and latch outputs. (a) shows the original problematic
implementation, (b) illustrates that a D flip-flop is composed of two serial D latches,
and (c) combines the redundant latches to ensure the same signal is observed by both
the register and the counter.
A very efficient way to resolve the potential discrepancy between the latch and
register outputs is to utilize a common latch circuit for both functions. Figure 4-7
illustrates that we can implement a D flip-flop register as a master-slave pair of D
latches, and coincidentally in (b) we find that the first D function is implemented by
both the register as well as the latch. As seen in (c), we can eliminate a redundant
latch, and at the same time achieve a unified signal path that will ensure the phase
count and phase residual are consistent with each other (note that in this statement
we rely on a monotonically increasing phase during the enabled window).
With assurance of a consistent phase count and phase residue, we now focus on
the issue of double-counting errors due to glitches at the counter inputs (as discussed
earlier in regard to Figure 4-2). Instead of clocking the latch signals with precisely
controlled timing, we can avoid glitches with a more robust technique that once again
leverages the predictable sequence of the oscillator phase state.
Recall that the goal for the de-glitch circuit is to ensure that the counter incre-
ments exactly once for each GRO phase rotation, regardless of how slowly the counter
voltage threshold is crossed, or even how many times the threshold is crossed! There-
fore, the key idea for the proposed de-glitch circuit is the knowledge that when the
counter input Vocount is held near mid-supply, almost all the other stages are resolved
to unambiguous logic levels.
Specifically, if Vocount is transitioning high at the time Enable is de-asserted, then we
can say that a delay element output that precedes it by an even
number of stages (e.g. Vode-glitch = Vocount-2) has just completed its positive transition.
Similarly, the negative transitions follow in the same sequence. Therefore, we can say
that to prevent the counter input from "moving backward", it should only transition
when both Vode-glitch and Vocount share the same logic level. A truth table for the de-
glitch logic can be seen in Table 4.1, where Lde-glitch and Lcount are defined as the latch
outputs corresponding to Vode-glitch and Vocount, respectively, and Led is the de-glitch
logic output.
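The hold-unless-both-agree behavior described here can be sketched behaviorally in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the output polarity is assumed to be non-inverting, and the input sequence is invented to mimic a counter input chattering around its threshold rather than taken from any measurement.

```python
def deglitch(l_deglitch, l_count, l_ed_prev):
    """Behavioral model of the de-glitch logic (polarity assumed non-inverting):
    follow the inputs only when both latch outputs agree, otherwise hold."""
    if l_deglitch == l_count:
        return l_count
    return l_ed_prev


def count_rising_edges(samples, initial=0):
    """Count 0 -> 1 transitions, as an edge-triggered counter would."""
    edges, prev = 0, initial
    for s in samples:
        if prev == 0 and s == 1:
            edges += 1
        prev = s
    return edges


# Hypothetical sequence: the de-glitch reference has already completed its
# positive transition (held at 1) while the counter input crosses its
# threshold twice before settling high.
l_deglitch_seq = [1, 1, 1, 1, 1]
l_count_seq = [0, 1, 0, 1, 1]

l_ed = 0
l_ed_seq = []
for l_dg, l_cnt in zip(l_deglitch_seq, l_count_seq):
    l_ed = deglitch(l_dg, l_cnt, l_ed)
    l_ed_seq.append(l_ed)

raw_edges = count_rising_edges(l_count_seq)   # counter driven directly
clean_edges = count_rising_edges(l_ed_seq)    # counter driven through de-glitch
```

In this toy sequence the directly driven counter would see two rising edges, while the de-glitched signal produces exactly one, because the output cannot move backward once the reference stage has resolved.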
Figure 4-8 shows one circuit realization of the de-glitch logic using a four-transistor
stack along with weak regeneration. When both Vode-glitch and Vocount share the same
logic level, the appropriate pull-up or the pull-down network is active. Alternately, if
Figure 4-9 Overall block diagram of efficient and robust phase measurement technique
for an inverter-based GRO
the inputs do not have the same value, the networks have no influence and Led is held
constant with the positive feedback. Since the overlap time of Vode-glitch and Vocount
is shorter than half the oscillator period, the timing requirements for this circuit can
influence not only transistor sizing for the de-glitch logic, but the choice for the overall
number of oscillator stages N as well.
The overall block diagram of the technique applied to a standard inverter-based
GRO is pictured in Figure 4-9. The diagram includes both the efficient measurement
approach with phase differentiation as well as the robust de-glitch circuits just dis-
cussed. With the example of the simple GRO implementation completed, we can now
consider these techniques for use in the more complex multi-path oscillator.
As mentioned before, the key idea of the phase measurement technique is to leverage
the predictable relationship between the GRO state and phase in order to significantly
reduce the complexity of the measurement circuitry. In the case of the serial inverter
ring oscillator, the predictable relationship is established because each inverter must
wait to transition until the preceding stage is close to completing its own transition.
Therefore, both the transitions and phase of an inverter-based ring oscillator must
proceed in a monotonic sequence according to each delay element's location on the
ring.
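As a rough illustration of this monotonic sequence, the following Python sketch enumerates the quantized states of an idealized N-stage inverter ring and builds the state-to-phase lookup that the predictable ordering makes possible. It assumes ideal logic levels with exactly one stage in transition at a time; the function name and the 5-stage example are hypothetical and chosen only to keep the output small.

```python
def inverter_ring_states(n_stages):
    """Enumerate the 2N quantized states of an idealized odd-length inverter ring.

    Stage i is driven by stage i-1 (stage 0 by the last stage). A stage whose
    output equals its input has not yet inverted and is the one in transition.
    """
    if n_stages % 2 == 0:
        raise ValueError("a single-ended inverter ring needs an odd stage count")
    state = [i % 2 for i in range(n_stages)]  # 0,1,0,...,0: only stage 0 unsettled
    sequence = []
    for _ in range(2 * n_stages):
        sequence.append(tuple(state))
        # Exactly one stage is unsettled in this idealized model.
        active = next(i for i in range(n_stages) if state[i] == state[i - 1])
        state[active] ^= 1  # that stage completes its transition
    return sequence, tuple(state)


sequence, wrapped_state = inverter_ring_states(5)

# Because the order is fixed, each logical state maps to one quantized phase.
phase_of_state = {s: k for k, s in enumerate(sequence)}
```

The same enumeration with 47 stages yields 94 distinct states; the difficulty discussed next is that a multi-path oscillator does not follow such a single fixed ordering.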
In contrast to the inverter-based topology, each delay element in the multi-path
oscillator may begin transitioning well before the preceding stage is close to
completing its own transition. In fact, this "anticipation" is the very thing that allows for
Figure 4-10 Simulated transient voltages of the multi-path delay element outputs when
mismatch is included
significant reduction of the effective delay per stage. As a result, we have already
seen in Figure 3-23 how approximately 13 delay elements in the proposed multi-path
oscillator are in transition at any instant.
Recall that the quantized GRO state is encoded with the logical value of the delay
element outputs, and that mapping from the GRO state to phase requires knowing the
exact transition sequence that delay elements will undergo during oscillation. In this
case, the sequence of transitions may be deterministic within a specific realization, but
this sequence is almost impossible to predict, and in addition there is the possibility
Figure 4-11 Logical states of the 47-stage multi-path oscillator for each of the 94
possible quantized phase states (horizontal axis: Quantized GRO Phase State, 0 to 93)
that two transitions could cross their respective logical thresholds in a random order.
With so much ambiguity clearly evident in the GRO state, establishing a predictable
relationship between the transition sequences becomes a primary challenge.
We illustrate the ambiguity in the mapping between the quantized state and phase
for a 47-stage multi-path GRO in Figure 4-11. This mapping is a critical part of
the phase measurement approach, but without being able to predict the transition
sequences in the design flow, it is impossible to hard-wire logical circuits that precisely
calculate the overall GRO phase.
One potential way to solve this issue is to create an algorithm that populates
a dynamic look-up table based on observing the TDC output, but this approach is
cumbersome and inefficient. Alternately, we could simply revert back to counting
each of the delay element outputs independently, but we have already discussed the
associated drawbacks in this case. Fortunately, there is a compromise between having
a single counter and having 2N counters.
Figure 4-12 illustrates the concept of partitioning the entire GRO state into 7
smaller measurement cells. Here we choose enough cells and distribute the cell inputs
so that instead of having multiple ambiguous inputs in the state-to-phase logic, there
is at most one delay element in transition per cell at any given time. The tradeoff in
Figure 4-12 A geometric view of an example multi-path GRO state illustrating (a) the
unpredictable transition sequence considering the entire multi-path oscillator, and (b) a
partitioned approach that re-establishes predictable transition sequences within each of
the 7 independent measurement cells
this approach is the increased power of having one counter for each cell instead of one
counter for the entire GRO. This small penalty is far outweighed by re-establishing
the predictable sequence of states, at least with respect to each individual cell. The
measurement cells can then independently calculate their outputs, which are then
simply summed together in the final step to result in the overall TDC output.
The interesting aspect to this approach is that although we have separated the
entire GRO state into independent groups for purposes of measurement, we have not
altered any of the GRO properties. In fact, as shown in Figure 4-13(a), by simply
rearranging the outputs in a convenient manner, the ambiguity in the overall GRO
phase state has not actually been resolved. Instead, as shown in (b), the phase state
for each cell is now predictable and internally self-consistent. From this perspective,
these 7 cells may be seen as coupled oscillators, although in this case we do not require
all of the cells to be equivalent, nor do we require a conventional pattern for the state
sequence.
For convenience it is simpler to have one measurement cell repeated multiple times;
however, the prime number of GRO stages is more important for stability reasons. In
Figure 4-13 Re-arranging the logical states of the multi-path GRO into groups that
correspond to the 7 measurement cells. (a) charts the ambiguity in the overall GRO
phase state, and (b) charts the predictable phase state for the smaller cells
general, it is possible to have only two kinds of cells, and here we choose to have 6
cells with 7 inputs each, and 1 cell with 5 inputs. The assignment of delay elements
to cells is shown in Table 4.2, and we note that the first pair of inputs for each cell
are separated by 6 stages to use in the de-glitch logic.
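The assignment in Table 4.2 follows a simple pattern that can be reproduced programmatically. The sketch below is inferred from the table entries rather than from any rule stated in the text, and the function and constant names are hypothetical: the first six delay elements become the first inputs of cells 1 to 6, and the remaining elements are dealt out in order across all seven cells.

```python
N_STAGES = 47  # delay elements in the multi-path GRO
N_CELLS = 7    # measurement cells


def assign_cells(n_stages=N_STAGES, n_cells=N_CELLS):
    """Return {cell: [delay element numbers]} matching the Table 4.2 pattern."""
    cells = {c: [] for c in range(1, n_cells + 1)}
    # Elements 1..6 are the first inputs of cells 1..6.
    for element in range(1, n_cells):
        cells[element].append(element)
    # Elements 7..47 are distributed in order across cells 1..7.
    for element in range(n_cells, n_stages + 1):
        cells[(element - n_cells) % n_cells + 1].append(element)
    return cells


cells = assign_cells()
inputs_per_cell = {c: len(v) for c, v in cells.items()}
first_pair_spacing = {c: v[1] - v[0] for c, v in cells.items()}
```

With these assumptions, cells 1 to 6 each receive 7 inputs whose first pair is 6 stages apart, and cell 7 receives the remaining 5 inputs.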
Finally, we now show a system block diagram for the proposed 47-stage multi-path
GRO-TDC in Figure 4-14. Although we have discussed at some length the GRO core
and the measurement cells, a few other digital circuit blocks are also needed within
the TDC. The timing generation block takes a start and stop signal input, generates
the differential Enable signal, and sufficiently buffers Enable in order to drive the
Input    Cell 1   Cell 2   Cell 3   Cell 4   Cell 5   Cell 6   Cell 7
  1         1        2        3        4        5        6       13
  2         7        8        9       10       11       12       20
  3        14       15       16       17       18       19       27
  4        21       22       23       24       25       26       34
  5        28       29       30       31       32       33       41
  6        35       36       37       38       39       40        -
  7        42       43       44       45       46       47        -
Table 4.2 Assignment of delay element outputs to measurement cell inputs (entries are
delay element numbers; row i of column c is cell input c.i)
Figure 4-14 Overall system block diagram for the proposed 47-stage multi-path GRO-TDC
GRO core with modest rise and fall times (and correspondingly modest jitter). In
addition, the timing generation block derives the other clocking signals as required by
the measurement cells. Last, the output adder receives all of the calculated outputs
from each of the measurement cells and sums them to result in the overall GRO-TDC
output.
Although the GRO-TDC can easily accommodate very large range input signals by
adding bits to the counters, the penalty for doing so is an increase in the minimum
length of the disable time, since these counters must fully settle and be sampled
before the oscillator can be enabled again. In addition, the processing time for a
large number of bits can increase the pipeline delay of the measurement cells and
output adder significantly if very high-speed operation also must be supported. Low
pipeline delay is important in many closed-loop applications because it can pose an
upper limit on the loop bandwidth. Therefore, design of the overall TDC must trade
the parameters of maximum sampling rate, maximum range, and acceptable pipelined
delay against each other to find an appropriate balance.
For our prototype demonstrations, we chose to implement two versions of the
GRO-TDC, with specifications determined by the system applications. The multi-
path GRO core is common and, assuming a minimum resolution design takes first
priority, can be used for a very wide range of applications. The first TDC is a general
purpose 11-bit, 100Msps version that can typically be used for systems comparing to
crystal references such as PLL and multiplying DLL [21,25]. The second version is
an 8-bit, 500Msps TDC that can be used in high-speed timing applications such as
CDR. These various applications will briefly be discussed later in Chapter 6.
Chapter 5
At this point, the concept of a gated ring oscillator TDC has been introduced, and
a number of design considerations have been discussed that relate to overall per-
formance, for example raw resolution, gating skew error, measurement precision,
efficiency, and range vs. sampling rate. To demonstrate how these considerations
relate to a practical implementation, a total of three GRO-TDC were designed and
fabricated in 0.13µm CMOS technology.
The first GRO-TDC is based on a simple 15-stage inverter-based oscillator core,
and has a 10-bit measurement range using only counters. The second and third
GRO-TDC are based on the same multi-path GRO that is described in Section 3.2.2, and
both use the efficient readout techniques described in Section 4.3. The difference between
these two GRO-TDC, then, is the range and maximum operating frequency, with an 8-bit,
500Msps part and an 11-bit, 100Msps version. A single microphotograph depicting an
11-bit GRO-TDC 1.0mm x 1.0mm die is shown in Figure 5-1, which is nearly identical
to the other die in terms of visible markings.
In this chapter, we first describe the requirements and proposed approach for the
measurement setup. Next, we present measurement results for the inverter-based
GRO-TDC, including the non-linear effects of the gating skew error for this imple-
mentation such as corrupted noise shaping and deadzones. Then, measurements of
the 11-bit multi-path GRO-TDC are shown, which verify the inherent noise-shaping
capability of the GRO-TDC architecture. Finally, we conclude this section with a
Figure 5-2 A method to create a low-noise input signal for the GRO-TDC testing
modulating the power supply of an off-chip buffer, which is suitable for measuring
the GRO-TDC noise performance with relatively small input signals. This signal
generation capability is designed onto a gold-plated FR-4 circuit board, which also
provides power supplies, decoupling capacitors, and a substrate for direct-bonding of
the GRO-TDC chips.
Due to the limitations just discussed for generating large input signals, we choose
to use two synchronized signal generators to verify large-signal performance across
the full range of the GRO-TDC. In this setup, the frequency and phase of the first
signal is held constant, and the second signal generator can be phase modulated to
create a time difference that fully spans the GRO-TDC range. Again, the quality
of the input in terms of both noise and linearity using this approach is quite poor;
however, the measurement does establish a full-scale signal level for the TDC.
[Figure: two panels. Panel labels: 65,536 pt. FFT, Hanning window; Raw TDC Output.
Axes: Frequency (Hz) for (a), Time (µs) for (b).]
and the minimum sampling period is 4nsec as required for timing control. For all
reported measurements, the nominal sampling rate is 50MHz.
While the measurement shown in Figure 5-3 does not achieve a low noise floor,
it does represent a fairly linear behavior compared to other measurement scenarios.
There are many other inputs that can be applied to the inverter-based GRO-TDC
that do not result in adequate scrambling of the GRO phase, and the resulting non-
[Figure: 65,536 pt. FFT, Hanning window. Horizontal axis: Frequency (Hz).]
Figure 5-5 A measured DC transfer characteristic for the inverter-based GRO-TDC that
demonstrates the presence of deadzones. (a) indicates the deadzone behavior for integer
TDC outputs, and (b) shows the potential for small deadzones at non-integer output
values. (Horizontal axis: Variable Delay Control Input)
linearity can be clearly seen in the TDC output. For example, in Figure 5-4, an
input that is synchronously modulated at 12.5MHz is applied to the GRO-TDC, and
the FFT spectrum clearly shows first-order noise-shaping. However, the figure also reveals
non-linear behavior since there is no observable noise floor in the TDC output.
The issue of deadzones in the DC transfer characteristic as a result of gating skew
non-linearity was theoretically discussed earlier in Section 3.1.4, and this behavior
can also be seen experimentally in the inverter-based GRO-TDC. To generate a DC
signal for the GRO-TDC, the variable testing delay is controlled with a digital-to-
analog converter, and in this particular measurement setup the overall tuning system
is quite non-linear. Nevertheless, a DC transfer characteristic of the inverter-based
GRO-TDC is plotted in Figure 5-5. The deadzone behavior at the integer boundaries
is clearly evident in (a), with larger deadzone widths for the even TDC outputs as
predicted from simulations. Closer examination of the curve in (b) reveals that much
smaller deadzones are possible for some non-integer TDC outputs as well.
A 1.5V supply is used in general for measurements, and functional operation was
verified from 1.0-1.6V. As shown in Figure 5-6, the raw delay per stage of the GRO is
a strong function of the power supply, and has a nominal value of 6ps at 1.5V. Also
as expected, the power consumption of the GRO-TDC is measured to be a linear
function of the width of the input signal. At 50Msps, the minimum power is 2.2mW
for a very small input, and the maximum is 21mW for full-scale.
The measured multi-path delay of 6ps represents an improvement factor of over
5 compared to an inverter-based GRO-TDC delay of 30-35ps under the same voltage
supply and operating conditions. This result verifies the significant benefit in raw
resolution that multi-path oscillators can offer for TDC applications. Recall that in
Section 3.2.1, we defined Jeff to be the product of a weighted sum of multi-path
Figure 5-6 Measured delay per stage for the multi-path GRO vs. power supply voltage
connections with an efficiency factor, η. Now with the measured results in place, we
can calculate Jeff of the multi-path design to be 5, and using the values of J and W
given in Table 3.1, we find that

\begin{align}
J_{\mathrm{eff}} &= \eta \sum_{k=1}^{K} w_k \, j_k, \tag{5.1} \\
5 &= \eta \, (13 \cdot 0.25 + 11 \cdot 0.16 + 9 \cdot 0.24 + 5 \cdot 0.24 + 1 \cdot 0.12), \tag{5.2} \\
\eta &= 0.60. \tag{5.3}
\end{align}

Therefore, while this particular design has lost a small amount of efficiency due to
implementation parasitics compared to the improvement that might be expected from
unextracted simulations, the speed benefit compared to the inverter-based implemen-
tation is still very significant.
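The arithmetic in (5.2) and (5.3) can be checked with a short Python sketch. The pairing of each stage offset with its weight is read directly from (5.2); the variable names are hypothetical, and Jeff is taken as exactly 5 even though the text describes the measured improvement as "over 5".

```python
# Stage offsets j_k and weights w_k as they appear in (5.2).
j = [13, 11, 9, 5, 1]
w = [0.25, 0.16, 0.24, 0.24, 0.12]

j_eff_measured = 5.0  # assumption: taken as exactly 5

# Weighted sum of multi-path connections (the ideal, lossless Jeff).
weighted_sum = sum(wk * jk for wk, jk in zip(w, j))

# Efficiency factor implied by the measured improvement.
eta = j_eff_measured / weighted_sum
```

With these rounded inputs the weighted sum is about 8.49 and the efficiency comes out near 0.59, which is consistent with the quoted 0.60 given that the measured Jeff is slightly above 5.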
A typical method to measure efficiency for converters is by the standard figure of
merit, P/(Fs · 2^ENOB). Although it is difficult to calculate the effective number of bits
for the TDC in a manner comparable to a classical ADC, as an alternative we can
use an efficiency figure of merit defined by

\begin{align}
\mathrm{FOM} &= \frac{\text{Power}}{(\text{Sampling Rate})(\text{Conversion levels})} \tag{5.4} \\
&= \frac{21 \times 10^{-3}}{(50 \times 10^{6})(2^{11})} \tag{5.5} \\
&= 0.2\,\mathrm{pJ/step}. \tag{5.6}
\end{align}

The GRO-TDC compares favorably with other TDC in this metric, yet there is a
fundamental flaw in this FOM because it does not appropriately factor the TDC res-
olution. For example, a very large range can easily be achieved by using a cyclic TDC,
but this does not imply anything about the minimum detectable signal. Therefore,
we now move on to demonstrate the strength of the GRO-TDC, which of course is
the ability to achieve first-order noise-shaping.
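For reference, the 0.2pJ/step figure quoted above follows directly from the stated power, sampling rate, and range. A minimal Python check, with a hypothetical function name, is:

```python
def tdc_fom_pj_per_step(power_w, sample_rate_hz, n_bits):
    """Power / (sampling rate * conversion levels), expressed in pJ per step."""
    conversion_levels = 2 ** n_bits
    return power_w / (sample_rate_hz * conversion_levels) * 1e12


# Full-scale power of 21mW at 50Msps with an 11-bit range, as quoted in the text.
fom_full_scale = tdc_fom_pj_per_step(21e-3, 50e6, 11)
```

Note that this metric uses the number of conversion levels only, which is why it says nothing about the minimum detectable signal.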
While the improved raw resolution is an important benefit of the multi-path oscillator,
recall that a fundamental design goal is to linearize the noise-shaping performance by
significantly reducing the gating skew. To examine whether this is accomplished with
the prototype GRO-TDC, we can apply very small input signals using the modulation
techniques described in the previous section. After collecting data in this way we can
examine the TDC output in both the time and frequency domain, and also in the DC
transfer characteristic to look for the presence of any non-linear deadzone behavior.
Figure 5-7 shows both the frequency and time domain GRO-TDC 50-Msps
output with a 26kHz input of 1.2 pspp in addition to a DC level of about 1.6ns. In
(a), the 65,536 point FFT is performed with a Hanning window on 20 sequential
collects before being averaged to result in the double-sided power spectral density as
shown. Noise-shaping of more than 20dB is clearly evident, with 1/f noise appearing
to dominate at low frequencies. The wide, shaded horizontal line in Figure 5-7 shows
that the low frequency power spectral density of the GRO-TDC output is comparable
to what ideally would be produced by a 50Msps classical quantizer (i.e. no noise
shaping) with 1ps steps.
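The reference level for such an ideal quantizer can be estimated with the usual uniform-quantization model, in which the error power is step²/12 and is spread evenly over the sampling bandwidth. The sketch below applies that textbook model; it is an assumption about how the comparison line could be derived, not a statement of how the figure was actually generated, and the function name is hypothetical.

```python
import math


def classical_quantizer_psd_db(step_ps, sample_rate_hz):
    """Double-sided PSD in dB(ps^2/Hz) of white quantization noise."""
    noise_power_ps2 = step_ps ** 2 / 12.0          # uniform error over one step
    psd_ps2_per_hz = noise_power_ps2 / sample_rate_hz  # spread over -fs/2..fs/2
    return 10.0 * math.log10(psd_ps2_per_hz)


# 1ps steps sampled at 50Msps, as in the comparison above.
psd_1ps_db = classical_quantizer_psd_db(1.0, 50e6)
```

Under these assumptions the level is close to -88dBps²/Hz.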
By looking at the time domain output after digitally filtering with a 1MHz band-
width in Figure 5-7(b), the GRO-TDC is clearly able to resolve a 1.2 pspp signal,
Figure 5-7 Measured GRO-TDC output for a 1.2pspp, 26kHz input signal. (a) plots the
signal and power spectral density in the frequency domain, and (b) is a transient view of
the output after digital low-pass filtering with a 1MHz bandwidth (panel (a): 65,536 pt.
FFT, Hanning window, 20x averaging; axes: Frequency (Hz) and Time (µs))
whereas a classical quantizer with 1ps resolution would struggle due to the lack of
quantization noise scrambling. In fact, the integrated noise of the GRO-TDC from
2kHz to 1MHz is below 80fs rms, which includes the noise of both the GRO and the
off-chip buffer delay.
When considering how Tskew affects the noise-shaping of the multi-path oscillator,
recall that our hypothesis from earlier was that if Tskew could be reduced below the
level of physical random processes, then it would be scrambled and contribute a
negligible amount of error to the overall TDC output. To conservatively estimate the
overall GRO-TDC jitter due to random physical processes, we approximate the
thermal noise floor of the GRO-TDC by taking the minimum PSD value of -88dBps²/Hz
from Figure 5-7(a). This implies that the rms jitter for the entire TDC bandwidth
due to thermal noise alone is about 281fs rms. By comparison, the maximum simulated
peak-to-peak error of Tskew/Tq from Figure 3-27 is less than 0.025. For Tq = 6ps, the
gating skew error in units of time is 150fs pp (107fs rms).
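The two numbers being compared here can be reproduced with a short Python sketch. It assumes that the -88dBps²/Hz floor is a double-sided density integrated over the full 50MHz sampling bandwidth, which is the interpretation that matches the quoted 281fs; the variable names are hypothetical.

```python
import math

psd_floor_db = -88.0      # dB(ps^2/Hz), minimum PSD read from Figure 5-7(a)
bandwidth_hz = 50e6       # assumption: double-sided PSD over the 50Msps bandwidth

# Integrate the white floor over the bandwidth and convert ps -> fs.
thermal_var_ps2 = 10 ** (psd_floor_db / 10.0) * bandwidth_hz
thermal_rms_fs = math.sqrt(thermal_var_ps2) * 1e3

# Simulated peak-to-peak gating skew, normalized to the stage delay Tq.
skew_over_tq_pp = 0.025
tq_ps = 6.0
skew_pp_fs = skew_over_tq_pp * tq_ps * 1e3
```

With these inputs the thermal jitter is roughly 281fs rms and the peak-to-peak gating skew is 150fs, so the skew sits below the random noise level as the hypothesis requires.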
Although the simulated gating skew error for the multi-path GRO-TDC is below
its thermal noise floor, which should inherently scramble the GRO phase with ad-
equate magnitude to linearize the performance, we do observe a small deadzone in
Figure 5-8 A measured DC transfer characteristic for the multi-path GRO-TDC that
demonstrates (a) the presence of small deadzones for TDC outputs at 2NK, and (b)
linear behavior for integer TDC outputs (horizontal axis: Variable Delay Control Input)
the multi-path GRO for the special case when the input time is close to an integer
multiple of the GRO period, Tin = K·Tosc, where K is an integer. This result is
shown in Figure 5-8(a). Recall in Section 3.1.4 that the most sensitive location for
deadzone behavior is when the GRO is stopped on the exact same transition for each
measurement, since this is similar to injection-locking the GRO with the TDC sam-
pling frequency. Therefore, we can hypothesize that while the contribution of Tskew
with a period of 2Tq has been reduced dramatically compared to the inverter-based
architecture, the mismatch between delay elements is now the dominant source of
Tskew, and this error is periodic with Tosc = 2N·Tq.
For the multi-path GRO, no deadzones are evident for GRO-TDC outputs other
than at 2NK (e.g. Figure 5-8(b)), and the size of the worst-case deadzone is only
1. ps. Assuming that the size of this deadzone corresponds with the peak-to-peak
error of Tskew for the entire GRO phase state, and also assuming that this error is
typically scrambled, we can expect that the GRO-TDC output noise will generally be
dominated by 1/f and quantization noise as shown in Figure 5-7. Therefore, we can
conclude that the multi-path GRO has significantly linearized the converter perfor-
mance compared to the inverter-based GRO topology. Compared to the inverter-
based GRO that demonstrated deadzones even for fractional outputs, in a system
Figure 5-9 Raw measured GRO-TDC output for a 26kHz input signal with an amplitude
near full-scale (horizontal axis: Time (µs))
application avoiding the small range of GRO-TDC outputs that correspond with
2KN is relatively straightforward.
To illustrate the full 11-bit operation of the GRO-TDC, Figure 5-9 plots raw
output data from the chip when a 26kHz input is applied with amplitude near full-
scale. A frequency domain plot is not given in this case due to input signal quality
as described in the previous section. Nonetheless, with a full-scale of 11-bits, the
dynamic range in a 1MHz bandwidth is calculated to be 95dB, or an equivalent
range of 15.5-bits. The TDC efficiency from earlier can now be calculated in
the 1MHz bandwidth to be 0.23pJ/step, which is almost identical to the efficiency
calculated with full bandwidth due to the GRO quantization noise-shaping.
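One way to arrive at numbers close to those quoted is sketched below in Python. The derivation is an assumption rather than the documented method: full scale is taken as 2^11 steps of 6ps, the signal is taken as a full-scale sinusoid, the equivalent number of bits uses the common (DR - 1.76)/6.02 conversion, and the in-band efficiency replaces the sampling rate with twice the 1MHz bandwidth. All variable names are hypothetical.

```python
import math

n_bits = 11
tq_s = 6e-12            # raw delay per stage
noise_rms_s = 80e-15    # integrated noise, 2kHz to 1MHz
bandwidth_hz = 1e6
power_w = 21e-3         # full-scale power

# Assumed full-scale sinusoid spanning the whole 11-bit range.
full_scale_pp_s = (2 ** n_bits) * tq_s
signal_rms_s = full_scale_pp_s / (2.0 * math.sqrt(2.0))

dynamic_range_db = 20.0 * math.log10(signal_rms_s / noise_rms_s)
equivalent_bits = (dynamic_range_db - 1.76) / 6.02

# Assumed in-band figure of merit: Nyquist rate of the 1MHz band, 2^bits levels.
fom_inband_pj = power_w / (2.0 * bandwidth_hz * 2 ** equivalent_bits) * 1e12
```

Under these assumptions the dynamic range is about 94.7dB, the equivalent range about 15.4 bits, and the in-band efficiency roughly 0.24pJ/step, in line with the rounded values of 95dB, 15.5 bits, and 0.23pJ/step given in the text.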
A summary of the 11-bit GRO-TDC performance is shown in Table 5.1. Further
demonstration of the measured TDC performance can be seen in Chapter 6, where
the GRO-TDC has been proven in a number of system applications.
Specification                   Value
Maximum Sampling Frequency      100 MHz
Range                           11-bits
Raw delay resolution            6ps
Effective resolution            1ps @ 50Msps
Integrated noise                80fs, 2kHz-1MHz
Dynamic range                   95dB
Power                           2.2-21mW (1.5V)
Efficiency                      0.2pJ/step
GRO-TDC Area                    157µm x 258µm
Total Chip Area                 1.0mm x 1.0mm
Technology                      0.13µm IBM CMOS
5.4 Discussion
Table 5.2 compares the prototype GRO-TDC to other reported CMOS TDC. Although
we notice that there are different examples of TDC with comparable performance
in any given metric, the GRO-TDC achieves state-of-the-art performance in
all areas, with no calibration of differential non-linearity required.
The drawbacks to the GRO-TDC are similar in nature to issues that many TDC
architectures face. For example, a large delay variation across power supply is an
issue that is inherently related to the use of digital circuit elements as time references.
While the TDC gain can often be calibrated at the system level, dynamic issues
such as power supply coupling can be harder to eliminate, causing possible issues
such as spurs in a digital PLL. Additionally, an issue that the GRO-TDC shares with
cyclic converters is the linear relationship between power consumption and the input
signal. This strong correlation can cause non-linearities at the system level.
The one drawback that is most unique to the gated ring oscillator architecture,
the gating skew error from stopping and starting the oscillator, can be a real and
significant contribution of error for some GRO implementations. However, we have
also shown that these errors can be practically mitigated by proper design and imple-
mentation of a multi-path oscillator. The multi-path techniques outlined in this work
have not only improved the effective resolution by a factor of 5 compared to classic
inverter rings, but also have reduced the gating skew errors to a level comparable
to that of random physical processes, which significantly limits their contribution to
the overall TDC error. To our knowledge, this work for the first time practically
demonstrates a noise-shaping time-to-digital converter with the ability to accurately
transfer error across a gap of inactivity from one measurement to the next.
Because of the very high resolution that is possible with the GRO-TDC, the ap-
plications that will significantly benefit from this technology are likely to be the most
demanding in terms of performance. In the chapter to follow, we discuss a few of
these applications, which demonstrate lower noise, reduced spurious content, higher
bandwidths, etc. than would otherwise be possible as a result of the GRO-TDC
performance. The fundamental architecture of the GRO-TDC is compact, efficient, and
simple, and therefore can be easily adapted to many other less demanding applica-
tions as well, especially if techniques are used to trade resolution for power. Finally,
we anticipate that as TDC become more widely adopted in integrated systems, the use
of digital, high-performance TDC such as the GRO will become more sophisticated,
and perhaps lead to the enabling of system architectures that would not have been
practical with earlier technology.
Chapter 6
background. Therefore, we show in Figure 6-2 a model for the fractional-N digital
PLL that includes noise contributions from the TDC, VCO, and ΣΔ quantization.
Although the GRO-TDC quantization noise has been shown to be first-order shaped,
we depict a white PSD for simplicity here corresponding to thermal noise limitations.
Note that in this model the TDC replaces the analog phase and frequency detector
(PFD) and charge pump, which means that its noise performance will similarly be low-
pass filtered in the PLL output phase noise according to the PLL closed-loop transfer
function G(f). In fact, this low-pass response is clearly visible when the model is
Figure 6-3 Transfer functions for the three primary contributions to the digital PLL
phase noise
expanded in Figure 6-3 to consider how each of these three noise sources contributes
to the output phase noise. Based on the figure, we have that the contribution to the
PLL output phase noise from the TDC is
To provide a simple example of how typical TDC resolution will map into PLL
phase noise, we now consider a delay-chain TDC with resolution of 20ps. In Figure 6-4(a),
we can see that for a 50kHz PLL bandwidth with typical VCO and ΣΔ noise
parameters, the TDC quantization noise does not contribute significantly to the phase
noise at any offset frequency. However, when a larger loop bandwidth of 500kHz is
desired, the TDC noise will dominate the output phase noise for offset frequencies
up to 2MHz. In addition, the ΣΔ quantization noise becomes the other source of
significant noise in the system, which is not acceptable. Increased loop bandwidth
is desirable for locking time, in-loop modulation, etc., and we see that this requires
both a high-resolution TDC as well as ΣΔ quantization noise suppression.
Figure 6-4 Calculated phase noise of a 3.6GHz fractional-N digital PLL using an inverter-
based TDC with 20ps resolution and assuming (a) a 50kHz loop bandwidth output, and
(b) a 500kHz loop bandwidth
Figure 6-5 A fractional-N digital PLL using the GRO-TDC and quantization noise can-
cellation
A conceptual block diagram of a digital PLL using the GRO-TDC and quantization
noise cancellation is shown in Figure 6-5 [25]. In this case, the high resolution
of the GRO-TDC allows the quantization error from the ΣΔ division to be accurately
subtracted in the digital domain. Accomplishing this compensation digitally
is quite simple to implement [25], and eliminates the problems with mismatch that
plague analog implementations. As a result of the ΣΔ noise suppression and the
improved resolution of the GRO-TDC, we can see in Figure 6-6 the much improved
phase noise despite the large 500kHz PLL bandwidth.
Figure 6-6 Calculated phase noise of a 3.6GHz fractional-N digital PLL using the pro-
totype GRO-TDC
To substantiate these calculations, we can refer to a custom digital PLL that was
implemented using the GRO-TDC in a 0.13µm CMOS process [25]. The fully integrated
1.4mm × 1.4mm chip has an active area of 0.95mm², including an on-chip VCO,
the GRO-TDC, and digital circuitry. Current consumption is 26mA from a 1.5V
supply, excluding the VCO output buffer that consumes 7mA from a 1.1V supply.
Figure 6-7 shows the measured phase noise at 3.67GHz from an Agilent Signal Source
Analyzer E5052A, where the results are shown with and without cancellation of the
quantization noise. As the figure reveals, greater than 15dB of noise cancellation is
achieved such that out-of-band noise is dominated by the VCO. With noise cancellation
enabled, the in-band noise is -108dBc/Hz at a 400kHz offset, and out-of-band
noise is -132dBc/Hz and -150dBc/Hz at 3MHz and 20MHz offsets, respectively. In
particular, the very low in-band phase noise verifies the very high resolution of the
GRO-TDC achieved through noise-shaping.
To examine how the 1/f phase noise below 10kHz offset frequencies in this measurement
can be compared to the GRO-TDC chip measurements, we first convert
from the power spectral density shown in Figure 5-7 to a TDC quantization noise,
$S_q(e^{j2\pi fT})$, by multiplying by $2 \cdot 10^{-24}/T$, which accounts for the double-sided to
Figure 6-7 Measured output phase noise from the prototype 3.6GHz fractional-N digital
PLL using the GRO-TDC
single-sided spectral densities, the TDC sampling rate, and the unit change from
picoseconds to seconds. For example, at 10kHz offset, the GRO-TDC PSD from
Figure 5-7 is approximately -79dBps²/Hz, which in this case means that $S_q \approx -239$dBs.
When this value of $S_q$ is substituted into Equation 6.1, we find that
This calculated value is about 1dB lower than the digital PLL noise seen in
Figure 6-7, which can likely be attributed to 1/f noise added from other PLL circuits in the
signal path. As we will see in the next section, we can expect any increase in
delay or TDC measurement offset to result in additional noise.
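The unit conversion described above can be checked with a few lines of arithmetic. The sketch below is only illustrative: the function name is hypothetical, and it assumes that T is the period of the 50Msps TDC sampling rate quoted in the abstract, which is the value that reproduces the -239dB figure in the text.

```python
import math

def tdc_psd_to_sq_db(psd_db_ps2_per_hz, tdc_sample_rate_hz):
    """Scale a TDC PSD given in dB(ps^2/Hz) by 2e-24/T and return the result in dB.

    The factor 2e-24/T is the one quoted in the text; T is assumed here to be
    the TDC sampling period (hypothetical helper, not from the thesis).
    """
    sample_period_s = 1.0 / tdc_sample_rate_hz
    scale = 2.0 * 1e-24 / sample_period_s
    return psd_db_ps2_per_hz + 10.0 * math.log10(scale)

# -79 dB(ps^2/Hz) at 10 kHz offset, with an assumed 50 MHz TDC sampling rate
sq_db = tdc_psd_to_sq_db(-79.0, 50e6)
print(round(sq_db, 1))
```

With these inputs the scale factor is 10^-16, or -160dB, so the result agrees with the -239dB value quoted above.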
Note that the GRO-TDC used in this high-performance digital PLL requires no
calibration of TDC differential non-linearity, and does not receive any special treat-
ment at the system level to avoid deadzones or limit-cycle behavior. In addition, the
reported phase noise results are robust, repeatable, and consistent over time, which
proves the robust implementation of the GRO and the employed phase measurement
techniques.
Finally, the reference spur was measured with an Agilent Spectrum Analyzer
8595E to be -65dBc, and fractional spurs were tested from 3.620 GHz to 3.670 GHz.
The worst case spurs are -42dBc at carrier frequencies of 3.6496 and 3.6504GHz, and
typical spurs were measured below -64dBc. Reduction of fractional spurs is an ongoing
research area for both analog and digital PLLs [74], since achieving
excellent spectral purity is an important consideration for fully-integrated synthesiz-
ers. Although the fundamental issues of crosstalk and power supply coupling can be
improved through careful layout and design, in the future we may expect significant
improvement in this area from novel system architecture that can take advantage of
either the high-performance or digital nature of converters such as the GRO-TDC.
Figure 6-8 The relationship between the magnitude of the TDC input and the random
measurement error due to thermal and 1/f noise. (a) depicts the TDC input / output
transfer characteristic, and (b) generally relates the statistical measurement jitter to the
TDC input
ble output frequencies, all with very high accuracy and low-noise, from a single system
clock. Although typical timing reference frequencies are comparable or below crystal
oscillator frequencies in the wireless communication industry, the normalized phase
noise performance for these applications can often be 30dB lower than standards such
as Bluetooth, or even GSM. Another primary difference between these applications
is that for synchronization of a frequency reference, we are primarily concerned with
adjusting the output frequency to compensate for slow drift due to temperature and
other environmental changes. Therefore, a very low loop bandwidth of 10-100Hz is
needed, which can be leveraged (through Equation 6.1) to reduce the impact of TDC
noise significantly.
Despite the very low loop bandwidths that are permissible in this application,
the GRO-TDC 1/f noise will still make a non-zero contribution to the output phase
noise. To consider how this contribution can be minimized, we see in Figure 6-8(a)
that a large DC value for the TDC input will result in increased uncertainty in the
TDC output due to the accumulating jitter of the TDC delay elements. Another
way to view the same issue, as presented in [20] and shown in Figure 6-8(b), is to plot the
Figure 6-9 Concept behind the proposed fractional / integer synthesizer that minimizes
the length of time input into the GRO-TDC
jitter of a measurement output vs. the length of the measurement interval with a
log-log scale. In either case, the conclusion is the same in that a smaller average TDC
input will reduce its overall noise contribution (Note that this conclusion describes
a fundamental issue of time uncertainty, and is equally valid for digital as well as
analog PLL).
In a fractional-N digital PLL, the TDC offset must be set large enough to accommodate
many periods of the VCO, since the divider value is dithered within a range
of 4-8, depending on the order of ΣΔ modulation. This large offset, shown on the
top of Figure 6-9, does introduce additional noise. However, in many communication
systems this issue is not of concern for two reasons. First, with an output frequency
typically larger than 1-2GHz, even a TDC offset of 10 VCO periods represents a relatively
small length of time. Second, typical wireless communication requirements for
in-band noise are not high, as evidenced by the use of low-cost, relatively high-noise
crystal oscillators (high-noise when compared to the requirements for the current
application).
For synchronizing a 100MHz crystal to a timing reference with even lower fre-
quency with the lowest possible noise, it is clear that a classical fractional-N architec-
ture is non-optimal. To achieve a much smaller average TDC input, as shown on the
bottom of Figure 6-9, we instead propose a fractional / integer PLL architecture. In
this architecture a fractional divider is implemented by first multiplying the 100MHz
crystal output with a fractional-N digital PLL, and then following with an integer
division. Although there are two GRO-TDC in the signal path, which may intuitively
imply a larger noise, the sum of TDC input widths here is much smaller than in a
classic fractional-N topology. Specifically, the fractional-N GRO-TDC sees an average
input of less than 2ns, and the primary loop phase error can be maintained at a very
small value because the frequency of the feedback signal is synchronized to be equal
to the reference frequency.
Figure 6-11 Measured 100MHz phase noise of the prototype fractional / integer syn-
thesizer
limit the ability to arbitrarily define the loop dynamics, issues such as settling time
and loop bandwidth are somewhat flexible in this application.
A 98MHz, fixed-frequency, temperature-regulated, quartz crystal oscillator is used
as the timing reference frequency, and a tunable 100MHz oscillator with the same
characteristics is used for the output frequency. The tuning gain of the 100MHz
oscillator, $K_v$, is about 500Hz/V. Both crystal oscillators are manufactured and
provided by Frequency Electronics, Inc.
Measured phase noise performance for both the 98MHz timing reference as well
as the 100MHz synchronized output are shown in Figure 6-11. The overall PLL has
Type-II dynamics, and has a loop bandwidth for this measurement set approximately
to 10Hz. Although the entire loop filter can be implemented in discrete-time, due to
the more relaxed size and cost requirements there is little penalty in this application
for including a coarse RC analog filter with a pole at about 100Hz to attenuate spurs
and noise from the FPGA and DAC. As seen in the measured data, all noise outside
the frequency range of 10-300Hz is limited by the crystal oscillators, and the peak
deviation from the crystal noise is about 10dB at frequencies of 50-100Hz.
Figure 6-12 Concept of a multiplying delay-locked loop
The PLL noise performance demonstrated in this prototype for very low-offset
frequencies is competitive with all-analog implementations, yet the result is obtained
while maintaining the versatility and portability of an all-digital PLL. The proposed
fractional / integer architecture is easily adaptable to accommodate different frequency
plans for both the reference as well as the radio frequency. Finally, while the frac-
tional / integer prototype underutilizes the 3.6GHz on-chip output, if fractional-N
multiplication is required at the application level, a hybrid approach using a variety
of off-chip oscillators can also be considered.
running ring oscillator VCO with a reference frequency edge, where N corresponds
to the frequency multiplication factor. This has been shown to allow significant
suppression of jitter caused by phase noise of the VCO [2]. However, as shown in
the figure, an incorrect setting of the $V_{tune}$ voltage on the VCO (which tunes its
corresponding frequency) leads to substantial undesired "deterministic jitter" due to
corresponding periodic changes in the output period [1-2, 4-6].
Because elimination of this deterministic jitter is quite challenging in the analog
domain due to mismatch [4,5], an alternate approach is proposed in [21] that uses the
GRO-TDC to measure and compensate for this error. With the approach illustrated
in Figure 6-13, only one signal is examined, Enable, whose pulse width alternates
twice every reference cycle between the free running period of the oscillator, T, and
the period of the error-affected cycle, T + Δ. By doing a relative comparison of each
consecutive pulse period of the Enable signal, the value of Δ can be obtained in a
manner such that the issue of mismatch is greatly mitigated since only one signal is
being examined.
The overall MDLL prototype, which is shown in simplified form in Figure 6-14,
consists of two integrated chips (a GRO-TDC chip and another with the MDLL
core logic); an FPGA board that implements the correlator, accumulator, a first-order
digital ΣΔ modulator, and other basic logic operations; an off-chip, low-noise,
100MHz reference source; and a commercially available 16-bit DAC. While 16 bits
are available for the DAC, only 8 bits are used in conjunction with a first-order ΣΔ
modulator. Notice that again in this architecture, the key elements are the GRO-TDC,
a custom oscillator, a DAC, and some digital logic, which highlights how a
high-performance TDC can be leveraged for multiple applications by adding a small
number of new components.
Figure 6-15 Measured -58dBc spurious performance from the MDLL prototype
Figure 6-16 Measured MDLL phase noise at 1.6GHz output frequency
As the measurement of the MDLL output with an HP8595E spectrum analyzer reveals
in Figure 6-15, the reference spur of the MDLL prototype is -58.3dBc. From
this number the deterministic jitter is reported to be 760fspp, which validates the
proposed technique's ability to achieve sub-picosecond deterministic jitter. As an additional
measure of the performance, the phase noise of the MDLL output is shown in
Figure 6-16. The random jitter can be estimated by integrating the measured phase
noise from 1kHz to 40MHz, and is reported to be 679fsrms.
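As a sanity check on the spur-to-jitter conversion, the sketch below uses the relation DJ_pp ≈ T_out · 10^(spur/20). This relation is our assumption and is not stated in the text; it is simply the one that reproduces the reported number when the 1.6GHz output frequency of Figure 6-16 is used. The function name is hypothetical.

```python
def deterministic_jitter_pp(spur_dbc, output_freq_hz):
    """Estimate peak-to-peak deterministic jitter from a reference spur level.

    Assumed relation (not taken from the thesis): DJ_pp = T_out * 10**(spur/20).
    """
    output_period_s = 1.0 / output_freq_hz
    return output_period_s * 10.0 ** (spur_dbc / 20.0)

# -58.3 dBc reference spur at an assumed 1.6 GHz MDLL output frequency
dj = deterministic_jitter_pp(-58.3, 1.6e9)
print(f"{dj * 1e15:.0f} fs peak-to-peak")
```

Under this assumption the estimate lands within about 1fs of the reported 760fspp.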
as well (e.g. pulse injection-locked oscillators [22]), and the digital processing can
easily be modified to compensate for other spectral content as well. In fact, given
the high-resolution TDC now available as a tool for designers, this general technique
appears to be very promising for a wide variety of future system architectures.
Chapter 7
Background on VCO-based
quantizers
oscillator with a constant Vtune input. The key point here is that the truncation error
q[k] at the end of a clock period boundary is not lost, but rather it is accounted for in
the following measurement. The accumulation of phase error from sample to sample
is then maintained to within a single quantization level, which leads to a time-varying
output even with a constant input. This is shown in the figure by the extra count in
the third sample of the sequence [3 3 4 3]. Examination of the quantization error
signal, Error, in the figure reveals that it takes the form
$$\mathrm{Error}[k] = q[k] - q[k-1], \qquad (7.1)$$
where q[k] corresponds to the truncation error that occurs at the edge of each clock
period boundary. Under the assumption that q[k] is white in its noise profile, Equa-
tion 7.1 reveals that the overall quantization error is first-order noise-shaped.
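The behavior described above can be reproduced with a small behavioral sketch. It is not the circuit itself: the function name, the choice of 3.25 VCO cycles per clock period, and the initial phase of 0.25 cycles are hypothetical values picked so that the output matches the [3 3 4 3] sequence mentioned in the text. The truncation error is defined here as the counted phase minus the true phase.

```python
import math

def vco_counter_quantizer(cycles_per_sample, num_samples, initial_phase=0.0):
    """Count whole VCO cycles per clock period without resetting the phase.

    Returns (counts, q), where q[k] = floor(phase[k]) - phase[k] is the
    truncation error at the k-th clock edge, expressed in VCO cycles.
    """
    phase = initial_phase
    prev_whole = math.floor(phase)
    counts, q = [], []
    for _ in range(num_samples):
        phase += cycles_per_sample         # constant Vtune -> constant frequency
        whole = math.floor(phase)
        counts.append(whole - prev_whole)  # edges seen during this clock period
        q.append(whole - phase)            # truncation error carried forward
        prev_whole = whole
    return counts, q

counts, q = vco_counter_quantizer(3.25, 4, initial_phase=0.25)
errors = [c - 3.25 for c in counts]                 # output minus ideal value
q_prev = [math.floor(0.25) - 0.25] + q[:-1]         # q[k-1], including the start
first_difference = [a - b for a, b in zip(q, q_prev)]
print(counts)                      # [3, 3, 4, 3]
print(errors == first_difference)  # True
```

In this sketch the per-sample error equals the first difference of the truncation error, which is the time-domain signature of first-order noise shaping.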
The oscillator-based ADC of Figure 7-1 and 7-2 can be related to the well-known
slope-based converter (single or dual slope) [64] in that both architectures translate
an input voltage signal into the time-domain, where it is then quantized. However,
we make a key distinction that the single-slope ADC effectively compares an input
signal to an integrating waveform, while the VCO-based quantizer actually integrates
Figure 7-3 Improved resolution by counting positive and negative transitions of a multi-
phase VCO
the input signal in continuous time. As a result, the slope-based ADC lacks noise
shaping, and is not well-suited for oversampling applications. In fact, the linear
tradeoff between sampling rate and dynamic range limits the slope-based Nyquist
converters to high-resolution applications only when a very low input bandwidth is
desired. Regardless, the many variations on these time-based circuits for ultra low-power
sensor applications highlight the efficiency of combining voltage or current
integration with digital clocks [14, 33, 73, 75].
To improve the raw resolution of the VCO-based quantizer, the VCO needs to
generate more edge transitions during the sample period. This can be accomplished
by adopting a ring-oscillator structure to generate N multiple VCO output phases,
as proposed in [24] and shown in Figure 7-3. Here, each positive and negative phase
output from the ring-VCO drives a counter input, producing a total count with higher
resolution by a factor of 2N compared to the single-phase VCO-based ADC of [5,26]
for the same period.
Although the VCO-based quantizer shown in Figure 7-3 provides a convenient
illustration of the basic principles involved, its practical implementation is problem-
atic due to the reset operation that is used on its counters. Indeed, in cases where
a VCO edge occurs in close proximity to the reset signal (which will occur quite of-
ten), the measured edge count is likely to become corrupted due to the propagation
delay characteristics of the counters and the need for adequate setup times on the
sampling registers. This count corruption process will, in turn, destroy the desired
noise shaping properties of the structure.
with each progressing sample. This property will be exploited later in this chapter.
$$\frac{T_s}{\min\{T_{delay}(V)\}} < N, \qquad (7.2)$$
where $T_{delay}(V)$ is the propagation delay of each delay stage as a function of VCO
tuning voltage, and $T_s$ is the sampling period. Since the oscillator period, $T_{vco}$,
corresponds to the time it takes a given edge to propagate through each delay stage
twice, we also have
$$T_{vco}(V) = 2N\,T_{delay}(V). \qquad (7.3)$$
By combining Equations 7.2 and 7.3, we can offer alternative views of the same
restriction to be that
$$T_s < \frac{\min\{T_{vco}(V)\}}{2}, \qquad (7.4)$$
$$\max\{F_{vco}(V)\} < \frac{F_s}{2}, \qquad (7.5)$$
where $F_{vco}(V)$ corresponds to the instantaneous frequency (in Hz) of the oscillator and
$F_s = 1/T_s$ corresponds to the frequency (in Hz) of the sampling clock. Equation 7.5
therefore states that the maximum oscillator frequency should be confined to be less
than half of the quantizer clock frequency. If we assume that the nominal oscillator
frequency, $F_o$, is half of its maximum value (such that half of the elements transition
for zero input), then we are left with requiring a sampling rate that is four times the
nominal VCO frequency. Thus, we have another design constraint that
$$F_s = \frac{2}{N\,T_{delay}}, \qquad (7.6)$$
where $T_{delay}$ is the nominal delay for each oscillator stage.
Figure 7-5 Block diagram model and corresponding linearized frequency domain model
of the VCO-based quantizer
Figure 7-5 depicts a functional block diagram of the VCO-based quantizer on the left,
and its corresponding linearized frequency domain model on the right. Comparing the
block diagram to the corresponding quantizer structure in Figure 7-4, the VCO block
corresponds to the ring oscillator and the Quantizer block corresponds to the first set
of registers which sample the quantized phase signal of the VCO. The First Order
Difference block corresponds to comparison of the register values to their previous
sample values by the XOR gates in Figure 7-4. In the corresponding frequency domain
model, the VCO is represented as an integrator with gain $2\pi K_v$, which represents
conversion of the $V_{tune}$ voltage to a VCO phase signal, and the addition of phase
noise. The Quantizer is modeled as a sampler that adds quantization noise, and the
First Order Difference block is seen as a $1 - z^{-1}$ transfer function that performs a
discrete-time differentiation.
A key observation offered by Figure 7-5 is that the quantization noise is first-order
noise-shaped by virtue of the first order difference operation shown in the figure,
which is in agreement with the time domain view of the quantization noise described
in Equation 7.1. We also see that the VCO phase noise is shaped as well, but the
result of such shaping is a flat spectrum due to the -20 dB/dec slope of the original
phase noise signal. In reality, the shaped VCO phase noise will also include 1/f noise,
but this is ignored for now for the sake of modeling simplicity.
In effect, the First Order Difference block converts the VCO phase signal to a
corresponding VCO frequency signal. To be precise, however, the discrete-time (DT)
differentiation is not an exact inverse function of the continuous-time (CT) integration,
noting first that sampling will alias the input signal, and that the $1 - z^{-1}$ filter
is only an approximation to the CT differentiation. As shown in Figure 7-6(a), the
resulting DT spectrum of the VCO frequency measurement tightly follows the input
spectrum for low frequencies with the expected low-frequency gain factor of $2\pi K_v$,
but then begins to fall off slightly around $F_s/2$ ($\Omega = \pi$) due to the CT/DT inverse
approximation.
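The size of this fall-off can be estimated by comparing the magnitude of the first difference, |1 - e^{-jΩ}|, with that of an ideal differentiator, Ω. The sketch below ignores aliasing and all gain constants, and its function names are hypothetical.

```python
import cmath
import math

def dt_difference_gain(omega):
    """Magnitude of the first-difference filter 1 - z^-1 at z = exp(j*omega)."""
    return abs(1.0 - cmath.exp(-1j * omega))

def droop_db(omega):
    """Gain of 1 - z^-1 relative to an ideal differentiator (gain omega), in dB."""
    return 20.0 * math.log10(dt_difference_gain(omega) / omega)

print(f"droop at 0.1*pi: {droop_db(0.1 * math.pi):.3f} dB")
print(f"droop at pi    : {droop_db(math.pi):.2f} dB")
```

At Ω = π the ratio is 2/π, a droop of a little under 4dB, while at one tenth of that frequency the deviation is only a few hundredths of a dB, which is consistent with the qualitative description above.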
For purposes of linear analysis it can be useful to choose a primary time domain
in which to operate, and for historical reasons we choose here to use discrete time.
Therefore, we next will develop a DT model for the VCO-quantizer that will be helpful
later in this chapter. First, it is commonly known that the DT accumulation can be
$$\frac{1}{1 - z^{-1}} = \frac{1}{1 - e^{-sT_s}} \qquad (7.7)$$
$$= \frac{1}{1 - \left(1 + (-sT_s) + \frac{(-sT_s)^2}{2!} + \frac{(-sT_s)^3}{3!} + \cdots\right)} \qquad (7.8)$$
$$\approx \frac{1}{sT_s}, \quad |s| \ll F_s. \qquad (7.9)$$
To create the DT model, we then replace the CT VCO gain of $2\pi K_v/s$ with the
DT VCO gain of $2\pi K_v T_s/(1 - z^{-1})$, and move the sampler gain of $1/T_s$ before the
VCO-quantizer as illustrated in Figure 7-6(b). Not surprisingly, for low frequency
input signals we can now approximate the VCO-quantizer as a single block with gain
$A_{vco-q}(z)$ that translates an input voltage $V_{tune}(z)$ to a frequency (in rad/sample) at
the VCO output $Out(z)$ by
$$A_{vco-q}(z) = \frac{Out(z)}{V_{tune}(z)} \approx 2\pi K_v T_s \ \text{[rad/sample/V]}; \quad \omega \ll F_s. \qquad (7.10)$$
Now that a model for the VCO quantizer has been described, we can utilize the
well-established analysis of oversampling quantizers in order to provide a theoretical
bound to its SNR performance. The expression for peak signal-to-quantization noise
ratio (SQNR) of a ΣΔ converter is found in [17] to be
$$SQNR_{peak} = \frac{3}{2}\,\frac{2n+1}{\pi^{2n}}\left(2^B - 1\right)^2 OSR^{2n+1}, \qquad (7.11)$$
where $B$ is the number of bits, $n$ is the ΣΔ order, and the oversampling ratio
$OSR = F_s/(2F_b)$. For the first-order VCO-based quantizer, with $T_{delay}$ and $F_s$ as the primary
design variables related to $N$ through Equation 7.6, we have
$$2^B - 1 = N = \frac{2}{F_s \cdot T_{delay}}, \qquad (7.12)$$
$$SQNR_{peak} = \frac{9\,F_s}{4\pi^2 (F_b)^3 (T_{delay})^2}. \qquad (7.13)$$
One important thing to notice from Equation 7.13 is that SQNRpeak of the VCO-
based quantizer improves independently with both faster sampling and faster delay
elements. For a series connected ring oscillator, the nominal delay per stage is set to
be approximately twice the minimum inverter delay in the process, and the sampling
rate is set to be as large as practical. Thus, advancing the process to reduce the digital
delay by a factor of 2 can improve SQNRpeak by 9dB for the same input bandwidth.
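The scaling claim can be checked numerically. The sketch below evaluates Equation 7.13 and, as a cross-check, the standard peak-SQNR expression for a multi-bit ΣΔ modulator with N levels, order n, and a given OSR (written from general knowledge of that result, so it should be treated as an assumption). The numeric values for the sampling rate, bandwidth, and stage delay are hypothetical and chosen only for illustration.

```python
import math

def sqnr_peak_sigma_delta(levels, order, osr):
    """Standard peak SQNR: 1.5 * (2n+1)/pi^(2n) * levels^2 * OSR^(2n+1)."""
    return (1.5 * (2 * order + 1) / math.pi ** (2 * order)
            * levels ** 2 * osr ** (2 * order + 1))

def sqnr_peak_vco(fs, fb, t_delay):
    """Equation 7.13: 9*Fs / (4*pi^2 * Fb^3 * Tdelay^2)."""
    return 9.0 * fs / (4.0 * math.pi ** 2 * fb ** 3 * t_delay ** 2)

def to_db(ratio):
    return 10.0 * math.log10(ratio)

# Hypothetical design point: 1 GHz sampling, 20 MHz bandwidth, 65 ps stage delay
FS, FB, T_DELAY = 1e9, 20e6, 65e-12
levels = 2.0 / (FS * T_DELAY)   # Equation 7.12
osr = FS / (2.0 * FB)

general = sqnr_peak_sigma_delta(levels, order=1, osr=osr)
specific = sqnr_peak_vco(FS, FB, T_DELAY)
gain_db = to_db(sqnr_peak_vco(2 * FS, FB, T_DELAY / 2) / specific)

print(f"general form : {to_db(general):.2f} dB")
print(f"Equation 7.13: {to_db(specific):.2f} dB")
print(f"2x faster process: +{gain_db:.2f} dB")
```

The two forms agree for a first-order loop, and halving the stage delay while doubling the sampling rate raises the ratio by a factor of 8, or roughly 9dB, as stated above.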
7.3 Example
The previous subsections highlighted quantization noise and, to a lesser extent, ther-
mal noise as key non-idealities of the VCO-quantizer. However, one important issue
that has so far been neglected is that the voltage-to-frequency tuning curve of a VCO
is quite non-linear in practice. Figure 7-7 shows that the impact of such non-linearity
is to introduce harmonic distortion which can significantly degrade the SNDR per-
formance of the quantizer. Although the linear models so far provide an intuitive
understanding of the VCO-quantizer, we will now see that the VCO non-linearity is
actually a critical bottleneck to achieving good SNDR performance when this quan-
tizer is used for analog-to-digital conversion.
To gain a better idea of the relative limitations posed by each of these nonidealities,
we now present an example design of a VCO-based quantizer. Considering a
0.13µm CMOS process technology, along with typical noise and non-linearity performance,
we choose to make the following assumptions for the design example:
From Equation 7.6, the above choice of sampling frequency and delay implies that N
= 31 and that $F_{vco}$ = 250MHz. The $K_v$ of 750MHz/V then restricts the maximum
input signal to be ±300mV.
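These numbers can be cross-checked against the constraints of Equations 7.5 and 7.6. In the sketch below the sampling rate and stage delay are not taken from the text; they are inferred from the stated N = 31 and 250MHz nominal frequency under the four-times-nominal sampling assumption, and the tuning curve is treated as perfectly linear. All constant names are hypothetical.

```python
N_STAGES = 31        # from the text
F_NOMINAL = 250e6    # nominal VCO frequency in Hz, from the text
K_V = 750e6          # VCO gain in Hz/V, from the text
V_MAX = 0.3          # maximum input amplitude in V, from the text

f_s = 4.0 * F_NOMINAL                 # inferred: Fs is four times the nominal Fvco
t_delay = 2.0 / (N_STAGES * f_s)      # Equation 7.6 solved for Tdelay

f_low = F_NOMINAL - K_V * V_MAX       # VCO frequency at -300 mV (linear model)
f_high = F_NOMINAL + K_V * V_MAX      # VCO frequency at +300 mV (linear model)

print(f"inferred Fs = {f_s / 1e9:.1f} GHz, Tdelay = {t_delay * 1e12:.1f} ps")
print(f"VCO range: {f_low / 1e6:.0f} MHz to {f_high / 1e6:.0f} MHz")
print("within (0, Fs/2):", 0.0 < f_low and f_high < f_s / 2.0)
```

Under these assumptions a ±300mV input keeps the oscillator between 25MHz and 475MHz, inside the 0 to $F_s/2$ range required by Equation 7.5 with some margin.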
Figure 7-8 displays the impact of the three key nonidealities on the quantizer
output spectrum given a 2.5MHz input signal near full-scale. The figure illustrates
first order noise shaping of the quantization noise, filling in of the low frequency noise
by the VCO phase noise, and harmonic distortion caused by the non-linear VCO
tuning characteristic.
In this example, let us choose to lowpass filter the quantizer output with a band-
width Fb set to 20MHz, which coincides with the point at which the influence of
quantization noise is comparable to that of the VCO phase noise. In such a situation,
we obtain the following SNDR values:
. Quantization noise and VCO phase noise: 65dB,
50dB)
This example clearly reveals that VCO non-linearity forms the primary bottleneck
to achieving high SNDR values for the VCO-based quantizer. It is this issue that leads
us to the ΣΔ ADC architecture presented in the next chapter.
Chapter 8
fact that many traditional techniques for reducing quantization noise also apply to
suppressing VCO-based quantizer non-linearity. Last, we confirm the model with
behavioral simulation of two idealized converters.
Figure 8-2 Utilizing VCO for implicit barrel shift DEM of DAC elements
namic element matching (DEM) techniques are well known, many approaches become
overly complex for many levels or are not suitable for clocking at very high-speed.
Fortunately, the multi-bit VCO-based quantizer can implement a barrel-shift DEM
algorithm without any penalty in terms of latency or power, which is a significant
advantage of the architecture.
Figure 8-2 illustrates how by connecting the outputs of the VCO-based quantizer
to the DAC elements in a bit-wise fashion, the phase rotation of the VCO inherently
implements the barrel-shift DEM algorithm [39]. Instead of digitally summing the
XOR outputs prior to the feedback DAC, an analog summation is accomplished with
current after the DAC. The first element to be used in a sample period is the last
one left over from the previous sample, which ensures that each element is used with
equal likelihood. To generate the output word, digital adders are still required, but
these may be pipelined as the delay has been removed from the critical path.
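For readers less familiar with barrel-shift DEM, the rotation can be sketched in software as a pointer that advances by the output code each sample. This is a generic illustration of the algorithm, not the VCO circuit: in the architecture described above the rotation comes from the VCO phase itself, and the function name, element count, and code sequence below are hypothetical.

```python
def barrel_shift_dem(codes, num_elements):
    """Select DAC elements with a rotating pointer (barrel-shift DEM).

    For each output code, the next `code` elements after the previously used
    ones are selected, wrapping around the element array.
    """
    pointer = 0
    selections = []
    for code in codes:
        chosen = [(pointer + i) % num_elements for i in range(code)]
        selections.append(chosen)
        pointer = (pointer + code) % num_elements  # resume where this sample stopped
    return selections

selections = barrel_shift_dem([3, 3, 4, 3], num_elements=8)
usage = [sum(sel.count(e) for sel in selections) for e in range(8)]
print(selections)  # [[0, 1, 2], [3, 4, 5], [6, 7, 0, 1], [2, 3, 4]]
print(usage)       # [2, 2, 2, 2, 2, 1, 1, 1]
```

Because each sample resumes where the previous one stopped, the usage counts of any two elements never differ by more than one, which is what spreads element mismatch evenly over time.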
We should note that some very demanding applications have avoided use of the
barrel-shift algorithm due to the potential for tones created by limit-cycles in the
signal band. This issue is a valid concern, as will be seen in Chapter 10, although the
level of degradation can be considered to be negligible for the vast majority of applications.
Compared to the comparator-based quantizer, which has no inherent DEM
8.1.2 Metastability
Another critical aspect of a high-speed ΣΔ ADC design is that the quantizer bit decisions
must be made quickly and decisively. It is then worthwhile to consider a useful
advantage of the VCO-based quantizer over classical comparator-based, multi-level
quantizers with respect to metastability behavior. Let us first consider metastabil-
ity for the general case of a single comparator and then apply this result to both
quantizer topologies.
As shown in Figure 8-3, the comparator regeneration time, $T_{CQ}$, between the
sampling clock edge and a valid output, is a strong function of how close the input
voltage $V_{in}$ is to the comparator threshold voltage, $V_{threshold}$. Without noise, the
regeneration time is infinite for an input voltage exactly equal to $V_{threshold}$. If we can
allot a maximum regeneration time $T_{CQ-max}$ for the comparator decision to be made,
then there is a small voltage $\delta_v/2$ for which
For simplicity, we can say that the input voltage to the comparator is a random
variable with uniform density on the interval $[0, V_{DD}]$, which gives us the probability
of metastability in a single comparator to be
$$P_{comp}[\text{metastability}] = P_{comp}\left[V_{in} \mid T_{CQ}(V_{in}) > T_{CQ-max}\right] = \frac{\delta_v}{V_{DD}}. \qquad (8.2)$$
In an ideal FLASH ADC, the input voltage interval $[0, V_{DD}]$ is uniformly divided
into N subintervals, each with a unique threshold voltage centered on the subinterval.
Let us assume that for a single input only one comparator has an input signal close
to its threshold, which gives the probability of metastability for the FLASH ADC of
$$P_{flash}[\text{metastability}] = P_{flash}\left[V_{in} \mid T_{CQ}(V_{in}) > T_{CQ-max}\right] = \frac{N\delta_v}{V_{DD}}. \qquad (8.3)$$
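Equations 8.2 and 8.3 can be illustrated numerically by sweeping a uniformly spaced input across the supply range and counting how often it falls inside a metastability window. The window width, supply voltage, and comparator count below are hypothetical values, and the function names are ours.

```python
def p_metastable_comparator(delta_v, vdd):
    """Equation 8.2: window width over input range for one comparator."""
    return delta_v / vdd

def p_metastable_flash(num_comparators, delta_v, vdd):
    """Equation 8.3: N non-overlapping windows over the same input range."""
    return num_comparators * delta_v / vdd

def flash_fraction_on_grid(num_comparators, delta_v, vdd, points=200_000):
    """Fraction of uniformly spaced inputs within delta_v/2 of any threshold."""
    step = vdd / num_comparators
    hits = 0
    for i in range(points):
        v = (i + 0.5) * vdd / points
        nearest = (int(v // step) + 0.5) * step  # threshold centered on subinterval
        if abs(v - nearest) < delta_v / 2.0:
            hits += 1
    return hits / points

# Hypothetical numbers: 31 comparators, 1 mV window, 1.2 V supply
N_CMP, DELTA_V, VDD = 31, 1e-3, 1.2
analytic = p_metastable_flash(N_CMP, DELTA_V, VDD)
numeric = flash_fraction_on_grid(N_CMP, DELTA_V, VDD)
print(f"single comparator: {p_metastable_comparator(DELTA_V, VDD):.5f}")
print(f"flash, analytic  : {analytic:.5f}")
print(f"flash, grid sweep: {numeric:.5f}")
```

The grid sweep agrees with the analytic value to within the grid resolution, as long as the windows of adjacent comparators do not overlap.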
high-speed operation with minimal power consumption.
8.1.4 Power Supply Considerations
One final issue to consider in the design of high resolution ADC is the correlation
between the input signal and power consumption, either through digital switching or
analog biasing. If such power supply variation or noise non-linearly couples into the
signal path, distortion in the actual conversion can result. For the multi-bit FLASH
quantizer, each of the comparators switches for each sample, and so to first-order the
quantizer power consumption does not depend on the input signal. For the VCO-based
quantizer, the switching activity within the VCO core is directly proportional to
the input signal, and as such the power supply current is a relatively strong function
of the input signal. Therefore, care must be taken to properly isolate the VCO power
supply from other analog blocks in the signal path.
While we hypothesized earlier that feedback with high gain will improve the VCO
non-linearity, a more quantitative examination of the non-linearity suppression can
be useful in highlighting the fundamental tradeoffs and limitations of the technique.
Figure 8-4 shows a simplified DT and CT model for a basic ΣΔ VCO-based ADC
that includes error terms from both a quantization error, $E_q$, and also a VCO non-linearity
error, $E_{nl}$. Although each domain has advantages for different stages of the
ADC design, as mentioned earlier we will use DT from this point forward, without
loss of generality. In this model the units of $E_q$ are [rad], and the units of $E_{nl}$ are
[rad/sample], which normalizes the non-linearity error to the reference frequency.
As the quantization noise-transfer function $H_q$ describes how the quantization
error $E_q$ is shaped in the digital output of the ADC, we can also consider a non-linearity
transfer function $H_{nl}$ that will suppress the non-linearity error $E_{nl}$ from the
VCO-based quantizer. For this analysis we make a small-signal linear approximation
that allows us to define an input signal $E_{nl}(z)$ that is decoupled from $U(z)$, allowing
149
L
(a)
Enl(f) Eq(f)
(b)
Figure 8-4 A model in discrete-time (a) and continuous-time (b) for the VCO-based
quantizer EA ADC with non-linearity error E, 1 and quantization error Eq
us to estimate how well the loop is able to reject E,~ (z) as a function of frequency.
With these definitions, we can generally describe the modulator output V(z) as
[Plot: horizontal axis Oversampling Ratio; curves for 1st order and for 2nd, 3rd, and 4th order, each with DC zeros and with optimal zeros]
Figure 8-5 Maximum in-band |H(z)| for a lowpass modulator across oversampling ratio
and loop order. The zeros are placed either at DC (dashed line) or at locations optimal
for the oversampling ratio (solid line).
\[ H(z) = \frac{1}{1 + A_{vco\text{-}q} A_{dac} A_{lf}(z)} \tag{8.9} \]
Equation 8.7 confirms that the non-linearity error Enl will observe a high-pass
transfer function as set by the overall loop order and dynamics, with suppression
approximately equal to the open-loop gain of Avco-q Adac Alf(z). We can also
see from Equation 8.8 that the quantization noise suppression is one order higher
than the order of the loop filter due to the 1 - z^-1 term in Eq. Lastly, in terms
of minimizing both quantization noise and VCO distortion, we clearly desire a large
Alf(z) to minimize |H(z)| in the signal band of interest, noting that Alf(z) is a strong
function of frequency.
The standard techniques to minimize |H(z)| given a low-pass signal bandwidth are
to increase the loop order and to optimize the placement of H(z) zeros. Figure 8-5
plots the maximum value of |H(z)| for varied oversampling ratio, loop order, and zero
optimality [62], which directly corresponds to the minimum amount of VCO
non-linearity suppression. A loop order of up to four is readily achievable as a standard
practice today, and in this case a large oversampling ratio (OSR) provides tremendous
suppression of VCO non-linearity error. However, as the OSR decreases to less than
20, for stability reasons the various loop orders begin to cluster together. In fact,
if the OSR < 16, the higher order loops lose so much advantage that a first-order
loop is actually preferable to a fourth-order loop without optimal zero-placement.
Therefore, applications with larger OSR will especially benefit from the advantages
of the VCO-based quantizer.
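
As a rough illustration of how the in-band maximum of |H(z)| depends on loop order and OSR, the sketch below evaluates the idealized transfer function (1 - z^-1)^L with all zeros at DC. This is an assumption made for illustration only: it ignores the stability-driven scaling that real higher-order loops require, so it does not reproduce the clustering of loop orders at low OSR discussed above. The function name is hypothetical.

```python
import math

def max_inband_ntf_db(order: int, osr: float) -> float:
    """Peak in-band magnitude, in dB, of the idealized H(z) = (1 - z^-1)^order.

    With all zeros at DC the magnitude rises monotonically with frequency,
    so the in-band maximum sits at the band edge f = fs / (2 * OSR).
    """
    edge = math.pi / osr                           # band edge in rad/sample
    mag = (2.0 * math.sin(edge / 2.0)) ** order    # |1 - e^{-jw}| = 2 sin(w/2)
    return 20.0 * math.log10(mag)

if __name__ == "__main__":
    for osr in (8, 16, 32, 64):
        row = ", ".join(
            f"L={order}: {max_inband_ntf_db(order, osr):7.1f} dB"
            for order in (1, 2, 3, 4)
        )
        print(f"OSR={osr:3d}  {row}")
```

In this idealized case each additional loop order multiplies the band-edge suppression (in dB) by the same factor, which is the best-case trend that the stability constraints erode at low OSR.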
We can now make a few general observations regarding the suppression of VCO
non-linearity from ΣΔ feedback. In one sense, the ΣΔ modulator has improved the
VCO-based quantizer non-linearity by approximately the gain of the loop, which is a
significant and marked advance compared to the stand-alone architecture. However,
we can also see that the linearity performance of the VCO has not been improved in
relation to the quantization noise. Observe that both the quantization noise Eq(z)
and the distortion Enl(z) have been modified by the same factor of H(z) compared
to the quantizer without feedback. Therefore, as was the case in the VCO-based
quantizer example from Section II, we may expect that the VCO non-linearity may
still present a limitation for frequencies very close to the maximum edge of the input
bandwidth.
8.3 Example
To verify the above analysis, we can again simulate an example VCO-based quantizer
ΣΔ ADC at the behavioral level using CppSim [50], a very fast code-driven C++
simulator that is especially targeted at high-performance mixed-signal systems. A
tutorial that includes the example simulation is also freely available online.
Before simulating the converter from Figure 8-4, we first need to consider that,
even in the ideal sense, the VCO-based quantizer has a delay that has so far not been
modeled. This excess loop delay causes phase lag in the signal transfer function, and
must be accounted for in order to ensure loop stability. For our purposes here, the
delay for the VCO-based quantizer is approximated by a single sample period. As we
will see in Chapter 9, this estimate of a single sample period agrees fairly well with
a more precise delay value calculated for a practical system, and allows for relatively
simple calculation of loop filter parameters.
Figure 8-6 Model for the prototype ADC including excess loop delay and a minor
compensation loop
A modified block diagram that includes this excess loop delay as a z^-1 delay
element is pictured in Figure 8-6. Also included in the system is a minor feedback
loop that compensates for the impact of excess loop delay incurred by the latency
of the VCO-based quantizer [80]. To explain in more detail, we calculate that the
prototype noise transfer function H(z) from Equation 8.9 is modified to now be
\[ H(z) = \frac{1}{1 + A_{vco\text{-}q} \left[ A_{dac1} A_{lf}(z) + A_{dac2} \right] z^{-1}} \tag{8.10} \]
Although the feedback from Adac1 is now delayed by both the loop filter and the excess
loop delay, the overall effect on the loop dynamics is mitigated by proper design of
Adac2 and Alf(z). A design procedure has been outlined in [80] (and scripted in the
tutorial) that allows the designer to map from the desired NTF to the design of Alf.
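
To make the role of the minor loop more concrete, the following sketch evaluates Equation 8.10, read here as H(z) = 1 / (1 + Avco-q [Adac1 Alf(z) + Adac2] z^-1), on the unit circle. The loop filter (a single delaying integrator) and the unity gains are hypothetical choices made only for illustration; they are not the loop filter or gain values of the example design, and the function names are assumptions.

```python
import cmath
import math

def loop_filter(z: complex) -> complex:
    """Hypothetical loop filter A_lf(z): a delaying DT integrator z^-1 / (1 - z^-1)."""
    return (1 / z) / (1 - 1 / z)

def ntf(z: complex, a_vcoq: float = 1.0, a_dac1: float = 1.0, a_dac2: float = 0.0) -> complex:
    """H(z) = 1 / (1 + A_vco-q * [A_dac1 * A_lf(z) + A_dac2] * z^-1)."""
    loop_gain = a_vcoq * (a_dac1 * loop_filter(z) + a_dac2) / z
    return 1 / (1 + loop_gain)

def ntf_mag(f_norm: float, **gains: float) -> float:
    """|H| at a normalized frequency f / fs (0 < f_norm < 0.5)."""
    z = cmath.exp(2j * math.pi * f_norm)
    return abs(ntf(z, **gains))

if __name__ == "__main__":
    for f_norm in (0.01, 0.05, 0.16, 0.3):
        without = 20 * math.log10(ntf_mag(f_norm, a_dac2=0.0))
        with_minor = 20 * math.log10(ntf_mag(f_norm, a_dac2=1.0))
        print(f"f/fs={f_norm:4.2f}  no minor loop: {without:6.1f} dB  "
              f"minor loop: {with_minor:6.1f} dB")
```

For this toy loop, setting Adac2 = 1 reduces H(z) to exactly 1 - z^-1, whereas with Adac2 = 0 the extra z^-1 places the poles on the unit circle and |H| peaks sharply near f/fs = 1/6. A practical design would instead follow the mapping procedure of [80].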
In this example, we examine the SNDR performance of the same ADC with two
different loop filters, and the same assumptions regarding the VCO-based quantizer
have been made in this example as in the previous case without feedback. The first
case is a 2nd order loop filter without zero optimization, and the second is a 4th order
loop filter with optimized zero placement for Fb = 20MHz. Figure 8-7 displays the
original VCO-based quantizer spectrum from the earlier example in the background,
and overlays the spectrum of the ΣΔ ADC in the foreground. The first case with a
2nd order loop filter is shown on the left side in Figure 8-7 (a), and one can clearly
see that the suppression of both the quantization noise and non-linearity decreases
with frequency, as expected. On the right side in Figure 8-7 (b) is the second case
with a 4th order loop filter with optimized zeros. Here, the level of error suppression
is significantly increased, and the suppression is generally flatter across the band of
interest. Note that the "white" noise in Figure 8-7 (b) is not quantization error, but
rather it is believed to be an error from quantization effects.
[Plot: Example ΣΔ ADC Output Spectrum; horizontal axis Frequency (Hz); legible annotations: SNR = 86.6dB, SNR = 99.5dB, SNDR = 77.4dB]
Figure 8-7 Behavioral simulation results of an example VCO-based quantizer ΣΔ ADC
with (a) 2nd order loop filter with NTF zeros at DC and (b) 4th order loop filter with
optimized zeros for Fb = 20MHz
By comparing the simulation results of the two loop filters, we can justify the
assumptions made previously in developing the model for non-linearity suppression.
In fact, both the levels of suppression as well as the overall frequency dependence
agree with what would be expected from the model. In practice, however, there
are many other potential sources of non-linearity in these very high-speed ΣΔ ADCs
(e.g. DAC mismatch, front-end amplifier distortion), and these other errors must be
balanced not only against the VCO-based quantizer non-linearity, but also against
thermal and 1/f noise.
8.4 Conclusion
This chapter has compared the use of the VCO-based quantizer to the traditional
FLASH based architecture, and found that the VCO-based quantizer offers a few
unique advantages such as the ability to provide inherent dynamic element matching,
as well as reduced sensitivity to metastability and comparator offset. The primary
issue with the VCO-based quantizer, the linearity of its voltage-to-frequency tuning
characteristic, has been modeled within a ΣΔ ADC and we have seen that for large
open-loop gains, the linearity performance can be improved significantly. Finally, the
model was substantiated with simulation examples, illustrating that while the VCO
non-linearity has indeed been suppressed by the gain of the preceding loop filter, it
may yet pose a limitation for overall converter distortion performance.
Chapter 9
order Candy structure [47], its design is actually quite different with respect to the
means by which it achieves stability. In particular, the minor loop feedback, which
is created by feeding the output current of DAC2 into the Vtune node, is not formed
around an integrator as would be done in the Candy structure. Rather, the two
integrators occur before the minor loop, and consist of an active integrator (formed
by the opamp and elements RA and CB) and a lossy integrator (formed passively
by elements RIN, CIN, and RA). Stability of the structure therefore requires the
inclusion of an open-loop zero in the signal transfer function, which is formed by
elements RB and CB.
With the ADC having a target signal bandwidth of 10-20MHz, the actual closed
loop bandwidth of the ADC was then designed to be around 160MHz. To achieve
adequate phase margin, the stabilizing zero formed by RB and CB was set to be in
the range of 75-110MHz (as influenced by the setting of CB, as explained in the Loop
Filter subsection). The passive filter, which forms a lossy integrator as mentioned
above, was set to be slightly less than 10MHz in order to attenuate the large current
pulses from the DAC1 output. While the inclusion of the front-end passive filter leads
to a slight penalty in noise, it has the advantage of providing a very linear front-end
for the ADC and simplifying design of the opamp (which would otherwise have to
deal more directly with the current pulses of DAC1).
As opposed to optimizing the zeros of the ADC noise transfer function for a
signal bandwidth of 10-20MHz, we chose to implement a simple ADC topology that
highlights the properties of the VCO-based quantizer. Additionally, the chosen topol-
ogy allows for second-order dynamics and third-order noise shaping with only a single
opamp. To explain, the proposed topology achieves third-order noise shaping through
the inclusion of three zeros within its quantization noise transfer function, Eq, as ex-
plained earlier. Two of those zeros, as provided by the VCO-quantizer and the active
integrator, are located at or very near the origin. The third zero, as provided by
the lossy integrator formed by the front-end passive filter, is located slightly below
10MHz as set by the bandwidth of that filter.
While the choice of 10-20MHz signal bandwidth did not explicitly influence the
zero placement, it was strongly considered in choosing appropriate thermal noise levels
for the opamp, DAC 1, and the front-end passive filter. These blocks were therefore
designed such that the overall thermal noise had a comparable spectral density to the
quantization noise at the edge of the signal bandwidth range (i.e., 20MHz).
Given the above overview of the proposed structure, we now examine its various
blocks in detail in the subsections to follow. In particular, we will present additional
circuit details of the VCO-based quantizer, the current DACs, and the loop filter.
Figure 9-2 illustrates a geometric view of the combined VCO-based quantizer, implicit
DEM, and DAC circuitry implemented with 31 levels. In essence, this structure
corresponds to the VCO-based quantizer shown in Figure 7-4 which has been augmented
with DAC elements. A bit-slice of this structure, which is also shown in the figure,
reveals a variable delay consisting of a 4-transistor stack followed by a buffer, some
digital logic to implement the first-order difference operation, and a DAC element
with current output. The buffer is used to isolate the variable delay output from
the sampling register, which is implemented with standard cell regenerative latches.
Simulations demonstrated that metastability is not a concern, as predicted from the
discussion in Section 8.1.2. In terms of delay timing, a half-period is available before
generating the DAC pulses, which allows use of standard cell XOR gates and TSPC
DFF for the subsequent first-order difference logic.
Figure 9-2 Geometric view of the proposed 31-level combined VCO quantizer/DEM and DAC
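
The sampling register and XOR-based first-order difference described above can be sketched behaviorally as follows. This is a deliberately idealized model, not the circuit itself: it assumes a single edge travelling around a 31-stage ring, ignores rising/falling edge mismatch and metastability, and expresses the input directly as a number of stage transitions per clock period. The function and variable names are hypothetical.

```python
import math

N_STAGES = 31  # ring stages, equal to the number of DAC unit elements

def vco_quantize(stages_per_sample):
    """Return, for each sample, the indices of the stages that toggled.

    stages_per_sample: iterable of floats in [0, N_STAGES], giving how many
    stage transitions the oscillator makes during each clock period.
    """
    state = [0] * N_STAGES     # present logic level of every stage output
    prev_sampled = state[:]    # register contents from the previous clock edge
    phase = 0.0                # accumulated phase, in units of stage delays
    edge = 0                   # index of the next stage to toggle
    selected = []
    for rate in stages_per_sample:
        new_phase = phase + rate
        # Only whole transitions are observable; the fractional remainder is
        # carried into the next sample (first-order noise shaping).
        for _ in range(math.floor(new_phase) - math.floor(phase)):
            state[edge] ^= 1
            edge = (edge + 1) % N_STAGES
        phase = new_phase
        # First-order difference: XOR of the current and previous samples.
        diff = [now ^ before for now, before in zip(state, prev_sampled)]
        selected.append([i for i, bit in enumerate(diff) if bit])
        prev_sampled = state[:]
    return selected

if __name__ == "__main__":
    for sample in vco_quantize([10.5] * 4):
        print(len(sample), sample)
```

In this model the set of selected elements advances around the ring from one sample to the next, starting where the previous sample ended, which is the barrel-shift behavior that the bit-wise DAC connection inherits.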
[Plot: horizontal axis Input Voltage (V)]
process. In the prototype, the choice of N = 31 elements and Fclk = 950 MHz requires
a nominal delay of 70ps, and, therefore, a minimum delay of around 35ps.
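
One reading that is consistent with these numbers, offered here as an assumption rather than as the stated design rule, is that the edge may traverse at most N stages per clock period at the highest VCO frequency and about N/2 stages at mid-scale. A short check of that arithmetic:

```python
N_ELEMENTS = 31      # number of quantizer elements, from the text
F_CLK_HZ = 950e6     # sampling clock frequency, from the text

def stage_delay_ps(transitions_per_period: float) -> float:
    """Per-stage delay (ps) that yields the given number of transitions per clock."""
    return 1e12 / (F_CLK_HZ * transitions_per_period)

# Assumption: mid-scale operation uses half of the available elements per sample.
nominal_ps = stage_delay_ps(N_ELEMENTS / 2)
# Assumption: full-scale operation uses all N elements per sample.
minimum_ps = stage_delay_ps(N_ELEMENTS)

if __name__ == "__main__":
    print(f"nominal delay ~ {nominal_ps:.1f} ps, minimum delay ~ {minimum_ps:.1f} ps")
```

Under these assumptions the results land close to the quoted 70ps and 35ps.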
In designing the variable delay cell for the VCO-based ADC, care must be taken
to avoid a large gain variation in the tuning characteristic of the VCO. Such gain
variation would directly alter the open loop gain of the overall ADC, which could
impact its performance and cause stability problems. Fortunately, with an input
common-mode set to mid-supply, the chosen delay cell has relatively smooth odd-
order non-linearity at both the bottom and top of the tuning curve, which can be
seen clearly in Figure 9-3. Of course, the quantizer does impose a limited range for its
operation, as seen by the fact that at -300mV differential input voltage, the oscillator
has slowed to a level near zero frequency, and above 300mV the oscillator starts to
reach limits in the high end of its frequency range. For the implemented structure, a
useful operating range for the VCO-quantizer is up to -2dBFS for 5-bit operation at
950MS/s.
To account for process variation in the center frequency of the oscillator, four gain
settings control the level of current drive in the delay cell. As shown in Figure 9-3,
the 2 bits of tuning can account for approximately ±20% of center frequency
variation, and are hand-adjusted in this prototype. This constitutes a relatively
coarse adjustment of the frequency offset of the VCO tuning characteristic, which
is acceptable since any remaining offset simply translates into a differential offset
voltage at the input of the VCO tuning port. Of course, in the case of a severe offset,
linearity performance will suffer and, ultimately, the open loop gain of the ADC will
drop significantly if frequency saturation occurs in the VCO. Note that the impact of
power supply and thermal variations on the oscillator center frequency is mitigated
by the feedback having large gain at low frequency, as will be seen in Chapter 10.
Finally, since excess delay introduced by the quantizer degrades the phase margin
of the ADC structure, it is worthwhile to estimate its value in the proposed VCO-based
quantizer structure. To do so, note that Vtune is integrated over the previous
sampling period which can be seen as a 1/2 clock delay, and the DAC1 pulse logic
begins 1/2 period after the quantizer positive sampling edge. Additionally, there is
an estimate of 1/4 clock delay for generating the RZ DAC pulses. The combination
of these effects leads to an excess loop delay of approximately 1.25 clock periods.
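
The delay budget above can be tallied explicitly. The sketch below simply sums the three contributions named in the text and converts the total to time at the 950MHz sample rate; the dictionary keys are descriptive labels chosen here, not names from the design.

```python
from fractions import Fraction

F_CLK_HZ = 950e6  # sampling frequency of the prototype

# Contributions to excess loop delay, in clock periods, as listed in the text.
contributions = {
    "integration of Vtune over the sampling period": Fraction(1, 2),
    "wait before DAC1 pulse logic begins": Fraction(1, 2),
    "RZ DAC pulse generation (estimate)": Fraction(1, 4),
}

total_periods = sum(contributions.values())
total_ns = float(total_periods) / F_CLK_HZ * 1e9

if __name__ == "__main__":
    print(f"excess loop delay = {float(total_periods)} clock periods "
          f"(about {total_ns:.2f} ns at 950 MHz)")
```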
9.2.2 DAC
An RZ topology was chosen for the primary DAC in the prototype ADC (i.e., DAC 1
in Figure 9-1) in order to minimize the impact of inter-symbol interference at the
high sample frequency of 950MHz and to provide additional compensation of excess
loop delay introduced by the VCO-based quantizer. The penalties for choosing an RZ
topology are larger current variation at the output summing node, increased sensitiv-
ity to clock jitter, and increased power [80]. As mentioned earlier, the issue of current
variation was addressed through the use of passive filtering in the prototype. The is-
sue of clock jitter, which strongly impacts the SNR of any high-speed continuous-time
ΣΔ ADC structure, was addressed by using a low noise, off-chip clocking source. The
issue of power consumption was partially mitigated through circuit design efforts, the
details of which are described below.
The schematic for the primary RZ DAC element core is shown in part (a) of Figure 9-4,
and the overall DAC structure comprises 31 unit elements, each connected
bit-wise to the VCO-quantizer outputs. Degenerated transistors with moderate channel
lengths (and accompanying cascode devices) are used on both the top and bottom
current sources to minimize thermal and 1/f noise. The output common mode range
of the DAC is set via the low impedance of the input signals, which have a common
mode voltage of half-supply (VDD/2). Large, off-chip capacitors are used for both the
NMOS and PMOS bias voltages to reduce the noise coupling from the current reference.
The full-scale on current of DAC1 is ±9 mA, which corresponds to a full-scale
input current of ±4.5 mA.
Figure 9-4 Schematic and operation of (a) DAC 1 and (b) DAC 2
As shown in Figure 9-4 (a), a triple-source configuration steers the current bias
to either the positive or negative summing node during the active pulse, and to a
relatively low impedance node set at VDD/2 during the return-to-zero time. This
configuration allows the current sources to share current during the RZ time, and
therefore saves 25% of the current compared with alternative topologies. However,
there is still 50% more bias current used in this design than would be for an NRZ
implementation.
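
The 25% and 50% figures can be reconciled with a little arithmetic, under two assumptions that are not stated explicitly in the text: the RZ pulse has a 50% duty cycle (suggested by the on current being twice the full-scale input current), and the "alternative topologies" keep the doubled bias flowing at all times. Currents are normalized to the NRZ bias.

```python
RZ_DUTY = 0.5                  # assumed return-to-zero duty cycle
I_FULL_SCALE_INPUT_MA = 4.5    # full-scale input current, from the text
SHARING_SAVINGS = 0.25         # current saved by sharing sources, from the text

# The RZ pulse must deliver the same charge in half the time.
on_current_ma = I_FULL_SCALE_INPUT_MA / RZ_DUTY

nrz_bias = 1.0                                          # normalized NRZ bias
rz_unshared_bias = nrz_bias / RZ_DUTY                   # assumed baseline RZ design
rz_shared_bias = rz_unshared_bias * (1.0 - SHARING_SAVINGS)
extra_vs_nrz = rz_shared_bias / nrz_bias - 1.0

if __name__ == "__main__":
    print(f"RZ on current: {on_current_ma:.1f} mA")
    print(f"bias relative to NRZ: {rz_shared_bias:.2f}x "
          f"({extra_vs_nrz:.0%} more than NRZ)")
```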
The RZ DAC switching waveforms are at full-level CMOS logic levels, so the
switching transistors see a large overdrive. The on-pulse control is output from
NAND gates, which retime the data with the negative clock state. Careful attention
to balancing the differential signals helps to keep source bounce low during switching
events. Again, the power required in generating the switching waveforms for the RZ
implementation is significantly higher than for an NRZ DAC, especially considering
the 950MHz sampling rate.
In contrast to the RZ approach used for the primary DAC, the minor loop DAC
(which corresponds to DAC 2 in Figure 9-1) is implemented as an NRZ structure due
to its less stringent performance requirements. The clocking of this DAC is done
without retiming since the sensitivity to clock jitter and ISI is suppressed by the
forward integration path. The 31 elements of this second DAC are scrambled with
the barrel-shift DEM due to the bit-wise connection to the VCO-based quantizer,
though the issue of DAC mismatch is not as important for this DAC as the primary
one. The full-scale current of DAC2 is nominally ±64 µA, and can be adjusted over a
wide range through an off-chip bias current such that peaking is properly controlled
in the noise transfer function (NTF) of the ADC. With the minor loop disabled by
removing the DAC current bias, the ADC was found to still be marginally stable.
Figure 9-5 Schematic of the fully differential ADC loop filter
The fully-differential loop filter schematic, which uses only a single opamp, is shown
in Figure 9-5. As mentioned earlier, the loop filter includes a front-end passive filter
composed of elements RIN, RA, and CIN in order to absorb the large current
deviations of DAC1 and provide a very linear ADC front-end. Closer examination
of the front-end passive filter reveals that voltage VA is actually a virtual ground
when placed in ΣΔ feedback, so the ADC input current IIN is defined primarily by
resistor RIN. The capacitor CIN then filters the error signal IIN - IDAC1 before IA
is integrated onto capacitor CB, whose value can be adjusted by ±25% with an on-chip
binary capacitor array. Adjustment of CB leads to a gain change in the active
integrator, which allows for better accommodation of Kv variations in the VCO-based
quantizer. Of course, changes in CB will also lead to variation in the value of the
open loop zero formed by CB and RB.
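
The coupling between integrator gain and zero location can be sketched numerically. The component values below are hypothetical placeholders (the text does not give RB or CB), and the expressions assume the usual series R-C feedback relations: a zero at 1/(2π RB CB) and an integrator gain proportional to 1/CB. The resulting range therefore should not be expected to match the 75-110MHz range quoted earlier.

```python
import math

R_B_OHM = 1.0e3        # hypothetical value, not from the thesis
C_B_NOM_F = 1.8e-12    # hypothetical nominal value, not from the thesis

def zero_freq_hz(r_ohm: float, c_farad: float) -> float:
    """Zero frequency of a series R-C feedback branch: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)

def sweep(scales=(0.75, 1.0, 1.25)):
    """Return (scale, zero frequency in MHz, relative integrator gain) tuples."""
    rows = []
    for scale in scales:
        c_b = C_B_NOM_F * scale
        rows.append((scale, zero_freq_hz(R_B_OHM, c_b) / 1e6, 1.0 / scale))
    return rows

if __name__ == "__main__":
    for scale, f_zero_mhz, rel_gain in sweep():
        print(f"CB x{scale:4.2f}: zero at {f_zero_mhz:6.1f} MHz, "
              f"integrator gain x{rel_gain:4.2f}")
```

Whatever the actual component values, a ±25% change in CB moves the zero and the integrator gain together, so the two cannot be trimmed independently with this single adjustment.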
The loop filter opamp is implemented with the two-stage Miller-compensated
topology shown in Figure 9-6. Since the ADC input is assumed to have a constant
common-mode voltage at its input, the first opamp stage can be cascoded even with
low supply voltage. Note that the output common-mode voltage also controls the
input common-mode of the VCO, and is set according to a common-mode feedback
circuit that consists of two large polysilicon resistors, a single-stage amplifier, and
an off-chip reference voltage [58]. Interestingly, because the VCO-based quantizer
offers relatively high SNR performance on its own, a large DC open loop gain is not
required for the opamp in the proposed ADC topology. As such, the gain is designed
to be over 50dB with a gain-bandwidth product in the range of 2-3GHz.
As mentioned earlier, minor loop feedback is used to compensate for excess loop
delay from the quantizer and DAC1 in order to allow a more aggressive NTF. To avoid
the use of another amplifier for a summation operation, the DAC2 current is directed
through resistor Rc such that the resulting voltage is added to the output of the
opamp. Although the opamp output resistance is non-zero, it is much less than Rc
in the frequencies of interest and does not need to be well-controlled since the gain
and precision of this minor loop are not critical to ADC performance. The value of Rc
is chosen to keep the parasitic pole, which is formed by Rc and the input capacitance
of the quantizer, from affecting the loop dynamics. The full scale current of DAC 2
is then set based on the value of Rc and considerations of the NTF. In addition to
providing analog summation without an amplifier, another benefit to this topology
is that the stability concerns of the operational amplifier are isolated from the input
capacitance of the VCO-based quantizer.
Chapter 10
A prototype of the ADC structure shown in Figure 9-1 is implemented in a 0.13µm CMOS
process. A microphotograph of the fabricated chip is shown in Figure 10-1. The active
silicon area of the ADC is 640µm × 660µm, including power supply decoupling
capacitors and guardring. Area for the 5-bit VCO-quantizer core is 120µm × 86µm,
and the total chip area including 28 pads is 1.3mm × 1.3mm.
Specification         Value
Sampling Frequency    900-1000 MHz
Input Bandwidth       10 / 20 MHz
Peak SNR              86 / 75 dB
Peak SNDR             72 / 67 dB
Analog Power          20mW (1.2V)
Digital Power         20mW (1.2V)
Peak Efficiency       0.5pJ/step
Active Area           640µm × 660µm
Total Area            1.3mm × 1.3mm
Technology            0.13µm IBM CMOS
pling clock, with sharp bandpass filters for each as well. A fixed frequency bandpass
filter with extremely high-Q is required for the input signal, and a tunable band-
pass filter is used for the clock. All measurements are performed with the input
signal AC-coupled, and the single-ended to differential conversion is performed using
a transformer balun.
[Plot: SNR and SNDR versus Input Amplitude (dBFS); panels include Variable Analog Bandwidth (10, 15, 20 MHz) and Variable Input Frequency (1.0, 2.5, 5.0 MHz)]
The power consumption of the ADC is 40mW, which is evenly split between the 1.2V
analog and digital supplies such that each draws roughly 16-17mA. Although there
is no direct way to measure the sub-system current, bias currents indicate that the
primary DAC consumes 9mA, and the operational amplifier 8mA. For the digital
supply, the pulse waveform generation circuits for the RZ DAC require about 8mA,
the VCO-quantizer 5mA, and the thermometer-to-binary summation circuits take the
remaining 3mA. A summary of the ADC performance is found in Table 10.1, where
the figure of merit is Power/(2 · Bandwidth · 2^ENOB).
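
As a sanity check on the figure of merit, the sketch below applies the formula to the power, bandwidth, and peak SNDR values quoted in Table 10.1. The conversion ENOB = (SNDR - 1.76)/6.02 is the standard textbook relation and is an assumption here, since the text does not say how ENOB was derived; the function names are hypothetical.

```python
def enob_bits(sndr_db: float) -> float:
    """Effective number of bits from SNDR, using the standard 6.02N + 1.76 rule."""
    return (sndr_db - 1.76) / 6.02

def fom_pj_per_step(power_w: float, bandwidth_hz: float, sndr_db: float) -> float:
    """Figure of merit Power / (2 * Bandwidth * 2^ENOB), in pJ per conversion step."""
    return power_w / (2.0 * bandwidth_hz * 2.0 ** enob_bits(sndr_db)) * 1e12

POWER_W = 40e-3  # total power from the text

if __name__ == "__main__":
    for bandwidth_hz, sndr_db in ((10e6, 72.0), (20e6, 67.0)):
        fom = fom_pj_per_step(POWER_W, bandwidth_hz, sndr_db)
        print(f"BW = {bandwidth_hz / 1e6:.0f} MHz, SNDR = {sndr_db:.0f} dB: "
              f"{fom:.2f} pJ/step")
```

With these inputs both bandwidth settings come out at roughly 0.5 to 0.6 pJ/step, in line with the peak efficiency listed in the table.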
The SNR and SNDR vs. input amplitude curves across a number of operating
conditions are shown in Figure 10-2. If not otherwise specified, the input frequency
is 2.5MHz, the analog bandwidth is 10MHz, and the sample rate is 950MHz. At
a 10MHz input bandwidth, the ADC achieves at least 81dB SNR and 65dB SNDR
across all input frequencies, a power supply of 1.2-1.5V, and a sampling frequency
of 900-1000MHz. While a peak SNR of 14-bits at 10MHz is achieved very efficiently
with only 40mW of total power consumption, the ADC distortion performance is
limited by the VCO-quantizer non-linearity to 10.5-12bits, depending on the specific
test configuration.
The decline of SNDR with increasing signal frequency in Figure 10-2 is a con-
sequence of the reduced gain of the loop filter at higher frequencies, which leads to
reduced suppression of the VCO non-linearity. The degradation of SNDR by such
non-linearity was about 5dB higher than predicted by simulation. This is likely due
to the modeling accuracy of the VCO tuning characteristic, which can be affected
by layout in addition to process and temperature variations. It may be possible to
improve the SNDR somewhat with more attention given to modeling these issues,
although, as we will see later, other techniques offer more promise in improving
VCO linearity.
It was observed that low input signal levels into the proposed ADC led to small
limit-cycles which were seen in the 10-100kHz frequency range. These limit cycles are
an artifact of the barrel-shift algorithm used for DEM on the DACs, which is why some
demanding applications avoid the use of the barrel-shift algorithm in favor of other
DEM strategies [3]. These small limit-cycles can reduce the SNR by a few dB when
the input signal falls below about -35dBFS, as seen in the SNR vs. amplitude curves.
As would be expected, the actual frequency of the limit-cycle depends somewhat on
the input DC level.
An FFT of the ADC output with a 1.045MHz input signal at -15dBFS is shown
in Figure 10-3. The third-order noise shaping is visible from 10-50MHz, and the
quantization noise peaks around 60MHz. A small noise skirt centered around 1MHz
was found to be from the bandpass filter used in testing. The high frequency
quantization noise feature occurring in the 200-300MHz range is believed to be caused by the
mismatch between rising and falling edges of the VCO-quantizer, as verified with
behavioral simulation. Fortunately, this artifact does not affect the functional operation
of the ADC as its stability was seen to be robust across a wide variety of operating
conditions.
[Plot: horizontal axis Frequency (MHz)]
10.3 Discussion
Table 10.2 compares this work with other reported CT ΣΔ CMOS ADCs operating
at a sampling rate over 250MHz and an analog bandwidth of more than 5MHz.
The high SNR of 86dB achieved in this work points to the strength of the VCO-
quantizer architecture, which allows efficient reduction of quantization noise through
high-speed operation. In addition, the SNDR performance and power consumption
are in line with other realizations, and as seen in Chapter 8, additional VCO non-
linearity suppression is possible to improve performance further.
In combination with an optimized NTF for a 10-20MHz bandwidth, a higher-order
loop filter may be expected to yield another 10dB or more of linearity on top
of the performance reported in this work. Coupled with a more power efficient NRZ
DAC design, a forecast performance of over 80dB with 20-30mW in 0.13µm CMOS
would certainly compete well with today's state-of-the-art implementations and
architectures. Because the VCO-quantizer scales well with digital process technology,
there may be even more advantage in the architecture going forward.
Some ADC applications requiring more than 13-14 ENOB with low OSR may face
practical limitations to the levels of non-linearity suppression that can be achieved
with the known in-loop analog techniques reported in this work. In addition, other sources
of distortion will then become significant, both in the feedback DAC and in the front-end
amplifiers. Future research in the area of VCO-quantizers may find promising
results from novel ΣΔ linearization techniques, alternative ADC architectures with
less sensitivity to VCO voltage-to-frequency distortion, or operation with a more
balanced level of linearity and quantization noise performance.
Chapter 11
Conclusion
regard to important metrics such as dynamic range, power, and area.
In the case of the VCO-based analog-to-digital converter, the performance advantages
and limitations of a VCO-based quantizer were presented and discussed using
both theory and simulated examples. Because the non-linearity of the VCO frequency
tuning presents the primary bottleneck for achieving high performance, tradeoffs
for using the VCO-quantizer within a ΣΔ ADC architecture were presented. To
demonstrate these considerations, a high-speed continuous-time ΣΔ ADC operating
at 950Msps was designed and fabricated in 0.13µm CMOS. Although the architecture
chosen for this work was originally disclosed in [39], measurement results were presented
in this work that justify the consideration of VCO-based quantizers in ΣΔ
ADCs. Possible improvements were also discussed that may significantly improve these
results.
Because analog device characteristics in future CMOS processes are not expected
to improve, the increasing trend of replacing analog signal processing with digital
signal processing is likely to continue. At the same time, analog functions cannot
entirely disappear from the mixed-signal interface. Therefore, as demonstrated in
this work, the ability to achieve and leverage high-performance analog functionality
with highly digital circuit elements is very compelling, and is an exciting area for
future research.
Bibliography
[1] E. Alon, V. Stojanovic, and M. Horowitz. Circuits and techniques for high-
resolution measurement of on-chip power supply noise. IEEE Journal of Solid-State
Circuits, 40:820-828, April 2005.
[2] Y. Arai, T. Matsumura, and K. Endo. A CMOS four-channel x 1 K time memory
LSI with 1-ns/b resolution. IEEE Journal of Solid-State Circuits, 27:359-364,
March 1992.
[3] R. Baird and T. Fiez. Linearity enhancement of multibit delta-sigma A/D and
D/A converters using data weighted averaging. IEEE TCAS-II, 42:753-762,
December 1995.
[4] R.G. Baron. The Vernier Time-Measuring Technique. Proceedings of the IRE,
pages 21-30, January 1957.
[5] V. B. Boros. A Digital Proportional Integral and Derivative Feedback Controller
for Power Conditioning Equipment. IEEE Power Electronics Specialists Conf.
Rec., pages 135-141, June 1977.
[6] L. Breems, R. Rutten, R. van Veldhoven, G. van der Weide, and H. Termeer.
A 56mW CT Quadrature Cascaded ΣΔ Modulator with 77dB DR in a Near
Zero-IF 20MHz Band. IEEE ISSCC, pages 238-239, February 2007.
[7] H.-H. Chang, P.-Y. Wang, J.-H.C. Zhan, and B.-Y. Hsieh. A fractional
spur-free ADPLL with loop-gain calibration and phase-noise cancellation for
GSM/GPRS/EDGE. IEEE ISSCC Dig. Tech. Papers, pages 200-201, Febru-
ary 2008.
[8] C.-C. Chen, P. Chen, C.-S. Hwang, and W. Chang. A Precise Cyclic CMOS
Time-to-Digital Converter With Low Thermal Sensitivity. IEEE Trans. Nucl.
Sci., 52:834-838, 2005.
[9] P. Chen, C.-C. Chen, and Y.-S. Shen. A Low-Cost Low-Power CMOS Time-to-
Digital Converter Based on Pulse Stretching. IEEE Trans. Nucl. Sci., 53:2215-
2220, 2006.
[10] P. Chen, J.-C. Zheng, and C.-C. Chen. A Monolithic Vernier-Based Time-to-
Digital Converter with Dual PLLs for Self-Calibration. IEEE Custom Integrated
Circuits Conference, pages 321-324, 2005.
[11] J.-M. Chou, Y.-T. Hsieh, and J.-T. Wu. A 125 MHz 8b digital-to-phase converter.
IEEE Int. Solid-State Circuits Conf., 2003.
[12] J.-M. Chou, Y.-T. Hsieh, and J.-T. Wu. Phase averaging and interpolation using
resistor strings or resistor rings for multi-phase clock generation. IEEE Trans.
Circuits and Systems I, 53:984-991, 2006.
[13] P. Dudek, S. Szczepanski, and J. V. Hatfield. A high-resolution CMOS
time-to-digital converter utilizing a Vernier delay line. IEEE Journal of
Solid-State Circuits, 35:240-247, February 2000.
[14] M. A. Farahat, F. A. Farag, and H. A. Elsimary. Only digital technology
analog-to-digital converter circuit. IEEE International Midwest Symposium on
Circuits and Systems, pages 178-181, December 2003.
[15] M. Ferriss and M.P. Flynn. A 14mW Fractional-N PLL Modulator with an
Enhanced Digital Phase Detector and Frequency Switching Scheme. IEEE ISSCC
Dig. Tech. Papers, pages 352-353, February 2007.
[16] B.W. Garlepp, K.S. Donnelly, Jun Kim, P.S. Chau, J.L. Zerbe, C. Huang, C.V.
Tran, C.L. Portmann, D. Stark, Yiu-Fai Chan, T.H. Lee, and M.A. Horowitz. A
portable digital DLL for high-speed CMOS interface circuits. IEEE Journal of
Solid-State Circuits, 34:632-644, 1999.
[17] Y. Geerts, M. Steyaert, and W. Sansen. Design of Multi-Bit Delta-Sigma A/D
Converters. Kluwer Academic Publishers, 2002.
[18] E. J. Gerds, J. Van der Spiegel, R. Van Berg, H. H. Williams, L. Callewaert,
W. Eyckmans, and W. Sansen. A CMOS time to digital converter IC with 2
level analog CAM. IEEE Journal of Solid-State Circuits, 29:1068-1076, 1994.
[19] M.S. Gorbics, J. Kelly, K.M. Roberts, and R.L. Sumner. A high resolution
multihit time to digital converter integrated circuit. IEEE Trans. Nucl. Sci.,
44:379-384, 1997.
[20] A. Hajimiri, S. Limotyrakis, and T.H. Lee. Jitter and Phase Noise in Ring
Oscillators. IEEE Journal of Solid-State Circuits, 34(6):790-804, June 1999.
[21] B. Helal, M. Straayer, and M. H. Perrott. A Low Jitter 1.6 GHz Multiplying
DLL Utilizing a Scrambling Time-to-Digital Converter and Digital Correlation.
VLSI Symp. Dig. Tech. Papers, pages 166-167, June 2007.
[22] B. M. Helal, C.-M. Hsu, K. Johnson, and M. H. Perrott. A Low Noise Pro-
grammable Clock Multiplier Based on A Pulse Injection-Locked Oscillator with
a Highly-Digital Tuning Loop. IEEE RFIC Symp., June 2008.
[23] S. Henzler, S. Koeppe, W. Kamp, H. Mulatz, and D. Schmitt-Landsiedel. 90nm
4.7ps-Resolution 0.7-LSB Single-Shot Precision and 19pJ-per-Shot Local Passive
Interpolation Time-to-Digital Converter with On-Chip Characterization. IEEE
Int. Solid-State Circuits Conf., pages 548-549, 2008.
[24] M. Hoven, A. Olsen, T. S. Lande, and C. Toumazou. Novel second-order Σ-Δ
modulator/frequency-to-digital converter. IEEE Electronics Letters, 31:81-82,
January 1995.
[25] C. M. Hsu, M. Straayer, and M. H. Perrott. A Low Noise, Wide Bandwidth,
3.6-GHz Digital ΣΔ Fractional-N Frequency Synthesizer with Quantization Noise
Cancellation. IEEE ISSCC Dig. Tech. Papers, page N/A, February 2008.
[26] J.P. Hurrell, D.C. Pridmore-Brown, and A.H. Silver. Analog-to-Digital Con-
version with Unlatched SQUIDs. IEEE Transactions on Electron Devices, ED-
27(10), 1980.
[27] C. S. Hwang, P. Chen, and H. W. Tsao. A high-precision time-to-digital converter
using a two-level conversion scheme. IEEE Trans. Nucl. Sci., 51(4):1349-1352,
August 2004.
[28] A. Iwata. The architecture of delta sigma analog-to-digital converters using a
VCO as a multibit quantizer. IEEE TCAS-II, 46:941-945, July 1999.
[29] J.-P. Jansson, A. Mantyniemi, and J. Kostamovaara. A CMOS Time-to-Digital
Converter With Better Than 10 ps Single-Shot Precision. IEEE Journal of
Solid-State Circuits, 41:1286-1296, 2006.
[30] K. Karadamoglou, N. P. Paschalidis, E. Sarris, N. Stamatopoulos, G. Kottaras,
and V. Paschalidis. An 11-bit high-resolution and adjustable range CMOS time-
to-digital converter for space science instruments. IEEE Journal of Solid-State
Circuits, 39:214-222, 2004.
[31] J. Kim and S. Cho. A Time-Based Analog-to-Digital Converter Using a Multi-
Phase VCO. IEEE ISCAS, pages 3934-3937, May 2006.
[32] S. Kleinfelder, T. Majors, K. Blumer, W. Farr, and B. Manor. MTD132-a new
sub-nanosecond multi-hit CMOS time-to-digital converter. IEEE Trans. Nucl.
Sci., 38:97-101, April 1992.
[33] F. Kocer and M.P. Flynn. A new transponder architecture with on-chip ADC
for long-range telemetry applications. IEEE JSSC, 41:557-564, May 2006.
[34] M. Lee and A.A. Abidi. A 9b, 1.25ps Resolution Coarse-Fine Time-to-Digital
Converter in 90nm CMOS that Amplifies a Time Residue. IEEE Journal of
Solid-State Circuits, 43:769-777, April 2008.
[35] S. J. Lee, B. Kim, and K. Lee. A novel high-speed ring oscillator for multiphase
clock generation using negative skewed delay scheme. IEEE Journal of
Solid-State Circuits, 32:289-291, February 1997.
[36] J. G. Maneatis and M. A. Horowitz. Precise delay generation using coupled
oscillators. IEEE Journal of Solid-State Circuits, 28(12):1273-1282, December
1993.
[37] A. Mantyniemi, T. Rahkonen, and J. Kostamovaara. An integrated 9-channel
time digitizer with 30 ps resolution. IEEE Int. Solid-State Circuit Conf., pages
266-465, 2002.
[42] S.S. Mohan, W.S. Chan, D.M. Colleran, and S.F. Greenwood. Differential Ring
Oscillators with Multipath Delay Stages. IEEE Custom Integrated Circuits
Conference, pages 503-506, September 2005.
[49] S. Paton, A. Di Giandomenico, L. Hernandez, A. Wiesbauer, T. Potscher, and
M. Clara. A 70-mW 300-MHz CMOS continuous-time ΣΔ ADC with 15-MHz
bandwidth and 11 bits of resolution. IEEE JSSC, 39:1056-1063, July 2004.
[63] S. Sidiropoulos and M. A. Horowitz. A semidigital dual delay-locked loop. IEEE
Journal of Solid-State Circuits, 32:1683-1692, 1997.
[67] R. B. Staszewski, J.L. Wallberg, S. Rezeq, C.-M. Hung, O.E. Eliezer, S.K. Vemu-
lapalli, C. Fernando, K. Maggio, R. Staszewski, N. Barton, M.-C. Lee, P. Cruise,
M. Entezari, K. Muhammad, and D. Leipold. All-digital PLL and transmit-
ter for mobile phone. IEEE Journal of Solid-State Circuits, 40(12):2469-2482,
December 2005.
[68] A. E. Stevens, R. P. Van Berg, J. Van der Spiegel, and H. H. Williams. A
time-to-voltage converter and analog memory for colliding beam detectors. IEEE
Journal of Solid-State Circuits, 24:1748-1752, December 1989.
[70] B.K. Swann, B.J. Blalock, L.G. Clonts, D.M. Binkley, J.M. Rochelle, E. Breed-
ing, and K.M. Baldwin. A 100-ps Time-Resolution CMOS Time-to-Digital Con-
verter for Positron Emission Tomography Imaging Applications. IEEE Journal
of Solid-State Circuits, 39(11):1829-1852, November 2004.
[75] T. Watanabe, T. Mizuno, and Y. Makino. An All-Digital Analog-to-Digital
Converter With 12-µV/LSB Using Moving Average Filtering. IEEE Journal of
Solid-State Circuits, 38:120-125, January 2003.
[76] C. Weltin-Wu, E. Temporiti, D. Baldi, and F. Svelto. A 3GHz Fractional-N All-
digital PLL with Precise Time-to-Digital Converter Calibration and Mismatch
Correction. IEEE ISSCC Dig. Tech. Papers, pages 344-345, February 2008.
[77] C. Weltin-Wu, E. Temporiti, D. Baldi, and F. Svelto. A 3GHz fractional-N
all-digital PLL with precise time-to-digital converter calibration and mismatch
correction. IEEE ISSCC Dig. Tech. Papers, pages 344-345, February 2008.
[78] J. Wu, Z. Shi, and I.Y. Wang. Firmware-only Implementation of Time-to-Digital
Converter (TDC) in Field-Programmable Gate Array (FPGA). IEEE Nuclear
Science Symposium, 1:19-25, October 2003.
[79] N. Yaghini and D. Johns. A 43mW CT complex ΔΣ ADC with 23MHz of signal
bandwidth and 68.8dB SNDR. IEEE ISSCC, pages 502-503, February 2005.
[80] S. Yan and E. Sanchez-Sinencio. A Continuous-Time ΣΔ Modulator With 88-dB
Dynamic Range and 1.1-MHz Signal Bandwidth. IEEE JSSC, 39:75-86, January
2004.
[81] S. Yang. High performance logic technology-scaling trend and future challenges.
IEEE Solid-State and Integrated-Circuit Technology, 1:62-67, 2001.
[82] B. Yu, H. Wang, Q. Xiang, E. Ibok, and M.-R. Lin. 15 nm gate length planar
CMOS transistor. IEEE Electron Devices Meeting, pages 11.7.1 - 11.7.3, 2001.