Eecs 2020 109
Zhaokai Liu
Borivoje Nikolic, Ed.
Vladimir Stojanovic, Ed.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission.
Time-interleaved SAR ADC Design Using Berkeley Analog
Generator
by Zhaokai Liu
Research Project
Committee:
Borivoje Nikolic
Research Advisor
Date
Vladimir Stojanovic
Second Reader
Date
Abstract
Among different ADC architectures, the successive approximation register (SAR) ADC has
a flexible architecture and high power efficiency, and is well suited to digital CMOS
processes. Its building blocks rely on MOS switches and latches, so it benefits strongly
from technology scaling. Time-interleaving (TI) architectures can provide a higher sampling
rate because they help relax the power-speed trade-offs of ADCs. Therefore, combining SAR
with time-interleaving is a good solution for many digital signal processing applications
that require power-efficient analog-to-digital conversion. Based on Berkeley Analog
Generator (BAG), a time-interleaved SAR ADC generator has been implemented in different
technologies. To explore the design flow using circuit generators, this report discusses
the working principle and implementation of the time-interleaved SAR ADC. A test chip has
been taped out in the Intel 22nm FFL process, containing 6 different versions of the ADC.
In each design, a 9-bit 16-way TI-SAR ADC samples at 10GS/s, with a memory block storing
the digitized results from the ADC.
Contents
List of Figures
1 Introduction
1.1 Motivation
1.2 Research goals
1.3 Report organization
3 Generator-Based Design
3.1 Design-based design methodologies
3.2 LAYGO layout generator
3.3 Schematic generator
References
List of Figures
1.1 Figure of merit of all ADCs published at ISSCC and VLSI Symposium from 1997
to 2019 [1].
1.2 Diagram of time-interleaved concept.
Chapter 1
Introduction
1.1 Motivation
Analog-to-digital converters (ADCs) are among the most commonly used building blocks
of mixed-signal circuits, as they act as the interface between the analog and digital
realms. They acquire analog signals from different sources and convert them into digital
form for analysis or transmission. Therefore, advances in ADC design have been one of the
keys to the success of digital systems that operate on a wide range of continuous-time
signals. The speed and performance of the ADC are often the bottleneck when building
modern systems.
New applications have continuously driven ADC research toward higher speed and
resolution. For applications that benefit from the fast evolution of digital integrated
circuits, the steady increase in performance naturally leads to more sophisticated signal
processing in the digital domain and moves the ADC closer to the chip input in order to
capture more analog information. Communication systems that enable higher data rates
demand ADCs with higher sampling rates, and higher resolution is also needed when complex
modulation is applied.
Figure 1.1 shows the Walden figure of merit of ADCs published at ISSCC and VLSI
Symposium [1]. The standard Walden figure of merit here is defined as
\[
\mathrm{FoM} = \frac{P}{2 \cdot \min(f_s/2,\, BW_{eff}) \cdot 2^{ENOB}} \tag{1.1}
\]
CHAPTER 1. INTRODUCTION 6
Figure 1.1: Figure of merit of all ADCs published at ISSCC and VLSI Symposium from
1997 to 2019 [1].
where $P$ is the power consumption, $f_s$ is the sampling frequency and $BW_{eff}$ is
the effective resolution bandwidth.
It is evident from the plot that different ADC architectures have been adopted for
different applications with specific speed and performance requirements. Specifically, ADCs
that work at multi-gigasample-per-second rates with moderate resolution are widely used in
high-performance electronic/optical links and radar/lidar sensing systems ([2], [3]). As
stated above, these systems naturally scale to advanced technology nodes for both
technological and economic benefits. Therefore, ADCs that let SoCs exploit the digital
power reduction brought by technology scaling should adopt architectures that benefit from
scaling as well.
One problem for circuit design in an advanced technology node is that, as device sizes
have shrunk exponentially, the number of design rules has increased exponentially, making
it difficult to quickly prototype designs in modern processes. A generator-based design methodology
can help to deal with stringent and complicated rules. Circuit generators can shorten the
time spent in post-layout verification, help accelerate the design cycle, and enable designers
to explore circuits under different technologies.
The function of an ADC is to generate an N-bit digital output $D$ such that the analog
signal can be approximated as $V_{DAC} = D/2^N \cdot V_{ref}$, where $V_{ref}$ is the
reference voltage. Depending on how the final value is obtained, there are different
categories of ADCs. An ADC's sample rate ($f_s$) can be chosen either for Nyquist-rate
operation ($f_s = 2 \times f_{BW}$) or for oversampled operation ($f_s \gg 2 \times f_{BW}$).
ADC architectures such as flash, pipeline and successive approximation register (SAR)
sample the input signal at the Nyquist rate, while sigma-delta (Σ∆) ADCs work at a higher
sample rate.
Each of these topologies has its own advantages in terms of power, speed and
resolution that make it suitable for a particular use scenario. For example, pipeline ADCs
perform analog-to-digital conversion by cascading low-resolution stages that each sample,
coarsely quantize and amplify the residue for the next stage. While this architecture is
suitable for high-speed applications, the requirement for a precise active amplifier in each
stage makes it analog-intensive and power-hungry. In oversampled ADCs, which are typically
implemented as Σ∆ converters, higher resolution is provided through oversampling and noise
shaping. A broad range of Σ∆ converters can also be implemented in advanced technology
nodes, but they still cannot benefit from the power reduction of CMOS scaling because
op-amps are usually needed to construct the analog integrators in the loop filter.
Figure 1.1 shows that SAR-based designs (marked in red) achieve leading-edge performance
for sampling rates ($f_s$) ranging from tens of kilohertz to tens of gigahertz [4]. In the
low-frequency regime, there are designs like [5], [6] for medical applications. At moderate
frequencies, SAR or SAR-assisted ADCs such as [7] and [8] can offer moderate resolution at a
low power level ( 1mW). In the ultra-high-speed region, a 90 GS/s design [9] has been
demonstrated to be suitable for optical and electrical data link applications.
Time-interleaving was originally used as an effective way to raise speed at the cost of
an area and power penalty [10]. It turns out, however, to be a good solution to the
power-speed trade-off and relieves many problems even when speed is not the primary
limitation. The basic working principle is shown in Figure 1.2. As the speed of a
single-channel ADC approaches the limits of the technology, the power-speed trade-off
becomes nonlinear and demands disproportionately higher power for the desired increase in
speed. Time-interleaving relaxes the trade-off and makes it possible to push to higher
conversion speeds. This benefit comes with the overhead of sampling clock phase generation,
which degrades the energy efficiency relative to single-channel conversion. The
multi-channel structure also suffers from mismatches in gain, offset and bandwidth, which
usually require calibration.
In Chapter 2, the design considerations for the various building blocks of the SAR ADC
are explained. By exploring the design space of each building block, especially the
well-understood parts, an automated generator-based design flow can be applied to
accelerate design iterations.
Chapter 3 explains the working principles of the generator-based design flow and the
layout generation engine. Some example code is also presented to show the layout and
schematic generation steps.
Chapter 4 covers circuit and generator details and presents the implementation of the
generator in Intel 22nm FFL. Chapter 5 discusses future work and possible improvements to
the speed and resolution of the ADC generator.
Chapter 2
SAR ADC Design Considerations
To better adopt an automated generator-based design flow and develop the SAR ADC
generator, this chapter reviews the major design aspects of the SAR building blocks. The
time-interleaved (TI) topology applied to ADCs is also examined to understand its
advantages and limitations. First, this chapter reviews the basic operation of the SAR
ADC, then discusses the major building blocks, including comparators, the capacitive DAC
and SAR logic. Lastly, the speed benefits of the TI-SAR architecture, its effect on
metastability and the major challenges arising from mismatch between channels are explained.
At the beginning of each conversion, the input voltage is sampled on the capacitor array
(top-plate in this example). Then the sampled voltage is compared with a reference voltage
CHAPTER 2. SAR ADC DESIGN CONSIDERATIONS 11
and the comparator generates the result. Based on the first result, half of the total
capacitance is switched, which changes the reference voltage. This operation repeats and
gradually drives the differential input of the comparator toward zero. The conversion
finishes with only the quantization error ($< 1/2^N$) left.
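The conversion loop described above amounts to a binary search. The following is a minimal behavioral sketch in Python, assuming an ideal comparator, an ideal binary-weighted DAC and a unipolar input range of [0, v_ref). The function name `sar_convert` and its parameters are illustrative and are not taken from the generator.

```python
def sar_convert(v_in, v_ref=1.0, n_bits=9):
    """Ideal SAR conversion: binary search for v_in within [0, v_ref)."""
    code = 0
    for bit in range(n_bits - 1, -1, -1):       # MSB first
        trial = code | (1 << bit)               # tentatively set the current bit
        v_dac = trial * v_ref / (1 << n_bits)   # ideal DAC level for the trial code
        if v_in >= v_dac:                       # comparator decision
            code = trial                        # keep the bit
    return code


code = sar_convert(0.3, v_ref=1.0, n_bits=9)
residue = 0.3 - code / 2**9   # remaining quantization error, as a fraction of v_ref
```

After the last bit, the residue is non-negative and smaller than one LSB, which corresponds to the quantization error bound stated in the text.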
This conversion process benefits from modern nanoscale CMOS technology because new
technology is optimized for digital operation, so logic gates and MOS switches naturally
become faster and consume less power. Some technologies also provide a tall metal stack
and advanced lithography, which allow metal fringe capacitors to be made smaller while
keeping their accuracy. Lastly, the voltage comparator can be implemented as a regenerative
dynamic latch, so the entire design operates dynamically and consumes no static current.
For low-speed sensor applications, it works well even at kilohertz rates, and the power of
the entire system is reduced by putting it into a "sleep" state. For aggressively high-speed
applications, a time-interleaved architecture that combines tens of ADCs is also a good
choice even if a single ADC channel can only deliver moderate speed.
There are also non-idealities in each building block. First, noise is collected from
various sources and affects the overall signal-to-noise ratio (SNR) of the design. Besides
quantization noise, thermal noise sampled onto the capacitive DAC and input-referred noise
from the comparator also degrade the performance. Distortion caused by limited bandwidth
and non-linearity in the track-and-hold circuit limits the achievable SNDR. Systematic
errors arising from mismatch between DAC units and between transistors in the comparator
create a trade-off with conversion speed. As for power consumption, the total power can be
estimated as the sum of the contributions from each block: as the number of bits
($N_{bit}$) and the sample frequency ($f_s$) increase, the logic energy ($E_{logic}$) and
the comparator energy ($E_{cmp}$) increase linearly. The capacitor array consumes the major
part of the power, which is proportional to the reference voltage ($V_{ref}$) and the
supply ($V_{dd}$). Total power would be:
Here it is assumed that the capacitor array is fully charged and discharged in each
conversion; this term would be scaled by a switching factor α if a more sophisticated
switching scheme were used.
2.2 Comparator
The comparator in a SAR ADC generates the result of a voltage comparison. It is a
critical block in building high-speed converters. Comparator design focuses on making a
fast and accurate decision, reducing noise and lowering the metastability rate. The
specific implementation of the comparator may vary between applications. However, for the
reasons stated above, comparators are generally designed as clocked dynamic circuits. This
section uses the strong-arm comparator as an example to illustrate the detailed
considerations for this block.
Strong-arm comparator
Figure 2.2 shows an example of using a strong-arm latch as a comparator. The pre-amplifier
in front of it is optional. The comparator is essentially a regenerative latch controlled by
clock phase Φ. When the clock is high, the tail current turns on and the input pair begins
to amplify the input voltage difference. As the difference accumulates, the regenerative
cross-coupled inverters on the top form positive feedback to generate a digital output. This
signal is buffered and drives the latches in the SAR logic. During the Φ̄ phase, the tail
current is off and multiple reset transistors (shown in gray) turn on and pull the
corresponding nodes to well-defined values. This operation helps eliminate both
common-mode and differential hysteresis in the comparator.
• Amplification: After the tail current is turned on by the clock, the input pair amplifies
the input voltage. In this phase, the tail current is fairly constant because the input
transistors act as a differential pair. As the nodes P and N are pulled down to different
values, the comparator enters phase B. During this period, the input pair produces a gain of
\[
A_{v,A} \approx \frac{V_{th} \cdot g_{m,in}}{I_{cm}}
\]
This current-mode amplification period takes $\tau_{int}$ to finish.
• Turns on NMOS: After entering this phase, the cross-coupled transistors form a feedback
loop and exponentially split the output nodes apart. In this period, the circuit works
in a positive feedback loop that can be characterized by the time constant $\tau_{reg}$.
After several time constants, the output falls below $V_{dd} - V_{th,p}$ and turns on the
PMOS pair.
• Regeneration: After the PMOS devices are on, the circuit is performing integration and
finally produces an output swing large enough for the output buffer to generate a digital
output. The positive feedback eventually pulls one output to the supply and the other to
ground.
During each iteration of the SAR operation, the time used for comparison is critical
since it directly limits the ADC speed. When evaluating a comparator design, both
$\tau_{int}$ and $\tau_{reg}$ are important, and both vary with the input common-mode
voltage. One simple way to estimate them is to approximate the total time it takes to
produce a valid output as:
\[
T_{tot} = \tau_{int} + \tau_{reg} \cdot \log\left(\frac{V_{diff}}{V_{in} + V_{os}}\right) \tag{2.2}
\]
where $V_{diff}$ is a pre-defined output difference and $V_{os}$ is the input-referred
offset voltage. For a given input common-mode voltage, the three unknown variables can be
estimated from simulations with different values of $V_{diff}$.
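Equation (2.2) is linear in log(V_diff) at a fixed input, so the regeneration time constant can be read from the slope of the decision time against ln(V_diff). The sketch below assumes the logarithm in (2.2) is the natural logarithm and uses synthetic values in place of simulation results. All function names and numbers are illustrative placeholders.

```python
import math


def decision_time(v_in, v_diff, tau_int, tau_reg, v_os=0.0):
    """Total decision time from Eq. (2.2), using the natural logarithm."""
    return tau_int + tau_reg * math.log(v_diff / (v_in + v_os))


def fit_slope(xs, ys):
    """Least-squares slope of ys versus xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den


# Synthetic "measurements": sweep the output threshold at a fixed 1 mV input.
# The parameter values are placeholders, not results from the report.
v_diffs = [0.2, 0.4, 0.6, 0.8]
times = [decision_time(1e-3, vd, tau_int=20e-12, tau_reg=5e-12) for vd in v_diffs]
tau_reg_est = fit_slope([math.log(vd) for vd in v_diffs], times)
```

The intercept of the same fit carries the remaining terms of (2.2).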
• Input-referred thermal noise: One of the main limits on accuracy comes from the
input-referred thermal noise of the comparator. The noise analysis involves a circuit that
is non-linear and time-varying, so small-signal analysis is not well suited to dynamic
comparators. Intuitively, the input pair is a major source of noise, simply because
during phases A and B in Figure 2.3 the input pair transistors work as a current integrator
to amplify the input signal. When the transconductance ($g_m$) of the input pair increases,
the noise current will be reduced. Thermal noise is also averaged on the parasitic
capacitance $C_x$ in Figure 2.2, so a wide device with more parasitic capacitance can also
reduce the noise contribution from the input pair. Statistically, because of the random
nature of thermal noise, a longer integration time lets the noise average out on the
parasitic capacitance. In summary, the input device should be wide and have a high
$g_m/I_d$ value in order to optimize noise performance, and the comparator's input pair
should be sized as large as the speed specification allows. The second largest noise
contributor is the NMOS pair in the cross-coupled inverters, because these devices are
involved in amplifying the output nodes while the difference is still small.
To simulate the comparator noise, PSS noise analysis is a better method than transient
noise simulation, which takes a long time. Since the comparator operates periodically in
the system, it can be assumed to work at steady state, and the noise contribution during
the decision-making period can be integrated.
• Kickback: Kickback occurs because the comparator output always has one side pulled
down while the other side is pulled up. The kickback effect, which couples nodes P and N
in Figure 2.2 to the inputs through $C_{GD}$, is signal dependent and should therefore be
minimized in high-resolution designs. It can be modeled as a capacitively divided voltage on
the input gates:
\[
\delta V_{kickback} = \frac{C_{GD}}{C_{GD} + C_{DAC}} \Delta V_{PN} \tag{2.4}
\]
With a larger DAC array, the amplitude is reduced. When the circuit is sensitive to the
kickback effect, an additional pre-amplifier can also help relieve it.
From the analysis above, both noise and speed performance are closely related to the input
common-mode voltage of the comparator. As shown in Figure 2.4, delay tends to decrease
and noise tends to increase with higher $V_{cm}$. Intuitively, a stronger input pair
reduces the integration time, which gives the noise less time to average out. The third
plot shows the delay-noise trade-off for a specific design. From this plot alone it is
possible to choose the comparator sizing and the common-mode voltage it works at.
This analysis oversimplifies the other non-idealities discussed above, but it shows that
by combining a design algorithm with a generator it is possible to iterate quickly on circuit
design and to automate the design of some very well-understood circuits.
time for resolving a certain bit is $T_{conv}$, total conversion time in a synchronous SAR
should be at least $N T_{conv} + T_{samp} + T_{rst}$. In an asynchronous SAR design [14],
additional logic is inserted to assert a finish signal when a bit is resolved and trigger
the next conversion.
• Speed: To improve the speed of a SAR ADC, an asynchronous architecture is usually used
because it is not limited by the worst-case delay. From an optimization perspective, the
loop delay can be analyzed as Figure 2.5 shows. The delays of going through the comparator,
going through the SAR logic and settling the capacitor DAC are defined as $T_{cmp}$,
$T_{logic}$ and $T_{DAC}$ respectively. First of all, a larger capacitor requires larger
drive strength. The capacitor driver should be sized up for different weights, and the delay
is inversely proportional to the size of the driver gate (as well as the corresponding
latch). Assuming that only one comparator is used, with different loading at different
bits, the logic delay increases linearly. As the plot shows, when the two delays are
combined there is a minimum point at which the overall loop speed is optimized. This
result can be interpreted as different strategies for architecture selection. The main
benefit of asynchronous operation is that no time is wasted waiting on bits that need less
time. However, it is also possible to carefully match the delays in each iteration so that
they take approximately the same conversion time. The advantage of doing so is a smaller
number of gates inside the loop, because in an asynchronous design the additional logic used
for asynchronous clock generation introduces tens of picoseconds of delay even in an
advanced technology.
• Power: The power consumption is a fairly fixed overhead, meaning that it is not very
sensitive to the architecture and benefits directly from technology scaling. The power
consumption of the logic gates depends linearly on the number of bits $N_{bit}$ if the
switching energy per iteration is fixed. However, the switching energy can also scale
with $N_{bit}$ because the number of logic gates scales with $N_{bit}$ as well. In this
case the power consumption of the digital logic depends quadratically on the number of bits.
The design space of the SAR logic is relatively limited and not many non-idealities are
involved. However, the delays are very sensitive to layout, and matching the delays of
each conversion step requires post-extraction simulation. Implementing these circuits with
layout generators makes it possible to verify the delay quickly and reduce design time.
The difference between the timing diagrams of asynchronous [14] and synchronous operation
is shown in Figure 2.6. Synchronous operation relies on an internal clock that divides
the conversion phase into uniform time intervals, one for each bit. In relatively low-speed
SAR ADC designs, clock generation is less critical. For high-speed applications, however,
the conversion time is composed of the maximum DAC settling time, the comparator resolve
time and a margin for worst-case clock jitter. The last part either lengthens the conversion
time or imposes stringent requirements on clock generation, so power and speed are also
limited by the internal clock circuit. In synchronous operation, the time per bit must
accommodate the worst case, which usually is the last bit, since it takes much longer for
the comparator to resolve a voltage difference of around $\frac{1}{2}$ LSB.
Figure 2.7: (a) Clock jitter (b)DNL and INL of a 3-bit ADC with unit capacitor mismatch.
In asynchronous operation, there is still a global clock that drives the ADC into its
different phases. During the conversion phase, however, the asynchronous processing is
triggered internally from MSB to LSB. After the comparator resolves the current bit, it
produces a DONE signal to trigger the next bit's conversion. This operation takes advantage
of the fact that different bits take different times to resolve, so each bit's conversion
time is no longer limited by the worst case. A numerical analysis in [14] shows that the
worst case for total conversion time happens when the input is $\frac{1}{3} V_{FS}$ or
$\frac{2}{3} V_{FS}$. If only the comparator resolve time is considered,
$T_{async}/T_{sync}$ approaches 1/2 as the number of bits increases, where $T_{async}$ and
$T_{sync}$ are the total times for asynchronous and synchronous operation respectively.
First, the impact of clock jitter is shown in Figure 2.7 (a). It is assumed that the input
voltage is sampled at fixed intervals. However, the uncertainty of the clock edges makes the
exact sampling time uncertain. The difference between the actual and the theoretical
sample time is shown as ∆t in the plot. Clock jitter is critical in high-speed ADC design,
while low-speed applications do not have stringent jitter requirements.
Thermal noise sampled on the capacitor DAC array comes from the resistance of the
sampling switch. The sampled thermal noise is inversely proportional to the DAC size
$C_{DAC}$; the total differential noise is $4kT/C_{DAC}$. Even without any other
non-idealities, the resolution of the ADC is still directly limited by thermal noise. To
gain one more effective bit, the signal-to-noise ratio must improve by 6dB, which is
equivalent to $4 \times C_{DAC}$. As discussed above, power consumption in SAR ADCs mainly
comes from charging the capacitors, so thermal noise leads to a trade-off between power and
resolution.
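The 6 dB per 4× capacitance relation can be checked numerically. The sketch below uses the $4kT/C_{DAC}$ noise expression from the text and assumes a full-scale sine input. The function names, the 1 V swing and the 100 fF capacitance are illustrative assumptions, not values from the report.

```python
import math

K_BOLTZMANN = 1.380649e-23  # J/K


def sampled_noise_power(c_dac, temp_k=300.0):
    """Differential sampled noise power in V^2, using 4kT/C_DAC as in the text."""
    return 4.0 * K_BOLTZMANN * temp_k / c_dac


def snr_db(v_fs_pp, c_dac, temp_k=300.0):
    """Thermal-noise-limited SNR for a full-scale sine with peak-to-peak swing v_fs_pp."""
    signal_power = (v_fs_pp / 2.0) ** 2 / 2.0
    return 10.0 * math.log10(signal_power / sampled_noise_power(c_dac, temp_k))


# Quadrupling the DAC capacitance (placeholder value of 100 fF) at a 1 V swing.
gain_db = snr_db(1.0, 4 * 100e-15) - snr_db(1.0, 100e-15)
```

Here `gain_db` evaluates to 10·log10(4), about 6.02 dB, which is the one-bit improvement described above.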
The distortion of sampling comes from the non-linearity of the sampling switch as well
as the resistance associated with the switch. To get enough sampling bandwidth, the
switch needs to be sized up; as the switch grows, however, it loads itself and reaches a
point beyond which further increasing the size does not help. Also, if we want to settle the
input signal within half an LSB, it needs sampling time:
to finish, where $\tau_{sample}$ is the time constant of the sampling switch, and that also
means the input signal frequency is limited to
\[
f_{in} < \frac{1}{2\pi \cdot \tau_{sample}} \tag{2.6}
\]
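The two sampling limits can be put into numbers with a short sketch. It assumes first-order RC settling, so that an error below half an LSB requires $t \ge \tau_{sample} (N+1) \ln 2$. This is the standard single-pole result and is an assumption here, as are the function names and the 5 ps time constant.

```python
import math


def settle_time_half_lsb(tau_sample, n_bits):
    """Time for exp(-t/tau) to fall to 1/2**(n_bits + 1), i.e. half an LSB."""
    return tau_sample * (n_bits + 1) * math.log(2.0)


def tracking_bandwidth(tau_sample):
    """-3 dB bandwidth of the sampling network, the bound in Eq. (2.6)."""
    return 1.0 / (2.0 * math.pi * tau_sample)


tau = 5e-12  # placeholder time constant, not a value from the report
t_settle = settle_time_half_lsb(tau, n_bits=9)
f_bw = tracking_bandwidth(tau)
```

For a 9-bit converter the settling requirement is about 6.9 time constants, so the settling limit is usually reached well before the tracking bandwidth limit.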
The distortion comes from the fact that the MOS switch doesn't maintain a fixed $V_{GS}$
during the entire sampling period. Reducing the SFDR requirement dramatically reduces the
minimum switch size needed to meet the 2nd and 3rd harmonic targets.
Capacitor DAC mismatch: Ideally the digital output of the ADC should be:
\[
D_{out} = (2^N - 1) \cdot \frac{V_{in} - V_{ref,N}}{V_{ref,P} - V_{ref,N}} \tag{2.7}
\]
That means each step should have a uniform width, like the red line shown in Figure
2.7. However, the mismatch of each capacitor unit creates a systematic error in the
$V_{in}$-$D_{out}$ transfer curve. The deviation from the ideal curve also shows up as
harmonic components in low-frequency simulation.
Assume the unit capacitor $C_L$ has a normal distribution with variance $\sigma_u^2$. The
DNL variance is set by the transition from code $2^{N-1}-1$ to $2^{N-1}$, where it is
$\sigma_{DNL}^2 = (2^N - 1)\sigma_u^2$. The variance of the INL can be approximated by
$\sigma_{INL}^2 = 2^{N-2}\sigma_u^2$. Therefore, careful layout is required during DAC
design; otherwise, the matching problem will directly limit the best achievable performance
of the ADC. Figure 2.7(b) shows the transfer curve of a 3-bit binary DAC modeled in Python,
where each unit capacitor has the same $\sigma_u$. The blue line shows how it deviates from
the ideal curve with mismatch added. Thermometer coding can also be used in capacitor DAC
design. The penalty is that the number of switches increases exponentially with the number
of bits. A combination of binary and thermometer coding can offer a good trade-off between
the number of switches and mismatch.
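A model of this kind can be sketched in a few lines. The following is not the script behind Figure 2.7(b). It is a minimal illustration that assumes independent Gaussian unit capacitors and end-point-corrected DNL/INL, with hypothetical function names.

```python
import random


def dac_levels(n_bits, sigma_u, rng):
    """DAC output for every code, in units of the nominal unit capacitor.

    Bit b is built from 2**b unit capacitors, each drawn from N(1, sigma_u).
    """
    weights = [sum(rng.gauss(1.0, sigma_u) for _ in range(1 << b))
               for b in range(n_bits)]
    return [sum(w for b, w in enumerate(weights) if (code >> b) & 1)
            for code in range(1 << n_bits)]


def dnl_inl(levels):
    """End-point-corrected DNL and INL, both in LSB."""
    n_steps = len(levels) - 1
    lsb = (levels[-1] - levels[0]) / n_steps
    dnl = [(levels[i + 1] - levels[i]) / lsb - 1.0 for i in range(n_steps)]
    inl = [(levels[i] - levels[0]) / lsb - i for i in range(len(levels))]
    return dnl, inl


rng = random.Random(0)  # fixed seed so the run is repeatable
dnl, inl = dnl_inl(dac_levels(3, sigma_u=0.05, rng=rng))
```

With `sigma_u=0.0` both lists are zero, which corresponds to the ideal staircase. Repeating the draw many times and taking the spread of `dnl` at the mid-code transition gives a Monte Carlo check of the variance expressions above.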
The capacitor DAC accounts for a major part of the SAR ADC power consumption. Some parts
of this power are technology-scaling friendly: reducing the unit capacitor size by using
metal fringe capacitors, while maintaining good matching, also reduces power. Some fixed
overhead, like thermometer-code to binary-code decoding, should also be considered.
Advantages of time-interleaving
Speed and power-speed trade-off: As a single-channel ADC's sampling frequency increases,
the acquisition and conversion times both shrink. For acquisition, there are two
aspects that set the lower bound. First, the voltage on the sampling capacitor needs enough
time to settle. Generally, for an N-bit ADC to settle to half an LSB:
If an ADC uses half of the clock cycle to sample, $T_{settle} < T_{clk}/2$. Also,
$T_{clk}/2 = 1/(2f_s) = 1/(4f_{in,max})$. Thus
\[
f_{in,max1} \le \frac{1}{4\tau_{sample} \cdot \ln(2^{N_{bits}+1})} \tag{2.9}
\]
As $f_{in}$ grows it will ultimately be limited by the above value. With N-way
time-interleaving, the acquisition time is relaxed by a factor of N. However, the sampling
switch and capacitor have an associated time constant that limits the bandwidth:
\[
f_{in,max2} \le \frac{1}{2\pi\tau_{sample}} \tag{2.10}
\]
Once N is large enough that $f_{in,max1} > f_{in,max2}$, further interleaving no longer
benefits sampling speed. Other limitations may also limit the frequency, even when this
bound becomes larger than $f_{in,max}$.
Where $T_p$ is the delay time that depends on logic gate sizing. In modern processes, power
and speed trade off in inverse proportion. The design space available for further improving
ADC speed lies in the part that can be improved. For a single ADC:
\[
P = P_{overhead} + \frac{k}{T_{conv} - T_{overhead}} = P_{overhead} + \frac{k}{T_s - T_{overhead}} \tag{2.13}
\]
For a time-interleaved ADC, the second part becomes
\[
P_{scale} = \frac{Nk}{N T_s - T_{overhead}} \tag{2.14}
\]
As N increases, the power benefits from interleaving.
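The trend in Eq. (2.14) can be tabulated directly. The sketch below evaluates the scalable part of the power for several interleaving factors. The values of k, T_s and T_overhead are arbitrary placeholders and the function name is hypothetical.

```python
def scaled_power(n_ways, k, t_s, t_overhead):
    """Scalable part of the power from Eq. (2.14): N*k / (N*T_s - T_overhead)."""
    budget = n_ways * t_s - t_overhead
    if budget <= 0:
        raise ValueError("fixed overhead exceeds the available conversion time")
    return n_ways * k / budget


# Placeholder numbers: 100 ps aggregate sample period, 60 ps fixed overhead.
powers = {n: scaled_power(n, k=1.0, t_s=100e-12, t_overhead=60e-12)
          for n in (1, 2, 4, 8, 16)}
```

Since (2.14) can be rewritten as $k/(T_s - T_{overhead}/N)$, the values fall monotonically with N and approach $k/T_s$, so most of the benefit is collected at moderate interleaving factors.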
Metastability rate: Using the same approximation as in the comparator metastability
analysis, the metastability probability is
\[
P = \alpha \cdot \exp\left(-\frac{T_{compare}}{\tau_{reg}}\right) \tag{2.15}
\]
With the longer conversion time stated above, the timing for the comparator is relaxed. For
an interleaving ratio N, the result becomes exponentially better:
\[
P = \alpha \cdot \exp\left(-\frac{N T_{compare}}{\tau_{reg}}\right) \tag{2.16}
\]
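A quick numerical comparison of Eqs. (2.15) and (2.16) shows how strongly the exponent helps. The sketch below is illustrative only; the 50 ps comparison time, the 5 ps regeneration time constant and the function name are assumptions.

```python
import math


def metastability_probability(t_compare, tau_reg, n_ways=1, alpha=1.0):
    """Eqs. (2.15) and (2.16): P = alpha * exp(-N * T_compare / tau_reg)."""
    return alpha * math.exp(-n_ways * t_compare / tau_reg)


p_single = metastability_probability(50e-12, 5e-12)         # single channel
p_ti4 = metastability_probability(50e-12, 5e-12, n_ways=4)  # 4-way interleaved
```

With α = 1, the 4-way result equals the single-channel probability raised to the fourth power, which is the exponential improvement referred to above.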
Gain mismatch: In Figure 2.8, the gain mismatch is annotated as the different gains $G_i$.
Gain mismatch shows up as a difference in slope between the transfer curves of the
sub-ADCs. It comes from different sources, such as the sampling process and reference
voltage variation. The first plot in Figure 2.9 shows the time-domain waveform illustrating
the influence of gain mismatch, and the frequency response is shown in Figure 2.10. For an
N-way time-interleaved ADC, the error repeats every $N/f_s$ and the error amplitude
is modulated by the input signal. Therefore, this effect appears as amplitude-modulated
noise with frequency peaks at
\[
f_{gain,noise} = \pm f_{in} \pm \frac{k}{N} f_s \quad (k = 1, 2, ..., N) \tag{2.17}
\]
The gain mismatch degrades the SNR, and the extent to which performance is affected
also depends on the amplitude of the input signal.
Offset mismatch: Offset sources in the signal chain, like the offset of the comparator,
can be referred to the input as an ADC input offset, like $V_{os,i}$ shown in Figure 2.8.
The time-domain waveform shows a repetitive error that occurs every $N/f_s$. The average
offset in each channel generates a DC component. The noise frequency response in Figure
2.10 peaks at
\[
f_{offset,noise} = \frac{k}{N} f_s \quad (k = 1, 2, ..., N) \tag{2.18}
\]
Timing mismatch: The mismatch caused by different sampling edges for each ADC is
composed of both clock skew (systematic error) and clock jitter (random error), as shown in
Figure 2.8. This effect causes the largest error when the slope of the input signal is
steepest. Therefore, in Figure 2.9 the phase of the error waveform is shifted by π/2. It is
essentially phase-modulated noise, and the noise frequency peaks are also located at
\[
f_{time,noise} = \pm f_{in} \pm \frac{k}{N} f_s \quad (k = 1, 2, ..., N) \tag{2.19}
\]
In the frequency domain, this error overlaps with the error caused by gain mismatch, with
the phase of the error shifted.
Bandwidth mismatch: Different sampling bandwidths in different channels will also
cause an error. The switches can be modeled with an R-C model, and the gain and phase of
its frequency response during sampling differ from channel to channel:
\[
V_{sampled} = G_i A \cos\left(2\pi f_{in} \frac{N}{f_s} + \theta_i\right) \tag{2.20}
\]
where $G_i$ and $\theta_i$ are the different gains and phase shifts caused by sampling
bandwidth mismatch. It is both amplitude- and phase-modulated noise, and its effect is
similar to that of gain and timing mismatch. The noise frequency peaks are located at
\[
f_{BW,noise} = \pm f_{in} \pm \frac{k}{N} f_s \quad (k = 1, 2, ..., N) \tag{2.21}
\]
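The spur locations in Eqs. (2.17) to (2.21) are easy to tabulate for a given interleaving factor. The sketch below lists the positive-frequency combinations $k f_s/N \pm f_{in}$ and adds a helper that aliases a frequency into the first Nyquist zone. The folding helper and all function names are additions for illustration and are not part of the report.

```python
def offset_spurs(n_ways, f_s):
    """Offset-mismatch spur frequencies from Eq. (2.18): k * f_s / N."""
    return [k * f_s / n_ways for k in range(1, n_ways + 1)]


def modulated_spurs(n_ways, f_s, f_in):
    """Gain, timing and bandwidth mismatch spurs: k * f_s / N -/+ f_in."""
    spurs = []
    for k in range(1, n_ways + 1):
        centre = k * f_s / n_ways
        spurs.extend([centre - f_in, centre + f_in])
    return spurs


def fold_to_nyquist(f, f_s):
    """Alias a frequency into the first Nyquist zone [0, f_s/2]."""
    f = abs(f) % f_s
    return f_s - f if f > f_s / 2 else f


# Example: 4-way interleaving, f_s = 1000 and f_in = 100 in arbitrary units.
raw = modulated_spurs(4, 1000, 100)
folded = sorted({fold_to_nyquist(f, 1000) for f in raw})
```

Because gain, timing and bandwidth mismatch share the same spur locations, a spectrum alone cannot separate them; the phase of the error is what distinguishes the gain and timing contributions.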
Chapter 3
Generator-Based Design
A typical generator-based design flow is shown in Figure 3.1. It starts from a design script
that takes circuit specifications and translates them into the circuit chosen by the
designer, device sizing and layout strategies. This information is organized in a
structured way so that script-based generators can use it to implement the actual
schematic, layout and testbench. A schematic generator takes a pre-defined template and
maps the input parameters to the actual sizing of each component inside. A layout generator
follows scripted layout strategies that can handle layout in different situations. The
input to a layout generator is a superset of the schematic input that includes other
parameters like wire spacing and width. Designers need to construct generator scripts so
that the output is DRC/LVS clean for reasonable combinations of device sizes in different
technologies. Each circuit has specific requirements, so designers also need to construct
testbench generators that instantiate a testbench for the generated circuit instance, run
simulation and
CHAPTER 3. GENERATOR-BASED DESIGN 30
process data. If the simulation results show that the generated instance fails to meet specs,
the design script should be able to handle the result returned from the simulator and make
iteration based on information acquired. This agile approach makes sure that each time the
circuit can be verified with post-layout effects.
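The loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not the BAG API: `design_loop`, `DesignResult` and the three callables are hypothetical names, and the toy stand-ins at the bottom replace real generation and simulation with a one-line model.

```python
from dataclasses import dataclass


@dataclass
class DesignResult:
    params: dict
    metrics: dict
    iterations: int


def design_loop(specs, params, generate, simulate, resize, max_iter=10):
    """Repeat generate -> simulate -> resize until all minimum specs are met."""
    for iteration in range(1, max_iter + 1):
        instance = generate(params)    # schematic and layout generation
        metrics = simulate(instance)   # testbench run and data processing
        failed = [name for name, minimum in specs.items() if metrics[name] < minimum]
        if not failed:
            return DesignResult(params, metrics, iteration)
        params = resize(params, failed)
    raise RuntimeError("specifications not met within max_iter iterations")


# Toy stand-ins: "gain" grows linearly with the number of fingers.
result = design_loop(
    specs={"gain": 10.0},
    params={"nf": 2},
    generate=lambda p: dict(p),
    simulate=lambda inst: {"gain": 3.0 * inst["nf"]},
    resize=lambda p, failed: {"nf": p["nf"] + 1},
)
print(result)
```

In a real flow, `generate` would also run extraction so that `simulate` sees post-layout effects, and `resize` would encode the designer's sizing strategy.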
The time-interleaved SAR ADC generator in this work uses Berkeley Analog Generator
(BAG) [17] for layout and schematic generation. The rest of this chapter introduces the
implementation of the schematic and layout generators.
To make a process-portable layout generator, LAYGO adopts an approach similar to Lego blocks.
The layout methodology of LAYGO is shown in Figure 3.2. It handles complex design rules
with hand-made primitives and pre-defined routing grids. The unit blocks, such as different
unit transistors, are equivalent to the Lego blocks, and the routing grid is equivalent to the
Lego bumps. If the Lego blocks become smaller but are still assembled on correspondingly
smaller bumps, the result is the same structure, differing only in its dimensions.
Similarly, different templates are constructed for different technologies and assembled by
the same script to generate the layout. In advanced technology nodes, DRC rules become
increasingly complex, but as long as the pre-defined unit blocks capture those rules, the
generated layout remains DRC clean. Generally speaking, the most difficult front-end design
rules are captured by different categories of hand-crafted cells. At higher levels, the metal
patterns are wired up following pre-defined routing grids with specific spacing, width and
via types. This approach is similar to digital circuit design with standard cells, but the
designer is free to choose different device types as long as the corresponding unit block
templates are implemented.
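The Lego analogy can be made concrete with a small sketch: the placement script works in abstract grid indices, and only the grid pitch changes between technologies. The `RoutingGrid` class and all pitch values below are made up for illustration and are not LAYGO classes or real process numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingGrid:
    """Hypothetical uniform grid: pitch per axis in nanometers."""
    x_pitch_nm: int
    y_pitch_nm: int

    def to_physical(self, ix, iy):
        """Map an abstract grid index (ix, iy) to physical (x, y) in nm."""
        return ix * self.x_pitch_nm, iy * self.y_pitch_nm


# The same abstract placement, evaluated on two made-up technologies.
placement = {"inst0": (0, 0), "inst1": (4, 0), "inst2": (4, 3)}
tech_a = RoutingGrid(x_pitch_nm=100, y_pitch_nm=120)
tech_b = RoutingGrid(x_pitch_nm=50, y_pitch_nm=60)

layout_a = {name: tech_a.to_physical(*xy) for name, xy in placement.items()}
layout_b = {name: tech_b.to_physical(*xy) for name, xy in placement.items()}
print(layout_a["inst2"], layout_b["inst2"])  # (400, 360) (200, 180)
```

The relative arrangement is identical in both results; only the dimensions scale with the grid.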
Several routing grid examples are shown in the top middle of Figure 3.2. Routing grid
templates are less stringent: they only need to meet the metal width/space and via
requirements specified by the design rules. Designers therefore have the flexibility to define
different routing grids for different layout purposes. The M1-M2 CMOS grid is an example of an
unevenly spaced grid. In this case, the grid is only valid when complementary transistors are
placed with their gates facing each other. In this example, two routing tracks are used for
source/drain routing, one routing track is used to connect the gates, and one more track is
left between the gates of the two transistors. Supply routing uses wider metal and double
vias. The M2-M3 basic grid and the M1-M2 basic grid in the figure are examples of evenly
spaced grids. They can be used when multiple rows of the same type of transistor are placed
together.
The right side of the figure shows an example of a NAND gate built from the templates
explained above. Two transistor templates are placed in the middle using the CMOS grid, and
boundary templates are aligned on the two sides to make sure this layout can be tiled with
other gates without violating design rules. The input pins A and B are placed on Metal 2, and
the output signal is connected to Metal 3. All these connections use the pre-defined CMOS
routing grid, meaning that the absolute coordinates are calculated from the template
primitives.
Besides transistors, passive devices such as capacitors, diodes and resistors are also
supported in LAYGO. A capacitor DAC layout example is shown in Figure 3.3. Unit capacitor
cells and dummy cells are implemented manually. The size of the unit block and the location
of its pins also need to be compatible with the transistor routing grid so that the block can
be integrated with transistors in higher-level layouts. The right side of Figure 3.3 shows a
simple 5-bit capacitor DAC.
The example above shows that building device primitives in LAYGO relies on correct
size and pin location. Specifically, a device primitive needs to have quantized dimensions.
In our implementation, a PlacementGrid is used as the minimum grid for primitive blocks,
and all the pins in a primitive block should be compatible with at least one routing grid
defined in LAYGO. LAYGO can also take a black-box block and integrate it into the layout in a
similar way. As long as the boundary of the block handles the design rules properly and the
pins of the block are on-grid, its internal implementation can be very flexible.
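The two requirements on a primitive, quantized dimensions and on-grid pins, amount to simple divisibility checks. The helpers and the numbers below are hypothetical and only sketch the idea; they are not LAYGO functions.

```python
def is_quantized(size, placement_grid):
    """True if both block dimensions are integer multiples of the grid."""
    return all(dim % step == 0 for dim, step in zip(size, placement_grid))


def pins_on_grid(pins, routing_grid, origin=(0, 0)):
    """True if every pin center sits on a routing-grid intersection."""
    (gx, gy), (ox, oy) = routing_grid, origin
    return all((x - ox) % gx == 0 and (y - oy) % gy == 0 for x, y in pins.values())


# Made-up primitive; all lengths are integers in nanometers.
placement_grid = (90, 180)
routing_grid = (90, 60)
unit_cap = {"size": (540, 720), "pins": {"TOP": (270, 660), "BOT": (270, 60)}}

ok = (is_quantized(unit_cap["size"], placement_grid)
      and pins_on_grid(unit_cap["pins"], routing_grid))
print(ok)  # True
```

Integer units are used on purpose so that the modulo checks are exact.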
Example code:
After the template library is complete, a Python script is used to place templates and
generate the layout. All aspects of the layout design, such as floorplanning, sizing
adjustment and routing, need to be coded. Some example commands are listed below to
show how generators are implemented.
# Template placement:
# (x0, y0), (x1, y1) are the origin points where the instances are placed
inst0 = laygen.relplace(cellname='cellName0',
                        gridname='gridName',
                        xy=['x0', 'y0'])
inst1 = laygen.relplace(cellname='cellName1',
                        gridname='gridName',
                        xy=['x1', 'y1'])
# Signal routing:
# connect one of inst0's pins to one of inst1's pins
laygen.route(gridname0='gridName0',
             refobj0=inst0.pins['pinName0'],
             gridname1='gridName1',
             refobj1=inst1.pins['pinName1'])
# Supply routing:
# connect the sources of each instance to vdd/vss
for dev in [inst0, inst1]:
    for pinName in ['S0', 'S1']:
        laygen.route(gridname0='gridName',
                     refobj0=dev.pins[pinName],
                     refobj1=dev.bottom,
                     direction='y', via1=[0, 0])
# power and ground rails
rvdd = laygen.route(gridname0='gridName',
                    refobj0=inst0.bottom_left,
# Export pin
laygen.pin(name='pinName', gridname='gridName', refobj=inst0)
The typical layout design flow is very similar to the manual layout flow, which requires
a floorplan first, and device arrangement and routing need to be considered carefully by
designers. The difference is that a good layout generator must produce DRC- and LVS-clean
layout for many different input parameters. The example in Figure 3.4 shows some
considerations when making a strong-arm comparator layout generator. The arrows show
the direction in which transistors expand. Each row is assigned to transistors with a
specific function, so the size can be easily adjusted. For example, when the size of the
input pair changes, the layout can be extended toward both sides. The schematic also
shows an offset cancellation pair that adjusts the input-referred offset of the comparator.
It can be implemented as an option in the layout generator: because those transistors are
placed in a separate row, the option can be turned off when it is not needed.
This example shows a very simple case but highlights some points to consider when
implementing generators. First, the generator itself doesn’t have any built-in algorithm for
auto-routing and placement, so more complex layout strategies always need more algorithms
implemented in the script. In this case, the offset cancellation transistors should be very
small for resolution reasons, but they are placed in a separate row, which is an
area-inefficient strategy. Those transistors could be inserted into other rows, but that
might make sizing adjustment harder. Second, this floorplan assumes that the transistors can
be expanded indefinitely toward both sides, but the layout often has a width limitation or an
aspect-ratio requirement. If so, the generator should be coded so that it folds transistors
into multiple rows when their size exceeds a certain value. Third, manual layout often uses
special wiring and placement, whereas generator-based layout favors repetitive and regular
structures, so it usually leads to different layout strategies than manual layout.
Therefore, it is important to design the circuit in a way that makes efficient use of a
reliable generator so that iterations are faster.
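The folding behavior mentioned in the second point could look like the following sketch. `fold_fingers` is a hypothetical helper, and balancing the rows is just one possible policy; a real generator would also have to respect matching and routing constraints.

```python
import math


def fold_fingers(total_fingers, max_fingers_per_row):
    """Split a transistor's fingers across as few rows as possible.

    Returns the finger count of each row; the counts are balanced so
    that rows differ by at most one finger.
    """
    if total_fingers <= 0 or max_fingers_per_row <= 0:
        raise ValueError("finger counts must be positive")
    rows = math.ceil(total_fingers / max_fingers_per_row)
    base, extra = divmod(total_fingers, rows)
    return [base + 1 if r < extra else base for r in range(rows)]


print(fold_fingers(6, 8))   # [6]        fits in one row
print(fold_fingers(20, 8))  # [7, 7, 6]  folded into three rows
```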
Some example code for implementing schematic generators is listed below. The instances
in the templates can be accessed using self.instances[instance_name], and the
generator configures instances using the design method. Schematic generators also offer the
flexibility to make changes to the schematic templates: instances can be
re-connected, arrayed or deleted. Dummy transistors can be generated automatically from
the information returned by the layout generator. Since different numbers of dummies are
added in the layout to ensure matching and alignment, the connections and properties of the
dummies are calculated by the layout generator when the layout is actually generated.
, simulation parameters can be set by BAG. Simulation results are returned after the
simulation finishes, so all the data can be processed using BAG.
Example code:
# set transistor parameters
self.instances['XP'].design(w=wp, l=lch, intent=intentp, nf=fg_load)
self.instances['XN'].design(w=wn, l=lch, intent=intentn, nf=fg_amp)
# handle dummy transistors
self.design_dummy_transistors(dum_info, 'XDUM', 'VDD', 'VSS')
# some other functions
# Change pin name:              rename_pin()
# Remove instance:              delete_instance()
# Replace instance template:    replace_instance_master()
# Reconnect instance terminal:  reconnect_instance_terminal()
# Create an array of instances: array_instance()
Chapter 4
ADC Generator Implementations
4.1 Overview
This chapter describes the implementation of the time-interleaved SAR ADC generator based
on the design methodology introduced above, using the LAYGO layout generation engine
for layout generation. The implementation of this generator in Intel 22nm FFL is also
presented.
Figure 4.1 shows an overview of the generator architecture. The generator provides 4-10
bits of resolution and uses a time-interleaved architecture to achieve a higher sampling
rate. It is composed of four main blocks: multi-phase clock generation, the main SAR slice
array, the retimer at the output, and the bias circuit (not shown here), which provides VREF
for the capacitor DAC array. The blocks shown here are explained as follows:
• Clock Generation: It takes a half-rate differential clock input and uses a chain of delay
cells to generate N different clock phases in parallel, which trigger each SAR slice to
sample the input signal.
• ADC Core: The main part of the SAR ADC, which takes the input signal, the reference voltage
and the clock and outputs the digital result. It consists of a capacitor DAC, a comparator,
SAR logic and an asynchronous clock generator. A fixed amount of time is allocated for
sampling, and the core then converts the sampled signal in a self-timed, asynchronous way at
frequency fs,slice = fs/N.
• Retimer: Because each ADC slice is timed to a different clock phase, a retimer block that
takes the digital output from each slice is necessary. It aligns the outputs of all slices
to the same clock phase so that the ADC can transfer data to the digital interface.
The rest of this chapter introduces several key generators in this ADC generator and
explains some design details, floorplans and generator options.
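For a sense of scale, the relation fs,slice = fs/N can be evaluated for the 16-way, 10GS/s configuration reported for the test chip. The function below is a small sketch with hypothetical names; it assumes ideal, uniformly spaced clock phases.

```python
def slice_timing(f_s, n_slices):
    """Per-slice rate and ideal sampling instants within one interleaving period."""
    f_slice = f_s / n_slices          # f_s,slice = f_s / N
    t_step = 1.0 / f_s                # spacing between adjacent clock phases
    instants = [k * t_step for k in range(n_slices)]
    return f_slice, instants


f_slice, instants = slice_timing(f_s=10e9, n_slices=16)
print(f"slice rate: {f_slice / 1e6:.0f} MS/s")
print(f"phase spacing: {instants[1] * 1e12:.0f} ps")
```

Each slice therefore has 1.6 ns for sampling and conversion, while adjacent slices sample 100 ps apart.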
$$ C_0 = C_u, \qquad C_i = \alpha C_{i-1} \tag{4.1} $$
The transfer function of the ADC has $2^N$ steps from $V_{REFN}$ to $V_{REFP}$; for an N-bit
radix-2 capacitor DAC, all the step sizes are uniform. In reality, the capacitors do not match
perfectly, and mismatch makes the transfer curve deviate from the ideal one. The standard
deviation of unit capacitor mismatch, $\sigma_C$, is inversely proportional to the square root
of the unit capacitor area, $\sqrt{A_{unit}}$, so the mismatch is reduced as the area
increases. Also, from a thermal noise perspective, a larger sampling capacitor makes the total
thermal noise $k_B T / C$ smaller, where $k_B$ is the Boltzmann constant and T is the absolute
temperature. However, this approach dramatically increases the power consumed in charging the
capacitors and increases the area as well.
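The two scaling trends above can be checked numerically. The sketch below uses the standard sampled-noise expression sqrt(k_B T / C) and the inverse-square-root area dependence stated in the text; the function names, the 100 fF capacitor and the 1% reference sigma are assumptions chosen only for illustration.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K


def ktc_noise_vrms(c_farad, temp_kelvin=300.0):
    """RMS sampled thermal noise voltage, sqrt(k_B * T / C)."""
    return math.sqrt(K_B * temp_kelvin / c_farad)


def scaled_mismatch_sigma(area, sigma_ref, area_ref):
    """Scale a reference sigma_C with area, using sigma_C ~ 1/sqrt(area)."""
    return sigma_ref * math.sqrt(area_ref / area)


v_100f = ktc_noise_vrms(100e-15)
v_400f = ktc_noise_vrms(400e-15)
print(f"kT/C noise: {v_100f * 1e6:.1f} uV at 100 fF, {v_400f * 1e6:.1f} uV at 400 fF")
print(f"sigma_C at 4x area: {scaled_mismatch_sigma(4.0, 0.01, 1.0):.4f}")
```

Quadrupling the capacitance (or the unit area) only halves the noise voltage (or the mismatch), which is why this approach becomes expensive in power and area.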
When the capacitor DAC uses a radix not equal to two, the effects on the transfer functions
of the ADC and the DAC differ. Figure 4.3 shows the transfer functions of the DAC and the ADC
for radix = 2.2 and radix = 1.8, respectively. The left plot shows an extended horizontal
level, which is interpreted as a missing decision level, and the right plot shows missing
codes. The former effect can only be adjusted by reducing the analog input, while the latter
can be adjusted digitally by assigning different weights to each capacitor. From [19], it is sufficient
CHAPTER 4. ADC GENERATOR IMPLEMENTATIONS 41
that the transfer function can be adjusted to be free of missing decision levels when
$$ C_i < C_0 + \sum_{k=0}^{i-1} C_k = C_0 + \sum_{k=0}^{i-1} \alpha^k C_0 \tag{4.2} $$
Figure 4.3: Effect of different radix on ADC and DAC transfer curve.
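Condition 4.2 is easy to evaluate for a given radix. The following sketch builds the capacitor values from Equation 4.1 and checks the inequality for every bit; the function names are hypothetical, and `decode` only illustrates the idea of digitally assigning a weight to each capacitor.

```python
def cap_weights(radix, n_bits, c_unit=1.0):
    """Capacitor values C_0..C_{n_bits-1} with C_0 = C_u and C_i = radix * C_{i-1}."""
    return [c_unit * radix ** i for i in range(n_bits)]


def free_of_missing_decision_levels(caps):
    """Check C_i < C_0 + sum_{k<i} C_k for every i >= 1 (condition 4.2)."""
    return all(caps[i] < caps[0] + sum(caps[:i]) for i in range(1, len(caps)))


def decode(bits, weights):
    """Digital output as the weighted sum of the bit decisions (LSB first)."""
    return sum(b * w for b, w in zip(bits, weights))


print(free_of_missing_decision_levels(cap_weights(1.8, 9)))  # True
print(free_of_missing_decision_levels(cap_weights(2.2, 9)))  # False
```

A radix of exactly two sits on the boundary of the strict inequality, so only a radix below two leaves margin for capacitor mismatch.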
The capacitor DAC generator is implemented so that both the number of bits and the
radix can be adjusted. The conceptual diagram of the capacitor DAC generator is
shown in Figure 4.4. Unit capacitor and capacitor dummy primitives are hand-drawn. The
schematic template defines the connections and ports. With the different input parameters
listed in Figure 4.4, both the schematic and the layout of the capacitor DAC are generated.
4.3 Comparator
The comparator generator implements both the traditional strong-arm latch comparator and
the dual strong-arm latch comparator [20]. The additional second regenerative latch in the
dual strong-arm latch helps reduce both the offset and the offset sensitivity while
maintaining performance comparable to the conventional one. The diagram of the comparator
generator is shown in Figure 4.5. Two different schematic templates are necessary because the
two circuits differ substantially.
The generator selects the topology through the option doubleSA. All the other
options set the sizing of each transistor. From the input parameters, the layout is
generated using the selected topology, and the dummy transistor information is calculated
from the generated layout. The schematic generator maps the device sizing from the input
parameters to the schematic template, and all the dummies are added from the layout
information as explained in the previous chapter.
generation to shift register, the shift register sets the current bit that needs to be
decided and changes the connection of the capacitor DAC. The comparator should only be
triggered after the voltage on the capacitor DAC has settled. Therefore, it is important to
adjust the delays of the two loops to match. The purpose of the different delays in the clock
generation block is shown in Figure 4.7. The comparator output at the 3rd iteration
corresponds to the blue line on the right side; it takes T1 to resolve the result. The last
iteration, shown as the pink line, takes T2 for the comparator to resolve. For a small
difference at the input, the comparator takes a longer time to make a decision. Although the
capacitor DAC takes a shorter time to settle in later iterations, the later iterations tend
to take longer than the first several bits because the comparator needs much more time to
generate an output signal that triggers the following gates. The delay in the clock
generation block can therefore be reduced for the first several bits, which helps maximize
the speed of asynchronous operation.
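The dependence of the decision time on the input difference can be illustrated with a common first-order model of a regenerative latch, in which the input difference grows exponentially with time constant tau until it reaches a logic level. This model, the function names and all numeric values below are assumptions for illustration and are not taken from the generated design.

```python
import math


def regeneration_time(v_in, tau, v_logic):
    """First-order latch model: time for |v_in| to regenerate up to v_logic."""
    return tau * math.log(v_logic / abs(v_in))


def per_bit_decision_times(n_bits, v_fs, tau, v_logic):
    """Decision time per SAR iteration (MSB first), assuming the comparator
    input magnitude at iteration i is on the order of v_fs / 2**(i + 1)."""
    return [regeneration_time(v_fs / 2 ** (i + 1), tau, v_logic)
            for i in range(n_bits)]


# Made-up values: 0.8 V full scale and logic level, 5 ps regeneration constant.
times = per_bit_decision_times(n_bits=9, v_fs=0.8, tau=5e-12, v_logic=0.8)
for i, t in enumerate(times):
    print(f"iteration {i + 1}: {t * 1e12:.1f} ps")
```

Under these assumptions the decision time grows by about tau * ln(2) per iteration, which matches the observation that the early bits can use a shorter delay.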
As for the layout generation, various types of logic gates are implemented first. As in
digital design, the logic layout generators use these gates as standard cells to implement
different types of logic circuits.
Example code:
# Get the size of block0 and place block0 at the origin
block_size0 = laygen.get_template(cellname0, libname0).size
block0 = laygen.relplace(instance_name0, template_name0, gridname,
                         xy=(0, 0), template_libname=template_lib0)
# Put block1 next to block0 using block_size0
block1 = laygen.relplace(instance_name1, template_name0, gridname,
                         xy=(block_size0[0], 0), template_libname=template_lib0)
# Get pin coordinates from the instances
pina = laygen.get_inst_pin_xy(block0, pin0, gridname1)
pinb = laygen.get_inst_pin_xy(block1, pin1, gridname1)
# Get the middle y location of the two pins
mid_y = (pina[1] + pinb[1]) // 2
# Connect the two pins
laygen.route_vhv(layerv0=layer1, layerh=layer2, layerv1=layer1,
                 xy0=pina, xy1=pinb, track_y=mid_y,
                 gridname0=gridname1, gridname1=gridname2)
The top-level generator of the ADC assembles the ADC slices, the sampling circuit, the clock
generation block and the retimer. One feature of this top-level generator is that the order
of the ADC slices can be changed for different applications. The timing diagram of the clock
phases is shown in Figure 4.8. It is determined by the clock generation block, which consists
of two chains of delay elements driven by differential clock signals. The correct clock phase
is guaranteed by the uniform delay of the delay cells, and the delay of each cell is adjusted
by the capacitive loading at its output. Therefore, as long as the even and odd slices are
placed on the two sides and connected to the two different delay chains, the slice order can
be easily changed. Two examples are shown in the bottom diagram of Figure 4.8.
This generator has been implemented in Intel 22nm FFL with updated options that
improve the sampling rate by decoupling the crosstalk between channels. The steps of
implementing this chip are shown in Figure 4.9. First, a 9-bit 16-way time-interleaved SAR
ADC core, including decoupling capacitors and voltage references, is fully generated by the
generator. It works at a sampling rate of 10GS/s, and the simulated SNDR is 37.6 dB at the
Nyquist frequency. To set the configuration bits and read out the quantized results, a
digital block that includes memory and a scan chain is manually integrated at the top level
with the generated ADC core. Because the generator makes it convenient to quickly generate
different designs, six sub-chips with different configurations are generated and integrated
in the same way. They have different sampler sizes, different sampling strategies and
different radixes for the capacitor DAC.
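To relate the simulated SNDR to an effective resolution, the standard conversion ENOB = (SNDR - 1.76) / 6.02 can be applied. The helper name is hypothetical, and the resulting ENOB is a derived figure rather than a number reported in this work.

```python
def enob_from_sndr(sndr_db):
    """Effective number of bits from SNDR in dB: (SNDR - 1.76) / 6.02."""
    return (sndr_db - 1.76) / 6.02


enob = enob_from_sndr(37.6)
print(f"ENOB at Nyquist: {enob:.2f} bits")
```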
Chapter 5
Conclusion and Future Work
The advantages of the SAR ADC make it a popular ADC architecture in scaled CMOS processes.
Generator-based design methodology improves the efficiency of circuit design in advanced
technology nodes. The complex design rules are captured by the generator, and designers can
quickly get feedback on post-layout effects. Circuit design using BAG makes it possible to
reuse designs in different process technologies and to generate designs with different
specifications.
In this report, the working principle and design methodologies of the high-speed
time-interleaved SAR ADC are presented. The main goal is to investigate the design
methodology and combine it with analog generators to enable design space exploration under
different specifications. The LAYGO layout generation engine is also introduced. This report
demonstrates the detailed implementation of hand-made LAYGO primitives and the layout
generation based on them. Example layouts and code are used to illustrate the usage of BAG.
The prototype of the time-interleaved SAR ADC generator is implemented in Intel 22nm FFL with
six different configurations. The testing boards have been fabricated and assembled. Figure
5.1 shows the testing board design for the chip. Future work will finish the measurement
of the prototype chip. There are also several improvements worth considering in the
future:
1. Complete design script for SAR ADC. While the layout and schematic generators
have been implemented, the main design procedure is still similar to the traditional design
methodology, which relies on the designer’s interpretation of the simulation results. A design
script that captures the key performance metrics of the circuits can be implemented to close the loop of
CHAPTER 5. CONCLUSION AND FUTURE WORK 49
design automation.
2. Improve the speed of SAR ADC generator. Currently, the single slice of the ADC
is optimized for a higher sampling rate, and multiple slices are necessary when the
sampling rate goes beyond the capability of the single-channel design. However, the
number of slices cannot grow indefinitely because of non-idealities caused by channel
mismatches. Increasing the time-interleaving ratio also brings challenges to the sampling
network. Therefore, techniques that can provide a higher sampling rate for this ADC generator
are worth investigating.
3. Improve the resolution of ADC generator. The resolution of this SAR ADC generator
is limited. It can be improved by a sampling circuit with higher accuracy, better element
matching and new calibration methods. Adding a second stage after the SAR ADC is also an
effective way to improve the resolution. Among different ADC architectures,
ring-oscillator-based voltage-controlled oscillator ADCs and ring amplifiers have
demonstrated highly competitive design metrics in scaled technologies. By combining and
selectively enabling these two techniques, an adaptive-resolution architecture can be
developed and made available for broader use.
References
[8] C. Liu. “27.4 A 0.35mW 12b 100MS/s SAR-assisted digital slope ADC in 28nm
CMOS”. In: 2016 IEEE International Solid-State Circuits Conference (ISSCC). 2016,
pp. 462–463. doi: 10.1109/ISSCC.2016.7418107.
[9] L. Kull et al. “22.1 A 90GS/s 8b 667mW 64 interleaved SAR ADC in 32nm digital
SOI CMOS”. In: 2014 IEEE International Solid-State Circuits Conference Digest of
Technical Papers (ISSCC). 2014, pp. 378–379. doi: 10.1109/ISSCC.2014.6757477.
[10] B. Razavi. “Design Considerations for Interleaved ADCs”. In: IEEE Journal of Solid-
State Circuits 48.8 (2013), pp. 1806–1817. doi: 10.1109/JSSC.2013.2258814.
[11] B. Razavi. “The StrongARM Latch [A Circuit for All Seasons]”. In: IEEE Solid-State
Circuits Magazine 7.2 (2015), pp. 12–17. issn: 1943-0590. doi: 10.1109/MSSC.2015.2418155.
[12] A. Yu et al. “Understanding Metastability in SAR ADCs: Part II: Asynchronous”.
In: IEEE Solid-State Circuits Magazine 11.3 (2019), pp. 16–32. issn: 1943-0590. doi:
10.1109/MSSC.2019.2922890.
[13] J. P. Keane et al. “16.5 An 8GS/s time-interleaved SAR ADC with unresolved decision
detection achieving 58dBFS noise and 4GHz bandwidth in 28nm CMOS”. In: 2017
IEEE International Solid-State Circuits Conference (ISSCC). 2017, pp. 284–285. doi:
10.1109/ISSCC.2017.7870372.
[14] S. M. Chen and R. W. Brodersen. “A 6-bit 600-MS/s 5.3-mW Asynchronous ADC
in 0.13-µm CMOS”. In: IEEE Journal of Solid-State Circuits 41.12 (2006), pp. 2669–
2680. doi: 10.1109/JSSC.2006.884231.
[15] Wei Yu, Subhajit Sen, and B. H. Leung. “Distortion analysis of MOS track-and-hold
sampling mixers using time-varying Volterra series”. In: IEEE Transactions on Circuits
and Systems II: Analog and Digital Signal Processing 46.2 (1999), pp. 101–113. doi:
10.1109/82.752910.
[16] N. Kurosawa et al. “Explicit formula for channel mismatch effects in time-interleaved
ADC systems”. In: Proceedings of the 17th IEEE Instrumentation and Measurement
Technology Conference [Cat. No. 00CH37066]. Vol. 2. 2000, pp. 763–768.
doi: 10.1109/IMTC.2000.848838.
[17] E. Chang et al. “BAG2: A process-portable framework for generator-based AMS circuit
design”. In: 2018 IEEE Custom Integrated Circuits Conference (CICC). 2018, pp. 1–8.
doi: 10.1109/CICC.2018.8357061.
[18] J. Han et al. “A Generated 7GS/s 8b Time-Interleaved SAR ADC with 38.2dB SNDR
at Nyquist in 16nm CMOS FinFET”. In: 2019 IEEE Custom Integrated Circuits
Conference (CICC). 2019, pp. 1–4. doi: 10.1109/CICC.2019.8780169.
[19] Wenbo Liu and Yun Chiu. “An equalization-based adaptive digital background
calibration technique for successive approximation analog-to-digital converters”. In: 2007
7th International Conference on ASIC. 2007, pp. 289–292. doi: 10.1109/ICASIC.2007.4415624.
[20] A. Papadopoulou, V. Milovanovic, and B. Nikolic. “A low-voltage low-offset dual
strong-arm latch comparator”. In: 2017 IEEE Asian Solid-State Circuits Conference
(A-SSCC). 2017, pp. 281–284. doi: 10.1109/ASSCC.2017.8240271.