Analog IC Design Using Precomputed Lookup Tables Challenges and Solutions


Received June 7, 2020, accepted July 14, 2020, date of publication July 21, 2020, date of current version July 31, 2020.


Digital Object Identifier 10.1109/ACCESS.2020.3010875

Analog IC Design Using Precomputed Lookup Tables: Challenges and Solutions
ABDELRAHMAN A. YOUSSEF 1, BORIS MURMANN 2, (Fellow, IEEE),
AND HESHAM OMRAN 1
1 Integrated Circuits Laboratory (ICL), Faculty of Engineering, Ain Shams University, Cairo 11517, Egypt
2 Department of Electrical Engineering, Stanford University, Stanford, CA 94305, USA
Corresponding author: Hesham Omran
This work was supported in part by Egypt’s Information Technology Industry Development Agency (ITIDA) under Grant PRP2018.R24.7.

ABSTRACT Design productivity remains an important aspect in the analog integrated circuit design industry,
as growing competition and shorter design cycles pressure the traditional flow that involves time-consuming
manual iterations in a circuit simulator. This paper describes innovations within an alternative framework
that uses precomputed look-up tables (LUTs) to enable fast and accurate evaluation of circuit sizing scenarios
without a simulator in the loop. It lets the designer explore and understand the design space boundaries in a
systematic setting, thus supporting informed decision making and architectural innovation that is difficult to
attain with fully automated, black-box sizing tools. Our discussion begins with an overview of the LUT-based
design paradigm and its two primary variants: inverse design (finding design parameters that meet the
specifications) and forward evaluation (sweeping design parameters to search the design space). In support
of the latter, the core of our work focuses on improving the accuracy and speed of LUT access, enabling
millions of queries within seconds on a standard computer. Large improvements over prior art are enabled
using enhanced interpolation methods, which allow for a relatively large LUT grid spacing (hence small
memory footprint) and yet accurate parameter lookup. We evaluate the efficacy of the proposed methods
using two classical analog circuits, a bandgap reference and a folded cascode amplifier. In the bandgap
example, we observe less than 1 ppm error between the LUT-predicted temperature coefficient and circuit
simulation. In the folded-cascode example, one million design points are generated in only 4 seconds,
providing the designer with useful maps that delineate the reachability of certain target specifications.

INDEX TERMS Systematic analog design, precomputed lookup tables, gm/ID methodology, analog design
automation, interpolation, bandgap voltage reference, folded cascode OTA.

The associate editor coordinating the review of this manuscript and approving it for publication was Omid Kavehei.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
134640 VOLUME 8, 2020
A. A. Youssef et al.: Analog IC Design Using Precomputed LUTs

I. INTRODUCTION
The integrated circuits industry has developed dramatically over the past several decades, in fabrication processes, design methodologies, and computer-aided design (CAD) tools. Digital IC design has seen substantial improvements in both the design flow and the associated CAD tools. The analog IC design flow, by contrast, has not changed fundamentally. Analog design is a complicated problem that involves numerous trade-offs and a large number of design variables, so a design point that meets all the required specifications is hard to find. Analog designers usually rely on experience to tweak the design variables in a circuit simulator. They start from an initial design point obtained by rough hand analysis and iterate until the design requirements are met. This tedious ad-hoc process usually involves uninformed design decisions and leads to sub-optimal designs. It is also monotonous and time-consuming, and it must be repeated whenever the design specifications or the process technology change. Consequently, the analog part of a complex chip is often the bottleneck in design cost and time. The problem grows with the increasing complexity and stringent time-to-market requirements of state-of-the-art chips.
Several approaches have been proposed in the literature to close the productivity gap of the analog IC design flow [1]–[3]. One of the early approaches was the knowledge-based approach. In it, expert designers turn their heuristic design procedure (plan), which is based on knowledge and experience, into a computer program. A main drawback of this approach is the large discrepancy between the results of the authored design plan and the simulation results, because the designer uses simplified models in the plan. The simulation-based optimization approach alleviates this drawback by relying on a circuit simulator, which solves arbitrary circuits using accurate and sophisticated device models. This approach survived in the market and was implemented in the commercial design tools offered by major electronic design automation (EDA) companies, e.g., [4]. However, it suffers from several limitations that have hindered its widespread acceptance in the design community. First, it invokes the simulator at every iteration of the optimization procedure (i.e., SPICE-in-the-loop). As a result, execution times are long, especially for circuits with a large number of variables (degrees of freedom, DOFs). Second, besides the license of the optimization tool itself, the optimizer takes several seats from the pool of expensive simulator licenses shared by the designers. Third, the designer is completely detached from the circuit. The optimizer offers no insight into the circuit behavior, the achievable design metrics, or the trade-offs between specifications.
A promising approach that reconnects the designer to the design problem while boosting productivity is the use of precomputed lookup tables (LUTs) [5], [6]. The key idea in this design paradigm is to capture the complex models of modern devices in the form of LUTs. The simulator generates these LUTs for a set of reference devices once per technology. The designer can then use them to write systematic design plans for a circuit without invoking the simulator, while still obtaining simulator-accurate results [5], [7], [8]. This design scenario (Fig. 1a) resembles the knowledge-based approach, but it replaces the simplified and inaccurate large-signal models with the simulator-accurate LUTs. Thus, there is no gap between the systematic design procedure and the simulation results. However, this scenario does not remove another important drawback of the knowledge-based approach: the need to solve an inverse problem. Solving the inverse problem means finding the sizing and bias conditions of the devices for given design specifications. This requires a high level of expertise and considerable time and effort, and it is often impossible. Moreover, for circuits with many DOFs, the expert designer must make assumptions about several DOFs to break deadlocks and simplify the design procedure [7], [8]. Consequently, creating a design plan for a new design problem is not straightforward, and the results will not be optimal.

FIGURE 1. Design scenarios using precomputed LUTs: (a) Using a knowledge-based design plan to solve the inverse problem. (b) Solving the direct (forward) problem for an array of design points to search the design space.

Another LUT-based design scenario, depicted in Fig. 1b, addresses this problem. Here the inverse problem is replaced by the direct (forward) problem, i.e., finding the specifications for a given design point (sizing and biasing of the devices). The forward problem is significantly easier to solve than the inverse problem and requires much less time, effort, and expertise. In addition, the process can be automated with a symbolic circuit solver (e.g., [9]). This scenario also lends itself to vectorization, i.e., an array of design points can be processed simultaneously [10]. Two usage models are possible. First, the designer can generate design charts that show specs across the design space or illustrate the trade-offs between specs. Second, an optimizer can generate new arrays of design points in a loop to search for the optimal design point that satisfies the required constraints. The latter usage model resembles the simulation-based optimization approach, but it uses LUT-in-the-loop instead of SPICE-in-the-loop.
This LUT-based design scenario can address the shortcomings of the knowledge-based and optimization-based approaches and set a new paradigm for analog IC design. The results are simulator-accurate and the execution time can be very short. No simulator license is required, design trade-offs are explored, and the optimal design point is targeted. Adding new design problems or circuit topologies also does not require excessive effort or expertise. However, recent implementations of the LUT-based design flow suffer from several limitations that must be addressed first [5], [6]. This paper aims at discussing the challenges
of the LUT-based design paradigm and proposing practical solutions.
The rest of the paper is organized as follows. Sec. II discusses the LUT-based analog IC design paradigm and its key challenges. Secs. III and IV propose solutions to these challenges. Sec. V presents two design examples that illustrate the merits of the proposed solutions. Finally, Sec. VI concludes the paper.

II. THE LUT-BASED ANALOG IC DESIGN PARADIGM


A. LUTs STRUCTURE
Lookup tables (LUTs) store the device characteristics across the device's DOFs. LUTs can be created for any type of device, but this work focuses on the MOSFET, the ubiquitous device in analog IC design. The MOSFET is a four-terminal device, so its characteristics are controlled by three independent voltage differences: VGS, VDS, and VSB. They also depend on the device sizing, i.e., the channel width (W) and length (L). For example, the drain current (ID), the primary MOSFET dependent variable, is a function of five independent variables (five DOFs):
$$I_D = f(W, L, V_{GS}, V_{DS}, V_{SB}). \tag{1}$$
The MOSFET IV and CV characteristics are proportional to W regardless of the operating region (i.e., bias conditions). We can therefore rewrite (1) as
$$I_D = W \cdot f(L, V_{GS}, V_{DS}, V_{SB}). \tag{2}$$
Consequently, MOSFET parameters can be stored normalized to W, which reduces the DOFs to four. The absolute quantities at any W then follow by simple scaling (cross multiplication). The proportionality with W does not hold accurately for devices with small W because of narrow-width effects [5], [11]. However, narrow-width devices are rarely used in analog IC design, so this small deviation can be ignored in practice [5].
A circuit simulator performs a nested sweep across the four DOFs for a reference device (a device with a reference W). Every parameter the designer needs (e.g., ID, gm) can be saved as a 4D LUT, as illustrated in Fig. 2. Each LUT is a 4D grid that stores the data of one parameter along the grid vectors {L, VGS, VDS, VSB}. This procedure is repeated for every device, and the LUTs of each device can be grouped in a single structure for convenient access [12]. These LUTs are generated only once per technology, so the circuit simulator does not need to be invoked again after LUT generation. For a detailed description of how to generate the LUTs for a given technology, see [5], [12].

FIGURE 2. A simplified illustration showing a subset of MOSFET parameters (ID, gm, gds, gmb) stored in 4D LUTs. All device parameters needed by the designer can be similarly stored in the LUTs.

B. DOFs IN LUT-BASED DESIGN
The analog designer's role is to select a circuit topology expected to meet the design requirements, and then to find the biasing and sizing of every device that achieves the required specifications. Strictly speaking, the designer must therefore specify five DOFs (W, L, VGS, VDS, VSB) for every device. In analog IC design, however, the MOSFET is usually biased in saturation, where it basically operates as a voltage-controlled current source (VCCS). VDS is thus of secondary importance and is usually set above the drain-saturation voltage (VDSAT) by some margin to keep the device in saturation. Moreover, VSB is usually imposed by the circuit topology. The designer's task therefore reduces to specifying only three DOFs (W, L, VGS) for every device.
Analog circuits are usually biased by current mirrors, i.e., we set the device current (ID) rather than the device voltage (VGS). In practice, the DOFs therefore become (W, L, ID). In the conventional design flow, the designer sweeps these DOFs (W, L, ID) in a circuit simulator, guided by knowledge and experience, to explore the design space. Biasing a device with these DOFs is not straightforward, however, because sweeping any one of them changes the MOSFET inversion level (bias point), which complicates the search. Moreover, the search range of W is quite large and depends on both L and ID. Replacing W with W/L does not help much. Sweeping ID still changes the inversion level, and sweeping L changes the device physics (i.e., the IV characteristics) and hence the inversion level. The search range of W/L also still depends on ID.
The LUT-based design flow addresses these shortcomings by replacing W with gm/ID, commonly known as the gm/ID design methodology [5], [7], [13]–[18]. The new set of DOFs (gm/ID, L, ID) enables an “orthogonal” search for the device design point. The MOSFET inversion level is determined solely by the gm/ID ratio, independent of L and ID. When the designer sweeps L or ID, the corresponding new W is retrieved from the LUTs while keeping gm/ID
unchanged (i.e., bias point unchanged). Another benefit is that the search range of gm/ID is limited (typically 3 to 30 S/A) and independent of L and ID. The gm/ID methodology can be applied with design charts, but LUTs make it possible to automate the process [5], [7]. In deep subthreshold design the gm/ID ratio saturates, so the current density (JD = ID/W) can replace it as the single orthogonal DOF that controls the inversion level [5]. Although VDS and VSB are not primary designer DOFs, their effects are still accounted for because they are included in the LUTs.

C. CHALLENGES OF LUT-BASED DESIGN
The fundamental operation in the LUT-based design flow is the lookup operation, i.e., retrieving device parameters from the LUTs at a given query point. The LUT-based design paradigm delivers its benefits only if three requirements are satisfied:
1) the lookup operation is accurate;
2) the LUTs have reasonable size; and
3) the lookup operation is fast.
These requirements usually conflict, so meeting them simultaneously is a threefold challenge.
The LUTs have inherently finite accuracy because the 4D grids are built with finite steps. When the query point (LQ, VGSQ, VDSQ, VSBQ) is off-grid, the lookup is essentially a multivariate interpolation, as depicted in Fig. 3. Since the key advantage of the LUT-based paradigm is simulator accuracy, this interpolation must remain acceptably accurate. The required accuracy varies between designs, but high accuracy (e.g., error < 0.1%) is generally needed for high-precision circuits and for iterative procedures, to avoid error accumulation and/or divergence [6], [8]. Lookup accuracy can be improved in two ways:
1) using a finer step size when building the LUTs, at the expense of size and speed; and
2) using higher-order interpolation, at the expense of speed.

FIGURE 3. An example of a lookup operation for ID. Multivariate linear interpolation is used in the conventional approach.

The second challenge is the LUT size, which, as noted above, directly conflicts with accuracy. Halving the step size in all four dimensions increases the LUT size 16-fold (2^4). The designer needs to save several device parameters, so the LUTs of a single device with relatively fine steps can easily reach the GB range. Storage itself is not a major problem, since permanent storage capacity is in the TB range. The LUTs must, however, be loaded into RAM for the lookup operation, and this can be a significant challenge. The problem is worse in modern technologies, which offer many MOSFET flavors (e.g., high-VT, normal-VT, and low-VT). Moreover, for a variation-aware design, the LUTs of every device must be extracted at several process and temperature corners. The amount of data to load into memory can therefore easily become impractical.
Besides accuracy and LUT size, the speed of the lookup operation is another key requirement. It is especially important in the forward-problem scenario (see Fig. 1b), which involves a huge number of lookup operations. Replacing SPICE-in-the-loop with LUT-in-the-loop is attractive only if it offers a substantial speedup. A simple interpolation procedure and small LUTs can help boost speed, but both may cost accuracy. An efficient, vectorized implementation of the lookup operation is therefore indispensable [10].

III. THE LOOKUP OPERATION
A. CONVENTIONAL LOOKUP
The LUT-based design methodology implements the lookup operation by interpolation. Linear interpolation is one of the simplest ways to estimate values between known data points, which we refer to as “knots.” The interpolant joins each pair of consecutive knots with a straight line, as shown in Fig. 4. Each line is fully defined by two unknowns (slope and intercept), so two points suffice to compute it, and the interpolant is unique. Linear interpolation has low computational complexity but poor accuracy, especially when the grid step is large. In addition, the interpolant is not differentiable at the knots.

FIGURE 4. An example of linear, spline, and pchip interpolation.

The work in [5] and [12], which we refer to as the conventional approach, uses linear interpolation for the lookup operation. Fig. 5 gives a simplified illustration of the conventional lookup operation. The 2D grid
represents the drain current (ID) along two dimensions, VGS and VDS. The color and size of each point represent the value of ID. The grid is built from VGS and VDS vectors with steps of 50 mV and 200 mV, respectively. To get ID at the query point (VGSQ = 0.425 V, VDSQ = 0.5 V), linear interpolation is first applied along VDS to obtain ID at the grid points whose VGS values bracket VGSQ (0.4 V < VGSQ < 0.45 V). A second linear interpolation then gives IDQ at VGSQ. The same multivariate interpolation extends to 4D LUTs. Besides the poor accuracy of linear interpolation, the implementation in [5] and [12] is not vectorized, i.e., it cannot handle a set of scattered query points simultaneously.

FIGURE 5. A simplified example for multivariate linear interpolation in conventional lookup. The procedure can be similarly applied to 4D data.

B. MODIFIED LOOKUP
Piecewise cubic interpolating polynomials can address the accuracy shortcoming of linear interpolation. Each pair of adjacent knots is joined by a distinct cubic polynomial [19]. A cubic polynomial has four unknowns and therefore needs four constraints, so the function values at the two knots are not enough to define it uniquely. Two additional points are therefore used to estimate the first-derivative values (i.e., slopes) at the knots. To guarantee continuity of the first derivative, the cubic Hermite interpolant between the knots (xk, yk) and (xk+1, yk+1) is defined as [19]
$$P_k(x) = \frac{3hs^2 - 2s^3}{h^3}\,y_{k+1} + \frac{h^3 - 3hs^2 + 2s^3}{h^3}\,y_k + \frac{s^2(s - h)}{h^2}\,d_{k+1} + \frac{s(s - h)^2}{h^2}\,d_k, \tag{3}$$
where $x_k < x < x_{k+1}$, $h = x_{k+1} - x_k$, $s = x - x_k$, $d_k = P_k'(x_k)$, and $d_{k+1} = P_k'(x_{k+1})$.
Several methods can estimate the slopes (dk and dk+1), so the cubic interpolant is not unique. The most common are spline interpolation and shape-preserving pchip, as shown in Fig. 4. In cubic spline interpolation, the slopes are chosen so that the interpolant's first and second derivatives are continuous at each knot. Cubic splines are smoother than any other cubic interpolant and give the best results on smooth data. However, they are computationally expensive, because obtaining the slopes requires solving a system of equations. Their interpolant is also not guaranteed to be monotonic even when the knots are (see Fig. 4). Pchip, by contrast, chooses the slopes to produce shape-preserving interpolants, i.e., monotonic interpolants for monotonic data [19], [20]. These interpolants are less smooth than splines, since only their first derivative is continuous, but monotonic behavior is desirable in many applications. Pchip is also cheap to compute, because its slopes do not require solving a system of equations.

FIGURE 6. The modified lookup operation. Pchip is used for the VGS-axis only.

Since VGS is the primary variable controlling MOSFET behavior, the work in [6], which we refer to as the modified approach, proposed cubic (pchip) interpolation along the VGS dimension only, and showed that this suffices for accurate lookups in high-precision circuit design. The modified lookup is a two-step interpolation, as shown in Fig. 6. Linear interpolation is first applied along all dimensions except VGS, and 1D monotonic pchip is then applied along VGS only. Fig. 7 illustrates this further. Linear interpolation along VDS is applied to the points whose VGS values surround VGSQ (0.35 V < 0.4 V < VGSQ < 0.45 V < 0.5 V). The resulting values (the points on the horizontal dashed line) are then joined by a pchip interpolant versus VGS. Monotonic pchip estimates the slopes at interior points with a two-sided formula [19], [20], which is why ID is also evaluated at the outer points (VGS = 0.35 V and 0.5 V). Once the slopes are determined, the interpolant is given by (3), and the drain current IDQ at VGSQ can be evaluated. If the query point lies next to an endpoint of the grid vector, the slope is estimated with a one-sided formula instead [19], [20]. One-sided formulas usually give worse estimates, so the interpolation error is larger there.
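The conventional two-step linear lookup of Sec. III-A (Fig. 5) can be made concrete with the minimal NumPy sketch below. It interpolates a 2D ID table first along VDS and then along VGS. The grid steps (50 mV in VGS, 200 mV in VDS) and the query point follow the Fig. 5 example. The square-law table values and the names `linear_lookup_2d` and `toy_id` are hypothetical placeholders, not the paper's data and not the implementation of [5], [12].

```python
import numpy as np

def linear_lookup_2d(vgs_grid, vds_grid, table, vgs_q, vds_q):
    """Two-step linear lookup: linear along VDS, then linear along VGS.

    table[i, j] is the stored value at (vgs_grid[i], vds_grid[j]);
    both grid vectors must be strictly increasing.
    """
    # Step 1: linear interpolation along VDS at every VGS knot
    # (only the two knots that bracket vgs_q influence the result).
    along_vds = np.array([np.interp(vds_q, vds_grid, row) for row in table])
    # Step 2: linear interpolation of those intermediate values along VGS.
    return float(np.interp(vgs_q, vgs_grid, along_vds))

def toy_id(vgs, vds, vt=0.35, kp=1e-4, lam=0.1):
    # Hypothetical drain current (A): square law with channel-length modulation.
    vov = np.maximum(vgs - vt, 0.0)
    return 0.5 * kp * vov**2 * (1 + lam * vds)

# Grid steps as in the Fig. 5 example: 50 mV in VGS, 200 mV in VDS.
vgs_grid = np.linspace(0.30, 0.80, 11)
vds_grid = np.linspace(0.0, 1.2, 7)
VGS, VDS = np.meshgrid(vgs_grid, vds_grid, indexing="ij")
id_table = toy_id(VGS, VDS)

id_q = linear_lookup_2d(vgs_grid, vds_grid, id_table, 0.425, 0.5)
exact = float(toy_id(0.425, 0.5))
print(f"relative error: {100 * (id_q - exact) / exact:.2f} %")
```

For this toy table the linear estimate is about 11 % too high at the query point, even though the VDS step is exact (the toy model is linear in VDS). This illustrates why a purely linear lookup needs fine grids.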

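Equation (3) can be checked numerically. The sketch below evaluates the cubic Hermite interpolant on one interval exactly as written in (3); `hermite_segment`, `f`, and `df` are hypothetical names used only for illustration. A cubic is fully determined by two knot values and two end slopes, so the interpolant reproduces any cubic polynomial exactly when the true slopes are supplied.

```python
def hermite_segment(x, xk, xk1, yk, yk1, dk, dk1):
    """Cubic Hermite interpolant P_k(x) of Eq. (3) on the interval [xk, xk1].

    yk, yk1 are the knot values; dk, dk1 are the slopes at the two knots.
    """
    h = xk1 - xk
    s = x - xk
    return ((3 * h * s**2 - 2 * s**3) / h**3 * yk1
            + (h**3 - 3 * h * s**2 + 2 * s**3) / h**3 * yk
            + s**2 * (s - h) / h**2 * dk1
            + s * (s - h)**2 / h**2 * dk)

# A cubic test function and its exact derivative.
def f(x):
    return 2 * x**3 - x**2 + 0.5 * x + 1

def df(x):
    return 6 * x**2 - 2 * x + 0.5

xk, xk1 = 0.40, 0.45
for x in (0.40, 0.41, 0.425, 0.45):
    p = hermite_segment(x, xk, xk1, f(xk), f(xk1), df(xk), df(xk1))
    print(f"x = {x:.3f}: P_k(x) = {p:.12f}, f(x) = {f(x):.12f}")
```

Setting s = 0 or s = h in (3) returns yk or yk+1, so the interpolant always passes through the knots. The cubic interpolation variants discussed in this section differ only in how dk and dk+1 are obtained.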


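The monotonicity contrast between cubic splines and pchip (Fig. 4) is easy to reproduce in a minimal sketch. It uses SciPy's `CubicSpline` and `PchipInterpolator` as stand-ins for generic spline and pchip routines. The step-like knot data and the natural spline end condition are assumptions chosen only to make the effect visible; they are not MOSFET data.

```python
import numpy as np
from scipy.interpolate import CubicSpline, PchipInterpolator

# Monotonic (non-decreasing) step-like knots.
x = np.arange(6.0)
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])

spline = CubicSpline(x, y, bc_type="natural")  # C2-continuous; slopes from a linear system
pchip = PchipInterpolator(x, y)                 # C1-continuous; shape-preserving slopes

xf = np.linspace(0.0, 5.0, 501)
ys, yp = spline(xf), pchip(xf)

print("spline range:", ys.min(), ys.max())  # leaves [0, 1]: under/overshoot near the step
print("pchip range: ", yp.min(), yp.max())  # stays within [0, 1]
print("pchip monotonic:", bool(np.all(np.diff(yp) >= -1e-12)))
```

The spline overshoots above 1 and undershoots below 0 on either side of the step. The pchip interpolant stays monotonic, at the cost of having only a continuous first derivative.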

FIGURE 7. A simplified example for the modified lookup operation. The procedure can be similarly applied to 4D data.

FIGURE 8. Simplified illustration of the proposed lookup operation. The slopes are retrieved from the LUTs themselves rather than being estimated.

The modified lookup operation is more accurate than the conventional one but slightly slower. Neither implementation is vectorized, so both are inefficient for exploring high-dimensional design spaces.
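The two-step modified lookup of Fig. 7 can be sketched under the same kind of assumptions as a toy 2D table. Linear interpolation along VDS is applied at the four VGS knots surrounding the query (0.35, 0.40, 0.45, and 0.50 V), followed by a 1D pchip interpolant versus VGS. The square-law model and the helper names `modified_lookup_2d` and `toy_id` are hypothetical; this is not the implementation of [6].

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def modified_lookup_2d(vgs_grid, vds_grid, table, vgs_q, vds_q):
    """Linear interpolation along VDS, then pchip along VGS (up to 4 knots)."""
    # Index k of the VGS interval containing vgs_q: vgs_grid[k] <= vgs_q < vgs_grid[k+1].
    k = int(np.clip(np.searchsorted(vgs_grid, vgs_q, side="right") - 1,
                    0, len(vgs_grid) - 2))
    # Keep one extra knot on each side when available, so pchip can use its
    # two-sided slope formula at the two knots that bracket the query.
    lo, hi = max(k - 1, 0), min(k + 3, len(vgs_grid))
    knots = vgs_grid[lo:hi]
    # Step 1: linear interpolation along VDS at each selected VGS knot.
    vals = np.array([np.interp(vds_q, vds_grid, table[i]) for i in range(lo, hi)])
    # Step 2: 1D shape-preserving cubic (pchip) interpolation versus VGS.
    return float(PchipInterpolator(knots, vals)(vgs_q))

def toy_id(vgs, vds, vt=0.35, kp=1e-4, lam=0.1):
    # Hypothetical drain current (A): square law with channel-length modulation.
    vov = np.maximum(vgs - vt, 0.0)
    return 0.5 * kp * vov**2 * (1 + lam * vds)

vgs_grid = np.linspace(0.30, 0.80, 11)  # 50 mV step
vds_grid = np.linspace(0.0, 1.2, 7)     # 200 mV step
VGS, VDS = np.meshgrid(vgs_grid, vds_grid, indexing="ij")
id_table = toy_id(VGS, VDS)

id_q = modified_lookup_2d(vgs_grid, vds_grid, id_table, 0.425, 0.5)
exact = float(toy_id(0.425, 0.5))
print(f"modified lookup error: {100 * (id_q - exact) / exact:.2f} %")
```

For this toy table the printed error is about −1.4 %, compared with about +11 % if the VGS step were also linear. The remaining error comes from the pchip slope estimates at 0.40 V and 0.45 V (harmonic means of neighboring secant slopes), not from the Hermite form (3) itself.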

C. PROPOSED LOOKUP
The proposed lookup method aims to address all three challenges of Sec. II-C (accuracy, size, and speed) at once. Its key idea is to use the MOSFET parameters already stored in the LUT structure to improve the interpolation. For example, to look up ID, the modified approach uses pchip interpolation along VGS to estimate the slopes. The slopes (dk and dk+1) that define the interpolant in (3) need not come from pchip estimates, however. They can instead be taken from the LUT structure itself, because the slope is
$$\frac{\partial I_D}{\partial V_{GS}} = g_m. \tag{4}$$
The gm LUT is already stored in the LUT structure, so the slopes at the knots can be read from it directly, as depicted in Fig. 8. Because the gm LUT is simulator-accurate, these slopes are much more accurate than mathematical estimates such as spline or pchip. The accuracy improvement also holds at the endpoints, since no one-sided formula is needed. Beyond accuracy, the proposed approach also allows higher speed and smaller LUTs. First, the slope-estimation step is skipped, which improves speed. Second, for a given accuracy, a larger grid step (i.e., smaller LUTs) can be used than in the conventional and modified approaches, which reduces the LUT size and further improves speed.
Fig. 9 illustrates the proposed lookup operation with the same grid and query point as Fig. 5 and Fig. 7. Linear interpolation along VDS is applied at VGS = 0.4 V and 0.45 V, on both the ID grid and the gm grid. The unique cubic interpolant of ID versus VGS is then defined by (3), and IDQ at VGSQ can be evaluated. Only two knots are needed here, because the gm LUT provides the slopes at these knots. The same idea applies to interpolation along VDS and VSB if needed, since the corresponding slopes (gds and gmb) are also stored in the LUT structure.

FIGURE 9. A simplified example for the proposed lookup operation. The procedure can be similarly applied to 4D data.

Accounting for the behavior of the MOSFET IV characteristics improves the accuracy of the proposed lookup further. In strong inversion (SI), ID has a power-law dependence on VGS with an exponent α that typically ranges from 1 to 2, which a cubic interpolant can follow faithfully. In weak inversion (WI), however, ID depends exponentially on VGS. A cubic interpolant cannot follow this exponential trend, so the interpolation error increases. Interpolating ln ID instead of ID achieves consistent accuracy across all operating regions. The slopes at the knots will
then be given by
$$\frac{\partial (\ln I_D)}{\partial V_{GS}} = \frac{g_m}{I_D}, \tag{5}$$
which can also be obtained from the LUTs.
The discussion so far has focused on ID, the most important MOSFET parameter, especially in iterative procedures [6]. The second most important parameter is the transconductance (gm). The derivative of gm is not stored in the LUTs, so the approach above cannot be applied to it directly. However, the proposed approach already yields an accurate interpolant of ID, so the interpolant of gm can be defined as its derivative, i.e., the derivative of (3):
$$P_k'(x) = \frac{6hs - 6s^2}{h^3}\,y_{k+1} + \frac{6s^2 - 6hs}{h^3}\,y_k + \frac{3s^2 - 2hs}{h^2}\,d_{k+1} + \frac{3s^2 - 4hs + h^2}{h^2}\,d_k. \tag{6}$$
gm is thus interpolated with a parabola (i.e., a second-order polynomial). Because the cubic interpolant of ID is already accurate, this quadratic interpolant of gm can give better overall accuracy than the cubic pchip of the modified approach, and it is also faster.
The proposed solutions already improve lookup speed somewhat. A larger gain comes from an efficient, vectorized implementation. The conventional and modified approaches do not support vectorized operations. In addition, every call performs redundant and slow checks and preprocessing on the input arguments and creates a new gridded interpolant object. The proposed approach removes these redundant operations and stores the gridded interpolant objects themselves instead of the raw 4D arrays. This fits the precomputing spirit of the proposed work and has negligible impact on the LUT size. As a result, lookup time drops substantially. The proposed implementation also performs lookups on arrays of scattered query points, which enables fast design-space exploration.

D. RESULTS AND DISCUSSION
To compare the proposed lookup approach with the conventional and modified approaches, two LUT structures were generated: coarse LUTs with a 50 mV VGS step and fine LUTs with a 5 mV VGS step. All LUT data in this paper come from a 180 nm CMOS technology, but the precomputed-LUT design paradigm applies equally to more advanced technologies [5]. The lookup operation is applied to the coarse LUTs to estimate the MOSFET parameters at every 5 mV step in VGS. The interpolation error is the difference between these coarse-LUT estimates and the simulator-accurate values stored in the fine LUTs.

FIGURE 10. Lookup operation accuracy: percent error in ID vs VGS.

Fig. 10 plots the relative error of ID versus VGS, and several observations follow. First, all curves have nulls every 50 mV, the coarse-LUT VGS step, where the query points coincide with stored knots. Second, the proposed approach reduces the error by orders of magnitude compared with both the conventional and modified approaches. Third, thanks to the logarithmic transformation, the error reduction is much larger in WI. The error of the proposed approach also varies far less across operating regions than that of the conventional and modified approaches. Fourth, an error below 0.01% with 50 mV coarse LUTs allows a large reduction in LUT size without sacrificing accuracy.
Fig. 11 plots the relative error of gm versus VGS. The proposed approach reduces the error by orders of magnitude compared with the conventional approach. In WI, its lookup error is more than one order of magnitude lower than that of the modified approach, even though the modified approach uses cubic interpolation. This does not hold in SI, because the logarithmic transformation is applied in all regions. Overall, however, the proposed approach gives better results, since its error is both sufficiently low and consistent across all operating regions. On the other hand, the error of the
modified approach varies by four orders of magnitude and is unacceptable in WI.

FIGURE 11. Lookup operation accuracy: percent error in gm vs VGS.

The performance of the proposed lookup is compared to the conventional and modified approaches in Fig. 12. All functions are implemented in MATLAB, and the test is performed on the same machine (Core-i7-6500U CPU and 12 GB RAM) for a fair comparison. Since the execution time may fluctuate from one run to another, a large number of runs is performed using random query points, and the results are averaged. Fig. 12 shows that there is a considerable speedup even for a single query point by virtue of removing the redundant checks and preprocessing. For a large number of query points, vectorization kicks in, and the speedup becomes more than three orders of magnitude. This enables very fast exploration of high-dimensional design spaces, as will be seen in Sec. V-B.

FIGURE 12. Performance of the lookup operation: (a) Comparison of lookup time vs number of query points. (b) Speedup of the proposed approach vs the conventional and modified approaches.

IV. THE VGS LOOKUP OPERATION

A. CONVENTIONAL VGS LOOKUP
The LUT-based design scenario uses gm/ID or JD as a knob to set the device bias point, as explained in Sec. II-B. Moreover, design procedures often include scenarios where the bias current (ID) is known and the bias voltage (VGS) must be found [6]. However, the normal lookup operation shown in Fig. 3 does not allow gm/ID or JD in the query point. Therefore, another lookup operation is needed to look up VGS given (L, gm/ID, VDS, VSB) or (L, JD, VDS, VSB).

The conventional implementation of this VGS lookup operation uses a two-step interpolation process that includes axes swapping, as depicted in Fig. 13 [5], [12]. First, it computes JD (or gm/ID) for all values in the VGS grid vector. Next, the axes are swapped, i.e., JD becomes the grid vector (X-axis) and VGS is treated as the dependent variable (Y-axis), and 1D pchip interpolation is performed to get the required VGSQ value corresponding to JDQ in the query point. This is different from the modified lookup operation in Fig. 6, which does not involve swapping the axes. For JD (or gm/ID) to be a valid grid vector, it must be strictly monotonic; thus, it may have to be trimmed before performing the 1D interpolation [5]. The modified lookup approach in [6] uses the same conventional approach for VGS lookup operations.

FIGURE 13. Conventional VGS lookup operation using two-step interpolation.

The VGS lookup operation is illustrated in Fig. 14, where it is required to get the VGS value corresponding to the query
point JDQ = 1 µA/µm at VDSQ = 0.5 V. First, linear interpolation is applied to get all JD values corresponding to the query vector formed using the grid vector of VGS (points on the horizontal dashed line). Then the resultant points are plotted with JD on the X-axis and the corresponding VGS values on the Y-axis, and pchip interpolation is applied to get VGSQ corresponding to JDQ.

FIGURE 14. An example of the conventional VGS lookup operation.

This VGS lookup operation unnecessarily uses pchip interpolation to estimate the slopes for the cubic interpolant. In addition, the two-step interpolation process is slow and cannot be vectorized for a set of scattered query points. Thus, it will represent a bottleneck in any LUT-based design procedure, even with the performance enhancement achieved for the lookup operation presented in Sec. III-D.

B. PROPOSED VGS LOOKUP
In order to overcome the limitations of the inefficient two-step VGS lookup operation, we can resort to the key technique in the proposed LUT-based design paradigm: precomputation. The two-step process depicted in Fig. 13 can be done in advance by going through all the possible query vectors in a nested loop (i.e., iterating through every L, VDS, and VSB). The result is two new LUTs added to the LUTs structure, storing VGS versus the ln JD and gm/ID grid vectors, as shown in Fig. 15. It is important to use ln JD rather than JD as the grid vector since JD spans several orders of magnitude. Precomputing the new VGS LUTs is done only once per technology, similar to the original LUTs structure. It should be noted that these two new LUTs are precomputed from the LUTs structure itself without invoking the simulator again. Since the LUTs structure already contains many LUTs for different MOSFET parameters, adding the two new LUTs has only a minor effect on its size.

FIGURE 15. Proposed LUTs structure. Two new LUTs are added for fast and vectorized VGS lookup operation.

Using these two new LUTs, the VGS lookup operation can be treated similarly to the normal lookup operation discussed in Sec. III. Moreover, the idea of using the data in the LUTs to estimate the slopes of the cubic interpolant can be similarly applied. To look up VGS given ln JD, the slopes are given by

$$\frac{\partial V_{GS}}{\partial (\ln J_D)} = \frac{J_D \cdot W}{g_m}, \tag{7}$$

which can be retrieved from the LUTs, where W is the width of the reference device. As previously discussed, the simulator-accurate slopes will yield more accurate interpolants. These slopes can also be used while building the VGS LUTs in the precomputation step.

C. RESULTS AND DISCUSSION
Similar to Sec. III-D, a coarse LUT with a 50 mV VGS step and a fine LUT with a 5 mV VGS step are used to compare the accuracy of the conventional and proposed VGS lookup operations. JD values that correspond to the VGS grid vector are extracted from the fine LUT. Then this JD vector is used as query points for the VGS lookup operation using the coarse LUT. The interpolation error is calculated as the difference between the interpolated VGS and the fine LUT VGS grid vector. The error for the conventional and proposed approaches is shown in Fig. 16. Although the conventional method uses pchip interpolation, the proposed method achieves significantly better accuracy, especially in WI, where the error is reduced by up to two orders of magnitude.

The performance of the proposed VGS lookup is compared to the conventional method in Fig. 17. For a single query point, the speedup is more than one order of magnitude. As the number of points increases, the speedup approaches four orders of magnitude. Due to eliminating the two-step interpolation with axes swapping (see Fig. 13), the VGS lookup speedup is one order of magnitude higher than the speedup of the normal lookup operation shown in Fig. 12.
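The precomputed VGS lookup of Sec. IV-B and the slopes of Eq. (7) can be illustrated with a small NumPy sketch. This is an illustration under several assumptions and is not the authors' MATLAB implementation. A hypothetical EKV-style toy model (`toy_device`, with assumed parameters `I0`, `VT`, `N`, `UT`, and reference width `W_REF`) stands in for simulator-generated LUT data. Only one (L, VDS, VSB) slice is handled, and the coarse LUT's own ln JD values serve as the knots rather than a common ln JD grid. The sketch swaps the axes once and stores VGS together with the Eq. (7) slopes. It then evaluates a cubic Hermite interpolant for many query points in a single vectorized call. For comparison, the same interpolant is also evaluated with simplified pchip-style slopes. These use the weighted harmonic mean described in [19] at interior knots and one-sided differences at the ends, which differs from MATLAB's `pchip` end conditions. The function names `hermite_eval` and `pchip_like_slopes` are hypothetical.

```python
import numpy as np

# Hypothetical EKV-style toy model with assumed parameters. It stands in for
# simulator-generated LUT data at one (L, VDS, VSB) slice; it is not real data.
I0, VT, N, UT = 1e-6, 0.45, 1.3, 0.0259  # scale (A/um), threshold (V), slope factor, kT/q (V)
W_REF = 1.0                              # assumed reference-device width (um)


def toy_device(vgs):
    """Return (JD = ID/W, gm of the reference device) for an array of VGS values."""
    x = (vgs - VT) / (2 * N * UT)
    s = np.log1p(np.exp(x))               # smooth WI-to-SI transition term
    sig = 1.0 / (1.0 + np.exp(-x))        # ds/dx
    jd = I0 * s**2
    gm = W_REF * I0 * s * sig / (N * UT)  # d(W_REF * JD)/dVGS
    return jd, gm


def hermite_eval(xk, yk, mk, xq):
    """Vectorized cubic Hermite interpolation through (xk, yk) with knot slopes mk."""
    idx = np.clip(np.searchsorted(xk, xq) - 1, 0, len(xk) - 2)
    h = xk[idx + 1] - xk[idx]
    t = (xq - xk[idx]) / h
    h00 = (1 + 2 * t) * (1 - t) ** 2
    h10 = t * (1 - t) ** 2
    h01 = t**2 * (3 - 2 * t)
    h11 = t**2 * (t - 1)
    return (h00 * yk[idx] + h10 * h * mk[idx]
            + h01 * yk[idx + 1] + h11 * h * mk[idx + 1])


def pchip_like_slopes(xk, yk):
    """Simplified pchip-style slopes: weighted harmonic mean of neighboring
    secants at interior knots (zero if they change sign), one-sided secants
    at the two end knots."""
    h = np.diff(xk)
    d = np.diff(yk) / h
    m = np.empty_like(yk)
    m[0], m[-1] = d[0], d[-1]
    w1 = 2 * h[1:] + h[:-1]
    w2 = h[1:] + 2 * h[:-1]
    with np.errstate(divide="ignore", invalid="ignore"):
        hm = (w1 + w2) / (w1 / d[:-1] + w2 / d[1:])
    m[1:-1] = np.where(d[:-1] * d[1:] > 0, hm, 0.0)
    return m


# Coarse LUT slice (50 mV VGS step) and fine reference (5 mV step).
vgs_coarse = np.linspace(0.0, 1.2, 25)
vgs_fine = np.linspace(0.0, 1.2, 241)
jd_c, gm_c = toy_device(vgs_coarse)
jd_f, _ = toy_device(vgs_fine)

# Precomputation (once): ln(JD) becomes the grid vector, VGS the stored values,
# and Eq. (7) gives the knot slopes dVGS/d(ln JD) = JD * W / gm.
lnjd_grid = np.log(jd_c)
slopes_eq7 = jd_c * W_REF / gm_c

# Lookup: all scattered query points are evaluated in one vectorized call.
lnjd_q = np.log(jd_f)
vgs_eq7 = hermite_eval(lnjd_grid, vgs_coarse, slopes_eq7, lnjd_q)
vgs_pchip = hermite_eval(lnjd_grid, vgs_coarse,
                         pchip_like_slopes(lnjd_grid, vgs_coarse), lnjd_q)

err_eq7 = np.max(np.abs(vgs_eq7 - vgs_fine))
err_pchip = np.max(np.abs(vgs_pchip - vgs_fine))
print(f"max |VGS error|: Eq. (7) slopes {err_eq7:.2e} V, pchip-like slopes {err_pchip:.2e} V")
```

With these assumed parameters, the Eq. (7) slopes should give a smaller maximum VGS error than the pchip-style slopes. Once the knots and slopes are stored, each query reduces to `np.searchsorted` plus a few elementwise operations. A complete implementation would also interpolate across L, VDS, and VSB.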


FIGURE 16. Accuracy of VGS lookup operation: percent error in VGS vs JD. The modified lookup approach in [6] is the same as the conventional approach for VGS lookup operations.

FIGURE 17. Performance of the VGS lookup operation: (a) Comparison of lookup time vs number of query points. (b) Speedup of the proposed approach vs the conventional approach. The modified lookup approach in [6] is the same as the conventional approach for VGS lookup operations.

V. DESIGN EXAMPLES
In order to appreciate the importance of the solutions proposed in this work, it is useful to put them in the context of practical design examples. Two design examples are discussed in this section. The first example is a bandgap voltage reference circuit, which benefits mainly from the improved accuracy and the reduced LUTs size. The second example is a folded cascode OTA, which shows how the vectorized and fast lookup implementation can be used to explore the design trade-offs in multi-dimensional design problems.

A. BANDGAP VOLTAGE REFERENCE
The bandgap voltage reference is a precision circuit that is sensitive to small errors; thus, it is a good example to show the effect of lookup errors. The design procedure of the bandgap circuit shown in Fig. 18 is presented in detail in [6]. One of the key metrics of this circuit is the dependence of the output reference voltage VREF on temperature, which is characterized by the temperature coefficient (TC). For a temperature range from TMIN to TMAX, TC in ppm is defined as

$$\mathrm{TC}\ (\mathrm{ppm}) = \frac{10^{6}}{V_{REF,T_{NOM}}} \cdot \frac{V_{REF,MAX} - V_{REF,MIN}}{T_{MAX} - T_{MIN}}, \tag{8}$$

where TNOM is the nominal temperature.

FIGURE 18. Schematic of the bandgap voltage reference circuit used as a design example.

The circuit is synthesized using LUTs with a 25 mV VGS step, and the synthesized circuit is simulated using Cadence Spectre to compare synthesis results against simulations. The synthesis procedure involves both normal lookup operations and VGS lookup operations. The lookup approaches discussed in Sec. III and Sec. IV are compared in Fig. 19. The results show VREF vs temperature at ρN = ρP = 20 S/A, where ρN and ρP are the gm/ID ratios of the NMOS and PMOS transistors, respectively. In addition, Fig. 19 shows the contours of TC in the ρN–ρP space in order to study the effect of the transistor operating region on the results. As evident from Fig. 19a, the conventional approach fails to provide accurate results, and the difference between synthesis and simulation is quite large. The modified approach provides better results, as shown in Fig. 19b; however, the contour plot shows that the error is significant when the devices are
biased in WI. Fig. 19c shows that the proposed approach achieves impressive matching between synthesis and simulation results across all operating regions, with less than 1 ppm error. Achieving this level of accuracy with a 25 mV VGS step makes it easy to generate tables at different temperature and process corners, which is essential in the design of this type of circuit. In addition to PVT variations, process mismatch can also be taken into account using the LUTs, as shown in [6].

FIGURE 19. Comparison of synthesis and simulation results for the bandgap reference circuit showing VREF vs temperature and TC in the ρN–ρP space, where ρN and ρP are the gm/ID ratios of the NMOS and PMOS transistors, respectively. (a) The conventional lookup approach. (b) The modified lookup approach. (c) The proposed lookup approach.

B. FOLDED CASCODE OTA
The folded cascode is one of the most popular OTA topologies due to its flexible input range. Fig. 20 shows a schematic of a fully differential folded cascode OTA. Due to symmetry, the half-circuit principle can be applied, and the designer needs to specify the DOFs of only five transistors. According to the discussion in Sec. II-B, each transistor has three DOFs (L, ρ = gm/ID, ID). However, assuming the bias current is evenly split between the common-source and the common-gate stages [21], the bias current of all transistors boils down to a single DOF, and the total number of DOFs is reduced to 11.

FIGURE 20. Schematic of the folded cascode OTA used as a design example.

In order to explore the design space, a set of 10^6 design points is generated. For every design point, the two DOFs of every transistor (L and ρ) are randomly selected from a uniform distribution. L is selected in the interval (0.2 µm, 2 µm), and ρ is selected in the interval (5 S/A, 25 S/A). This allows exploring short/long channel behavior and WI/SI biasing. The total bias current (IB) is randomly selected from a log-uniform distribution in the interval (0.1 mA, 10 mA). Using randomly generated design points gives a more uniform exploration of the design space compared to a grid search. The load capacitance (CL) is considered an external constraint and is set to 2 pF in this design example. The design metrics are computed using vectorized evaluation of symbolic expressions according to the scenario in Fig. 1b. It should be noted that this does not sacrifice accuracy because all MOSFET parameters substituted in the symbolic equations are extracted from the LUTs.

Remarkably, synthesizing the 10^6 design points and computing the design metrics takes only 4 s. Generating this large dataset using the conventional lookup method requires more than three orders of magnitude more time, and simulation-based techniques would require impractical time. This huge amount of data gives the designer endless possibilities to gain insights and examine design trade-offs using design charts. As an example, Fig. 21a shows the design points in the DC gain (Avo) vs gain-bandwidth product (GBW) space. Moreover, phase margin (PM) and fan-out (FO) constraints are applied to the design, where FO is defined as the ratio of the load capacitance to the input capacitance. The chart in Fig. 21a answers questions such as: What range of gain is achievable? What range of GBW is achievable? How do PM and FO affect the achievable gain or GBW? Is putting more current going to improve or hurt the GBW given that the gain is kept constant? These questions cannot be answered using the simulation-based optimization approach.

Fig. 21b shows another design chart, which explores the power-speed trade-off of the OTA and how this trade-off is
affected by PM and gain constraints. The maximum achievable GBW is reduced by one order of magnitude due to the gain specification, regardless of the current consumption. Moreover, for a given GBW, PM, and gain, the bias current of an unoptimized design may be two orders of magnitude higher than that of an optimized design. Trade-offs with other design metrics, such as noise and area, can be similarly explored. Moreover, as in Fig. 1b, the designer can use a local or global optimizer to minimize an objective, e.g., bias current, while satisfying a set of constraints.

FIGURE 21. Design charts of folded cascode OTA. (a) DC gain vs GBW with PM and FO constraints. (b) Bias current vs GBW with PM and DC gain constraints.

VI. CONCLUSION
This paper discussed advantages and challenges of the LUT-based design flow for analog circuits. Specifically, it identified the need for fast and accurate table lookup to enable rapid exploration of the circuit's performance space using direct (forward) evaluation of its underlying design equations. The presented solution uses enhanced interpolation methods that facilitate large LUT grid spacing while maintaining highly accurate lookup. The latter is validated using a highly sensitive bandgap design that achieves ppm-level accuracy between the synthesized and SPICE-simulated designs. In addition to being accurate, the lookup method developed in this work is also fast, enabling millions of queries within seconds. Such functionality can be used to search a circuit's design space, providing rapid feedback to the designer for feasibility studies, topology selection, and guidance for the invention of new circuits. It is conceivable to couple our fast evaluation method to an advanced optimization tool to navigate optimal sizing conditions and topology changes without a simulator in the loop.

REFERENCES
[1] S. Pandit, C. Mandal, and A. Patra, Nano-Scale CMOS Analog Circuits: Models and CAD Techniques for High-Level Design. Boca Raton, FL, USA: CRC Press, 2014.
[2] M. F. Barros, J. M. Guilherme, and N. C. G. Horta, Analog Circuits and Systems Optimization Based on Evolutionary Computation Techniques. Berlin, Germany: Springer, 2010.
[3] R. A. Rutenbar, G. G. E. Gielen, and B. A. Antao, Eds., Computer-Aided Design of Analog Integrated Circuits and Systems. Hoboken, NJ, USA: Wiley, 2002.
[4] Cadence Design Systems. Virtuoso Analog Design Environment GXL. Accessed: Jun. 1, 2020. [Online]. Available: https://www.cadence.com
[5] P. Jespers and B. Murmann, Systematic Design of Analog CMOS Circuits Using Pre-Computed Lookup Tables. Cambridge, U.K.: Cambridge Univ. Press, 2017.
[6] H. Omran, M. H. Amer, and A. M. Mansour, "Systematic design of bandgap voltage reference using precomputed lookup tables," IEEE Access, vol. 7, pp. 100131–100142, 2019.
[7] M. N. Sabry, H. Omran, and M. Dessouky, "Systematic design and optimization of operational transconductance amplifier using gm/ID design methodology," Microelectron. J., vol. 75, pp. 87–96, May 2018.
[8] M. N. Sabry, I. Nashaat, and H. Omran, "Automated design and optimization flow for fully-differential switched capacitor amplifiers using recycling folded cascode OTA," Microelectron. J., vol. 101, Jul. 2020, Art. no. 104814.
[9] A. Montagne. SLiCAP: Symbolic Linear Circuit Analysis Program. Accessed: Jun. 1, 2020. [Online]. Available: https://www.analog-electronics.eu/slicap
[10] The MathWorks. MATLAB Documentation. Accessed: Jun. 1, 2020. [Online]. Available: https://www.mathworks.com/help/matlab/matlab_prog/vectorization.html
[11] J. Ou and P. M. Ferreira, "Implications of small geometry effects on gm/ID based design methodology for analog circuits," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 66, no. 1, pp. 81–85, Jan. 2019.
[12] B. Murmann. gm/ID Starter Kit. Accessed: Jun. 1, 2020. [Online]. Available: https://web.stanford.edu/~murmann/gmid
[13] F. Silveira, D. Flandre, and P. G. A. Jespers, "A gm/ID based methodology for the design of CMOS analog circuits and its application to the synthesis of a silicon-on-insulator micropower OTA," IEEE J. Solid-State Circuits, vol. 31, no. 9, pp. 1314–1319, Sep. 1996.
[14] R. Fiorelli, E. J. Peralias, and F. Silveira, "LC-VCO design optimization methodology based on the gm/ID ratio for nanometer CMOS technologies," IEEE Trans. Microw. Theory Techn., vol. 59, no. 7, pp. 1822–1831, Jul. 2011.
[15] F. T. Gebreyohannes, J. Porte, M. Louërat, and H. Aboushady, "A gm/ID methodology based data-driven search algorithm for the design of multi-stage multi-path feed-forward-compensated amplifiers targeting high speed continuous-time ΣΔ-modulators," IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., early access, Jan. 7, 2020, doi: 10.1109/TCAD.2020.2966998.
[16] G. Piccinni, C. Talarico, G. Avitabile, and G. Coviello, "Innovative strategy for mixer design optimization based on gm/ID methodology," Electronics, vol. 8, no. 9, p. 954, Aug. 2019.
[17] J. Ou and P. M. Ferreira, "A gm/ID-based noise optimization for CMOS folded-cascode operational amplifier," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 61, no. 10, pp. 783–787, Oct. 2014.
[18] P. Jespers, The gm/ID Methodology, a Sizing Tool for Low-Voltage Analog CMOS Circuits: The Semi-Empirical and Compact Model Approaches. Boston, MA, USA: Springer, 2010.
[19] C. B. Moler, Numerical Computing With MATLAB, vol. 87. Philadelphia, PA, USA: SIAM, 2008.
[20] F. N. Fritsch and R. E. Carlson, "Monotone piecewise cubic interpolation," SIAM J. Numer. Anal., vol. 17, no. 2, pp. 238–246, Apr. 1980.
[21] H. Omran, "Optimum split ratio for folded cascode OTA bias current: A qualitative and quantitative study," in Proc. 31st Int. Conf. Microelectron. (ICM), Dec. 2019, pp. 223–226.


ABDELRAHMAN A. YOUSSEF received the B.Sc. degree (Hons.) in electrical engineering from Ain Shams University, Cairo, Egypt, in 2019. He is currently a Research Assistant with the Integrated Circuits Laboratory (ICL), Ain Shams University. His research interests include design of analog and mixed-signal integrated circuits and design automation.

BORIS MURMANN (Fellow, IEEE) received the Dipl.Ing. degree in communications engineering from the Fachhochschule Dieburg, Dieburg, Germany, in 1994, the M.S. degree in electrical engineering from Santa Clara University, Santa Clara, CA, USA, in 1999, and the Ph.D. degree in electrical engineering from the University of California at Berkeley, Berkeley, CA, USA, in 2003. From 1994 to 1997, he was with Neutron Mikroelektronik GmbH, Hanau, Germany, where he was involved with the development of low-power and smart-power application-specific integrated circuits (ASICs) in automotive CMOS technology. Since 2004, he has been with the Department of Electrical Engineering, Stanford University, Stanford, CA, USA, where he is currently a Full Professor. His current research interests include mixed-signal integrated circuit design, with a special emphasis on data converters, sensor interfaces, and circuits for embedded machine learning. He was a co-recipient of the Best Student Paper Award from the Very Large-Scale Integration (VLSI) Circuits Symposium in 2008. He was a recipient of the Best Invited Paper Award from the IEEE Custom Integrated Circuits Conference (CICC) in 2008, the Agilent Early Career Professor Award in 2009, and the Friedrich Wilhelm Bessel Research Award in 2012. He served as the Data Converter Subcommittee Chair and the 2017 Program Chair for the IEEE International Solid-State Circuits Conference (ISSCC). He served as an Associate Editor for the IEEE JOURNAL OF SOLID-STATE CIRCUITS.

HESHAM OMRAN received the B.Sc. (Hons.) and M.Sc. degrees in electrical engineering from Ain Shams University, Cairo, Egypt, in 2007 and 2010, respectively, and the Ph.D. degree in electrical engineering from the King Abdullah University of Science and Technology (KAUST), Saudi Arabia, in 2015. From 2008 to 2011, he was a Research and a Teaching Assistant with the Integrated Circuits Laboratory (ICL), Ain Shams University, and a Design Engineer with Si-Ware Systems (SWS), Cairo, where he worked on the circuit and system design of the first miniaturized FT-IR MEMS spectrometer (NeoSpectra). From 2011 to 2016, he was a Researcher with the Sensors Laboratory, KAUST. He held internships with the Bosch Research and Technology Center, Sunnyvale, CA, USA, and Mentor Graphics, Cairo. In 2016, he joined ICL, Ain Shams University, as an Assistant Professor. He has published more than 30 papers in international journals and conferences. His research interests include design of analog and mixed-signal integrated circuits, especially analog and mixed-signal design automation.
