Machine Learning-Based Design and Optimization of High-Speed Circuits @2024
Vazgen Melikyan
Synopsys Armenia CJSC
Yerevan, Armenia
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland
AG 2024
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of
illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by
similar or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors, and the editors are safe to assume that the advice and information in this
book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Switzerland AG
The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland
The book systematically expounds the main results obtained by the author in the
field of design and optimization of high-speed integrated circuits (ICs) and their
standard blocks (heterogeneous ICs, analog-to-digital and digital-to-analog con-
verters, input/output cells, etc.) operating in non-standard conditions (deviations of
technological process parameters, supply voltage, ambient temperature, etc.). The
proposed methods are based on machine learning and consider effects of different
external and internal destabilizing factors (radiation exposure, self-heating,
nonideality of the power source, input signals, interconnects, power rails, etc.).
The main goals of most proposed methods and solutions of design and optimi-
zation of high-speed ICs are to improve important parameters and characteristics
(performance, power consumption, occupied area on the die, transmitting and
receiving data quality and accuracy) of circuits and reduce the effects of
non-standard operating conditions and different types of destabilizing factors.
The book is intended for scientists and engineers specializing in the field of IC
design and optimization, as well as for undergraduate and postgraduate students
studying disciplines related to IC design.
Introduction
Over the past 60 years, the semiconductor industry has evolved according to
Moore’s law, which has made it possible to fabricate metal-oxide-semiconductor (MOS)
transistors and their modifications (FinFET, Gate-All-Around (GAA), etc.) with
channel lengths down to several nanometers. Leading manufacturers, such
as Taiwan Semiconductor Manufacturing Company (TSMC), GlobalFoundries
(GF), and Samsung, have already developed 7-, 5-, and 3-nanometer IC fabrication
technologies. In the coming years, a move to transistors with sizes of several
angstroms is even possible. ICs already in production contain more than a hundred
billion active devices. Given the planned transition to angstrom-scale
technologies, the number of circuit components will soon increase significantly
and may reach a trillion. Such fast development has led to enormous changes in
the main parameters and characteristics of ICs.
Fast developments in the semiconductor industry first affected the performance of
ICs. Performance increased drastically because smaller transistors have smaller
parasitic parameters and allow more functional blocks to be placed on the same
area. As a result, the operating frequencies of ICs have also increased,
reaching dozens of GHz.
In addition to the high-speed operation of contemporary ICs, the speed of
information transmission between them is also growing drastically. Modern electronic
systems include different ICs, which are placed independently of each other on the
same printed circuit board (PCB) and perform various functions. These ICs are in
constant communication with each other and exchange processed information.
The significant increase in the volume of data transmitted between different ICs
dictates the necessity of high-speed transmission of information between them.
Currently, the speed reaches 256 and 512 Gbit/s. For correct operation of the entire
electronic system, it is necessary to ensure correct transmission of information
between its ICs. Therefore, special basic blocks must be designed in ICs to ensure
transmission and reception of information throughout the system. This function is
performed by specially designed input/output (I/O) cells implemented in ICs. I/O
cells are among the most important components of contemporary ICs. The new
standards of information transmission by I/O cells are aimed at increasing speed.
For example, 4th generation Universal Serial Bus (USB) devices provide a data
transfer speed of 40 Gbit/s, while the 1st generation USB standard provided only
12 Mbit/s. Among the modern data transmission standards are Peripheral Component
Interconnect (PCI) Express, Serial Advanced Technology Attachment (ATA), Double
Data Rate (DDR), low-power DDR, High-Definition Multimedia Interface (HDMI), etc.
These standards have many varieties that operate at different frequencies. They
also differ from each other in their application. For example, the DDR standard
provides communication between the computer core and external devices, and the
low-power DDR standard provides communication in modern smartphones. In the
mentioned standards, information is transmitted sequentially using two
transmission lines. This choice of the number of transmission lines is related to
the advantages of the differential signal: it doubles the signal level and
suppresses the common-mode component of noise.
The development of high-speed ICs and their different blocks (in particular, I/O
cells) has led to strict requirements for IC reliability and to the need for means
of designing ICs that work in non-standard operating conditions. Disruption of IC
operation can directly or indirectly lead to serious economic losses, environmental
damage, or threats to human life. The reduction in transistor sizes led to thinning
of the gate oxide layer, which allowed the threshold voltage to be reduced. The
low threshold voltage, in turn, allowed the supply voltages of ICs to be reduced,
leading to lower power consumption. The development of the technological process
has made it possible to reduce the distance between metal layers, leading to an
increase in the density of interconnects. When wires are closer together, the mutual
capacitances between them increase. Large capacitances lead to an increase in the
noise level. In modern ICs, the supply voltage has dropped to several hundred
millivolts, while noise reaches tens of millivolts. The increase in noise and the
decrease in supply voltage have reduced the signal-to-noise ratio. This reduces the
noise immunity of modern ICs and can lead to unwanted changes in the characteristics
of circuits, up to functional failure. The new generation I/O cells are faster and
work with lower supply voltages, which significantly complicates the processes of
data reading and transfer. It should be noted that, as a result of reducing the
supply voltage, ICs become sensitive to external and internal destabilizing
factors, and, due to the increase in speed, the transmission line begins to
significantly attenuate the transmitted signal, which makes data processing even
more difficult. Currently, there are also high-precision IC types which provide
extremely high reliability. They are mainly used in medical devices, space
equipment, etc. Since the loss of data in these areas can lead to great damage,
I/O cells in high-precision ICs are equipped with systems that increase the
reliability of data transmission and reading. With the increase of data transfer
rates, deviations of the timing parameters of data become an extremely important
problem. In modern I/O cells, the shortest data transition time is on the order of
picoseconds. In that case, even small deviations of timing parameters can
lead to data errors. The main obstacles to increasing the speed of data transmitted in
efficiency, the latter do not meet the modern requirements for practical design.
There are many such examples.
The development of means of design and optimization of high-speed ICs has now
become a decisive part of the IC design process. The problems solved during the
design, and especially the optimization, of high-speed ICs require a huge amount of
computation because of the enormous number of components and of candidate circuit
versions and operating modes to be considered. From this point of view, and taking
into account the current rise of machine learning (ML), driven by the emergence of
big data and more powerful computing resources in the last few years, ML methods
and tools can be considered the most suitable for the design and optimization of
high-speed ICs.
This monograph describes newly developed principles, methods, and circuit
solutions for the design and optimization of high-speed ICs based on ML. For
example, approaches that are effective with respect to timing limitations have been
proposed for developing means to increase the speed of sequential information
reception in ICs. The embedded nodes and the structure of their architecture
significantly increase data transfer and processing frequencies while keeping the
increase in die area and power consumption within permissible limits. Methods of
designing ICs that work in non-standard operating conditions have been proposed;
they meet modern requirements and, at the expense of an increase in occupied area
and power consumption within permissible limits, significantly reduce the
deviations caused by changes in external and internal destabilizing factors and by
aging phenomena. Principles and methods of developing signal transmission
calibration means in ICs have been proposed, which significantly improve their main
technical characteristics and parameters: speed, reliability of data transfer and
reading, and power consumption.
The author expresses deep gratitude to his PhD students Manvel Grigoryan,
Hakob Kostanyan, Karen Khachikyan, Arman Atanesyan, Taron Kaplanyan, and
Suren Abazyan, who have participated in the development of described means.
The author would also like to thank his colleague Ruzanna Goroyan who assisted
in the preparation of this monograph.
Chapter 1
Means to Accelerate Transfer
of Information Between Integrated Circuits
Data transmission standards also differ from each other in their application. For
example, the DDR standard provides communication between the computer core
and external devices, and the low-power DDR standard provides communication
in modern smartphones.
In the above standards, information is transmitted sequentially using two
transmission lines. This choice of the number of transmission lines is related to the
advantages of the differential signal [20]: it doubles the signal level and
suppresses the common-mode component of noise.
Although there are special I/O cells that use the parallel method of data transfer
[21–23], the sequential method is considered preferable. This is because the
additional channels required for parallel transmission increase the cost of the
system. The parallel transmission method also leads to an inevitable increase in
the area of the IC due to the need for additional nodes for signal transmission,
reception, and restoration.
Special blocks generating and processing a clock signal are an important part of
special I/O cells performing the function of sequential information transmission and
reception in ICs [24–26]. In general, three types of architectures with clock signal are
distinguished in I/O cells: common, transmitted, and embedded [27–29].
In the case of a common clock signal architecture, one main quartz oscillator
supplies the reference clock signal to both communicating ICs (Fig. 1.1).
Such a structure requires equal-length interconnects to reduce the phase
shift between the clock signals supplied to the two ICs. The architecture with a
common clock signal supports speeds of up to 100 Mb/s, so it is intended for
low-power systems.
In the case of architecture with a transmitted clock signal, the transmitting IC
provides a reference clock signal to the receiver (Fig. 1.2).
An additional route must then be added between the transmitter and receiver
nodes in the system to carry the clock signal. Such an
[Figures: block diagrams of the clock signal architectures, showing the transmitter, receiver, channel, clock generator, clock signal line, PLL, and CDR blocks]
Conductors in channels can be of two types: microstrip and stripline (Fig. 1.6)
[50, 51].
Although the stripline structure is more complicated and more expensive to
manufacture, it significantly reduces the noise caused by interconnect effects,
which is even more important in the case of a differential signal.
There are also two types of vias on the printed circuit board: straight and
backdrilled (Fig. 1.7) [52–54].
Straight inter-level vias add extra zeros to the overall channel transfer function
and increase the amount of signal distortion. This can be avoided if their unused
parts are removed at an additional stage of the manufacturing process. As a
result, the cost of the system increases, but a more linear channel characteristic
is obtained.
The increasing speed of the information exchange standards implemented by modern
high-speed I/O cells makes a lumped-parameter representation of the channel
inadmissible [55–57], because such an approach does not take into account the
following:
• Voltage drops in wires
• Changing magnetic field
• Displacement currents
• Conduction currents due to dielectric imperfection
[Figure: equivalent circuit of a long line with distributed parameters r0, L0, g0, C0]
\[ \frac{\partial V(x,t)}{\partial x} = -r_0 I(x,t) - L \frac{\partial I(x,t)}{\partial t}, \tag{1.1} \]
\[ \frac{\partial I(x,t)}{\partial x} = -g_0 V(x,t) - C \frac{\partial V(x,t)}{\partial t}. \tag{1.2} \]
From the differential equations it follows that V(x,t) and I(x,t) are functions
of two variables, so the transmitted signal has a wave nature, and reflections
are observed when the line is mismatched. The wave impedance of a homogeneous line
is determined by the following formula:
\[ Z_0 = \frac{V(x)}{I(x)} = \sqrt{\frac{r_0 + j\omega L}{g_0 + j\omega C}}. \tag{1.3} \]
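As a rough numerical illustration of (1.3), the following Python sketch evaluates the wave impedance from the per-unit-length line parameters. The parameter values and the function name are assumptions chosen only so that the result lands near 50 Ω; they are not taken from the text.

```python
import cmath
import math

def wave_impedance(r0, L, g0, C, freq_hz):
    """Wave impedance per (1.3): sqrt((r0 + jwL) / (g0 + jwC))."""
    w = 2 * math.pi * freq_hz
    return cmath.sqrt((r0 + 1j * w * L) / (g0 + 1j * w * C))

# Hypothetical per-metre parameters (illustrative assumptions only).
R0, L0, G0, C0 = 5.0, 250e-9, 1e-5, 100e-12

z_lossy = wave_impedance(R0, L0, G0, C0, 5e9)  # complex value at 5 GHz
z_lossless = math.sqrt(L0 / C0)                # limit for r0 = g0 = 0

print(f"lossy: {z_lossy:.3f} ohm, lossless: {z_lossless:.3f} ohm")
```

With these assumed values the loss terms are small compared with the reactive terms at 5 GHz, so the lossy impedance stays very close to the lossless value sqrt(L/C).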
\[ K_t = \frac{R_{out} - Z_0}{R_{out} + Z_0}, \tag{1.4} \]
\[ K_r = \frac{R_{input} - Z_0}{R_{input} + Z_0}, \tag{1.5} \]
where Rout is the output resistance of the transmitting node and Rinput is the
input resistance of the receiving node. Reflections vanish when these resistances
are equal to Z0. Therefore, resistance termination blocks (RTBs) are designed,
which provide a resistance value of 50 Ω regardless of supply voltage, temperature,
and technological deviations [60–63]. To ensure such accuracy, external resistors
placed outside the IC are often used, the accuracy of which is within ±3%.
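To give a feel for the numbers, the sketch below evaluates the reflection coefficient of (1.4) and (1.5) for an ideal 50 Ω termination and for a termination resistor at the edges of the ±3% range mentioned above. The function name is hypothetical and the example is only an illustration of the formula.

```python
def reflection_coefficient(r_term, z0):
    """Reflection coefficient per (1.4)/(1.5): (R - Z0) / (R + Z0)."""
    return (r_term - z0) / (r_term + z0)

Z0 = 50.0
k_matched = reflection_coefficient(50.0, Z0)      # ideal termination
k_high = reflection_coefficient(50.0 * 1.03, Z0)  # resistor 3% above nominal
k_low = reflection_coefficient(50.0 * 0.97, Z0)   # resistor 3% below nominal

print(k_matched, round(k_high, 4), round(k_low, 4))
```

A 3% resistance error therefore reflects roughly 1.5% of the incident wave amplitude, with the sign depending on the direction of the error.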
There are four basic resistance termination architectures [58]:
• Non-terminated architecture
• Termination only in transmitter
• Termination only in the receiver
• Termination both in the transmitter and receiver
In the first case, there is no RTB in either the transmitter or the receiver. Such
an architecture is designed for low-power systems and works with rather short
channels. It saves die area due to the absence of RTBs, but it limits the
performance of the system. The architecture with the RTB implemented only in the
transmitter is intended for medium-performance systems, but in this case the
output voltage of the transmitter settles in two stages, over a period equal to
twice the delay time of the channel. When the RTB is implemented only at the
receiver, ideally the signal is no longer reflected back to the transmitter.
However, technological deviations and non-ideality of the system prevent the
resistance value from being set precisely, which leads to reflections that are not
damped at the transmitter. An RTB built into both sides ensures minimal reflections
and increases system reliability. It is the best architecture for avoiding
reflections. However, in this case the signal level is halved by the resulting
voltage divider. This loss is compensated by differential signal transmission. This
is the reason why modern high-speed systems use an architecture with RTBs built
into both the transmitter and the receiver and send the signal in a differential
form.
The presence of the r0 and g0 components in a channel is due to thermal effects,
the skin effect, and dielectric absorption [64, 65]. As a result of the skin effect,
the effective resistance of the conductor increases as the frequency increases:
\[ R(f) = R_h \sqrt{\frac{f}{f_s}}, \tag{1.6} \]
where fs is the frequency at which the predominant part of the current passes
through half of the cross-sectional area of the wire and Rh is the resistance of
the wire for a constant (DC) signal. Therefore, the losses due to the resistive
component in matched long lines can be represented by the following formula:
\[ \alpha_R = \frac{R}{2 Z_0} = \frac{R_h}{2 Z_0} \sqrt{\frac{f}{f_s}}. \tag{1.7} \]
It follows from the above formula that the losses due to the skin effect are a
function of the frequency and increase along with its increase.
The effect of dielectric absorption is due to the phenomenon of free dipoles
absorbing thermal energy. When a direct concentrated electric field is applied, the
free dipoles change their direction and absorb thermal energy.
The dielectric absorption effect in matched long lines can be estimated by the
following formula:
\[ \alpha_D = \frac{G Z_0}{2} = \pi f C \tan\delta_D \sqrt{\frac{L}{C}}. \tag{1.8} \]
As can be seen from the obtained formula, the losses due to dielectric absorption
are directly proportional to the frequency.
Calculating the losses due to the skin effect and dielectric absorption gives a
rather accurate estimate of the losses of a terminated long line (Fig. 1.9).
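The frequency dependence of the two loss mechanisms can be sketched numerically. The Python example below implements (1.7) and (1.8) and sums them for a line of given length; all parameter values (DC resistance, fs, L, C, loss tangent, length) are assumptions for illustration and do not describe the channel in the figures.

```python
import math

def alpha_skin(f, r_dc, f_s, z0):
    """Resistive (skin-effect) loss per (1.7), in nepers per unit length."""
    return r_dc / (2 * z0) * math.sqrt(f / f_s)

def alpha_dielectric(f, L, C, tan_delta):
    """Dielectric loss per (1.8), in nepers per unit length."""
    return math.pi * f * C * tan_delta * math.sqrt(L / C)

# Hypothetical line parameters (assumptions for illustration only).
R_DC, F_S = 5.0, 10e6       # ohm/m, Hz
L, C = 250e-9, 100e-12      # H/m, F/m
TAN_D = 0.02
Z0 = math.sqrt(L / C)

def total_loss_db(f, length_m):
    """Total attenuation of a matched line, converted from nepers to dB."""
    nepers = (alpha_skin(f, R_DC, F_S, Z0)
              + alpha_dielectric(f, L, C, TAN_D)) * length_m
    return 20 * math.log10(math.e) * nepers

losses = [total_loss_db(f, 0.4) for f in (1e9, 2.5e9, 5e9, 10e9)]
print([round(x, 2) for x in losses])
```

In this sketch the skin-effect term grows with the square root of frequency and the dielectric term grows linearly, so the dielectric contribution dominates at sufficiently high frequencies.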
The obtained graph shows that the signal in long lines is suppressed as the
frequency increases. Summarizing, it can be noted that the increase of the operating
frequency is limited by the frequency characteristics of long lines. Therefore, there is
a need to restore the transmitted signal in the receiver before the reading process. To
perform this function, special equalizers are designed in the transmitter and receiver,
which will provide equalization of the signal in the operating frequency range.
It is also noteworthy that four-level pulse amplitude modulation (PAM4) is used
in modern sequential information transmission standards [66–72]. It provides the
same data rate while halving the frequency of signal transmission.
\[ F(t) = \frac{a_0}{2} + \sum_{n=1}^{\infty} \left( a_n \cos(nt) + b_n \sin(nt) \right) = \sum_{n=-\infty}^{\infty} c_n e^{int}. \tag{1.9} \]
That is, an arbitrary signal transmitted by I/O cells can be decomposed into a
finite or infinite number of sinusoidal signals with different frequencies
(Fig. 1.10).
Considering that the frequency characteristic of long lines falls off nonlinearly
as the frequency increases, different harmonic components of the transmitted signal
are attenuated to different extents.
Therefore, the equalizer, built into the transmitter, should amplify the high-
frequency components of the signal and ensure full equalization of the signal in
the operating frequency range (Fig. 1.11) [76–78].
Equalization of the fundamental harmonics of the signal is an important condi-
tion, because high-frequency components are considered as noise. Therefore, it is
necessary to increase the operating frequency and gain of the equalizer in that
domain.
To estimate signal distortions, it is convenient to use the eye diagram [79–83]. It
represents the superposition of all transmitted bits, obtained by dividing the
transmitted waveform into segments of equal duration and overlaying them. The
advantage of the method is that it allows the timing parameters of a signal of
arbitrary length to be estimated from a single graphical representation. Below are
the amplitude-frequency characteristic of a 17-inch-long channel and the eye
diagram of a signal transmitted over it at 5 GHz (Fig. 1.12).
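The segmentation step behind an eye diagram can be sketched in a few lines. The example below is a minimal illustration, not the tool used for the figures: the function name, the two-interval trace length, and the toy waveform are assumptions.

```python
def eye_segments(samples, samples_per_ui, uis_per_trace=2):
    """Cut a waveform into equal-length traces to be overlaid in an eye diagram."""
    span = samples_per_ui * uis_per_trace
    return [samples[i:i + span]
            for i in range(0, len(samples) - span + 1, samples_per_ui)]

# Toy waveform: ideal two-level bits at +/-0.3 V, 4 samples per bit (assumed).
bits = [1, 0, 1, 1, 0, 0, 1, 0]
wave = [0.3 if b else -0.3 for b in bits for _ in range(4)]

traces = eye_segments(wave, samples_per_ui=4)
print(len(traces), "traces of", len(traces[0]), "samples each")
```

Plotting all traces on the same time axis would give the eye; for a distorted waveform the vertical and horizontal openings between the overlaid traces shrink.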
The eye diagram shows that the signal is completely distorted and cannot be read
at the receiver. In addition to the equalizer, implemented in the receiver node, a
digital filter with a finite impulse response of several orders is applied in the
Fig. 1.12 The amplitude-frequency characteristic of the channel (a) and the eye diagram of the
signal transmitted by it (b). Axes: (a) frequency, 0 to 10 GHz; (b) voltage (V) versus time, 0 to 200 ps
\[ W(z) = W_{-1} + W_0 z^{-1} + W_1 z^{-2} + \cdots + W_n z^{-n-1}. \tag{1.10} \]
Figure 1.14 shows the amplitude-frequency characteristics of the channel and of the
digital filter implemented in the transmitting node, together with the eye diagram
of the resulting 5 GHz signal.
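How a finite impulse response filter of the form (1.10) emphasizes signal transitions can be seen in a small example. The sketch below applies a two-coefficient filter to a symbol sequence; the coefficient values (0.75 and -0.25) and the function name are assumptions for illustration, not the filter used in the figure.

```python
def fir_filter(x, taps):
    """y[k] = sum_i taps[i] * x[k - i]; taps[0] plays the role of W_-1 in (1.10)."""
    y = []
    for k in range(len(x)):
        acc = 0.0
        for i, w in enumerate(taps):
            if k - i >= 0:  # samples before the start are treated as zero
                acc += w * x[k - i]
        y.append(acc)
    return y

# Hypothetical two-tap filter: main coefficient 0.75, one delayed coefficient -0.25.
symbols = [1, 1, 1, -1, -1, 1, -1, -1]
out = fir_filter(symbols, [0.75, -0.25])
print(out)
```

In this sketch a symbol that differs from its predecessor is sent with amplitude 1.0, while a repeated symbol is sent with amplitude 0.5, which boosts the high-frequency content relative to the low-frequency content.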
Fig. 1.14 Amplitude-frequency characteristic of channel and digital filter (a) and its effect on the
signal (b). Curves in (a): channel, digital filter, system. Axes: (a) frequency, 0 to 10 GHz; (b) voltage (V) versus time, 0 to 200 ps
As can be seen from the eye diagram, the transmitted signal can be recovered
using a sufficiently accurate comparator. However, technological deviations and
design difficulties do not allow the mentioned method to be used in
higher-performance systems. Therefore, there is an inevitable demand for the
development and design of information recovery means at the receiving node. Thus,
the design of means of increasing the speed of sequential information reception and
processing in the receiving node is one of the problems of modern IC development.
In systems containing multiple ICs, various measures are used to ensure error-free
transmission and reception of data. Their application depends on the signal
transmission standard, as well as on the physical dimensions and
amplitude-frequency characteristics of the channel in the PCB. Small distortions
caused by a sufficiently short channel can be recovered even by means of an analog
receiver block (ARB) [88, 89]. Such a choice may also be due to the low frequency
of the signal transmitted in the system. Although various low-frequency
communication systems still exist today, the increase in speed due to modern signal
transmission standards has led to the need to develop new methods [90]. However,
the technical specifications of modern standards require that their implementation
in systems also provides the possibility of low-power operation [14–19].
Therefore, the effectiveness of the means depends on the technical conditions of
the specific system.
The main parameters of the means of receiving sequential information are the
maximum operating frequency and gain, the linearity of the implemented nodes, the
area occupied on the die, the power consumption, and, of course, the cost of the
system. It is also necessary to take into account the influence of the noise
sources of the node or nodes introduced in the system, which is even more critical
when four-level amplitude modulation is used for signal transmission, because in
this case the signal voltage margins are reduced. It is also often necessary
to apply several measures together in order to achieve the desired outcome.
Existing Methods to Increase the Speed of Continuous Time Linear Equalizer
(CTLE)
The channel has the characteristic of a low-pass filter, and as the frequency
increases, the signal distortions become more severe. Therefore, there is a need to
design a special basic block in the receiver node, which will have a high-pass
filter characteristic and will be able to perform signal equalization [91, 92].
In one of the well-known approaches, it is proposed to use active and passive
continuous time linear equalizer (CTLE) [93]. A passive CTLE is a combination of
high- and low-pass filters made up of resistors and capacitors (Fig. 1.15) [94, 95].
The transfer function of the circuit is:
\[ H(s) = \frac{R_2}{R_1 + R_2} \cdot \frac{1 + R_1 C_1 s}{1 + \frac{R_1 R_2}{R_1 + R_2} (C_1 + C_2)\, s}, \tag{1.11} \]
where R1, R2, C1, and C2 correspond to passive CTLE resistor and capacitor values.
It can be seen from the transfer function that it has one zero and one pole, which are
determined as follows:
\[ \omega_{zero} = \frac{1}{R_1 C_1}, \qquad \omega_{pole} = \frac{1}{\frac{R_1 R_2}{R_1 + R_2} (C_1 + C_2)}. \tag{1.12} \]
The passive CTLE gain is entirely dependent on resistor and capacitor values.
\[ A_s = \frac{R_2}{R_1 + R_2}, \qquad A_a = \frac{C_1}{C_1 + C_2}, \tag{1.13} \]
where As is the gain of the circuit at frequencies below ωzero and Aa is the gain of
the circuit at frequencies above ωpole. It can be seen from (1.13) that the gain of
the passive CTLE at low frequencies depends only on the ratio of resistances,
whereas in the high-frequency range it is determined by the capacitance
ratio. This makes it possible to choose different ratios for these components and
obtain a gain that rises with frequency. To evaluate this change, the
ratio between Aa and As is considered.
\[ \frac{A_a}{A_s} = \frac{\omega_{pole}}{\omega_{zero}} = \frac{R_1 + R_2}{R_2} \cdot \frac{C_1}{C_1 + C_2}. \tag{1.14} \]
However, it follows from (1.13) that the absolute value of the gain is smaller than
1 at both low and high frequencies. Therefore, the passive CTLE only partially
solves the signal equalization problem and cannot provide sufficient gain.
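The relations (1.11) to (1.14) can be checked numerically. The sketch below uses assumed component values (R1 = 300 Ω, R2 = 100 Ω, C1 = 1 pF, C2 = 0.5 pF) and hypothetical function names; it only illustrates the formulas and is not a design recommendation.

```python
def passive_ctle_params(r1, r2, c1, c2):
    """Zero, pole, and asymptotic gains of the passive CTLE per (1.12), (1.13)."""
    w_zero = 1.0 / (r1 * c1)
    w_pole = 1.0 / ((r1 * r2 / (r1 + r2)) * (c1 + c2))
    a_s = r2 / (r1 + r2)   # low-frequency gain
    a_a = c1 / (c1 + c2)   # high-frequency gain
    return w_zero, w_pole, a_s, a_a

def passive_ctle_gain(r1, r2, c1, c2, w):
    """|H(jw)| from the transfer function (1.11)."""
    s = 1j * w
    r_par = r1 * r2 / (r1 + r2)
    return abs(r2 / (r1 + r2) * (1 + r1 * c1 * s) / (1 + r_par * (c1 + c2) * s))

# Assumed component values for illustration.
R1, R2, C1, C2 = 300.0, 100.0, 1e-12, 0.5e-12
w_zero, w_pole, a_s, a_a = passive_ctle_params(R1, R2, C1, C2)

print(f"As = {a_s:.3f}, Aa = {a_a:.3f}, Aa/As = {a_a / a_s:.3f}")
print(f"w_pole/w_zero = {w_pole / w_zero:.3f}")
```

With these values the gain rises from 0.25 at low frequencies to about 0.67 at high frequencies, so the circuit boosts high frequencies only relative to low ones, while the absolute gain stays below 1 as stated in the text.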
Active CTLE consists of two proportional differential branches connected by a
resistor and capacitor (Fig. 1.16) [96]. They are designed to suppress the
low-frequency harmonics of the signal.
The transfer function of the circuit is determined by:
\[ H(s) = \frac{g_m}{C_{out}} \cdot \frac{s + \frac{1}{R_f C_f}}{\left( s + \frac{1 + g_m R_f / 2}{R_f C_f} \right) \left( s + \frac{1}{R_{load} C_{out}} \right)}, \tag{1.15} \]
where gm is the transconductance of the input transistor, Rload is the amplifier
load, and Rf and Cf are the feedback resistance and capacitance, respectively.
The zero and poles of the transfer function are determined as follows:
\[ \omega_{zero} = \frac{1}{R_f C_f}, \qquad \omega_{pole1} = \frac{1 + g_m R_f / 2}{R_f C_f}, \qquad \omega_{pole2} = \frac{1}{R_{load} C_{out}}. \tag{1.16} \]
The maximum value of the circuit gain is determined by:
\[ A_a = g_m R_{load}. \tag{1.17} \]
The low-frequency gain, obtained from (1.15) at s = 0, is:
\[ A_h = \frac{g_m R_{load}}{1 + g_m R_f / 2}. \tag{1.18} \]
Obviously, the gain differs at low and high frequencies. To estimate this
difference, the relative gain (RG) parameter is introduced, which is determined by
the following formula:
\[ RG = \frac{\omega_{pole1}}{\omega_{zero}} = 1 + g_m R_f / 2. \tag{1.19} \]
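The active CTLE relations (1.15) to (1.19) can be cross-checked with a short calculation. The component values below are assumptions for illustration, the function names are hypothetical, and the reading of Ah as the gain at s = 0 is derived here from (1.15) rather than quoted from the text.

```python
def active_ctle_params(gm, r_f, c_f, r_load, c_out):
    """Zero, poles, and gains of the active CTLE per (1.16)-(1.19)."""
    w_zero = 1.0 / (r_f * c_f)
    w_pole1 = (1.0 + gm * r_f / 2.0) / (r_f * c_f)
    w_pole2 = 1.0 / (r_load * c_out)
    a_a = gm * r_load                            # (1.17)
    a_h = gm * r_load / (1.0 + gm * r_f / 2.0)   # (1.18)
    rg = w_pole1 / w_zero                        # (1.19)
    return w_zero, w_pole1, w_pole2, a_a, a_h, rg

def active_ctle_transfer(gm, r_f, c_f, r_load, c_out, s):
    """H(s) per (1.15)."""
    num = s + 1.0 / (r_f * c_f)
    den = (s + (1.0 + gm * r_f / 2.0) / (r_f * c_f)) * (s + 1.0 / (r_load * c_out))
    return gm / c_out * num / den

# Assumed values: gm = 10 mS, Rf = 400 ohm, Cf = 1 pF, Rload = 200 ohm, Cout = 50 fF.
GM, R_F, C_F, R_LOAD, C_OUT = 10e-3, 400.0, 1e-12, 200.0, 50e-15
w_zero, w_pole1, w_pole2, a_a, a_h, rg = active_ctle_params(GM, R_F, C_F, R_LOAD, C_OUT)
h_dc = abs(active_ctle_transfer(GM, R_F, C_F, R_LOAD, C_OUT, 0.0))

print(f"Aa = {a_a:.3f}, Ah = {a_h:.3f}, RG = {rg:.3f}, |H(0)| = {h_dc:.3f}")
```

For these values RG equals 3, and the ratio of (1.17) to (1.18) gives the same number, which is the consistency the formulas imply.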
Fig. 1.20 Amplitude-frequency characteristics of the system for all possible codes. Axes: gain (dB) versus frequency
Thus, the presented method of increasing the speed makes it possible to increase
the frequency of the transmitted signal. However, equalizing the signal with this
method yields a lower output signal level, which makes its processing in the next
node more difficult. Therefore, the applied method cannot completely solve the
stated problems.
Existing Methods to Increase the Speed of the Decision Feedback Equalizer
(DFE)
An increase in the frequency of the transmitted signal strengthens the effect of
inter-symbol interference (ISI). In the case of ISI, the voltage at each level of
the transmitted signal is determined by the superposition of the previous and next
bits. Therefore, new nodes for signal recovery and processing must be designed and
implemented. One well-known approach introduces a decision feedback equalizer
(DFE), which is a digital filter with finite impulse response, after the CTLE in
the receiver node [97, 98].
Fig. 1.21 Suppression factor of power bus noises in typical (a) and worst (b) cases. Vertical axis in both panels: PSRR (dB)
Figure 1.22 shows the architecture of the receiver node with the embedded DFE.
The code-controlled RTB provides the required channel termination and common-mode
level of the input signal. The CTLE then performs signal equalization for the given
channel. Finally, the 1-tap DFE filters the noise generated by ISI from the
equalized signal and restores the fully differential signal.
The 1-tap DFE architecture is shown in Fig. 1.23.
For even and odd taps of transmitted information, two branches are implemented,
which are selected by means of multiplexers. Since feedback timing is a primary
concern in DFEs implemented in modern high-speed I/O cells, the output of the
multiplexers is used as the control signal of the parallel branch [99]. As a
result, the formed feedback coefficient (H1) is added to (or subtracted from) the
input signal, and the decision is made taking into account the digital value of
the previous bit.
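The decision rule described above can be modeled behaviorally. The sketch below is a simplified software model, not the circuit of Fig. 1.23: the channel model (one trailing interference term of 1.2 times the previous symbol), the value of H1, and the function names are assumptions chosen so that a plain comparator fails while the 1-tap DFE recovers the data.

```python
def slicer(samples):
    """Plain comparator: decide each symbol from the sign of the sample."""
    return [1 if x >= 0 else -1 for x in samples]

def dfe_1tap(samples, h1):
    """1-tap DFE: subtract h1 times the previous decision before deciding."""
    decisions = []
    prev = 0  # no decision exists before the first symbol
    for x in samples:
        d = 1 if (x - h1 * prev) >= 0 else -1
        decisions.append(d)
        prev = d
    return decisions

# Toy channel (assumed): received[k] = sent[k] + 1.2 * sent[k - 1].
sent = [1, -1, -1, 1, 1, -1, 1, -1]
received = [s + 1.2 * (sent[k - 1] if k > 0 else 0) for k, s in enumerate(sent)]

without_dfe = slicer(received)
with_dfe = dfe_1tap(received, h1=1.2)
print("plain comparator correct:", without_dfe == sent)
print("1-tap DFE correct:", with_dfe == sent)
```

Because the correction uses the already decided previous bit, it removes the trailing interference without amplifying noise, but a wrong decision would also corrupt the correction applied to the next bit.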
Figure 1.24 shows the architecture of comparators implemented in a 1-tap DFE.
The N1a and N1b transistors provide isolation between the input and output
signals, and the N3a and N3b transistors isolate the input pair from the clock
signal. Thus, these transistors increase the noise immunity of the
system.
The Idp and Idn reference currents are controlled by digital-to-analog converters
(DACs) (Fig. 1.25).
Reference current control makes it possible to adjust the difference between the
output voltages caused by the non-ideality of the manufacturing process. The
experimental results show that the current DAC can compensate for voltage
deviations of up to 50 mV.
Transient simulation was performed to evaluate the performance of the system,
and the obtained results were summarized in eye diagrams (Fig. 1.26).
Fig. 1.26 Eye diagrams at the channel output (a), CTLE output (b), and DFE output (c)
[Figure: two panels, amplitude (mV) versus time (unit intervals), 0 to 2]
The simulation results show that the proposed receiver architecture, which
consists of a CTLE and a 1-tap DFE, makes it possible to receive sequential
information at a frequency of 5 GHz.
Thus, implementing a 1-tap DFE in the receiver makes it possible to compensate the
noise caused by ICI and to increase the transmission frequency of the signal.
However, even with a multi-tap DFE, the speed of signal transmission is
limited by the speed of the elements in the system. The applied method therefore has
clear limitations, and the problem of designing an architecture with faster
elements remains.
Existing Methods to Increase the Speed of Reception of Sequential Information
Using Pulse Amplitude Modulation (PAM4)
PAM2 has been used for signal transmission for decades. However, the
amplitude-frequency characteristics of the channel show that any increase in signal
frequency is limited by practically irreversible distortions. For this reason, PAM4
has also come into use over time; it transmits twice as much information
at the same signal frequency [100, 101] (Fig. 1.27).
In the case of PAM4, each signal level corresponds to 2 bits of information.
The same data rate can therefore be obtained at half the frequency. However, this
approach reduces the voltage margin between two neighboring levels, which further
tightens the noise immunity requirements.
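The trade-off between bits per symbol and voltage margin can be made concrete with a short calculation. The sketch below assumes equally spaced levels and the same peak-to-peak swing for both modulations; the function names and the normalized swing of 1.0 are illustrative assumptions.

```python
import math


def bits_per_symbol(num_levels):
    """Number of bits carried by one symbol with the given number of levels."""
    return int(math.log2(num_levels))


def level_spacing(num_levels, swing=1.0):
    """Distance between adjacent, equally spaced levels for a given swing."""
    return swing / (num_levels - 1)


pam2_spacing = level_spacing(2)   # two levels span the whole swing
pam4_spacing = level_spacing(4)   # four levels split the swing into three gaps

# Loss of margin between adjacent levels, expressed in dB.
margin_penalty_db = 20.0 * math.log10(pam2_spacing / pam4_spacing)
```

Under these assumptions, PAM4 carries two bits per symbol but leaves one third of the PAM2 spacing between adjacent levels, a penalty of about 9.5 dB.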
Figure 1.28 shows the architecture of a PAM4-enabled receiver.
In the first stage, the distorted signal is equalized by the CTLE.
Then three embedded comparators with different offsets check the signal level.
The checked values are stored in the subsequent triggers, which are
controlled by the synchronizing signal generated by the embedded PLL. The values
stored in the triggers are converted from thermometric to binary code by a
decoder.
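A behavioral sketch of the level check and decoding may help here. It assumes normalized PAM4 levels of -1, -1/3, +1/3 and +1, comparator thresholds midway between them, and a plain (not Gray-coded) binary output; none of these values or names come from the source.

```python
def slice_pam4(voltage, thresholds=(-2.0 / 3.0, 0.0, 2.0 / 3.0)):
    """Three comparators with different offsets produce a thermometric code."""
    return tuple(1 if voltage > t else 0 for t in thresholds)


def thermometer_to_binary(thermo):
    """Count the comparators that fired and return (MSB, LSB)."""
    count = sum(thermo)
    return (count >> 1) & 1, count & 1


levels = [-1.0, -1.0 / 3.0, 1.0 / 3.0, 1.0]
decoded = [thermometer_to_binary(slice_pam4(v)) for v in levels]
```

Each of the four levels maps to a distinct two-bit word, which is why three comparators are sufficient for a four-level signal.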
To perform the three-stage comparison, a voltage shifting amplifier (VSA)
and an output buffer were used. The M1/M2 and M3/M4 transistors embedded in the VSA
are not differential pairs, because the reference currents in their branches
(I1 to I4) are different (Fig. 1.29). Such a structure amplifies the signal with the
appropriate amplitude and passes only its variable component, so that the output
buffer makes the right decision.
Figure 1.30 shows the results of transient simulation.
The complete system results are presented in Table 1.2.
Thus, PAM4 signaling makes it possible to significantly increase the speed of data
transfer. The presented receiver architecture receives the PAM4 signal, processes
it, and transforms it into a binary code. However, the method requires a
high-precision, high-speed three-level comparator, because the voltage margin between
adjacent signal levels decreases. With such an architecture, the speed of the
receiving node is also limited by the speed of the nodes implemented in the comparator.
The existing methods and approaches to increasing the speed of receiving sequential
information in an integrated circuit do not fully meet current requirements.
Therefore, the development of new solutions and methods remains a topical
issue. Based on this, the following approaches have been proposed:
1. Implementation of a neg-C circuit, controlled by a current DAC in CTLE
(Fig. 1.31).
Such an architecture will decrease the output impedance of the CTLE and, therefore,
increase the bandwidth of the entire system. This, in turn, will increase the gain
of the system at operating frequencies. To ensure the same gain over the entire
operating range of the output signal, the linearity of the implemented circuits will
inevitably need to be adjusted. Adjusting the thermometric branches in the current
DAC will ensure the linearity of the output current and prevent missing codes. In
the neg-C circuit, the linearity setup system controlled by the constant component
of the signal will ensure a uniform change of the currents in its differential
branches. As a result, at the cost of an increase in die area and power consumption
within permissible limits, such an architecture will make it possible to equalize a
signal with a higher frequency and a larger output operating range.
The discussion of the means to increase the speed of the CTLE and the research on its
application show that it has clear limitations and cannot fully meet modern
data transfer standards. The built-in RCM enabled higher-frequency signal
equalization, but reduced the output operating range.
In order to implement equalization of the transmitted signal with a higher
frequency, it has been proposed to design and include at the output of the CTLE a
binary code-controlled negative capacitance circuit (neg-C circuit). The simplest
neg-C circuit consists of a capacitor and two transistors connected with positive
feedback (Fig. 1.34) [102].
The positive feedback in the system makes it possible to obtain a negative value of
the impedance. The impedance of a neg-C circuit is:
$$Z_{eq} = -\frac{1}{sC}\cdot\frac{g_m + s\left(C_{\text{gate-source}} + 2C\right)}{g_m - sC_{\text{gate-source}}}, \qquad (1.20)$$
where $g_m$ is the transconductance of the feedback transistor and
$C_{\text{gate-source}}$ is the gate-source parasitic capacitance of that transistor.
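Eq. (1.20) can be evaluated numerically to see where the circuit behaves as a negative capacitance. The sketch below reads the equation as -(1/(sC)) multiplied by the ratio of (gm + s(Cgs + 2C)) to (gm - s·Cgs); the component values are hypothetical and chosen only for illustration.

```python
import math


def neg_c_impedance(f_hz, c, gm, c_gs):
    """Impedance of the neg-C circuit at frequency f_hz, following Eq. (1.20)."""
    s = 1j * 2.0 * math.pi * f_hz
    return -(1.0 / (s * c)) * (gm + s * (c_gs + 2.0 * c)) / (gm - s * c_gs)


def effective_capacitance(f_hz, c, gm, c_gs):
    """Capacitance C_eff that would give the same reactive admittance, from Z = 1/(s*C_eff)."""
    s = 1j * 2.0 * math.pi * f_hz
    return (1.0 / (s * neg_c_impedance(f_hz, c, gm, c_gs))).real


# Hypothetical values: C = 100 fF, gm = 10 mS, Cgs = 20 fF.
C, GM, CGS = 100e-15, 10e-3, 20e-15
c_eff_low = effective_capacitance(1e6, C, GM, CGS)    # well below gm/(2*pi*C)
c_eff_high = effective_capacitance(10e9, C, GM, CGS)  # parasitics start to matter
```

At low frequency the second factor tends to 1 and the circuit looks like a capacitance of -C; as the frequency approaches the range set by gm and the capacitances, the magnitude of the negative capacitance shrinks.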
In order to control the amplitude-frequency characteristics of the neg-C circuit, as
well as to exclude the influence of the common mode component of the current on
the CTLE, it is proposed to apply the following architecture (Fig. 1.35) [103].
To make the reference current controllable with a digital code, it is recommended
to use an 8-bit current DAC (Fig. 1.36). The choice of its range is determined by the
saturation margin of the current sources in the neg-C circuit.
[Figures: CTLE AC response, par(afe) versus f (Hz), swept from 10 kHz to 100 GHz]
In order to evaluate the stability of the neg-C circuit, frequency analysis was
performed (Figs. 1.42 and 1.43).
However, all the above results were obtained from small-signal simulation,
so the linearity of the system must also be evaluated.
The linearity of amplifiers is measured by the 1 dB compression point
(Fig. 1.44) [104].
The 1 dB compression point is the input signal amplitude at which the gain is
1 dB less than its nominal value. The level of the output signal at that point is
also important: it must be high enough to cover the input operating range of the
next node. Since the CTLE is followed by a decision feedback equalizer (DFE), which
operates with an input signal amplitude of up to 150 mV, the output signal amplitude
at the 1 dB compression point should be no less than 150 mV [68].
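The definition above can be turned into a small search routine. The sketch assumes a generic compressive amplifier model, vout = a0·vsat·tanh(vin/vsat), which is not the CTLE of this chapter; `a0`, `vsat`, and the numeric values are hypothetical.

```python
import math


def gain_db(vin, a0, vsat):
    """Large-signal gain in dB of the assumed model vout = a0*vsat*tanh(vin/vsat)."""
    vout = a0 * vsat * math.tanh(vin / vsat)
    return 20.0 * math.log10(vout / vin)


def find_p1db(a0, vsat, lo=1e-6, hi=10.0, iterations=80):
    """Bisect on a log scale for the input amplitude where the gain has dropped by 1 dB."""
    target = 20.0 * math.log10(a0) - 1.0
    for _ in range(iterations):
        mid = math.sqrt(lo * hi)
        if gain_db(mid, a0, vsat) > target:
            lo = mid   # less than 1 dB of compression so far: search higher
        else:
            hi = mid   # more than 1 dB of compression: search lower
    vin = math.sqrt(lo * hi)
    vout = a0 * vsat * math.tanh(vin / vsat)
    return vin, vout


A0, VSAT = 3.0, 0.1   # hypothetical small-signal gain and saturation level (V)
vin_p1db, vout_p1db = find_p1db(A0, VSAT)
meets_dfe_range = vout_p1db >= 0.150   # output must cover the 150 mV DFE input range
```

The bisection relies on the gain of this model decreasing monotonically with input amplitude; a simulated or measured gain curve would need the same property, or a direct scan, for the search to be valid.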
The M1 and M2 transistors included in the neg-C circuit have the same dimensions
and provide the same current in the differential branches at the same voltage
drop. In the presence of a variable component of the input voltage, the voltages
controlling them change, which allows faster switching due to the positive feedback.
However, at low values of the voltage, the M1 and M2 transistors enter cut-off
mode, and the deviations of the differential branches with respect to the constant
component are no longer equal.
It is recommended to connect M3 and M4 transistors in parallel with the M1 and M2
transistors and to control them only by the constant component of the input
signal (Fig. 1.45).
During the simulation, a sinusoidal signal with an amplitude that increases over
time was applied to the input of the CTLE (Fig. 1.46).
As can be seen from the graph of the currents of the M1 and M2 transistors, they
increase and decrease unevenly (Fig. 1.47).
1.2 Methods to Increase the Speed of Receiving Sequential Information. . . 35
[Figure: output power (dBm) versus Pin (dBm), plotted against the ideal line; marker at (31.569, 41.201)]
$$\mathrm{LSB} = \frac{I_{2^N-1} - I_0}{2^N}. \qquad (1.21)$$
The main parameters characterizing the linearity of the current DAC are the
differential (DNL) and integral (INL) nonlinearities.
$$\mathrm{DNL} = \frac{I_{i+1} - I_i}{\mathrm{LSB}} - 1, \qquad (1.22)$$
$$\mathrm{INL} = \frac{I_i - I_0}{\mathrm{LSB}} - i. \qquad (1.23)$$
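Eqs. (1.22) and (1.23) translate directly into code. In the sketch below the LSB is passed in explicitly as the nominal step of the design, and the list of output currents is made up for illustration; both are assumptions rather than data from the chapter.

```python
def dnl(currents, lsb):
    """Differential nonlinearity per Eq. (1.22), one value per code transition."""
    return [(currents[i + 1] - currents[i]) / lsb - 1.0
            for i in range(len(currents) - 1)]


def inl(currents, lsb):
    """Integral nonlinearity per Eq. (1.23), one value per code."""
    return [(currents[i] - currents[0]) / lsb - i
            for i in range(len(currents))]


# Hypothetical output currents in microamperes for codes 0..4, nominal LSB = 1 uA.
measured = [0.0, 1.0, 2.5, 3.0, 4.0]
dnl_values = dnl(measured, lsb=1.0)   # the step from code 1 to 2 is 0.5 LSB too large
inl_values = inl(measured, lsb=1.0)   # code 2 sits 0.5 LSB above the ideal line
```

A DNL of zero at every transition means all steps equal one LSB, and the INL at a code is the running sum of the DNL values below it.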
[Figure: Pout (dBm) versus Pin (dBm) with the ideal line; annotations: vin_P1db = 0.054860538799005 mV, vout_P1db = 0.15442488394593 mV]
Fig. 1.51 DNL of the current DAC as a result of Monte Carlo simulation
CMOS transistors operating in saturation mode are used as current sources.
Branch currents are set by the choice of transistor sizes.
Since the nominal currents of the thermometric branches are eight times larger
than the LSB, the inaccuracies that the technological process causes in
them significantly degrade the linearity of the system.
The DNL of the current DAC was measured by Monte Carlo simulation
(Fig. 1.51).
The obtained results indicate that the DNL can reach three LSB. This value is
unacceptable, because in that case the current sources included in
the neg-C circuit deviate from their operating mode. It is recommended to reduce the
currents of the thermometric branches so that a negative DNL is obtained when they
are switched (Fig. 1.52) [106].
The obtained results show that the maximum value of DNL after applying the
method is less than one LSB (Fig. 1.53).
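The effect of resizing the thermometric branches can be illustrated with a simple model. It assumes a segmented DAC in which three binary-weighted bits (1, 2 and 4 LSB) sit below thermometric branches of nominally 8 LSB each; this segmentation, the error values, and the 2.5 LSB reduction are all hypothetical and are not stated in the source.

```python
def unary_switch_dnl(branch_currents_lsb, n_binary_bits=3):
    """DNL (in LSB) at each code where one more thermometric branch turns on.

    At such a code the binary section falls from its full value
    (2**n_binary_bits - 1 LSB) back to zero, so the step equals the branch
    current minus that full value, and DNL = step - 1.
    """
    binary_full_scale = 2 ** n_binary_bits - 1
    return [branch - binary_full_scale - 1 for branch in branch_currents_lsb]


# Hypothetical process-induced errors of four thermometric branches, in LSB.
errors = [3.0, -1.0, 0.5, 2.0]

nominal = [8.0 + e for e in errors]         # branches sized at 8 LSB
reduced = [8.0 - 2.5 + e for e in errors]   # branches deliberately sized 2.5 LSB smaller

dnl_nominal = unary_switch_dnl(nominal)     # worst case reaches +3 LSB
dnl_reduced = unary_switch_dnl(reduced)     # worst case stays below +1 LSB
```

In this model, lowering the branch currents shifts every switching step downward by the same amount, so the largest positive DNL drops while the negative values become more negative.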
Thus, a CTLE setup system providing signal equalization was proposed. The
proposed system is based on controlling the reference current of a neg-C circuit
with positive feedback by means of a current DAC. The ability to adjust
the system with a digital code provides signal equalization for different channels and
data transmission frequencies. Linearity setup methods for the neg-C circuit and the
current DAC are proposed, which ensure the saturation condition of the current source
transistors and, therefore, an increase in the output operating range of the overall
system. The proposed system provides approximately two times faster signal
equalization. Compared to existing solutions, the proposed method increases power
consumption slightly, by approximately 10%.
Fig. 1.53 DNL of current DAC as a result of Monte Carlo simulation after application of the
method
where $t_{cl2out,c}$ is the delay from the comparator clock signal to the output,
$t_{deltime,mul}$ is the multiplexer delay time, $t_{stime,t}$ is the trigger setup
time, $t_{cl2out,t}$ is the delay from the trigger clock signal to the output,
$t_{deltime,adder}$ is the adder delay time, $t_{stime,c}$ is the comparator setup
time, and UTI is the unit time interval. To reduce $t_{cl2out,c}$, it was
proposed to use a high-speed comparator circuit with low input capacitance
(Fig. 1.54) [107].
In the recovery phase, the np and nn wires are connected to the supply voltage through
the Mr1 and Mr2 transistors, which turns off the M9 and M10 transistors. At the same
time, the M11 and M12 transistors reset the potentials of the fp and fn wires.
Therefore, a value corresponding to the supply voltage is formed at the outputs. This
reduces the dependence of the comparator on the supply voltage at this stage.
Fig. 1.55 Offset voltage as a result of Monte Carlo simulation for the typical case
In the comparison stage, the M1 and M2 transistors start discharging the np and
nn wires. The difference in the input signal produces different current values in the
differential branches. The M5, M6, M7, and M8 transistors form the output
signal.
The constant current induced by the M3 and M4 transistors provides faster switching
of the output signal.
The comparator offset was measured by Monte Carlo simulation
(Figs. 1.55 and 1.56).
Timing simulations were performed on the existing and proposed comparators
(Fig. 1.57).
The obtained results indicate that the delay time of the comparator has decreased
by about 36%. However, due to the transistors added to it, the total system area on
the die has increased by approximately 11%.
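The quoted delay reduction can be checked against the marker values that appear on the transient comparison plot (65.65 ps, 95.141 ps and 112.16 ps). The assignment of these markers to the clock edge, the proposed comparator output, and the conventional comparator output is an assumption made here for illustration.

```python
def delay_reduction(clock_edge_ps, conventional_out_ps, proposed_out_ps):
    """Return both clock-to-output delays and the relative reduction in percent."""
    conventional_delay = conventional_out_ps - clock_edge_ps
    proposed_delay = proposed_out_ps - clock_edge_ps
    reduction_percent = 100.0 * (conventional_delay - proposed_delay) / conventional_delay
    return conventional_delay, proposed_delay, reduction_percent


# Assumed marker assignment: clock edge, conventional output, proposed output.
conventional, proposed, reduction = delay_reduction(65.65, 112.16, 95.141)
```

Under this reading the delays are roughly 46.5 ps and 29.5 ps, which is consistent with the reduction of about 36% stated in the text.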
Timing simulation of the proposed comparator with changes in technological
process, temperature, and supply voltage has been carried out (Fig. 1.58).
A receiver architecture composed of a dual-cascade CTLE and a DFE is proposed
[104, 105]. Timing simulation of the whole system was performed (Figs. 1.59 and
1.60).
To evaluate the noise immunity of the system, Monte Carlo simulation
was performed (Figs. 1.61 and 1.62).
As a result of implementing the proposed comparators, the layout area of the
DFE increased by about 8% (Fig. 1.63) [104, 105].
Fig. 1.56 Offset voltage as a result of Monte Carlo simulation for the worst case
[Figure: transient comparison, voltage (V) versus t (s), of the proposed and conventional comparators together with the clock; markers at x = 65.65 ps, 95.141 ps, and 112.16 ps]
Fig. 1.58 Proposed comparator delay times for extreme corner cases
The use of the proposed comparators reduced the timing constraints in the DFE by
about 35%. As a result, at the expense of a slight increase in die area and an
almost 16% increase in power consumption, a system with twice the speed
was obtained.
Fig. 1.62 Power supply rejection ratio (PSRR) in the worst case
As discussed in Sect. 1.1, PAM4 signal transmission doubles the amount of data
received at the same frequency. It is recommended to place four identical branches
in parallel after the CTLE in the receiver, in which the value of the
signal is captured by sample-hold (S/H) blocks (Fig. 1.64).
Instead of the previously considered three-state comparator, a system of three
comparators is used, whose negative inputs are supplied with
reference voltages [108]. The outputs of the system are transformed into a binary
code by a decoder. As a result, the sequential signal is processed and converted
into a parallel representation.
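The serial-to-parallel role of the four branches can be sketched as follows. The source states only that four identical S/H branches are placed in parallel; that they capture consecutive symbols in round-robin order is an assumption of this example, as are the function names.

```python
def distribute(symbols, n_branches=4):
    """Hand consecutive symbols to the branches in round-robin order."""
    return [symbols[k::n_branches] for k in range(n_branches)]


def recombine(branches):
    """Restore the original order (expects branches of equal length)."""
    merged = []
    for group in zip(*branches):
        merged.extend(group)
    return merged


stream = list(range(8))          # symbol indices 0..7 of the incoming stream
branches = distribute(stream)    # each branch sees every fourth symbol
restored = recombine(branches)
```

Each branch then has four symbol periods to sample, compare and decode one value, which relaxes the speed required of the individual comparators.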
To prevent the comparators from misreading the signal because of deviations
in the duty cycle of the clock signal, the duty cycle was adjusted using a duty
cycle regulator (DCR) (Fig. 1.65) [109].
Owing to the negative feedback in the system, the duty cycle detector (DCD)
converts the offsets of the output signals from their average value into a control
voltage, which is applied to the input of the regulator. This equalizes the constant
voltage components of the input signals and thereby sets the duty cycle of the
output signal. The received incomplete differential signal is converted into a fully
differential signal by means of the CML-CMOS buffer.
The constant component of the input voltage in the DCR is determined by a
voltage divider consisting of resistors (Fig. 1.66). The input capacitance provides
filtering of the constant component of the signal.
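The link between duty cycle and the constant component of a clock can be shown with an idealized waveform. The sketch assumes a rail-to-rail square wave with instantaneous edges; `square_wave`, `dc_component`, and the 1.0 V supply are illustrative and do not model the DCD circuit itself.

```python
def square_wave(duty, vdd, points_per_period=100, periods=10):
    """Ideal rail-to-rail clock sampled at a fixed number of points per period."""
    high_points = round(duty * points_per_period)
    one_period = [vdd] * high_points + [0.0] * (points_per_period - high_points)
    return one_period * periods


def dc_component(samples):
    """Average value of the waveform, i.e. its constant component."""
    return sum(samples) / len(samples)


VDD = 1.0
clock = square_wave(duty=0.4, vdd=VDD)
average = dc_component(clock)
measured_duty = average / VDD        # the average encodes the duty cycle
error_voltage = average - VDD / 2    # deviation from the value at 50% duty
```

For such a waveform the average equals the duty cycle times the swing, so a deviation of the average from mid-rail is a direct measure of the duty cycle error that a feedback loop can drive toward zero.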
Fig. 1.69 Input and output signals having 0.4 duty cycle
Fig. 1.70 Input and output signals having 0.2 duty cycle
[Figure: CMRR of OUTPUT versus f (Hz), 1 MHz to 10 GHz; marker at (6 GHz, –11.45)]
Conclusions
1. Approaches that are effective in terms of timing constraints have been proposed
for developing means to increase the speed of sequential information reception in
integrated circuits. The embedded nodes and the architecture underlying them
make it possible to significantly increase data transfer and processing
frequencies, at the cost of an increase in die area and power consumption
within permissible limits.
References
1. V. Stojanović, High-Speed Serial Links: Design Trends and Challenges
(Massachusetts Institute of Technology, Integrated Systems Group, 2005), pp. 38–42
2. W. Zhikai, W. Hu, Z. Gu et al., Large-scale integrated circuits simulation based on CNT-FET
model. 2021 International conference on IC design and technology (ICICDT) (2021), pp. 1–4
3. B. Zhang, Z. Wentong, Z. Le, et al., Review of technologies for high-voltage integrated
circuits. Tsinghua Sci. Technol. 27, 495–511 (2021)
4. Y. Liang, S. Ruize, Y. Yee-Chia et al., Development of GaN monolithic integrated circuits for
power conversion. 2019 IEEE custom integrated circuits conference (CICC) (2019), pp. 1–4
5. Y. Honda, G. Masahide, W. Toshihisa et al., Triple-stacked silicon-on-insulator integrated
circuits using Au/SiO2 hybrid bonding. 2019 IEEE SOI-3D-subthreshold microelectronics
technology unified conference (S3S) (2019), pp. 1–3
6. M. Rashdan, F. El-Sayed, M. Salman, Performance comparison between SerDes and time-
based serial links. 2020 7th international conference on electrical and electronics engineering
(ICEEE) (2020), pp. 37–41
7. N. Zhou, K. Huang, F. Lve et al., A 76 mW 40-GB/s SerDes transmitter with 64:1 mux in
65-nm CMOS technology. 2016 6th international conference on electronics information and
emergency communication (ICEIEC) (2016), pp. 155–158
8. A.R. Chada, B. Mutnury, D. Wallace et al., Simulation challenges in designing high speed
serial links. 2012 IEEE 62nd electronic components and technology conference (ECTC)
(2012), pp. 153–159
9. K. Huang, D. Luo, Z. Wang et al., A 190 mW 40 Gbps SerDes transmitter and receiver chipset
in 65 nm CMOS technology. 2015 IEEE custom integrated circuits conference (CICC) (2015),
pp. 1–4
10. A.A.S. SH, K.S. Reddy, A 20 Gb/s latency optimized SerDes transmitter for data centre
applications. 2020 IEEE international conference on electronics, computing and communica-
tion technologies (CONECCT) (2020), pp. 1–4
11. A. Bandiziol, W. Grollitsch, F. Brandonisio et al., Design of a half–rate receiver for a 10 Gbps
automotive serial interface with 1–tap–unrolled 4–taps DFE and custom CDR algorithm. IEEE
international symposium on circuits and systems (ISCAS) (May 27, 2018), pp. 1–5
12. A. Roshan Zamir, High Speed Reconfigurable NRZ/PAM4 Transceiver Design Techniques:
Doctoral Dissertation (Texas A & M University, 2018), 100 p
13. H. Tang, L. Ding, J. Jin et al., A 28 Gb/s 2-tap FFE source-series-terminated transmitter in
22 nm CMOS FDSOI. 2018 IEEE international symposium on circuits and systems (ISCAS)
(2018), pp. 1–4.
14. Universal Serial Bus 3.1 Specification, Revision 1.0 (2013), 631 p.
https://manuais.iessanclemente.net/images/b/bc/USB_3_1_r1.0.pdf. Accessed 14 Mar 2019
15. PCI Express Base Specification Revision 3.0 (2010), 860 p.
http://www.lttconn.com/res/lttconn/pdres/201402/20140218105502619.pdf. Accessed 14 Mar 2019
16. Serial ATA Revision 3.0 – Gold Revision (Serial ATA International Organization, 2009),
663 p. http://www.lttconn.com/res/lttconn/pdres/201005/20100521170123066.pdf. Accessed
14 Mar 2019
17. VESA DisplayPort Standard Version 1, Revision 1a (2008), 238 p.
http://file.yizimg.com/383992/2014090921252964.pdf. Accessed 14 Mar 2019
18. https://www.synopsys.com/dw/ipdir.php?ds=dwc_ddr_multiphy
19. High-Definition Multimedia Interface Specification Version 1.4b (2011), 73 p.
https://glenwing.github.io/docs/HDMI-1.4b.pdf. Accessed 14 Mar 2019
20. B. Razavi, Design of Analog CMOS Integrated Circuits (July, 2017), 801 p
21. P. Hakansson, A. Huynh, S. Gong, A study of wireless parallel data transmission of extremely
high data rate up to 6.17 Gbps per channel. 2006 Asia-Pacific microwave conference (2006),
pp. 975–978
22. E. Takahashi, N. Endou, Y. Kasai, M. Iwata, et al., 8 Gbps parallel data transmission with
adaptive I/O circuit. IEEE Proc. 32nd Eur. Solid-State Circuits Conf. 32(2), 472–475 (2006)
23. W. Xie, C. Guangqiang, J. Weiwei, Research on high-speed SerDes interface testing technol-
ogy. 22nd international conference on electronic packaging technology (ICEPT) (2021),
pp. 1–5
24. A. Sahakyan, A. Shishmanyan, A. Hekimyan, Multi-rate clock-data recovery solution in high
speed serial links. Electronics and nanotechnology (ELNANO): IEEE 35th international
conference (2015), pp. 242–244
25. C.L. Hsieh, S.L. Liu, A 1–16-Gb/s wide-range clock/data recovery circuit with a bidirectional
frequency detector. IEEE Trans. Circuits Syst. II Express Briefs 58, 487–491 (2011)
26. M.S. Li, Y.K. Lu, C.Y. Yang et al., PLL-based clock and data recovery for SSC embedded
clock systems. 2019 international SoC design conference (ISOCC) (2015), pp. 309–310
27. M. El-Badry, M. El-Fiky, A. Yasser, A. Shehata et al., A 2.2-pJ/bit 10-Gb/s forwarded-clock
serial-link transceiver for IoE applications. Signals, circuits and systems (ISSCS): Interna-
tional symposium (2017), pp. 1–4
28. J.W. Jung, B. Razavi, A 25-Gb/s 5-mW CMOS CDR/deserializer. VLSI circuits (VLSIC):
Symposium (2012), pp. 138–139
29. M.M. Ayesh, S.A. Ibrahim, H.F. Ragai et al., A low-power high-speed charge-steering
ADC-based equalizer for serial links. Electronics, circuits, and systems (ICECS): IEEE
international conference (2015), pp. 500–501
30. O. Seijo, I. Val, J.A. Lopez-Fernandez et al., IEEE 1588 clock synchronization performance
over time-varying wireless channels. 2018 IEEE international symposium on precision clock
synchronization for measurement, control, and communication (ISPCS) (2018), pp. 1–6
31. S.H. Chung, Y.J. Kim, Y.H. Kim, et al., A 10-Gb/s 0.71-pJ/bit forwarded-clock receiver
tolerant to high-frequency jitter in 65-nm CMOS. IEEE Trans. Circuits Syst. II Express Briefs
63, 264–268 (2015)
32. X. Wang, Q. Hu, Analysis and optimization of combined equalizer for high speed serial link.
Anti-counterfeiting, security, and identification (ASID): IEEE 9th international conference
(2015), pp. 43–46
33. M. Saneei, A. Afzali-Kusha, Z. Navabi, A mesochronous technique for communication in
network on chips. 2006 international conference on microelectronics (2006), pp. 32–35
34. E. Kilada, M. Dessouky, A. Elhennawy, Architecture of a fully digital CDR for plesiochronous
clocking systems. 2007 IEEE international conference on signal processing and communica-
tions (2007), pp. 939–942
35. S.S. Saber, M. Ehsanian, A linear high capture range CDR with adaptive loop bandwidth for
SONET application. Microelectronics (ICM): 29th international conference (2017), pp. 1–4
36. J. Kim, Y. Hwang, Y. Moon, A study of the referenceless CDR based on PLL. SoC design
conference (ISOCC): IEEE international conference (2016), pp. 265–266
37. H. Zhang, P. Xue, Z. Hong, A 4.6–5.6 GHz constant KVCO low phase noise LC-VCO and an
optimized automatic frequency calibrator applied in PLL frequency synthesizer. 43rd annual
conference of the IEEE Industrial Electronics Society (2017), pp. 8337–8342
38. J. Jin, J. Kim, H. Kim et al., A 4.0-10.0-Gb/s referenceless CDR with wide-range, jitter-
tolerant, and harmonic-lock-free frequency acquisition technique. ESSCIRC 2018-IEEE 44th
European solid state circuits conference (ESSCIRC) (2018), pp. 146–149
39. P. Zhang, C. Zhang, J. Zhang et al., A 25–28Gb/s PLL-based full-rate reference-less CDR in
0.13 μm SiGe BiCMOS. 2017 2nd IEEE international conference on integrated circuits and
microsystems (ICICM) (2017), pp. 186–190
40. Y. He, Z. Wang, H. Liu et al., An 8.5–12.5 GHz multi-PLL clock architecture with LC PLL and
ring PLL for multi-lane multi-protocol SerDes. 2017 international conference on electron
devices and solid-state circuits (EDSSC) (2017), pp. 1–2
41. N. Bansal, R. Gupta, An NMOS low drop-out voltage regulator with-17dB wide-band power
supply rejection for SerDes in 22FDX. 31st international conference on VLSI design and 2018
17th international conference on embedded systems (VLSID) (2018), pp. 341–346
42. E. Abramov, T. Vekslender, O. Kirshenboim, et al., Fully integrated digital average current-
mode control voltage regulator module IC. IEEE J. Emerg. Select. Topics Power Electron.,
485–499 (2017)
43. S. Harutyunyan, H. Kostanyan, M. Grigoryan, et al., A reliable PMOS-based charge pump
architecture. Proc. RA NPUA Ser. Tech. Sci 73(2), 181–187 (2020)
44. J. Cui, Y. Zeng, J. Xia, Design of a low temperature drift and high PSRR bandgap reference
source with second-order compensation. J. Terahertz Sci. Electron. Inf. Technol., 41–49
(2018)
45. X. Xu, C. Chen, T. Sugiura et al., 18-GHz band low-power LC VCO IC using LC bias circuit
in 56-nm SOI CMOS. IEEE Asia Pacific microwave conference (APMC) (2017), pp. 938–941
46. X. Xu, C. Chen, T. Yoshimasu et al., A 28-GHz band highly linear power amplifier with novel
adaptive bias circuit for cascode MOSFET in 56-nm SOI CMOS. International conference on
electron devices and solid-state circuits (EDSSC) (2017), pp. 1–2
47. I.M. Filanovsky, M. Igor, A. Allam et al., Sub-regulators for biasing circuits. IEEE 55th
international midwest symposium on circuits and systems (MWSCAS) (2012), pp. 314–317
48. X. Xu, X. Yang, T. Yoshimasu, A 2-GHz-band low-phase-noise VCO IC with an LC bias
circuit in 180-nm CMOS. 11th European microwave integrated circuits conference (EuMIC)
(2016), pp. 197–200
49. D. Chen, A 16b 5MSPS two-stage pipeline ADC with self-calibrated technology. International
conference on information and computer technologies (ICICT) (2018), pp. 155–158
50. H. Alaqil, J. Hong, Combined microstrip and suspended substrate stripline combline bandpass
filter with two transmission zeros. 15th Mediterranean microwave symposium (MMS) (2015),
pp. 1–4
51. M. Sarkar, Suspended substrate stripline-microstrip mixed substrate topology based wide
stopband low pass filter. TEQIP III sponsored international conference on microwave inte-
grated circuits, photonics and wireless networks (IMICPW) (2019), pp. 90–94
52. G. Xiang, K. Sheach, P. Brunet, A study on high-density high-speed SerDes design in buildup
flip chip ball grid array packages. European microelectronics and packaging conference
(2019), pp. 1–4
53. N. Na, J. Audet, L. Shan, Design optimization for isolation in high wiring density packages
with high speed SerDes links. 56th electronic components and technology conference (2006),
pp. 1–7
54. T. Tang, B.S. Fang, D. Ho et al., Innovative flip chip package solutions for automotive
applications. IEEE 69th electronic components and technology conference (ECTC) (2019),
pp. 1432–1436
55. V. King, S. Jared, Transmission-Line Theory (1955), 509 p
56. C.R. Paul, Analysis of Multiconductor Transmission Lines (2007), 414 p
57. F.H. Branin, Transient analysis of lossless transmission lines. Proc. IEEE 55, 2012–2013
(1967)
58. T. Itoh, C. Caloz, Electromagnetic Metamaterials: Transmission Line Theory and Microwave
Applications (Wiley, 2005), 352 p
59. V.G. Oklobdzija, R.K. Krishnamurthy, High-Performance Energy-Efficient Microprocessor
Design (Springer Science & Business Media, 2006), 338 p
60. O.H. Petrosyan, A.A. Martirosyan, A.S. Trdatyan, et al., Equalization method of resistors.
Manual Eng. Acad. Armenia Yerevan 15(3), 475–479 (2018)
61. V.Sh. Melikyan, A.K. Hayrapetyan, B.E. Baghramyan et al., Transmitter output impedance
calibration method. Proceedings of IEEE East-West design & test symposium (EWDTS)
(Kazan, Russia, 2018), pp. 51–58
62. V. Melikyan, A. Balabanyan, A. Hayrapetyan et al., Receiver/transmitter input/output termi-
nation resistance calibration method. 2013 IEEE XXXIII international scientific conference
electronics and nanotechnology (ELNANO) (2013), pp. 126–130
63. Z. Yan, C. Zhang, M. Wang et al., Calibration mechanism for input/output termination
resistance in 28 nm CMOS. IEEE 3rd international conference on integrated circuits and
microsystems (ICICM) (2018), pp. 42–46
64. R.G. Chambers, The anomalous skin effect. Proc. R. Soc. Lond. A. Math. Phys. Sci. 215,
481–497 (1952)
65. E.J. Murphy, H.H. Lowry, The complex nature of dielectric absorption and dielectric
loss. J. Phys. Chem. 34, 598–620 (2002)
66. M. Tang, Z. Li, J. Hu et al., A 56-Gb/s PAM4 continuous-time linear equalizer with fixed
peaking frequency in 40-nm CMOS. IEEE international conference on integrated circuits,
technologies and applications (ICTA) (2019), pp. 89–90
67. M.S. Choudhary, N.S. Pudi, J.M. Redoute et al., An EMI immune PAM4 transmitter in
130 nm BiCMOS technology. IEEE MTT-S international microwave and RF conference
(IMARC) (2019), pp. 1–4
68. C. Menolfi, T. Toifl, R. Reutemann et al., A 25 Gb/s PAM4 transmitter in 90 nm CMOS SOI.
2005 IEEE international digest of technical papers. Solid-state circuits conference (ISSCC)
(2005), pp. 72–73
69. J. Li, S. An, Q. Zhu, et al., VSB modified duobinary PAM4 signal transmission in an IM/DD
system with mitigated image interference. IEEE Photon. Technol. Lett. 32(7), 363–366 (2020)
70. Q. Liao, N. Qi, Z. Zhang et al., The design techniques for high-speed PAM4 clock and data
recovery. IEEE international conference on integrated circuits, technologies and applications
(ICTA) (2018), pp. 142–143
71. R. Ma, M. Cao, G. Chen et al., A 5/10 Gb/s dual-mode NRZ/PAM4 CDR in 65-nm CMOS.
IEEE international conference on electron devices and solid-state circuits (EDSSC) (2019),
pp. 1–3
72. M. Wang, Y. Chen, J. Yuan, A low jitter 50 Gb/s PAM4 CDR of receiver in 40 nm CMOS
technology. International conference on wireless communications and signal processing
(WCSP) (2020), pp. 349–352
73. D.M. Pozar, Microwave Engineering (Wiley, 2011), 736 p
74. R.N. Bracewell, The Fourier Transform and Its Applications (1986), 640 p
75. V. Serov, Fourier Series, Fourier Transform and Their Applications to Mathematical Physics
(2017), 534 p
76. J. He, N. Qi, N. Yu et al., A 2nd-order CTLE in 130 nm SiGe BiCMOS for a 50 GBaud PAM4
Optical Driver. IEEE international conference on integrated circuits, technologies and appli-
cations (ICTA) (2018), pp. 151–152
77. U. Upadhyaya, S. Sen, S. Goyal et al., A 16 Gbps 10: 1 serializer with active inductor based
CTLE for high frequency boosting. 27th IEEE international conference on electronics, circuits
and systems (ICECS) (2020), pp. 1–4
78. A. Mkhitaryan, A. Grigoryan, M. Grigoryan, et al., Hysteresis improvement method in MIPI
D-PHY low-power receiver. Manual of Russian-Armenian Slavonic University. Phys. Math.
Natural Sci. 1, 95–103 (2020)
79. P. Hale, J. Jargon, C. Wang, et al., A statistical study of de-embedding applied to eye diagram
analysis. IEEE Trans. Instrum. Measur. 61(2), 475–488 (2011)
80. M. Mehri, R. Sarvari, A. Seydolhosseini, Eye diagram parameter extraction of nano scale
VLSI interconnects. IEEE 21st international conference on electrical performance of electronic
packaging and systems (EPEPS) (2012), pp. 327–330
81. B. Gao, K. Wei, L. Tong, An eye diagram parameters measurement method based on K-means
clustering algorithm. IEEE MTT-S international microwave symposium (IMS) (2019),
pp. 901–904
82. P. Li, T. Wu, An eye diagram improvement method using simulation annealing algorithm.
IEEE 22nd workshop on signal and power integrity (SPI) (2018), pp. 1–4
83. J. Park, D. Kim, Y. Kim et al., Eye-diagram estimation with stochastic model for 8B/10B
encoded high-speed channel. IEEE international symposium on electromagnetic compatibility
and 2018 IEEE Asia-Pacific symposium on electromagnetic compatibility (EMC/APEMC)
(2018), pp. 1–5
84. D. Tonietto, J. Hogeboon, E. Bensoudane et al., A 7.5 Gb/s transmitter with self-adaptive FIR.
2008 IEEE symposium on VLSI circuits (2008), pp. 198–199
85. P. Prandoni, M. Vetterli, Signal Processing for Communications (2008), 371 p
86. R. Marin, A. Frappé, A. Kaiser, Delta-sigma based digital transmitters with low-complexity
embedded-FIR digital to RF mixing. IEEE international conference on electronics, circuits and
systems (ICECS) (2016), pp. 237–240
87. H. Liu, H. Jiang, Y. Shen et al., A delta-sigma-based transmitter utilizing FIR-embedded
digital power amplifiers. IEEE 58th international midwest symposium on circuits and systems
(MWSCAS) (2015), pp. 1–4
88. E. Conde-Almada, Design and physical implementation of an analog receiver for a SerDes
system on chip in 130nm CMOS technology (2016), 48p
89. T. Terada, R. Fujiwara, G. Ono et al., A CMOS UWB-IR receiver analog front end with
intermittent operation. IEEE symposium on VLSI circuits (2007), pp 86–87
90. A. Jaiswal, Y. Fang, K. Hofmann, Low-power high-speed on-chip asynchronous wave-
pipelined CML SerDes. 27th IEEE international system-on-chip conference (SOCC) (2014),
pp. 5–10
91. A. Aghighi, A. Tajalli, M. Taherzadeh-Sani, A low-power 10 to 15 Gb/s common-gate CTLE
based on optimized active inductors. IFIP/IEEE 28th international conference on very large
scale integration (VLSI-SOC) (2020), pp. 171–175
92. B. Li, B. Jiao, C. Chou, et al., CTLE adaptation using deep learning in high-speed SerDes link.
IEEE 70th electronic components and technology conference (ECTC) (2020), pp. 952–955
93. V. Melikyan, A. Sahakyan, et al., High PSRR and accuracy receiver active equalizer. IEEE
34th international scientific conference on electronics and nanotechnology (ELNANO)
(2014), pp. 194–197
94. P. Francese, T. Toifl, M. Brändli et al., A 16 Gb/s receiver with DC wander compensated rail-
to-rail AC coupling and passive linear-equalizer in 22 nm CMOS. ESSCIRC 2014-40th
European solid state circuits conference (ESSCIRC) (2014), pp. 435–438
95. M. Chen, M. Chung, C. Yang, A low-PDP and low-area repeater using passive CTLE for
on-chip interconnects. Symposium on VLSI circuits (VLSI circuits) (2015), pp. 244–245
96. D. Thulasiraman, G. Chiranjeevi, J. Gaggatur et al., A 18.6 fJ/bit/dB power efficient active
inductor-based CTLE for 20 Gb/s high speed serial link. IEEE international conference on
electronics, computing and communication technologies (CONECCT) (2019), pp. 1–6
97. Y. Choi, Y. Kim, A 10-Gb/s receiver with a continuous-time linear equalizer and 1-tap
decision-feedback equalizer. IEEE 58th international midwest symposium on circuits and
systems (MWSCAS) (2015), pp. 1–4
98. J. Chae, M. Kim, S. Choi, et al., A 10.4-Gb/s 1-tap decision feedback equalizer with different
pull-up and pull-down tap weights for asymmetric memory interfaces. IEEE Trans. Circuits
Syst. II Express Briefs 67(2), 220–224 (2019)
99. K. Kaviani, A. Amirkhany, C. Huang, et al., A 0.4-mW/Gb/s near-ground receiver front-end
with replica transconductance termination calibration for a 16-Gb/s source-series terminated
transceiver. IEEE J. Solid-State Circuits 48, 636–648 (2013)
100. W. Fu, Q. Hu, R. Wang, A 40 Gb/s PAM4 SerDes receiver in 65 nm CMOS technology. IEEE
Canadian conference on electrical & computer engineering (CCECE) (2018), pp. 1–4
101. L. Tang, W. Gai, L. Shi, PAM4 receiver with adaptive threshold voltage and adaptive decision
feedback equalizer. 2016 IEEE international symposium on circuits and systems (ISCAS)
(2016), pp. 2246–2249
102. B. Mrković, M. Ašenbrener, The simple CMOS negative capacitance with improved fre-
quency response. 2012 proceedings of the 35th international convention MIPRO (2012),
pp. 87–90
103. M. Grigoryan, A. Atanesyan, G. Hakobyan, S. Harutyunyan, Two stage CTLE for high speed
data receiving. IEEE 40th international conference on electronics and nanotechnology
(ELNANO) (Kyiv, 2020), pp. 374–377
104. E.V. Balashov, D. Pasquet, A.S. Korotkov et al., Automatization of compression point 1dB
(CP1dB) and input 3rd order intercept point (IIP3) measurements using lab VIEW platform.
International symposium on signals, circuits and systems (2005), pp. 195–198
105. W. Kester, A.D.I. Engineeri, Data Conversion Handbook (2005), 976 p
106. A.A. Atanesyan, M.T. Grigoryan, H.V. Margaryan, H.A. Aghayan, et al., Method of increas-
ing current DAC linearity with considering its random variables for modeling risk or uncer-
tainty. Manual of Russian-Armenian (Slavic) University. Phys. Math. Natural Sci. 2, 64–70
(2020)
107. H. Aghayan, D. Manukyan, M. Grigoryan, Low input capacitance dynamic latch comparator
for high speed operation. Manual of Russian-Armenian (Slavic) University. Phys. Math.
Natural Sci. 1, 65–75 (2020)
108. https://visualstudio.microsoft.com/
109. V.Sh. Melikyan, A.A. Atanesyan, M.T. Grigoryan, H.T. Kostanyan et al., Duty-cycle correc-
tion circuit for high speed interfaces. IEEE 39th international conference on electronics and
nanotechnology (ELNANO) (Kyiv, 2019), pp. 42–45
110. V. Melikyan, A. Trdatyan, A. Sahakyan, A. Martirosyan et al., Process variation detection and
self-calibration method for high-speed serial links. IEEE East-West design & test symposium
(EWDTS), 14 September 2018 (Kazan, Russia, 2018), pp. 681–684
Chapter 2
Design Methods of Integrated Circuits,
Working Under Non-standard Operating
Conditions
Over the past 60 years, the semiconductor industry has evolved according to
Moore's Law [1], which has made it possible to produce metal-oxide-semiconductor
(MOS) transistors with channel lengths down to 3 nm [2]. This has increased the
performance of ICs, because smaller transistors allow more functional blocks to be
placed on the same area. At the same time, the operating frequencies of ICs have
also increased, reaching tens of GHz [3–5]. Such data transfer speeds have enabled
the development and improvement of technological directions and industries such as
automotive electronics [6], the Internet of Things [7], artificial intelligence [8],
and virtual reality devices [8]. This ensures continuous data transfer without
human-human or human-computer interaction [9]. During the simulation stage, the
phenomena of voltage and temperature drift must be observed [10], because drastic
changes in external conditions are possible during the operation of ICs even after
the calibration stage [11]. These improvements, together with the greater
involvement of high-speed devices in human life, have tightened the requirements
for IC reliability and created the need for means of designing ICs that work in
non-standard operating conditions. Disruption of IC operation can directly or
indirectly lead to serious economic losses, environmental damage, or threats to
human life.
The reduction in size led to thinning of the gate oxide layer of transistors, which
made it possible to reduce their threshold voltage [11, 12]. The low threshold
voltage, in turn, made it possible to reduce the supply voltages of ICs, leading to
lower power consumption [13–16]. The development of the technological process has
also reduced the distance between metal layers, increasing the density of
interconnects [17, 18]. When wires are closer together, the capacitive coupling
between them increases [19]. Large capacitances lead to an increase in the noise
level (Fig. 2.1) [20–23]. In modern circuits, the supply voltage has dropped to
several hundred millivolts, while noise reaches tens of millivolts [24–26].
The increase in noise and the decrease in supply voltage reduce the
signal-to-noise ratio. This lowers the noise immunity of modern ICs [27, 28] and
can lead to undesirable changes in the voltage characteristics of circuits, up to
functional failure.
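The effect of a shrinking supply voltage on the signal-to-noise ratio can be
illustrated with a short calculation. The sketch below is only an illustration:
the function name snr_db and the voltage values are assumptions chosen for the
example and are not taken from the cited measurements.

```python
import math


def snr_db(signal_v: float, noise_v: float) -> float:
    """Signal-to-noise ratio in decibels for two voltage amplitudes."""
    return 20.0 * math.log10(signal_v / noise_v)


# Illustrative (assumed) values: the same 30 mV of noise seen against
# a 1.8 V supply and against a 0.6 V supply.
older_supply_snr = snr_db(1.8, 0.03)
modern_supply_snr = snr_db(0.6, 0.03)

print(f"1.8 V supply: {older_supply_snr:.1f} dB")
print(f"0.6 V supply: {modern_supply_snr:.1f} dB")
```

With the noise level held constant, every reduction of the supply voltage directly
lowers the ratio, which is the trend described above.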
The increase in operating frequencies and in the density of transistors and
interconnects per unit area, together with the thinner oxide layer, has made the
influence of the undesirable phenomena of self-heating [29] and aging [30] on the
operation of ICs more tangible. Under the influence of self-heating and the
electric field, charge carriers can acquire enough kinetic energy to enter the gate
oxide layer and become trapped in it. This process is known as hot carrier
injection (HCI) [31], and its effect is greatest during the switching of
transistors (Fig. 2.2) [32]. Bias temperature instability (BTI) occurs when a
constant voltage is applied to the terminals of the transistor (Fig. 2.2) [32].
A high potential difference between the gate and the other terminals damages the
oxide layer; this damage is recoverable over time. The continuous action of BTI and
HCI changes the properties and parameters of the oxide layer of transistors and
shortens the lifetime of ICs [33–35]. Modern standards require ICs to have a
lifetime of at least 10 years, which further increases the role of aging phenomena.
Transistor aging models provided by the manufacturer are used to assess the effect
of BTI and HCI on circuit performance. The simulation shows changes in the
threshold voltage and in the saturation-region current of transistors [36, 37].
Given the fields of application of current ICs, the operating temperatures of up to
150 °C set by automotive electronics standards, and the larger range of offset
values [38–41], the mentioned phenomena can play a decisive role in IC
functionality.
2.1 General Issues of Design Methods of Integrated Circuits, Working. . . 61
Fig. 2.2 Occurrence of HCI, BTI, self-heating phenomena during the operation of the transistor
Along with the reduction of the supply voltages, the amplitudes of the signals
applied to the inputs of comparators have also decreased [42]. Systems in which the
signal amplitude is small compared to the DC component of the voltage require
operational amplifiers (OpAmp) with high sensitivity and gain. The sensitivity is
characterized by the minimum difference between the inputs required for switching.
Ideally, the output of the comparator should switch for any voltage difference
between the inputs greater than 0. In practice, the minimum voltage difference
between the inputs at which the output of the comparator switches is called the
offset voltage (Fig. 2.3) [43, 44].
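The definition of the offset voltage can be made concrete with a toy behavioral
model. This is a minimal sketch under assumed names and values (comparator_output
and the 20 mV offset are hypothetical); it only models the switching threshold and
not the circuit of Fig. 2.3.

```python
def comparator_output(v_plus: float, v_minus: float, offset_v: float = 0.0) -> int:
    """Return 1 when the input difference exceeds the input-referred offset."""
    return 1 if (v_plus - v_minus) > offset_v else 0


# An ideal comparator (zero offset) resolves a 5 mV input difference.
ideal = comparator_output(0.505, 0.500)

# With an assumed 20 mV offset, the same 5 mV difference is missed,
# while a 30 mV difference still switches the output.
missed = comparator_output(0.505, 0.500, offset_v=0.02)
detected = comparator_output(0.530, 0.500, offset_v=0.02)
```

Any input difference smaller than the offset is lost, which is why offsets of tens
of millivolts matter when signal amplitudes are themselves small.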
The main causes of offset are deviations of the technological process, layout
asymmetry, and aging phenomena [45–47].
In modern ICs, the offset reaches tens of millivolts, which is a challenge for
design companies. The folded-cascode amplifier is one of the well-known structures
that provide high gain in comparators [48] (Fig. 2.4).
To solve the problems mentioned in the previous subsection, new methods and means
have been developed in recent years that increase the stability of ICs and support
their uninterrupted operation. This was achieved by introducing digital subsystems
that ensure self-regulation of circuits. Digital-to-analog and analog-to-digital
converters make it possible to introduce calibration algorithms [49], which take
the operating conditions of the blocks into account and, by changing a code, set
optimal values of resistance, current, voltage, and other parameters at individual
nodes of the circuit. Such algorithms keep the variation of gain, delay, signal
jitter, and other important circuit parameters to a minimum. These solutions
include digitally controlled offset reduction [50] and automatic reset in
comparators [51] in IC receivers, as well as continuous data reception through
digital delay lines [52–55].
Offset is one of the important factors limiting the accuracy of comparators and
OpAmps [56]. Used primarily in analog circuits, these elements are among their most
sensitive components. One of the complications in the design of comparators is
keeping the circuit transistors in saturation with high gain while maintaining a
symmetric layout [57]. These three factors have a large influence on the offset. An
unbalanced layout can cause differences in resistance and capacitance between the
comparator branches, increasing the offset [58–60].
The two-cascade comparators used in current ICs provide a low offset at a low
supply voltage and with high noise immunity [56]. The small offset voltage is
achieved for two reasons:
1. The offset of the first cascade (Fig. 2.5) [56] is canceled by the input offset
storage circuit.
2. The offset of the second cascade (Fig. 2.6) [56] is reduced due to the high gain
of the first cascade.
The first cascade is a dynamic amplifier that amplifies the difference of the input
signals by discharging the differential junction [56].
Fig. 2.7 Timing diagram of comparator operation. (a) Charging of C+/C- capacitors up to VDD
value. (b) Discharge of capacitors. (c) Offset storage. (d) Outputs of the first cascade are connected
to the VDD point and input signals are sampled. (e) Comparison of signals
Fig. 2.8 A simplified view of the operation of the comparator in the reset phase
switches are connected through the rst signal. The In+ and In- points are connected
to the O+ and O- points, respectively, and are charged up to the supply voltage VDD
(Fig. 2.8a) [56]. The clk1 signal turns on the M6 transistor, discharging the C+/C-
capacitors (Fig. 2.8b) [56]. As the input transistors M0 and M1 enter the
subthreshold region, the I+/I- currents decrease sharply. The rst signal is then
turned off, and the offset is stored in the C+/C- capacitors (Fig. 2.9a) [56].
The O+ and O- points are connected to VDD; at the same time, the sw signal is
turned off to register the input signals (Fig. 2.9b) [56]. The input signals are
compared through the discharge of the O+ and O- points (Fig. 2.9c) [56].
Then, the O+ and O- points, connected to the inputs of the second cascade, form
the outputs of the general system (Fig. 2.10) [56]. The main reason for the offset
of the first cascade is the asymmetry of the M1 and M2 transistors. It is canceled
during the reset phase. During the discharge process, the gate and the output of
each input transistor are connected together. Therefore, under the influence of the
current flowing through the transistors, the voltage applied to the gate will
increase, which in turn will cause the current to shrink. Due to this negative
feedback, the currents of the input transistors converge.
Fig. 2.9 Simplified view of the operation of the comparator during the comparison stage
After the input transistors enter the subthreshold region, the difference between
their currents is less than 1 nA. After the rst signal is turned off, the current
of the input transistors depends on the gate-source voltage as follows (2.1) [56]:

$$I_{ds} = 2 n \beta U_T^{2} \, e^{\frac{V_G - V_{th}}{n U_T}} \tag{2.1}$$

where $I_{ds}$ is the current flowing through the input transistors, $n$ is the
slope coefficient of the current curve in the subthreshold region, $V_G$ is the
gate-source voltage, $V_{th}$ is the threshold voltage of the transistor, and
$\beta$ (2.2) and $U_T$ (2.3) are the current coefficient and the thermal voltage,
respectively:

$$\beta = \mu C_{ox} \frac{W}{L} \tag{2.2}$$

$$U_T = \frac{kT}{q}, \tag{2.3}$$

where $\mu$ is the charge carrier mobility, $C_{ox}$ is the capacitance per unit
area of the transistor gate insulator, $W$ and $L$ are the width and length of the
transistor channel, $k$ is the Boltzmann constant, $T$ is the absolute temperature,
and $q$ is the elementary charge [56, 61]. After the reset phase, the voltage
stored in the C+/C- capacitors is determined by expression (2.4) [57]:

$$V_G \approx V_{th} + n U_T \ln \frac{I_{ds}}{2 n \beta U_T^{2}}. \tag{2.4}$$

From Eq. (2.4), it follows that asymmetries arising from $\beta$ and $V_{th}$ are
also stored and canceled before the comparison stage.
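The relation between the residual subthreshold current and the stored gate voltage
can be checked numerically. The following is a minimal sketch, assuming the
standard exponential subthreshold model; the function names and all device values
(threshold voltages, current coefficients, slope coefficient) are hypothetical and
chosen only for illustration.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
Q_E = 1.602176634e-19   # elementary charge, C


def thermal_voltage(temp_k: float) -> float:
    """U_T = kT/q."""
    return K_B * temp_k / Q_E


def subthreshold_current(v_g, v_th, beta, n, temp_k=300.0):
    """I_ds = 2*n*beta*U_T^2 * exp((V_G - V_th) / (n*U_T))."""
    u_t = thermal_voltage(temp_k)
    return 2.0 * n * beta * u_t ** 2 * math.exp((v_g - v_th) / (n * u_t))


def stored_gate_voltage(i_ds, v_th, beta, n, temp_k=300.0):
    """Inverse of the model above: the gate voltage left at a given current."""
    u_t = thermal_voltage(temp_k)
    return v_th + n * u_t * math.log(i_ds / (2.0 * n * beta * u_t ** 2))


# Two mismatched input transistors (assumed values) that both end the
# reset phase at the same 1 nA residual current.
i_res = 1e-9
v_a = stored_gate_voltage(i_res, v_th=0.30, beta=2.0e-3, n=1.3)
v_b = stored_gate_voltage(i_res, v_th=0.31, beta=2.2e-3, n=1.3)

# The stored difference contains both the threshold and the beta mismatch.
stored_mismatch = v_b - v_a
```

In this model the stored voltage difference equals the threshold mismatch minus
n·U_T·ln(β_b/β_a), so both sources of asymmetry end up on the storage capacitors.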
Fig. 2.12 Input data equalizer of a receiver and a simplified view of circuit operation
(SPC) circuit. In the ideal case, the channel represents a short circuit between
transmitters and receivers [63].
The non-ideality of the channel parameters, namely low bandwidth [64] and noise
[65], affects the accuracy of data reception at the input of the receiver. The
transfer characteristic of the channel is known to be low-pass [66], so the
transmitted data are distorted depending on their frequency. To solve these
problems, data equalizers [67–72] are used in both transmitters and receivers;
their characteristics are the inverse of the channel's in order to neutralize its
losses. The equalizer used at the input of the receiver (Fig. 2.12) [67] consists
of a differential pair whose two branches are connected to each other by resistors
and capacitors. The feature of the circuit is that its gain depends on the
frequency of the input data. At low frequencies, the circuit behaves like a
source-degenerated amplifier, and at high frequencies, like a common-source
amplifier.
Having the values of Rb, R1 resistors and Ce, C1 capacitors, the system gain (2.5)
can be obtained.

$$A_V = \frac{g_m}{C_r} \cdot \frac{s + \frac{1}{R_1 C_1}}{\left(s + \frac{1 + g_m R_1 / 2}{R_1 C_1}\right)\left(s + \frac{1}{R_o C_o}\right)}, \tag{2.5}$$
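The frequency-dependent gain of such an equalizer can be explored numerically. The
sketch below assumes the commonly used one-zero, two-pole model of a
source-degenerated differential pair, with a single output capacitance; the
function name ctle_gain and all component values are hypothetical and are not
taken from the referenced circuit.

```python
def ctle_gain(omega, gm, r1, c1, ro, co):
    """Complex gain of a one-zero, two-pole equalizer model at angular frequency omega."""
    s = 1j * omega
    zero = 1.0 / (r1 * c1)                       # set by the degeneration network
    pole_deg = (1.0 + gm * r1 / 2.0) / (r1 * c1)  # degeneration pole
    pole_out = 1.0 / (ro * co)                    # output pole
    return (gm / co) * (s + zero) / ((s + pole_deg) * (s + pole_out))


# Assumed example values: 10 mS, 400 ohm / 1 pF degeneration, 200 ohm / 50 fF load.
params = dict(gm=10e-3, r1=400.0, c1=1e-12, ro=200.0, co=50e-15)

dc_gain = abs(ctle_gain(0.0, **params))
boosted_gain = abs(ctle_gain(2.7e10, **params))  # between the two poles
```

At low frequency the gain is reduced by the degeneration, gm·Ro/(1 + gm·R1/2),
while between the two poles it approaches the common-source value gm·Ro, which is
the boost used to compensate the low-pass channel.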
An offset between the output signals of a node can lead to incorrect switching or
loss of bits. The offset voltage can be determined as the difference between the
outputs of the circuit when the input signals are equal. In modern ICs,
analog-digital self-regulating blocks are used to reduce the offset between the
outputs of the equalizer [76, 77]. Their purpose is to provide a minimum voltage
difference between the outputs by changing the code for an input single-ended
signal. The most well-known solution consists of a DAC and a differential pair
which, connected to the outputs of the equalizer, reduces the offset by injecting
additional current (Fig. 2.15).
The DAC consists of resistors connected in series and code-controlled MOS switches
(Fig. 2.16). Depending on the digital code and the reference voltage [78, 79], the
switches select the analog voltage formed by the resistor connection. Increasing
the number of bits of the DAC reduces the step of the analog voltage and thus
increases the accuracy of the system [79]. The resulting analog voltage is supplied
to the M1 and M2 transistors (Fig. 2.15). The current source connected to them can
either match the temperature-independent current supplied to the DAC or differ from
it by being derived from a constant-conductivity circuit to keep the gain stable
during operation.
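The dependence of the DAC step on the number of bits can be sketched as follows.
This is an idealized resistor-string model with equal resistors; the function
names, the reference voltage, and the bit counts are assumptions made for the
example, not parameters of the circuit in Fig. 2.16.

```python
def resistor_string_dac(code: int, n_bits: int, v_ref: float) -> float:
    """Ideal resistor-string DAC: tap voltage selected by the code."""
    levels = 2 ** n_bits
    if not 0 <= code < levels:
        raise ValueError("code out of range for the given number of bits")
    return v_ref * code / levels


def dac_step(n_bits: int, v_ref: float) -> float:
    """Voltage change per unit change of the code (one LSB)."""
    return v_ref / 2 ** n_bits


# Assumed 0.9 V reference: one extra bit halves the step.
step_6_bit = dac_step(6, 0.9)
step_7_bit = dac_step(7, 0.9)
mid_scale = resistor_string_dac(64, 7, 1.0)
```

A smaller step lets the calibration stop closer to zero offset, at the price of
more resistors and switches.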
The calibration algorithm works in the following sequence. At the beginning, point
(b) (Fig. 2.18) is connected to the point providing the smallest DAC voltage and is
kept constant. The code changes from maximum to minimum and vice versa (Fig. 2.18).
During that time, the voltage of point (a) decreases from the highest DAC voltage
in minimal steps until it equals that of point (b) (Fig. 2.17).
In the next stage, following the same logic, the voltage of point (b) increases and
then decreases again, while the voltage of point (a) is kept constant. At the end,
the voltage of point (a) increases, while point (b) is kept fixed (Fig. 2.18).
During calibration, the inputs of the node are connected to each other so that the
offset can be measured. The required code value is selected at the moment the
output switches. After that, the calibration is considered complete, and serial
data can be accepted.
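The idea of selecting the code at the moment the output switches can be sketched in
a few lines. The model below is a deliberate simplification: it sweeps a single
code instead of the two points (a) and (b), and it assumes a linear correction per
code step. The names output_difference and calibrate, the 0.5 mV step, and the
128-code range are hypothetical.

```python
def output_difference(code, intrinsic_offset_mv, step_mv, mid_code):
    """Toy model: output difference with shorted inputs after DAC correction."""
    return intrinsic_offset_mv - step_mv * (code - mid_code)


def calibrate(intrinsic_offset_mv, step_mv=0.5, n_codes=128):
    """Sweep the code upward and return the first code at which the output flips."""
    mid_code = n_codes // 2
    previous_sign = None
    for code in range(n_codes):
        diff = output_difference(code, intrinsic_offset_mv, step_mv, mid_code)
        sign = diff > 0
        if previous_sign is not None and sign != previous_sign:
            return code
        previous_sign = sign
    return None  # the offset is outside the correction range


selected_code = calibrate(6.2)
residual_mv = abs(output_difference(selected_code, 6.2, 0.5, 64))
```

In this model the residual offset after calibration is bounded by one code step,
and an offset larger than the correction range is reported as not calibratable.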
Fig. 2.18 The process of changing the values of points (a) and (b) during calibration
One of the most important parameters of modern ICs is performance [80]. At current
operating frequencies, the need for a clock signal with a stable frequency inside
ICs grows [81]. Increasing operating frequencies have led to shorter signal
periods, which means that the negative impact of signal jitter and drift during
data transmission has increased. In modern systems, a phase-locked loop (PLL) is
used as the generator of a constant clock signal (Fig. 2.19). It is a
self-calibrating system whose output frequency is a multiple of the input
frequency. Fed by a clock generator with a frequency of several tens of megahertz,
the PLL provides a clock signal reaching tens of GHz inside ICs. The
phase-frequency detector (PFD) in its structure matches the phases of the input and
output signals. By registering the phase difference, it generates output voltages
that control the charge pump (CP). Depending on the output voltage of the CP, the
voltage-controlled oscillator (VCO) changes the frequency of the output signal. The
ratio of the output and input frequencies is set by a frequency divider (FD)
(Fig. 2.19).
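The frequency multiplication performed by the loop can be illustrated with a
strongly simplified behavioral model. The sketch below compares frequencies rather
than phases, so it is closer to a frequency-locked loop than to a real PFD/CP/VCO
chain; the function names, the loop gain, and the 50 MHz and 200 values are
assumptions for the example.

```python
def locked_output_frequency(f_ref_hz: float, divider: int) -> float:
    """In lock, the divided output equals the reference: f_out = N * f_ref."""
    return f_ref_hz * divider


def settle(f_ref_hz, divider, f_start_hz, loop_gain=0.5, steps=60):
    """Iteratively pull the oscillator until the divided output matches the reference."""
    f_out = f_start_hz
    for _ in range(steps):
        error = f_ref_hz - f_out / divider   # feedback through the divider
        f_out += loop_gain * divider * error  # oscillator correction
    return f_out


# Assumed example: a 50 MHz reference and a divide-by-200 feedback path.
target = locked_output_frequency(50e6, 200)
settled = settle(50e6, 200, f_start_hz=8e9)
```

Each iteration halves the remaining frequency error in this model, so the output
converges to the divider ratio times the reference frequency.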
Such a clock generator occupies a large area, so it cannot be replicated in
different parts of ICs [82]. Instead, the clock signal is transmitted to other
parts of the IC by buffering (Fig. 2.20) [83].
The main purpose of buffering is to restore the signal from distortion.
Interconnects, which have their own parasitic resistance and are capacitively
coupled to neighboring wires, slow down the rising and falling edges of the signal,
affect the duty cycle, and can cause jitter and signal delay [84] (Fig. 2.21). This
interferes with the parallel operation of different parts of ICs.
The elements used for signal buffering are built from simple inverters [85]. The
timing characteristics of inverters can change with the conditions of the external
environment, because these conditions affect the operation of the transistors.
Temperature fluctuations change the threshold voltage of transistors, which affects
the charge and discharge currents of the inverter output load. Fluctuations in the
supply voltage, which affect the gate-source voltage, also change the timing
parameters of inverters.
To solve these problems, programmable digital delay lines (DDL) are used in modern
ICs (Fig. 2.22) [86, 87]. Consisting of sequentially connected delay buffers, the
circuit allows a code to control the overall line delay by enabling or disabling
the appropriate number of elements.
Similar circuits are also used for uninterrupted reception of serial data, where
DDLs make it possible to control the clock signal. Depending on the external
conditions, the calibration algorithm selects the appropriate code, placing the
rising or falling edge of the clock signal in the middle of the data signal for the
most accurate sampling. Thanks to the shifting of the clock signal, the data can be
sampled on both the rising and the falling edge (Fig. 2.23). The most important
parameters of DDLs are the total delay interval of the line and the delay step per
unit change of the code.
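The code selection can be sketched as a search for the delay closest to the middle
of the data bit. This is a minimal sketch assuming that every enabled element adds
the same delay; the function names, the 0.03 step, and the 64-code range are
hypothetical, and delays are expressed as fractions of one data bit.

```python
def line_delay(code: int, step: float) -> float:
    """Total delay of the line when `code` identical elements are enabled."""
    return code * step


def pick_code(target_delay: float, step: float, n_codes: int) -> int:
    """Return the code whose delay is closest to the target delay."""
    return min(range(n_codes), key=lambda c: abs(line_delay(c, step) - target_delay))


# Assumed example: place the clock edge at the middle of the bit (0.5)
# with a delay step of 0.03 of the bit time.
code = pick_code(target_delay=0.5, step=0.03, n_codes=64)
placement_error = abs(line_delay(code, 0.03) - 0.5)
```

The placement error is at most half a step, which is why the step per code unit
and the total delay range are the key parameters of the line.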
Table 2.2 shows the simulation results for the main DDL parameters.
The maximum deviation of the step from the typical value is 0.03 UR, which is 1.5%
of the signal period. The total delay range deviates by 0.23 UR from the typical
case, which does not exceed 12% of the period.
Table 2.2 Dependence of key DDL parameters on external factors and technological deviations
Name of parameter                    Measurement unit   Minimum value   Typical value   Maximum value
Maximum delay step (MaxDS)           Unit range (UR)    0.09            0.12            0.13
Minimum delay step (MinDS)           Unit range (UR)    0.029           0.03            0.06
Total delay interval of line (TLD)   Unit range (UR)    2.06            2.27            2.5
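The percentages quoted for Table 2.2 can be reproduced with a short calculation.
The sketch assumes that the signal period equals two unit ranges, which is the
ratio implied by the quoted 1.5% and 12% figures but is not stated explicitly; the
dictionary and function names are introduced only for this example.

```python
PERIOD_UR = 2.0  # assumption: one signal period spans two unit ranges

# (minimum, typical, maximum) values from Table 2.2, in unit ranges
TABLE_2_2 = {
    "MaxDS": (0.09, 0.12, 0.13),
    "MinDS": (0.029, 0.03, 0.06),
    "TLD": (2.06, 2.27, 2.5),
}


def worst_deviation(minimum, typical, maximum):
    """Largest distance of either extreme from the typical value."""
    return max(typical - minimum, maximum - typical)


def percent_of_period(deviation_ur, period_ur=PERIOD_UR):
    return 100.0 * deviation_ur / period_ur


step_deviation = max(worst_deviation(*TABLE_2_2["MaxDS"]),
                     worst_deviation(*TABLE_2_2["MinDS"]))
range_deviation = worst_deviation(*TABLE_2_2["TLD"])

step_percent = percent_of_period(step_deviation)
range_percent = percent_of_period(range_deviation)
```

Under this assumption the step deviation corresponds to 1.5% of the period and the
delay-range deviation to 11.5%, consistent with the stated limit of 12%.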
The main difficulty in using the offset reduction method in comparators is ensuring
the symmetry of the switches used. The clock signals used for switching must have a
50% duty cycle (Fig. 2.24); otherwise, the times during which the switches are on
and off will differ. Such a deviation of the clock signal can lead to incomplete
discharging or charging of the capacitors, leaving a residual offset at the input
of the comparator. If several clock signals used in the system are mismatched, the
accuracy of the method decreases.
Given the operation of circuits in non-standard operating conditions and sharp
changes in temperature and voltage, ensuring such a duty cycle is a great challenge
and requires considerable effort from designers [88].
In addition to the mentioned drawback, there is another problem, caused by the
dissipation of the charge accumulated in the transistor channel when the switches
turn off. Since MOS transistors are used as switches, the charge that flows to the
source and drain of the transistor when it turns off depends on the resistances
seen at these terminals. If the resistances are equal, the accumulated charge is
divided equally between the two terminals. Because of the non-idealities of the
technological process, a uniform distribution of the charge cannot be ensured. The
charge injected into the storage capacitor changes the actual offset of the system
stored in it (Fig. 2.25).
The amount of charge accumulated in the channel can be controlled by changing the
physical dimensions of the transistor (2.6). Reducing the length and width of the
gate reduces its capacitance and the amount of accumulated charge.

$$Q_r = C_{ox} W L \left(V_G - V_{th}\right) \tag{2.6}$$

From expression (2.6) it follows that the accumulated charge cannot be reduced to
zero. On the other hand, reducing the size of the transistor reduces the current
passing through it and increases the charging and discharging times of the
capacitor, which can slow the system down. There are methods for reducing the
residual offset caused by switch asymmetry, but they require a larger circuit area
and additional clock signals, and still do not completely solve the offset problem.
Using capacitors in the input cascade increases the layout size and can affect the
comparator bandwidth.
Considering the above-mentioned shortcomings, there is a need to develop new
methods that reduce the offset in comparators without increasing the size of the
circuit or degrading the stability of the system.
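The size of the error caused by the channel charge can be estimated with a short
calculation. The sketch assumes the simple channel charge model and an equal split
of the charge between the two switch terminals; the function names and every
numeric value (oxide capacitance, dimensions, overdrive, storage capacitor) are
hypothetical.

```python
def channel_charge(c_ox, width, length, v_g, v_th):
    """Charge held in the channel of a conducting MOS switch: Cox*W*L*(VG - Vth)."""
    return c_ox * width * length * (v_g - v_th)


def injection_error(q_channel, c_store, fraction=0.5):
    """Voltage error when a fraction of the channel charge lands on the capacitor."""
    return fraction * q_channel / c_store


# Assumed values in SI units: 0.02 F/m^2 oxide capacitance, W = 1 um,
# L = 50 nm, 0.4 V overdrive, and a 50 fF storage capacitor.
q = channel_charge(0.02, 1e-6, 50e-9, 0.9, 0.5)
error_v = injection_error(q, 50e-15)

# Halving the width halves the charge, but never brings it to zero.
q_half_width = channel_charge(0.02, 0.5e-6, 50e-9, 0.9, 0.5)
```

With these assumed numbers the error is a few millivolts, comparable to the offset
the storage capacitors are meant to cancel, and a smaller switch only scales it
down at the expense of a higher on-resistance.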
The main purpose of the receiver is to restore the data signal distorted by the
channel. After the unwanted effect of the channel is eliminated, only the signal is
transmitted to the remaining blocks of the receiver. An offset in the equalizer can
lead to incorrect processing of the received signal and possible data loss. The
operation of modern systems in non-standard conditions implies drastic temperature
changes. The temperature dependence of transistor parameters can affect the
operating region of the transistors. A transition of the transistors from
saturation to the triode or cutoff region causes the entire system to fail.
The simulation results show that, in the method of offset reduction through a
digital control circuit in IC receivers, the offset change caused by temperature
drift reaches 27 mV (Fig. 2.26d) (Table 2.3).
[Fig. 2.26: plot of offset voltage (mV) versus calibration code, curves (a) to (d)]
Table 2.3 The offset change of the equalizer in the worst case
                        Temperature-independent current   Current obtained from a circuit with constant conduction
Maximum offset change   10 mV                             27 mV
$$I = \frac{1}{2} \mu C_{ox} \frac{W}{L} \left(V_G - V_{th}\right)^{2} \left(1 + \frac{\Delta L}{L}\right) \tag{2.7}$$
Table 2.4 Change of main DDL parameters during temperature and voltage changes
Change in conditions   Measured value   Unit   Minimum value   Maximum value
±25 °C                 MaxDS            UR     0.089           0.132
                       MinDS            UR     0.029           0.061
                       TLD              UR     2.05            2.52
±50 °C                 MaxDS            UR     0.087           0.133
                       MinDS            UR     0.028           0.063
                       TLD              UR     2.0             2.55
±100 °C                MaxDS            UR     0.085           0.136
                       MinDS            UR     0.026           0.067
                       TLD              UR     1.91            2.59
+30 mV                 MaxDS            UR     0.088           0.128
                       MinDS            UR     0.027           0.058
                       TLD              UR     1.68            2.67
-30 mV                 MaxDS            UR     0.093           0.135
                       MinDS            UR     0.033           0.065
                       TLD              UR     1.66            2.74
Existing methods show unacceptable deviations caused by changes in temperature and
supply voltage, by aging of transistors, and by changes in threshold voltage. Given
these deviations and the requirements for ICs working in non-standard operating
conditions, there is a need to develop new solutions and methods aimed at
increasing the stability and efficiency of circuits. The following principles are
suggested:
Simulation results for one of the well-known structures of the described OpAmp
showed that it has an input offset of up to 35 mV as a result of aging in the off
state. The described method successfully reduced the offset, but its implementation
required a significant increase in area because of the added second comparator
cascade, several clock signals with a 50% duty cycle, and capacitors that caused
circuit stability issues. It is recommended instead to keep the inputs of the
circuit closed and to ensure a minimum potential difference between the terminals
of the remaining transistors by adding transmission gates and MOS switches to the
structure of the OpAmp at the cost of a minimal increase in area, because, as
mentioned, the effect of aging phenomena is smaller in this case.
In the described method, the offset increases because of changes in the parameters
of the transistors feeding the outputs of the equalizer. It is therefore
recommended to eliminate these transistors and to use a current DAC instead of the
DAC built from a circuit and resistors. With a smaller area on the semiconductor
die and a minimal increase in power consumption, this change allows the outputs of
the circuit to be fed directly from the current DAC, whose transistors have a
larger saturation margin and are stable during temperature changes.
2.4.1 Conclusions
changes in temperature and voltage after calibration and the increased influence
of aging phenomena typical of modern technologies are not taken into account.
This indicates that the latter do not meet modern requirements, and the need to
develop new solutions for increasing the stability and reliability of integrated
circuits has arisen.
3. Approaches to designing ICs that work in non-standard operating conditions have
been proposed. They meet modern requirements and, at the expense of an increase
in occupied area and power consumption within permissible limits, significantly
reduce the deviations caused by changes in external conditions and aging
phenomena.
The problems described in Sect. 2.1 point to shortcomings in the design methods of
ICs working under non-standard operating conditions. The growing use of ICs, the
impact of aging phenomena on them, and sharp changes in voltage and temperature
during device operation have tightened the requirements for IC reliability.
Designers are forced either to solve the problems present in the existing means or
to develop new methods to increase IC stability. Below are the proposed means of
designing ICs working under non-standard operating conditions, which address the
shortcomings of the existing methods and meet contemporary requirements.
As mentioned above, one of the common comparator circuits has an offset problem
caused by aging of the circuit in the off state. The previously described offset
reduction method required capacitors, several clock signals, a second amplifier
stage, and switches. The problems related to IC stability, the complexity of
ensuring a 50% duty cycle of the clock signal, and the dissipation of charge
when the switches are turned off caused a residual offset in the system. To avoid
these defects, a method for reducing the offset caused by aging phenomena in
comparators was proposed [89].
A comparison of transistor parameters in the on and off states of the comparator
after 10 years of aging shows that the main cause of the offset is the change in
threshold voltage. The maximum delta threshold was observed in the input
transistors (Table 2.5) [89].
The effect of BTI and HCI phenomena on the input transistors is smaller when
they are closed. Due to the addition of transmission gates TG1 and TG2, it is possible
to cut off the inputs of the comparator and further control them in the off state of the
circuit. D1 and D2 transistors have been added to control the inputs of the circuit. If
they are open, the input transistors are in a closed state (Fig. 2.29) [89].
After the changes, the maximum offset of the comparator in the case of different
input scenarios was 8 mV (Fig. 2.30).
The change in the threshold voltage of the input transistors has decreased, reaching
131 mV. At the same time, there is still a change in the threshold voltage of up to
191 mV in the circuit (Table 2.6) [89].
To reduce the threshold voltage changes of the current source transistor M9 and of
transistors M7 and M8, switches D3, D4, and D5 were added (Fig. 2.31) [89]. In the
off state of the circuit, the D5 switch is open. The source terminals of transistors
M1 and M2, being connected to the supply voltage, provide a zero potential difference
with respect to the gate. At the same time, the potential difference between all
terminals of transistor M9 is zero, which reduces the effect of aging
phenomena. Switches D3 and D4 ensure that the outputs of transistors M7 and M8 are
discharged to ground when the circuit is off.
Table 2.5 The maximum delta threshold of the input transistors as a result of aging for 10 years for
on and off states of the comparator
State of the circuit | Maximum delta threshold of input transistors (mV)
On | 25.1
Off | 273.2
2.5 Design Methods of Integrated Circuits, Working Under Non-standard. . . 85
Table 2.6 Maximum deviations of parameters of comparator transistors as a result of aging for
10 years
Name of transistor | Current shift from initial value (%) | Delta threshold (mV)
M9 | 9.8 | 191
M7/M8 | 8.03 | 155
M1/M2 | 6.32 | 131
After the specified modifications, the input offset value of the comparator is 3 mV
(Fig. 2.32) [89].
In order to avoid leakage currents, as well as the described shortcomings,
occurring when switching the transmission gates TG1 and TG2, the widths and
lengths of the channel of the added transistor switches were chosen as small as
possible.
Monte Carlo simulation of the offset was performed for the typical and worst cases
(Figs. 2.33 and 2.34) in order to check the effect of technological deviations on
the operation of the circuit after the modifications.
Fig. 2.33 Offset distribution as a result of Monte Carlo simulation for a typical case after
transformations (axis label: Equivalent Inverted Gaussian Distribution [V])
The maximum offset value for typical and worst cases was 10.1 mV and
12.64 mV, respectively (Table 2.7).
The gain, the offset, and the aging-induced changes in transistor parameters were
compared with the results of the initial circuit at temperatures up to 150 °C
(Table 2.8) [89].
Thus, at the expense of a 4.82% increase in the area of the described comparator,
the proposed method ensures an 11.6-fold reduction of the offset caused by the
aging of the circuit. The area of the circuit has increased due to the addition of extra
transistors and transmission gates. The gain of the comparator did not decrease as a
result of the modifications. The method does not require the addition of extra outputs,
since the transistors and transmission gates are controlled by the signals already
used to turn off the circuit.
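The reduction factor can be cross-checked against the offset row of Table 2.8. The short Python sketch below is only an illustrative check and not part of the method; the variable names are hypothetical, and because the inputs are the rounded table values the ratio comes out at roughly 11.7, in line with the 11.6-fold reduction quoted above.

```python
# Offset after 10 years of aging, rounded values from Table 2.8 (mV).
initial_offset_mv = 35.0    # initial circuit
modified_offset_mv = 3.0    # modified circuit

# Reduction factor: how many times smaller the offset became.
offset_reduction = initial_offset_mv / modified_offset_mv

print(f"offset reduction: {offset_reduction:.2f}x")
```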
Fig. 2.34 Offset distribution as a result of Monte Carlo simulation for a worst case after
transformations (axis label: Equivalent Inverted Gaussian Distribution [V])
Table 2.8 Comparison of parameters of the initial and final circuits of the comparator as a result of
aging for 10 years
Parameter | Modified circuit | Initial circuit
Delta threshold (mV) | 0.01 | 273
Offset (mV) | 3 | 35
Gain (dB) | 80.01 | 76.3
Maximum current shift (µA) | 0.1 | 9.8
In the described method, the output offset of the receiver increased as a result of
temperature fluctuations. The threshold voltage of transistors M1 and M2 was
changing. Both options for choosing the current sources feeding the system had their
drawbacks. To avoid these changes of the offset, a method is proposed [90].
The following changes have been made in the circuit:
1. To avoid voltage-current-voltage conversions, the pair of M1 and M2 transistors
was added [90].
2. The resistor-based structure of the DAC (Fig. 2.7) was replaced by the current DAC
(Fig. 2.35) [90].
Such a structure allows the current DAC to be connected directly to the outputs of the
equalizer (Fig. 2.36) [90].
To emulate the case where equal voltages are applied to the gates of transistors M1
and M2, and to keep the logic of the calibration algorithm the same, an always-on
current DAC was added. To ensure the symmetry of the branches in the
layout, a disconnected DAC was also added. In this way, it is possible to avoid the
occurrence of an additional offset at the output of the circuit. A similar structure allows
After deciding the code, the inputs of the node are brought together again. In this
case, at the point of their intersection, the output offset already has a significantly
smaller value (Fig. 2.40) [90].
As a result of the operation of the calibration algorithm, the output offset was on
the order of tenths of a millivolt. With the proposed method, the offset change due to
thermal drift from -40 to 150 °C was 1.42 mV [90]. Such a small change in the offset
was achieved through the use of the current DAC. The transistors used in
it have a fairly large saturation margin, so the delta threshold caused by a change in
temperature does not change the state of the transistors. In this approach the
range of offset calibration is more linear, because the voltage is obtained directly on
the resistors of the equalizer (Fig. 2.41) [90].
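The calibration algorithm itself is not spelled out in this passage, so the following Python sketch only illustrates the general idea of selecting a DAC code that minimizes the residual offset. It assumes a hypothetical, ideally linear current DAC in which each code step shifts the output offset by a fixed amount; the function name, the signed code range, and the numeric values are illustrative assumptions, not data from the described circuit.

```python
def calibrate_offset(raw_offset_mv, lsb_mv, n_codes):
    """Sweep signed DAC codes and return (code, residual offset in mV).

    Assumption: the DAC is ideally linear, so a code shifts the offset
    by code * lsb_mv. Real calibration loops may use a comparator
    decision instead of a measured offset value.
    """
    best_code, best_residual = 0, raw_offset_mv
    for code in range(-(n_codes // 2), n_codes // 2 + 1):
        residual = raw_offset_mv - code * lsb_mv
        if abs(residual) < abs(best_residual):
            best_code, best_residual = code, residual
    return best_code, best_residual


# Hypothetical numbers: 4 mV raw offset, 0.3 mV per code step.
code, residual = calibrate_offset(raw_offset_mv=4.0, lsb_mv=0.3, n_codes=32)
print(code, round(residual, 3))
```

With a linear DAC the residual is bounded by half a code step, which is consistent with a post-calibration offset in the range of tenths of a millivolt for a sufficiently fine step.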
To evaluate the effect of technological deviations on the proposed method, Monte
Carlo verification was performed over a ±4.5 sigma range. Before using the method, the
maximum value of the offset was 4 mV in the range of ±4.5 sigma (Fig. 2.42) [90]. After
the changes in the circuit, the maximum offset value in the worst and typical cases
was 1.2 mV and 0.92 mV, respectively (Table 2.9) (Figs. 2.43 and 2.44) [90].
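To give a sense of how strict a ±4.5 sigma range is, the sketch below computes the fraction of samples expected to fall outside it. It assumes that the parameter of interest is normally distributed, which is an idealization and is not claimed by the text; only the Python standard library is used.

```python
from statistics import NormalDist

sigma_range = 4.5  # verification range quoted in the text

# Probability that a normally distributed parameter lies outside +/-4.5 sigma
# (both tails). Assumption: an ideal Gaussian distribution.
tail_probability = 2.0 * (1.0 - NormalDist().cdf(sigma_range))

print(f"fraction outside +/-{sigma_range} sigma: {tail_probability:.2e}")
```

Under this assumption only a few samples per million lie outside the range, which is why the maximum offset within ±4.5 sigma is a demanding figure of merit.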
Thus, replacing the resistor-based DAC reduced the area of the circuit
by 43.2%, since the transistor-resistor structure was abandoned (Table 2.10)
[90]. The current DAC occupies a smaller area, as it consists only of MOS transistors.
The power consumption increased by 7.2% due to the always-on power supply
branch.
The obtained results show that the proposed solution is effective in terms of
reducing the output offset of the equalizer. With a reduced area and at the cost of
increased power consumption, the method achieves an approximately 19-fold
reduction of the offset in the case of sharp changes in temperature.
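The percentages quoted above follow from the area and power rows of Table 2.10 and from the 1.42 mV thermal-drift figure given in the text. The Python sketch below is an illustrative cross-check only; the helper name `percent_change` is hypothetical.

```python
def percent_change(before, after):
    """Signed change from `before` to `after`, in percent of `before`."""
    return (after - before) / before * 100.0


# Area and power consumption for the existing and proposed methods (Table 2.10).
area_change = percent_change(2361.31, 1341.2)
power_change = percent_change(8341.25, 8941.82)

# Offset change: 26.98 mV for the existing method (Table 2.10) versus
# the 1.42 mV thermal-drift figure quoted in the text.
offset_ratio = 26.98 / 1.42

print(f"area: {area_change:.1f}%  power: {power_change:+.1f}%  "
      f"offset ratio: {offset_ratio:.1f}x")
```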
Fig. 2.41 Output signals before (a) and after (b) application of the method
In the described method, the timing parameters of DDL deviated during temperature
and voltage fluctuations. This was due to the change in the threshold and gate-source
voltages of transistors in inverters.
This led to deviations in the charge and discharge times of output loads, which
affected the DDL delay.
To avoid the above problem, a method based on negative feedback is proposed
[91]. To sense temperature and voltage changes, an additional delay element was
added to which feedback was applied (Fig. 2.17) [91]. In order to accurately measure
changes in external conditions on the line, it is recommended to place the additional
delay element in the physical design as close to the DDL as possible (Fig. 2.45).
The input and output of the added delay element are connected to the phase
detector (PD). It represents an XOR circuit and is designed to capture delay deviation
Fig. 2.42 The worst-case offset distribution before applying the method
Fig. 2.43 The worst-case offset distribution as a result of using the method
Fig. 2.44 Offset distribution in a typical case as a result of using the method
(Plot legend for Figs. 2.42–2.44: data, Gaussian fit, confidence interval, target sigma, spec min/max,
int min/max; axis label: Equivalent Inverted Gaussian Distribution [V])
Table 2.10 Comparative results of the maximum offset change, the occupied area, and the power
consumption for the circuits using the initial and proposed methods
Circuit | For the existing method | For the proposed method
Maximum delta threshold (mV) | 26.98 | 1.42
Area (µm²) | 2361.31 | 1341.2
Power consumption (mW) | 8341.25 | 8941.82
COpAmp output, the control voltages Vp and Vn are changed. Through feedback,
these voltages are brought to values where the delay has minimal deviation. In
addition to the feedback circuit, the voltages Vp and Vn are also connected to the
DDL, since changes in external conditions have affected it in the same way
(Fig. 2.49).
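The role of the XOR phase detector in this loop can be illustrated with a small behavioral model. The Python sketch below assumes ideal 50% duty-cycle square waves, a delay shorter than half a period, and normalized time units; the function name is hypothetical, and the mapping of the averaged output to Vp and Vn through the LPF and COpAmp is not modeled.

```python
def xor_pd_average(delay, period, samples=10_000):
    """Average XOR output for a 50% duty-cycle clock and its delayed copy.

    The average stands in for the low-pass filtered phase-detector output,
    normalized to the logic-high level.
    """
    high_count = 0
    for i in range(samples):
        t = (i + 0.5) * period / samples                 # mid-point samples of one period
        clock = (t % period) < period / 2                # input of the delay element
        delayed = ((t - delay) % period) < period / 2    # output of the delay element
        high_count += clock != delayed                   # XOR of the two signals
    return high_count / samples


nominal = xor_pd_average(delay=0.10, period=1.0)
drifted = xor_pd_average(delay=0.12, period=1.0)
print(nominal, drifted)
```

In this idealized model the averaged output equals twice the delay divided by the period, so any drift of the delay with temperature or supply voltage appears as a proportional change of the filtered voltage that the feedback can act on.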
To check the reliability of the COpAmp, its power supply rejection ratio (PSRR)
(Fig. 2.50) [91], gain (Fig. 2.51), and gain and phase margins were measured
(Table 2.11). The LPF cut-off frequency was chosen in the range of 120–180 kHz
(Fig. 2.52).
To verify the effectiveness of the method, a Monte Carlo check was performed for
the worst and typical cases [91] (Figs. 2.53 and 2.54).
The delay distribution is linear and satisfies the ±4.5 sigma range of technological
deviations. After the changes made, the operation of the circuit was checked for
changes in voltage and temperature. Changes in external conditions were applied
after calibration was completed to verify the operation of negative feedback. The
maximum range of DDL delay was 0.47 MM at -30 mV deviation of the supply
voltage (Table 2.12) [91].
Thus, the DDL delay range caused by changes in temperature and supply voltage has
been reduced by 56.04%, from 1.08 MM to 0.47 MM. This was achieved at the cost of a
23.1% increase in the area occupied on the die, because the negative
feedback circuit was introduced. The circuit satisfies the ±4.5 sigma range of
technological deviations in Monte Carlo analysis. The COpAmp has sufficient PSRR to
suppress power supply jitter noise.
In summary, it can be said that the proposed method is effective in reducing
deviations in DDLs operating under non-standard conditions.
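The delay ranges quoted above can be reproduced from the -30 mV row of Table 2.12. The Python sketch below is an illustrative check with hypothetical variable names; because it uses the rounded table entries, the reduction comes out at roughly 56.5%, close to the 56.04% quoted in the text, which was presumably computed from unrounded data (an assumption).

```python
# (min, max) DDL delay at a -30 mV supply deviation, from Table 2.12,
# in the units used by the table.
initial_min, initial_max = 1.66, 2.74      # initial circuit
proposed_min, proposed_max = 2.11, 2.58    # proposed method

initial_range = initial_max - initial_min
proposed_range = proposed_max - proposed_min
reduction_percent = (initial_range - proposed_range) / initial_range * 100.0

print(f"initial range: {initial_range:.2f}, proposed range: {proposed_range:.2f}, "
      f"reduction: {reduction_percent:.1f}%")
```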
Table 2.12 Comparative DDL delay results for circuits using the initial circuit and proposed
method
DDL-h TLD (UR)
Change in external conditions | Initial circuit, min value | Initial circuit, max value | Proposed method, min value | Proposed method, max value
After calibration | 2.06 | 2.5 | 2.28 | 2.58
±100 °C | 1.91 | 2.59 | 2.2 | 2.69
+30 mV | 1.68 | 2.67 | 2.17 | 2.63
-30 mV | 1.66 | 2.74 | 2.11 | 2.58
2.6 Conclusions
References
1. R. Schaller, Moore’s law: Past, present and future. IEEE Spectr. 34(6), 52–59 (1997). https://doi.org/10.1109/6.591665
2. T.P. Dash, S. Dey, E. Mohapatra, et al., Vertically-stacked silicon nanosheet field effect transistors at 3nm technology nodes. Devices for integrated circuit (DevIC) (2019), pp. 99–103, doi: https://doi.org/10.1109/DEVIC.2019.8783300
3. Z. Zhang, J. Sun, W. Zhu, et al., Design of a 3.2GHz-50GHz ultra wideband YIG-tunable-filter. International conference on microwave and millimeter wave technology (ICMMT) (2019), pp. 1–3, doi: https://doi.org/10.1109/ICMMT45702.2019.8992423
4. R. Yousry, 11.1 a 1.7pJ/b 112Gb/s XSR transceiver for intra-package communication in 7nm FinFET technology. IEEE international solid-state circuits conference (ISSCC) (2021), pp. 180–182, doi: https://doi.org/10.1109/ISSCC42613.2021.9365752
5. A. Cevrero, 6.1 a 100Gb/s 1.1pJ/b PAM-4 RX with dual-mode 1-Tap PAM-4/3-Tap NRZ speculative DFE in 14nm CMOS FinFET. IEEE international solid-state circuits conference (ISSCC) (2019), pp. 112–114, doi: https://doi.org/10.1109/ISSCC.2019.8662495
6. M. Rohith, K. Sreelakshmi, Design and integration of gateway electronic control unit (ECU) for automotive electronics applications. Asian conference on innovation in technology (ASIANCON) (2021), pp. 1–4, doi: https://doi.org/10.1109/ASIANCON51346.2021.9545049
7. V. Puranik, Sharmila, A. Ranjan, et al., Automation in agriculture and IoT. 4th international conference on internet of things: smart innovation and usages (IoT-SIU) (2019), pp. 1–6, doi: https://doi.org/10.1109/IoT-SIU.2019.8777619
8. J. Ouyang, X. Du, Y. Ma, et al., Kunlun: A 14nm high-performance AI processor for diversified workloads. IEEE international solid-state circuits conference (ISSCC) (2021), pp. 50–51, doi: https://doi.org/10.1109/ISSCC42613.2021.9366056
9. A. Yarali, Artificial intelligence, 5G, and IoT. Intelligent connectivity: AI, IoT, and 5G: IEEE (2022), pp. 251–268, doi: https://doi.org/10.1002/9781119685265.ch14
10. L. Wang, G. Liu, J. Wang, A method of temperature drift compensation for pulse synchronization in high-speed signal acquisition. 3rd IEEE international conference on control science and systems engineering (ICCSSE) (2017), pp. 529–534, doi: https://doi.org/10.1109/CCSSE.2017.8087988
11. S. Shin, L. Yongjae, J. Park, et al., A clock distribution scheme insensitive to supply voltage drift with self-adjustment of clock buffer delay. IEEE Trans. Circuits Syst. II Express Briefs 69, 814–818 (2021). https://doi.org/10.1109/TCSII.2021.3110409
12. L. Luo, Y. Wu, J. Diao, et al., Low power low noise amplifier with DC offset correction at 1 V supply voltage for ultrasound imaging systems. IEEE 61st international midwest symposium on circuits and systems (MWSCAS) (2018), pp. 137–140, doi: https://doi.org/10.1109/MWSCAS.2018.8624065
13. B.T. Venkatesh Murthy, N.K. Singh, R. Jha, et al., Ultra low noise figure, low power consumption Ku-band LNA with high gain for space application. 5th international conference on communication and electronics systems (ICCES) (2020), pp. 80–83, doi: https://doi.org/10.1109/ICCES48766.2020.9137956
14. A. Boora, B.K. Thangarasu, K.S. Yeo, An ultra-low power 900 MHz intermediate frequency low noise amplifier for low-power RF receivers. IEEE 33rd international system-on-chip conference (SOCC) (2020), pp. 163–167, doi: https://doi.org/10.1109/SOCC49529.2020.9524753
15. P. Sandeep, P.A. Harsha Vardhini, V. Prakasam, SRAM utilization and power consumption analysis for low power applications. International conference on recent trends on electronics, information, communication & technology (RTEICT) (2020), pp. 227–231, doi: https://doi.org/10.1109/RTEICT49044.2020.9315558
16. K.-C. Chang, B.-Z. Lu, Y. Wang, et al., A 17.7-42.9-GHz low power low noise amplifier with 83% fractional bandwidth for radio astronomical receivers in 65-nm CMOS. IEEE Asia-Pacific microwave conference (APMC) (2020), pp. 507–509, doi: https://doi.org/10.1109/APMC47863.2020.9331381
17. V. Naranje, P.V. Reddy, B.K. Sharma, Optimization of factory layout design using simulation tool. IEEE 6th international conference on industrial engineering and applications (ICIEA) (2019), pp. 193–197, doi: https://doi.org/10.1109/IEA.2019.8715162
18. D.A. Bulakh, A.V. Korshunov, S.A. Ilin, Identification of integrated circuits based on layout layers routing information. IEEE conference of Russian young researchers in electrical and electronic engineering (ElConRus) (2021), pp. 1965–1968, doi: https://doi.org/10.1109/ElConRus51938.2021.9396208
19. A. Alshaabani, B. Wang, Parasitic capacitance cancellation technique by using mutual inductance and magnetic coupling. IECON 2019 – 45th annual conference of the IEEE industrial electronics society (2019), pp. 1928–1931, doi: https://doi.org/10.1109/IECON.2019.8927137
20. V.S. Melikyan, A.K. Mkhitaryan, H.T. Kostanyan, et al., Power supply noise rejection improvement method in modern VLSI design. IEEE East-West design & test symposium (EWDTS) (2019), pp. 1–4, doi: https://doi.org/10.1109/EWDTS.2019.8884372
21. D. Wu, C. Qian, X. Zhang, et al., Design of a capacitance measurement circuit with input parasitic capacitance elimination. IEEE 5th international conference on integrated circuits and microsystems (ICICM) (2020), pp. 53–57, doi: https://doi.org/10.1109/ICICM50929.2020.9292245
22. A. Khunteta, V. Niranjan, A novel noise reduction technique in CMOS amplifier. 3rd IEEE international conference on recent trends in electronics, information & communication technology (RTEICT) (2018), pp. 779–783, doi: https://doi.org/10.1109/RTEICT42901.2018.9012381
23. P.-C. Yeh, C.-N. Kuo, A 94 GHz 10.8 mW low-noise amplifier with inductive gain boosting in 40 nm digital CMOS technology. IEEE Asia-Pacific microwave conference (APMC) (2019), pp. 1357–1359, doi: https://doi.org/10.1109/APMC46564.2019.9038505
24. L. Fang, P. Gui, A 14nV/√Hz 14μW chopper instrumentation amplifier with dynamic offset zeroing (DOZ) technique for ripple reduction. IEEE custom integrated circuits conference (CICC) (2019), pp. 1–4, doi: https://doi.org/10.1109/CICC.2019.8780239
25. K. Mikitchuk, A. Chizh, S. Malyshev, Noise and gain of an erbium-doped fiber amplifier for delay-line optoelectronic oscillator. International conference on noise and fluctuations (ICNF) (2017), pp. 1–4, doi: https://doi.org/10.1109/ICNF.2017.7985957
26. O. Bondarev, D. Mirvoda, A. Kosogor, et al., A line of 4–40 GHz GaAs low noise medium power amplifiers for SDH relay stations. 11th German microwave conference (GeMiC) (2018), pp. 187–190, doi: https://doi.org/10.23919/GEMIC.2018.8335061
27. Z. Wang, Z. Yuan, Y. Zhao, A gate-driver architecture with high common-mode noise immunity under extremely high dv/dt. IEEE applied power electronics conference and exposition (APEC) (2021), pp. 2532–2536, doi: https://doi.org/10.1109/APEC42165.2021.9487312
28. R. Raj, M.S. Bhat, S. Rekha, Library characterization: Noise and delay modeling. IEEE distributed computing, VLSI, electrical circuits and robotics (DISCOVER) (2018), pp. 44–48, doi: https://doi.org/10.1109/DISCOVER.2018.8674081
29. C. Prasad, S. Ramey, L. Jiang, Self-heating in advanced CMOS technologies. IEEE international reliability physics symposium (IRPS) (2017), pp. 6A-4.1–6A-4.7, doi: https://doi.org/10.1109/IRPS.2017.7936336
30. A.K. Mkhitaryan, G.A. Petrosyan, H.T. Grigoryan, et al., The reliability compensation method of voltage controlled oscillators. Proceedings of NPUA: Information technologies, electronics, radio engineering (2020), pp. 65–70
31. D. Son, G.-J. Kim, J. Kim, et al., Effect of high temperature on recovery of hot carrier degradation of scaled nMOSFETs in DRAM. IEEE international reliability physics symposium (IRPS) (2021), pp. 1–4, doi: https://doi.org/10.1109/IRPS46558.2021.9405153
32. R. Kishida, T. Asuke, J. Furuta, et al., Extracting BTI-induced degradation without temporal factors by using BTI-sensitive and BTI-insensitive ring oscillators. IEEE 32nd international conference on microelectronic test structures (ICMTS) (2019), pp. 24–27, doi: https://doi.org/10.1109/ICMTS.2019.8730967
33. S. Mishra, P. Weckx, J.Y. Lin, et al., Fast & accurate methodology for aging incorporation in circuits using adaptive waveform splitting (AWS). IEEE international reliability physics symposium (IRPS) (2020), pp. 1–5, doi: https://doi.org/10.1109/IRPS45951.2020.9129351
34. J. Diaz-Fortuny, J. Martin-Martinez, R. Rodriguez, et al., A noise and RTN-removal smart method for parameters extraction of CMOS aging compact models. Joint International EUROSOI workshop and international conference on ultimate integration on silicon (EUROSOI-ULIS) (2018), pp. 1–4, doi: https://doi.org/10.1109/ULIS.2018.8354740
35. A. Sivadasan, R. J. Shah, V. Huard, et al., NBTI aged cell rejuvenation with back biasing and resulting critical path reordering for digital circuits in 28nm FDSOI. Design, automation & test in Europe conference & exhibition (DATE) (2018), pp. 997–998, doi: https://doi.org/10.23919/DATE.2018.8342154
36. Y. Liu, X. Chen, Z. Zhao, et al., SiC MOSFET threshold-voltage instability under high temperature aging. 19th international conference on electronic packaging technology (ICEPT) (2018), pp. 347–350, doi: https://doi.org/10.1109/ICEPT.2018.8480781
37. X. Li, J. Qing, Y. Sun, et al., Linear and resolution adjusted on-chip aging detection of NBTI degradation. IEEE Trans. Device Mater. Reliab. 18(3), 383–390 (2018). https://doi.org/10.1109/TDMR.2018.2847322
38. R.W. Johnson, J.L. Evans, P. Jacobsen, et al., The changing automotive environment: High-temperature electronics. IEEE Trans. Electron. Packag. Manuf. 27(3), 164–176 (2004). https://doi.org/10.1109/TEPM.2004.843109
39. S.B. Yalçın, O. Demirci, M.E. Soltekin, Designing and implementing secure automotive network for autonomous cars. 29th signal processing and communications applications conference (SIU) (2021), pp. 1–4, doi: https://doi.org/10.1109/SIU53274.2021.9477958
40. I. Kastelan, M. Popovic, M. Vranješ, et al., Work in progress: Modernizing laboratories for innovative technologies in automotive. IEEE global engineering education conference (EDUCON) (2018), pp. 1700–1702, doi: https://doi.org/10.1109/EDUCON.2018.8363439
41. J. Zhou, X. Long, J. He, et al., Uncertainty quantification for junction temperature of automotive LED with die-attach layer microstructure. IEEE Trans. Device Mater. Reliab. 18(1), 86–96 (2018). https://doi.org/10.1109/TDMR.2018.2796072
42. W.M.A. Halim, J.R. Rusli, S. Shafie, et al., Study on performance of capacitor-less LDO with different types of resistor. IEEE international circuits and systems symposium (ICSyS) (2019), pp. 1–5, doi: https://doi.org/10.1109/ICSyS47076.2019.8982395
43. P. Suriyavejwongs, E. Leelarasmee, W. Pora, A low voltage CMOS current comparator with offset compensation. IEEE Asia Pacific conference on circuits and systems (APCCAS) (2019), pp. 161–164, doi: https://doi.org/10.1109/APCCAS47518.2019.8953117
44. S. Pourashraf, J. Ramirez-Angulo, A.R. Cabrera-Galicia, et al., An amplified offset compensation scheme and its application in a track and hold circuit. IEEE Trans. Circuits Syst. II Express Briefs 65(4), 416–420 (2018). https://doi.org/10.1109/TCSII.2017.2695162
45. L. Kouhalvandi, S. Aygün, G.G. Özdemir, et al., 10-bit high-speed CMOS comparator with offset cancellation technique. 5th IEEE workshop on advances in information, electronic and electrical engineering (AIEEE) (2017), pp. 1–4, doi: https://doi.org/10.1109/AIEEE.2017.8270524
46. Y.S. Vani, N.U. Rani, R. Vaddi, A low voltage capacitor based current controlled sense amplifier for input offset compensation. International SoC design conference (ISOCC) (2017), pp. 23–24, doi: https://doi.org/10.1109/ISOCC.2017.8368810
47. A. Bamigbade, V. Khadkikar, DC-offset rejection approaches for single-phase frequency-locked loop. IEEE international conference on power electronics, drives and energy systems (PEDES) (2020), pp. 1–5, doi: https://doi.org/10.1109/PEDES49360.2020.9379567
48. V.S. Raja, S. Kumaravel, Design of recycling folded cascode amplifier using potential distribution method. International conference on microelectronic devices, circuits and systems (ICMDCS) (2017), pp. 1–5, doi: https://doi.org/10.1109/ICMDCS.2017.8211570
49. Y. Feng, Q. Fan, H. Deng, et al., An automatic comparator offset calibration for high-speed flash ADCs in FDSOI CMOS technology. IEEE 11th Latin American symposium on circuits & systems (LASCAS) (2020), pp. 1–4, doi: https://doi.org/10.1109/LASCAS45839.2020.9069018
50. M. Wu, M. Lai, F. Lv, et al., An adaptive equalizer for 56 Gb/s PAM4 SerDes. 6th international conference on integrated circuits and microsystems (ICICM) (2021), pp. 398–402, doi: https://doi.org/10.1109/ICICM54364.2021.9660321
51. Chapter 2. Dynamic Offset Cancellation Techniques for Operational Amplifiers, https://ocw.tudelft.nl/wp-content/uploads/Reader_ET8017_Electronic_Instrumentation__DOC_techniques.pdf
52. D. Park, J. Kim, A 7-GHz fast-lock 2-step TDC-based all-digital DLL for post-DDR4 SDRAMs. IEEE international symposium on circuits and systems (ISCAS) (2018), pp. 1–4, doi: https://doi.org/10.1109/ISCAS.2018.8351396
53. T. Kim, J. Kim, A 0.8-3.5 GHz shared TDC-based fast-lock all-digital DLL with a built-in DCC. IEEE international symposium on circuits and systems (ISCAS) (2021), pp. 1–4, doi: https://doi.org/10.1109/ISCAS51556.2021.9401335
54. Y. Wei, S. Huang, A folded locking scheme for the long-range delay block in a wide-range DLL. 2018 international SoC design conference (ISOCC) (2018), pp. 90–91, doi: https://doi.org/10.1109/ISOCC.2018.8649933
55. Z. Liu, L. Lou, Z. Fang, et al., A DLL-based configurable multi-phase clock generator for true-time-delay wideband FMCW phased-array in 40nm CMOS. IEEE international symposium on circuits and systems (ISCAS) (2018), pp. 1–4, doi: https://doi.org/10.1109/ISCAS.2018.8351374
56. X. Zhong, A. Bermak, C. Tsui, A low-offset dynamic comparator with area-efficient and low-power offset cancellation. IFIP/IEEE international conference on very large scale integration (VLSI-SoC) (2017), pp. 1–6, doi: https://doi.org/10.1109/VLSI-SoC.2017.8203481
57. J.-K. Han, J.-W. Kim, S.-H. Choi, et al., Asymmetrical half-bridge converter with zero DC-offset current in transformer using new rectifier structure. International power electronics conference (IPEC-Niigata 2018 -ECCE Asia) (2018), pp. 4049–4053, doi: https://doi.org/10.23919/IPEC.2018.8507457
58. L. Long, Y. Li, X. Liu A zero offset reduction method for RTD-based thermal flow sensors.
2021 IEEE international instrumentation and measurement technology conference (I2MTC)
(2021), pp. 1–4, doi: https://doi.org/10.1109/I2MTC50364.2021.9459961
59. V. Raghuveer, K. Balasubramanian, S. Sudhakar, A 2μV low offset, 130 dB high gain
continuous auto zero operational amplifier. International conference on communication and
signal processing (ICCSP) (2017), pp. 1715–1718, doi: https://doi.org/10.1109/ICCSP.2017.
8286685
60. M. Moezzi, S.F. Mousavi, P. Ashtari, An area-efficient DC offset cancellation architecture for
zero-IF DVB-H receivers. IEEE Microw. Wirel. Compon. Lett. 28(9), 813–815 (2018). https://
doi.org/10.1109/LMWC.2018.2854259
61. D. Zeng, H. Zhu, W. Feng, et al., A 24-30GHz asymmetric SPDT switch for 5G millimeter-
wave front-end. IEEE Asia-Pacific microwave conference (APMC) (2020), pp. 773–775, doi:
https://doi.org/10.1109/APMC47863.2020.9331540
62. M.J. Rosario, F. Le-Strat, P.-F. Alleaume, et al., Low cost LTCC filters for a 30GHz satellite
system. 33rd European microwave conference proceedings (IEEE Cat. No.03EX723C) (2003),
pp. 817–820, doi: https://doi.org/10.1109/EUMC.2003.177601
63. H. Ahn, A. Dong, A. Wong, et al., 56Gbps PAM4 SerDes link parameter optimization for
improved post-FEC BER. IEEE 28th conference on electrical performance of electronic
packaging and systems (EPEPS) (2019), pp. 1–3, doi: https://doi.org/10.1109/EPEPS47316.
2019.193195
64. https://nptel.ac.in/content/storage2/courses/117101058/downloads/Lec-8.pdf
65. A. Lapidoth, G. Marti, Encoder-assisted communications over additive noise channels. IEEE
Trans. Inf. Theory 66(11), 6607–6616 (2020). https://doi.org/10.1109/TIT.2020.3012629
66. V.S. Melikyan, A.S. Sahakyan, K.H. Safaryan, et al., High accuracy equalization method for
receiver active equalizer. East-West design & test symposium (EWDTS 2013) (2013),
pp. 1–4, doi: https://doi.org/10.1109/EWDTS.2013.6673119
67. K. Zheng, Y. Frans, K. Chang, et al., A 56 Gb/s 6 mW 300 um2 inverter-based CTLE for short-
reach PAM2 applications in 16 nm CMOS. IEEE custom integrated circuits conference (CICC)
(2018), pp. 1–4, doi: https://doi.org/10.1109/CICC.2018.8357076
68. Y.-H Chen, D.-B. Lin, Combined optimization of FFE and CTLE equalizations by analysis of
pulse response. IEEE international symposium on radio-frequency integration technology
(RFIT) (2021), pp. 1–3, doi: https://doi.org/10.1109/RFIT52905.2021.9565234
69. M.A. Dolatsara, H. Yu, J.A. Hejase, et al., Invertible neural networks for inverse design of
CTLE in high-speed channels. IEEE electrical design of advanced packaging and systems
(EDAPS) (2020), pp. 1–3, doi: https://doi.org/10.1109/EDAPS50281.2020.9312919
70. A. Balachandran, Y. Chen, C.C. Boon, A 32-Gb/s 3.53-mW/Gb/s adaptive receiver AFE
employing a hybrid CTLE, edge-DFE and merged data-DFE/CDR in 65-nm CMOS. IEEE
Asia Pacific conference on circuits and systems (APCCAS) (2019), pp. 221–224, doi:
https://doi.org/10.1109/APCCAS47518.2019.8953146
71. M. Wen, L. Ding, X. Wang, et al., A 50 Gb/s serial link receiver with inductive peaking CTLE
and 1-tap loop-unrolled DFE in 22nm FDSOI CMOS. IEEE MTT-S international wireless
symposium (IWS) (2020), pp. 1–3, doi: https://doi.org/10.1109/IWS49314.2020.9360200
72. D. Lee, Y.-H. Kim, et al., A 0.9-V 12-Gb/s two-FIR tap direct DFE with feedback-signal
common-mode control. IEEE Trans. Very Large Scale Integr. Syst. 27(3), 724–728 (2019).
https://doi.org/10.1109/TVLSI.2018.2882606
73. Y. Itoh, W. Xiaole, S. Omokawa, An L-band SiGe HBT active differential equalizer with
tunable positive/negative gain slopes using transistor-loaded RC-circuits. Asia-Pacific
microwave conference (APMC) (2018), pp. 708–710, doi: https://doi.org/10.23919/APMC.2018.8617315
74. K. Chen, W.W. Kuo, A. Emami, A 60-Gb/s PAM4 wireline receiver with 2-tap direct decision
feedback equalization employing track-and-regenerate slicers in 28-nm CMOS. IEEE custom
integrated circuits conference (CICC) (2020), pp. 1–4, doi: https://doi.org/10.1109/CICC48029.2020.9075948
75. C.F. Huang, Y.L. Chao, A hybrid de-embedding technique of eye diagram measurement for
high-speed digital interconnections. IEEE transactions on components, packaging and
manufacturing technology (2014), pp. 892–895
76. S. Guo, L. Ding, J. Jin, A 16/32Gb/s NRZ/PAM4 receiver with dual-loop CDR and threshold
voltage calibration. IEEE 13th international conference on ASIC (ASICON) (2019),
pp. 1–4, doi: https://doi.org/10.1109/ASICON47005.2019.8983675
77. H. Won, K. Han, S. Lee, et al., An on-chip stochastic sigma-tracking eye-opening monitor for
BER-optimal adaptive equalization. IEEE custom integrated circuits conference (CICC) (2015),
pp. 1–4, doi: https://doi.org/10.1109/CICC.2015.7338374
78. Hakob T. Kostanyan, Harutyun T. Kostanyan, G.A. Petrosyan, et al., 5V wide supply voltage
bandgap reference for automotive applications. IEEE 39th international conference on
electronics and nanotechnology (ELNANO) (2019), pp. 229–232, doi:
https://doi.org/10.1109/ELNANO.2019.8783600
79. A.K. Mkhitaryan, H.T. Kostanyan, H.T. Grigoryan, et al., Stability improvement method for
ultra-low-power bandgap reference. IEEE 40th international conference on electronics and
nanotechnology (ELNANO) (2020), pp. 331–334, doi:
https://doi.org/10.1109/ELNANO50318.2020.9088904
80. H. Seol, S. Hong, O. Kwon, An area-efficient high-resolution resistor-string DAC with reverse
ordering scheme for active matrix flat-panel display data driver ICs. J. Disp. Technol. 12(8),
828–834 (2016). https://doi.org/10.1109/JDT.2016.2526042
81. H. Zhu, Q. Sun, X. Li, Simulation of 112Gbps full-link interconnect system. Cross strait
quad-regional radio science and wireless technology conference (CSQRWC) (2019), pp. 1–3, doi:
https://doi.org/10.1109/CSQRWC.2019.8799127
82. T.-H. Tsai, R.-B. Sheen, C.-H. Chang, et al., A hybrid-PLL (ADPLL/charge-pump PLL) using
phase realignment with 0.6-us settling, 0.619-ps integrated jitter, and -240.5-dB FoM in 7-nm
FinFET. IEEE Solid-State Circuits Lett. 3, 174–177 (2020).
https://doi.org/10.1109/LSSC.2020.3010278
83. Z. Ge, J. Fu, P. Wang, Low power clock tree optimization by clock buffer/inverter reduction.
IEEE international conference on integrated circuits, technologies and applications (ICTA)
(2019), pp. 69–70, doi: https://doi.org/10.1109/ICTA48799.2019.9012842
84. C. Lin, S. Huang, W. Cheng, An effective approach for building low-power general
activity-driven clock trees. International SoC design conference (ISOCC) (2018), pp. 13–14, doi:
https://doi.org/10.1109/ISOCC.2018.8649800
85. M.F. Allam, A.A. Bdelrahman, H. Omran, et al., Novel decimation topology with improved
jitter performance for clock and data recovery systems. 19th IEEE international new circuits and
systems conference (NEWCAS) (2021), pp. 1–4, doi:
https://doi.org/10.1109/NEWCAS50681.2021.9462785
86. V.G. Srivatsa, A.P. Chavan, D. Mourya, Design of low power & high performance multi source
h-tree clock distribution network. IEEE VLSI device circuit and system (VLSI DCS) (2020),
pp. 468–473, doi: https://doi.org/10.1109/VLSIDCS47293.2020.9179954
87. H. Wenjia, Y. Horii, Enhanced group delay of microstrip-line-based dispersive delay lines with
lc resonant circuits for real-time analog signal processing. IEEE Asia Pacific microwave
conference (APMC) (2017), pp. 272–275, doi: https://doi.org/10.1109/APMC.2017.8251431
88. Z. Zhang, W. Chu, S. Huang, The Ping-Pong tunable delay line in a super-resilient delay-locked
loop. 56th ACM/IEEE design automation conference (DAC) (2019), pp. 1–2
89. H.T. Kostanyan, H.V. Margaryan, V.A. Janpoladov, et al., The minimizaton method of
transistor ageing influence on modern voltage references. IEEE 40th international conference
on electronics and nanotechnology (ELNANO) (2020), pp. 335–338, doi:
https://doi.org/10.1109/ELNANO50318.2020.9088844
90. H.T. Kostanyan, The minimization method of thermal drift influence on analog integrated
circuits. Proc. RA NAS NPUA. Series Tech. Sci. 75(1), 120–128 (2022)
91. H.T. Kostanyan, Skew improvement method for digital delay lines operating in nonstandard
environments. Proc. Univ. Electron. 27(2), 233–239 (2022).
https://doi.org/10.24151/1561-5405-2022-27-2-233-239
Chapter 3
Signal Transmission Calibration Systems
in Integrated Circuits
I/O [3] cells are one of the most important components of contemporary ICs [1]
(Fig. 3.1) [2]. They ensure lossless reading and transmission of data and also process
the transmitted signal.
It should be noted that I/O cells can generally work in bidirectional mode, which
means that the same I/O cell can both read the data and transmit it (Fig. 3.2).
Currently, there are many types of I/O cells [4] with different applications, so their
power supply voltages and data transfer rates can differ. For example, double data rate
(DDR) [5] nodes provide data transfer between the computer core and RAM [6]. There is
also low-power DDR (LPDDR) [7], which is used in devices that require high power
efficiency, such as mobile phones, modern laptops, and tablet computers. Another type of
I/O cell is the Universal Serial Bus (USB) [8], which is used to transfer data and provide
power between computers and connected devices. High-bandwidth memories (HBM) [9] are
also used, which provide data transfer between the computer core and three-dimensional
synchronous dynamic random access memory [10] (3SDRAM). Special multimedia I/O cells
are also used [11, 12], for example, the high-definition multimedia interface, which
transmits high-quality video and audio data (Table 3.1) [13].
From Table 3.1 it can be seen that new-generation I/O cells are faster and work with
lower supply voltages, which significantly complicates data reading and transfer. As the
supply voltage decreases, ICs become more sensitive to external and internal noise [14],
and as the speed increases, the transmission line attenuates the transmitted data more
strongly, which makes data processing even more difficult.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
V. Melikyan, Machine Learning-based Design and Optimization of High-Speed
Circuits, https://doi.org/10.1007/978-3-031-50714-4_3
Fig. 3.3 Transmission line structure: Vd the transmitted signal, Rout the output resistance of the
transmitter, Z0 the wave impedance of the transmission line, and Rin the input resistance of the receiver
Currently, there are also high-precision IC types [15], which provide extremely
high reliability. They are used in fields where the loss of data can lead to great damage,
so I/O cells in high-precision ICs are equipped with systems that increase the reliability
of data transmission and reading [16].
In ICs, I/O cells are connected via transmission lines [17], which contain capacitive
and inductive components [18]. These components significantly attenuate the signal
amplitude and cause distortions (Fig. 3.3) [19].
It should be noted that the wave impedance of the transmission line is
usually equal to 50 Ω [20].
Due to the attenuation in transmission lines and to wave reflections, it is
difficult to further increase the speed of ICs [21]. Reflections occur when the
resistances of the transmitter and the receiver are not matched to the line, which
significantly reduces the reliability of the transmitted data and can even lead to data
loss (Fig. 3.4).
Depending on the degree of matching of transmitter and receiver resistances, the
wave reflection coefficients will be different. For the general case, this coefficient is
determined by the following formula [22]:
$$\rho = \frac{R - Z_0}{R + Z_0}. \tag{3.1}$$
Signal reflections can occur both at the output of the transmitter and at the input of
the receiver and are determined as follows [22]:
$$\rho_t = \frac{R_{out} - Z_0}{R_{out} + Z_0}, \tag{3.2}$$
$$\rho_r = \frac{R_{in} - Z_0}{R_{in} + Z_0}. \tag{3.3}$$
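As a numerical illustration of the reflection coefficient, the following minimal sketch evaluates the general form of Eq. (3.1) for a transmitter and a receiver. The resistance values 40 Ω and 60 Ω and the function name are hypothetical, chosen only for illustration; applying the same form to the receiver input resistance Rin is an assumption based on the definitions in Fig. 3.3.

```python
def reflection_coefficient(r_ohm: float, z0_ohm: float = 50.0) -> float:
    """Reflection coefficient of a resistance r_ohm on a line of impedance z0_ohm."""
    return (r_ohm - z0_ohm) / (r_ohm + z0_ohm)


Z0 = 50.0                                        # line impedance quoted in the text
rho_t = reflection_coefficient(40.0, Z0)         # hypothetical Rout = 40 ohm
rho_r = reflection_coefficient(60.0, Z0)         # hypothetical Rin = 60 ohm
rho_matched = reflection_coefficient(50.0, Z0)   # matched case, no reflection

print(f"rho_t = {rho_t:+.3f}, rho_r = {rho_r:+.3f}, matched = {rho_matched:+.3f}")
```

A resistance below Z0 gives a negative coefficient (the reflected wave is inverted), a resistance above Z0 gives a positive one, and a matched resistance gives zero.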
In modern I/O cells, data transfer rates reach several Gbps (Fig. 3.5) [23], which
significantly complicates data reading. In that case, prevention of transmission line
distortions becomes an even more important issue. For example, in fifth-generation
double data rate nodes [24] (DDR5), the data transfer rate is 6.4 Gbps.
Reading a signal distorted in this way (Fig. 3.6) requires special methods that
neutralize these distortions and allow lossless data reading [25–27].
3.1 General Issues of Signal Transmission Calibration Systems. . . 113
Developing accurate methods of data transfer and reading that will be applicable
to a large number of I/O cells and will also neutralize the distortion caused by the
transmission line is an important issue for companies that fabricate
contemporary ICs.
In summary, the following factors hinder further increases in the speed of I/O cells
and accurate data transmission, and therefore create a great demand for means of
regulating the transmission of signals:
• The large set of I/O cells and the various operating requirements placed on them
• The negative effect of the transmission line on signal transmission
• The increasing performance of I/O cells
• The high reliability of signal processing required for high-precision ICs.
In modern ICs, signal transmission calibration systems are mainly composed of I/O
cells, the sub-nodes of which aim to increase the quality of data transmission and
reading. The main components of the mentioned I/O cells are the transmitter and the
receiver (Fig. 3.7) [28].
The data transmitted from the core is preprocessed in the digital logic (DL) node. This
node supplies the control signals, determines the operating mode of the I/O cell, and
forms the signals that characterize the state of the system's supply voltage and input
signal.
Since the transmitted data and the core have different supply voltage values, a
level shifter is used (Fig. 3.8) [29]. It should be noted that there is also a high-to-low
level shifter, used in the receiver.
The data, passing through the input node, is transferred to the output buffer
(OB) [30], which is connected to the transmission line. The OB contains high-power
buffers connected in parallel, through which data is transmitted with as little loss as
possible. Depending on the operating mode of the I/O cell, different values of the
transmitter output resistance may be required. In that case, the DL node forms
calibrating signals that connect or disconnect some of the OB nodes and thereby change
the value of the output resistance. Since the OB has a large input capacitance, the
transmitted data is pre-amplified by a pre-buffer (PB) [31]. It consists of buffers that
are connected to the current support system (CSS) [32]. The CSS supplies additional
current to the buffers, which makes it possible to control the rise/fall times of the
transmitted data (Fig. 3.9); this control is also performed by the DL node.
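The effect of connecting or disconnecting parallel OB nodes can be illustrated with a small sketch. It assumes, purely for illustration, that every enabled node behaves as an identical resistance, so that n enabled nodes give R/n. The 240 Ω value, the limit of eight nodes, and the function names are hypothetical and do not come from the text.

```python
def output_resistance(leg_ohm: float, enabled_legs: int) -> float:
    """Resistance of identical legs connected in parallel."""
    if enabled_legs < 1:
        raise ValueError("at least one leg must be enabled")
    return leg_ohm / enabled_legs


def legs_for_target(leg_ohm: float, target_ohm: float, max_legs: int = 8) -> int:
    """Number of enabled legs whose parallel resistance is closest to the target."""
    return min(range(1, max_legs + 1),
               key=lambda n: abs(output_resistance(leg_ohm, n) - target_ohm))


LEG_OHM = 240.0                               # hypothetical resistance of one OB node
n_legs = legs_for_target(LEG_OHM, 50.0)       # aim at a 50 ohm line
print(n_legs, output_resistance(LEG_OHM, n_legs))
```

With equal legs the achievable values are discrete, so the calibration can only approach the target resistance to within one step.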
The CSS can provide different rise/fall times of the output signal, depending on the
type of I/O cell as well as the length of the transmission line. All these nodes aim to
increase the reliability of the transmitted data as much as possible and to reduce the
impact of the transmission line. The transmission line has two types of models: lossless
and lossy. The wave impedance of a lossless line is determined by the following
formula [33]:
$$Z_0 = \sqrt{\frac{L}{C}}, \tag{3.4}$$
and for the line with loss it is determined by the following formula [33]:
$$Z_0 = \sqrt{\frac{R + j\omega L}{G + j\omega C}}, \tag{3.5}$$
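The two wave impedance formulas can be compared numerically. The sketch below is only an illustration: the per-unit-length values (250 nH/m, 100 pF/m, 5 Ω/m, 10 µS/m) and the 1 GHz frequency are hypothetical and were chosen so that the lossless case gives 50 Ω.

```python
import cmath
import math


def z0_lossless(l_per_m: float, c_per_m: float) -> float:
    """Wave impedance of a lossless line, Eq. (3.4)."""
    return math.sqrt(l_per_m / c_per_m)


def z0_lossy(r_per_m: float, l_per_m: float, g_per_m: float,
             c_per_m: float, freq_hz: float) -> complex:
    """Wave impedance of a lossy line at one frequency, Eq. (3.5)."""
    omega = 2.0 * math.pi * freq_hz
    return cmath.sqrt((r_per_m + 1j * omega * l_per_m) /
                      (g_per_m + 1j * omega * c_per_m))


L_PER_M = 250e-9    # hypothetical inductance per metre
C_PER_M = 100e-12   # hypothetical capacitance per metre

print(z0_lossless(L_PER_M, C_PER_M))
print(z0_lossy(5.0, L_PER_M, 1e-5, C_PER_M, 1e9))
```

For the lossy line the result is complex and frequency dependent; with R = G = 0 it reduces to the lossless value.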
the type of I/O cells, it may consist of cascades of several receivers. It aims to
amplify the data at its target frequency and suppress other frequencies as much as
possible, which will restore the data distorted by the transmission line.
These receivers usually work in reference voltage comparison or differential
modes. In reference voltage comparison mode, the reference voltage is applied to
the negative input of the receiver, and the transmitted data is applied to the other
input. The reference voltage is supplied from a digital-to-analog converter (DAC) in
the receiver [44, 45]. In this mode, the receiver acts as a comparator: it generates
logic “1” at the output when the input is higher than the reference voltage and logic
“0” when it is lower (Fig. 3.13). It should be noted that ideally, the value of the
reference voltage should be equal to half the amplitude of the transmitted data, so that
the signal is transmitted to the core as evenly as possible.
In the differential mode, signals shifted in phase by 180 degrees are applied to the
receiver inputs (Fig. 3.14) [46]. In this mode, the receiver is more noise-immune, because
noise that is common to both inputs is not transmitted to the output.
With the increase of data transfer rates, deviations of the timing parameters of data
become an extremely important problem [47]. In modern I/O cells, the fastest data
transition time is on the order of picoseconds [48]. In that case, even small deviations
of timing parameters can lead to data errors. The most common way of measuring the
timing parameters of a signal is the eye diagram (Fig. 3.15) [49], in which the signal
jitter, the voltage levels, and the main timing parameters can be observed. The rectangle
depicted in the diagram is called the eye opening: its horizontal side corresponds to
the time margin of the data, and its vertical side corresponds to the amplitude of the
data.
Another important timing parameter is the signal duty cycle which is determined
by the following formula [50]:
$$D = \frac{T_1}{T_p} \cdot 100\%, \tag{3.7}$$
where D is the duty cycle of signal, T1 the duration of the logical “1” range of the
signal, and Tp the period of the signal.
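On uniformly sampled data, the ratio in Eq. (3.7) can be estimated as the share of samples that lie above a decision threshold. The sketch below assumes exactly one period of samples and a mid-level threshold; the sample values and the function name are hypothetical.

```python
def duty_cycle_percent(samples, threshold):
    """Estimate D of Eq. (3.7) as the share of samples above the threshold."""
    high = sum(1 for value in samples if value > threshold)
    return 100.0 * high / len(samples)


# One period of a hypothetical 1 V signal: six samples high, four samples low.
one_period = [1.0] * 6 + [0.0] * 4
duty = duty_cycle_percent(one_period, threshold=0.5)
print(duty)
```

The estimate is only as fine as the sampling grid, so a real measurement would need many samples per period or averaging over many periods.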
For an ideal case, the signal duty cycle should be equal to 50%. It means that the
durations of logic “1” and logic “0” of the signal are equal, and the signal is
symmetric within each period. Duty cycle deviations are mainly caused by the
asymmetry of the rising and falling edges of the signal [51], which in turn is caused by
transmission line effects [52] and process, voltage, and temperature (PVT) deviations [53].
Another important timing parameter of the data is jitter (Fig. 3.16) [54–57]. It can
be represented as the sum of signal deviations, reflections, data-dependent
interference [58], noises affecting the data, and delays. It is generally calculated for a
given unit interval. The total jitter consists of two main components: deterministic [59]
and random jitter (Fig. 3.17) [60].
The occurrence of deterministic jitter is not accidental; it has clear reasons and is
usually periodic. It means that the jitter can be repeated at one or more frequencies. It
consists of two subgroups: data-dependent and periodic jitters. Symbolic
[Figure: inter-symbol interference, «1000» and «1011» data patterns]
$$y(t) = \sum_{n=-\infty}^{\infty} x[n]\, h(t - nT_s), \tag{3.8}$$
where y(t) is the signal formed at the receiver input, h(t) is the impulse response of
the transmission line, Ts is the signal period, and x[n] is the data being sent.
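The superposition in Eq. (3.8) can be tried out on a simple channel model. The sketch below assumes a first-order low-pass channel and uses its response to one rectangular symbol of width Ts in place of h(t). The channel model, the time constant, and the function names are assumptions for illustration only.

```python
import math


def pulse_response(t, ts, tau):
    """Response of a first-order low-pass channel to a unit pulse of width ts."""
    if t <= 0.0:
        return 0.0
    if t <= ts:
        return 1.0 - math.exp(-t / tau)
    return (1.0 - math.exp(-ts / tau)) * math.exp(-(t - ts) / tau)


def received(bits, t, ts, tau):
    """Superposition of the shifted symbol responses, as in Eq. (3.8)."""
    return sum(bit * pulse_response(t - n * ts, ts, tau)
               for n, bit in enumerate(bits))


TS, TAU = 1.0, 0.5   # hypothetical symbol period and channel time constant

# Level at the end of the third symbol, with and without a preceding "1".
alone = received([0, 0, 1, 0], 3 * TS, TS, TAU)
after_one = received([1, 0, 1, 1], 3 * TS, TS, TAU)
print(alone, after_one)
```

The level reached at the end of a symbol depends on the symbols sent before it, which is the data-dependent distortion discussed here.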
It can be observed that the response of the transmission line is different for the
“0001” and “1011” data, so at the data “1011” a distorted signal is formed at the
input of the receiver. Such distortions can lead to data loss, resulting in system
crashes. Therefore, it is extremely important to ensure lossless data transfer.
Thus, one of the most relevant problems in current I/O cells is the development of
means of signal transmission calibration that significantly improve the reliability of
data transmission and reduce the negative effects of the transmission line.
The demand for the design of signal transmission calibration systems [63] in ICs is
driven by the difficulties of data transmission and reading caused by the increase in
speeds in I/O cells [64].
In addition, modern ICs are more sensitive to PVT deviations [65]. Under these
conditions, accurate data processing is extremely difficult and can often cause
system failures, which is a critical issue in high-precision ICs that operate in highly
sensitive environments and require high data processing accuracy. The development
of signal transmission calibration means will not only significantly increase the
reliability of data transmitted in the high frequency range, but also significantly
reduce the design time.
The increase in the reliability of transmitted data results from the use of
signal transmission calibration means [66], which make it possible to neutralize the
negative effect of the transmission line, make the data more robust to wave
reflections, and neutralize signal distortions caused by PVT deviations [66–70].
The reduction of design time follows from the increase in data reliability provided
by signal transmission calibration means [71–74]. Currently, the design of I/O cells is
a rather long process, because all possible deviations must be considered and the I/O
cells must be tested under all possible PVT deviations [75, 76]. It should be noted that
even so, it is often not possible to ensure the required performance of I/O cells in all
cases.
As mentioned, one of the most important timing parameters of the read signal
is its duty cycle [77]. In an ideal case, it should be equal to 50%, but due to wave
reflections, noise, and PVT deviations, it can deviate, causing data reading errors
[78]. To solve this problem, a wide-range duty cycle calibration system (Fig. 3.19)
[79] is used in receivers.
This system calibrates the deviations of the duty cycle by controlling the current of
the NMOS and PMOS transistors connected to the input buffer. These transistors are
controlled by the control signal Vcontrolling formed at the output of the operational
amplifier. Depending on the value of this voltage, the current of the NMOS and PMOS
transistors increases or decreases, which makes it possible to control the duty cycle
of the input signal. The control voltage Vcontrolling is formed from the difference
between the voltages applied to the inputs of the differential amplifier. A voltage
equal to half of the supply voltage, which corresponds to a signal with a duty cycle of
50%, is applied to the positive input of the differential amplifier, and the output of
the RC filter is connected to the other input [80]. The RC filter integrates the input
data, so the resulting voltage value corresponds to the duty cycle of the signal.
For example, integrating a signal with an amplitude of 1 V and a duty cycle of 50%
gives 0.5 V, and a signal with a duty cycle of 60% gives 0.6 V. Thus, by applying these
two voltages to the inputs of the operational amplifier, it is possible to detect the
direction of the duty cycle deviation and correct it by means of current support.
However, this method has low accuracy, and at high frequencies, it cannot ensure
accurate calibration of the duty cycle.
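The link between the filtered voltage and the duty cycle follows from averaging the signal over one period. A short derivation, assuming an ideal rectangular signal with amplitude A and a zero low level, and an RC time constant much longer than the period Tp (both are assumptions, not stated in the text):

$$V_{avg} = \frac{1}{T_p}\int_0^{T_p} v(t)\,dt = \frac{A\,T_1}{T_p} = A \cdot \frac{D}{100\%}$$

For A = 1 V this gives 0.5 V at D = 50% and 0.6 V at D = 60%, consistent with the values quoted above. The sign of the difference between this voltage and A/2 indicates the direction of the deviation.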
Modern I/O cells must provide full performance under various PVT conditions.
However, PVT deviations cause deviations of current, output resistance, and voltage
[85, 86], which negatively affect the transmitted data and can even cause system
errors. For this reason, the current support system (CSS) is used (Fig. 3.22) [87],
which provides additional current support and corrects the deviated rising and falling
edges of the transmitted data.
The CSS contains a replica of the output buffer that is connected in parallel to the
main output buffer. It can strengthen the rising and falling edges when the “PMOS
amplification” and “NMOS amplification” signals are applied (Fig. 3.23).
Thus, it is possible to correct the deviations of the transmitted data, but the
disadvantage of this system is that it does not perform rise/fall equalization during
data transmission, as a result of which IC reliability is significantly reduced.
Fig. 3.23 Fixing the offset of the data passed through CSS
The increase in data rates significantly complicates data transmission. In particular,
the attenuation of the signal by the transmission line can cause data loss, which limits
further increases in data transfer rates. For this reason, signal equalization methods
are used [88–90], which partially neutralize the negative effect of the transmission
line. In general, the equalization method is similar to the operation of a finite
impulse response filter: the input signal, passing through a delay element, is
superimposed on the original input signal. As a result, a new signal is formed in which
the high-frequency components are amplified (Fig. 3.24) [91].
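The filter-like behaviour of equalization can be sketched with a two-tap finite impulse response filter, in which a delayed and scaled copy of the input is subtracted from the input. The tap weight 0.25, the subtraction, and the function names are assumptions made for this illustration; they are not taken from the text or from Fig. 3.24.

```python
import cmath
import math


def fir_equalize(samples, alpha):
    """y[n] = x[n] - alpha * x[n-1]: the input combined with its delayed copy."""
    previous = 0.0
    output = []
    for sample in samples:
        output.append(sample - alpha * previous)
        previous = sample
    return output


def gain(alpha, omega):
    """Magnitude of 1 - alpha * exp(-j*omega); omega in radians per sample."""
    return abs(1.0 - alpha * cmath.exp(-1j * omega))


ALPHA = 0.25   # hypothetical tap weight
equalized = fir_equalize([1, 1, 1, 0, 0], ALPHA)
print(equalized)
print(gain(ALPHA, 0.0), gain(ALPHA, math.pi))
```

Transitions keep their full swing while repeated symbols are reduced, so the gain at the highest signal frequency exceeds the gain at DC.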
In modern I/O cells, the signal amplitude modulation (SAM) equalization method is
widely used, which is implemented by means of an “XOR” logic cell. The input signal
and a width-modulated clock signal are applied to its inputs, as a result of which an
equalized signal is formed (Fig. 3.25) [91]. Moreover, by choosing the duty cycle of the
SAM clock, it is possible to reduce the inter-symbol interference of the signal.
Depending on the parameters of the transmission line, it is also possible to choose
a duty cycle of the SAM clock for which the inter-symbol interference of the
transmitted signal is minimal. The value of this duty cycle can be calculated as a
function of the signal period (TS) and the channel time constant (TC) of the
transmission line [91]:
$$\text{Duty cycle} = \frac{T_c}{T_s}\,\ln\!\left(\frac{1}{2} + \frac{1}{2}\,e^{T_s/T_c}\right). \tag{3.9}$$
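The dependence of the duty cycle on Ts and Tc can be checked on a first-order channel model. The sketch below assumes one possible reading of Eq. (3.9), d = (Tc/Ts) ln(1/2 + (1/2) e^(Ts/Tc)), and a symbol that is +1 for the first d·Ts and -1 for the rest of the period. The channel model, the symbol shape, and the function names are assumptions for illustration only.

```python
import math


def optimal_duty(ts, tc):
    """Assumed reading of Eq. (3.9) for a first-order channel."""
    return (tc / ts) * math.log(0.5 + 0.5 * math.exp(ts / tc))


def residual_after_symbol(duty, ts, tc):
    """Channel output at t = ts for a symbol that is +1, then -1 (start at 0)."""
    level = 1.0 - math.exp(-duty * ts / tc)                       # settle toward +1
    level = -1.0 + (level + 1.0) * math.exp(-(1.0 - duty) * ts / tc)  # toward -1
    return level


TS, TC = 1.0, 1.0   # hypothetical signal period and channel time constant
d_opt = optimal_duty(TS, TC)
print(d_opt, residual_after_symbol(d_opt, TS, TC))
print(residual_after_symbol(1.0, TS, TC))   # plain symbol without width modulation
```

With this duty cycle the channel output returns to zero at the end of the symbol, so in this model the symbol leaves no residue in the following symbols.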
Thus, the main obstacles to increasing the speed of data transfer in I/O cells and
to its lossless reading are PVT deviations, transmission lines, and inaccuracies caused
by transmitters. In order to eliminate them, it is necessary to design signal
transmission calibration systems. The study of the existing approaches and design means
shows that they do not meet the modern requirements for practical design from the point
of view of efficiency.
It follows from the above that currently used signal transmission calibration
systems do not take power consumption into account and do not provide the required
reliability of data transmission and reading, so they are not applicable in
high-precision ICs and limit the further increase in the speed of I/O cells.
In particular:
• The wide-range duty cycle calibration system used in the receiver has low
accuracy. At high signal frequencies, it cannot ensure exact calibration of the duty
cycle, which can cause errors in the data that are read.
• The high-frequency level shifters used in the transmitter can distort the
transmitted data, which complicates accurate data transmission. These deviations
affect the time margin of the transmitted data and in some cases even cause data
loss. Therefore, such a structure of level shifters limits further increase of the data
transfer rate.
• The current support systems used in the transmitter do not perform rise/fall
calibration during signal transmission, as a result of which the reliability of
integrated circuits is significantly reduced.
• It is difficult to calibrate the deviations due to wave reflections using signal width
modulation equalization methods, which leads to errors in the transmitted data.
As mentioned in Sect. 3.1, deviations in the signal duty cycle at the receiver can
reduce the reliability of the read data, causing system failures. One of the main
causes of these deviations is PVT deviations, which can appear both before the
transmission of the signal and during its transmission (Fig. 3.26). However, currently
used duty cycle calibration systems are mostly applied before the signal transmission
process. For this reason, deviations that arise during signal transmission are not taken
into account, and the reliability of the system decreases significantly.
Thus, it is obvious that in order to increase the reliability of I/O cells, deviations
during signal transmission should also be taken into account.
The proposed duty cycle calibration system (Fig. 3.27 [92]) detects and calibrates
the duty cycle deviations during the entire operation of the I/O cell.
In modern I/O cells, the processing of the transmitted data is carried out by
reading the data around the reference voltage. It is supplied either through the
DAC in the receiver [93] or through the external DAC. The proposed calibration
system, detecting deviations in the duty cycle of the signal, modifies the input code
of the internal DAC of the receiver and leads to a change in the reference voltage. As
a result, calibration of the duty cycle of the received signal occurs.
The main components of the designed calibration system (Fig. 3.28) are RC
integrator, ADC, reference voltage division node (RVDN), analog multiplexer [94–
96], as well as digital logic node.
3.2 Design Principles of Signal Transmission Calibration Systems. . . 131
It should be noted that such a structure makes it possible to carry out the
calibration in parallel with the data transfer, without interrupting it, neutralizing
the duty cycle deviations caused by PVT deviations and other reasons.
The purpose of using the RC integrator (Fig. 3.29) is to obtain a constant
voltage level from the received signal that expresses the value of the duty cycle. It
consists of a classic RC integrator circuit and an analog repeater (voltage follower).
The analog repeater is designed on the basis of an operational amplifier and aims to
keep the value of the integrated voltage constant.
In this way, it is possible to obtain the value of the duty cycle (Fig. 3.30)
and detect its deviations. For example, integration of a signal with an amplitude of
1 V and a duty cycle of 50% will result in a constant voltage level of 0.5 V,
and a duty cycle of 40% will result in 0.4 V.
It can be noticed that the integration process does not happen instantaneously. In
order to establish a voltage corresponding to the duty cycle at the output of the
integrator, ~50 ns is required, which determines the duration of one system calibration
cycle. This duration is calculated for the PVT condition in which the settling time is
the longest.
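The settling behaviour of the integrator can be reproduced with a simple first-order RC model driven by a rectangular signal. In the sketch below the period (0.5 ns), the time constant (10 ns), and the function name are hypothetical; they were chosen only so that 100 periods span 50 ns, about five time constants.

```python
import math


def rc_integrate(duty, amplitude_v, period_s, tau_s, n_periods):
    """Output of an RC low-pass filter at the end of each period of a
    rectangular input (exact solution for piecewise-constant input)."""
    voltage = 0.0
    t_high = duty * period_s
    t_low = period_s - t_high
    trace = []
    for _ in range(n_periods):
        # Input high: the output moves exponentially toward the amplitude.
        voltage = amplitude_v + (voltage - amplitude_v) * math.exp(-t_high / tau_s)
        # Input low: the output decays exponentially toward zero.
        voltage *= math.exp(-t_low / tau_s)
        trace.append(voltage)
    return trace


trace = rc_integrate(duty=0.4, amplitude_v=1.0,
                     period_s=0.5e-9, tau_s=10e-9, n_periods=100)
print(trace[9], trace[-1])   # after 5 ns and after 50 ns
```

In this model the output is still far from its final value after 5 ns and close to 0.4 V after 50 ns, with a small residual ripple that depends on the ratio of the period to the time constant.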
In the calibration system, the ADC (Fig. 3.31) is used to generate digital codes
corresponding to the output voltages of the integrator and RVDN.
It should be noted that the voltage provided by RVDN (Fig. 3.32) is equal to half
of the amplitude of the received signal, that is, to the level that corresponds to a
duty cycle of 50%. As a result of this process, two digital codes are formed: one
corresponds to the duty cycle of the signal at the given moment, and the other
corresponds to a duty cycle of 50%. Since there is a power-to-ground current path in
the RVDN, a PMOS transistor is used to reduce the leakage current. It opens when the
ADC is supplied with the voltage equivalent to a duty cycle of 50%. These voltages are
supplied by means of an analog multiplexer, which selects either the output of the RVDN
or that of the integrator and connects it to the ADC.
A digital logic cell, which compares the output codes of the ADC, is used to detect
deviations. After the comparison, it supplies the DAC node with a new input code, as a
result of which the reference voltage of the receiver changes and the duty cycle of the
received signal is calibrated. This process continues until the duty cycle reaches the
target 50%. It should be noted that the accuracy of the system can be improved by
increasing the resolution (number of bits) of the ADC, which allows smaller deviations
in the duty cycle to be detected.
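The compare-and-update loop of the digital logic can be sketched as follows. This is a behavioural illustration under stated assumptions, not the circuit described in the text: the ADC is an ideal 8-bit quantizer, the DAC code moves by one step per calibration cycle, and the receiver is replaced by a hypothetical linear model in which a higher reference voltage shortens the logic “1” interval (0.2% of duty cycle per DAC step). All names are hypothetical.

```python
def adc(voltage, full_scale_v=1.0, bits=8):
    """Ideal quantizer: voltage to an integer code (assumed ADC model)."""
    code = int(voltage / full_scale_v * 2 ** bits)
    return max(0, min(2 ** bits - 1, code))


def measured_duty(dac_code, raw_duty, sensitivity=0.002, mid_code=128):
    """Hypothetical receiver model: a higher reference shortens logic 1."""
    return raw_duty - sensitivity * (dac_code - mid_code)


def calibrate(raw_duty, amplitude_v=1.0, bits=8, max_cycles=200):
    """Step the DAC code until the integrator code equals the 50% code."""
    dac_code = 128
    target = adc(amplitude_v / 2.0, amplitude_v, bits)      # RVDN output code
    for cycle in range(max_cycles):
        level = amplitude_v * measured_duty(dac_code, raw_duty)  # integrator output
        code = adc(level, amplitude_v, bits)
        if code == target:
            return dac_code, cycle
        step = 1 if code > target else -1
        dac_code = max(0, min(2 ** bits - 1, dac_code + step))
    return dac_code, max_cycles


for raw in (0.6, 0.4):
    final_code, cycles = calibrate(raw)
    print(raw, final_code, cycles, measured_duty(final_code, raw))
```

In this model the remaining error is bounded by one ADC step, so a higher ADC resolution leaves a smaller residual deviation, in line with the remark on accuracy above.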
Fig. 3.33 Duty cycle calibration results under typical PVT conditions
Fig. 3.34 ADC input selection between reference and signal DC voltages
Fig. 3.35 Duty cycle calibration process under worst PVT conditions
Simulations showed that the system consumes a maximum power of 3.95 mW,
which allows it to be used in low-power ICs. It should be noted that the proposed
structure allows the duty cycle to be calibrated when it lies in the range of 40–60%.
That range was chosen taking into account the range of possible deviations of the duty
cycle in modern ICs.
The results show that the duty cycle calibration time is almost the same for
different PVT conditions and does not depend on the direction of deviations
(Fig. 3.35).
The proposed mixed-signal system is applicable to both periodic signals and data
sequences. To verify this, a simulation was carried out for a PRBS5 data
sequence with a frequency of 2133 MHz (Figs. 3.36 and 3.37).
While this data was being read, the system changed the reference voltage of the
receiver, which made it possible to restore the crossing point of the
eye diagram.
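For readers who want to reproduce a comparable stimulus, a PRBS5 pattern can be generated with a 5-bit linear feedback shift register. The sketch below is an assumption-laden illustration: the source does not state which generator polynomial or seed was used, so the taps (stages 5 and 3) and the name `prbs5` are hypothetical choices that simply yield a maximal-length sequence of 31 bits.

```python
def prbs5(n_bits, seed=0b11111):
    """Generate n_bits of a PRBS5 sequence from a 5-bit Fibonacci LFSR.

    Assumed feedback taps: stages 5 and 3 (one common maximal-length choice).
    """
    state = seed & 0b11111
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the LFSR locks up")
    bits = []
    for _ in range(n_bits):
        feedback = ((state >> 4) ^ (state >> 2)) & 1   # XOR of stages 5 and 3
        bits.append((state >> 4) & 1)                  # output the oldest bit
        state = ((state << 1) | feedback) & 0b11111    # shift in the feedback bit
    return bits


sequence = prbs5(62)
print("".join(str(b) for b in sequence[:31]))
```

The sequence repeats every 2^5 - 1 = 31 bits, so a short pattern already exercises every 5-bit run except the all-zero word.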
Fig. 3.36 Eye diagram of the received PRBS5 data without the calibration system (axis ticks 0–0.8 versus t; annotation H=850m)
Fig. 3.37 Eye diagram of the received PRBS5 data with the calibration system (axis ticks 0–0.8 versus t; annotation H=850m)
A comparison with currently used duty cycle calibration systems was also carried
out, which showed the effectiveness of the proposed system (Table 3.3).
Thus, as a result of using the proposed system, it is possible to significantly
improve the reliability of the data during data reading and calibrate the duty cycle
As mentioned, as the data transfer rate increases, the input capacitances of
sub-blocks in I/O cells begin to significantly attenuate the transferred
data. For this reason, in the high-frequency LS (Fig. 3.38) [101], the output of a MOS
transistor is used as an input to reduce the effect of its gate-source capacitance.
This structure, however, distorts the transmitted data; in particular,
step-like segments appear during the signal rise/fall. These significantly
reduce the reliability of the transmitted data and degrade its timing margin.
Since LSs are used in almost all I/O cells, the signal distortion introduced by
high-frequency LSs should be neutralized in order to increase the reliability of
the data transmitted in the I/O cell.
It should be noted that when the switching of the input signal occurs at one of the
“a” or “b” interconnection points, P4 and P5 MOS transistors are in the open state
(Fig. 3.44). In all other conditions, they are closed.
Thus, the proposed rise/fall correction system calibrates the distorted signals,
increasing the reliability of transmitted data.
It can be observed that without the application of the correction system, the output
signal of the high-frequency LS contains a large jitter (10.83 ps), which occurs at a
slow PVT condition of 1.14 V, 125 °C (Fig. 3.45).
In the case of using the correction system, the step-like sections at the
interconnecting points “a” and “b” disappear (Fig. 3.46).
The simulations showed that when using the system, the step-like sections
disappear in all PVT conditions (Fig. 3.47).
The simulation results of high-frequency LS in the case of using the correction
system are presented in Table 3.5.
It can be noticed that the jitter of the output signal is significantly reduced when
using the system. The system also calibrated the duty cycle values; before using the
system, the duty cycle values ranged between 48.7% and 51.1%, and after the system
was applied, the range decreased significantly, reaching 49.55–50.5% (Fig. 3.48).
Fig. 3.45 View of the high-frequency LS output signal without using correction system
Fig. 3.47 View of the high-frequency LS output signal under five PVT conditions
Thus, the designed rise/fall correction system significantly improves the reliability
of the transmitted signal in the transmitter. Simulations of the proposed system
showed that it corrects the duty cycle deviations by 39.5% and also reduces the
jitter by about half. However, it increases the area of the high-frequency LS in the
transmitter by 12%.
As mentioned in Sect. 3.1, when the speed of the transmitted signal increases, the
calibration of data deviations becomes difficult. The main sources of these deviations
are PVT deviations, supply voltage fluctuations, as well as noise. All the mentioned
factors cause asymmetries of the transmitted signal rise/fall times, which signifi-
cantly reduce the reliability of I/O cells (Fig. 3.49).
To eliminate these deviations, CSSs are used, which supply additional current
to the pre-buffer (PB) sub-nodes so that the rise/fall times are equalized
[104]. Deviations can occur during the operation of the I/O cell, but currently used
CSSs do not calibrate the asymmetries of rise/fall times during data transmission.
The proposed rise/fall asymmetry calibration system allows correction of deviations
during the entire operation of the transmitter (Fig. 3.50). The transmitted data is
split into two branches, one of which connects to the calibration system and the other
to the PB.
This structure allows uninterrupted data transfer and calibration during I/O cell
operation. The calibration system is clocked by the fVCO signal supplied from the
voltage-controlled oscillator (VCO) in the IC. The purpose of the system is to detect
the asymmetry of rise/fall times during data transmission and then to provide
additional current to the output node of the PB. It should be noted that the rise/fall
times can deviate in either direction, becoming longer or shorter. For this reason, the
system first detects the direction of the deviation and then strengthens the
NMOS or PMOS stages of the PB output, calibrating the rise/fall times in the
corresponding direction. The main components of the calibration system (Fig. 3.51)
are the input logic cell, integrator, comparators, and logic cell.
The input logic cell isolates one period of the transmitted signal, which is
then processed to calculate the rise/fall time durations. This processing is
carried out by an integrator (Fig. 3.52), which slows down the isolated
period of the transmitted signal so that the duration of the rise/fall times
can be calculated.
The slowed signal is supplied to two comparators, the reference voltage of one of
which corresponds to 90% of the supply voltage, and the other to 10%. As a result,
the comparator outputs form signals that mark the start and end of the switching of
the input signal (Fig. 3.53).
For the comparison of high-frequency signals, a clocked comparator (Fig. 3.54)
was used in the calibration system [105].
Signals marking the start and end of switching of the input signal are
supplied to a logic cell that counts the durations of rise/fall times and forms the
PMOS and NMOS control signals. The latter are connected to the output transistors
of the PB. They turn the PB’s output PMOS or NMOS stages on or off, which
changes the rise/fall speeds. The logic cell (Fig. 3.55) calculates the speed of the
input rise/falls and stores it in registers, after which it detects the direction of
deviation using digital comparators. The control node then supplies the PMOS and
NMOS control signals accordingly. This operation is repeated until the rise/fall
durations of the transmitted signal are equal.
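The measurement principle above (two comparators at 10% and 90% of the supply, a counter clocked by fVCO, and a digital comparison of the rise and fall counts) can be sketched as follows. This is a simplified behavioral model, not the author's circuit: the linear edge shape, the numeric values, the function names, and the mapping "slower rising edge means strengthen the PMOS side" are all assumptions made for illustration.

```python
VDD = 1.0  # assumed supply voltage, V


def count_edge_cycles(edge_duration, f_clk, rising=True, lo=0.1, hi=0.9):
    """Count clock ticks at which a linear edge lies between lo*VDD and hi*VDD.

    edge_duration is the 0-100 % duration of the (already slowed) edge in
    seconds; f_clk plays the role of the fVCO clock frequency in hertz.
    """
    n_ticks = int(edge_duration * f_clk)
    count = 0
    for k in range(n_ticks + 1):
        frac = (k / f_clk) / edge_duration
        v = VDD * frac if rising else VDD * (1.0 - frac)
        above_lo = v > lo * VDD   # output of the comparator with the 10 % reference
        above_hi = v > hi * VDD   # output of the comparator with the 90 % reference
        if above_lo and not above_hi:
            count += 1
    return count


def control_decision(rise_count, fall_count):
    """Digital comparison of the stored counts (assumed polarity of the fix)."""
    if rise_count > fall_count:
        return "boost_pmos"   # rising edge is slower: strengthen the pull-up side
    if fall_count > rise_count:
        return "boost_nmos"   # falling edge is slower: strengthen the pull-down side
    return "hold"             # durations are equal: calibration is complete


# Arbitrary example values for the slowed edges and the counting clock.
rise = count_edge_cycles(12.4e-9, 2e9, rising=True)
fall = count_edge_cycles(10.3e-9, 2e9, rising=False)
print(rise, fall, control_decision(rise, fall))
```

Because each count is an integer number of clock periods, the smallest detectable difference between rise and fall durations in this model is one clock period; a faster counting clock therefore resolves smaller asymmetries.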
It should be noted that the rise/fall times of the transmitted signal are measured
using the fVCO signal; increasing its frequency improves the accuracy of the system.
The frequency of the fVCO signal was selected taking into account the integration
duration of the signal: it should be high enough for the measurement to be
performed while the rise/fall transition is in progress.
Thus, with the help of the proposed calibration system, it is possible to correct the
asymmetries of rise/fall times during signal transmission, reducing the jitter of the
signal.
In the case of slow PVT, the initial asymmetry of the rise/fall times of the input
signal of the transmitter was 24.8 ps (Fig. 3.56), which is a considerable deviation
and can cause significant oscillation and jitter in the back buffer (BB).
The calibration system corrects the asymmetry of the rise/fall times of the input
signal within three periods, reducing it to only 1.6 ps
(Fig. 3.57). It should be noted that the operation of the system does not depend on the
direction of deviations, and it does not interrupt the transmission of
signals.
The calibration accuracy of rise/fall asymmetry can be increased by selecting the
sizes of the additional PMOS and NMOS transistors in the output node of the PB. It
should be noted that the calibration time differs for
other PVTs, but it does not exceed five periods. The calibrated rise/fall values
obtained with the system are presented in Table 3.7.
A comparison with modern rise/fall asymmetry calibration systems was also
carried out, which confirmed the effectiveness of the proposed system (Table 3.8).
Thus, the designed rise/fall asymmetry calibration system significantly
reduces the effect of PVT deviations, which increases the reliability of the
transmitted signal. Simulations of the proposed system showed that it
calibrates the asymmetry of the signal rise/falls down to 1.2% and also
reduces jitter. However, it increases the current consumed in the PB node, which
is 2.26 mA for the worst case.
Table 3.8 Comparison of the proposed calibration system with other systems
Parameter                     [9]     [10]    Proposed system
Manufacturing process (μm)    0.13    0.18    0.032
Speed (Mb/s)                  630     500     2133
Rise/fall asymmetry (%)       0.9     3.4     1.2
As mentioned in Sect. 3.1, the transmission line significantly attenuates the
transmitted signal, causing inaccuracies and data losses (Fig. 3.58). It should be
noted that, even when the line is impedance-matched, it still attenuates the signal,
reducing its amplitude and slowing its rise/fall transitions.
The proposed system (Fig. 3.59) [106] reduces the effect of the transmission line.
It supplies additional current to the output buffers, which speeds up the signal
rise/fall transitions and makes it possible to neutralize the losses caused by
the transmission line. The designed system is applicable in high-frequency I/O cells
and does not interrupt the data transmission process in the transmitter.
To carry out signal calibration in the proposed system, a slew rate control
amplifier (Fig. 3.60) [107] was designed, which was implemented using an “XOR”
logic cell and a frequency divider. It is placed in the transmitter node and
amplifies the transmitter output signal during signal transmission.
The input signal is applied to one input of the XOR, and its delayed version to
the other. As a result, a pulse signal is formed at the output of the “XOR,” which
takes a logical 1 value during the rise/fall times of the input signal (Fig. 3.61) [105].
It should be noted that the signal delay node was implemented using series-connected
buffers.
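The edge-detection idea above can be illustrated with a sampled logic model: XOR-ing a signal with a delayed copy of itself gives a pulse on every transition, and the pulse width equals the delay. In this sketch the buffer chain is modeled as a pure delay of a whole number of samples, and the function names are hypothetical.

```python
def delay(signal, n_samples):
    """Model the buffer chain as a pure delay; the first value is held initially."""
    return [signal[0]] * n_samples + signal[:len(signal) - n_samples]


def edge_pulses(signal, n_samples):
    """XOR of the signal with its delayed copy: 1 during each transition window."""
    delayed = delay(signal, n_samples)
    return [a ^ b for a, b in zip(signal, delayed)]


data = [0] * 4 + [1] * 6 + [0] * 6    # one rising and one falling edge
pulses = edge_pulses(data, n_samples=2)
print(data)
print(pulses)
```

The pulse width is set entirely by the delay, so in this model the number of series buffers determines how long the additional drive stays active around each edge.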
Since the XOR should work at high signal frequencies and have a low delay, the
following structure of XOR was chosen (Fig. 3.62) [107].
In order to ensure amplification of PMOS and NMOS nodes in BB, it is necessary
to divide the output signal of “XOR” into two taps (Figs. 3.63 and 3.64) [107].
For this purpose, the frequency of the XOR output signal is halved
by means of a DFF. As a result, two signals are formed, PMOS amplification and
NMOS amplification, through which additional current is supplied to the BB during
switching of the input signal. It should be noted that the proposed system supplies
additional current from the BB to the transmission line only during signal transitions
and is switched off in the static regions of the signal to avoid consuming additional
current. It is also possible to control the operation of the proposed system and the BB
using the enable signal <1:0> and, if necessary, to turn them off (Table 3.9).
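One plausible way to read the splitting step is sketched below: a DFF wired as a divide-by-two toggles on every XOR pulse, so its output distinguishes pulses caused by rising edges from pulses caused by falling edges, and AND-ing the pulse train with the divider output (or its complement) yields two separate boost signals. The gating polarity, the initial DFF state, the assumption that the first transition is a rising edge, and the function names are illustrative assumptions rather than details confirmed by the source.

```python
def toggle_divider(pulses, q_init=0):
    """DFF as divide-by-two: Q toggles on each rising edge of the pulse train."""
    q, previous, q_trace = q_init, 0, []
    for p in pulses:
        if p and not previous:
            q ^= 1
        q_trace.append(q)
        previous = p
    return q_trace


def split_boost(pulses):
    """Gate the XOR pulses with Q and not-Q to obtain two boost signals."""
    q_trace = toggle_divider(pulses)
    pmos_boost = [p & q for p, q in zip(pulses, q_trace)]        # assumed: rising edges
    nmos_boost = [p & (1 - q) for p, q in zip(pulses, q_trace)]  # assumed: falling edges
    return pmos_boost, nmos_boost


# XOR output for one rising edge followed by one falling edge.
xor_out = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0]
pmos_boost, nmos_boost = split_boost(xor_out)
print(pmos_boost)
print(nmos_boost)
```

Both boost signals are zero wherever the XOR output is zero, which matches the statement that no additional current is drawn in the static regions of the signal. An ideal zero-delay DFF is assumed here; a real divider has a clock-to-output delay that the gating logic has to tolerate.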
for the typical case, the high constant voltage level of the signal decreases by
200 mV from the target value (Fig. 3.66).
For other PVTs, the signal amplitude is also suppressed, and for the slow case, the
DC signal voltage level is reduced by 260 mV (Fig. 3.67).
Signal jitter and rise/fall times were also measured
with the proposed system turned off (Table 3.10).
Thus, the structure of currently used BBs significantly limits
any further increase in signal transmission speed. Using the proposed
system made it possible to reduce the influence of the transmission line and to
improve the amplitude and the rise/fall times of the transmitted signal (Fig. 3.68).
When the system is used, the signal rise/fall transitions become much faster,
and the constant voltage level of the signal approaches its target
value. The signal also improves under other PVT conditions
(Fig. 3.69).
It should be noted that the system connects additional resistors to the BB only
during switching, and only then is additional current supplied from the BB to the
transmission line. In the static regions of the signal, this current is absent, which
reduces the consumed current. Jitter measurements were also carried out, which
confirmed that the signal jitter is significantly reduced by the proposed system
(Table 3.11).
The use of the system also increases the opening of the eye diagram of the
signal (Figs. 3.70 and 3.71). Its horizontal opening increases by 13% and its vertical
opening by 10%.
Fig. 3.67 The signal transmitted through the BB in slow and fast cases
Fig. 3.69 The transmitted signal using the proposed system in slow and fast cases
Thus, the designed calibration system makes it possible to reduce the effect of
the transmission line. As a result, the speed of the signal rise/fall times
increases by 50%, the horizontal and vertical openings of the signal eye at the end of
the transmission line increase by 13% and 10%, respectively, and signal jitter
is reduced by 20.7%. However, the proposed system increases the area of the BB by
13%.
Fig. 3.70 Eye diagram of the signal at the end of the transmission line using the BB
Fig. 3.71 Eye diagram of the signal at the end of the transmission line using the
calibration system
Conclusion
1. Principles for developing signal transmission calibration means in integrated
circuits have been proposed, which make it possible to significantly improve their
main technical characteristics and parameters: speed, reliability of data
transmission and reading, consumed power, etc.
2. A self-calibration method for detecting deviations in the duty cycle of the
signal has been developed, which, by means of the applied digital nodes,
significantly improves the reliability of data reading and calibrates the
duty cycle with an accuracy of ±0.5% at the expense of increasing the
power consumed by the receiver by only 3.95 mW.
3. A method of increasing the reliability of high-frequency data in the sub-nodes of
the transmitter was proposed, which, by supplying additional current,
calibrates the deviations of the duty cycle in the voltage converter by 39.5% and
also reduces the jitter by about half at the expense of increasing the area of the
high-frequency voltage converter in the transmitter by 12%.
4. A calibration method for the asymmetry of high-frequency signal rise/falls has
been developed, which significantly reduces the effect of process-voltage-temperature
deviations, so that the asymmetry of the rise/falls is
calibrated down to 1.2%; the jitter is also reduced, at the expense of
increasing the current consumed by the pre-buffer by 2.26 mA.
5. A transmission line-induced signal distortion calibration method was proposed,
which reduces the effect of the transmission line, so that the speed of
the transmitted signal rise/fall times increases by 50%, the horizontal and vertical
openings of the signal eye increase by 13% and 10%, respectively, and the
signal jitter is reduced by 20.7%, at the expense of a 13% increase in output buffer area.
References
10. B. Dingle, J. Eubanks, K. Janasak, 3D RAM modeling and simulation in a model based
systems engineering environment. IEEE annual reliability and maintainability symposium
(RAMS-2020) (2020), pp. 1–6
11. S. Eidson, B. Gaines, P. Wolf, 30.2: HDMI: High-definition multimedia Interface, in SID
Symposium Digest of Technical Papers, (Blackwell Publishing Ltd, Oxford, UK, 2003),
pp. 1024–1027
12. A. Sedzin, J. Aguilar, A. Marechal, R. O’Connor, et al., High-speed inter-IC interfacing for
mobile multimedia applications, in Digest of Technical Papers International Conference on
Consumer Electronics, (IEEE, Piscataway, 2007), pp. 1–2
13. A.S. Trdatyan, Development of Self-Configurable Input/Output Units for Integrated Circuits.
PhD dissertation. Yerevan, (2020), 162 pages
14. S. Chun, M. Swaminathan, L. Smith, et al., Modeling of simultaneous switching noise in high
speed systems. IEEE Trans. Adv. Packag. 24, 132–142 (2001)
15. A. Fish, V. Milrud, O. Yadid-Pecht, High-speed and high-precision current winner-take-all
circuit. IEEE Trans. Circuits Syst. II Express Briefs 52, 131–135 (2005)
16. J. Ardenkjaer-Larsen, B. Fridlund, A. Gram, G. Hansson, et al., Increase in signal-to-noise
ratio of > 10,000 times in liquid-state NMR. Proc. Natl Acad. Sci. 100, 10158–10163 (2003)
17. N. Rao, G. Knight, S. Mohan, et al., Studies on failure of transmission line towers in testing.
Eng. Struct. 35, 55–70 (2012)
18. A. Djordjevic, A. Zajic, G. Tosic, et al., A note on the modeling of transmission-line losses.
IEEE Trans. Microwave Theory Tech. 51, 483–486 (2003)
19. https://www.electronicdesign.com/technologies/communications/article/21796367/back-to-basics-impedance-matching-part-1
20. E. Turan, S. Demir, An all 50ohm divider/combiner structure. IEEE MTT-S international
microwave symposium digest (Cat. No. 02CH37278) (2002), pp. 105–108
21. A. Mangan, S. Voinigescu, M. Yang, et al., De-embedding transmission line measurements for
accurate modeling of IC designs. IEEE Trans. Electron. Devices 53, 235–241 (2006)
22. V. Melikyan, A. Balabanyan, A. Hayrapetyan, et al., Receiver/transmitter input/output termi-
nation resistance calibration method. IEEE XXXIII international scientific conference elec-
tronics and nanotechnology (ELNANO-2013) (2013), pp. 126–130
23. H. Johnson, M. Graham, High-Speed Signal Propagation: Advanced Black Magic (Prentice
Hall Professional, Upper Saddle River, 2003), 808p.
24. S. Lehmann, F. Gerfers, Channel analysis for a 6.4 Gb/s DDR5 data buffer receiver front-end.
15th IEEE international new circuits and systems conference (NEWCAS-2017) (2017),
pp. 109–112
25. K. Khachikyan, L. Msryan, A. Balabanyan, Research of PVT variation influence on PLL
system and methodology of control voltage stabilization. IEEE 37th international conference
on electronics and nanotechnology (ELNANO-2017) (2017), pp. 190–193
26. V. Melikyan, K. Khachikyan, H. Gumroyan, et al., Crystal area reduction method for imped-
ance matching systems in high-speed data links. Proc. Univ. Electron. 24(5), 503–510 (2019)
27. V. Melikyan, A. Hayrapetyan, B. Baghramyan, et al., Transmitter output impedance calibra-
tion method. IEEE east-west design & test symposium (EWDTS-2018) (2018), pp. 1–8
28. W. Bae, Supply-scalable high-speed I/O interfaces. Electronics 9, 1315 (2020)
29. L. Fassio, F. Settino, L. Lin, R. De Rose, et al., A robust, high-speed and energy-efficient
ultralow-voltage level shifter. IEEE Trans. Circuits Syst. II Express Briefs 68, 1393–1397
(2020)
30. B. Mahendranath, A. Srinivasulu, Output buffer for +3.3 V applications in a 180 nm +1.8 V
CMOS technology. Radioelectron. Commun. Syst. 60, 512–518 (2017)
31. H. Yu, T. Michalka, M. Larbi, M. Swaminathan, Behavioral modeling of tunable I/O drivers
with preemphasis including power supply noise. IEEE Trans. Very Large-Scale Integr Syst.
28, 233–242 (2019)
32. M. Ker, T. Wang, F. Hu, Design on mixed-voltage I/O buffers with slew-rate control in
low-voltage CMOS process. IEEE 15th international conference on electronics, circuits and
systems (2008), pp. 1047–1050
33. https://www.sciencedirect.com/topics/engineering/characteristic-impedance
34. J.F. Wakerly, Transmission lines, reflections, and termination, in Digital Design Principles
and Practices, 4th edn. (Pearson Education, Inc., Upper Saddle River, 2006), 112 p
35. A.J. Deutsch, W. Res, G.V. Kopcsay, et al., When are transmission-line effects important for
on-chip interconnections. Microwave Theory Tech. IEEE Trans. RFIC Virtual J. 45(10),
1836–1846 (1997)
36. V. Melikyan, A. Balabanyan, A. Hayrapetyan, A. Durgaryan, NMOS/PMOS resistance
calibration method using reference frequency. IEEE ninth international conference on com-
puter science and information technologies revised selected papers (2013), pp. 1–6
37. A. Balabanyan, A. Durgaryan, Fully integrated PVT detection and impedance self-calibration
system design. IEEE XXV international scientific conference electronics (ET-2016) (2016),
pp. 1–4
38. V. Melikyan, A. Sahakyan, A. Hayrapetyan, et al., Serializer/deserializer output data signal
duty cycle correction method. Proceedings of 57th ETRAN conference (2013), pp. 1–4
39. J. Chung, A.A. Iliadis, Design and optimization of a CMOS IC novel RF tracking sensor.
Int. J. Circuit Theory Appl. 49, 801–819 (2021)
40. O.H. Petrosyan, A.A. Martirosyan, A.S. Trdatyan, et al., Equalization method of resistors.
Manual Eng. Acad. Armenia Yerevan 15(3), 475–479 (2018) (in Armenian)
41. B. Sporrer, L. Wu, L. Bettini, et al., A fully integrated dual-channel on-coil CMOS receiver for
array coils in 1.5–10.5 T MRI. IEEE Trans. Biomed. Circuits Syst. 11, 1245–1255 (2017)
42. Y. Lai, Y. Liao, J. Jou, et al., Design of high-speed optical receiver module for 160Gb/s NRZ
and 200Gb/s PAM4 transmissions. IEEE international symposium on circuits and systems
(ISCAS-2019) (2019), pp. 1–4
43. B. Fahs, J. Chowdhury, M. Hella, A 12-m 2.5-Gb/s lighting compatible integrated receiver for
OOK visible light communication links. J. Lightwave Technol. 34, 3768–3775 (2016)
44. G. Li, Y. Yin, Y. Zhang, High-precision mixed modulation DAC for an 8-bit AMOLED driver
IC. J. Disp. Technol. 11, 423–429 (2015)
45. J. Jun, J. Kang, S. Kim, A 16-bit incremental ADC with swapping DAC for low power sensor
applications. IEEE international symposium on circuits and systems (ISCAS-2019) (2019),
pp. 1–4
46. V. Melikyan, K. Khachikyan, A. Matevosyan, A. Petrosyan, et al., High quality factor 5.0
Gbps CTLE circuit for SERDES serial links. IEEE East-west design & test symposium
(EWDTS-2018) (2018), pp. 1–5
47. V. Melikyan, A. Balabanyan, A. Durgaryan, et al., PVT variation detection and compensation
methods for high-speed systems. IFIP/IEEE 21st international conference on very large-scale
integration (VLSI-SoC-2013) (2013), pp. 322–327
48. V. Melikyan, K. Khachikyan, A. Trdatyan, A. Petrosyan, et al., High quality factor 5.0 Gbps
CTLE circuit for SERDES serial links. IEEE east-west design & test symposium (EWDTS),
Kazan, Russia, 14 September 2018 (2018), pp. 641–644
49. V.K. Sharma, S. Deb, Analysis and Estimation of Jitter Sub-components. Dissertation (IIIT,
Delhi, 2014), 96 p
50. N. Kirianaki, Y. Yurish, Data Acquisition and Signal Processing for Smart Sensors (Wiley,
Newark, 2002), 274 p
51. J. Fan, X. Ye, J. Kim, B. Archambeault, et al., Signal integrity design for high-speed digital
circuits: Progress and directions. IEEE Trans. Electromagn. Compat. 52, 392–400 (2010)
52. S. Ooi, L. Kong, H. Goay, et al., Crosstalk modeling in high-speed transmission lines by
multilayer perceptron neural networks. Neural Comput. & Applic. 32, 7311–7320 (2020)
53. K. Khachikyan, A. Balabanyan, A. Petrosyan, PLL control voltage stabilization method for
high-speed systems. IEEE XXV international scientific conference electronics (ET-2016)
(2016), pp. 1–4
74. A. Sidorov, N. Goryunov, S. Golubkov, Improvement of automatic control system for high-
speed current collectors. J. Phys. Conf. Ser. 944, 012108 (2018) IOP Publishing
75. A. Malkov, D. Vasiounin, O. Semenov, A review of PVT compensation circuits for advanced
CMOS technologies. Circuits Syst. 2, 162–169 (2011)
76. L. Tan, K. Chan, A fully integrated point-of-load digital system supply with PVT compensa-
tion. IEEE Trans. Very Large-Scale Integr. Syst. 24, 1421–1429 (2015)
77. M. Marcu, S. Durbha, S. Gupta, Duty-cycle distortion and specifications for jitter test-signal
generation. IEEE international symposium on electromagnetic compatibility (2008), pp. 1–4
78. M. Bushnell, V. Agrawal, Essentials of Electronic Testing for Digital, Memory and Mixed-
Signal VLSI Circuits (Springer, New York, 2004), 574 p
79. R. Mehta, S. Seth, S. Shashidharan, B. Chattopadhyay, et al., A programmable, multi-GHz,
wide-range duty cycle correction circuit in 45nm CMOS process. IEEE proceedings of the
ESSCIRC (2012), pp. 257–260
80. J. Melo, N. Paulino, J. Goes, Continuous-time delta-sigma modulators based on passive RC
integrators. IEEE Trans. Circuits Syst. I Regul. Pap. 65, 3662–3674 (2018)
81. R. Hosseini, M. Saberi, R. Lotfi, A high-speed and power-efficient voltage level shifter for
dual-supply applications. IEEE Trans. Very Large-Scale Integr. Syst. 25, 1154–1158 (2016)
82. B. Razavi, Fundamentals of Microelectronics (Wiley, Hoboken, 2021), 928 p
83. K. Sharma, N. Tripathi, R. Nagpal, et al., A comparative analysis of jitter estimation tech-
niques. IEEE international conference on electronics, communication and computational
engineering (ICECCE-2014) (2014), pp. 125–130
84. N. Tripathi, K. Sharma, H. Advani, et al., An analysis of power supply induced jitter for a
voltage mode driver in high speed serial links. IEEE 20th workshop on signal and power
integrity (SPI-2016) (2016), pp. 1–4
85. Y. Shim, D. Oh, T. Hoang, et al., A jitter equalization technique for minimizing supply noise
induced jitter in high speed serial links. IEEE international symposium on electromagnetic
compatibility (EMC-2014) (2014), pp. 827–832
86. S. Valadimas, Y. Tsiatouhas, A. Arapoyanni, Timing error tolerance in nanometer ICs. IEEE
16th international on-line testing symposium (2010), pp. 283–288
87. S. Lee, A. Saad, L. Lee, et al., On-chip slew-rate control for low-voltage differential signalling
(LVDS) driver. IEEE international symposium on intelligent signal processing and commu-
nication systems (ISPACS-2014) (2014), pp. 99–101
88. K. Szczerba, T. Lengyel, M. Karlsson, et al., 94-Gb/s 4-PAM using an 850-nm VCSEL,
pre-emphasis, and receiver equalization. IEEE Photon. Technol. Lett. 28, 2519–2521 (2016)
89. Z. Zhou, T. Odedeyi, B. Kelly, J. O’Carroll, et al., Impact of analog and digital pre-emphasis
on the signal-to-noise ratio of bandwidth-limited optical transceivers. IEEE Photonics J. 12,
1–2 (2020)
90. Y. Lu, K. Jung, Y. Hidaka, et al., Design and analysis of energy-efficient reconfigurable
pre-emphasis voltage-mode transmitters. IEEE J. Solid State Circuits 48, 1898–1909 (2013)
91. Y. Yuminaka, Y. Takahashi, Time-domain pre-emphasis techniques for equalization of
multiple-valued data. IEEE 38th international symposium on multiple valued logic (ISMVL-
2008) (2008), pp. 20–25
92. K. Khachikyan, A. Balabanyan, H. Gumroyan, Precise duty cycle variation detection and self-
calibration system for high-speed data links. IEEE computer society annual symposium on
VLSI (ISVLSI-2018) (2018), pp. 191–196
93. H. Huang, J. Heilmeyer, M. Grözing, M. Berroth, et al., An 8-bit 100-GS/s distributed DAC in
28-nm CMOS for optical communications. IEEE Trans. Microwave Theory Tech. 63,
1211–1218 (2015)
94. J. Shen, A. Shikata, D. Fernando, et al., A 16-bit 16-MS/s SAR ADC with on-chip calibration
in 55-nm CMOS. IEEE J. Solid State Circuits 53, 1149–1160 (2018)
95. Y. Delican, T. Yildirim, High performance 8-bit mux-based multiplier design using MOS
current mode logic. IEEE 7th international conference on electrical and electronics engineer-
ing (ELECO-2011) (2011), pp. 89–93
96. L. Melo, J. Goes, N. Paulino, A 0.7 V 256 μW ΔΣ modulator with passive RC integrators
achieving 76 dB DR in 2 MHz BW. IEEE symposium on VLSI circuits (VLSI circuits) (2015),
pp. 290–291
97. V.Sh. Melikyan, M. Martirosyan, A. Melikyan, G. Piliposyan, 14nm educational design kit:
Capabilities, deployment and future. Proceedings of the 7th small systems simulation sympo-
sium 2018, Niš, Serbia, 12–14 February 2018 (2018), pp. 37–41
98. I. Raja, G. Banerjee, A. Zeidan, et al., A 0.1–3.5-GHz duty-cycle measurement and correction
technique in 130-nm CMOS. IEEE Trans. Very Large-Scale Integr. Syst. 24, 1975–1983
(2015)
99. P. Chen, W. Chen, J. Lai, A low power wide range duty cycle corrector based on pulse
shrinking/stretching mechanism. Proc. IEEE Asian solid-state circuits conference (2007),
pp. 460–463
100. C. Jang, J. Bae, J. Park, CMOS digital duty cycle correction circuit for multi-phase clock.
Electron. Lett. 39, 1383–1384 (2003)
101. D. Pan, H.W. Li, B.M. Wilamowski, A low voltage to high voltage level shifter circuit for
MEMS application. IEEE proceedings of the 15th biennial university/government/industry
microelectronics symposium (2003), pp. 128–131
102. V. Melikyan, K. Khachikjan, L. Msryan, et al., High speed, low-jitter level shifter for high
speed ICs. IEEE 37th international conference on electronics and nanotechnology (ELNANO-
2017) (2017), pp. 175–177
103. R.J. Baker, C. Boyce, Circuit Design, Layout, and Simulation, IEEE press series on micro-
electronic systems (IEEE-Press, Piscataway, 2005), pp. 350–453
104. Y. Ho, H.K. Chen, C. Su, Energy-effective sub-threshold interconnect design using high-
boosting predrivers. IEEE J. Emerging Sel. Top. Circuits Syst. 2, 307–313 (2012)
105. M. Abbas, Y. Furukawa, S. Komatsu, et al., Clocked comparator for high-speed applications in
65nm technology. IEEE Asian solid-state circuits conference (2010), pp. 1–4
106. JEDEC solid state technology association. Low Power Double Data 4 (LPDDR4).
November, 2015
107. V. Melikyan, K. Khachikyan, A. Trdatyan, A. Durgaryan, Design of edge boosting digital
control circuit for high-speed ICs. IEEE 36th international conference on electronics and
nanotechnology (ELNANO-2016) (2016), pp. 315–318
Chapter 4
Methods to Improve Linearity of Signal’s
Analog-to-Digital Conversion
with Self-Calibration
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
V. Melikyan, Machine Learning-based Design and Optimization of High-Speed
Circuits, https://doi.org/10.1007/978-3-031-50714-4_4
deviate from the nominal, as a result of which the nonlinearity of the circuit increases
[9, 13–16].
The main parameters characterizing the linearity of ADCs and DACs are the differential
nonlinearity (DNL) and the integral nonlinearity (INL) [9]. In the ideal case, when a DAC
has no nonlinearity, each increment of the digital code raises the analog output by exactly
one least significant bit (LSB); for an ADC, a change of 1 LSB in the input voltage
switches the output from one value of the digital code to the next. The DNL error is
defined as the maximum deviation of a step from the ideal 1 LSB size in the transfer
function. The ideal DAC transfer function, without linearity error (Fig. 4.2), is
monotonic, whereas the nonlinear DAC transfer function is non-monotonic (Fig. 4.3) [9].
The digital code is shown in decimal. The transfer function of a linear ADC is an ideal
staircase, and the transfer function with linearity error is non-monotonic (Fig. 4.4).
A DNL error can lead to missing codes and voltages [9]. When the DNL is less than
−1 LSB for at least one transition, the transfer function is non-monotonic and
contains a local maximum or minimum. When the DNL is greater than +1 LSB, the problem
of monotonicity does not arise, but such a transfer characteristic is still
undesirable (Fig. 4.5) [9].
[Figs. 4.2 and 4.3: DAC transfer functions, analog output (mV) versus digital code]
In many DAC applications, a non-monotonic transfer function is unacceptable, as it can
lead to a complete failure of the circuit, especially when the DAC is part of a negative
feedback system, where its non-monotonicity can make the feedback positive. The DNL of
the DAC is defined as the deviation of a step from the ideal 1 LSB size in the transfer
function (4.1) [9, 17, 18].
$$\mathrm{DNL} = \frac{V_{i+1} - V_i}{\mathrm{LSB}} - 1 \tag{4.1}$$
[Figure: ADC transfer function, digital code versus analog input (mV), with a missing code marked]
The INL of a DAC for a given code is the difference between the actual output
voltage and the voltage of the ideal transfer function (4.2) [9, 17, 18].
$$\mathrm{INL} = \frac{V_i - V_0}{\mathrm{LSB}} - i \tag{4.2}$$
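As an illustration of definitions (4.1) and (4.2), the following minimal sketch computes the DNL and INL of a DAC from a list of output levels. The 3-bit level values and the function names are hypothetical and chosen only for the example; they are not measurements from the text. Since (4.1) gives V(i+1) − V(i) = (1 + DNL)·LSB, a step is reversed exactly when its DNL falls below −1, which is what the monotonicity check tests.

```python
def dnl_inl(levels, lsb):
    """Return per-code DNL and INL (both in LSB) for DAC output levels."""
    # DNL_i = (V[i+1] - V[i]) / LSB - 1, as in (4.1)
    dnl = [(levels[i + 1] - levels[i]) / lsb - 1 for i in range(len(levels) - 1)]
    # INL_i = (V[i] - V[0]) / LSB - i, as in (4.2)
    inl = [(v - levels[0]) / lsb - i for i, v in enumerate(levels)]
    return dnl, inl


def is_monotonic(dnl):
    # V[i+1] < V[i] happens exactly when DNL_i < -1
    return all(d >= -1 for d in dnl)


# Hypothetical 3-bit DAC with a 5 mV LSB; the level of code 4 is deliberately too low.
levels_mv = [0, 5, 10, 15, 14, 25, 30, 35]
dnl, inl = dnl_inl(levels_mv, lsb=5)
print("DNL:", dnl)
print("INL:", inl)
print("monotonic:", is_monotonic(dnl))
```

In this example the step from code 3 to code 4 is negative, so the sketch reports a non-monotonic transfer function.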
An INL error can also lead to missing codes. DNL and INL are mainly caused by
technological deviations and inaccuracies in the manufacturing process. For discrete
electrical elements (resistors, capacitors, etc.), the deviation from the nominal value
after manufacturing is usually small and does not exceed 1%, but the situation is
different in ICs, where the parameters of individual cells can deviate from the nominal
by up to ±25% [19, 20]. DACs and ADCs manufactured with such deviations have significant
nonlinearity errors and often missing codes due to a large DNL or INL error.
The effect of manufacturing-induced deviations in the physical parameters of electrical
components on system performance is considered using the example of the r-string DAC.
In the r-string DAC, such deviations occur mainly in the resistors and in the switches,
which have a CMOS structure. Their parasitic parameters, such as parasitic capacitance
and resistance, also deviate from the nominal values during manufacturing, which in turn
affects the switch delay and leads to inaccuracies in the operation of the circuit.
Thus, even precisely designed ICs that fully meet the specification requirements at the
design stage inevitably suffer deterioration of their operating parameters as a result of
manufacturing. In particular, nonlinearity errors in DACs and ADCs increase, which can
lead to system malfunction and, as a result, reduce the yield of working circuits.
However, once manufacturing is complete, the operating parameters of the circuits can no
longer be corrected, so the development of self-calibration systems that correct
deviations inside the IC is an important issue.
The problems discussed in Sect. 4.1.1, as already mentioned, cannot be completely solved
at the design stage. It is therefore necessary to develop and introduce means that
correct manufacturing errors and deviations by self-calibration after manufacturing, so
that the circuit operates precisely. The problem is pressing, because the semiconductor
industry is one of the fastest growing sectors of the economy, and the growth of problems
in the sector creates a need for matching solutions. ICs with embedded self-calibration
systems are in high demand because, in addition to improving IC operating parameters,
they increase the number of working circuits and thereby reduce production costs.
However, developing such systems is not an easy task.
[Figure: flash ADC reference ladder of series resistors R fed from Vref, with each tap compared against Vin by a comparator]
$$V_i = V_{i,\mathrm{ideal}} + \frac{V_{ref}}{2^n}\sum_{k=1}^{i}\frac{\Delta R_k}{R}. \tag{4.3}$$
An ideal comparator changes its output voltage level exactly when the voltages at its
inputs are equal, but a practical comparator has an offset error, so the switching does
not occur at that point. With offset, condition (4.4) must be met for the output of the
comparator to switch to “1” or “0” [21].
From expression (4.4), the voltage of the switching point of the i-th comparator can be
obtained (4.5) [35].
$$V_{sp,i} = V_i + V_{offset} \tag{4.5}$$
The offset causes nonlinearity in the overall flash ADC; as a result, DNL and INL errors
arise and, in some cases, even missing codes (Fig. 4.9) [9].
Comparators are usually designed so that the offset error is minimal and, even if there
is a nonlinearity error, the DNL or INL does not exceed 1 LSB, so that no code is
missing. However, random deviations after manufacturing increase the offset error and
lead to a non-monotonic transfer function and thus to missing codes. To avoid this
problem, it is appropriate to implement a self-calibrating offset correction system,
which corrects the offset error and thereby provides a monotonic transfer function
without missing codes.
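The combined effect of (4.3) and (4.5) can be sketched numerically. The snippet below is a simplified model under stated assumptions: each comparator is taken to define the boundary of its own code, a code is counted as missing when the distance between its two switching points is not positive, and all component values and function names are hypothetical.

```python
def switching_points(v_ref, n_bits, dr_rel, offsets):
    """Switching point of every comparator of an n-bit flash ADC.

    dr_rel[k-1]  : relative deviation dR_k / R of the k-th ladder resistor
    offsets[i-1] : input offset of the i-th comparator, in volts
    """
    lsb = v_ref / 2 ** n_bits
    points, acc = [], 0.0
    for i in range(1, 2 ** n_bits):
        acc += dr_rel[i - 1]
        v_i = i * lsb + lsb * acc            # ladder tap voltage, (4.3)
        points.append(v_i + offsets[i - 1])  # switching point, (4.5)
    return points


def missing_codes(points):
    # Code i lies between switching points i and i+1;
    # a non-positive width means that the code can never appear.
    return [i for i in range(1, len(points)) if points[i] - points[i - 1] <= 0]


# Hypothetical 3-bit example: Vref = 0.8 V (LSB = 100 mV), ideal ladder,
# and a +120 mV offset on the fourth comparator only.
ideal_ladder = [0.0] * 7
offsets = [0.0, 0.0, 0.0, 0.12, 0.0, 0.0, 0.0]
points = switching_points(0.8, 3, ideal_ladder, offsets)
print(missing_codes(points))
```

Because the assumed offset exceeds 1 LSB, the fourth switching point moves above the fifth one, and code 4 is reported as missing; with all offsets set to zero the list is empty.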
Thus, the operating parameters of flash ADCs are inevitably degraded after
manufacturing. The nonlinearity of the transfer function in flash ADCs is caused mainly
by the deviation of the series resistors that set the reference voltages from their
nominal values, as well as by the nonlinearity of the comparators, which results from
their offset and gain errors.
In the current DAC as well, an embedded system can improve the degraded operating
parameters of the circuit by means of self-calibration. The basic circuit of the current
DAC is presented in Fig. 4.10 [41–46]. It consists of current sources of different
weights, which are connected to the output under control of the input code and provide
the corresponding current at the output.
Transistors with a metal-oxide-semiconductor (MOS) structure serve as the current
sources. The thermometric branches are responsible for the linear increase of the
current, and the binary-weighted current sources change the current by the corresponding
powers of two (Fig. 4.11) [41].
[Figure: current DAC schematic with supply Vsupply, output Out, weighted Iref branches, and current-source transistors M0 to Mn]
The branches consist of current mirrors, and the size of the current is determined
by the size of transistors (4.6) [41].
$$\frac{I_{dsn}}{I_{ds1}} = \frac{(V_{gs1} - V_{th})^2 \times (1 + \lambda \times V_{ds1})}{(V_{gsn} - V_{th})^2 \times (1 + \lambda \times V_{dsn})}, \tag{4.6}$$
where Idsn is the source-drain current of the n-th transistor, Vgsn is the gate-source
voltage of the n-th transistor, Vth is the threshold voltage of the transistor, and λ is
the channel length modulation coefficient. The formula holds for the ideal case, in which
the parameters and currents of the transistors have no deviations.
As in other circuits, the physical parameters of transistors deviate during
manufacturing. Changes in physical parameters such as oxide thickness, threshold voltage,
and transistor channel width, as well as changes of the overall IC area, degrade circuit
performance [41]. The effect is especially strong when the transistors are not properly
matched during the physical design of the IC: the transistor sizes then differ from the
values set at the beginning of the design, the currents deviate from their nominal
values, and the output current is incorrect. Basically, there are three types of
deviations: conductance (B), threshold voltage (V), and channel length modulation
coefficient (λ) [41]. The deviations can be written in the following form (4.7):
$$\begin{aligned} B_1 &= B_0 + \Delta B, \\ V_{th1} &= V_{th0} + \Delta V, \\ \lambda_1 &= \lambda_0 + \Delta\lambda, \end{aligned} \tag{4.7}$$
where ΔB, ΔV, and Δλ are the deviations of conductance, threshold voltage, and channel
length modulation coefficient, respectively. In saturation, the output current of
transistor M0 is given by formula (4.8) [41]:
$$I_{ds0} = \frac{B_0}{2} \times (V_{gs} - V_{th0})^2 \times (1 + \lambda_0 \times V_{ds0}). \tag{4.8}$$
Taking all deviations into account, the output current of transistor M1 in saturation
is given by formula (4.9) [41].
$$I_{ds1} = \frac{B_0 + \Delta B}{2} \times \bigl(V_{gs} - (V_{th0} + \Delta V)\bigr)^2 \times \bigl(1 + (\lambda_0 + \Delta\lambda) \times V_{ds0}\bigr). \tag{4.9}$$
Therefore, the current of the n-th transistor will be determined by the formula
(4.10) [41]:
$$\frac{I_{dsn}}{I_{ds1}} = \frac{B_n}{B_1} \times \frac{(V_{gs1} - V_{th})^2 \times (1 + \lambda \times V_{ds1})}{(V_{gsn} - V_{th})^2 \times (1 + \lambda \times V_{dsn})}. \tag{4.10}$$
Thus, the deviation of the current source from the nominal can be represented as a
parallel current source connected to the main current source (Fig. 4.12) [41].
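The deviation current can be illustrated with the saturation-current expression used in (4.8) and (4.9). The sketch below is only a numerical illustration: the transistor parameters, bias voltages, and deviation sizes are hypothetical, and the function name is introduced for the example.

```python
def i_sat(b, v_gs, v_th, lam, v_ds):
    """Saturation current in the form of (4.8): B/2 * (Vgs - Vth)^2 * (1 + lambda*Vds)."""
    return 0.5 * b * (v_gs - v_th) ** 2 * (1 + lam * v_ds)


# Hypothetical nominal parameters and bias point
b0, vth0, lam0 = 2e-4, 0.4, 0.05      # A/V^2, V, 1/V
v_gs, v_ds = 0.9, 0.6                 # V

# Hypothetical deviations as in (4.7): +2 % conductance, +10 mV threshold voltage
d_b, d_v, d_lam = 0.02 * b0, 0.010, 0.0

i_nom = i_sat(b0, v_gs, vth0, lam0, v_ds)                       # nominal current
i_dev = i_sat(b0 + d_b, v_gs, vth0 + d_v, lam0 + d_lam, v_ds)   # deviated current, cf. (4.9)

# The deviation is modelled as a source in parallel with the nominal one
delta_i = i_dev - i_nom
print(f"nominal {i_nom:.4e} A, deviation {delta_i:.4e} A "
      f"({100 * delta_i / i_nom:.2f} %)")
```

With these assumed numbers the threshold-voltage shift outweighs the conductance increase, so the parallel deviation source is negative; other combinations of deviations change its sign and size.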
The output current of a non-ideal DAC will be expressed by the formula (4.11)
[41]:
$$I_{out}(n) = \sum_{k=0}^{n-1} b_k(n)\, I_{out,k}, \tag{4.11}$$
where
Such a deviation current leads to nonlinearity of the current DAC; reducing the deviation
current by self-calibration therefore improves the linearity of the overall DAC.
Thus, the main cause of nonlinearity in current DACs is the deviation of the currents of
the current sources from their nominal values. INL and DNL are caused mainly by the
current sources with large weight, that is, by the thermometric branches of the current.
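Why the large-weight sources dominate can be seen from a small numerical sketch of the sum in (4.11). As a simplifying assumption, the example uses a purely binary-weighted 4-bit array rather than a segmented DAC, interprets b_k as the k-th bit of the input code, and applies the same hypothetical 5% deviation first to the largest and then to the smallest source.

```python
def dac_output(code, sources):
    """Sum of the source currents whose code bit b_k equals 1, in the spirit of (4.11)."""
    return sum(i_k for k, i_k in enumerate(sources) if (code >> k) & 1)


def max_abs_dnl(sources, lsb):
    """Largest |DNL| over all code transitions, using definition (4.1)."""
    n = len(sources)
    outs = [dac_output(c, sources) for c in range(2 ** n)]
    return max(abs((outs[c + 1] - outs[c]) / lsb - 1) for c in range(2 ** n - 1))


# Hypothetical 4-bit array, currents in units of the nominal LSB current
msb_dev = [1.0, 2.0, 4.0, 8.0 * 1.05]   # 5 % deviation on the largest source
lsb_dev = [1.0 * 1.05, 2.0, 4.0, 8.0]   # 5 % deviation on the smallest source

print("max |DNL|, deviation on MSB source:", max_abs_dnl(msb_dev, 1.0))
print("max |DNL|, deviation on LSB source:", max_abs_dnl(lsb_dev, 1.0))
```

The same relative deviation produces a DNL of about 0.4 LSB at the mid-code transition when it sits on the largest source, and only about 0.05 LSB when it sits on the smallest one, which is consistent with calibrating the heavy branches first.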
Another very popular and widely used ADC in modern ICs is the pipeline ADC. It is mainly
used in applications where the operating frequency does not exceed 100 MHz. This ADC is
slower than a flash ADC, but it has a clear advantage in resolution: for the same area,
it provides more bits. The simplest pipeline ADC consists of N cascades, each of which
resolves B bits. To obtain the B bits, the input analog signal of each cascade is first
sampled and stored by a sample-hold (S/H) device and then coarsely quantized (digitized)
by a sub-range ADC (Fig. 4.13) [47]. The analog voltage is then restored by a sub-DAC,
and the restored voltage is subtracted from the input signal to yield the quantization
error. To bring the quantization error to the full-scale voltage range, the error is
multiplied by $2^{B-1}$ by an OpAmp (Fig. 4.14) [47–50]. The resulting voltage is applied
to the input of the next cascade, which has exactly the same structure. The B-bit outputs
of the cascades are time-aligned and digitally corrected by a synchronization and digital
error correction device, after which a Y-bit digital output corresponding to the input
analog signal is obtained.
A few important features can be noted here. The larger B is, the smaller N is; that is,
the conversion can be performed with fewer cascades. A large B reduces the size and power
consumption of the ADC, but the requirements for the inter-cascade devices become
stricter [51–54].
If B is chosen equal to the total number of ADC bits, the result is a flash ADC, whose
resolution is limited. Another feature is that the well-known synchronization and digital
error correction method significantly eases the requirements on the sub-ADCs, allowing
the use of comparators with low accuracy and power. On the other hand, since each cascade
contains an S/H device, an OpAmp with fast output settling is needed, which increases
power and size and largely determines the performance of the ADC. For example, a 12-bit
pipeline ADC looks as shown in Fig. 4.15 [55–59].
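The tolerance to inaccurate sub-ADC comparators can be illustrated with a short behavioural model. The sketch below does not reproduce the B-bit cascade of the text; it substitutes the commonly used 1.5-bit stage (two comparators at ±Vref/4, residue gain of 2), and the function name, offset values, and normalization to Vref = 1 are assumptions made for the example.

```python
def pipeline_convert(x, n_stages, comp_offsets=None):
    """Convert x in [-1, 1] (units of Vref) with 1.5-bit stages.

    comp_offsets[i] shifts both sub-ADC thresholds of stage i and models
    comparator offset. Returns (digits, estimate of x).
    """
    comp_offsets = comp_offsets or [0.0] * n_stages
    digits = []
    for off in comp_offsets[:n_stages]:
        # Coarse sub-ADC decision with thresholds at +/- Vref/4 (plus offset)
        if x > 0.25 + off:
            d = 1
        elif x < -0.25 + off:
            d = -1
        else:
            d = 0
        digits.append(d)
        # Sub-DAC subtraction and inter-stage gain of 2 produce the residue
        x = 2 * x - d
    # Digital combination: the digit of stage i carries the weight 2^-(i+1)
    estimate = sum(d / 2 ** (i + 1) for i, d in enumerate(digits))
    return digits, estimate


# Hypothetical comparator offsets of +/- 0.2 Vref, alternating from stage to stage
offsets = [0.2 if i % 2 == 0 else -0.2 for i in range(10)]
for x in (-0.95, -0.3, 0.0, 0.26, 0.7):
    _, ideal = pipeline_convert(x, 10)
    _, skewed = pipeline_convert(x, 10, offsets)
    print(f"x={x:+.2f}  ideal error={x - ideal:+.5f}  with offsets={x - skewed:+.5f}")
```

As long as the threshold shift stays within ±Vref/4, the residue remains inside the input range of the next stage, so the reconstruction error stays below 2^-N of Vref even with the offsets applied.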
The inaccuracy can be caused by deviations of the S/H device and by the offset error of
the OpAmp.
Low-resolution ADCs and DACs also have some nonlinearity and other deviations. The
inaccuracies and errors are presented in Table 4.1 [60]. Inaccuracies such as the
nonlinearity of the ADCs and the offset of the OpAmps can be corrected by digital error
correction and offset error compensation methods, respectively [61]. Thus, the
above-mentioned inaccuracies are the main causes of the overall ADC linearity error.
To see the effects of nonlinearity, consider an ADC cascade that contains a 2-bit ADC,
and assume that the resolution of all other cascades is infinitely large. Figs. 4.16 and
4.17 show the I/O characteristics of an ideal 2-bit ADC [60, 62–65]. The switching points
are determined by the sub-ADC, and the switching magnitude depends on the DAC and the
OpAmp gain. The switching points of the ADC are 0 and ±1/2 Vref, and the corresponding
DAC output levels are ±1/4 Vref and ±3/4 Vref.
[Figure: I/O characteristic of the cascade over the input range -Vref to +Vref]
Thus, gain error leads to a negative DNL if the gain of the amplifier is less than
the ideal (Figs. 4.20 and 4.21), and a positive DNL if the gain is greater than the gain
of the ideal amplifier (Figs. 4.22 and 4.23) [60].
Thus, OpAmps are the main source of nonlinearity in pipeline ADCs. Offset or gain errors
in the OpAmps increase the DNL and INL of the entire ADC. All the ADCs and DACs discussed
inevitably suffer from performance degradation, and solutions are clearly needed that
significantly improve the performance of these converters and ensure uninterrupted
operation of the IC.
[Figures: I/O characteristics of the cascade over the input range -Vref to +Vref, with the regions of negative and positive DNL marked]
Many methods currently exist to reduce the nonlinearity that occurs in all the ADCs and
DACs presented. The choice of method depends on the specific type of ADC or DAC and, in
particular, on the function it performs in the IC. The efficiency of the means and the
accuracy of the nonlinearity reduction are also chosen based on the specific function.
The main constraints are the area occupied on the IC, the power consumption, and the
financial limitations.
Such means in turn have inaccuracies and may cause additional errors. Therefore, the
proposed solutions should be as accurate and as linear as possible. It is also very
important that the self-calibration is performed in as few steps as possible, because
the accuracy of the system also depends on the number of steps: the shorter the sequence
of steps, the smaller the total effect of the errors made in each step.
The nonlinearity is reduced through feedback by embedded systems, whose purpose is to
measure the nonlinearity error and reduce it to the intended accuracy by means of a
specific algorithm. Deviated parameters (current, offset, parameters of electrical
elements) are corrected by auxiliary devices: calibration DACs, additional current
sources, and preamplifiers. Typically, the testing and measurement algorithm is
implemented using a finite state machine (FSM). In many cases, the FSM also performs the
calibration function.
As mentioned, the flash ADC is the fastest ADC and contains a large number of
comparators, which introduce offset errors and thereby degrade linearity. One of the
solutions proposed to increase the linearity of this ADC is considered below. The
presented flash ADC (Fig. 4.24) consists of a track and hold amplifier (THA), a reference
voltage generator matrix, four cascades of preamplifiers, comparators, an encoder, a
calibration feedback circuit, and an output cascade. The cascaded preamplifiers increase
the difference between the reference voltage and the input voltage in order to overcome
the offset error of the comparators. Interpolation generates intermediate zero-crossing
points. It is therefore not necessary to compare the input with all 255 reference
voltages; at the same time, the number of preamplifiers decreases, saving a large area.
The first two cascades generate 33 zero-crossing points, which are corrected by the
calibration circuit. Resistor interpolation is performed in the second and third
cascades, which yields the remaining zero-crossing points [66].
A pseudo-differential open-loop THA (Fig. 4.25) is introduced to prevent the
signal-to-noise ratio from degrading [66]. A pseudo-differential p-type MOS (p-MOS)
source follower is usually used as the buffer for the succeeding preamplifiers because it
has high linearity and good offset characteristics [66–70].
However, because of the short channel, problems arise such as a reduced output voltage
range of the ADC, and power consumption must be increased to maintain the same accuracy.
Cascading current sources and the use of long-channel transistors can improve the output
voltage range and thus the linearity, but they may require larger voltage drops and also
affect speed [70–74]. To mitigate this drawback, the gates of M2 and M4 transistors are
connected together instead of connecting the input signal to the reference voltage
(Fig. 4.25). Here, M2 and M4 operate as dynamic current sources for the M1 and M3 source
followers [66]. As the input Vip increases, ID1 decreases because the source-drain
voltage of transistor M2 decreases. At the same time, the complementary input Vin
decreases due to the ID1 current. This approach increases the gain of the amplifier. The
structure also makes it possible to improve signal rise and fall times without consuming
static power. Thus, the gain, the gain-bandwidth product, and the slew rate are improved
by 20%, 40%, and 30%, respectively [66].
In CMOS flash ADCs, circuit element mismatches and manufacturing inaccuracies cause
offset errors in comparators and preamplifiers. Averaging methods can partially solve the
problem and increase the linearity of the ADC; however, long-channel transistors are
still required in the preamplifiers, and degradation of linearity is inevitable. To solve
this problem, a calibration circuit is proposed that adjusts the reference voltages by
means of correction currents in such a way that the offset errors caused by the
preamplifiers and comparators are neutralized. The part marked with dotted lines in
Fig. 4.24 is responsible for the self-calibration process. The digital code generator
generates codes corresponding to the ideal ADC output, which are passed to the
calibration input generator. The calibration input generator contains precision resistors
and can provide 10-bit linearity. Φ1c and Φ2c hold cal.ip and cal.in while Φ1 and Φ2 turn
off Vip and Vin (Fig. 4.25). The ADC quantizes the calibration input instead of the
actual input. The correction currents are then determined from the error obtained by
comparing the output of the ADC with the ideal code. The range of current correction is
determined by the number of bits of the control counter and by the discrete current
source outside the IC. Such a circuit allows nonlinearity correction of up to ±10 LSB
with a correction step of 0.2 LSB (Fig. 4.26) [66]. After calibration, the calibration
inputs are switched back to the ADC inputs, and the circuit returns from calibration mode
to operating mode. The maximum calibration time is 0.12 ms [66].
The offsets are also calibrated as a result of the comparison. The offset errors of the
third and fourth cascade preamplifiers and of the comparators are not corrected. The
circuit was run for 24 h, after which the parameter deviations were checked; they had not
changed significantly. To test the proposed circuit, a 2.3 MHz sinusoid sampled at
1.25 GS/s was used as the input analog signal. Without self-calibration, the measured
worst-case INL of the ADC is 3.3 LSB. After self-calibration, the INL of the circuit is
reduced to 1.1 LSB, and the DNL is reduced to 1.3 LSB [66].
Thus, the self-calibrating linearity improvement system for the flash ADC only partially
solves the problem of nonlinearity reduction: the INL and DNL are still greater than
1 LSB, so missing codes will inevitably occur, which is unacceptable. In addition, the
proposed solution occupies a large area, and the self-calibration algorithm consists of a
large number of successive steps, so the additional errors of the individual steps add up
to a large total error.
[Figure: DNL and INL (LSB) versus output code (0 to 250). Calibrated DNL < 1.3 LSB; uncalibrated INL < 3.3 LSB; calibrated INL < 1.1 LSB]
The main cause of nonlinearity in current DACs is the mismatch between the current source
transistors, which makes the currents deviate from their nominal values. Current
deviation is a static error, so it is corrected once, when the circuit is switched on.
There are many approaches to solving the problem; the most general form of the solution
is presented in Fig. 4.27 [5, 75–80]. Current deviation correction methods are based on
three main operations: self-checking (measurement of the circuit's own error), a
correction process driven by a feedback circuit or algorithm, and the self-calibration
operation that this process implements. The correction range of each tuning DAC
(reg DAC) is determined by the maximum possible deviation of the thermometric current,
and the LSB size of the tuning DAC determines the tuning accuracy. The LSB step should be
small enough to ensure the necessary linearity. With such DACs, both negative and
positive current deviations can be corrected. A 1-bit ADC, which is inherently linear, is
used to detect the current error. Only its input offset current, Ioffset, is unavoidable
[5, 76].
[Figure: calibration circuit with the calibration DACs CALDAC(i), the thermometric current Itherm(i), the reference current Ibinref, and the temporary current Itemp]
The input offset error is corrected in two phases (Fig. 4.28) [5, 76]. During
phase A (φA), the temporary current source, Itemp, is calibrated against the
reference current, Ibinref, and Ioffset is also held. During phase B, the reference
current is applied to the input of the comparator to calibrate Itherm: Itherm is
calibrated against the temporary current Itemp, which corrects the input offset current
of the comparator, Ioffset (4.13).
During the two calibration phases, Ioffset is eliminated. The quantization error that
occurs in each cycle is controlled so that it is reduced. To ensure that Itemp is always
set with a positive quantization error relative to Ibinref, a polarity check is performed
after state V3, and a check after state V6 ensures that Itherm(i) is calibrated against
Itemp with a negative quantization error.
This sequence of operations makes it possible to control the distribution of the
quantization error and to reduce the post-calibration error [5]. In the calibration
process, the total current of the binary-weighted current sources, plus a dummy current
source with a weight of one LSB, is selected as the reference current for comparison.
This approach practically eliminates the DNL error at the transition from binary to
thermometric codes. Calibration is then performed only for the thermometric branches,
since their effect on the linearity of the DAC is much more significant than that of the
binary branch currents. INL was measured before and after applying the method: before
self-calibration the INL of the circuit exceeds 1.3 LSB, and after calibration it is
below 0.4 LSB (Fig. 4.29) [5]. The remaining nonlinearity is due to the uncalibrated
binary part of the circuit.
[Figure: flowchart of the two-phase (φA, φB) calibration sequence, in which counters X1 and X2 step Itemp and Itherm(i) until the comparator output changes, repeated for each thermometric source i]
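How the two phases cancel the comparator offset and control the sign of the quantization error can be sketched with a simple behavioural model. The model below is an assumption-laden illustration, not the circuit of [5]: the comparator is reduced to a sign decision with an additive input offset, the currents are stepped directly instead of through a calibration DAC, and all values are hypothetical integers in arbitrary units.

```python
def comparator(i_plus, i_minus, i_offset):
    """1-bit current comparison with an input-referred offset (hypothetical model)."""
    return i_plus - i_minus + i_offset > 0


def calibrate_itherm(i_binref, i_offset, step, start=0):
    # Phase A: raise Itemp until the comparator flips. Itemp then exceeds the
    # offset-shifted reference by a positive quantization error (0 < q_a <= step).
    i_temp = start
    while not comparator(i_temp, i_binref, i_offset):
        i_temp += step

    # Phase B: Itemp stays on the same comparator input and Itherm replaces the
    # reference. Raise Itherm until the comparator flips, then step back once,
    # which leaves a negative quantization error (0 < q_b <= step).
    i_therm = start
    while comparator(i_temp, i_therm, i_offset):
        i_therm += step
    i_therm -= step
    return i_temp, i_therm


# Hypothetical reference of 1001 units, calibration step of 4 units
for i_off in (-50, 0, 37, 200):
    i_temp, i_therm = calibrate_itherm(1001, i_off, step=4)
    print(f"Ioffset={i_off:+4d}  Itemp={i_temp}  Itherm={i_therm}  "
          f"error={i_therm - 1001:+d}")
```

Itemp absorbs the offset in phase A and returns it in phase B, so the final error of Itherm relative to Ibinref is the difference of the two quantization errors and stays within one calibration step regardless of the assumed offset.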
Thus, the presented self-calibration system for the current DAC shows significant
results; in particular, it reduces the DNL and INL of the system to the extent that no
codes are missing. However, the range of nonlinearity errors that the proposed method can
suppress is rather small: the system is effective only for small linearity errors and
therefore does not completely solve the problems presented.
Fig. 4.29 Nonlinearity measurements before (a, c) and after (b, d) calibration
A high gain OpAmp with negative feedback allows for precise operation by reducing
the nonlinearity present in non-feedback circuits. The gain of a simple feedback
circuit can be calculated using the following formula:
$$A_{cl} = \frac{A}{1 + A\beta} = \frac{1}{\beta} \cdot \frac{1}{1 + 1/(A\beta)}, \tag{4.14}$$
where β is the feedback coefficient and A is the gain of the OpAmp. For an ideal
OpAmp, whose gain is infinitely large, Acl is equal to 1/β [81]. For a finite gain,
however, Acl is smaller than 1/β. If β is reduced, Acl can be brought to the desired
value. In pipeline ADCs, the residual voltage is amplified by feedback multiplying DACs
(MDACs). The gain error caused by the finite gain of the OpAmp can therefore be corrected
by adjusting the feedback coefficient [81–90].
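A short numerical example makes the correction through β concrete. Solving (4.14) for the feedback coefficient that yields a target gain G gives β = 1/G − 1/A. The open-loop gain and the target gain in the sketch below are hypothetical values chosen for illustration.

```python
def closed_loop_gain(a, beta):
    """Closed-loop gain of (4.14): Acl = A / (1 + A*beta)."""
    return a / (1 + a * beta)


def corrected_beta(a, target_gain):
    """Feedback coefficient that gives Acl = target_gain for a finite gain A.

    From A / (1 + A*beta) = G it follows that beta = 1/G - 1/A.
    """
    return 1 / target_gain - 1 / a


a = 1000.0   # hypothetical open-loop gain (60 dB)
g = 4.0      # hypothetical desired inter-cascade gain

uncorrected = closed_loop_gain(a, 1 / g)               # beta = 1/G, about 0.4 % low
corrected = closed_loop_gain(a, corrected_beta(a, g))  # reduced beta restores G

print(f"beta = 1/G      -> Acl = {uncorrected:.5f}")
print(f"beta = 1/G - 1/A -> Acl = {corrected:.5f}")
```

The required reduction of β equals 1/A, so the lower the OpAmp gain, the larger the adjustment of the feedback coefficient must be.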
Figure 4.30 shows an MDAC with a calibration capacitor (Ccal), the size of the
capacitance of which can be controlled by digital means. During the holding phase,
eight capacitors (Ch) control the input. Both Ccal and feedback capacitance (Cf) are
discharged, connecting to the DC voltage. During the amplification stage, the
sub-DAC’s digital output Cf is connected to either +Vref or -Vref. Through the
error overlap method, Cf gets equal to 2Cs; therefore, it has an inter-cascade gain
$$V_{out} = A_v\left[V_{in} - \frac{V_{ref}}{8}\sum_{i=1}^{8} D_i + \frac{8C_s + C_f + C_{cal} + C_p}{8C_s}\,V_{offset}\right], \quad D_i = \pm 1, \tag{4.15}$$
$$A_v = \frac{8C_s}{C_f + \dfrac{8C_s + C_f + C_{cal} + C_p}{A} - C_{cal}} \tag{4.16}$$
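[Figure: MDAC schematic with calibration capacitor Ccal, feedback capacitor Cf, sampling capacitors 5Cs, 2Cs, and Cs, references +Vref, -Vref, and +Vagnd, and the offset Voffset]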
[Figure panels: (a) DNL [LSB] and INL [LSB] versus CODE, 0 to 1000; (b) DNL [LSB] and INL [LSB] versus CODE, 0 to 1000]
Fig. 4.36 Nonlinearity measurements before (a) and after (b) calibration
The existing means and approaches for improving the linearity of ADCs and DACs with
self-calibration do not fully meet current requirements. Therefore, the development of
new means and solutions is relevant. The following principles are proposed:
1. Introduce feedback in the flash ADC that reduces the offset error of the comparators
by means of auxiliary DACs, an FSM, and a preamplifier. This reduces the nonlinearity of
the whole system.
2. Add a feedback system to the current DAC that calibrates the currents of the current
sources by comparing them with a larger current source that deviates less from its
nominal value, thus reducing the nonlinearity of the DAC.
3. Introduce an amplifier offset error correction system in the pipeline ADC, which
reduces its nonlinearity and thus increases the linearity of the ADC.
These principles make it possible to reduce the errors introduced during manufacturing
and, therefore, to increase the yield of working circuits and reduce manufacturing costs.
4.2 Conclusions
From the discussion in Sect. 4.1, it follows that the existing measures do not completely
solve the existing problems, and even where they solve them partially, they require large
resources in terms of area and power.
The self-calibrating flash ADC linearity improvement method discussed in Sect. 4.1
requires a large area and, despite that, leaves certain problems unsolved, such as the
preamplifier offset, which also degrades linearity. The discussed flash ADC circuit
largely solves the accuracy issue of the reference voltages by matching them to the
expected voltage values, but it uses many components in the feedback circuit, such as
registers, ideal code generators, and error detectors, as well as discrete linear
elements for matching, such as the calibration input generator. To obtain the reference
voltages and reduce the effect of offset error, a large number of preamplifiers are used,
which in turn have offset errors and therefore increase the nonlinearity of the overall
system.
To reduce the drawbacks listed above, a method is proposed that is implemented on a
traditional 8-bit flash ADC. A flash ADC uses a voltage divider with linear resistors to
obtain the reference voltages [110–118]. The reference voltages are applied to the inputs
of the comparators, where they are compared with the input signal; when the input signal
crosses a reference voltage, the corresponding digital signal appears at the output of
that comparator (Fig. 4.37). The code at the output of the comparators is thermometric;
therefore, a thermometric-to-binary code converter is connected at the output. An n-bit
flash ADC contains 2^n - 1 comparators (Fig. 4.38). The comparator is a simple high-gain
telescopic amplifier whose output cascade is a common-source single-cascade amplifier
[119]. As mentioned in Sect. 4.1, the offset error of the amplifiers increases the
nonlinearity of the circuit and is its main cause. In the past, this problem was not
completely solved; only the consequences of the error were corrected, which did not give
the desired result.
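The basic conversion path of a flash ADC can be sketched as follows. This is a minimal behavioural model of an ideal converter, assuming ideal reference taps and offset-free comparators; the function name and the input values are chosen only for the example, and the thermometric code is converted to a binary number by counting the comparators that have switched.

```python
def flash_adc(v_in, v_ref, n_bits):
    """Ideal n-bit flash conversion: returns (thermometric code, output code)."""
    lsb = v_ref / 2 ** n_bits
    # 2^n - 1 reference voltages from an ideal resistor divider
    refs = [i * lsb for i in range(1, 2 ** n_bits)]
    # One comparator per reference voltage
    thermometer = [1 if v_in >= r else 0 for r in refs]
    # Thermometric-to-binary conversion: count the comparators that switched
    code = sum(thermometer)
    return thermometer, code


thermo, code = flash_adc(0.4, 1.0, 3)
print(thermo, code)

thermo8, code8 = flash_adc(0.5, 1.0, 8)
print(len(thermo8), code8)
```

For the 8-bit case the model instantiates 255 comparators, which illustrates why the comparator offset and the area of any per-comparator correction circuitry matter so much in this architecture.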
A circuit (Fig. 4.39) was proposed to reduce the offset error of the comparator, the
main idea of which is to use one preamplifier at the input of the comparator instead of
multiple preamplifiers, which will reduce the input offset error of the comparator
through feedback [119].
The offset error correction circuit of the comparator has two operating modes:
calibration and normal operation. In the calibration mode, the inputs of the circuit
are connected together by means of a switch. Calibration is performed using
calibration DACs and a successive approximation algorithm, which is implemented as a
finite-state machine (FSM) [119].
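A minimal sketch of such a successive approximation search is given below. It is a voltage-domain behavioral model under stated assumptions, not the FSM of [119]: the function `calibrate_offset`, the 6-bit calibration DAC, and the symmetric DAC range are hypothetical. The DAC range is sized to cover the maximum expected offset with a 10% margin.

```python
def calibrate_offset(v_offset, v_max_offset, n_bits=6, margin=0.10):
    """Successive-approximation offset calibration (behavioral sketch).

    v_offset     -- actual comparator input offset, unknown to the algorithm
    v_max_offset -- largest offset magnitude the calibration DAC must cover
    Returns (dac_code, residual_offset, dac_step).
    """
    # Assumed DAC range: +/- max offset plus the margin.
    full_scale = 2.0 * v_max_offset * (1.0 + margin)
    step = full_scale / 2 ** n_bits

    def comparator(code):
        # Inputs are shorted, so the comparator sees only its own offset
        # plus the correction injected by the calibration DAC.
        correction = code * step - full_scale / 2.0
        return (v_offset + correction) > 0.0

    code = 0
    for bit in reversed(range(n_bits)):      # MSB first
        trial = code | (1 << bit)
        if not comparator(trial):            # not yet over-corrected: keep the bit
            code = trial
    residual = v_offset + code * step - full_scale / 2.0
    return code, residual, step


if __name__ == "__main__":
    print(calibrate_offset(0.012, 0.02))
```

In this model the residual offset after calibration is bounded by one step of the calibration DAC, so the DAC resolution sets the achievable accuracy.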
196 4 Methods to Improve Linearity of Signal’s Analog-to-Digital Conversion. . .
Fig. 4.37 Block diagram of a flash ADC and general view of the method
brought to the actual operating mode, where the input offset is reduced. The FSM is
then connected to the next comparator, and the same process is repeated until the
offset error of every comparator is reduced to the minimum possible. Thus, the
linearity of the overall system increases. Two types of preamplifiers are used
because the system covers a wide voltage range. The presented circuit is used for
voltages from Vref to Vref/2, and for the remaining range the circuit with p-MOS
input transistors is used (Fig. 4.41) [119].
In both circuits, the calibration DACs have the same structure. The only difference
is the type of transistors in the input pair.
The complete view of the comparator offset correction system and feedback is
presented in Fig. 4.42 [119].
4.3 Methods of Improving the Linearity of Signal’s Analog-to-Digital. . . 199
In the calibration DAC, the currents of the current sources are set to neutralize
the maximum possible offset value, with an added 10% margin.
INL and DNL measurements were taken before (Figs. 4.43 and 4.44) and after
(Figs. 4.45 and 4.46) calibration to evaluate the effectiveness of calibration. INL
decreased by about 70% and DNL by 73%. The nonlinearity error is less than 1 LSB;
therefore, no missing code was observed [119].
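For reference, the two nonlinearity metrics can be computed from the code transition levels as sketched below. This is a generic end-point calculation, not the measurement setup of [119]; the function names and the input format (a list of transition voltages) are assumptions.

```python
def dnl_inl(transitions):
    """DNL and INL in LSB from code transition levels (end-point fit).

    transitions -- input voltages at which the output code changes,
                   in ascending code order (hypothetical input format).
    """
    n = len(transitions)
    # Average code width between the first and the last transition.
    lsb = (transitions[-1] - transitions[0]) / (n - 1)
    # DNL: deviation of each code width from one LSB.
    dnl = [(transitions[k + 1] - transitions[k]) / lsb - 1.0 for k in range(n - 1)]
    # INL: running sum of DNL, zero at both end points.
    inl = [0.0]
    for d in dnl:
        inl.append(inl[-1] + d)
    return dnl, inl


def has_missing_code(dnl):
    # A code of zero width (DNL = -1 LSB) never appears at the output.
    return any(d <= -1.0 for d in dnl)


if __name__ == "__main__":
    dnl, inl = dnl_inl([1.0, 2.0, 3.5, 4.0, 5.0])
    print(dnl, inl, has_missing_code(dnl))
```

A code is missing only when its width shrinks to zero, that is, when DNL reaches -1 LSB, which is why a nonlinearity error below 1 LSB rules out missing codes.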
The presented simulation results correspond to the worst case, with the largest
deviation in the manufacturing process; the operating parameters were corrected for
this case. The complete results of DNL and INL are given in Tables 4.2 and 4.3.
The layout of the designed flash ADC is presented in Fig. 4.47.
An external clock signal was used for FSM switching. The duty cycle of the clock
signal was corrected to the nominal value of 50% so that it does not cause
additional inaccuracies. The correction was made using a duty cycle corrector (DCC)
(Fig. 4.48) [120].
The DCC consists of a duty cycle detector (DCD), a duty cycle regulator (DCR),
and a CML-CMOS buffer. On receiving the control signal Vc, the DCR equalizes the DC
components of the input signals through negative feedback. The DCD detects the
deviation of the CMOS signals from the average value. The DCR then calibrates the
duty cycle of the signal, producing at its output a clock signal whose duty cycle is
close to the nominal but which is not fully differential (Fig. 4.49). The DCR
consists of a differential amplifier, whose inputs are connected to capacitors that
filter out the DC component of the signal, and two n-MOS transistors, which equalize
the DC components of the voltage at the input of the CML-CMOS buffer. The CML-CMOS
buffer (Fig. 4.50) then converts the signal into fully differential form [120].
One of the constituent parts of the DCD is the differential amplifier, which
ensures the stability of the feedback circuit, a high operating frequency, and a
wide duty cycle correction range. A schematic view of the DCD is shown in
Fig. 4.51 [120].
As shown in Fig. 4.51, the DCD contains a coupled circuit that improves the
accuracy and performance of the circuit. The differential amplifier and filters then
generate the control voltages, which are fed back to the DCR and adjust the duty
cycle until it is as close to the nominal value as possible [120].
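The feedback principle can be illustrated with a simple behavioral loop. It is a sketch under strong assumptions rather than a model of the circuit in [120]: the clock is taken as a unit-amplitude sine wave with a DC offset, the regulator is modeled as a slicing threshold `vc`, and the detector is modeled as an integrator of the duty cycle error. All names are hypothetical.

```python
import math


def duty_cycle(dc_offset, vc):
    """Duty cycle of a unit-amplitude sine clock sliced at threshold vc."""
    th = max(-1.0, min(1.0, vc - dc_offset))
    # Fraction of the period during which sin(x) exceeds th.
    return 0.5 - math.asin(th) / math.pi


def run_dcc(dc_offset, gain=1.0, steps=50):
    """Integrating negative-feedback loop that drives the duty cycle to 50%."""
    vc = 0.0
    history = []
    for _ in range(steps):
        d = duty_cycle(dc_offset, vc)
        history.append(d)
        # Detector output (duty cycle error) is integrated into the control voltage.
        vc += gain * (d - 0.5)
    return vc, history


if __name__ == "__main__":
    vc, history = run_dcc(0.3)
    print(round(history[0], 3), round(history[-1], 3), round(vc, 3))
```

In this model the loop settles when the control voltage equals the DC offset of the clock, which corresponds to equalizing the DC components described above.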
Thus, a duty cycle close to the nominal is obtained, and the clock signal therefore
does not cause errors in the operation of the circuit. A neg-C circuit was used to
reduce parasitic capacitances and their effects [121].
Figures 4.52 and 4.53 show the power supply rejection ratio (PSRR) and the
common-mode rejection ratio (CMRR), respectively [120].
Thus, a built-in self-calibrating system for increasing the linearity of the flash
ADC was developed. Its implementation is based on a feedback system, which reduces the
As discussed in Sect. 4.1, current source mismatches in current DACs make the
DAC transfer function nonlinear. To solve this problem, most existing circuits
compare the current being calibrated with the total current of all the
binary-weighted current sources. As a result of the comparison, the value of the
calibrated thermometric current source is equalized to the total current of the
binary-weighted current sources. The idea of this method is that the currents of all
thermometric branches become equal to the nominal value, and the transfer function is
more monotonic in the thermometric part. The method relies on the assumption that
the deviations of the individual binary-weighted current sources have different
directions and partly cancel, so that the total current is close to the nominal,
ideal value of the thermometric current. But the circuit has a drawback [17]. If the
deviations of the binary-weighted transistors are in one direction, the deviation of
the total current is also in that direction, and therefore a deviation from the
nominal occurs. A method was proposed to avoid this problem (Fig. 4.54).
It is known that the larger the physical dimensions of an electrical component, the
smaller the deviation of its electrical parameters from the nominal. Taking this
into account, a large transistor can be used as a reference current source whose
current is equal to the nominal value. The currents of the thermometric branches are
compared with this reference and corrected according to it. Monte Carlo simulations
were performed to evaluate the current deviations of the binary-weighted transistors
and of a single large current source (Figs. 4.55 and 4.56) [17].
The current of the large current source deviates from the nominal by ±10%, whereas
the total current of the binary-weighted transistors deviates by ±25%; therefore,
during calibration the FSM equalizes the currents of the thermometric branches to the
current of the nominal current source (Fig. 4.57).
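The qualitative trend can be reproduced with a small Monte Carlo sketch. It does not reproduce the simulations of [17]: it assumes a Pelgrom-style mismatch model in which the relative standard deviation of a current source falls as the inverse square root of its area, and the mismatch coefficient, the 4-bit binary section, and the area factor of the reference source are arbitrary illustrative values.

```python
import random
import statistics


def relative_sigma(area, a_mismatch=0.05):
    # Assumed Pelgrom-style model: relative sigma is proportional to 1/sqrt(area).
    return a_mismatch / area ** 0.5


def simulate(n_runs=5000, n_binary_bits=4, ref_area_factor=6.0, seed=1):
    """Return (sigma of summed binary currents, sigma of large reference current)."""
    rng = random.Random(seed)
    weights = [2 ** b for b in range(n_binary_bits)]    # 1, 2, 4, 8 unit currents
    total_nominal = sum(weights)
    bin_err, ref_err = [], []
    for _ in range(n_runs):
        # Each binary-weighted source has an area proportional to its weight.
        total = sum(w * (1.0 + rng.gauss(0.0, relative_sigma(w))) for w in weights)
        bin_err.append(total / total_nominal - 1.0)
        # The reference source is assumed ref_area_factor times larger in total area.
        ref_area = ref_area_factor * total_nominal
        ref_err.append(rng.gauss(0.0, relative_sigma(ref_area)))
    return statistics.pstdev(bin_err), statistics.pstdev(ref_err)


if __name__ == "__main__":
    s_bin, s_ref = simulate()
    print(f"binary sum: {s_bin:.4f}, large reference: {s_ref:.4f}")
```

Under these assumptions the spread of the large reference source is smaller than that of the summed binary-weighted sources by roughly the square root of the area factor, which is the effect the method exploits.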
To evaluate the effectiveness of the method, a comparison was made with the
previous method. Nonlinearity parameters INL and DNL were measured (Tables 4.4
and 4.5). The results and the layout of DAC are shown in Figs. 4.58, 4.59, 4.60, 4.61,
and 4.62 [17].
Thus, an embedded means of increasing the linearity of the current DAC was
proposed. The embedded system compares the values of the current sources of the
current DAC with the nominal current and performs current matching through
feedback. As a result of current matching, the linearity of the current DAC increases.
The comparison was made with a large current source, whose current deviates little
during the manufacturing process. The used area increased by approximately 5%
compared to existing solutions, resulting in a 20–25% more efficient system.
[Plots: INL (LSB) versus code and DNL (LSB) versus code, two panels each, codes 0–260]
As discussed in Sect. 4.1, the three main causes of nonlinearity in pipeline ADCs are
amplifier gain error, amplifier offset error, and nonlinearity in sub-DACs or ADCs.
In previous works, mainly the gain error of the amplifiers was corrected, while the
other two were neglected; nevertheless, quite significant results were obtained. The
amplifiers are designed with high precision, and their frequency parameters deviate
mainly because of parasitic capacitances, whose effect is reduced by the negative
capacitance circuit [61, 121].
In the proposed method, the offset error of the amplifiers undergoing calibration
is corrected by feedback (Fig. 4.63).
Correction of offset error is performed in two stages by means of switches.
During the first stage, g1 and g2 are connected to the capacitor, and g3 disconnects
the input from the output. The input offset (4.17) is established on the capacitor.
$$V_c = \frac{A}{A + 1}\, V_{\mathrm{offset}} \qquad (4.17)$$
During the second stage, g3 is connected to the capacitor, and g1 and g2 are
disconnected from the capacitor. Thus, when a signal is applied to the input of the
amplifier, the voltage Vc on the capacitor, which for a large gain A is almost equal
to the input offset, is added to it. Therefore, the offset error is largely cancelled
and the linearity of the amplifier is increased. The method is applicable to all
cascade amplifiers (Fig. 4.64).
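A short numerical illustration of Eq. (4.17) follows. It assumes that the stored voltage is applied with the polarity that opposes the offset, so the uncancelled remainder is the difference between the offset and Vc; the function names and the example values are hypothetical.

```python
def stored_voltage(v_offset, gain):
    """Voltage established on the capacitor in the first stage, Eq. (4.17)."""
    return gain / (gain + 1.0) * v_offset


def residual_offset(v_offset, gain):
    """Offset left after cancellation: v_offset - Vc = v_offset / (gain + 1)."""
    return v_offset - stored_voltage(v_offset, gain)


if __name__ == "__main__":
    v_offset = 10e-3                      # assumed 10 mV input offset
    for gain in (9.0, 99.0, 999.0):
        print(gain, residual_offset(v_offset, gain))
```

The residual scales as 1/(A + 1), so the higher the amplifier gain, the more completely the offset is removed.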
Applying the method increases the linearity of the pipeline ADC. Simulations were
performed, and INL and DNL were measured before (Figs. 4.65 and 4.66) and after
(Figs. 4.67 and 4.68) application of the method. The system reduces the nonlinearity
of the ADC by approximately 2.5 times (Table 4.6).
Thus, a self-calibrating means of improving the linearity of the pipeline ADC has
been developed. By reducing the offset error of the amplifiers, the system increases
the linearity of the overall system. It was implemented with as few auxiliary
elements as possible; compared to existing solutions, it is more efficient and
requires less area. The method is also universal and can be used in pipeline ADCs
with different structures with minimal modifications. The system reduces the
nonlinearity of the ADC by about 2.5 times. Unlike the previous method, it does not
reduce the gain error.
[Plots: DNL (LSB) versus code and INL (LSB) versus code, two panels each, codes 0–260]
4.4 Conclusion
References
1. H. Zhou, X. Gui, P. Gao, Design of a 12-bit 0.83 MS/s SAR ADC for an IPMI SoC. 28th IEEE
international system-on-chip conference (SOCC) (2015), pp. 175–179
2. S. Saisundar, J.H. Cheong, M. Je, A 1.8 V 1MS/s rail-to-rail 10-bit SAR ADC in 0.18 μm
CMOS. 2012 IEEE international symposium on radio-frequency integration technology
(RFIT) (2012), pp. 83–85
3. G. Park, M. Song, A CMOS current-steering D/A converter with full-swing output voltage and
a quaternary driver. IEEE Trans. Circuits Syst. II Express Briefs 62, 441–445 (2014)
4. Y. Lan, J. Zhu, C. Wang, A method for compensating the D/A converter frequency response
distortion in different nyquist zones. IEEE 2nd international conference on electronics tech-
nology (ICET) (2019), pp. 84–87
5. G.I. Radulov, P.J. Quinn, H. Hegt, A. van Roermund, An on-chip self-calibration method for
current mismatch in D/A converters. Proceedings of the 31st European solid-state circuits
conference (ESSCIRC) (2005), pp. 169–172
6. G.A.M. Van Der Plas, A 14-bit intrinsic accuracy Q² random walk CMOS DAC. IEEE
J. Solid-State Circuits 34, 12 (1999)
7. Y. Luo, L. Qi, A. Jain, M. Ortmanns, A high-resolution delta-sigma D/A converter architecture
with high tolerance to DAC mismatch. IEEE international symposium on circuits and systems
(ISCAS) (2018), pp. 1–5
8. S. Kulis, D. Yang, D. Ghong, et al., A high-resolution, wide-range, radiation-hard clock
phase-shifter in a 65 nm CMOS technology. 26th International conference “mixed design of
integrated circuits and systems” (MIXDES) (2019), pp. 147–150
9. W. Kester, Data Conversion Handbook (Engineeri A. D. I., 2005), pp. 976
10. D.K. Jung, Y.H. Jung, T. Yoo, et al., A 12-bit multi-channel RR DAC using a shared resistor
string scheme for area-efficient display source driver. IEEE transactions on circuits and
systems I: Regular papers (2018), pp. 3688–3697
11. V. Kommangunta, K. Shehzad, D. Verma, et al., Low-power area-efficient 8-bit coarse-fine
resistor-string DAC. IEEE international conference on consumer electronics-Asia (ICCE-
Asia) (2020), pp. 1–3
12. B.D. Yang, Y.K. Shin, K.C. Ryu, et al., An area-efficient coarse-fine resistor-string D/A
converter. First IEEE Latin American symposium on circuits and systems (LASCAS)
(2010), pp. 29–32
13. D. Chen, A 16b 5MSPS two-stage pipeline ADC with self-calibrated technology. International
conference on information and computer technologies (ICICT) (2018), pp. 155–158
14. C.W. Lu, P.Y. Yin, M.Y. Lin, A 10-bit two-stage R-DAC with isolating source followers for
TFT-LCD and AMOLED column-driver ICs. IEEE Trans. Very Large Scale Integration
(VLSI) Syst. 27, 326–336 (2018)
15. S. Mahdavi, R. Ebrahimi, A. Daneshdoust, et al., A 12 bit 800MS/s and 1.37 mW digital to
analog converter (DAC) based on Novel RC technique. IEEE international conference on
power, control, signals and instrumentation engineering (ICPCSI) (2017), pp. 163–166
16. J.S. Na, S.K. Hong, O.K. Kwon, A highly linear 10-bit DAC of data driver IC using source
degeneration load for active matrix flat-panel displays. IEEE Trans. Circuits Syst. II Express
Briefs 67, 2312–2316 (2020)
17. A. Atanesyan, M. Grigoryan, H. Margaryan, et al., Method of increasing current DAC linearity
with considering its random variables for modeling risk or uncertainty. Вестник РАУ 2, 64–70
(2020)
18. D.A. Johns, K. Martin, Analog Integrated Circuit Design (Wiley, New York, 2008), p. 696
19. A. Fayed, M. Ismail, Adaptive Techniques for Mixed Signal System on Chip (Springer Science
& Business Media, 2006), p. 178
20. T. Shirakawa, R. Sakai, S. Nakatake, On-chip Impedance evaluation with auto-calibration
based on auto-balancing bridge. IEEE 61st international midwest symposium on circuits and
systems (MWSCAS) (2018), pp. 262–265
21. B. Razavi, The flash adc [a circuit for all seasons]. IEEE Solid-State Circuits Magazine 9, 9–13
(2017)
22. A. Payra, P. Dutta, A. Sarkar, S.K. Sen, et al., Design of a self regulated flash type ADC with
high resolution. Michael Faraday IET International Summit (2015), pp. 591–595
23. R.M. Shende, P.R. Gumble, VLSI design of low power high speed 4 bit resolution pipeline
ADC in submicron CMOS technology. Int. J. VLSI Design Commun. Syst. 2, 81 (2011)
24. D.C. Daly, A.P. Chandrakasan, A 6-bit, 0.2 V to 0.9 V highly digital flash ADC with
comparator redundancy. IEEE J. Solid State Circuits 44, 3030–3038 (2009)
25. P. Ritter, S. Le Tual, B. Allard, et al., Design considerations for a 6 bit 20 GS/s SiGe BiCMOS
flash ADC without track-and-hold. IEEE J. Solid State Circuits 49, 1886–1894 (2014)
26. R.J. Van de Plassche, CMOS Integrated Analog-to-Digital and Digital-to-Analog Converters
(Springer Science & Business Media, 2013), p. 588
27. E. Sall, M. Vesterbacka, K.O. Andersson, A study of digital decoders in flash analog-to-digital
converters. IEEE international symposium on circuits and systems (IEEE Cat.
No. 04CH37512) 2004, pp. I–I
49. J. Oliveira, J. Goes, M. Figueiredo, et al., An 8-bit 120-MS/s interleaved CMOS pipeline ADC
based on MOS parametric amplification. IEEE Trans. Circuits Syst. II Express Briefs,
105–109 (2010)
50. S. Devarajan, L. Singer, D. Kelly, et al., A 12-b 10-GS/s interleaved pipeline ADC in 28-nm
CMOS technology. IEEE J. Solid State Circuits 52, 3204–3218 (2017)
51. H. Van de Vel, B.A. Buter, H. van der Ploeg, et al., A 1.2-V 250-mW 14-b 100-MS/s digitally
calibrated pipeline ADC in 90-nm CMOS. IEEE J. Solid State Circuits 44, 1047–1056 (2009)
52. J. Wu, A. Chou, C. H. Yang, et al., A 5.4 gs/s 12b 500 mW pipeline adc in 28nm cmos.
Symposium on VLSI circuits (2013), pp. 92–93
53. D. Vecchi, J. Mulder, F. M. van der Goes, et al., An 800MS/s dual-residue pipeline ADC in
40nm CMOS. IEEE international solid-state circuits conference (2011), pp. 184–186
54. C.Y. Chen, J. Wu, J.J. Hung, et al., A 12-bit 3 GS/s pipeline ADC with 0.4 mm2 and 500 mW
in 40 nm digital CMOS. IEEE J. Solid State Circuits 47, 1013–1021 (2012)
55. Understanding Pipelined ADCs. https://pdfserv.maximintegrated.com/en/an/AN1023.pdf
56. T. Liechti, A. Tajalli, O.C. Akgun, et al., A 1.8 V 12-bit 230-MS/s pipeline ADC in 0.18 μm
CMOS technology. IEEE Asia Pacific conference on circuits and systems (APCCAS) (2008),
pp. 21–24
57. B. Verbruggen, J. Craninckx, M. Kuijk, et al., A 2.6 mW 6 bit 2.2 GS/s fully dynamic pipeline
ADC in 40 nm digital CMOS. IEEE J. Solid State Circuits 45, 2080–2090 (2010)
58. C. Wang, X. Wang, Y. Ding, et al., A 14-bit 250MS/s low-power pipeline ADC with aperture
error eliminating technique. IEEE international symposium on circuits and systems (ISCAS)
(2018), pp. 1–5
59. D. Miyazaki, M. Furuta, S. Kawahito, A 75 mW 10 bit 120MSample/s parallel pipeline ADC.
29th European solid-state circuits conference (ESSCIRC) (IEEE Cat. No. 03EX705) (2003),
pp. 719–722
60. Y.M. Lin, B. Kim, P.R. Gray, A 13-b 2.5-MHz self-calibrated pipelined A/D converter in
3-mu m CMOS. IEEE J. Solid State Circuits 26, 628–636 (1991)
61. M.T. Grigoryan, A.A. Atanesyan, G.H. Hakobyan, et al., Two stage CTLE for high speed data
receiving. IEEE 40th international conference on electronics and nanotechnology (ELNANO)
(2020), pp. 374–377
62. C.K. Hsu, T.R. Andeen, N. Sun, A pipeline SAR ADC with second-order interstage gain error
shaping. IEEE J. Solid State Circuits 55, 1032–1042 (2020)
63. J. Li, U.K. Moon, A 1.8-V 67-mW 10-bit 100-MS/s pipelined ADC using time-shifted CDS
technique. IEEE J. Solid State Circuits 39, 1468–1476 (2004)
64. C.K. Hsu, N. Sun, A 75.8 dB-SNDR Pipeline SAR ADC with 2 nd-order interstage gain error
shaping. Symposium on VLSI circuits 2019, pp. 68–69
65. J. Zhong, Y. Zhu, S.W. Sin, et al., Inter-stage gain error self-calibration of a 31.5 fJ 10b
470MS/s pipelined-SAR ADC. IEEE Asian solid state circuits conference (A-SSCC) (2012),
pp. 153–156
66. H. Yu, M.C.F. Chang, A 1-V 1.25-GS/S 8-bit self-calibrated flash ADC in 90-nm digital
CMOS. IEEE Trans. Circuits Syst. II Express Briefs 55, 668–672 (2008)
67. J. Sun, J. Wu, A self-calibrated multiphase timing system in time-interleaved ADC. IEEE 2nd
advanced information technology, electronic and automation control conference (IAEAC)
(2017), pp. 292–295
68. M.S. Reddy, S.T. Rahaman, An effective 6-bit flash ADC using low power CMOS technol-
ogy. 15th international conference on advanced computing technologies (ICACT) (2013),
pp. 1–4
69. A.S.T.H.F. Kuttner, C. Sandner, M. Clara, A 6bit, 1.2 gsps low-power flash-adc in 0.13 μm
digital cmos. IEEE J. Solid State Circuits, 111–115 (2005)
70. S. Park, Y. Palaskas, A. Ravi, et al., A 3.5 GS/s 5-b flash ADC in 90 nm CMOS. IEEE custom
integrated circuits conference (2006), pp. 489–492
71. P.V. Rahul, A.A. Kulkarni, S. Sankanur, et al., Reduced comparators for low power flash adc
using tsmc018. International conference on microelectronic devices, circuits and systems
(ICMDCS) (2017), pp. 1–5
72. T. M. Ignatius, J. K. Antony, S. R. Mary, Implementation of high performance dynamic flash
ADC. Annual international conference on emerging research areas: magnetics, machines and
drives (AICERA/iCMMD, IEEE) (2014), pp. 1–5
73. H. Fan, J. Li, Q. Feng, et al., Exploiting smallest error to calibrate non-linearity in SAR ADCs.
IEEE Access (2018), pp. 42930–42940
74. D.S. Shylu, S. Radha, P.S. Paul, P.S. Sudeepa, Design of low power 4-bit Flash ADC in 90nm
CMOS process. 2nd international conference on signal processing and communication
(ICSPC) (2019), pp. 252–257
75. A. Van Roermund, M. Vertreg, D. Leenaerts, et al., A 12 b 500 MS/s DAC with >70 dB SFDR
up to 120 MHz in 0.18 μm CMOS. IEEE international digest of technical papers. solid-state
circuits conference (ISSCC) (2005), pp. 116–117
76. G. Radulov, P. Quinn, A 0.037 mm² 1GSps 12b self-calibrated 40nm CMOS DAC cell with
SFDR > 60 dB up to 200 MHz and IM3 < -60 dB up to 350 MHz. European conference on
circuit theory and design (ECCTD) (2020), pp. 1–4
77. A.R. Bugeja, B.S. Song, A self-trimming 14-b 100-MS/s CMOS DAC. IEEE J. Solid State
Circuits 35, 1841–1852 (2000)
78. D. Arbet, G. Nagy, V. Stopjaková, G. Gyepes, A self-calibrated binary weighted DAC in
90nm CMOS technology. 29th international conference on microelectronics proceedings
(MIEL) (2014), pp. 383–386
79. J.H. Chi, S.H. Chu, T.H. Tsai, A 1.8-V 12-bit 250-MS/s 25-mW self-calibrated DAC. Pro-
ceedings of ESSCIRC (2010), pp. 222–225
80. T. Rabuske, J. Fernandes, F. Rabuske, et al., A self-calibrated 10-bit 1 MSps SAR ADC with
reduced-voltage charge-sharing DAC. IEEE international symposium on circuits and systems
(ISCAS) (2013), pp. 2452–2455
81. H.W. Chen, W.T. Shen, W.C. Cheng, H.S. Chen, A 10b 320MS/s self-calibrated pipeline
ADC. IEEE Asian solid-state circuits conference (2010), pp. 1–4
82. S.K. Gupta, M.A. Inerfield, J. Wang, A 1-GS/s 11-bit ADC with 55-dB SNDR, 250-mW
power realized by a high bandwidth scalable time-interleaved architecture. IEEE J. Solid State
Circuits 41, 2650–2657 (2006)
83. Y. Chen, J. Wang, H. Hu, et al., A 200 MS/s, 11 bit SAR-assisted pipeline ADC with bias-
enhanced ring amplifier. IEEE international symposium on circuits and systems (ISCAS)
(2017), pp. 1–4
84. Y.D. Jeon, Y.K. Cho, J.W. Nam, et al., A 9.15 mW 0.22 mm 2 10b 204MS/s pipelined SAR
ADC in 65nm CMOS. IEEE custom integrated circuits conference (2010), pp. 1–4
85. W. Li, F. Li, C. Yang, et al., An 85 mW 14-bit 150 MS/s pipelined ADC with a merged first
and second MDAC. China communications (2015), pp. 14–21
86. S.M. Louwsma, A.J.M. van Tuijl, M. Vertregt, B. Nauta, A 1.35 GS/s, 10 b, 175 mW time-
interleaved AD converter in 0.13 μm CMOS. IEEE J. Solid State Circuits 43, 778–786 (2008)
87. K. Gulati, M.S. Peng, A. Pulincherry, et al., A highly integrated CMOS analog baseband
transceiver with 180 MSPS 13-bit pipelined CMOS ADC and dual 12-bit DACs. IEEE J. Solid
State Circuits 41, 1856–1866 (2006)
88. A. Verma, B. Razavi, A 10-bit 500-ms/s 55-mW cmos adc. IEEE J. Solid State Circuits 44,
3039–3050 (2009)
89. S.H.W. Chiang, H. Sun, B. Razavi, A 10-bit 800-mhz 19-mw CMOS adc. IEEE J. Solid State
Circuits 49, 935–949 (2014)
90. B.D. Sahoo, B. Razavi, A 10-b 1-GHz 33-mW CMOS ADC. IEEE J. Solid State Circuits 48,
1442–1452 (2013)
91. A.M. Ali, H. Dinc, P. Bhoraskar, et al., A 14-bit 2.5 GS/s and 5GS/s RF sampling ADC with
background calibration and dither. IEEE symposium on VLSI circuits (VLSI-Circuits) (2016),
pp. 1–2
92. C.H. Chan, Y. Zhu, Z. Zheng, et al., A 39mW 7b 8GS/s 8-way TI ADC with cross-linearized
input and bootstrapped sampling buffer front-end. 44th European solid state circuits confer-
ence (ESSCIRC) (2018), pp. 254–257
93. Y. Haque, D.E. Lewis, R. Hales, et al., Time interleaved 16 bit, 250MS/s ADC using a hybrid
voltage/current mode architecture with foreground calibration. 40th European solid state
circuits conference (ESSCIRC) (2014), pp. 59–62
94. C.H. Chan, Y. Zhu, W.H. Zhang, et al., A two-way interleaved 7-b 2.4-GS/s 1-then-2 b/cycle
SAR ADC with background offset calibration. IEEE J. Solid State Circuits 53, 850–860
(2018)
95. C.H. Chan, Y. Zhu, S.W. Sin, et al., 26.5 A 5.5 mW 6b 5GS/S 4×-Interleaved 3b/cycle SAR
ADC in 65nm CMOS. IEEE international solid-state circuits conference (ISSCC) Digest of
Technical Papers (2015), pp. 1–3
96. M. Baert, W. Dehaene, 20.1 a 5GS/s 7.2 ENOB time-interleaved VCO-based ADC achieving
30.5 fJ/conv-step. IEEE international solid-state circuits conference (ISSCC) (2019),
pp. 328–330
97. A. Ramkaj, J.C.P. Ramos, Y. Lyu, et al., 3.3 A 5GS/s 158.6 mW 12b Passive-Sampling
8×-Interleaved Hybrid ADC with 9.4 ENOB and 160.5 dB FoM S in 28nm CMOS. IEEE
international solid-state circuits conference (ISSCC) (2019), pp. 62–64
98. B. Vaz, B. Verbruggen, C. Erdmann, et al., A 13bit 5GS/s ADC with time-interleaved
chopping calibration in 16nm FinFET. 2018 IEEE symposium on VLSI circuits (2018),
pp. 99–100
99. M.B. Dayanik, D. Weyer, M.P. Flynn, A 5GS/s 156MHz BW 70dB DR continuous-time
sigma-delta modulator with time-interleaved reference data-weighted averaging. 2017 sym-
posium on VLSI circuits (2017), pp. 38–39
100. Z. Yu, D. Chen, Algorithm for dramatically improved efficiency in ADC linearity test. IEEE
international test conference (2012), pp. 1–10
101. B. Chen, M. Maddox, M.C. Coln, et al., Precision passive-charge-sharing SAR ADC: Anal-
ysis, design, and measurement results. IEEE J. Solid State Circuits 53, 1481–1492 (2018)
102. L. Jin, K. Parthasarathy, T. Kuyel, et al., Accurate testing of analog-to-digital converters using
low linearity signals with stimulus error identification and removal. IEEE Trans. Instrum.
Meas. 54, 1188–1199 (2005)
103. L. Jin, D. Chen, R.L. Geiger, SEIR linearity testing of precision a/D converters in
nonstationary environments with center-symmetric interleaving. IEEE Trans. Instrum. Meas.
56, 1776–1785 (2007)
104. S. Kook, H.W. Choi, A. Chatterjee, Low-resolution DAC-driven linearity testing of higher
resolution ADCs using polynomial fitting measurements. IEEE transactions on very large
scale integration (VLSI) systems (2012), pp. 454–464
105. H. Xu, L. Wang, R. Yuan, Y. Chang, A/D converter background calibration algorithm based
on neural network. International conference on electronics technology (ICET IEEE) (2018),
pp. 1–4
106. H. Xu, L. Wang, R. Yuan, Y. Chang, Combined spectral and histogram analysis for fast ADC
testing. IEEE Trans. Instrum. Meas. 54, 1617–1623 (2005)
107. H.M. Chang, C.H. Chen, K.Y. Lin, K.T. Cheng, Calibration and testing time reduction
techniques for a digitally-calibrated pipelined ADC. 27th IEEE VLSI test symposium
(2009), pp. 291–296
108. J. Duan, L. Jin, D. Chen, INL based dynamic performance estimation for ADC BIST.
Proceedings of 2010 ieee international symposium on circuits and systems (2010),
pp. 3028–3031
109. T. Liu, L. Chen, L. Liu, et al., A calibration method of SFDR based on INL for pipelined A/D
converters. IEEE international conference on electron devices and solid-state circuits (2014),
pp. 1–2
110. D.R. Oh, J.I. Kim, D.S. Jo, et al., A 65-nm CMOS 6-bit 2.5-GS/s 7.5-mW 8× time-domain
interpolating flash ADC with sequential slope-matching offset calibration. IEEE
J. Solid State Circuits, 288–297 (2018)
111. H. Tang, H. Zhao, S. Fan, et al., Design technique for interpolated flash ADC. 10th IEEE
international conference on solid-state and integrated circuit technology (2010), pp. 180–183
112. H.C. Lee, J.A. Abraham, A novel low power 11-bit hybrid ADC using flash and delay line
architectures. Design, automation & test in Europe conference & exhibition (DATE IEEE)
(2014), pp. 1–4
113. J. Wu, F. Li, W. Li, et al., A 14-bit 200MS/s low-power pipelined flash-SAR ADC. IEEE 58th
international midwest symposium on circuits and systems (MWSCAS) (2015), pp. 1–4
114. Y.K. Cho, J.H. Jung, K.C. Lee, A 9-bit 100-MS/s flash-SAR ADC without track-and-hold
circuits. International symposium on wireless communication systems (ISWCS) (2012),
pp. 880–884
115. P. Dhage, P. Jadhav, Design of power efficient hybrid flash-successive approximation register
analog to digital converter. International conference on communication and signal processing
(ICCSP) (2017), pp. 462–466
116. S. Fan, H. Zhao, H. Tang, et al., Mixed AC/DC-coupled averaging technique for ADC
nonlinearity reduction. 2nd Asia symposium on quality electronic design (ASQED) (2010),
pp. 102–105
117. J. Ren, J. Xiong, J. Liu, High-speed ADC quantization with overlapping metastability zones.
IEEE 61st international midwest symposium on circuits and systems (MWSCAS) (2018),
pp. 234–237
118. M. Miyahara, I. Mano, M. Nakayama, et al., 22.6 A 2.2 GS/s 7b 27.4 mW time-based folding-
flash ADC with resistively averaged voltage-to-time amplifiers. IEEE international solid-state
circuits conference digest of technical papers (ISSCC) (2014), pp. 388–389
119. A.A. Atanesyan, An on-chip self-calibration method for 8-bit flash ADC. Proceedings of the
Republic of Armenia National Academy of Sciences and National Polytechnic University of
Armenia: Series of technical sciences (2021), pp. 75–82
120. V.S. Melikyan, A.A. Atanesyan, M.T. Grigoryan, et al., Duty-cycle correction circuit for high
speed interfaces. IEEE 39th international conference on electronics and nanotechnology
(ELNANO) (2019), pp. 42–45
121. A. Atanesyan, CMOS negative capacitance with improved AC performance. RAU manual
(2019), pp. 49–58
Chapter 5
Design of High-performance Heterogeneous
Integrated Circuits
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024
V. Melikyan, Machine Learning-based Design and Optimization of High-Speed
Circuits, https://doi.org/10.1007/978-3-031-50714-4_5
222 5 Design of High-performance Heterogeneous Integrated Circuits
[Plot: power (W), frequency (MHz), and number of cores versus year, 1970–2020, logarithmic scale from 1 to 10,000]
$$\mathrm{MSPI} = \frac{1}{\mathit{seq}\% + (1 - \mathit{seq}\%)/N} \qquad (5.1)$$
where N is the number of cores and seq% is the sequential share of the program. The
higher the share of sequential code in the whole program, the lower the value at
which MSPI saturates as the number of cores grows.
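Equation (5.1) has the same form as Amdahl's law, and its saturation can be checked numerically. The sketch below is illustrative only: the function name `mspi` and the sample values are assumptions, with the sequential share given as a fraction between 0 and 1.

```python
def mspi(seq_fraction, n_cores):
    """Evaluate Eq. (5.1) for a sequential share seq_fraction and n_cores cores."""
    return 1.0 / (seq_fraction + (1.0 - seq_fraction) / n_cores)


if __name__ == "__main__":
    for seq in (0.05, 0.10, 0.50):
        row = [round(mspi(seq, n), 2) for n in (1, 2, 8, 64, 1024)]
        # The value approaches 1 / seq as the number of cores grows.
        print(f"seq = {seq:.2f}: {row}, limit = {1.0 / seq:.1f}")
```

For a sequential share of 10%, for example, the value cannot exceed 10 regardless of how many cores are added.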
As mentioned above, multi-core CPUs also provide parallelism at the instruction
execution level. For example, at the same frequency a processor with two cores can
provide up to twice the performance of a single-core processor [15, 16]. Currently,
single-core processors are used only in systems where there is no need for high
computing power.
The architectures of multi-core processors in turn have certain limitations.
Interconnection logic circuits in processors limit the bandwidth of data transfer
between cores. The larger the number of cores, the larger the interconnection
circuit needed to connect them [17, 18]. For example, to provide more than eight
cores in one IC, a network within the IC is needed instead of a simple
interconnection [19].
There are two types of multi-core ICs. The first are processors that have several
cores and a shared first- or second-level cache memory [20]. The above examples
belong to this class. The second are ICs with hundreds or thousands of cores that
are logically connected to each other through a network [21]. In these systems,
groups of several cores can form clusters. Clusters usually consist of one or more
cores, and these cores have shared memory. An example of such ICs is graphics
processing units (GPUs) [22]. GPUs are used to display an image on a monitor by
processing three-dimensional objects in parallel. To solve this problem, GPUs are
designed with small computing cores. These cores are not as general-purpose as CPU
cores, but they enable the programmer to perform mathematical calculations in
parallel [23]. Current GPUs include tens of thousands of cores. However, such
systems also have certain limitations: all cores are identical and perform the same
type of operations.
In addition, there is a limit to the number of concurrently operating cores [24].
When all cores are active at the same time, the power consumption of the IC increases
significantly [25]. This raises the temperature to values at which the efficiency of
IC operation decreases. The problem is solved by limiting the number of concurrently
active cores or by reducing the operating frequency. In both cases, IC performance
decreases.
One existing solution to these limitations is the use of heterogeneous systems, in
which computing cores of different types, with instruction-level or data-level
parallelism, are used [26]. A heterogeneous IC can consist of cores with the same
instruction set but different microarchitectures, cores with different instruction
sets, or accelerator cores that do not have an instruction set [27]. Special-purpose
cores provide higher performance than general-purpose cores at the same power cost.
Thus, computations on heterogeneous ICs can achieve either higher performance at the
same power consumption or lower power consumption at the same performance.
[Figure: simulation runtime (logarithmic axis, 1 to 1000) versus number of molecules (256 to 4096) for the Fortran CPU and hybrid CPU/GPU implementations; a second panel with a vertical axis from 0 to 100 over the same range of molecule counts]
In summary, the following factors drive the demand for the design of heterogeneous
ICs.
Flexible architecture. Heterogeneous ICs are multi-core computing systems whose
implementation requires the design of both special-purpose and general-purpose
cores. General-purpose cores enable complex algorithms to be implemented at the
programming level, and special-purpose cores provide low-power, high-performance
calculations. Optimal resource allocation ensures maximum performance.
Generality and reduction of power consumption. As mentioned above, making cores
more general-purpose increases power consumption. For this reason, heterogeneous
ICs are used to relieve the load on the cores in VLSI circuits. Improved design of
the latter and the availability of a flexible architecture reduce the required
generality of the IC and the implementation complexity of the cores, which in turn
reduces power consumption.
Reduction of design time. The design of special ICs, depending on the field of
application, may have different features due to the presence of different computing
nodes and interfaces. With a design tool for heterogeneous ICs with flexible
architecture, design time can be significantly reduced by using common interfaces
and configurable components.
There are four types of processor systems according to Michael Flynn’s classifica-
tion [30]:
• Single instruction, single data (SISD)
• Single instruction, multiple data (SIMD)
• Multiple instruction, single data (MISD)
• Multiple instruction, multiple data (MIMD)
SISD. Systems with this architecture consist of one single-core processor that
executes one instruction at a time, and data is read from one memory [31]. The SISD
architecture (Fig. 5.5) corresponds to the Von Neumann architecture. Such architectures
have parameters for executing asynchronous instructions [32]. Pipelined or
superscalar processors are implemented with the SISD architecture. Instructions are
received from the control unit, decoded, and executed by the processor, which
reads the data, processes it according to the instruction, and writes the result to
memory.
Superscalar processors are CPUs with instruction-level parallelism. Unlike the
classic Von Neumann architecture, which can execute one instruction during one
period of the clock signal, superscalar processors can execute more than one
instruction in one clock cycle [33]. This is achieved by integrating more than one
instruction execution unit into the CPU; the units operate independently of each
other and in parallel. It is important to note that each instruction execution unit
is not a separate processor. The single-core superscalar processor belongs to the
SISD class [34].
As can be seen, different cores execute the same instruction at the same time.
A pipelined SIMD processor executes the same instruction set in a different way.
Unlike in a VP, the cores of a pipelined SIMD processor are not interchangeable,
and each core executes one unique instruction. Execution of the above-mentioned
program by a pipelined SIMD processor [38] is presented in Fig. 5.10.
As shown, in a pipelined SIMD processor the instructions are executed sequentially
and, unlike in a VP, the same instruction is always executed in the same core, while
different instructions are executed at the same time in different cores.
The advantage of the pipelined SIMD processor architecture over VP is that the cores
are not general-purpose and execute unique instructions, so they are structurally
simpler, occupy a small area, and have low power consumption. To increase the
performance of a pipelined SIMD processor, the number of cores executing the same
instruction is increased [40].
SIMD architectures are used in GPU ICs, video processing applications, machine
learning, as well as heterogeneous ICs [41].
ICs with the MISD architecture execute different instructions on the same data.
They are not widely used in consumer electronics and are mainly used in space
systems, where the probability of errors is high and the same processing is therefore
performed by more than one core [42]. The structure of the MISD processor is
presented in Fig. 5.11; its redundancy ensures that the failure of one circuit in a
space device does not affect the overall system.
Processors with the MIMD structure consist of general-purpose cores, each of which
has its own instruction stream and data stream. Due to their structure, these ICs
provide maximum performance, but in other parameters they are inferior to all the
above-mentioned architectures [43]. As seen from the MIMD structure (Fig. 5.12), it
consists of independent cores that receive independent instructions and operate on
an independent data stream or on a data stream in shared memory [44].
MIMD architectures are used in modern personal computers (PCs), mobile
computers, and SoCs of smartphones. They are suitable for multithreaded applications
and data-intensive tasks [35].
The most important issues for the design of heterogeneous ICs are:
• Implementation of a flexible or configurable architecture
• Data transfer issues between clock domains
• Implementation of data transfer interfaces
The first step in the design of heterogeneous ICs is the choice of architecture. It
depends entirely on the field of application of the given heterogeneous IC, in
particular, on the current computing requirements in that field. For example, network
devices are mainly smart systems that must process a data stream supplied by sensors
but have very strict power consumption requirements. Here, low power consumption
takes priority over the other requirements, and it is desirable to choose an
architecture that consists of one or more ASIP cores [45]. In another example, video
processing performed in cloud systems, priority is given to performance, because
these systems have a constant power supply and external cooling devices [46]. To
obtain maximum performance, it is advisable to choose an architecture that provides
high parallelism at the data and instruction levels. In conclusion, the choice of
architecture strongly depends on the field of application, and to reduce the design
time, a software environment is needed that generates a heterogeneous IC with the
required architecture from existing design tools. In those design tools, depending
on the technical requirements, it is necessary to realize the optimal data flow and
to interconnect the components through appropriate interfaces [47].
Another important task is to meet power consumption requirements. For this, the
implemented IC must have low-power modes in which the main computing cores are
inactive, and the central control unit, whether a finite-state automaton or a CPU,
should have a wide operating frequency range. Lowering the frequency allows the
supply voltage, and therefore the power consumption, to be reduced. In addition to
low-power operating modes, a heterogeneous IC should have a low-performance mode in
which both cores and computing nodes operate at low frequencies, which also decreases
power consumption [48].
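The effect of lowering frequency and supply voltage can be illustrated with the standard first-order relation for CMOS dynamic power, P = α·C·V²·f. This relation is not given in the text above and is used here only as an assumption; the capacitance, voltage, and frequency values below are hypothetical.

```python
def dynamic_power(c_eff: float, vdd: float, freq_hz: float, activity: float = 1.0) -> float:
    """First-order CMOS dynamic power: activity * C * Vdd^2 * f (watts)."""
    return activity * c_eff * vdd ** 2 * freq_hz


if __name__ == "__main__":
    # Hypothetical operating points: 1 nF switched capacitance, 2 GHz at 1.0 V.
    full = dynamic_power(c_eff=1e-9, vdd=1.0, freq_hz=2e9)
    # Halving the frequency alone halves the dynamic power.
    half_f = dynamic_power(c_eff=1e-9, vdd=1.0, freq_hz=1e9)
    # If the lower frequency also permits a lower supply voltage, the saving grows
    # quadratically with the voltage reduction.
    half_f_low_v = dynamic_power(c_eff=1e-9, vdd=0.8, freq_hz=1e9)
    print(f"full: {full:.2f} W, half f: {half_f:.2f} W, half f at 0.8 V: {half_f_low_v:.2f} W")
```

Under these assumed values, frequency scaling alone saves half of the dynamic power, whereas combined frequency and voltage scaling saves roughly two thirds.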
Based on the requirements of power consumption and the characteristics of the
structure of heterogeneous ICs, an important design issue is to ensure uninterrupted
and as fast as possible data transfer between domains that are asynchronous to each
other. Computing cores can have different operating frequencies.
Thus, one of the most relevant problems of modern IC creation is the develop-
ment of heterogeneous IC system design tools to significantly improve their main
parameters and shorten the design period.
filters, converting one image representation format to another, and so on. The
processing of that data involves performing mathematical operations on the pixel
data, such as addition, subtraction, multiplication, division, matrix multiplication,
etc. These operations are expensive in terms of power consumption and require
parallel processing of video image to be able to meet today’s strict requirements
[49]. Currently, IC systems of mobile phones are integrating ASICs to perform video
processing. As mentioned above, the increase in the amount of data requires the
development of tools for designing heterogeneous architectures.
Control units of data transfer interfaces are an integral part of heterogeneous
architectures. These interfaces can be used for data transfer both between internal
nodes and between the IC and externally connected devices. One widely used interface
is the universal asynchronous receiver-transmitter (UART) [50]. It is designed to
transfer data between external devices. The UART interface is used in devices in
which data transfer speed does not play a primary role [51]. However, this interface
is not flexible in application: the operating frequency is fixed and does not change
over time. Devices equipped with a UART interface are generally used with default
settings that are not subject to change. Such interfaces are used, in heterogeneous
ICs and elsewhere, to exchange data between control registers [52].
Each processor node in computer systems must be connected to the shared system
memory through a common switch. The connection between system memory and the
heterogeneous IC should be efficient and should not require additional applications.
In modern ICs, the connection between RAM and other constituent nodes is provided
by the AMBA AXI interface [53].
In heterogeneous ICs, each node has its own external interfaces; depending on the
operating mode, their frequencies differ, and their clock signals are asynchronous
[54]. Circuits with asynchronous clock signals form asynchronous clock domains,
between which data must be transferred through synchronizers. Synchronizers are
digital circuits that provide lossless data transfer from one clock domain to
another but introduce additional delay. With a classical synchronizer, this delay
grows as the technology node shrinks and the IC frequency rises, which decreases
data throughput and performance [55].
Heterogeneous ICs have many control units, whose operating mode is set by the values
of their control registers. As mentioned above, as the number of cores in a
heterogeneous IC increases, its structure becomes more complicated because the
interconnect logic grows. Part of that logic consists of the interfaces that transfer
data between cores and program the control registers. Control registers are
programmed infrequently and not in active operating mode: programming is performed
once after the circuit comes out of reset, or before entering or exiting low-power
mode. Even so, modern ICs implement 32-bit-wide parallel interfaces for this task,
which occupy a large area on the die. It is recommended to use the UART interface
for programming the control registers in ICs, since it needs only one line for
one-way data transfer [56]. UART is widely used for data transfer between different
devices [57].
A UART is a node of a computing device designed to communicate with other digital
devices. The node transmits data serially.
The parameters that characterize the UART are:
• Full-duplex data exchange (serial transmission and reception of data
independently of each other)
• Asynchronous data exchange
• Wide range of data rates
• Support for 5-, 6-, 7-, and 8-bit packets
• Support for 1 or 2 “end” (stop) bits
• Support for an even or odd parity bit
• Data loss detection
• Packet structure mismatch (framing) error detection
• Protection against a false “start” bit
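The packet parameters listed above can be made concrete with a small frame builder. This is a minimal sketch under stated assumptions: the function names are hypothetical, the line idles high with a low start bit, and data bits are sent least significant bit first, which is the conventional UART ordering but is not stated in the text.

```python
from typing import List, Optional


def parity_bit(data_bits: List[int], mode: str) -> int:
    """Return the parity bit so that the total number of ones is even or odd."""
    ones_are_odd = sum(data_bits) % 2  # XOR of all data bits
    if mode == "even":
        return ones_are_odd
    if mode == "odd":
        return ones_are_odd ^ 1
    raise ValueError("mode must be 'even' or 'odd'")


def build_frame(value: int, data_len: int = 8, parity: Optional[str] = None,
                end_bits: int = 1) -> List[int]:
    """Build one UART packet as a list of line levels in transmission order."""
    if data_len not in (5, 6, 7, 8):
        raise ValueError("data_len must be 5, 6, 7, or 8")
    if end_bits not in (1, 2):
        raise ValueError("end_bits must be 1 or 2")
    if not 0 <= value < (1 << data_len):
        raise ValueError("value does not fit in data_len bits")
    data = [(value >> i) & 1 for i in range(data_len)]  # LSB first (assumed)
    frame = [0] + data                                   # start bit is low
    if parity is not None:
        frame.append(parity_bit(data, parity))
    return frame + [1] * end_bits                        # "end" bits are high


if __name__ == "__main__":
    print(build_frame(0xAA, data_len=8, parity="even", end_bits=1))
```

A receiver can recompute the parity over the received data bits and compare it with the received parity bit to detect a single-bit error.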
The UART interconnection circuit is presented in Fig. 5.13, where RX is the
receiver and TX is the transmitter [58].
The structure of the presented UART controller has two main constituent parts,
separated by dotted lines: the transmitter and the receiver (Fig. 5.14). The control
registers are common to both. The receiver consists of a node that detects the
negative edge of the voltage on the UART RX line; a clock signal generator, whose
frequency is determined by the value written in the control registers; a module that
reads 1 bit; a control block; and an RX memory for temporarily storing the received
data. The transmitter consists of a TX memory that temporarily stores the data to be
sent; a node that generates the clock signal, which is structurally similar to the
corresponding node of the receiver; and a control logic node that controls the
UART TX line. The receiver supports the same packet format as the transmitter
and can detect packet structure errors, data loss errors, and parity bit errors.
The clock signal generators of the receiver and the transmitter have the same
structure and differ only in their control signals. In other words, these nodes work
independently of each other, which ensures full two-way data flow.
The clock signal generator is a counter whose maximum value is set by the
corresponding output of the control register. When the counter reaches this maximum
value, it resets to zero. The operating frequency of the UART is therefore
determined by the maximum value of this counter. In terms of frequency selection,
the UART controller has 20 operating states.
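The counter-based clock signal generator can be modeled behaviorally as follows. This is a sketch only: the class name, the helper `max_count_for`, and the reference clock value are assumptions, and the actual register encoding of the 20 operating states is not reproduced here.

```python
class BaudTickGenerator:
    """Counter that wraps at a programmable maximum and emits one tick per wrap."""

    def __init__(self, max_count: int) -> None:
        if max_count < 0:
            raise ValueError("max_count must be non-negative")
        self.max_count = max_count  # value that would come from the control register
        self.count = 0

    def clock(self) -> bool:
        """Advance one reference clock cycle; return True when the counter wraps."""
        if self.count == self.max_count:
            self.count = 0
            return True
        self.count += 1
        return False


def max_count_for(ref_clock_hz: int, bit_rate: int) -> int:
    """Maximum counter value giving one tick every ref_clock_hz / bit_rate cycles."""
    return round(ref_clock_hz / bit_rate) - 1


if __name__ == "__main__":
    gen = BaudTickGenerator(max_count_for(2_000_000_000, 500_000_000))
    print([gen.clock() for _ in range(8)])  # one tick every 4 reference cycles
```

The tick period is `max_count + 1` reference cycles, so a larger maximum value gives a lower UART operating frequency.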
\[
P_{\mathrm{even}} = d_{n-1} \oplus \dots \oplus d_2 \oplus d_1 \oplus d_0 \oplus 0 \tag{5.2}
\]
Modern heterogeneous ICs have different external interfaces [59]. Each external
interface is synchronous and consists of data bits, control bits, and clock signals
[60]. Each interface forms its own clock domain. Data transfer from one clock
domain to another is called clock domain crossing (CDC) [61].
Data transfer from one domain to another is impossible without synchronization
circuits. Various problems may occur during CDC depending on the parameters of the
clock signals, of which frequency and phase are the most important. A mismatch
between two different domains leads to a violation of the setup and hold timing
parameters of the D flip-flops (DFFs). Violation of these timing parameters causes a
metastable state at the DFF output [62].
An example of connecting two DFFs where CDC occurs is presented in Fig. 5.16. Here,
DFF1 is clocked by the clock signal (CS) clk1 and DFF2 by the clock signal clk2.
If clk1 and clk2 are synchronous with each other, all timing parameters are
preserved, and the data In1 is transmitted losslessly to Out2 of DFF2. However, if
clk1 and clk2 are asynchronous, CDC occurs, and for a certain period the setup and
hold time parameters may be violated at Out1, the input of DFF2.
This causes a metastable state at Out2. Fig. 5.17 presents a timing diagram in which
a CDC violation occurs and Out2 becomes metastable [63].
In case of metastability, the output of the flip-flop is uncertain and can be read
as a logic 1 or a logic 0 by the reading circuit. In addition to functional failure,
another problem arises: the duration of the metastable state is also uncertain.
Depending on the technological parameters used and the frequency of the circuit,
this duration can vary from a few percent of the period of the reading CS to a whole
period or more. In the latter case, it can cause metastability at the outputs of
sequentially connected flip-flops, which also leads to the failure of the entire
circuit. Two concepts are used to describe CDC: source and destination. The source
domain is the one that transmits the data, and the destination domain is the one
that reads the data. Another problem, data loss, arises when the CS frequency of the
source domain is higher than that of the destination domain. The phenomenon of data
loss in case of CDC violation is presented in Fig. 5.18 [64].
To solve the above problems and avoid metastability, synchronization circuits are
used. A well-known synchronization circuit is the multi-flop synchronizer. A
dual-flop synchronizer is presented in Fig. 5.19 [65].
The operating principle of that circuit is to filter the metastability. As shown in
the timing diagram of the synchronization process (Fig. 5.20), a metastable state
still occurs at the output of DFF2. However, it is not transferred to the output of
DFF3 if Out2 has settled to a logic 1 or a logic 0 by the next positive edge of clk2.
As can be seen, Out3 of DFF3 does not go into a metastable state due to CDC and does
not cause a behavioral failure of the circuit. However, if the frequency of the clk2
clock signal is high enough that the metastable state at the output of DFF2 lasts
longer than one period of clk2, the metastable state can also propagate to the
output of DFF3. In modern ICs, clock signal frequencies reach several GHz, and to
avoid CDC problems, three-stage or four-stage synchronizers are used.
The reliability of a CDC is estimated by the mean time between failures (MTBF),
which is a function of many variables, including the source-domain and
destination-domain clock frequencies. In every IC design problem, it is important to
calculate the MTBF for an arbitrary signal crossing the CDC boundary. Failure means
that the signal passing through the synchronizing flip-flop goes into a metastable
state and remains metastable until the next positive edge of the reading clock
signal. This causes a metastable state at the input of the reading circuit in the
destination domain. The MTBF is calculated by the following formula [66]:
\[
\mathrm{MTBF} = \frac{e^{t_h/\tau}}{f_d \, f_{cs} \, T_m} \tag{5.3}
\]
where $t_h$ is the necessary setup time of data (s), $\tau$ is the time constant of
the latch (s), $f_d$ is the data change frequency (Hz), $f_{cs}$ is the clock
frequency of the destination domain (Hz), and $T_m$ is the metastability duration (s).
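Eq. (5.3) can be evaluated directly to see how strongly the MTBF depends on the exponent. The sketch below is illustrative only: the function name and all numeric values are hypothetical and do not describe any particular technology.

```python
import math


def mtbf_seconds(t_h: float, tau: float, f_d: float, f_cs: float, t_m: float) -> float:
    """Evaluate Eq. (5.3): exp(t_h / tau) / (f_d * f_cs * T_m), in seconds."""
    if min(tau, f_d, f_cs, t_m) <= 0.0:
        raise ValueError("tau, f_d, f_cs and t_m must be positive")
    return math.exp(t_h / tau) / (f_d * f_cs * t_m)


if __name__ == "__main__":
    # Hypothetical parameters: tau = 10 ps, data toggling at 100 MHz,
    # destination clock at 1 GHz, T_m = 100 ps.
    for t_h in (100e-12, 200e-12, 400e-12):
        value = mtbf_seconds(t_h, tau=10e-12, f_d=100e6, f_cs=1e9, t_m=100e-12)
        print(f"t_h = {t_h * 1e12:5.0f} ps -> MTBF = {value:.3e} s")
```

Because the time term enters through the exponent, each additional interval of length τ multiplies the MTBF by e, whereas doubling either frequency only halves it.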
algorithms and low power consumption, but it is not general-purpose and is used for
a limited range of problems. The architecture of the ASIP core is shown in Fig. 5.21 [67].
Another design approach is to have accelerators with more complex architectures
in ICs that are general-purpose and capable of performing various computational
operations. Here, the processor and accelerators are connected to each other through
special packages. The processor is also a control unit for the accelerators.
Field programmable gate arrays (FPGAs) also play an important role in modern
heterogeneous systems [68]. FPGAs (Fig. 5.22) were originally used for prototyping
of digital ICs. However, along with the development of the structure of FPGAs, their
fields of application also developed. For example, FPGAs are used in consumer
devices. Sometimes, using FPGA in various devices is more affordable and efficient
than producing application-specific ICs for these devices, but in terms of operating
frequency and power consumption parameters, ICs surpass FPGAs [69].
FPGAs, unlike ICs, are limited by their operating frequency. Modern FPGAs can
have an operating frequency of 1–1.5 GHz, and in areas where process
parallelization leads to increased performance, FPGAs are widely used.
FPGAs enable programmers to design circuits at a high abstraction level using the
OpenCL library [70], which significantly reduces design time. However, with this
method, the design is not efficient in terms of resource use and power consumption.
There are also architectures that are designed on the following principle: data
exchange between accelerators and the processor is carried out through a common
cache memory. In addition to cache, dynamic memory is also used as shared
memory [71].
An example of heterogeneous IC architecture is presented in Fig. 5.23 [38]. At a
high level of abstraction, this heterogeneous IC consists of CPUs, GPUs, FPGAs,
coherent interconnects (CI), a last level cache (LLC), a dynamic random access
memory (DRAM) interconnect, a dynamic memory control unit, and external DRAMs. The
CPUs can consist of 1 to 16 cores. Each core has an L1 cache (1 MB), and the cores
share a common L2 cache (2 MB). A GPU consists of hundreds or thousands of computing
cores. GPUs, like CPUs, have a shared L2 cache. The L2 caches provide data flow
between the LLC and the processing nodes. Between the LLC and the L2 caches is the
CI, which solves the problem of cache memory coherence. In this heterogeneous IC,
there is also an embedded FPGA, which makes it possible to perform calculations
specific to the given problem. External DRAMs are used to buffer large amounts of
data.
Thus, the above-discussed methods make the design of heterogeneous ICs possible, but
their application imposes limitations on implementing an even more effective design.
In particular, data transfer between components is carried out by means of reliable
parallel buses. However, as mentioned in Sect. 5.1.1, a large number of parallel
buses complicates the interconnects between them, which reduces the number of
additional cores that can be placed in the IC. Clock domain crossing solutions
ensure lossless data transfer from one domain to another but introduce additional
delays, which negatively affects performance. The structure of heterogeneous ICs
discussed above provides high performance, but circuits with such an architecture
are not general-purpose and solve only a specific problem.
Conclusions
1. One of the most urgent problems in the creation of modern integrated circuits is
the development of means of designing high-performance heterogeneous integrated
circuits that can significantly improve their performance and power consumption
and shorten the design period.
2. The main obstacles to the design of high-performance heterogeneous integrated
circuits and to the organization of the corresponding process are the large number
of parallel data transfer circuits between cores and the need to develop data
transfer mechanisms between clock domains. An effective way to overcome them is the
development of means for the design of high-performance heterogeneous integrated
circuits.
3. Research and analysis of existing approaches to and means of designing
high-performance heterogeneous integrated circuits show that their efficiency does
not meet the strict requirements of the current market.
4. Approaches to the development of high-performance heterogeneous integrated
circuit design tools have been proposed that significantly improve the main
technical parameters of such circuits (performance, power consumption, and data
transfer mechanisms between their components) and reduce the design time.
In heterogeneous ICs, depending on the operating mode of the cores, one core can have
a high clock frequency and another a low one. These cores have the same data
transfer function and interfaces based on the respective instructions. These
interfaces complicate the logic of the interconnect and limit the number of cores in
the IC. To solve these problems, it is proposed to carry out data transmission
between cores through a flexible, modified serial UART interface.
As already mentioned, one of the interfaces used for data transfer in ICs is UART,
which is widely used due to its simplicity and flexible parameters. UART has
transmit and receive lines for data transfer. The interface is widely applicable,
for example, in computer or network systems, because it is common to many devices,
although the data structures change from device to device due to differences in
their transfer rates and packet parameters. Due to its flexibility, the interface is
also very widely used in processor and microprocessor systems. The devices in which
UART is used are configured so that it is possible to transfer data through them.
However, the user needs to implement special settings for satisfactory
packet is designed for decoding the data rate. The data value in the packet should
be 0xAA (0b10101010) (Fig. 5.25). During data reception, the “master” control unit
decodes the UART speed parameter of the “slave” control unit. If the “slave” control
unit does not respond to the reset signal, the “master” control unit attempts the
self-calibration process two more times. After three failed attempts, the “master”
control unit switches to standard UART operation mode at minimum speed.
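The retry policy described above can be sketched as a short control loop. The names used here are hypothetical, and the minimum speed of 300 bit/s is an assumption taken from the lower end of the speed range quoted later in this section; the callback stands in for one calibration attempt, whose hardware details are not modeled.

```python
from typing import Callable, Optional, Tuple

MAX_ATTEMPTS = 3        # one initial attempt plus two retries, as described above
MIN_SPEED_BPS = 300     # assumed minimum speed for the standard UART fallback


def self_calibrate(attempt: Callable[[], Optional[int]]) -> Tuple[str, int]:
    """Run up to three calibration attempts, then fall back to standard mode.

    `attempt` returns the decoded speed in bit/s, or None if the "slave"
    control unit did not respond.
    """
    for _ in range(MAX_ATTEMPTS):
        speed = attempt()
        if speed is not None:
            return ("calibrated", speed)
    return ("standard", MIN_SPEED_BPS)


if __name__ == "__main__":
    replies = iter([None, None, 9600])          # "slave" answers on the third attempt
    print(self_calibrate(lambda: next(replies)))
    print(self_calibrate(lambda: None))         # "slave" never answers
```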
After recovering the value of the speed parameter, the “master” control unit
performs further transmission accordingly. With this mechanism, the “slave” control
unit indicates at what speed it can receive a transmitted packet. A new instruction
has been defined to detect the remaining parameters of the UART bus transfer. After
the reset and the reception of the packet intended for recovering the speed, the
“slave” control unit transmits a packet for determining the amount-of-data parameter
(Fig. 5.26).
Fig. 5.26 The process of decoding the amount of data in a modified UART
The first two bits in the received packets are the instruction identification bits.
The instruction value that corresponds to the amount of data per packet is 2’b01.
The next two bits determine the value of the data amount parameter.
They are:
• 2’b00—5 bits of data in one packet
• 2’b01—6 bits of data in one packet
• 2’b10—7 bits of data in one packet
• 2’b11—8 bits of data in one packet
After the value of the amount-of-data parameter has been received, the “slave”
control unit proceeds to the next packet, which determines the number of “end” bits.
Its instruction identification value is 2’b10 (Fig. 5.27).
The second and third bits in the received packet determine the number of “end”
bits. Their corresponding values are:
• 2’b00—1 end bit
• 2’b01—1.5 end bits
• 2’b10—not applicable
• 2’b11—2 end bits
The parity bit mode is determined in the same way. Its corresponding instruction
identification value is 2’b11 (Fig. 5.28). The values corresponding to the second and
third bits are:
• 2’b00, 2’b10—the packet does not contain a parity bit.
• 2’b01—even mode of parity bit.
• 2’b11—odd mode of parity bit.
Fig. 5.27 Process of decoding the number of “end” bits in a modified UART
Fig. 5.28 Process of decoding the parity bit mode in a modified UART
After the decoding of all parameter values, the “slave” control unit can decide to
end the self-calibration process by sending an instruction, the identification value of
which will be 2’b00, or change the values of some parameters by sending respective
instructions.
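The instruction encoding described above can be summarized as a small decoder. This is a sketch under an explicit assumption: the text states only that the first two bits identify the instruction and the next two carry the value, so the mapping of those bits onto bit positions [1:0] and [3:2] of an integer is hypothetical, as are the function and table names.

```python
from typing import Optional, Tuple, Union

DATA_BITS = {0b00: 5, 0b01: 6, 0b10: 7, 0b11: 8}
END_BITS = {0b00: 1.0, 0b01: 1.5, 0b11: 2.0}          # 0b10 is not applicable
PARITY = {0b00: None, 0b10: None, 0b01: "even", 0b11: "odd"}


def decode_instruction(packet: int) -> Tuple[str, Union[int, float, str, None]]:
    """Decode a calibration packet into (parameter name, value).

    Assumed layout: bits [1:0] hold the instruction identification value and
    bits [3:2] hold the parameter value.
    """
    instr = packet & 0b11
    value = (packet >> 2) & 0b11
    if instr == 0b00:
        return ("end_calibration", None)
    if instr == 0b01:
        return ("data_bits", DATA_BITS[value])
    if instr == 0b10:
        if value not in END_BITS:
            raise ValueError("end-bit code 2'b10 is not applicable")
        return ("end_bits", END_BITS[value])
    return ("parity", PARITY[value])


if __name__ == "__main__":
    for pkt in (0b1101, 0b0110, 0b0111, 0b0000):
        print(f"{pkt:04b} -> {decode_instruction(pkt)}")
```

Applying the decoded values one after another until an end-of-calibration instruction arrives reproduces the sequence of Figs. 5.26 to 5.28.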
With the mechanism described above, the “slave” control unit transmits the
calibration parameters to the “master” control unit. A further extension of the
modified UART allows the transmitter to modify the packet structure and speed
parameters of the receiver via a special packet. This extension makes the transmit
and receive lines independent in terms of packet parameters and transmission speed.
That is, the “master” control unit can transmit data at one speed and the “slave”
control unit at another. The same applies to packet parameters.
After the reset, the transmitter immediately transmits the speed calibration packet,
based on which the speed parameter is reset in the receiver (Fig. 5.29). For
example, suppose the core is in high-performance mode and transfers data at a speed
of 200 Mbps. Before going into low power mode, the core changes the speed setting to
9600 bps by the above mechanism and then goes into low power mode. After that, the
“slave” control unit receives the data at 9600 bps. After the reset, the receiver of
the “slave” control unit is clocked at the maximum possible frequency, so that every
possible speed parameter can be recovered (300 bit/s to 200 Mbit/s). Thus, each
transmitter can change the speed parameter of the receiver. However, this process
does not change the transmitter speed of the “slave” control unit, which can change
the corresponding parameter of the “master” control unit with the same mechanism.
In order to evaluate the effectiveness of the method presented in Sect. 5.2.1, a logic
circuit of interconnect between two cores was designed. It is implemented by means
of adapters between special-purpose clock domains. Then, the same problem was
solved using the proposed modified UART, and the area and performance parame-
ters were evaluated.
In order to evaluate the accuracy of the behavioral description of the logic circuit
of data transfer between two cores, a testing environment was created (Fig. 5.30).
In heterogeneous ICs, processed data is either not transferred between cores at all
or is transferred only when both cores are in the same power consumption mode. In
that case, both cores are clocked with the same clock signal.
However, there is often a transfer of instructions or configuration data from one
core to another. The transfer of instructions can occur when the cores are in different
power consumption modes and are clocked by asynchronous signals. That is why
synchronizers are used.
In this case, the observed circuit implements 8-bit instruction transmission. The
data flows from clock domain “A” to clock domain “B.” In addition to the 8-bit
data, control signals are also transmitted, in both directions. Feedback is necessary
to return the response indicating that the sent instruction has been received
in the destination clock domain “B.” The instructions transmitted for simulation are
stored in the input instruction array and passed to the synchronizer one by one. The
interface between them consists of an 8-bit bus, valid, and handshake signals
(Fig. 5.31).
Table 5.2 The provided performance and clock signal frequencies in the modified UART
Clock signal coding | Clock signal frequency (MHz) | Number of corresponding clock cycles (2 GHz frequency) | Speed (MB/s)
00001 | 1 | 20,000 | 0.1
00010 | 500 | 40 | 50
00100 | 1000 | 20 | 100
01000 | 1500 | 13.3 | 150
10000 | 2000 | 10 | 200
In the modified UART, the receiver’s finite state machine (FSM) has two additional
states intended for switching the input clock signal.
Those states are:
• WAIT_CLK_SWITCH: the modified UART receiver waits for a logical one
on the validation signal from the clock switching control domain.
• CALC_BAUD_RATE: the performance parameter is recovered using a
counter clocked at a frequency of 2 GHz.
To switch its input clock, the UART receiver changes its clock select (clk_sel)
output and waits for the clock switch done (clk_switch_done) signal to be asserted
(Fig. 5.34).
The data rate is calculated while receiving the input data 0x55. The
duration of one logical 0 bit is measured using a counter clocked at 2 GHz
(Fig. 5.35), after which the receiver changes the code of the clock signal accordingly
(from 200 to 100 Mb/s) and goes to the state of waiting for the valid/done signal assertion.
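The recovery step performed in the CALC_BAUD_RATE state can be illustrated with a minimal Python sketch. It assumes that the counter value measured for one bit is already available and reuses the rows of Table 5.2. The function names and the nearest-row selection rule are illustrative assumptions and are not taken from the described design.

```python
REF_CLOCK_HZ = 2_000_000_000  # 2 GHz counter clock, as stated in the text

# Rows of Table 5.2: clock code -> (clock frequency in MHz, counter cycles per bit)
CLOCK_CODES = {
    "00001": (1, 20_000),
    "00010": (500, 40),
    "00100": (1000, 20),
    "01000": (1500, 13.3),
    "10000": (2000, 10),
}


def bit_rate_from_count(cycles_per_bit):
    """Bit rate implied by the number of 2 GHz cycles counted during one bit."""
    if cycles_per_bit <= 0:
        raise ValueError("cycle count must be positive")
    return REF_CLOCK_HZ / cycles_per_bit


def select_clock_code(cycles_per_bit):
    """Pick the table row whose cycle count is closest to the measurement.

    The nearest-row rule is an assumption made for this sketch.
    """
    return min(CLOCK_CODES, key=lambda code: abs(CLOCK_CODES[code][1] - cycles_per_bit))


if __name__ == "__main__":
    measured = 20  # counter value for one bit of the 0x55 pattern
    print(bit_rate_from_count(measured), select_clock_code(measured))
```

A measurement of 20 cycles corresponds to 100 Mbit/s and selects the code 00100, which matches the switch from 200 to 100 Mb/s described above.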
The start of the packet is detected by the negative edge of the line (Fig. 5.36).
The receive line passes through a two-stage synchronizer, and single-cycle pulse
signals are generated from the positive and negative edges of the received signal.
After that, reception of the sequential data begins, and the “end” bit is
checked (Fig. 5.37).
If the “end” bit has a wrong value, the packet is interpreted as a reset instruction, and
the process of changing the clock signal and performance starts (Fig. 5.38). After the
clock signal has changed, the subsequent packets are received (Fig. 5.39).
The transmitter and receiver of the modified UART perform synchronization of
clock signals independently of each other (Fig. 5.40).
From the logic synthesis result of the modified UART transmitter (Fig. 5.41), it
becomes clear that this circuit can be clocked by a synchronizing signal up to
2.12 GHz.
The logic synthesis of the receiver showed that its maximum operating frequency
is 2.996 GHz (Fig. 5.42).
Thus, the area occupied by the logic circuit of the modified UART is 74.85%
larger than the area of the core interconnection synchronizer, but in return the
number of interconnection buses between the cores is reduced eightfold
(Table 5.3). Compared to the classic UART control unit, the area of the modified
UART is approximately 48% smaller. The increase in the area of the core
when using the modified UART is only 2.25%.
As already mentioned, to determine whether a two-stage synchronizer is appropriate
for a given problem in a particular IC system, it is necessary to calculate the mean
time between failures (MTBF). In case of failure, as a result of
switching of the transmitted signal, a metastable state occurs at the output of the first
cascade of the synchronizer. It remains in the metastable state for one complete
period of the clock signal and causes a metastable state at the output of the second
cascade, too. It can also cause a metastable state in the receiving clock domain. In
this case, the behavior of the circuit is disrupted by the uncertain values of the signals.
A variety of factors affect the MTBF, but the clock signal and data change frequencies
are decisive.
[Figure: two “Path Slack” histograms; horizontal axis: Slack; vertical axis: Number of Paths]
$$\mathrm{MTBF} = \frac{1}{f_{c.sig.} \cdot f_{data} \cdot X}, \qquad (5.5)$$
where $f_{c.sig.}$ and $f_{data}$ are the clock signal and data change frequencies, respectively,
and $X$ is the coefficient describing the side effects on the MTBF.
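As a worked illustration of Eq. (5.5), the short Python sketch below evaluates the MTBF for given frequencies. The numerical value used for the coefficient X is purely hypothetical and is chosen only to show how the result scales; it is not a measured parameter of any library cell.

```python
def mtbf_seconds(f_clock_hz, f_data_hz, x):
    """MTBF according to Eq. (5.5): 1 / (f_c.sig. * f_data * X).

    x is the coefficient describing side effects; for the result to be in
    seconds it must be expressed in seconds (assumption of this sketch).
    """
    if f_clock_hz <= 0 or f_data_hz <= 0 or x <= 0:
        raise ValueError("all arguments must be positive")
    return 1.0 / (f_clock_hz * f_data_hz * x)


if __name__ == "__main__":
    X_HYPOTHETICAL = 1e-25  # illustrative value only
    for f_clock in (1e9, 2e9):
        print(f_clock, mtbf_seconds(f_clock, 1e8, X_HYPOTHETICAL))
```

Doubling the clock signal frequency halves the MTBF, which is why the clock and data change frequencies are decisive.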
Often, in designs clocked at ultra-high frequencies, the MTBF of a two-stage
synchronizer is not sufficient to solve the problem, which is why three-cascade or
even four-cascade synchronizers are used. However, in those architectures where the
occupied area has a decisive role, their use is not advisable.
When a metastable state is detected at the output of the first cascade by an analog
comparator, the multiplexer transfers the input signal to the output. In that case,
when the required delay is not provided, a metastable state can occur in the
destination domain at the output of the fetching flip-flop.
where $T_{in.out.del.}$ is the delay from the input to the output of the synchronizer during
the switching of the first cascade, $T_{max.met.}$ is the maximum metastable state duration,
$T_{min.out.del.}$ is the minimum output delay, and $T_{setup}$ and $T_{hold}$ are the setup and
hold times of the fetching flip-flop at the destination. The corresponding timing
constraints must be given to the logic synthesis tool so that the resulting circuit operates
without failure.
In digital circuits, a two-stage synchronizer is the basis for the design of
synchronizers with complex structures. To allow complex synchronizers to be designed
on the basis of the proposed circuit, its behavioral model was developed in the Verilog
hardware description language [73]. The model is adjustable: the setup and hold timing
parameters of the internal first cascade, which depend on the library cell, can be changed,
as can the delay time of the output buffers. These timing parameters can also be
varied randomly within certain limits to detect errors during the simulation.
In case of violation of the setup and hold timing parameters, an undefined X value is
introduced into the simulation model at the output of the first cascade flip-flop. This
mechanism is implemented to simulate metastability. The X value can propagate to the
reading flip-flops during simulation. To simulate this, a random logic value of 1 or 0 is
passed to their input. The duration of metastability is also adjustable.
The necessary timing parameters are embedded in the proposed model, which
makes it possible to detect functional errors caused by clock domain crossing (CDC)
during the RTL simulation stage and fix them.
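The behavior described above can be sketched in a few lines of Python. This is not the Verilog model of [73]; it is a simplified illustration in which the class name, the method names, and the time representation are all assumptions. It shows only the two ideas stated in the text: an X value is produced when the data changes inside the setup/hold window, and X is resolved to a random 1 or 0 before it reaches the reading flip-flop.

```python
import random


class FirstStageFlop:
    """Illustrative first-cascade flip-flop with a setup/hold check."""

    def __init__(self, t_setup, t_hold):
        self.t_setup = t_setup  # adjustable, depends on the library cell
        self.t_hold = t_hold
        self.q = 0

    def clock_edge(self, t_edge, d_value, t_last_data_change):
        """Sample d_value at t_edge; output "X" on a setup or hold violation."""
        window_start = t_edge - self.t_setup
        window_end = t_edge + self.t_hold
        violated = window_start < t_last_data_change < window_end
        self.q = "X" if violated else d_value
        return self.q


def resolve(value, rng):
    """Replace X by a random logic value, as done for the reading flip-flops."""
    return rng.choice((0, 1)) if value == "X" else value


if __name__ == "__main__":
    rng = random.Random(1)
    flop = FirstStageFlop(t_setup=0.05, t_hold=0.03)
    q = flop.clock_edge(t_edge=1.0, d_value=1, t_last_data_change=0.98)
    print(q, resolve(q, rng))
```

Randomizing `t_setup` and `t_hold` within limits between runs would correspond to the random variation of timing parameters mentioned in the text.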
The two-stage synchronizer (Fig. 5.44) is considered the simplest circuit for
synchronizing 1-bit signals. Due to the peculiarity of its structure, the data is delayed by
a time corresponding to one clock period. If a metastable state occurs at the output
of the first cascade, it must resolve within one clock period,
so that no metastable state appears at the output of the second cascade. If the
signal has settled, the second cascade transmits a stable signal with a logic value of
1 or 0 and thereby prevents the transfer of an unstable signal to the more sensitive
components of the IC, the combinational cells. The two-stage synchronizer is a
well-known and frequently used circuit in industry. It is not used for fast-to-slow CDCs
and can only be used for single-bit data signal synchronization.
A feedback synchronizer is widely used to transfer an entire data stream from one
domain to another. As shown above, this problem can be solved by using a two-stage
synchronizer for each bit of the data stream. However, this method occupies
a large area, for example when there is a 128-bit bus and two additional synchronizers for each
bit. In addition to the large circuit area, this method can also lead to data
loss if some bits take the new value while the rest retain the previous value
of the data. Instead, at the cost of some latency, a synchronizer based on the principle of
feedback is used (Fig. 2.21).
Instead of using a two-stage circuit to synchronize each bit of the data bus, the
source domain transmits an additional validation signal along with placing the data
on the bus. That signal passes through the two-stage synchronizer to the destination
domain, and there, upon detecting its active level, the data is read, and the feedback
control circuit sends back the receive signal, which is in turn synchronized to the
source domain, again through the two-stage synchronizer.
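The validation and feedback exchange described above can be modeled with a small cycle-based Python sketch. It is a simplification under stated assumptions: both clock domains are stepped alternately in one loop (real domains are asynchronous), metastability is not modeled, and all class and variable names are illustrative.

```python
class TwoStageSync:
    """Two flip-flops in series, clocked in the receiving domain."""

    def __init__(self):
        self.q1 = 0
        self.q2 = 0

    def clock(self, d):
        self.q2 = self.q1  # second stage takes the previous first-stage value
        self.q1 = d
        return self.q2


def transfer(words, max_cycles=10_000):
    """Move words from the source to the destination domain via valid/feedback."""
    valid_sync = TwoStageSync()  # valid: source -> destination
    ack_sync = TwoStageSync()    # feedback: destination -> source
    pending = list(words)
    received = []
    bus, valid, ack = None, 0, 0
    for _ in range(max_cycles):
        if len(received) == len(words):
            break
        # Source-domain clock edge
        ack_seen = ack_sync.clock(ack)
        if valid and ack_seen:
            valid = 0                      # transfer acknowledged
        elif not valid and not ack_seen and pending:
            bus = pending.pop(0)           # place data on the bus
            valid = 1                      # and raise the validation signal
        # Destination-domain clock edge
        valid_seen = valid_sync.clock(valid)
        if valid_seen and not ack:
            received.append(bus)           # bus is stable while valid is high
            ack = 1
        elif not valid_seen and ack:
            ack = 0
    return received


if __name__ == "__main__":
    print(transfer([0x12, 0x34, 0x56]))
```

Only the two 1-bit control signals pass through synchronizers; the data bus itself is sampled once it is known to be stable, which is what avoids a synchronizer per data bit.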
An asynchronous FIFO is used to transfer data from one domain to another
asynchronously. Each of the domains has its own clock signal (CS). The source domain
writes the data and the destination domain reads it, each with its own CS (Fig. 5.45).
There are some limitations to consider for this synchronizer. The ratio of the write and
read speeds should be compatible with the ratio of the CS frequencies, so that data loss
is avoided. The FIFO must not go into an overfilled or underfilled
state.
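The overfill limitation can be checked with a simple occupancy model. The Python sketch below is an abstraction made for illustration only: it ignores pointer synchronization latency, assumes one word per clock edge in each domain, and uses integer clock periods in picoseconds. The function name and parameters are hypothetical.

```python
def simulate_fifo(depth, write_period_ps, read_period_ps, n_words):
    """Return (max_occupancy, overflowed) for a burst of n_words.

    The writer starts at t = 0; the reader's first edge is one read period later.
    """
    occupancy = 0
    max_occupancy = 0
    written = 0
    next_write = 0
    next_read = read_period_ps
    while written < n_words:
        if next_write <= next_read:
            if occupancy == depth:
                return max_occupancy, True   # write into a full FIFO
            occupancy += 1
            written += 1
            max_occupancy = max(max_occupancy, occupancy)
            next_write += write_period_ps
        else:
            if occupancy > 0:
                occupancy -= 1               # read only when data is available
            next_read += read_period_ps
    return max_occupancy, False


if __name__ == "__main__":
    print(simulate_fifo(depth=4, write_period_ps=500, read_period_ps=500, n_words=16))
    print(simulate_fifo(depth=4, write_period_ps=500, read_period_ps=1000, n_words=16))
```

With equal clock periods the occupancy stays small, whereas writing twice as fast as reading overfills a four-entry FIFO within a 16-word burst.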
The complex synchronizers discussed above were designed using the proposed
mixed-signal circuit and compared with classical approaches. The “SAED 14 nm”
technological library was used [74]. The complex synchronizers were designed using
the Verilog hardware description language.
Simulations were performed for the two cases discussed above: in the presence of a
metastable state of the signal (Fig. 5.46) and in the presence of a stable signal
(Fig. 5.47).
In the first case of simulation, the clock signals of the receiving clock domains, as
well as the transmitted data, are presented. When data is transferred from one domain
to the next in violation of the setup and hold times, a metastable state (v(meta)) is
created (Fig. 2.23). When the metastable signal passes through the comparators, its
current voltage level is checked against the range of 0.3–0.8 V. In this case,
since the condition of metastability occurs, the signal “11” is obtained at the output
of the comparators and is passed to the NAND cell. As a result, the multiplexer
selection signal changes from 1 to 0. Therefore, the multiplexer connects the input of
the domain to the output, which eliminates the problem of generating a metastable
state.
When a stable signal (v(meta)) is obtained in the receiving clock domain
(Fig. 5.47), the values “01” or “10” corresponding to the stable state are obtained
at the output of the comparators. In this case, the multiplexer selection signal is
a logic “1,” and the synchronizer designed by the proposed
method only transmits the received signal to the output with a certain delay in order
to exclude data failure at the destination domain.
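The detection logic of the two simulated cases can be summarized in a short Python sketch. The thresholds of 0.3 V and 0.8 V and the NAND-driven multiplexer selection follow the text; the ordering of the two comparator bits and the function names are assumptions made for illustration, and analog behavior is reduced to simple threshold comparisons.

```python
V_LOW = 0.3   # lower bound of the metastable range, in volts
V_HIGH = 0.8  # upper bound of the metastable range, in volts


def comparator_outputs(v_meta):
    """Return (above_low, below_high); both are 1 inside the metastable range."""
    above_low = 1 if v_meta > V_LOW else 0
    below_high = 1 if v_meta < V_HIGH else 0
    return above_low, below_high


def mux_select(v_meta):
    """NAND of the comparator outputs: 0 when metastability is detected."""
    above_low, below_high = comparator_outputs(v_meta)
    return 0 if (above_low and below_high) else 1


def synchronizer_output(v_meta, source_input, first_stage_value):
    """Select the source input on metastability, else the first-cascade value."""
    return source_input if mux_select(v_meta) == 0 else first_stage_value


if __name__ == "__main__":
    for v in (0.05, 0.5, 0.85):
        print(v, comparator_outputs(v), mux_select(v))
```

A voltage of 0.5 V yields “11” and a selection signal of 0, while 0.05 V and 0.85 V yield “01” and “10” with a selection signal of 1.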
From the simulation results of the behavioral model, the metastable state is clearly
visible at the output of the first cascade, which is marked as X signal (Fig. 5.48).
After the metastable state is completed, its output switches to logic 0 value. After a
respective delay, the output of the circuit also switches. As it can be seen, the output
of the reading flip-flop in the destination domain is stable. In this case, the input of
the synchronizer passes to the output and does not cause a violation of setup and hold
timing parameters.
The other case considered is one in which the changed input signal does not pass to
the output of the circuit (Fig. 5.49). Here, the output of the first cascade goes into a
metastable state, and the multiplexer transmits the data coming from the source
domain with a certain delay. As a result, a small jitter occurs. Provided that the
necessary timing delays are present in the circuit, this jitter is not transmitted to the
destination domain at the time of reading, so functional failures do not
occur. If the frequencies of the clock signals of the source and destination domains
are close, the delay of the output of the circuit in the case of two consecutive
mis-samplings can be at most two periods.
The designed synchronizer was compared with the classical two-stage circuit
(Fig. 5.50). The simulation result shows a time gain of at least one
clock signal period. When compared with three or more cascades, the number of
periods gained increases, which leads to increased performance in complex IC
systems.
In addition to the smaller occupied area, the proposed synchronizer has another
advantage: when it is used, no additional clock signal periods are lost, in contrast to
the method of using cascades. This advantage can be decisive in designs operating
at ultra-high frequencies, where a period of one or two clocks can be used for clock
synchronization of other data processing nodes. Thus, in contrast to the use of
cascades, the proposed approach makes it possible to reduce the area occupied
by the system and increase the performance.
However, in the proposed synchronizer, comparators are used, which are refer-
ence voltage-controlled, switching differential amplifiers. That is why the power
consumption increases when using the method.
Thus, simulations were performed in the case of a clock signal with a frequency
of 2 GHz. At such frequency, in the case of using a three-stage synchronizer,
18 more transistors are used compared to the proposed method. Compared to the
two-stage circuit, the proposed circuit occupies an average of 21% more area, but
when three or more cascades are used, the area saving is at least 30%.
• Internal cores
• Memory control unit (MCU)
• Queuing unit
• Instruction processor (INP)
• Power consumption control unit (PCU)
• Control/status registers (CSR)
• “Master”—AXI/AHB direct memory access control unit (DMACU)
• “Slave”—AXI/AHB internal register address control unit (IRACU)
• External cache
• Clock synchronizer module
The proposed circuit is connected to the IC system through the peripheral node
interconnect bus (Fig. 5.53). In a given heterogeneous IC, the CPU executes
instructions sent from the software. It generates unique instructions for the proposed
circuit and controls interrupts. It communicates with the internal registers of the
circuit with the proposed architecture through the peripheral node interconnect
bus.
Internal cores are computing nodes with RISC architecture. They process 32-bit
data. Each CPU has an internal register file (RF) which consists of 32 general-purpose
registers. Data and instruction memories are separated. The number of internal cores
can be from 4 to 16, depending on the circuit generation settings.
The memory control unit is responsible for downloading data and instructions
and generating instructions corresponding to the direct memory access node. It also
$$\mathrm{Ins.buf} = N_{core} \cdot 4\ \mathrm{KB}, \qquad \mathrm{data.buf} = N_{core} \cdot 64\ \mathrm{KB} \qquad (5.7)$$
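Equation (5.7) can be evaluated for the supported core counts with the following Python sketch. It assumes that the buffer sizes are the product of the number of cores and the per-core size (4 KB for instructions, 64 KB for data) and that 1 KB is 1024 bytes; the function name is illustrative.

```python
KB = 1024  # bytes per KB, assumed


def buffer_sizes(n_cores):
    """Return (instruction_buffer_bytes, data_buffer_bytes) per Eq. (5.7)."""
    if not 4 <= n_cores <= 16:
        raise ValueError("the number of internal cores is between 4 and 16")
    return n_cores * 4 * KB, n_cores * 64 * KB


if __name__ == "__main__":
    for n in (4, 8, 16):
        ins, data = buffer_sizes(n)
        print(n, ins // KB, "KB", data // KB, "KB")
```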
Before the circuit can start its operation, it must go through the initialization
process, which is issued by SDS in the form of instructions. These instructions are
processed and implemented by the INS. During this process, internal registers and
core resources are adjusted.
After the initialization process is completed, SDS prepares the CRBs and starts
their reading and execution with appropriate instructions, updating the
CRB’s internal pointer. For a CRB to be read and processed, its
chain bit must have a logical 1 value. Otherwise, that CRB is closed by the device and
processing ends. The SDS is informed about the status of the CRB through the interrupt
mechanism implemented in the circuit, which is used if the interrupt bit in the
corresponding CRB has a logic 1 value. Otherwise, the SDS should periodically read the
CRB. After completing its processing, the circuit writes a logic 0 value to the chain field.
Buffers of different domains can be connected to each other by writing a
logical 1 value to the next field.
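The role of the chain and interrupt bits can be illustrated with a minimal Python sketch. The data structure below is hypothetical: only the chain and interrupt fields are taken from the text, while the payload field, the callback, and the return values are assumptions added to make the example self-contained.

```python
from dataclasses import dataclass


@dataclass
class CRB:
    chain: int          # 1: the CRB may be read and processed
    interrupt: int      # 1: report completion through the interrupt mechanism
    payload: str = ""   # hypothetical placeholder for the CRB contents


def process_crb(crb, raise_interrupt):
    """Process one CRB; return True if it was processed, False if it was closed."""
    if crb.chain != 1:
        return False            # chain bit is 0: the CRB is closed
    # ... the actual processing of the CRB would take place here ...
    crb.chain = 0               # the circuit writes 0 to the chain field when done
    if crb.interrupt == 1:
        raise_interrupt(crb)    # otherwise the software has to poll the CRB
    return True


if __name__ == "__main__":
    events = []
    print(process_crb(CRB(chain=1, interrupt=1), events.append), len(events))
```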
After the full image data is received in the read buffer, it is read and processed
(Fig. 5.57).
After reading, the image is transformed from the red-green-blue (RGB) color
range to the color-saturation-brightness (CSB) range, undergoes all the appropriate
processing and corrections there, and is converted back to the RGB range. Then it is
displayed on the monitor.
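The conversion chain can be sketched with the Python standard library. The sketch assumes that the color-saturation-brightness representation corresponds to the hue-saturation-value (HSV) model provided by the `colorsys` module, and it uses a simple brightness gain as a stand-in for the processing and corrections, which are not specified here.

```python
import colorsys


def adjust_brightness(rgb, gain):
    """RGB -> HSV, scale the brightness, HSV -> RGB for one 8-bit pixel."""
    r, g, b = (channel / 255.0 for channel in rgb)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    v = min(1.0, v * gain)  # illustrative correction step
    r2, g2, b2 = colorsys.hsv_to_rgb(h, s, v)
    return tuple(round(channel * 255) for channel in (r2, g2, b2))


if __name__ == "__main__":
    print(adjust_brightness((100, 50, 25), 2.0))
```

In hardware the same steps would be applied to every pixel of the frame held in the read buffer.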
From the structure of the designed system, it can be seen that it has four clock
domains (Fig. 5.58): the domain of the clock signal received
from the camera’s image reading interface, the domain of the external
DRAM, the domain of the output image extraction interface, and the domain of
the reference clock signal.
The results of the research show that the operating frequency of the
circuit increases as the video resolution increases
(Fig. 5.59).
At the maximum frequency, it is possible to provide 60 Hz video processing
with a resolution of 1920 × 1200 (Fig. 5.60). Research shows that at an operating
frequency of approximately 200 MHz, the designed circuit is able to provide the
amount of data processing required for a 60 Hz video stream. As the
resolution of the video image is reduced, the frame rate of the video stream that can be
supported increases.
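The relation between resolution, frame rate, and operating frequency can be checked with simple arithmetic. The Python sketch below assumes, purely for illustration, that the circuit processes one pixel per clock cycle and that blanking intervals are ignored; neither assumption is stated in the text.

```python
def pixel_rate(width, height, frame_rate_hz):
    """Pixels per second that must be processed for the given video stream."""
    return width * height * frame_rate_hz


def max_frame_rate(width, height, clock_hz, pixels_per_cycle=1.0):
    """Upper bound on the frame rate under the one-pixel-per-cycle assumption."""
    return clock_hz * pixels_per_cycle / (width * height)


if __name__ == "__main__":
    print(pixel_rate(1920, 1200, 60))           # 138240000 pixels per second
    print(max_frame_rate(1920, 1200, 200e6))    # bound at 200 MHz
    print(max_frame_rate(1280, 720, 200e6))     # lower resolution, higher bound
```

A 1920 × 1200 stream at 60 Hz requires about 138 million pixels per second, which is below 200 million cycles per second, and a lower resolution raises the attainable frame rate.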
The same data processing algorithm is implemented using the circuit with the
proposed architecture. However, there are certain differences in terms of receiving
data and extracting processed data. In the case of heterogeneous ICs, the image data
is located in the system memory, and there is no need for additional buffering. It
is read using the DMACU (Fig. 5.61). After the data has been read, the instructions for
the cores are read with the same mechanism. After the data has been processed, it is
written to the system memory (Fig. 5.39). In addition to reading instructions and data,
the same process is carried out for CRBs by means of the DMACU (Fig. 5.62).
The CRB is read and buffered in the appropriate range in external cache memory.
It is then read from there by the queue unit, which in turn decodes it and
generates instructions for the DMACU to read the array of instructions and data
from system memory (Fig. 5.63).
In order to perform post-recording of data, it is read from the data area of the static
memory and transferred to the DMACU through the memory control unit (MCU)
(Fig. 5.64). It, in turn, implements the data post-recording process.
The circuit is designed for a maximum frequency of 2 GHz. The number of cores can
be configured from 4 to 16. There are strict requirements
for the frequency of the MCU and DMACU: their bandwidth should be equal to the
total bandwidth of the cores. Otherwise, the waiting time in the core arbitration
process will be long, which will lead to a decrease in performance. The
number of gates used in one core is approximately 24,000. The area of the total
circuit is approximately 210,000 gates.
From the results of logic synthesis of the circuit (Fig. 5.65), the dependence of
the number of used gates on the number of cores can be seen. The dependence is almost
linear because of the units outside the cores, which together occupy the area of
almost four cores. However, with further scaling, the area of the cores will dominate
over them.
The comparison shows that the area of the video image processing circuit
designed on FPGA is close to that of the proposed circuit in the four-core
configuration (Fig. 5.66). In all other cases, the increase in area is several times
larger, even compared to the initial area (Table 5.4).
From the performance results it can be seen that the performance increases with
the increase in the number of cores (Fig. 5.44).
Fig. 5.65 Dependence on the number of cores and the number of used gates
Fig. 5.66 Difference in the number of gates between FPGA and proposed circuits
Table 5.4 Comparison of design results on FPGA and the proposed circuit
Parameter | Circuit implemented on FPGA | Proposed, 4 cores | Proposed, 8 cores | Proposed, 16 cores
Number of gates | 188,249 | 211,536 | 314,147 | 497,458
Difference | – | 23,287 | 125,898 | 309,209
Video frame rate (Hz) | ~60 | ~79 | ~120 | ~143
Fig. 5.67 Dependence of the performance of the proposed circuit and the number of cores
As can be seen from the results, a further increase in the number of cores leads
to a sharp increase in area, while the gain in performance is not proportionate (Fig. 5.67).
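The diminishing return can be quantified directly from the values in Table 5.4. The Python sketch below recomputes the area differences and the marginal frame-rate gain per 100,000 additional gates; the marginal-gain metric is introduced here only for illustration and does not appear in the original analysis.

```python
FPGA_GATES = 188_249

# Number of cores -> (number of gates, approximate video frame rate in Hz), Table 5.4
PROPOSED = {4: (211_536, 79), 8: (314_147, 120), 16: (497_458, 143)}


def area_difference(n_cores):
    """Gate count of the proposed circuit minus that of the FPGA circuit."""
    return PROPOSED[n_cores][0] - FPGA_GATES


def marginal_gain(n_from, n_to):
    """Additional frames per second obtained per 100,000 additional gates."""
    gates_from, fps_from = PROPOSED[n_from]
    gates_to, fps_to = PROPOSED[n_to]
    return (fps_to - fps_from) / (gates_to - gates_from) * 100_000


if __name__ == "__main__":
    for n in PROPOSED:
        print(n, area_difference(n))
    print(marginal_gain(4, 8), marginal_gain(8, 16))
```

Going from 8 to 16 cores yields a much smaller frame-rate gain per added gate than going from 4 to 8 cores.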
Thus, a method for implementing the architecture of heterogeneous ICs
was created, which, due to the use of queuing, memory control, and direct memory
access units and a special instruction, provides a 32.48% increase in speed in
video image processing at the expense of an 11.008% increase in area.
Conclusions
1. A method of improving the means of data transfer between components in
heterogeneous integrated circuits has been proposed, which, due to the modified
architecture, provides an eightfold reduction in the number of interconnect buses
between cores, at the expense of a 2.25% increase in core
area.
2. A method for improving the means of data transfer between clock domains in
heterogeneous integrated circuits has been proposed, which, due to the mixed-
signal architecture, provides an increase in performance of at least 50% at the
expense of an average of 21% additional area.
3. A method of implementing the architecture of heterogeneous integrated circuits
has been proposed, which, due to the use of queuing, memory control, and direct
memory access units and a special instruction, provides a 32.48% increase in
performance at the expense of an 11.008% increase in area.
References
11. Jayant, V. Shahi, C.M. Velpula, CPU temperature aware scheduler a study on incorporating
temperature data for CPU scheduling decisions. 2015 international conference on advances in
computing, communications and informatics (ICACCI) (2015), pp. 2409–2413
12. 2021 Trends. https://static1.squarespace.com/static/6130ef779c7a2574bd4b8888/t/616c79ed5a30e36825f47818/1634499069232/isscc2021.press_kit_110620.pdf.
Institute of Electrical and Electronics Engineers – University of Pennsylvania (2021), pp. 1–152
13. B. Shekhar, C.A. Andrew, The future of microprocessors. Commun. ACM 54(5), 67–77 (2011)
14. B. Shekhar, Thousand Core Chips—A Technology Perspective (Intel Corp, Microprocessor
Technology Lab, Hillsboro, 2012), pp. 746–749
15. White Paper, Next leap in microprocessor architecture: Intel® Core™ duo processor (2006),
p. 4
16. A.R.A. Saif, K. Bin Jumari, Performance study of Core2Duo desktop processors. 2009
International conference on electrical engineering and informatics (2009), pp. 532–536
17. M.D. Hill, Amdahl’s law in the multicore era. 2008 IEEE 14th international symposium on high
performance computer architecture (2008) vol. 41, no. 7, pp. 33–38
18. B. Rubén, B. Daniele, B. Andrea, A. Giovanni, et al., A synchronization-based hybrid-memory
multi-core architecture for energy-efficient biomedical signal processing. IEEE Trans. Comput.
66(4), 575–585 (2017)
19. K. Takanori, L. Yamin, A cost and performance analytical model for large-scale on-chip
interconnection networks. 2016 4th international symposium on computing and networking
(CANDAR) (2016), pp. 447–450
20. M.J. Cade, A. Qasem, Balancing locality and parallelism on shared-cache mulit-core systems.
2009 11th IEEE international conference on high performance computing and communications
(HPCC 2009) (2009), pp. 188–195. https://doi.org/10.1109/HPCC.2009.61
21. J. Ma, C. Hao, W. Zhang, T. Yoshimura, Power-efficient partitioning and cluster generation
design for application-specific network-on-chip. 2016 international SoC design conference:
smart SoC for intelligent things (ISOCC) (2016), pp. 83–84.
https://doi.org/10.1109/ISOCC.2016.7799744
22. K. Onur, N. Nachiappan Chidambaram, J. Adwait, A. Rachata, Managing GPU concurrency in
heterogeneous architectures. 2014 47th annual IEEE/ACM international symposium on
microarchitecture (2014), pp. 114–126
23. J. Choquette, W. Gandhi, O. Giroux, et al., NVIDIA A100 tensor Core GPU: Performance and
innovation. IEEE Micro. 41(2), 29–35 (2021). https://doi.org/10.1109/MM.2021.3061394
24. F.L. Yuan, C.C. Wang, T.H. Yu, D. Marković, A multi-granularity FPGA with hierarchical
interconnects for efficient and flexible Mobile computing. IEEE J. Solid State Circuits 50(1),
137–149 (2015). https://doi.org/10.1109/JSSC.2014.2372034
25. Z. Lai, K.T. Lam, C.L. Wang, J. Su, A power modelling approach for many-core architectures.
Proceedings of the 2014 10th international conference on semantics, knowledge and grids
(SKG-2014) (2014), pp. 128–132. https://doi.org/10.1109/SKG.2014.10
26. F. Conti, C. Pilkington, A. Marongiu, L. Benini, He-P2012: Architectural heterogeneity
exploration on a scalable many-core platform. Proceedings of the ACM great lakes symposium
on VLSI, (GLSVLSI) (2014), pp. 231–232. https://doi.org/10.1145/2591513.2591553
27. W.P. Huang, R.C.C. Cheung, H. Yan, An efficient application specific instruction set processor
(ASIP) for tensor computation. Proceedings of the international conference on
application-specific systems, architectures and processors, vol. 2019 (2019), p. 37.
https://doi.org/10.1109/ASAP.2019.00-36
28. H. Anwar, M. Daneshtalab, M. Ebrahimi, M. Ramirez, et al., Integration of AES on
heterogeneous many-core system. Proceedings of the 2014 22nd euromicro international conference on
parallel, distributed, and network-based processing, (PDP 2014) (2014), pp. 424–427.
https://doi.org/10.1109/PDP.2014.86
29. H.-J. Wunderlich, Simulation on reconfigurable heterogeneous computer architectures (2017),
https://www.iti.uni-stuttgart.de/en/chairs/ca/projects/oldprojects/simtech/
30. A.Z. Adamov, Computation model of data intensive computing with MapReduce. Proceedings
of the 14th IEEE international conference on application of information and communication
technologies (AICT-2020) (2020), pp. 1–5. https://doi.org/10.1109/AICT50176.2020.9368841
31. M. Davari, A. Ros, E. Hagersten, S. Kaxiras, An efficient, self-contained, on-chip directory:
DIR1-SISD. Parallel architectures and compilation techniques – Conference proceedings
(PACT) (2015), pp. 317–330. https://doi.org/10.1109/PACT.2015.23
32. I. Yamazaki, J. Kurzak, P. Luszczek, J. Dongarra, Design and implementation of a large scale
tree-based QR decomposition using a 3D virtual systolic array and a lightweight runtime.
Proceedings of the IEEE 28th international parallel and distributed processing symposium
workshops (IPDPSW-2014) (2014), pp. 1495–1504.
https://doi.org/10.1109/IPDPSW.2014.167
33. M.T. Sim, Q. Yi, An adaptive multitasking superscalar processor. 2019 IEEE 5th International
conference on computer and communications (ICCC 2019) (2019), pp. 1293–1299.
https://doi.org/10.1109/ICCC47050.2019.9064185
34. S. Processors, Superscalar processor: Intro (1995). No. 7, pp. 1–19.
https://en.wikipedia.org/wiki/Superscalar_processor
35. SISD, SIMD, MISD, MIMD. https://learnlearn.uk/alevelcs/sisd-simd-misd-mimd/
36. J. Chen, C. Yang, Optimizing SIMD parallel computation with non-consecutive array access in
inline SSE assembly language. Proceedings of the 2012 5th international conference on
intelligent computation technology and automation (ICICTA-2012) (2012), pp. 254–257.
https://doi.org/10.1109/ICICTA.2012.70
37. B.S. Mahmood, M.A.A. Jbaar, Design and implementation of SIMD vector processor on
FPGA. 2011 4th international symposium on innovation in information and communication
technology (ISIICT’2011) (2011), pp. 124–130. https://doi.org/10.1109/ISIICT.2011.6149607
38. L. Juan Gómez, M. Onur, P&S heterogeneous systems SIMD processing and GPUs (2021),
pp. 1–75. https://safari.ethz.ch/projects_and_seminars/fall2021/lib/exe/fetch.php?media=p_s-hetsys-fs2021-meeting2-aftermeeting.pdf
39. B. Rajeshwari, K. Veena, MIMO receiver and decoder using vector processor. Proceedings/
TENCON IEEE region 10 annual international conference: 2017, vol. 2017-December,
pp. 1225–1230. https://doi.org/10.1109/TENCON.2017.8228044
40. K. Patsidis, C. Nicopoulos, G.C. Sirakoulis, G. Dimitrakopoulos, RISC-V2: A scalable RISC-V
vector processor. Proceedings of the IEEE international symposium on circuits and systems,
October (2020), pp. 1–5. https://doi.org/10.1109/iscas45731.2020.9181071
41. Y. Xiao, Z. Chen, L. Zhang, Accelerated CT reconstruction using GPU SIMD parallel
computing with bilinear warping method. 2009 1st international conference on information science
and engineering (ICISE-2009) (2009), pp. 95–98. https://doi.org/10.1109/ICISE.2009.203
42. A. Halaas, B. Svingen, M. Nedland, P. Sætrom, et al., A recursive MISD architecture for pattern
matching. IEEE Trans. Very Large Scale Integr. Syst. 12(7), 727–734 (2004).
https://doi.org/10.1109/TVLSI.2004.830918
43. A. Yazdanbakhsh, K. Samadi, N.S. Kim, H. Esmaeilzadeh, GANAX: A unified MIMD-SIMD
acceleration for generative adversarial networks. Proceedings of the international symposium
on computer architecture (2018), pp. 650–661. https://doi.org/10.1109/ISCA.2018.00060
44. S. Arrabi, D. Moore, L. Wang, K. Skadron, et al., Flexibility and circuit overheads in
reconfigurable SIMD/MIMD systems. Proceedings of the 2014 IEEE 22nd international
symposium on field-programmable custom computing machines (FCCM 2014) (2014), p. 236.
https://doi.org/10.1109/FCCM.2014.71
45. Y. Yamato, N. Hoshikawa, H. Noguchi, et al., A study to optimize heterogeneous resources for
open IoT. Proceedings of the 2017 5th international symposium on computing and networking
(CANDAR-2017), January (2018), pp. 609–611. https://doi.org/10.1109/CANDAR.2017.16
46. K. Gai, L. Qiu, H. Zhao, M. Qiu, Cost-aware multimedia data allocation for heterogeneous
memory using genetic algorithm in cloud computing. IEEE Trans. Cloud Comput. 8(4),
1212–1222 (2020). https://doi.org/10.1109/TCC.2016.2594172
47. A.R. Brodtkorb, C. Dyken, T.R. Hagen, et al., State-of-the-art in heterogeneous computing. Sci.
Program. 18(1), 1–33 (2010). https://doi.org/10.3233/SPR-2009-0296
48. K. Zhu, Y. Ding, Research on low power scheduling of heterogeneous multi core mission based
on genetic algorithm. Proceedings of the 9th international conference on measuring technology
and mechatronics automation (ICMTMA-2017) (2017), pp. 219–223. https://doi.org/10.1109/
ICMTMA.2017.0059
49. C. Yu, M. Cai, An image depth processing method based on parallel computing and multi-GPU.
Proceedings of the 2nd international conference on smart electronics and communication
(ICOSEC-2021) (2021), pp. 1009–1012. https://doi.org/10.1109/ICOSEC51865.2021.9591686
50. A.K. Gupta, A. Raman, N. Kumar, R. Ranjan, Design and implementation of high-speed
universal asynchronous receiver and transmitter (UART). 2020 7th international conference
on signal processing and integrated networks (SPIN-2020) (2020), pp. 295–300. https://doi.org/
10.1109/SPIN48934.2020.9070856
51. S. Harutyunyan, T. Kaplanyan, A. Kirakosyan, H. Khachatryan, Configurable verification IP for
UART. 2020 IEEE 40th international conference on electronics and nanotechnology
(ELNANO) (2020), pp. 234–237
52. T. Praveen Blessington, B. Bhanu Murthy, G.V. Ganesh, T.S.R. Prasad, Optimal implementa-
tion of UART-SPI interface in SoC. 2012 international conference on devices, circuits and
systems, ICDCS 2012 (2012), pp. 673–677. https://doi.org/10.1109/ICDCSyst.2012.6188657
53. V. Melikyan, S. Harutyunyan, A. Kirakosyan, T. Kaplanyan, UVM verification IP for AXI.
2021 IEEE east-west design and test symposium, (EWDTS-2021) (2021), pp. 1–4. https://doi.
org/10.1109/EWDTS52692.2021.9580997
54. J. Liu, M. Hong, K. Do, J.Y. Choi, et al. Clock domain crossing aware sequential clock gating.
Design, automation & test in Europe conference & exhibition (DATE) (2015), pp. 1–6
55. S. Hatture, S. Dhage, Open loop and closed loop solution for clock domain crossing faults.
Global conference on communication technologies (GCCT-2015) (2015), pp. 645–649. https://
doi.org/10.1109/GCCT.2015.7342741
56. D. Basu, D.K. Kole, H. Rahaman, Implementation of AES algorithm in UART module for
secured data transfer. Proceedings of 2012 international conference on advances in computing
and communications (ICACC-2012) (2012), pp. 142–145. https://doi.org/10.1109/ICACC.
2012.32
57. B. Zhang, K. Zhang, J. Zhu, X. Li, UART interface design based on DM642 video surveillance
system and wireless network module. Proceedings of 2011 IEEE 2nd international conference
on software engineering and service science (ICSESS-2011) (2011), pp. 477–480. https://doi.
org/10.1109/ICSESS.2011.5982357
58. KeyStone architecture: Universal asynchronous receiver/transmitter (UART). Texas Instru-
ments (2010), pp. 1–51
59. J.H. Hong, S.W. Han, E.Y. Chung, A RAM cache approach using host memory buffer of the
NVMe interface. International SoC design conference: Smart SoC for intelligent things
(ISOCC-2016). (2016), pp. 109–110. https://doi.org/10.1109/ISOCC.2016.7799757
60. D. Akash, M. Kishore, Mohana, K.H. Basha, Interfacing of flash memory and DDR3 RAM
memory with Kintex 7 FPGA board. Proceedings of the 2nd IEEE international conference on
recent trends in electronics, information and communication technology (RTEICT-2017) pro-
ceedings, January (2017), pp. 2006–2010. https://doi.org/10.1109/RTEICT.2017.8256950
61. S. Zhou, T. Zhang, Y. Yang, cross clock domain signal research based on dynamic motivation
model. Proceedings of the 4th international conference on dependable systems and their
applications. (DSA-2017), January (2017), p. 156. https://doi.org/10.1109/DSA.2017.34
62. N. Karimi, K. Chakrabarty, Detection, diagnosis, and recovery from clock-domain crossing
failures in multiclock SoCs. IEEE Trans. Comput.-Aided Design Integra. Circuits Syst. 32(9),
1395–1408 (2013). https://doi.org/10.1109/TCAD.2013.2255127
63. V. Melikyan, S. Harutyunyan, T. Kaplanyan, A. Kirakosyan, et al., Design and verification of
novel sync cell. Proceedings of the 2021 IEEE east-west design and test symposium, (EWDTS-
2021) (2021). pp. 1–5. https://doi.org/10.1109/EWDTS52692.2021.9580985
64. C.E. Cummings, Clock domain crossing (CDC) design & verification techniques using system
Verilog. Techniques (2008), No. Cdc. pp. 1–56
65. M. Bartík, Clock domain crossing – An advanced course for future digital design engineers.
Proceedings of the 2018 7th mediterranean conference on embedded computing (MECO-
2018) – Including ECYPS-2018 (2018), pp. 1–5. https://doi.org/10.1109/MECO.2018.8406004
66. S. Beer, R. Ginosar, R. Dobkin, Y. Weizman, MTBF estimation in coherent clock domains.
Proceedings of the international symposium on asynchronous circuits and systems (2013),
pp. 166–173. https://doi.org/10.1109/ASYNC.2013.19
67. ASIP Designer (2021), https://www.synopsys.com/dw/doc.php/ds/cc/asip-brochure.pdf
68. T. Sato, S. Chivapreecha, P. Moungnoul, K. Higuchi, An FPGA architecture for ASIC-FPGA
co-design to streamline. Process. IDSs. 412–417 (2017). https://doi.org/10.1109/cts.2016.0079
69. A.S. Hussein, H. Mostafa, ASIC-FPGA gap for a RISC-V core implementation for DNN
applications. Proceedings of the 3rd novel intelligent and leading emerging sciences conference
(NILES-2021) (2021), pp. 385–388. https://doi.org/10.1109/NILES53778.2021.9600503
70. The OpenCL specification. Khronos OpenCL working Group (2019). https://www.khronos.org/
registry/OpenCL/specs/2.2/html/OpenCL_API.html
71. V. Mekkat, A. Holey, P.C. Yew, A. Zhai, Managing shared last-level cache in a heterogeneous
multicore processor. Parallel architectures and compilation techniques – Conference proceed-
ings (PACT) (2013), pp. 225–234. https://doi.org/10.1109/PACT.2013.6618819
72. S. Harutyunyan, T. Kaplanyan, A. Kirakosyan, A. Momjyan, Design and verification of
autoconfigurable UART controller. Proceedings of the 2020 IEEE 40th international conference
on electronics and nanotechnology (ELNANO-2020) (2020), pp. 347–350. https://doi.org/10.
1109/ELNANO50318.2020.9088789
73. T.K. Kaplanyan, A novel pulse synchronizer design with the proposed sync cell model. Proc.
RA NAS NPUA Ser. Tech. Sci. 74(4), 464–470 (2021)
74. V.Sh. Melikyan, M. Martirosyan, A. Melikyan, G. Piliposyan. 14nm educational design kit:
Capabilities, deployment and future. Proceedings of the 7th small systems simulation sympo-
sium 2018, Niš, Serbia, February 12–14 (2018), pp. 37–41
75. T.K. Kaplanyan, L.A. Mikaelyan, A.A. Petrosyan, A.M. Momjyan, et al, Design of video
processing platform with interchangeable input-output interfaces. 2019 IEEE 39th international
conference on electronics and nanotechnology: Proceedings (ELNANO-2019) (2019),
pp. 201–205. https://doi.org/10.1109/ELNANO.2019.8783420
Chapter 6
Design of Digital Integrated Circuits by
Improving the Characteristics of Digital
Cells
It is known [1] that standard cell (SC) libraries are widely used in the IC design
process. Any digital circuit (DC) includes, to some extent, digital cells such as simple
logic, sequential, high-speed, power-saving, input/output (I/O) signal control, and
other cells.
Because SC libraries are so widely used, a large number of additional conditions
are imposed on the design of their components in order to increase efficiency and
reduce errors. Standard cells repeatedly pass through the stages of design,
improvement, and optimization during the entire process of their creation (Fig. 6.1).
During the creation of SC library cells, in addition to basic checks such as
design rule check (DRC) and layout versus schematic (LVS), special attention
is also paid to a number of other characteristics of SCs, such as:
. Compatibility of power and ground rails with future designs
. Static and dynamic power consumption
. Routability
. Access to I/O cells
. Compatibility with other logical cells
In addition to the main libraries of SCs, which comply with the design rules and
are optimized according to specifications, libraries with different heights (Fig. 6.2)
and threshold voltages (Fig. 6.3) are also developed for the same technological
process.
© The Author(s), under exclusive license to Springer Nature Switzerland AG 2024 279
V. Melikyan, Machine Learning-based Design and Optimization of High-Speed
Circuits, https://doi.org/10.1007/978-3-031-50714-4_6
After the graphical training data for the ML model are generated, the method creates
a training data set from each DRC error location, centered on the short-circuit error
region of the second metal layer (M2). In general, to extract the I/O pattern, a section
of the design of size m × n is taken, where m is twice the unit height (UH) and
n is equal to UH (Fig. 6.5) [4].
Then the obtained image is divided into pixels whose height and width are
chosen according to the minimum distance of the first metal layer (M1), so that one
pixel does not include two different M1 I/O cells [22, 23]. Then the value of each
pixel is calculated, taking into account how much of it is occupied by the I/O cell
(Fig. 6.6) [4, 24], after which the processed information is transferred to the specifics
generation module (SGM).
[Fig. 6.6: example pixel values, by row: 0.6, 0.05, 1, 0; 0.1, 0.05, 1, 0; 0.9, 0.9, 1, 0; 0, 0, 0, 0]
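The pixel-value calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the I/O shapes are taken to be non-overlapping axis-aligned rectangles, and the function name `pixel_occupancy` and its parameters are hypothetical rather than taken from [4].

```python
def pixel_occupancy(pins, width, height, pixel):
    """Return a grid of per-pixel coverage fractions in the range [0, 1].

    pins   -- list of rectangles (x0, y0, x1, y1); assumed not to overlap
    width  -- width of the clipped design section
    height -- height of the clipped design section
    pixel  -- pixel edge length (hypothetically derived from the M1 minimum distance)
    """
    cols = int(round(width / pixel))
    rows = int(round(height / pixel))
    grid = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            px0, py0 = c * pixel, r * pixel
            px1, py1 = px0 + pixel, py0 + pixel
            covered = 0.0
            for x0, y0, x1, y1 in pins:
                # Overlap of the pin rectangle with the current pixel
                dx = min(px1, x1) - max(px0, x0)
                dy = min(py1, y1) - max(py0, y0)
                if dx > 0 and dy > 0:
                    covered += dx * dy
            grid[r][c] = min(1.0, covered / (pixel * pixel))
    return grid
```

A fully covered pixel gets the value 1, an empty pixel gets 0, and a partially covered pixel gets the covered fraction, which matches the kind of values shown for Fig. 6.6.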
The SGM is a DL network consisting of two convolutional and two max-pooling
layers. The two convolutional layers extract the key specifics (features) of the I/O
cell patterns, while the max-pooling layers reduce the number of specifics. After
the specifics of the I/O cells are extracted, they are flattened and passed to the input
of the classification algorithm, which consists of a fully connected neural network
and a sigmoid function.
The sigmoid function is the last layer of the classification algorithm; it scales the
output value of the neural network to the range between 0 and 1. This scaling
lets the model indicate whether a given I/O pattern will cause an M2 short
circuit [25–28].
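As a rough illustration of the last stages of this pipeline, the sketch below applies one 2 × 2 max-pooling step (the "maximum unifying" operation), flattens the result, and passes it through a single fully connected unit followed by a sigmoid (the "sigma function"). The convolution stage is omitted for brevity, and the weights, bias, and function names are hypothetical; this is not the network of [4].

```python
import math

def max_pool_2x2(fmap):
    """2 x 2 max pooling with stride 2; assumes even row and column counts."""
    return [[max(fmap[r][c], fmap[r][c + 1], fmap[r + 1][c], fmap[r + 1][c + 1])
             for c in range(0, len(fmap[0]) - 1, 2)]
            for r in range(0, len(fmap) - 1, 2)]

def flatten(fmap):
    """Turn a 2-D feature map into a flat feature vector."""
    return [v for row in fmap for v in row]

def sigmoid(z):
    """Scale any real value into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def short_probability(fmap, weights, bias):
    """Pool, flatten, apply one fully connected unit, then the sigmoid."""
    features = flatten(max_pool_2x2(fmap))
    z = sum(w * v for w, v in zip(weights, features)) + bias
    return sigmoid(z)

# Pixel values as listed for Fig. 6.6
pixels = [[0.6, 0.05, 1, 0],
          [0.1, 0.05, 1, 0],
          [0.9, 0.9, 1, 0],
          [0, 0, 0, 0]]
pooled = max_pool_2x2(pixels)
```

With all weights and the bias set to zero, the sigmoid returns exactly 0.5, which is the decision boundary used later when deciding whether a pattern is predicted to cause a short.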
After the PR model is obtained, the presented approach uses DSPR to help the
detailed placement tool avoid problematic patterns and thereby prevent M2 short
circuits. DSPR consists of three main steps:
6.1 General Issues in Design of Digital Integrated Circuits by Improving. . . 285
$$\mathrm{RNL}\left(A_i^m, A_j^n\right) = k. \tag{6.1}$$
Equation (6.1) [4] describes the RNLs in the ICR between the i-th cell with
direction “m” ($A_i^m$) and the j-th cell with direction “n” ($A_j^n$). During operation
of the method, the placement of cells is changed step by step until
the probability of a short circuit predicted by the M2 model for the placed cells is less
than 0.5 (Fig. 6.7) [4].
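The stopping rule (repeat until the predicted short-circuit probability falls below 0.5) can be sketched as a simple greedy loop. This is an assumption-laden illustration, not the DSPR procedure of [4]: `predict_short` stands in for the trained M2 model, `moves` is a hypothetical non-empty list of functions that each return a modified placement, and the greedy choice of the best move is my own simplification.

```python
def refine_placement(placement, predict_short, moves, max_steps=100):
    """Change the placement step by step until the predicted probability < 0.5.

    placement     -- any object describing the current placement
    predict_short -- callable returning a short-circuit probability in [0, 1]
    moves         -- non-empty list of callables: placement -> new placement
    """
    for _ in range(max_steps):
        p = predict_short(placement)
        if p < 0.5:
            break  # the model no longer predicts an M2 short
        candidates = [move(placement) for move in moves]
        best = min(candidates, key=predict_short)
        if predict_short(best) >= p:
            break  # no candidate move improves the prediction
        placement = best
    return placement, predict_short(placement)
```

For example, with a toy model in which the placement is just a spacing value and the probability decreases as the spacing grows, the loop stops at the first spacing whose predicted probability is below 0.5.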
This ML-based method of evaluating and optimizing the accessibility of standard
I/O cells is fast and reliable, but has some disadvantages, namely:
. A large number of sample designs is needed to train the ML model, since the
effectiveness of the model directly depends on the amount of data used for
training.
. The model must be retrained after any change in the cell floorplan and/or I/O cells,
because applying the previous model in this case may lead to an incorrect
result.
. RNL iterative addition of ICRs during design.
. Only M2 short-circuit faults are predicted using PR and DSPR, although, once
the pattern of standard I/O cells is available, more design rule violations could
be predicted.
. In the process of creating the ML model, the training data (placed and routed
designs) are entered as graphical data, which makes it necessary to perform an
additional step of extracting the pattern of I/O cells from the graphical data.
Taking into account the above, other methods aimed at improving the accessibil-
ity of cell pins were further developed, which, in addition to predicting and
correcting errors at M2 metal level, also perform routability optimization at other
metal levels, estimate the cost of the circuit, and improve IC characteristics by using
SCs of different heights.
Optimization of Design Results by Using Standard Cells with Different Heights
in the Same Design
It is known [14, 30, 31] that using cells with different heights in the same design
improves certain parameters of the designed circuit, namely:
. Using taller cells yields a faster circuit at the cost of increased power
consumption and area.
. Using shorter cells yields a denser and more power-efficient design at the
expense of performance (Fig. 6.8) [3].
In addition to lower performance, cells with a small height have one more
disadvantage: they are more likely to have routability and I/O cell accessibility
issues because their small size leaves room for fewer routing paths. To obtain a
larger fan-out, such cells can be drawn wider, but this approach increases the
parasitic capacitances (PC) of gates and interconnects, which leads to excessive
power consumption and increased area [32].
Taking these problems into account, it can be said that using high and low SCs
in the same design has a positive effect: high cells can be placed where IC
performance matters most, and low cells can be used in places that require
denser placement.
There are various methods to improve the applicability of cells of different
heights in the same design [14, 30, 31]. One such method [3] places the main
emphasis on cells whose heights differ from each other by a non-integer multiple.
The method takes the following input files: the register transfer level (RTL)
description of the finished design, the timing parameters, the logical and physical
models of the SC library, technological information including the different cell
heights, the area of the placement and routing (P&R) region, and the aspect ratio of
the design.
The final goal of the method is the orderly arrangement of logic cells in rows
corresponding to their height. In addition to placement, the method aims to obtain
the smallest possible area and power consumption of the design at the same
performance [33–36].
In addition to the ones listed above, some other checks and conditions are also
applied during the P&R of the design to improve the performance of the method,
namely:
. In order to ensure manufacturability, each fragment of SC placement with a
certain height should have a minimum height of two units [37, 38].
. In order to ensure the continuity of nwell, each fragment must consist of an even
number of rows [39].
. Each fragment must match the structure of metals and polysilicon layer (poly) in
the design.
. The horizontal distance between two fragments should be at least 4 units.
. There should be enough vertical distance between two fragments so that the
power and ground rails of the SCs of the first and second fragments do not touch
each other (Fig. 6.9) [3]. In Fig. 6.9 [3], the minimum inter-metal distance (MID)
of M2 is 64 nm, and the rail widths are 45 nm and 64 nm for low and high cells,
respectively. In this case, even though the difference in the rail widths is smaller
than the MID, the minimum distance d must be 64 nm so that the rails fall on the
routing paths of the metals.
. Divider cells should be placed to maintain the minimum horizontal and vertical
distances.
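Three of the conditions above are simple numeric checks and can be expressed directly. The sketch below is illustrative only: the function name and the representation of a fragment by three numbers are assumptions, and the layer-structure, vertical-spacing, and divider-cell conditions are not modeled.

```python
def fragment_violations(height_units, row_count, horizontal_gap_units):
    """Return the list of violated fragment conditions (empty if none).

    height_units         -- fragment height expressed in units
    row_count            -- number of placement rows in the fragment
    horizontal_gap_units -- horizontal distance to the neighboring fragment
    """
    violations = []
    if height_units < 2:
        violations.append("height below two units")
    if row_count % 2 != 0:
        violations.append("odd number of rows")
    if horizontal_gap_units < 4:
        violations.append("horizontal gap below 4 units")
    return violations
```

A fragment with a height of 2 units, 4 rows, and a 4-unit gap passes all three checks, while one with a height of 1 unit, 3 rows, and a 2-unit gap fails all of them.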
According to the method, the RTL description is taken first, and the gate level
description (GLD) of the design is synthesized using the logic descriptions of SCs
with different heights. Before the P&R of the design starts, the layout exchange
format (LEF) file containing the physical description of the cells is modified in
order to break the circular dependency between the floorplan and the height of
the cells. The heights of all cells are set equal to the smallest cell height, while
the previous values are preserved. The modified LEF file is saved as mLEF
(modified LEF). Such an approach eliminates the uncertainty and allows the current
industrial P&R tools to be used for the further steps of the method [40]. Since the
initial SC logical descriptions are used during placement, the P&R tools estimate
and optimize the design with high accuracy. With this method, no conditions
specific to the use of SCs with different heights are applied during primary
placement, so the P&R tool freely selects the cells most suitable for area and timing
parameters from the libraries of SCs with different heights. Thus, the high cells are
used in time-critical parts, and the low cells are used in area-critical parts.
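The height-normalization idea behind mLEF can be illustrated without parsing a real LEF file. In the sketch below the library is modeled as a plain dictionary from cell name to height; the cell names, the height values, and both function names are hypothetical, and no actual LEF syntax is read or written.

```python
def make_mlef(cell_heights):
    """Set every cell height to the smallest one and keep the originals.

    cell_heights -- dict mapping cell name to its original height
    Returns (modified, original): the flattened heights and a saved copy.
    """
    min_height = min(cell_heights.values())
    modified = {name: min_height for name in cell_heights}
    original = dict(cell_heights)  # preserved for the later restore step
    return modified, original

def restore_heights(modified, original):
    """Restore the preserved original height of every cell."""
    return {name: original[name] for name in modified}

# Hypothetical three-cell library with different heights
library = {"INV_7T": 7, "NAND2_9T": 9, "DFF_12T": 12}
mlef, saved = make_mlef(library)
```

Keeping the original values next to the modified ones is what makes the later conversion back to the original description possible.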
After the first stage of placement, the placement area is divided into regions for
SCs of a specific height, taking into account the area of the divider cells. In this
process, the cell heights are restored to their original values [41–43]. Then, there
are two ways to adjust the placement of cells:
. By cell movement
. By changing the height of the cell
The placement of the cells is considered regular if each of them lies within the
height region designated for it (Fig. 6.10) [3].
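The regularity criterion can be written as a short check. The data layout is an assumption made for illustration: each cell is a tuple (name, height, y), each region is a tuple (y_low, y_high, height), and a cell is attributed to the region that contains its y coordinate.

```python
def is_regular(cells, regions):
    """True if every cell sits in a region designated for its own height.

    cells   -- list of (name, height, y) tuples
    regions -- list of (y_low, y_high, height) tuples, assumed not to overlap
    """
    def region_height(y):
        for y_low, y_high, height in regions:
            if y_low <= y < y_high:
                return height
        return None  # the cell lies outside every region

    return all(region_height(y) == height for _, height, y in cells)
```

A cell that lies in a region of another height, or outside all regions, makes the placement irregular, and it would then have to be moved or have its height changed.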
After the initial placement of cells is completed, the operation of the method can
be divided into three stages:
. Division of floorplan and definition of areas
. Placement and regulation by calculating timing parameters
. Conversion from mLEF file to original LEF file
The presented method of optimizing design results by using standard cells
with different heights in the same design is highly effective in placing and legalizing
cells, but it also has drawbacks, such as:
. The cell description file with the LEF extension is changed, so the files are
modified twice, and according to the method description, this change must be
made for every design.
. Separate placement sections are created for cells of each height, so the physical
design of the circuit is divided into parts in which cells of different heights are
placed at a certain distance from each other.
. After the initial routing and segmentation, a second routing is needed, because
the modified mLEF file is used first and the detailed placement then needs to use
the original LEF file as well.
Due to these reasons, methods were later developed to ensure the applicability of
SCs with different heights in the same design, while predicting the power consump-
tion of the circuit and selecting optimal cells to obtain better results.
Standard Cell Library Characteristics Extraction with Optimal Power Con-
sumption Using a Neural Network
As is known [15, 44, 45], power consumption in circuits can be divided into two
groups (Fig. 6.11) [19]: dynamic and static.
The power consumption of ICs is highly dependent on the power consumption of
SCs, so to ensure low power consumption, cells need to be as optimal as possible.
Currently, there are various methods for reducing SC power consumption [20–
24]. Some of them [15, 43] reduce power consumption during the design of
circuits, while others [44, 45] ensure low consumption at the initial stage of SC
design.
One noteworthy SC power consumption reduction method [15] applies the DL
technique to an existing SC library and returns the most suitable SC library and
the most acceptable constraints for the given design.
In general, the library selection process for a design with optimal/near-optimal
power consumption of the described method [15] can be divided into three main
parts (Fig. 6.12) [15]:
. Training of power consumption estimation model
. Estimation of power consumption
. Library optimization
According to the method presented in Fig. 6.12 [15], some input data are read
first, namely, the different SC libraries, the descriptions of several circuits, the
frequencies of the circuits, and the power consumption of each circuit. The input
information is then given to the power cost estimation module, which includes a
DL network. After training, to estimate the power consumption of any design, the
method must be provided with the SC library, the description of the new circuit,
and the required frequency as input data. Information about the SC library here
includes:
. Height of cells in them
. Heights of p and n domains
. Supply voltages
The power consumption estimation model is a two-layer neural network with
one hidden and one output layer (Fig. 6.13) [15]. The network takes n input
features and produces y output features. SC library data and circuit information
serve as input features for model training. The output of the model is the total
power consumption of the circuit.
The DL model is presented in the following way [15]:
$$f(x) = W_0 \cdot h_1 + b_0, \tag{6.2}$$
where f(x) and h1 are the results of the output and hidden layers, respectively, W
denotes the weight matrices, and b denotes the bias values of the layers. The neural
network also contains a continuous nonlinear activation function that is not
differentiable everywhere [15]:
Fig. 6.13 The structure of the ML model for power consumption estimation
The TensorFlow library [25] was used to build the power estimation model. The
model uses the adaptive moment estimation (Adam) stochastic optimization
algorithm to reduce the error between the actual and predicted values [26]. To
avoid overfitting, dropout (omitting some neurons during training) is also
used [27].
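The forward pass of such a two-layer network, with the output layer in the form of Eq. (6.2), can be sketched in plain Python. Two points are assumptions: the text only states that the activation is continuous, nonlinear, and not differentiable everywhere, and ReLU is used here as one function with those properties; and the weights, biases, and function names are hypothetical rather than taken from the TensorFlow model of [15].

```python
def relu(values):
    """ReLU: continuous and nonlinear, not differentiable at zero (assumed activation)."""
    return [max(0.0, v) for v in values]

def affine(weights, inputs, biases):
    """Compute W * x + b for a weight matrix given as a list of rows."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def predict_power(x, w1, b1, w0, b0):
    """Hidden layer h1 = relu(W1 * x + b1), then output f(x) = W0 * h1 + b0."""
    h1 = relu(affine(w1, x, b1))
    return affine(w0, h1, b0)[0]

# Hypothetical weights for two input features and two hidden units
estimate = predict_power([1.0, 2.0],
                         w1=[[1.0, 0.0], [0.0, -1.0]], b1=[0.0, 0.0],
                         w0=[[2.0, 3.0]], b0=[0.5])
```

Training with Adam and dropout, as described above, would adjust the weights and biases; the sketch only shows how a trained network turns input features into a single power estimate.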
Once the power consumption estimation model is obtained, the method searches
the imported libraries for the optimal or near-optimal one. All 24 imported libraries
are used in parallel for this process and are given to the power estimation model as
input data. According to the method, all libraries are checked in a simple
step-by-step way, and the selected library is returned as the result
(Fig. 6.14) [15].
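The step-by-step check of all libraries amounts to a linear scan that keeps the library with the lowest estimated power. In the sketch below, `estimate_power` stands in for the trained model, and the library fields and the toy power formula in the usage lines are hypothetical and serve only to make the example runnable.

```python
def select_library(libraries, circuit, frequency, estimate_power):
    """Scan all libraries and return (library, power) with the lowest estimate.

    libraries      -- non-empty list of library descriptions
    estimate_power -- callable(library, circuit, frequency) -> estimated power
    """
    best = None
    for lib in libraries:
        power = estimate_power(lib, circuit, frequency)
        if best is None or power < best[1]:
            best = (lib, power)
    return best

# Toy stand-in for the trained model (not the model of [15])
toy_model = lambda lib, circuit, f: lib["voltage"] ** 2 * lib["height"] * f
libs = [{"height": 7, "voltage": 1.2}, {"height": 7, "voltage": 0.8},
        {"height": 14, "voltage": 0.8}]
chosen, _ = select_library(libs, "s526", 100.0, toy_model)
```

Because every library is evaluated once, the cost of the search grows linearly with the number of imported libraries.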
Table 6.1 Properties of the applied SC libraries
Number   Height of gates   P/N value   Voltage (V)
1        7                 1           1.2
2        7                 1           0.8
3        7                 2           1.2
4        7                 2           0.8
5        7                 3           1.2
6        7                 3           0.8
7        9                 1           1.2
8        9                 1           0.8
9        9                 2           1.2
10       9                 2           0.8
11       9                 3           1.2
12       9                 2           0.8
13       12                1           1.2
14       12                1           0.8
15       12                2           1.2
16       12                2           0.8
17       12                3           1.2
18       12                3           0.8
19       14                1           1.2
20       14                1           0.8
21       14                2           1.2
22       14                2           0.8
23       14                3           1.2
24       14                3           0.8
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - y'_i\right)^2, \tag{6.4}$$
$$\mathrm{DC} = 1 - \frac{\sum_i \left(y_i - y'_i\right)^2}{\sum_i \left(y_i - y''_i\right)^2}, \tag{6.5}$$
where $y_i$ is the actual value, $y'_i$ is the predicted value, $y''_i$ is the average of actual
values, and n is the number of all training data. According to the definition, the
accuracy of the model is higher when the MSE has a minimum value and the DC has
a maximum value [47–49] (Table 6.2).
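Both metrics of Eqs. (6.4) and (6.5) are easy to compute directly. The following sketch uses hypothetical function names and small made-up numbers purely to show the arithmetic; it does not reproduce the data of the described experiment.

```python
def mse(actual, predicted):
    """Mean squared error: average of squared differences, as in Eq. (6.4)."""
    n = len(actual)
    return sum((y - yp) ** 2 for y, yp in zip(actual, predicted)) / n

def determination_coefficient(actual, predicted):
    """DC = 1 - residual sum of squares / total sum of squares, as in Eq. (6.5)."""
    mean = sum(actual) / len(actual)
    ss_res = sum((y - yp) ** 2 for y, yp in zip(actual, predicted))
    ss_tot = sum((y - mean) ** 2 for y in actual)
    return 1 - ss_res / ss_tot

# Made-up values: one of three predictions is off by 1
actual = [1.0, 2.0, 3.0]
predicted = [1.0, 2.0, 4.0]
```

For these values the squared errors sum to 1, so the MSE is 1/3; the actual values have a total sum of squares of 2 around their mean, so the DC is 1 - 1/2 = 0.5. A perfect prediction gives an MSE of 0 and a DC of 1.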
After the model was built with ten different train-test data sets, the average
value of the MSE was 0.304 and that of the DC was 0.962.
Three new circuits from ISCAS-89 [29] (s526, s713, and s15850) were used for
model validation. Simulation after synthesis with the target frequencies yielded
two different values of power consumption. Finally, the MSE values were
calculated again, according to which:
. Good results were obtained for s526 and s713: 0.227 and 0.096, respectively.
. A relatively large value of 1.230 was obtained for s15850.
The evaluations showed the following.
With the library where the supply voltage is 0.8 V and the cell height is minimal,
optimal designs are obtained in the frequency range of 100 kHz to 500 MHz.
Although reducing the supply voltage can reduce power consumption, it also
slows down the circuit, resulting in timing violations. Therefore, libraries with a
high supply voltage (1.2 V) and a large cell height are suitable for designing high-
frequency circuits, between 700 MHz and 1 GHz.
At the end of the operation of the described method, the generalized data are
presented in the form of a table (Table 6.3) [15].
With the method of extracting the characteristics of the library of standard cells
with optimal power consumption using a neural network, it is possible to extract the
optimal library for the given design with a high degree of accuracy. However, it has
certain disadvantages, namely:
. To extract the optimal library information, a large number of libraries are read,
which may have many parameters. Along with changing the parameters, it is
necessary to change the ML model and retrain the whole system [50, 51].
. The method does not work with similar high accuracy in the case of different
circuits of the same order.
. The method essentially does not optimize the circuit and/or cells in the case of
optimization at a given frequency [52].
. Not only is the optimization/calculation of timing parameters not included in the
optimization process, but also those libraries that violate one or another timing
value are ignored during the calculation [16].
In order to eliminate the listed drawbacks, approaches to reduce IC power
consumption by optimizing the characteristics of digital cells were further devel-
oped, which, in addition to reducing the total power consumption of circuits, also
reduce the number of libraries needed for optimization process and optimize timing
parameters of the circuit.
A Method of Adding Additional Metals by Calculating Timing Parameters of a
Circuit
Currently, additional limitations are imposed during IC manufacturing due to the
very small size of the circuits being produced [53–58]. Besides having clean DRC
and LVS results, parasitic characteristics in the acceptable range, sufficient timing
parameters, low power consumption, etc., a manufactured circuit must also meet
special manufacturing rules. Among the methods aimed at increasing performance
in designs are:
. Addition of cells without logic
. Increasing the size of interconnections where possible
. Addition of additional metals
. Verification and optimization of antenna effect
. Limitation of the minimum area of layers
Some of the listed methods have little effect on the operation and timing
parameters of a circuit, while others have a considerable effect and can lead to
deterioration of timing, power, and other parameters.
The addition of additional metals is one of the methods that has a great impact on
the timing parameters of a circuit. The expediency of using this method is
conditioned by several circumstances, namely:
. Ensuring the manufacturability of the circuit for advanced technological
processes
. Filling the empty gaps of the circuit with “dummy” metals
Side capacitance can occur between two conductors that are in the same metal
layer and overlap horizontally (Fig. 6.16) [53].
In Fig. 6.16 [53], C1 and C2 are the capacitances that metals A and B,
respectively, form with metal C, and C3 is the mutual capacitance of metals A and
B. The side capacitance Cs is calculated by (6.7) [53], where l is the overlap length
of the metals and Pl(d) is the capacitance per unit length, which depends on the
distance d between the metals.
Edge capacitance can occur between conductors that are located on different
metal layers, have overlap of parallel edges, but do not have overlap of area pro-
jections (Fig. 6.17) [53].
In Fig. 6.17, metal C together with metals A and B creates edge capacitances C1
and C2, and A and B together create capacitance C3. In the general case, the
capacitance Ce between layers l1 and l2 is calculated by (6.8) [53], where P12(d)
and P21(d) are the edge capacitances per unit length, which depend on the
distance d between the metals.
The capacitance of a circuit to the ground point of all wires is called “equivalent”
and can be calculated by analyzing the network using the conductance matrix.
Thus, the metals added to circuits for these purposes lead to additional parasitic
characteristics and deterioration of timing parameters; therefore, when adding these
[...]
corners and sometimes holes inside. Because of this, one more step is necessary:
converting polygons to rectangles [31]. It should be noted that the quality of the
subsequent addition of additional metals is highly dependent on this conversion,
so the improved method of converting polygons to rectangles (IPRC) is used
during this process. It iteratively considers each given vertex and efficiently
creates rectangles [31].
Later, the IPRC method is extended into the direction-specific
polygons-to-rectangles (DPRC) method, where the shapes are described as “wide”
or “high” rectangles depending on whether the preferred routing direction of a
given metal layer is horizontal or vertical, respectively. Then, on the basis of the
obtained results, the vertical and horizontal revision algorithm is applied, which
merges the regions to be filled that share a common side (Fig. 6.19) [53].
Verification has shown that, compared to the IPRC method, DPRC significantly
increases the upper limit of metal density and allows better solutions to be
introduced later, which is especially important when additional metals are
introduced in lower metal layers.
The next step in the described method is to plan the target density. In general, for
efficient calculation of the minimum metal density, a verification window of size
w × w is taken and shifted across the area of the circuit with a step of w/2. To
satisfy such a verification, it is advisable to divide the circuit into w/2 × w/2
subwindows when adding the additions and immediately apply the minimum
metal density condition to the obtained windows. On the other hand, larger window
sizes are more efficient for the synthesis of additions, since they provide greater
flexibility in adding additions and splitting local additions. Therefore, instead of
using the above subwindows, the described method uses a new approach, target
density planning, which creates a window-specific density value in each window,
taking into account two conditions:
where:
. $a^{c}_{i,j}$ is the area of the critical wires inside window $W_{i,j}$.
. $a^{nc}_{i,j}$ is the area of the noncritical wires inside window $W_{i,j}$.
. $w_{c}$ and $w_{nc}$ are the coefficients of $a^{c}_{i,j}$ and $a^{nc}_{i,j}$, respectively.
. e is an infinitesimally small constant.
Equation (6.9) [53] consists of two parts (6.11 and 6.12) [53], where the first part
contributes to reducing the target density in deterministic and/or multiwired
windows.
$[\ldots]_{i,j} - D_{i,j}, \qquad M \le D_{\max} \qquad (6.14)$
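The window-based density check described for this step (a w × w window shifted across the circuit with a step of w/2) can be illustrated with a minimal sketch. The function names, the rectangle format `(x1, y1, x2, y2)`, and the assumption that the metal rectangles do not overlap one another are illustrative choices, not part of the described method.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two axis-aligned rectangles."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0.0


def window_densities(metal: List[Rect], chip_w: float, chip_h: float,
                     w: float) -> Dict[Tuple[int, int], float]:
    """Metal density of every w x w window, with windows shifted by w/2.

    Assumes the metal rectangles are mutually non-overlapping.
    """
    step = w / 2
    densities = {}
    i = 0
    while i * step + w <= chip_w:
        j = 0
        while j * step + w <= chip_h:
            window = (i * step, j * step, i * step + w, j * step + w)
            covered = sum(overlap_area(window, rect) for rect in metal)
            densities[(i, j)] = covered / (w * w)
            j += 1
        i += 1
    return densities


def windows_below(densities: Dict[Tuple[int, int], float],
                  d_min: float) -> List[Tuple[int, int]]:
    """Indices of the windows that violate the minimum density d_min."""
    return [key for key, value in densities.items() if value < d_min]
```

Windows returned by `windows_below` are the ones in which additions would have to be inserted to reach the minimum density.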
After the target densities are created, the metal additions are synthesized (i.e.,
additional metal shapes that carry no logical connections are inserted into the free
areas of the circuit). The additions created at this stage are of particular importance
for preventing the deterioration of the timing parameters of the circuit, because they
introduce additional capacitances into the circuit. The most
important points to take into account when creating additions are [32–34]:
. Preventing area capacitances between critical wires and
inserted additions
. Increasing the metal distance between additions and other conductors as far as possible
. Avoiding the insertion of additions between two parallel conductors
. Reducing the parallel run length between additions and parallel conductors
After the variables of the general addition-creation algorithm are initialized,
the windows are sorted in descending order of the difference between their
densities and the maximum possible density. Then, the iterative
addition operation is performed on them (Fig. 6.20) [53].
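The ordering of windows before iterative addition can be sketched as follows. This is only an illustration under stated assumptions: the dictionaries of current and target densities, the fixed fill increment `step`, and the function names are hypothetical, and the actual iterative addition of [53] also weighs timing, which is not modeled here.

```python
from typing import Dict, List


def order_windows(current: Dict[str, float], d_max: float) -> List[str]:
    """Sort windows in descending order of the gap to the maximum density."""
    return sorted(current, key=lambda key: d_max - current[key], reverse=True)


def iterative_fill(current: Dict[str, float], target: Dict[str, float],
                   d_max: float, step: float = 0.05) -> Dict[str, float]:
    """Add fill in fixed increments until each window reaches its target.

    The target of a window is clipped to d_max, so the maximum density
    is never exceeded.
    """
    result = dict(current)
    for key in order_windows(current, d_max):
        goal = min(target[key], d_max)
        while result[key] + step <= goal + 1e-12:
            result[key] = round(result[key] + step, 10)
    return result
```

With `current = {"a": 0.10, "b": 0.30, "c": 0.20}` and `d_max = 0.5`, window `a` has the largest gap and is processed first, followed by `c` and `b`.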
condition of later post-modification and create new ML models for each design,
resulting in an increase in design time and the number of description files. In
addition, when the library that is optimal in terms of power
consumption is chosen for a given circuit, a large number of libraries are used, which
increases tool runtime and the influence of the human factor on the design
process.
. Methods of creating metal fillings do not take into account the possible voltage
drop during their creation and are not aimed at improving it, while their addition
to the design can significantly improve the results of IC voltage drop and reduce
the resources spent on it later.
For the reasons listed, there is a need to develop new principles and means of
effective IC design by improving the characteristics of digital cells, which,
taking into account the issues presented above, will make it possible to optimize circuits.
Thus, to solve the aforementioned problems, the following principles are
proposed for the effective design of integrated circuits by improving the
characteristics of digital cells:
Optimization of all metal layers, unlike the methods known from the literature, which
optimize only some of them. Since the methods known from the literature
leave the remaining metal layers out of consideration, it is proposed to create
an environment for checking and optimizing the accessibility of I/O pins.
The latter will check the I/O pins by creating special matching cases for cells and
mark those cells that have accessibility problems. The main difference of this
approach, compared to existing methods, is that all cells are collected in one
complete design and combined according to a special pattern (Fig. 6.21), as a result
of which an opportunity is created to check the routability of cells in dense place-
ment conditions in cases of different directions of cells.
Using the ML algorithm, possible problems will be predicted as a result of cell
routing, and inter-cell distance rules necessary for their solution will be created. In
this case, the main difference compared to existing means is that the multi-class
classification method is used during the training of ML model, which makes it
possible to check and optimize other metal layers as well (Fig. 6.22) [35].
The use of filler cells created by ML during the placement of cells with different
heights. This method, in contrast to the methods known from the literature, in the
case of placement of cells with different heights, uses filler cells created by ML.
As a result, the cells can be placed without a large change in the total IC
area and without violating the rules for power and grounding buses and the main layers
(Fig. 6.23).
The addition of metal fillings connected to the power and grounding grid during the IC
design process. Metal additions increase the overall capacitance of the
circuit but also have an important effect on other stages of IC design, so it is possible
to optimize the voltage drop on the power and grounding buses of the circuit when
adding them. The main difference of this method from the others is that the metal
additions inserted into the circuit are connected to the power and ground wires
(Fig. 6.24). As a result, the influence of rapidly changing signal wires
on the other wires in the design is reduced and the voltage drop improves. Such a
change significantly reduces the voltage drop at the expense of a certain
deterioration of timing parameters.
The proposed principles of effective design of integrated circuits by improving the
characteristics of digital cells will make it possible to significantly improve the main
technical characteristics and parameters of SCs: power consumption, designability,
and parasitic characteristics.
Conclusions
1. Development of effective integrated circuit design tools by improving the char-
acteristics of digital cells is relevant, as it can significantly improve the main
parameters and characteristics of digital integrated circuit designs based on them.
2. Analysis of the existing approaches to and means of effective
design of integrated circuits by improving the characteristics of digital cells
shows that important parameters of the circuits designed with these cells,
such as power consumption, voltage drop in the power supply, and routability,
are not sufficiently optimized. From the point of view of the practical
requirements of modern digital integrated circuit design, this degree of
optimization is not enough, and, especially from the point of view of efficiency,
there is a need to develop new methods, algorithms, and software tools that are
significantly superior to the existing ones.
3. Principles of effective design of integrated circuits by improving the
characteristics of digital cells have been proposed, which, at the expense of machine
time and an insignificant deterioration of the timing parameters and area of the
designed integrated circuit, make it possible to significantly improve the main parameters
of integrated circuits designed based on them: power consumption, voltage drop
in the power supply, routability, etc.
308 6 Design of Digital Integrated Circuits by Improving the Characteristics. . .
As mentioned in Sect. 6.1, the accessibility of the I/O pins of standard cells plays an
extremely important role in the IC design process. Improving it leads to easier circuit
routing and fewer DRC violations. Two methods are used to improve I/O pin
accessibility: a relatively simple one, and a predictive, optimizing one that uses
ML algorithms.
Fig. 6.25 The block diagram of the method of optimizing access to I/O cells SCs with an
experimental design
6.2 Methods of Design of Digital Integrated Circuits by Improving. . . 309
Fig. 6.30 Comparison of verification process times of existing and proposed methods
the resulting libraries. Then, with the new libraries obtained, three types of designs
with simple RTL description, I/O accessibility verification algorithm, and applica-
tion of the presented method were made. DRC and LVS checks were performed on
the circuits obtained at the end of the design.
According to the experimental results, with the proposed method
the cell library verification time decreased by approximately 5784.3 s, i.e., verification
became approximately 9.4 times faster than with the I/O cell verification method (Fig. 6.30). Under the
conditions of the proposed placement of cells, the number of cases of their matching
Fig. 6.31 Comparison of cases of cell matching of existing and proposed methods
In order to predict the accessibility of I/O pins of SCs in the mid-design phase and to
increase their routability during design, an enhanced method of I/O cell accessibility
prediction and design optimization with machine learning is proposed. The operation
of this method can generally be divided into three parts (Fig. 6.32). They are:
. Extraction of I/O coordinates from designs
. ML model training
. Creation of horizontal and vertical ICRs for SCs
Fig. 6.32 Steps of an enhanced method for predicting the accessibility of ML I/O cells and
optimizing the design
Extraction of I/O coordinates from designs is one of the first and most important
steps of the proposed method, because the accuracy of later-created ML model
depends on the mentioned coordinates. Therefore, it is necessary that, on the one
hand, the extracted coordinates are accurate, and on the other hand, they contain
quite simple information, taking into account the speed of further processes. To
extract I/O coordinates from a design, the method first reads various designs into
memory, along with their corresponding DRC results. Then, using the simple
commands of placement and routing tools, it extracts the reference name of the
used cells [68], the metal layer creating I/O cells, and their coordinates. The obtained
results are saved in a predetermined special format (Table 6.4). As during the
extraction of I/O pins, not the topological description of design but the design library
[69] and simple commands of placement and routing tool are used, this process does
not take much time to perform.
After extracting the I/O coordinates, the proposed method creates a ML model to
predict the problematic cells. As a result of model operation, it is possible to identify
at most seven types of DRC violations:
Both packet layers have training weights, the values of which change depending
on the presence of DRC violations in the given segment. In the next step, the
extracted characteristics are passed to the classification node. This node
multi-classifies the input data according to several labels. A one-versus-all type of multi-
layer classification is used here [70, 71]. To apply the one-versus-all type of
classification, the entire data set is divided into eight separate groups, with the
training group labeled as “1” and all others as “0.” In this case, when the data is
entered into the classification algorithm, it goes through the process of binary
classification, and in the case of a predicted violation, a “positive” value is generated,
and in the case of all others, a negative value (6.18).
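The one-versus-all scheme described above can be illustrated with a minimal sketch: for each class, the data set is relabeled so that the class becomes "1" and all others "0", and a separate binary classifier is trained. The perceptron used here is a simple stand-in chosen for brevity, not the classification node of the proposed method; the two-dimensional features, the three class names, and all function names are hypothetical.

```python
from typing import Dict, List, Sequence


def binarize(labels: Sequence[str], positive: str) -> List[int]:
    """One-versus-all relabeling: the selected class becomes 1, all others 0."""
    return [1 if label == positive else 0 for label in labels]


def score(weights: Sequence[float], x: Sequence[float]) -> float:
    """Linear score w . x + b; the bias b is stored as the last weight."""
    return sum(w * xi for w, xi in zip(weights, x)) + weights[-1]


def train_binary(xs: Sequence[Sequence[float]], ys: Sequence[int],
                 max_epochs: int = 1000) -> List[float]:
    """Perceptron training; a positive score means 'this class is predicted'."""
    weights = [0.0] * (len(xs[0]) + 1)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            sign = 1.0 if y == 1 else -1.0
            if sign * score(weights, x) <= 0:  # misclassified: update weights
                for i, xi in enumerate(x):
                    weights[i] += sign * xi
                weights[-1] += sign
                mistakes += 1
        if mistakes == 0:
            break
    return weights


def train_one_vs_all(xs, labels) -> Dict[str, List[float]]:
    """Train one binary classifier per class."""
    return {c: train_binary(xs, binarize(labels, c)) for c in sorted(set(labels))}


def predict(models: Dict[str, List[float]], x: Sequence[float]) -> str:
    """Return the class whose binary classifier gives the highest score."""
    return max(models, key=lambda c: score(models[c], x))


# Hypothetical, linearly separable toy data: three classes in two dimensions.
FEATURES = [(0, 0), (1, 0), (0, 1), (5, 0), (6, 1), (5, 1), (0, 5), (1, 6), (1, 5)]
LABELS = ["none"] * 3 + ["short_M1"] * 3 + ["spacing_M2"] * 3
MODELS = train_one_vs_all(FEATURES, LABELS)
```

In the method itself the data set is divided into eight groups rather than three, but the relabeling step is the same for any number of classes.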
When the cells that are likely to cause DRC violations during routing have been
identified in the design, the proposed method creates ICRs for them to
prevent the given violation. These ICRs can later be used in any design made
with these cells. To prevent DRC violations, the method calculates the existing
distance between the I/O pins of the problematic cells and then, using the technology
information, adds an ICR value such that the probability of the violation becomes less than
0.5. For example, for cells U1 and U2 in Fig. 6.34, the method predicted a short
circuit on M1 and, to avoid the problem, created an ICR of
2 units.
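The selection of an ICR value can be sketched as a search for the smallest additional spacing at which the predicted violation probability falls below 0.5. The logistic function below is only a hypothetical stand-in for the trained ML model, and `required_icr`, `grid_step`, and the default parameters are assumptions introduced for illustration.

```python
import math
from typing import Callable


def violation_probability(distance: float, midpoint: float = 3.0,
                          slope: float = 2.0) -> float:
    """Hypothetical stand-in for the ML model: the probability of a DRC
    violation decreases as the distance between the pins grows."""
    return 1.0 / (1.0 + math.exp(slope * (distance - midpoint)))


def required_icr(current_distance: float, grid_step: float,
                 predict: Callable[[float], float] = violation_probability,
                 threshold: float = 0.5, max_steps: int = 100) -> float:
    """Smallest multiple of grid_step that, added to the current distance,
    brings the predicted violation probability below the threshold."""
    for extra in range(max_steps + 1):
        if predict(current_distance + extra * grid_step) < threshold:
            return extra * grid_step
    raise ValueError("no spacing within max_steps satisfies the threshold")
```

With the stand-in model, a pair of cells whose pins are 1.5 units apart needs an ICR of 2 units, while a pair that is already 4 units apart needs none.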
During ICR creation, cell directions are considered to avoid excessive constraints.
As a result, the process of creating ICRs can be represented by (6.19):
where $C_{1,x}$ represents the first cell in the X direction and $C_{2,y}$ is the second cell in the
Y direction. The required distance between them is given by the number A.
Thus, as a result of the application of the ML model, it is possible to predict and
later also avoid routing problems arising during a specific arrangement, as well as to
have a larger number of metal layers to be checked.
Evaluation of the Effectiveness of the Proposed Machine Learning Enhanced
I/O Cell Accessibility Prediction and Design Optimization Method
SAED14nm [67] libraries were used to evaluate the effectiveness of the above
method. Designs were implemented with them in which the cell density varies
in the range of 30–40%. In this technological process, the number of cells for each
threshold voltage is approximately 1000. Approximately 6000 ML data samples were taken
from the designs and then divided into training and testing
sets with a ratio of 70:30. The system on which the checks were run has the following
parameters: 2 cores with a frequency of 2.4 GHz, Linux operating system. The
obtained results were then compared with the simple design methods presented in [63]
(Table 6.5).
From the point of view of metal layers different from M2, the simple design
approach and the presented method were also compared (Table 6.5).
The accuracy of the ML model for all DRC types was also calculated using (6.20)
(Table 6.6).
$$\text{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100\% \qquad (6.20)$$
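As a worked illustration of Eq. (6.20), the accuracy can be computed directly from the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). The function name and the sample counts are hypothetical and are not taken from Table 6.6.

```python
def accuracy_percent(tp: int, tn: int, fp: int, fn: int) -> float:
    """Accuracy in percent according to Eq. (6.20)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one prediction is required")
    return 100.0 * (tp + tn) / total


# Hypothetical counts: 85 of 100 predictions are correct.
example = accuracy_percent(tp=45, tn=40, fp=10, fn=5)
```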
Fig. 6.35 The result of logical synthesis with cells of different heights
As already mentioned above, using cells with different heights in the same
design increases circuit performance and reduces power consumption. Logic
synthesis tools can easily choose, with the greatest possible accuracy, the cell height
that provides maximum performance and minimum
power consumption on the given timing path (Fig. 6.35). The main problem with such
synthesis is that the physical placement and routing tool is not
able to carry out the corresponding physical implementation with the same ease.
Fig. 6.36 The flow of optimizing SCs when doing design of cells having different heights
A method for increasing the accessibility of cells with different heights in the
same design is proposed [72], which, by modifying the GDS file of physical
description of cells in the given library, makes it possible to place them regardless
of their height. The flow of the proposed method (Fig. 6.36) mainly consists of two
parts: changing the power and grounding buses of SCs and implementing the design
with modified cells [73–75].
Basically, SCs have the same structure of power and grounding buses [76]. Such
a structure allows for easy connection of cells to circuit-level power and grounding
grid during the design phase when placing the cells next to each other. And when
combining cells in vertical direction, they are placed straight and upside down,
sharing the given power or grounding bus [77–79].
When SCs of different heights are placed next to each other, the arrangement of
their power and ground buses, as well as many other main layers, does not match
correctly, which makes such placement impossible (Fig. 6.37).
As described, the combination of cells with different heights is unacceptable from
the point of view of cell placement approaches and power and grounding buses [80–
82]. The proposed approach to solve these problems, using ML algorithms, modifies
the power and grounding grid of SCs as well as creates special filler cells for correct
placement of cells with different heights.
The method begins by reading the GDS description of the SC library
and separating the layers used for placement and routing. The collected information is
then passed to the ML algorithm, which reads and stores it.
In the next stage, the ML model, reading the previously prepared information,
specifies the metals belonging to power and grounding grid [83]. In this process, the
basic rules of the metal layers are also extracted: minimum distance, minimum angle
length, minimum distance of the same wire, etc. Then, on those cells where it is
possible to add additional power and grounding metals, the method adds them
in accordance with the rules. In general, the power and
ground buses of a cell can be changed if:
. There are no signal and/or power and grounding wires in the specified areas.
. The addition of power and grounding wires does not lead to violation of rules of
metal layers.
. All minimum distance rules are kept.
. The additional supply and grounding buses are of the same type as the cell’s
buses.
. Added layers intersect with M1 and M2 routing paths.
All the cells, whose structure and the view of power and grounding buses do not
correspond to the mentioned points, are left out of the change process [84, 85].
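Two of the conditions listed above, namely that the target area is free of other wires and that all minimum distance rules are kept, can be illustrated with a minimal geometric check. The rectangle format `(x1, y1, x2, y2)`, the function names, and the use of a single Euclidean spacing value are assumptions; real design rules are considerably richer, and the remaining conditions are not modeled here.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def rect_distance(a: Rect, b: Rect) -> float:
    """Euclidean distance between two axis-aligned rectangles
    (0.0 if they overlap or touch)."""
    dx = max(b[0] - a[2], a[0] - b[2], 0.0)
    dy = max(b[1] - a[3], a[1] - b[3], 0.0)
    return math.hypot(dx, dy)


def can_add_bus(candidate: Rect, existing: List[Rect], min_spacing: float) -> bool:
    """True if the candidate bus neither overlaps nor touches any existing
    shape and keeps at least min_spacing to every one of them."""
    for shape in existing:
        distance = rect_distance(candidate, shape)
        if distance == 0.0 or distance < min_spacing:
            return False
    return True
```

A cell for which `can_add_bus` returns `False` for every candidate position would, in the terms of the method, be left out of the change process.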
Another important circumstance for the change of SCs is the selection of their
placement limit. Current placement and routing tools determine the legal location of
cells when placing them, depending on their starting point [86, 87]. For this reason,
in order to be able to place the modified cell using placement and routing tools, their
starting point is raised and aligned with the starting point of the small height cells
(6.21).
After the corresponding changes in the power and grounding grids, three types of
filling cells are created (Fig. 6.38) for filling the lines in places of matching. These
filling cells have two main uses:
. Keep one minimum distance between high and low cells
. Fill the free lines after placement and routing processes
If the free area is greater than one minimum distance, the filling cells can be
combined.
The method produces three types of output files: GDS
descriptions of the modified cells, GDS descriptions of the added filling cells, and the
list of cells that have not been modified.
In order to design with the obtained library, first, using the initial logical descrip-
tion of cells, a logical synthesis is performed. As the logic synthesis tool tries to
reduce circuit area and increase performance, it will select the required cell from the
given libraries.
In the physical design phase, the minimum size of the placement grid is chosen
according to the cells with the lowest height. After that, the circuit-level power and
grounding buses are created. They are mostly M2 or higher metal layers. Then
placement and routing is done, and then filling of empty places using the SC library
and filling cells, created according to the method.
Thus, by changing the physical descriptions of SCs with different heights, the proposed
method makes it possible to use them in the same design.
During the operation of this method, only the GDS descriptions of the SC library are
modified, and then placement and routing are performed using standard tools.
Evaluating the Effectiveness of Optimization Method of SCs
for the Implementation of Design with Cells of Different Heights
To evaluate the effectiveness of the proposed method, three designs were performed
with libraries of 9T (low) and 12T (high) heights. At the end of the implementation
of designs, the physical and timing parameters of obtained circuits were compared
(Table 6.8).
Thus, as a result of using the proposed method, it is possible to obtain approx-
imately 19% optimization in circuit area and approximately 14.3% optimization in
timing parameters, compared to circuits designed with only 12T and only 8T
libraries, respectively. However, as a result of the changes made, the time spent
for placement and routing increased by approximately 27%, and the power con-
sumption increased by only 13%.
As mentioned earlier, IC power consumption has two component parts: dynamic and
static. One of the options used to reduce power consumption [14] is to disconnect
SCs that are not currently used in the circuit from the power supply and grounding
grid. Thus, the given part of the circuit will have as low power consumption as
possible (there will be a certain amount of consumption due to leakage currents).
However, implementing this approach requires either special libraries or more
complex design steps.
In order to overcome the described challenges, the method of integration of “sleep
mode” in SCs using a neural network is proposed [88]. The operation of this method
(Fig. 6.39) can be divided into five main parts:
. Creation of format of input files and reading
. Extraction of technology-specific parameters with the DL algorithm
. Creation of special cells
. Indication of cells to be changed
. Changing cells and extracting final files
Fig. 6.39 Operating principle of “sleep mode” integration method in SCs using a neural network
According to this method, all cells that can be optimized are disconnected from
the main power source, and their connection is provided by special cells. That is,
conditional “false” supply and grounding buses are formed, to which the cells are
connected (Fig. 6.40).
In the first step, the proposed method reads the GDS description of the SC library
and extracts the coordinates of metals, interconnects, and blocking layers from it
[89]. The output information of this stage is the positions of different layers and, if
any, the names of I/O cells attached to them. The information is then given to a data
extraction algorithm, which includes a three-layer DL model. The first of the layers
is the input layer, followed by two hidden layers and one output layer [90]. The
output layer of the model returns the min values extracted for the following metal
rules: distance, overlap, edge distance, area, and U-distance. These data are later
used in the process of creating special types of cells [91, 92].
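The structure of such a model (an input layer, two hidden layers, and an output layer with one value per metal rule) can be illustrated by a forward pass written in plain Python. The weights below are random and untrained, so the outputs carry no physical meaning; the layer sizes, the ReLU activation, and all names are assumptions, since the source does not specify them.

```python
import random
from typing import Dict, List, Sequence

RULES = ["distance", "overlap", "edge_distance", "area", "u_distance"]

Layer = List[List[float]]  # one row per neuron: input weights followed by a bias


def make_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Create a dense layer with small random weights and zero biases."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] + [0.0]
            for _ in range(n_out)]


def dense(layer: Layer, inputs: Sequence[float], relu: bool) -> List[float]:
    """Apply one dense layer, optionally followed by a ReLU activation."""
    outputs = []
    for row in layer:
        z = sum(w * x for w, x in zip(row, inputs)) + row[-1]
        outputs.append(max(0.0, z) if relu else z)
    return outputs


def build_model(n_features: int, hidden: Sequence[int] = (8, 8),
                seed: int = 0) -> List[Layer]:
    """Two hidden layers followed by an output layer with len(RULES) neurons."""
    rng = random.Random(seed)
    sizes = [n_features, *hidden, len(RULES)]
    return [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def predict_rules(model: List[Layer], features: Sequence[float]) -> Dict[str, float]:
    """Forward pass: ReLU on the hidden layers, linear output layer."""
    activations = list(features)
    for index, layer in enumerate(model):
        activations = dense(layer, activations, relu=index < len(model) - 1)
    return dict(zip(RULES, activations))
```

In practice the weights would be obtained by training on layer coordinates extracted from the GDS description, and the five outputs would be interpreted as the minimum values of the corresponding rules.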
The creation of special cells is the next stage of the proposed method. At this
stage, the method, using the extracted values, creates special power and grounding
control cells, which can be of three types (Fig. 6.41):
. P-MOS and N-MOS control
. P-MOS control, N-MOS filling
. P-MOS filling, N-MOS control
In the next step, the method reads the cells of the SC library one by one and marks
those that are suitable for modification. The criteria are:
. The SC has a power and/or ground connection close to the edge.
. There are no large amounts of routing blockages and metals near the cell edges.
For the cells that meet the mentioned requirements, the power and grounding grid
is changed, and special cells are added [93]. If the power and grounding connections
of the cell are located on one side of it, then only one special cell is added; otherwise,
according to the appearance of the connections, two cells are added (Fig. 6.42). After
that, the SC initial power and ground connections are disconnected and connected to
the control cell.
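The marking of cells and the choice between one and two special cells can be sketched as follows. The `CellInfo` fields, the numeric thresholds, and the function names are hypothetical; the source states the criteria only qualitatively.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class CellInfo:
    name: str
    power_side: str       # side of the cell with the power connection, e.g. "left"
    ground_side: str      # side of the cell with the ground connection
    pin_to_edge: float    # distance from the power/ground connection to the cell edge
    edge_blockage: float  # fraction of the edge region covered by blockages and metals


def is_modifiable(cell: CellInfo, max_pin_to_edge: float = 0.1,
                  max_edge_blockage: float = 0.3) -> bool:
    """A cell qualifies if its supply connection is close to the edge and the
    edge region is not crowded (both thresholds are assumptions)."""
    return (cell.pin_to_edge <= max_pin_to_edge
            and cell.edge_blockage <= max_edge_blockage)


def control_cells_needed(cell: CellInfo) -> int:
    """One special cell if power and ground are on the same side, otherwise two."""
    return 1 if cell.power_side == cell.ground_side else 2


def plan_control_cells(cells: Iterable[CellInfo]) -> Dict[str, int]:
    """Number of special cells to add for every cell that qualifies."""
    return {c.name: control_cells_needed(c) for c in cells if is_modifiable(c)}
```

Cells that do not appear in the returned dictionary are left unchanged.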
At the end, the GDS description containing all the cells of the library is written,
including the modified ones.
Thus, the proposed method changes the appearance of the initial SCs of the
library through the DL network and integrates the “sleep mode” control into them. In
the future, the use of such cells in designs will provide an opportunity to manage the
connection of cells to power and grounding grid, thus reducing static and dynamic
power consumption.
Evaluating the Effectiveness of Cell “Sleep Mode” Integration Method Using
Neural Network for Low-Power Designs
To evaluate the effectiveness of the method, designs with two libraries of different
heights were implemented. One of them is 12T high; the other is 8T. In the
completed designs, the cells were used in one case without the proposed change,
Fig. 6.42 The final appearance of SC after the addition of control cells
in the other case, including it. According to the results (Table 6.9), high-height cells
have greater capabilities for change, but the optimization in power consumption is
approximately 5–6% greater for low-height cells. According to the obtained results,
the total power consumption can be reduced by approximately 12%, but due to the
added special cells, the area of circuits increases by approximately 5–28%, and the
timing parameters deteriorate by approximately 16% on average.
Thus, as a result of using the proposed method, it is possible to obtain an average
power consumption saving of approximately 12% at the expense of approximately
5–28% increase in area and approximately 16% deterioration of timing parameters.
As already mentioned in Sect. 6.1, the addition of extra metals in the circuit is
extremely important for increasing its productivity. However, this process causes an
increase in the overall parasitic capacitance of the circuit, which affects its perfor-
mance and speed. On the other hand, during circuit operation, it is important to have
a small value of voltage drop across its buses, because, otherwise, the connected
voltage from high-level power and ground buses will drop greatly before reaching
the SCs [94].
When the coordinates of the areas to be filled are known, the next step is
to identify the power and/or grounding wires located near them or crossing them. For
this process, it is important that the rectangle to be filled satisfies the following:
. Its area meets the minimum requirements of the given metal layer.
. Its width and length are greater than the sum of the minimum
wire width and the minimum distance for the given metal layer.
All fillable rectangles that meet the specified conditions are marked as possible
power and grounding domains.
In the next step, already marked domains are given a special attribute—“owner.”
This indicates which wire the specified domain should be connected to later: power
or ground. In order to connect the domain metals to any wire, it is important that:
. It is close to the given wire or has overlap.
. Its addition does not lead to maximum density problems.
Since one power or ground wire may be adjacent to or overlap multiple filler
metals, a delta distance is defined between the wire and the center of gravity
of each filler domain. The filler domains are then sorted by delta distance. After
that, the method physically connects the power and ground wires to the metals of the fill
domains, according to the “owner” attribute mentioned above. To keep track of the
total density, the method also records the current metal density in memory
[99, 100]. Finally, additions are inserted into the remaining free domains to satisfy
the minimum density rules while taking timing conditions into account [53]. At the end of the
design, the output is the physical description of the design containing
metal fillings (Fig. 6.44).
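The assignment of the "owner" attribute and the ordering of fill domains by delta distance can be illustrated with a minimal sketch. Here the delta distance is taken as the distance from the center of gravity of the fill rectangle to the nearest point of the wire rectangle; the net names `VDD` and `VSS`, the rectangle format, and the function names are assumptions, and the maximum density check is omitted.

```python
import math
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def centroid(rect: Rect) -> Tuple[float, float]:
    """Center of gravity of a rectangle."""
    return ((rect[0] + rect[2]) / 2, (rect[1] + rect[3]) / 2)


def delta_distance(wire: Rect, fill: Rect) -> float:
    """Distance from the centroid of the fill domain to the wire rectangle."""
    cx, cy = centroid(fill)
    dx = max(wire[0] - cx, cx - wire[2], 0.0)
    dy = max(wire[1] - cy, cy - wire[3], 0.0)
    return math.hypot(dx, dy)


def assign_owners(fills: List[Rect],
                  wires: List[Tuple[str, Rect]]) -> List[Tuple[float, str, Rect]]:
    """Give every fill domain the net of its nearest wire as owner and
    return the domains sorted by ascending delta distance."""
    tagged = []
    for fill in fills:
        delta, owner = min((delta_distance(rect, fill), net) for net, rect in wires)
        tagged.append((delta, owner, fill))
    tagged.sort(key=lambda item: item[0])
    return tagged
```

The sorted list gives one possible order in which the fill domains could be connected to their owner wires.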
References
1. L.M. Naga, P. Mullangi, Design and development of an ASIC standard cell library using 90nm
technology node. 2018 international conference on computer communication and informatics
(ICCCI) (2018), pp. 1–6. https://doi.org/10.1109/ICCCI.2018.8441222
2. S. Hougardy, M. Neuwohner, U. Schorr, A fast optimal double row legalization algorithm.
International symposium on physical design (ISPD ’21) (2021), pp. 1–5
3. K. Darav, N.A. Kennings, D. Westwick, L. Behjat, High performance global placement and
legalization accounting for fence regions. 2015 IEEE/ACM international conference on
computer-aided design (ICCAD) (2015), pp. 514–519
4. T.-C. Yu et al., Pin accessibility prediction and optimization with deep-learning-based pin
pattern recognition. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 40(11),
2345–2356 (2021). https://doi.org/10.1109/TCAD.2020.3040078
5. S. Fang, C. Tai, R. Lin, On benchmarking pin access for nanotechnology standard cells. 2017
IEEE computer society annual symposium on VLSI (ISVLSI) (2017), pp. 237–242. https://
doi.org/10.1109/ISVLSI.2017.49
6. M. Danigno et al., Proposal and evaluation of pin access algorithms for detailed routing. 2019
26th IEEE international conference on electronics, circuits and systems (ICECS) (2019),
pp. 602–605. https://doi.org/10.1109/ICECS46596.2019.8965194
7. V.B. Suresh, P.V. Kumar, S. Kundu, On lithography aware metal-fill insertion. Thirteenth
international symposium on quality electronic design (ISQED) (2012), pp. 200–207. https://
doi.org/10.1109/ISQED.2012.6187495
8. X. Bai et al., Timing-aware fill insertions with design-rule and density constraints. IEEE Trans.
Comput.-Aided Des. Integr. Circuits Syst., 1–4 (2021). https://doi.org/10.1109/TCAD.2021.3133854
9. Y. Chen, H. Jiao, Standard cell optimization for ultra-low-voltage digital circuits. 2019
international conference on IC design and technology (ICICDT) (2019), pp. 1–4
10. W.-T. Wong, K. Singh, J. Huisken, J.P. de Gyvez, Power and variation improved near-Vt
standard cell library for 28-nm FDSOI. 2019 IEEE SOI-3D-subthreshold microelectronics
technology unified conference (S3S) (2019), pp. 1–2. https://doi.org/10.1109/S3S46989.2019.9320687
11. N. Mamikonyan, N. Melikyan, R. Musayelyan, IR drop estimation and optimization on
DRAM memory using machine learning algorithms. 2020 IEEE East-West design & test
symposium (EWDTS) (Varna, 2020), pp. 1–4. https://doi.org/10.1109/EWDTS50664.2020.9224772
12. X.X. Huang, H.C. Chen, S.W. Wang et al., Dynamic IR-drop ECO optimization by cell
movement with current waveform staggering and machine learning guidance. 2020 IEEE/
ACM international conference on computer aided design (ICCAD) (San Diego, 2020), pp. 1–9
13. Y. Lin, B. Yu, D.Z. Pan, High performance dummy fill insertion with coupling and uniformity
constraints. IEEE TCAD 36(9), 1532–1544 (2017). https://doi.org/10.1145/2744769.2744850
14. S.A. Dobre, A.B. Kahng, J. Li, Design implementation with noninteger multiple-height cells
for improved design quality in advanced nodes. IEEE Trans. Comput.-Aided Des. Integr.
Circuits Syst. 37(4), 855–868 (2018)
15. S.H. Lim et al., Generating power-optimal standard cell library specification using neural
network technique. 2017 IEEE Asia Pacific conference on postgraduate research in micro-
electronics and electronics (Prime Asia) (2017), pp. 101–104. https://doi.org/10.1109/PRIMEASIA.2017.8280374
16. Y.C. Liu, C.Y. Han, S.Y. Lin, J.C. Li, PSN-aware circuit test timing prediction using machine
learning. IET Comput. Digit. Techniq. 11(2), 60–67 (2017)
17. Q. Zhou, X. Wang, Z. Qi, Z. Chen et al., An accurate detailed routing routability prediction
model in placement. Proceedings of the Asia symposium on quality electronic design
(ASQED) (2015), pp. 119–122
18. P. Debacker, K. Han, A.B. Kahng, H. Lee et al., Vertical M1 routing-aware detailed placement
for congestion and wirelength reduction in sub-10nm nodes. Proceedings of ACM/IEEE
Design Automation Conference (DAC) (2017), pp. 1–6
19. F. Tabrizi, N.K. Darav, S. Xu, L. Rakai et al., A machine learning framework to identify
detailed routing short violations from a placed netlist. Proceedings of ACM/IEEE Design
Automation Conference (DAC) (2018), pp. 1–6
20. Z. Qi, Y. Cai, Q. Zhou, Accurate prediction of detailed routing congestion using supervised
data learning. Proceedings of 2014 IEEE international conference on computer design (ICCD)
(2014), pp. 97–103
21. F. Tabrizi, N.K. Darav, L. Rakai et al., Detailed routing violation prediction during placement
using machine learning. Proceedings of international symposium on VLSI design, automation
and test (VLSIDAT) (2017), pp. 1–4
22. W.T.J. Chan, Y. Du, A.B. Kahng et al., BEOL stack-aware routability prediction from
placement using data mining techniques. Proceedings of IEEE international conference on
computer design (ICCD) (2016), pp. 41–48
23. Z. Xie, Y.H. Huang, G.C. Fang, H. Ren et al., RouteNet: Routability prediction for mixed-size
designs using convolutional neural network. Proceedings of IEEE/ACM international
conference on computer-aided design (ICCAD) (2018), pp. 1–8
24. W.T.J. Chan, P.-H. Ho, A.B. Kahng, P. Saxena, Routability optimization for industrial designs
at sub-14nm process nodes using machine learning. Proceedings of ACM international
symposium on physical design (ISPD) (2017), pp. 1–5
25. J. Seo, J. Jung, S. Kim, Y. Shin, Pin accessibility-driven cell layout redesign and placement
optimization. Proceedings of ACM/IEEE design automation conference (DAC) (2017),
pp. 1–6
26. M.M. Ozdal, Detailed-routing algorithms for dense pin clusters in integrated circuits. IEEE
Trans. Comput.-Aided Des. Integr. Circuits Syst. 28(3), 340–349 (2009)
27. W. Ye, B. Yu, D.Z. Pan, Y.C. Ban et al., Standard cell layout regularity and pin access
optimization considering middle-of-line. Proceedings of Great Lakes symposium on VLSI
(GLSVLSI) (2015), pp. 1–5
28. Y. Ding, C. Chu, W.K. Mak, Pin accessibility-driven detailed placement refinement.
Proceedings of ACM international symposium on physical design (ISPD) (2017), pp. 1–4
29. X. Xu, B. Cline, G. Yeric, B. Yu, et al., Self-aligned double patterning aware pin access and
standard cell layout co-optimization. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst.
34(5), 699–712 (2015)
30. S. Dobre, A.B. Kahng, J. Li, Mixed cell-height implementation for improved design quality in
advanced nodes. Proceedings of ICCAD (Austin, 2015), pp. 854–860
31. L. Guo, Y. Cai, Q. Zhou, X. Hong, Logic and layout aware voltage island generation for low
power design. Proceedings of ASP DAC (Yokohama, 2007), pp. 666–671
32. J.A. Ellis, Embedding rectangular grids into square grids. IEEE Trans. Comput. 40(1), 46–52
(1991)
33. A.B. Kahng, S. Kang, H. Lee et al., High performance gate sizing with a signoff timer.
Proceedings of ICCAD (San Jose, 2013), pp. 450–457
34. K. Han, A.B. Kahng, J. Lee et al., A global-local optimization framework for simultaneous
multi-mode multi-corner clock skew variation reduction. Proceedings of DAC (San Francisco,
2015), pp. 1–6
35. R.L.S. Ching, E.F.Y. Young, K.C.K. Leung, C. Chu, Postplacement voltage island generation.
Proceedings of ICCAD (San Jose, 2006), pp. 641–646
36. H. Wu, M.D.F. Wong, I.-M. Liu, Timing-constrained and voltage island-aware voltage
assignment. Proceedings of DAC (San Francisco, 2006), pp. 429–432
37. C.J. Alpert, A. Devgan, C. Kashyap, A two moment RC delay metric for performance
optimization. Proceedings of ISPD (San Diego, 2000), pp. 73–78
38. V. Kashyap, C.J. Alpert, F. Liu, A. Devgan, PERI: A technique for extending delay and slew
metrics to ramp inputs. Proceedings of TAU (Monterey, 2002), pp. 57–62
39. Y. Lin et al., MrDP: Multiple-row detailed placement of heterogeneous sized cells for
advanced nodes. Proceedings of ICCAD (Austin, 2016), pp. 1–8
40. OpenCores. [Online]. Available: http://opencores.org. Accessed 11 Aug 2014
41. H. Wu, M.D.F. Wong, Improving voltage assignment by outlier detection and incremental
placement. Proceedings of DAC (San Diego, 2007), pp. 459–464
42. H. Wu, I.-M. Liu, M.D.F. Wong, Y. Wang, Post-placement voltage island generation under
performance requirement. Proceedings of ICCAD (San Jose, 2005), pp. 309–316
43. F. Beeftink, P. Kudva, D. Kung, L. Stok, Gate-size selection for standard cell libraries. 1998
IEEE/ACM international conference on computer-aided design. Digest of technical papers
(IEEE Cat. No.98CB36287) (San Jose, 1998), pp. 545–550
44. F. Ye, F. Firouzi, Y. Yang, K. Chakrabarty, et al., On-chip droop-induced circuit delay
prediction based on support-vector machines. IEEE Trans. Comput.-Aided Des. Integr.
Circuits Syst. 35(4), 665–678 (2016)
45. J. Zhou, S. Jayapal, B. Busze, L. Huang, et al., A 40 nm dual-width standard cell library for
near/sub-threshold operation. IEEE Trans. Circuits Syst. I Regular Pap. 59(11), 2569–2577
(2012)
46. S. Kajihara, K. Kinoshita, I. Pomeranz, R. Sudhakar, Combinationally irredundant ISCAS-89
benchmark circuits. Circuits and systems, 1996. ISCAS ’96., Connecting the world., 1996
IEEE (1996) (Vol. 4), pp. 632–634. https://doi.org/10.1109/ISCAS.1996.542103
47. M. Anis, M. Allam, M. Elmasry, Impact of technology scaling on CMOS logic styles. IEEE
Trans. Circuits Syst. II Analog Digit. Sign. Process 49(8), 577–588 (2002)
48. D.P. Kingma, J. Ba, Adam: A method for stochastic optimization. CoRR (2014) (Vol.
abs/1412.6980), pp. 1–4
49. S. Kung, R. Puri, Optimal P/N width ratio selection for standard cell libraries. 1999 IEEE/
ACM international conference on computer aided design. Digest of technical papers (Cat.
No.99CH37051) (San Jose, 1999), pp. 178–184
50. M. Abadi et al., Tensorflow: Large-scale machine learning on heterogeneous distributed
systems. CoRR (2016) (Vol. abs/1603.04467), pp. 1–4
51. P. Bastani, K. Killpack, L.C. Wang, E. Chiprout, Speedpath prediction based on learning from
a small set of examples. 2008 45th ACM/IEEE design automation conference (Anaheim, 2008),
pp. 217–222
52. A.A. Yarygin, Current issues of machine learning with the support of intellectual agents in
decision-making tasks. Automat. Prob. Ideas Solut. С, 1–62 (2017) (in Russian)
53. B. Jiang et al., FIT: Fill insertion considering timing. 2019 56th ACM/IEEE design automation
conference (DAC) (2019), pp. 1–6
54. A.B. Kahng, K. Samadi, CMP fill synthesis: A survey of recent studies. IEEE TCAD 27(1), 3–19
(2008)
55. P. Liu, P. Tu, H. Wu, Y. Tang et al., An effective chemical mechanical polishing filling
approach. Proceedings of ISVLSI (2015), pp. 44–49
56. Y. Bo, S. Sriraaman, ICCAD-2018 CAD contest in timing-aware fill insertion. Proceedings of
ICCAD (2018), pp. 1–4
57. C. Feng, H. Zhou, C. Yan, J. Tao, et al., Efficient approximation algorithms for chemical
mechanical polishing dummy fill. IEEE TCAD 30(3), 402–415 (2011)
58. P. Gupta, A.B. Kahng, O.S. Nakagawa, K. Samadi, Closing the loop in interconnect analyses
and optimization: CMP fill, lithography and timing. Proceedings of VMIC (2005),
pp. 352–363
59. A.B. Kahng, K. Samadi, P. Sharma, Study of floating fill impact on interconnect capacitance.
Proceedings of ISQED (2006), pp. 1–6
60. T. Kurokawa, T. Kanamoto, A. Ibe, C.W. Kasebe, et al., Dummy filling methods for reducing
interconnect capacitance and number of fills. Proceedings of ISQED (2005), pp. 586–591
61. IBM Inc, CPLEX: High-performance mathematical programming solver for linear
programming, mixed integer programming, and quadratic programming, Version 12.70.
https://www.ibm.com/analytics/cplex-optimizer
62. R.O. Topaloglu, ICCAD-2014 CAD contest in design for manufacturability flow for advanced
semiconductor nodes and benchmark suite. Proceedings of ICCAD (2014), pp. 367–368
63. S.S. Abazyan, V.A. Janpoladov, N.E. Mamikonyan, Standard cell pin access checking
integration into test design verification. Proc. RA NAS NPUA. Ser. Tech. Sci. 73(1), 74–81 (2020)
64. S.A.A.B. Olivier Aupoix, Optimizing standard cell pin accessibility in 14nm FDSOI with
Synopsys pin access checker. Synopsys Users Group (SNUG) (2014), pp. 1–6
65. I. Ricci, de Munari, P. Ciampolini, An evolutionary approach for standard-cell library
reduction. Proceedings of ACM great lakes symposium on VLSI (GLSVLSI) (2007), pp. 305–310
66. W. Agatstein, K. McFaul, P. Themins, Validating an ASIC standard cell library. Proceedings
of IEEE ASIC seminar and exhibit (1990), pp. 12/(6.1–6.5)
67. V. Melikyan, M. Martirosyan, A. Melikyan, G. Piliposyan, 14nm educational design kit:
Capabilities deployment and future. Small Syst. Simul. Symp. (9), 1–5 (2018)
68. V.A. Janpoladov, A.A. Petrosyan, S.S. Abazyan, H.V. Margaryan, Random faults injection
and simulation in auto-correction circuits. Proc. RA NAS NPUA Ser. Tech. Sci. 73(2),
171–180 (2020)
69. S. Abazyan, V. Melikyan, Enhanced pin-access prediction and design optimization with
machine learning integration. Microelectr. J. 116, 1–5, 105198 (2021).
https://doi.org/10.1016/j.mejo.2021.105198
70. S.-O. Shim, Multi-class classification based on relative distribution of class. 2020 2nd
international conference on computer and information sciences (ICCIS) (2020), pp. 1–4.
https://doi.org/10.1109/ICCIS49240.2020.9257679
71. P. Del Moral, S. Nowaczyk, S. Pashami, Hierarchical multi-class classification for fault
diagnosis. Proceedings of the 31st European safety and reliability conference (2021),
pp. 2457–2464. https://doi.org/10.3850/978-981-18-2016-8_524-cd
72. S. Abazyan, Standard cell library enhancement for mixed multi-height cell design
implementation. 2021 IEEE East-West design & test symposium (EWDTS) (2021), pp. 86–89.
https://doi.org/10.1109/EWDTS52692.2021.9581045
73. C. Han, A. Kahng, L. Wang, B. Xu, Enhanced optimal multi-row detailed placement for
neighbor diffusion effect mitigation in sub-10nm VLSI. IEEE Trans. Comput.-Aided Des.
Integr. Circuits Syst. PP, 1–2 (2018). https://doi.org/10.1109/TCAD.2018.2859266
74. T. Lin, C. Chu, J. Shinnerl, Bustany, et al., POLAR: A high performance mixed-size
wirelength-driven placer with density constraints. IEEE Trans. Comput.-Aided Des. Integr.
Circuits Syst. (34), 447–459 (2015). https://doi.org/10.1109/TCAD.2015.2394383
75. U. Brenner, BONNPLACE legalization: Minimizing movement by iterative augmentation.
IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 32, 1215–1227 (2013).
https://doi.org/10.1109/TCAD.2013.2253834
76. S.S. Abazyan, Method of designing power supply and grounding network of integrated
circuits. Manual Natl. Acad. Sci. Repub. Armenia Natl. Polytech. Univ. Armenia. Tech. Sci.
Ser. 74(2), 197–203 (2021) (in Armenian)
77. M. Danigno, P. Butzen, J. Ferreira, A. Oliveira et al., Proposal and evaluation of pin access
algorithms for detailed routing. 2019 26th IEEE international conference on electronics,
circuits and systems (ICECS) (2019), pp. 602–605
78. K. Khalil, O. Eldash, A. Kumar, M. Bayoumi, Economic LSTM approach for recurrent neural
networks. IEEE Trans. Circuits Syst. II Express Briefs 66(11), 1885–1889 (2019).
https://doi.org/10.1109/TCSII.2019.2924663
79. J. Verbraeken, M. Wolting, J. Katzy, J. Kloppenburg, et al., A survey on distributed machine
learning. ACM Comput. Surv. 53(2), 1–33 (2020). https://doi.org/10.1145/3377454
80. K. Khalil, O. Eldash, A. Kumar, M. Bayoumi, Machine learning-based approach for hardware
faults prediction. IEEE Trans. Circuits Syst. I Regular Papers 67(11), 3880–3892 (2020).
https://doi.org/10.1109/TCSI.2020.3010743
81. Y. Du, Q. Ma, H. Song et al., Spacer-is-dielectric-compliant detailed routing for self-aligned
double patterning lithography. Design automation conference (DAC), 2013 50th
ACM/EDAC/IEEE (2013), pp. 1–6. https://doi.org/10.1145/2463209.2488848
82. C.-K. Cheng, D. Lee, D. Park, Standard-cell scaling framework with guaranteed
pin-accessibility. 2020 IEEE international symposium on circuits and systems (ISCAS)
(2020), pp. 1–5. https://doi.org/10.1109/ISCAS45731.2020.9180592
83. W.-T.J. Chan, Y. Du, A.B. Kahng et al., BEOL stack-aware routability prediction from
placement using data mining techniques. 2016 IEEE 34th international conference on
computer design (ICCD) (2016), pp. 41–48. https://doi.org/10.1109/ICCD.2016.7753259
84. J.-R. Yu, D.D. Gao, et al., Accurate lithography hotspot detection based on principal
component analysis-support vector machine classifier with hierarchical data clustering. J. Micro/
Nanolithogr. MEMS MOEMS 14(1), 1–12 (2014). https://doi.org/10.1117/1.JMM.14.1.011003
85. W.-T.J. Chan, P.-H. Ho et al., Routability optimization for industrial designs at sub-14nm
process nodes using machine learning. Proceedings of the 2017 ACM on international
symposium on physical design, ISPD ’17 (Association for Computing Machinery,
New York, 2017), pp. 15–21. https://doi.org/10.1145/3036669.3036681
86. R.-B. Lin, Y.-X. Chiang, Impact of double-row height standard cells on placement and routing
(2019), pp. 317–322. https://doi.org/10.1109/ISQED.2019.8697712
87. J. Jooyeon, K. Taewhan, Utilizing middle-of-line resource in filler cells for fixing routing
failures. 2021 IEEE international midwest symposium on circuits and systems (MWSCAS)
(2021), pp. 1–5
88. S. Abazyan, Sh. Melikyan, D. Musayelyan, Standard cell library enhancement using neural
network based sleep mode control integration for low leakage designs. 2021 IEEE East-West
design & test symposium (EWDTS) (2021), pp. 105–108
89. S.A. Vitale, P.W. Wyatt, N. Checka, et al., FDSOI process technology for subthreshold-
operation ultralow-power electronics. Proc. IEEE 98(2), 333–342 (2010).
https://doi.org/10.1109/JPROC.2009.2034476
90. M.-C. Kim, N. Viswanathan, Z. Li, C.J. Alpert, ICCAD-2013 CAD contest in placement
finishing and benchmark suite. IEEE/ACM international conference on computer-aided
design, digest of technical papers (2013), pp. 268–270.
https://doi.org/10.1109/ICCAD.2013.6691130
91. S. Sreevidya, R. Holla, R. Raghu, Low power physical design and verification in 16nm
FinFET technology. 2019 3rd international conference on electronics, communication and
aerospace technology (ICECA) (2019), pp. 936–940
92. A. Okazaki, VLSI researches for machine learning and neuromorphic computing. 2019
international symposium on VLSI technology, systems and application (VLSI-TSA) (2019),
p. 1. https://doi.org/10.1109/VLSI-TSA.2019.8804628
93. M. Bartík, External power gating technique – An inappropriate solution for low power devices.
2020 11th IEEE annual information technology, electronics and mobile communication
conference (IEMCON) (2020), pp. 0241–0245
94. A. Suren, M. Shavarsh, Educational open SPICE models neural network-based generation
method. Proceedings of the 9th small systems simulation symposium (2022), pp. 50–54
95. Z. Xie, X. Xu, J. Hu, Y. Chen, Fast IR drop estimation with machine learning. Proceedings of
the 39th international conference on computer-aided design (2020), pp. 1–8
96. Y. Chen, A.B. Kahng, G. Robins, A. Zelikovsky, Closing the smoothness and uniformity gap
in area fill synthesis. Proc. ISPD, 137–142 (2002)
97. T. Ruiqi, D.F. Wong, R. Boone, Model-based dummy feature placement for oxide chemical-
mechanical polishing manufacturability. IEEE Trans. Comput.-Aided Des. Integr. Circuits
Syst. 20(7), 902–910 (2001)
98. N. Mamikonyan, DRAM structure with prioritized memory bank using multi-VT bit cells
architecture. 2020 IEEE East-West design & test symposium (EWDTS) (Varna, 2020),
pp. 1–4. https://doi.org/10.1109/EWDTS50664.2020.9224821
99. J. Sercu, H. Barnes, Thermal aware IR drop using mesh conforming electro-thermal
co-analysis. 2017 IEEE 21st workshop on signal and power integrity (SPI) (Baveno, 2017),
pp. 1–4. https://doi.org/10.1109/SaPIW.2017.7944013
100. T. Lan et al., Timing-aware fill insertions with design-rule and density constraints. 2019 IEEE/
ACM international conference on computer-aided design (ICCAD) (Westminster, 2019),
pp. 1–8. https://doi.org/10.1109/ICCAD45719.2019.8942079
Index
© The Editor(s) (if applicable) and The Author(s), under exclusive license to
Springer Nature Switzerland AG 2024
V. Melikyan, Machine Learning-based Design and Optimization of High-Speed
Circuits, https://doi.org/10.1007/978-3-031-50714-4
I
IC design flow, 329
IC reliability, 59, 83, 124
Instruction sets, 223, 228, 243
Integrated circuits (ICs), 1–52, 59–102, 109–159, 165, 221–274, 279–329
I/O cell, 1–5, 9, 10, 20, 109–115, 117, 119, 121–123, 126–130, 137, 144, 145, 150, 279–286, 308–313, 315, 316, 323, 329

M
Machine learning (ML), 228, 282, 283, 285, 292, 295, 296, 304, 305, 308, 312–314, 316, 318, 319, 329
Metal fillers, 303, 306, 327, 329

N
Negative feedback, 46, 67, 82–83, 93, 100, 102, 189, 190, 199
Neural networks, 284, 291, 293, 296, 322, 323, 329
Non-standard operating conditions, 59, 76, 77, 81–83, 101

O
Offset, 22, 26, 41, 42, 46, 60, 125, 171
Offset errors, 172, 177, 178, 183, 185, 191, 194, 195, 197, 204, 210, 211, 214

P
Pipeline ADCs, 176–179, 189–192, 194, 210, 211, 214
Pulse Amplitude Modulation (PAM4), 8, 22, 25, 27, 46–52

S
Self-calibration, 159, 169, 170, 172, 173, 176, 183–188, 190–195, 198, 202, 207, 211, 214, 243, 244, 246
Signal distortions, 5, 9, 12, 69, 112, 116, 122, 137, 151, 159
Signal linearity, 165–194
Signal transmission, 2, 7, 8, 12, 22, 46, 113, 121, 122, 127–129, 148, 150, 155, 159
Speed of receiving sequential information, 25, 27
Stability, 33–35, 63, 76, 77, 80, 81, 83, 84, 201
Standard cells, 279, 281, 289, 296, 329

T
Transfer of information, 1–52