Critical clock-domain-crossing bugs
Awareness of CDC issues, along with the use
of good design practices and proven EDA tools
for CDC verification, can avoid costly silicon
re-spins and significantly improve time to market.
Today’s SOC (system-on-chip) designs have dozens of clocks, many of which are asynchronous. This design approach facilitates the convergence of digital-audio, video, wireless, and networking applications in a single chip. CDCs (clock-domain crossings) can cause difficult-to-detect functional failures in SOCs involving multiple asynchronous clocks. Simulation and static-timing analysis often do not detect issues such as metastability and the coherency of correlated signals’ CDCs; as a result, these issues often end up as bugs in silicon. Unfortunately, most relevant literature does not adequately cover some of these critical CDC issues, and designers learn about them only after making costly mistakes. Two of the most common and critical issues involving CDCs are improper sequencing of data/enable in enable-based synchronization and data coherency due to the convergence of signals.

Enable-based synchronization
A receiver flip-flop output can become metastable if its data or reset input violates the setup-and-hold times. This scenario can arise when the transmitter (the source of data) and the receiver flip-flop are in asynchronous-clock domains. To avoid such issues, designers use synchronizers that isolate metastability and deliver a clean signal to the downstream logic. A synchronizer can be a simple double flip-flop. Designers commonly use this technique for control-signal CDCs. In a data transfer across clock domains, the data is first set up; then a control signal, synchronized to the destination domain, enables data capture in the destination. Although this data-transfer technique is common and proven, it involves pitfalls that require special attention. The technique relies on the data being stable when you assert the enable (Figure 1). Too small a margin between setting up the data and asserting the enable may corrupt the data transfer. A good way to prevent such problems is to design a full handshake when you set up the data. In this approach, you assert the request, synchronize it in the destination domain, and return an acknowledge that lets the next data load occur. This approach might add a few cycles of latency, but it avoids functional failures.

Figure 1 In a data transfer across clock domains, the data must be stable when enable is asserted. Too short a margin between data setup and enable assertion can result in data corruption. (Diagram labels: adequate margin, proper data/enable sequencing, data will be properly captured; short or no margin, incorrect data/enable sequencing, data may get lost.)

Glitches are another source of concern across clock domains. Typically, any combinational logic may be subject to short-lived glitches. In synchronous logic, these glitches are generally harmless because they settle before the next active clock edge. On an asynchronous crossing, however, a destination clock edge may arrive while the glitch is present. The destination may then capture the glitch as a pulse, causing a functional failure.

For this reason, it is important to avoid using any combinational logic that may cause glitches on a CDC path. You should perform any computation either before crossing clock domains or after the destination domain captures the signals. Glitches may affect both control and data CDCs. In a data transfer, a glitch may affect the enable line or the data line; both put safe data transfer at risk. You must synchronize the enable logic in the destination domain and avoid using combinational logic after synchronization. Glitches on the datapath may be harmful, too. A good design practice is to avoid using any logic, except the recirculation-multiplexer logic, which is part of the enable flip-flop, on the datapath CDCs (Figure 2).

Figure 2 A good design practice is to avoid using any logic, except the recirculation-multiplexer logic, which is part of the enable flip-flop, on the datapath CDCs. (Diagram labels: avoid combining logic on datapath; do not combine logic on control path.)

Although this data-synchronization scheme is the most common, many variations of enabled-data crossing involve an enable signal with combinational logic. Occasionally, designers use an enabled AND instead of a multiplexer or combine the multiplexer with other combinational logic on the datapath. They rely on the enable signal to ensure that data transfers synchronously to the destination and that glitches do not occur. As designers become more creative and use extra logic in enabled-data crossings, they expose their designs to glitch risks that are difficult to detect. To understand these risks, consider a simple example: a multiplexer that is glitch-free as designed can be implemented in a way that creates a glitch. Downstream tools, such as synthesis, optimization, and technology mapping, can transform the circuit and introduce logic that can cause a glitch and thus a functional failure. For instance, a tool can map a simple, glitch-free multiplexer to AND and OR gates that can create glitches (Figure 3).

Figure 3 You can map a simple, glitch-free multiplexer (a) with AND and OR gates that can create glitches (b).

Figure 4 Any glitch in the Gray encoder may cause a functional failure in the design. (Diagram labels: avoid combinational logic on the clock-domain crossings; avoid any logic on the crossing or between synchronizing flip-flops.)
[...] designers cross clock domains to compute empty and full flags. You Gray-encode the binary counters, transfer them to the other domain, and then convert the counters back to binary before using them. Occasionally, designers access pointers in a FIFO block to do empty/almost-empty or full/almost-full flag calculations. This practice may create CDCs, convergences, or both that a designer may overlook. Adopting standard practices prevents the introduction of CDC bugs into the design.

Gray-encoding circuitry seems simple; however, errors can easily slip into a design. You must Gray-encode and register the signals before crossing clock domains. Sending Gray-encoded signals directly to the destination domain, without registering them first, defeats the purpose. Furthermore, any glitch in the Gray encoder may cause a functional failure in the design (Figure 4).

Another subtle issue is a mismatch between Gray-encoding assumptions and the binary-counter range. Designs sometimes fail when a designer expects a Gray counter that targets the full range of a 4-bit counter to stop at a lower count and loop back to zero. For example, a designer can build the write pointer of a six-entry-deep FIFO to count from zero to five and loop back to address zero. A Gray encoder that targets counting from zero to seven for a full 3-bit counter will fail when the pointer moves from five to zero (Figure 5).

Designing a Gray encoder may give a false sense of security if you fail to account for these details. Both junior and experienced designers may face such issues. There is a large number of corner-case problems in CDC, and it is difficult for any designer to pay attention to all the details, especially when under tight schedule pressure. The best way to catch these issues is to approach them with a systematic methodology that has concise metrics. Static-CDC verification has recently emerged as an accepted approach to achieve this goal. This approach targets metastability, convergence, and other CDC issues that traditional verification tools, such as simulation and static-timing verification, do not cover. Static-CDC verification successfully targets corner cases that designers may overlook. Furthermore, it provides a systematic-verification approach that can fit into any design flow as part of the verification-sign-off tool suite.

Authors’ biographies
Shaker Sarwary is technology director at Atrenta (San Jose, CA). He has a doctorate from Paris University (France), and he has performed postdoctorate work at the University of California, Berkeley. He has held senior engineering positions in the areas of synthesis and verification at Lattice Semiconductor, Get2Chip, and Cadence. You can reach him at [email protected].

Saurabh Verma is an engineering manager at Atrenta. He has a bachelor’s degree from Indian Institute of Technology Kanpur. He has rich experience in formal technology and rule-based-design verification. You can reach him at [email protected].
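As a supplementary note to the Gray-encoding discussion, the range mismatch between a six-entry pointer and a full 3-bit Gray encoder can be checked numerically. This is a minimal sketch, assuming the standard reflected binary Gray code (n XOR n shifted right by one); the helper names `binary_to_gray`, `bits_changed`, and `wrap_transitions` are hypothetical and are not taken from the article.

```python
def binary_to_gray(n: int) -> int:
    """Standard reflected binary Gray code of a non-negative integer."""
    return n ^ (n >> 1)


def bits_changed(a: int, b: int) -> int:
    """Number of bit positions in which a and b differ."""
    return bin(a ^ b).count("1")


def wrap_transitions(depth: int):
    """List (from, to, changed_gray_bits) for a pointer that wraps at depth."""
    steps = []
    for i in range(depth):
        nxt = (i + 1) % depth
        changed = bits_changed(binary_to_gray(i), binary_to_gray(nxt))
        steps.append((i, nxt, changed))
    return steps


# Full 3-bit range: every step, including the wrap from 7 to 0, flips one bit.
print(wrap_transitions(8))

# Pointer that wraps at six: the step from 5 to 0 flips several bits at once.
print(wrap_transitions(6))
```

For the full eight-count range, every transition changes exactly one Gray-code bit, which is the property that makes the pointer safe to sample asynchronously. When the pointer wraps from five to zero, the Gray values are 111 and 000, so three bits change together and the destination domain may sample an inconsistent mix of old and new bits.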