By Shaker Sarwary and Saurabh Verma • Atrenta Inc

Critical clock-domain-crossing bugs
Awareness of CDC issues, along with the use
of good design practices and proven EDA tools
for CDC verification, can avoid costly silicon
re-spins and significantly improve time to market.

Today’s SOC (system-on-chip) designs have dozens of clocks, many of which are asynchronous. This design approach facilitates the convergence of digital-audio, video, wireless, and networking applications in a single chip. CDCs (clock-domain crossings) can cause difficult-to-detect functional failures in SOCs involving multiple asynchronous clocks. Simulation and static-timing analysis often do not detect issues such as metastability and the coherency of correlated signals’ CDCs; as a result, these issues often end up as bugs in silicon. Unfortunately, most relevant literature does not adequately cover some of these critical CDC issues, and designers learn about them only after making costly mistakes. Two of the most common and critical issues involving CDCs are improper sequencing of data and enable in enable-based synchronization, and loss of data coherency due to the convergence of signals.

Enable-based synchronization
A receiver flip-flop output can become metastable if its input violates the flip-flop’s data or reset setup-and-hold times. This scenario can arise when the transmitter (the source of data) and the receiver flip-flop are in asynchronous clock domains. To avoid such issues, designers use synchronizers that isolate metastability and deliver a clean signal to the downstream logic. A synchronizer can be a simple double flip-flop, and designers commonly use this technique for control-signal CDCs. In a data transfer across clock domains, the data is first set up; then, a control signal synchronized to the destination domain travels to the destination to enable data capture. Although this is a common and proven technique, it involves pitfalls that require special attention. It relies on the data being stable when you assert the enable (Figure 1). Too small a margin between setting up the data and asserting the enable may corrupt the data transfer. A good way to prevent such problems is to design a full handshake when you set up the data. In this approach, you assert a request and synchronize it into the destination domain, and the destination returns an acknowledge, synchronized back to the source, before the next data load occurs. This approach might add a few cycles of latency, but it avoids functional failures.

Glitches are another source of worry across clock domains. Any combinational logic may produce short-lived glitches. In synchronous transfers, these glitches are generally harmless because they settle before the next clock edge samples the signal. With asynchronous crossings, however, the destination clock edge can arrive while a glitch is present, so the design may capture the glitch as a pulse and suffer a functional failure. For this reason, it is important to avoid using any combinational logic that may cause glitches on a CDC path. You should perform any computation either before crossing clock domains or after the destination domain captures the signals.
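The full handshake recommended above can be illustrated with a small cycle-level model. The following Python sketch is an illustration under stated assumptions, not the authors’ design: the clock periods, the four-state source FSM, and the names `TwoFlopSync` and `simulate` are hypothetical, and each double flip-flop synchronizer is modeled ideally (it adds two receiving-clock stages of latency, but metastability itself is not simulated). What it shows is the ordering the article asks for: the source sets up the data one cycle before raising the request and holds it until the synchronized acknowledge comes back, so the destination never samples data that is still changing.

```python
from dataclasses import dataclass

# Hypothetical cycle-level model of a four-phase req/ack handshake.
# Assumptions: ideal two-flop synchronizers (latency only, no metastability),
# integer clock periods, and illustrative names throughout.


@dataclass
class TwoFlopSync:
    """Double flip-flop synchronizer clocked by the receiving domain."""
    ff1: int = 0
    ff2: int = 0

    def clock(self, d: int) -> None:
        # On each receiving-clock edge the value moves one stage forward.
        self.ff1, self.ff2 = d, self.ff1

    @property
    def q(self) -> int:
        return self.ff2


def simulate(items, src_period=7, dst_period=10, t_end=5000):
    """Transfer `items` from the source domain to the destination domain."""
    # Source domain: FSM state, data register, request, synchronized ack.
    state, idx, data_reg, req = "IDLE", 0, None, 0
    ack_sync = TwoFlopSync()
    # Destination domain: acknowledge, synchronized request, captured words.
    ack, req_sync, received = 0, TwoFlopSync(), []

    for t in range(1, t_end):
        src_edge = t % src_period == 0
        dst_edge = t % dst_period == 0
        # Cross-domain signals as they are just before the edge(s) at time t.
        req_now, ack_now, data_now = req, ack, data_reg

        if src_edge:
            ack_seen = ack_sync.q        # registered value before this edge
            ack_sync.clock(ack_now)
            if state == "IDLE" and idx < len(items):
                data_reg, state = items[idx], "SETUP"   # 1. set up data first
            elif state == "SETUP":
                req, state = 1, "WAIT_ACK"              # 2. then raise request
            elif state == "WAIT_ACK" and ack_seen:
                req, state = 0, "WAIT_ACK_LOW"          # 3. captured: drop req
            elif state == "WAIT_ACK_LOW" and not ack_seen:
                idx, state = idx + 1, "IDLE"            # 4. handshake done

        if dst_edge:
            req_seen = req_sync.q
            req_sync.clock(req_now)
            if req_seen and not ack:
                received.append(data_now)   # stable since before req rose
                ack = 1
            elif not req_seen and ack:
                ack = 0
    return received


words = [0x12, 0x34, 0x56, 0x78]
print(simulate(words) == words)  # True
```

Each word costs a request round trip and an acknowledge round trip through two synchronizers; that is the few cycles of extra latency the handshake trades for safety.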

[Figure 1 artwork: data (D) and enable (E) waveforms crossing from CLOCK 1 to CLOCK 2. Upper panel: "Adequate margin: data will be properly captured" (proper data/enable sequencing). Lower panel: "Short or no margin: data may get lost" (incorrect data/enable sequencing).]
Figure 1 In a data transfer across clock domains, the data must be stable when enable is asserted. Too short a margin between data setup and enable assertion can result in data corruption.

[Figure 2 artwork: an enabled-data crossing from CLOCK 1 to CLOCK 2, annotated "Avoid combining logic on datapath" and "Do not combine logic on control path."]
Figure 2 A good design practice is to avoid using any logic, except the recirculation-multiplexer logic, which is part of the enable flip-flop, on the datapath CDCs.

april 3, 2008 | EDN 55


[Figure 3 artwork: (a) a multiplexer selected by E, labeled "Redundancy"; (b) the same function mapped onto AND and OR gates.]
Figure 3 You can map a simple, glitch-free multiplexer (a) with AND and OR gates that can create glitches (b).

[Figure 4 artwork: a Gray encoder in the CLOCK 1 domain driving synchronizing flip-flops in the CLOCK 2 domain, annotated "Avoid combinational logic on the clock-domain crossings" and "Avoid any logic on the crossing or between synchronizing flip-flops."]
Figure 4 Any glitch in the Gray encoder may cause a functional failure in the design.

Glitches may affect both control and data CDCs. In a data transfer, a glitch may affect the enable line or the data line; both put safe data transfer at risk. You must synchronize the enable logic in the destination domain and avoid using combinational logic after synchronization. Glitches on the datapath may be harmful, too. A good design practice is to avoid using any logic on the datapath CDCs except the recirculation-multiplexer logic, which is part of the enable flip-flop (Figure 2).

Although this data-synchronization scheme is the most common, many variations of enabled-data crossing involve an enable signal with combinational logic. Occasionally, designers use an enabled AND instead of a multiplexer or combine the multiplexer with other combinational logic on the datapath. They rely on the enable signal to ensure that data transfers synchronously to the destination and that glitches do not occur. As designers become more creative and use extra logic in enabled-data crossings, they expose their designs to glitch risks that are difficult to detect. To understand these risks, consider a simple glitch-free multiplexer: it can still be implemented in a way that creates a glitch. Downstream tools, such as synthesis, optimization, and technology mapping, can transform the circuit and introduce logic that glitches and thus causes a functional failure. For example, a simple, glitch-free multiplexer can be mapped onto AND and OR gates that can create glitches (Figure 3).
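The multiplexer glitch of Figure 3 is an instance of a static-1 hazard, which a unit-delay Python sketch can reproduce. It assumes every gate (inverter, AND, OR) has the same one-step delay, and the function name `simulate_mux` is hypothetical; real glitches depend on actual gate and wire delays. With both data inputs at 1, switching the select from 1 to 0 turns one AND gate off one step before the other turns on, so the OR output dips to 0 for one step. Adding the redundant consensus term `a AND b`, one standard way to cover this particular hazard, keeps the output at 1.

```python
# Hypothetical unit-delay model of a 2:1 multiplexer built from gates:
#   y = (a AND s) OR (b AND NOT s)   [optionally OR (a AND b)]
# Assumption: inverter, AND, and OR gates each take exactly one time step.


def simulate_mux(a, b, s_wave, consensus=False):
    """Return the mux output y at each step while the select follows s_wave."""
    # Start from the steady state for the initial select value.
    s0 = s_wave[0]
    n_s = 1 - s0
    and1, and2, and3 = a & s0, b & n_s, a & b
    y = and1 | and2 | (and3 if consensus else 0)
    trace = []
    for s in s_wave:
        trace.append(y)
        # Every gate computes its next output from its current inputs.
        y_next = and1 | and2 | (and3 if consensus else 0)
        and1, and2, and3 = a & s, b & n_s, a & b
        n_s = 1 - s
        y = y_next
    return trace


select = [1, 1, 0, 0, 0, 0, 0]          # select falls while a = b = 1
print(simulate_mux(1, 1, select))                  # [1, 1, 1, 1, 0, 1, 1]
print(simulate_mux(1, 1, select, consensus=True))  # [1, 1, 1, 1, 1, 1, 1]
```

Because the consensus term does not change the Boolean function, an area-optimizing synthesis run may legitimately remove it, which is one way the tool transformations the article warns about can reintroduce a glitch.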



Although this transformation may seem unlikely with a stand-alone multiplexer, it may well occur if you introduce more logic on the datapath. Synthesis and optimization tools may identify opportunities to increase timing performance, reduce area, or decrease power consumption by combining multiplexer logic with other logic on the path; however, these tools may also create a final implementation prone to glitches. To avoid such problems, you should control how these tools operate on CDC paths so that they avoid such transformations. Unfortunately, designers often fail to consider these details when creating and implementing a design. Furthermore, a glitch is not an easily predictable event; simulation or static-timing verification cannot detect a glitch on an asynchronous crossing. Once the symptom appears in silicon, it is difficult to perform a root-cause analysis, and it takes significant effort and time to link silicon failures to a glitch on a CDC. Static-CDC analysis is better suited to systematically catching and reporting such issues, and thus to avoiding costly silicon re-spins.

Data coherency
Another critical issue involving asynchronous clocks is the coherency problem due to convergence of independently synchronized signals. CDCs introduce latency and cycle-level uncertainty, even with synchronized crossings. Although synchronizers isolate metastability and ensure that a “clean” signal travels to downstream logic, they cannot prevent latency. Coherency problems occur when two correlated, separately synchronized signals cross clock domains, because each synchronizer may introduce a different latency. For example, when both signals change at once, metastability in one synchronizer may settle to the new value, so that signal captures the transition in the first cycle, whereas metastability in the other may settle to the old value, so that signal captures the transition only in the next cycle. You will then observe an incorrect set of values at the destination for at least one cycle. If the signals represent a state variable, you will observe an unknown or unwanted state at the destination, and this unknown state causes a functional failure in the design.

This problem is one of the most common in CDC, and it is becoming more important as designs become larger. Design reuse and IP (intellectual-property) integration may create convergences of which designers are unaware. To avoid coherency problems (assuming that you know the convergences), you should encode correlated signals so that only one of them changes value at a time. You must use Gray encoding for correlated signals that cross clock domains.

[Figure 5 artwork: table of 3-bit codes]
Count   Binary count   Gray count
0       000            000
1       001            001
2       010            011
3       011            010
4       100            110
5       101            111   <- bad Gray code if count goes to five (111 to 000 changes three bits)
6       110            101
7       111            100   <- good Gray code if count goes to seven (100 to 000 changes one bit)
Figure 5 A Gray encoder targeting counting from zero to seven for a full 3-bit counter will fail when the pointer moves from five to zero.
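The coherency argument, and the Figure 5 wrap-around failure, can be checked exhaustively with a short Python sketch. As a simplifying assumption, each separately synchronized bit may show either its old or its new value for one destination cycle (the one-cycle skew described above), and the helper names are hypothetical. A transition is flagged as incoherent if some mix of old and new bits produces a value that is neither the old nor the new count. Plain binary counting fails whenever more than one bit flips; Gray counting over the full 3-bit range never does; and a Gray-coded pointer that wraps from five to zero, as in a six-entry FIFO, fails on exactly that step.

```python
from itertools import product

# Hypothetical exhaustive check of multi-bit CDC coherency.
# Assumption: each separately synchronized bit may show either its old or its
# new value for one destination cycle after the source changes.


def to_gray(n: int) -> int:
    """Binary to Gray: consecutive values differ in exactly one bit."""
    return n ^ (n >> 1)


def from_gray(g: int) -> int:
    """Gray to binary, as done after the crossing (prefix XOR of the bits)."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


def skewed_samples(old: int, new: int, width: int) -> set:
    """Every value the destination can observe while bits arrive out of step."""
    seen = set()
    for late in product((False, True), repeat=width):
        value = 0
        for bit, is_late in enumerate(late):
            source = old if is_late else new
            value |= source & (1 << bit)
        seen.add(value)
    return seen


def incoherent_steps(sequence, encode, width):
    """Transitions (including the wrap back to the start) that can show a
    value that is neither the old nor the new encoded count."""
    bad = []
    for old, new in zip(sequence, sequence[1:] + sequence[:1]):
        a, b = encode(old), encode(new)
        if skewed_samples(a, b, width) - {a, b}:
            bad.append((old, new))
    return bad


print(incoherent_steps(list(range(8)), lambda n: n, 3))  # [(1, 2), (3, 4), (5, 6), (7, 0)]
print(incoherent_steps(list(range(8)), to_gray, 3))      # []
print(incoherent_steps(list(range(6)), to_gray, 3))      # [(5, 0)]
```

The model only covers skew between synchronized bits; it says nothing about glitches inside the encoder, which is why the Gray value must also be registered before it crosses (Figure 4).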




A typical case is FIFO pointers that cross clock domains to compute empty and full flags. You Gray-encode the binary counters, transfer them to the other domain, and then convert the counters back to binary before using them. Occasionally, designers access pointers in a FIFO block to do empty/almost-empty or full/almost-full flag calculations. This practice may create CDCs, convergences, or both that a designer may overlook. Adopting standard practices prevents the introduction of CDC bugs into the design.

Gray-encoding circuitry seems simple; however, errors can easily slip into a design. You must Gray-encode and register the signals before crossing clock domains. Sending combinationally Gray-encoded signals directly to the destination domain defeats the purpose. Furthermore, any glitch in the Gray encoder may cause a functional failure in the design (Figure 4).

Another subtle issue is a mismatch between Gray-encoding assumptions and the binary-counter range. Designs sometimes fail when a designer uses a Gray encoder that targets the full range of a 4-bit counter for a counter that stops at a lower count and loops back to zero. For example, a designer can build the write pointer of a six-entry-deep FIFO to count from zero to five and loop back to address zero. A Gray encoder targeting counting from zero to seven for a full 3-bit counter will fail when the pointer moves from five to zero (Figure 5), because the Gray codes for five (111) and zero (000) differ in all three bits.

Designing a Gray encoder may give a false sense of security if you fail to account for these details. Both junior and experienced designers may face such issues. There are a large number of corner-case problems in CDC, and it is difficult for any designer to pay attention to all the details, especially under tight schedule pressure. The best way to catch these issues is to approach them with a systematic methodology that has concise metrics. Static-CDC verification has recently emerged as an accepted approach to achieve this goal. This approach targets metastability, convergence, and other CDC issues that traditional verification tools, such as simulation and static-timing verification, do not cover. Static-CDC verification successfully targets corner cases that designers may overlook. Furthermore, it provides a systematic-verification approach that can fit into any design flow as part of the verification-sign-off tool suite. EDN

Authors’ biographies
Shaker Sarwary is technology director at Atrenta (San Jose, CA). He has a doctorate from Paris University (France), and he has performed postdoctoral work at the University of California, Berkeley. He has held senior engineering positions in the areas of synthesis and verification at Lattice Semiconductor, Get2Chip, and Cadence. You can reach him at [email protected].

Saurabh Verma is an engineering manager at Atrenta. He has a bachelor’s degree from Indian Institute of Technology Kanpur. He has rich experience in formal technology and rule-based-design verification. You can reach him at [email protected].

