Metastability and CDC-1

The document discusses avoiding metastability issues in FPGA devices. Metastability occurs when signals cross between asynchronous clock domains in an integrated system and can lead to unpredictable data loss or errors. A clock domain crossing verification methodology is needed to reduce the risk of issues.

Avoiding Metastability in FPGA Devices

David Landoll
Applications Architect
Mentor Graphics Corp.

MAPLD 2009
Today’s FPGAs
[Figure: block diagram of an FPGA-based onboard computer (System on Chip). Labels include a Leon CPU with FPU, AMBA AHB and AMBA APB buses with an AHB/APB bridge, boot PROM controller, EDAC memory controller with SDRAM data and parity memory, timers, IRQCtrl, I/O port, DMA-driven UART, SpaceWire, HDLC and CAN controllers, and external RS422, SpaceWire, HDLC0/HDLC1, CAN0/CAN1, JTAG and PPS in/out interfaces.]

• Fabrication advances provide more available silicon area
• More functionality can weigh less and take up less space
• Integrating/reusing capabilities lowers cost

Integration Presents New Challenges

[Figure: Integrated Avionics Processing surrounded by Flight Management, Weather Radar, Flight Control & Avoidance Systems, Maintenance Diagnostics, Image Processing, and Communications]

Boeing 787: Integration’s Next Step
From its central processor to its common data network, surveillance system and navigation system, the theme of the Boeing 787 Dreamliner is integration.
James W. Ramsey

Such integration usually involves multiple independent clock domains, which leads to clock-domain crossings and metastability errors!
Clock Domain Crossing (CDC) Errors
Unpredictable Loss of Data
■ CDC problems
— corrupt control and data signals
— are subtle, intermittent, unpredictable
— are the 2nd major cause of respins
— are difficult to reproduce and debug
— are temperature, voltage, and process sensitive
— will only occur in hardware; often in the final design

■ Traditional verification techniques do not work for CDC signals

A CDC Verification methodology is needed to reduce the risk of CDC related data errors

Metastability
What the heck is it, anyway?
■ What is a clock?
— Periodic pulsing signal
— Digital logic uniformly connected to this signal
— Acts as the Symphony Conductor – keeps logic in sync
— Action happens across the logic at one specific point
■ Typically the “rising edge”

[Figure: clock waveform swinging between Vcc/Vdd (+5V, +3.3V) and Vee/Vss (GND, 0V)]

Metastability
What the heck is it, anyway?
■ What’s in a register?
— (Also known as a latch, flip-flop, etc)
— Contain transistors that “trap” the input value at the
appropriate time
■ E.g. rising edge of the clock
— How does this happen?

Metastability
The Physics of a Register
■ Let’s take a look at a register -- simple D-type flip-flop
— CMOS D-type transmission gate flip-flop

    process(CLK)
    begin
      if rising_edge(CLK) then
        Q <= D;  -- capture D on the rising edge of CLK
      end if;
    end process;

[Figure: Transistor Model of a D Flip-Flop, shown with D = 0, CLK = 0, Q = 0]

Metastability
The Physics of a Register
■ Let’s take a look at a register -- simple D-type flip-flop
— CMOS D-type transmission gate flip-flop

    process(CLK)
    begin
      if rising_edge(CLK) then
        Q <= D;  -- capture D on the rising edge of CLK
      end if;
    end process;

[Figure: Transistor Model of a D Flip-Flop, shown with D = 1, CLK = 0, Q = 0]

Metastability
The Physics of a Register
■ Let’s take a look at a register -- simple D-type flip-flop
— CMOS D-type transmission gate flip-flop

    process(CLK)
    begin
      if rising_edge(CLK) then
        Q <= D;  -- capture D on the rising edge of CLK
      end if;
    end process;

[Figure: Transistor Model of a D Flip-Flop, shown with D = 1, CLK = 1, Q = 1]

Metastability
The Physics of a Register
■ Let’s take a look at a register -- simple D-type flip-flop
— CMOS D-type transmission gate flip-flop

    process(CLK)
    begin
      if rising_edge(CLK) then
        Q <= D;  -- capture D on the rising edge of CLK
      end if;
    end process;

[Figure: Transistor Model of a D flip-flop, shown with D = 0, CLK = 0, Q = 1]

Only works if D has a “good value” at the rising edge of the clock (no Set-up/hold time violations)

Metastability
The Physics of a Register
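■ When setup/hold conditions are violated, the output of a storage element becomes unpredictable

[Figure: D flip-flop and timing diagram in which D changes inside the setup/hold window around the CLK edge, leaving Q undetermined]

$$\mathrm{MTBF} = \frac{1}{f_{clk} \cdot f_{in} \cdot t_d}$$

$f_{clk}$ = Clock Frequency
$f_{in}$ = Input Signal Frequency
$t_d$ = Duration of critical time window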

■ This effect is called metastability


■ If not contained, metastability can propagate…
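
As a rough worked example of the MTBF formula on this slide, assume a 100 MHz receiving clock, an input that changes at 10 MHz, and a 100 ps critical window. These numbers are illustrative assumptions, not values from the presentation.

$$\mathrm{MTBF} = \frac{1}{f_{clk} \cdot f_{in} \cdot t_d} = \frac{1}{(100 \times 10^{6}\,\mathrm{Hz}) \cdot (10 \times 10^{6}\,\mathrm{Hz}) \cdot (100 \times 10^{-12}\,\mathrm{s})} = \frac{1}{10^{5}\,\mathrm{s^{-1}}} = 10\,\mu\mathrm{s}$$

Under these assumptions, an unsynchronized flip-flop would see its input change inside the critical window roughly 100,000 times per second on average. Not every such event produces a visible failure, but the rate shows why the crossing cannot be left unprotected.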

Metastability is UNAVOIDABLE in designs with multiple asynchronous clocks

Clock Domain Crossings
Guaranteed to Cause Metastability
When 2 or more designs run on disparate clocks:
— The clocks will continually skew, guaranteeing setup/hold violations
— Signals from one design to another are “Clock Domain Crossings” (CDCs)
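[Figure: a flip-flop in the Sensor System (Clock A) drives a flip-flop in the Guidance System (Clock B) through a Clock Domain Crossing signal; the timing diagram shows Tx changing inside the Clock B setup/hold window]

Signals that cross asynchronous clock domains (CDC signals) WILL violate setup and hold conditions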

Mitigating Clock Domain Crossing Issues

Problem:
— Signals crossing a clock domain will violate set-up/hold
— Impact: Control/data signals will be dropped/corrupted
■ Loss of Data

Approaches:
— Avoid having systems that have multiple clocks
■ Although sensible, it’s becoming impossible
— Design around the problem
■ Designer can add “synchronizers” to the design
■ Metastability still happens, but nobody else sees it
— E.g. 2DFF, FIFO, etc.
— “Fences in” metastability

Isolate Metastability: Synchronizers

■ Designers add synchronizers to reduce the probability of metastable signals
■ Synchronizers are sub-circuits that can prevent metastable values
from being sampled across clock domains
— Take unpredictable metastable signals and create predictable behavior
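
The simplest synchronizer mentioned in this deck, the 2DFF, can be sketched in VHDL as follows. This is a minimal illustration and not code from the presentation; the entity and signal names (sync_2dff, clk_b, async_in, sync_out) are assumptions chosen for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical two-flip-flop (2DFF) synchronizer for ONE single-bit signal.
entity sync_2dff is
  port (
    clk_b    : in  std_logic;  -- receiving clock domain
    async_in : in  std_logic;  -- registered in another, asynchronous clock domain
    sync_out : out std_logic   -- safe to use in clk_b logic
  );
end entity sync_2dff;

architecture rtl of sync_2dff is
  signal sync1 : std_logic := '0';
  signal sync2 : std_logic := '0';
begin
  process (clk_b)
  begin
    if rising_edge(clk_b) then
      sync1 <= async_in;  -- first stage: may go metastable
      sync2 <= sync1;     -- second stage: samples sync1 after a full clk_b period
    end if;
  end process;

  sync_out <= sync2;
end architecture rtl;
```

The first register is the only one exposed to setup/hold violations, and it gets a full clock period to settle before the second register samples it. The sketch assumes the input is held longer than one clk_b period, and it is not suitable for multi-bit buses, where each bit could resolve in a different cycle.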

Mitigating Clock Domain Crossing Issues
Isolate Metastability: Synchronizers

[Figure: timing diagram with Clock A, Clock B, Tx and Rx; Tx values i-1, i, i+1, i+2 change inside the metastability window and appear at Rx with a varying delay]

When metastability occurs, the delay through a synchronizer becomes unpredictable

Synchronizer Delays Can Reconverge
with unexpected results
■ CDC signals cross with an assumed relationship
■ Can be combinational, sequential, or deeply sequential
■ Unpredictable delays on CDC paths lead to reconvergence errors
— Designs need logic to correctly handle reconvergence
— Can occur on single-bit or multiple-bit signals

[Figure: tx_d0, tx_d1 and tx_d2 leave a Gray Encoder, pass through Sync 0, Sync 1 and Sync 2 (signals S1 to S4), and reconverge in a Gray Decoder that drives an FSM Input; the waveforms show the same transfer arriving either as an Invalid Command or as a Valid Command, but delayed]
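
The Gray encoder and decoder in the figure can be sketched as two VHDL functions. This is an illustrative sketch and not the presenter's code; the package and function names (gray_code_pkg, bin_to_gray, gray_to_bin) are assumptions, and the functions assume an input of at least one bit.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package for Gray-coded multi-bit crossings.
package gray_code_pkg is
  function bin_to_gray(b : std_logic_vector) return std_logic_vector;
  function gray_to_bin(g : std_logic_vector) return std_logic_vector;
end package gray_code_pkg;

package body gray_code_pkg is

  function bin_to_gray(b : std_logic_vector) return std_logic_vector is
    -- Copy into a (N-1 downto 0) range so the index arithmetic is safe.
    variable bin  : std_logic_vector(b'length - 1 downto 0) := b;
    variable gray : std_logic_vector(b'length - 1 downto 0);
  begin
    gray(gray'high) := bin(bin'high);           -- MSB is unchanged
    for i in gray'high - 1 downto 0 loop
      gray(i) := bin(i + 1) xor bin(i);         -- each bit XORed with its upper neighbour
    end loop;
    return gray;
  end function bin_to_gray;

  function gray_to_bin(g : std_logic_vector) return std_logic_vector is
    variable gray : std_logic_vector(g'length - 1 downto 0) := g;
    variable bin  : std_logic_vector(g'length - 1 downto 0);
  begin
    bin(bin'high) := gray(gray'high);
    for i in bin'high - 1 downto 0 loop
      bin(i) := bin(i + 1) xor gray(i);         -- undo the XOR chain from the MSB down
    end loop;
    return bin;
  end function gray_to_bin;

end package body gray_code_pkg;
```

Gray coding only protects a crossing when the value moves by one step at a time, for example a counter or FIFO pointer, so that a single bit changes per transfer. If several bits change at once, as in the invalid-command waveform, the per-bit synchronizers can still deliver a mixed value.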

And, Synchronizers Fail if Misused

■ Synchronization between clock domains requires a transfer protocol
— Ensures data is predictably transferred between domains
■ These protocols must be verified
■ When protocol is violated
— Data is lost
— Simulation may not show a failure
— Silicon will eventually show a functional error

Synchronizer won’t function properly if the required Transfer Protocol is violated
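
One common transfer protocol is a four-phase request/acknowledge handshake. The sketch below is an assumption-laden illustration, not the protocol used in the presentation; every entity, port and signal name in it is hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical four-phase (req/ack) handshake moving one data word
-- from the clk_a domain to the clk_b domain.
entity cdc_handshake is
  generic (WIDTH : positive := 8);
  port (
    clk_a   : in  std_logic;
    a_send  : in  std_logic;                             -- request a transfer
    a_data  : in  std_logic_vector(WIDTH - 1 downto 0);
    a_busy  : out std_logic;                             -- high while a transfer is in flight
    clk_b   : in  std_logic;
    b_valid : out std_logic;                             -- one-cycle pulse: b_data is new
    b_data  : out std_logic_vector(WIDTH - 1 downto 0)
  );
end entity cdc_handshake;

architecture rtl of cdc_handshake is
  signal data_a_reg : std_logic_vector(WIDTH - 1 downto 0) := (others => '0');
  signal req_a_reg  : std_logic := '0';
  signal ack_b_reg  : std_logic := '0';
  signal req_b_sync1, req_b_sync2 : std_logic := '0';
  signal ack_a_sync1, ack_a_sync2 : std_logic := '0';
begin

  a_busy <= req_a_reg or ack_a_sync2;

  src : process (clk_a)
  begin
    if rising_edge(clk_a) then
      ack_a_sync1 <= ack_b_reg;        -- 2DFF synchronizer for ack
      ack_a_sync2 <= ack_a_sync1;
      if req_a_reg = '0' and ack_a_sync2 = '0' then
        if a_send = '1' then
          data_a_reg <= a_data;        -- captured once, then held stable
          req_a_reg  <= '1';
        end if;
      elsif req_a_reg = '1' and ack_a_sync2 = '1' then
        req_a_reg <= '0';              -- receiver has taken the data
      end if;
    end if;
  end process src;

  dst : process (clk_b)
  begin
    if rising_edge(clk_b) then
      req_b_sync1 <= req_a_reg;        -- 2DFF synchronizer for req
      req_b_sync2 <= req_b_sync1;
      b_valid     <= '0';
      if req_b_sync2 = '1' and ack_b_reg = '0' then
        b_data    <= data_a_reg;       -- data has been stable since req rose
        b_valid   <= '1';
        ack_b_reg <= '1';
      elsif req_b_sync2 = '0' then
        ack_b_reg <= '0';              -- complete the four-phase cycle
      end if;
    end if;
  end process dst;

end architecture rtl;
```

Only the single-bit req and ack signals pass through synchronizers; the data bus is sampled only after the synchronized request shows it has been stable. The protocol violation the slide warns about corresponds to the sender changing data_a_reg or dropping req_a_reg before the acknowledge arrives, which this sketch prevents by ignoring a_send while a_busy is high.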

Verification Must Cover
All Three CDC Problems
[Figure: example circuit annotated with a missing sync problem, a possible protocol problem, and a reconvergence problem]

Clock domain crossings need:
— Structured synchronization
— Transfer protocols
— Global reconvergence checking

Mitigating Clock Domain Crossing Issues

■ Problem:
— Signals crossing a clock domain will violate set-up/hold
— Impact: Control/data signals will be dropped/corrupted
■ Approaches:
— Avoid having systems that have multiple clocks
— Designer can add “synchronizers” to the design
— Designer-added synchronizers + full CDC verification
■ Assures synchronizers are present and used correctly

Recommendations

During design planning
1. Create systems/designs using 1 clk, 1 edge when possible
2. If multiple clocks are required, try to use 1 designer for both clock domains, and use coding guidelines
   1. Use signal naming conventions
   2. Many clock domain errors come from design changes, not the initial design
   3. Limit “clock domain crossings” to specific areas or blocks in the design, when possible.
   4. NOTE: These techniques can help assure synchronizers are present, but are unlikely to help identify reconvergence or CDC protocol issues.
3. When multi-clock design is required, plan for proper verification
   1. How do we accomplish this?

For Example:
• Append “_A_reg” to signals leaving A-clk register, “_A” for A-clk combo signals
• Leverage during code reviews - help identify missing synchronizers
• Make sure ONLY _A_reg signals go to synchronizers (no combo logic)

[Figure: naming example with My_sig_Tx and My_data_Tx_Reg in the Tx domain, and My_data_Rx_sync1, My_data_Rx_sync2 and Comb_sig_Rx in the Rx domain]
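
The naming convention can be illustrated with a short VHDL sketch that reuses the signal names from the figure. The connectivity shown is one plausible reading of the figure, and the entity name and the ports Clk_Tx, Clk_Rx, Req_Tx, Ready_Tx and Enable_Rx are assumptions added for the example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: only a *_Tx_Reg signal is allowed to cross domains.
entity cdc_naming_example is
  port (
    Clk_Tx      : in  std_logic;
    Clk_Rx      : in  std_logic;
    Req_Tx      : in  std_logic;  -- assumed Tx-domain inputs
    Ready_Tx    : in  std_logic;
    Enable_Rx   : in  std_logic;  -- assumed Rx-domain input
    Comb_sig_Rx : out std_logic
  );
end entity cdc_naming_example;

architecture rtl of cdc_naming_example is
  signal My_sig_Tx        : std_logic;         -- combinational, Tx domain: must NOT cross
  signal My_data_Tx_Reg   : std_logic := '0';  -- registered, Tx domain: may cross
  signal My_data_Rx_sync1 : std_logic := '0';  -- first synchronizer stage
  signal My_data_Rx_sync2 : std_logic := '0';  -- second synchronizer stage
begin

  -- Combinational logic can glitch, so it never drives a synchronizer directly.
  My_sig_Tx <= Req_Tx and Ready_Tx;

  tx_reg : process (Clk_Tx)
  begin
    if rising_edge(Clk_Tx) then
      My_data_Tx_Reg <= My_sig_Tx;  -- register before leaving the Tx domain
    end if;
  end process tx_reg;

  rx_sync : process (Clk_Rx)
  begin
    if rising_edge(Clk_Rx) then
      My_data_Rx_sync1 <= My_data_Tx_Reg;
      My_data_Rx_sync2 <= My_data_Rx_sync1;
    end if;
  end process rx_sync;

  -- Rx-domain logic uses only the fully synchronized signal.
  Comb_sig_Rx <= My_data_Rx_sync2 and Enable_Rx;

end architecture rtl;
```

With names like these, a reviewer can spot a violation by inspection: any synchronizer input that does not end in _Tx_Reg, or any Rx logic that reads something other than a _sync2 signal, deserves a closer look.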

Verifying CDC Synchronization

■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals can create
unexpected behavior
■ Approaches:
— Simulation
■ Digital logic simulators do NOT model transistor behavior
■ Do not model “metastability”

For example …
[Figure: two timing diagrams of D, CLK and Q, one for a Setup Violation and one for a Hold Violation, each comparing Q in simulation with Q in silicon]

Setup Violation: Simulation captures a ‘1’ while silicon produces either a ‘1’ or ‘0’
Hold Violation: Simulation captures a ‘0’ while silicon produces either a ‘1’ or ‘0’

Simulation Does NOT Reflect Silicon Behavior

Verifying CDC Synchronization
■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals ➔ Control logic bugs
■ Approaches:
— Simulation
■ Won’t model CDCs correctly to detect errors
— Static Timing Analysis
■ Can be used to identify signals that cross domains
■ Can be used as input for a manual review
■ But… won’t detect missing or incorrectly used synchronizers, or reconvergence

Verifying CDC Synchronization
■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals ➔ Control logic bugs
■ Approaches:
— Simulation
■ Won’t model CDCs correctly to detect errors
— Static Timing Analysis
■ Identifies signals for manual review, but otherwise useless
— Manual Design Reviews
■ Error prone (and very time consuming)
■ Typically only identifies synchronizer structures, misses
reconvergence and invalid sync protocol usage
■ Evidence suggests at least some synchronizers will be missed

For Example…
Trivial Reconvergence Error

■ Reconverging synchronized CDC signals - timing is unpredictable.


■ Need to verify the downstream logic can handle variations
— Manually identifying the reconvergence is very hard
— Manually identifying all possible behaviors is harder
— Manually assuring logic will behave correctly – typically intractable

Verifying CDC Synchronization
■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals ➔ Control logic bugs
■ Approaches:
— Simulation - Won’t model CDCs correctly to detect errors
— Timing Analysis - Identifies signals for review, but otherwise useless
— Manual Design Reviews - error prone, incomplete
— Lab Verification?
■ Problem is intermittent, debug is impossible
— Spice simulation? – It *does* model transistors, but…
■ Where will you get the “Spice deck”? (transistor level model)
■ Would be far too slow on a large FPGA

Verifying CDC Synchronization
■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals ➔ Control logic bugs
■ Approaches:
— So - we need a new method that reliably:
■ Identifies ALL CDC signals, structures, reconvergence
■ Assures ALL are connected and functioning correctly
■ Creates reports for manual reviews
■ ➔ The EDA industry has responded
— 6 commercial tools now available…and counting

— But…most won’t identify all 3 of our CDC issues

Mentor’s CDC Verification Technology
Who’s using our technology?
■ Mil-Aero
— Honeywell, Inc.
— L-3 Communications
— Lockheed Martin Co
— Ministry of Aerospace & Aeronautics
— Northrop Grumman Corp
— Raytheon
— Rockwell Collins Inc.
— SAAB Group
— Thales
■ Commercial
— Widely used in commercial space

■ The market leader in CDC verification

Example Value from One Customer
■ Design
— IEEE standard serial communications core
— Used in 50-60 other COMMERCIAL ASIC products
— Widely deployed (millions in use daily)
■ Placed core in a sensor guidance system
— Found issues in the lab
— Debugged FPGA for weeks
— Suspected a CDC issue, but not sure…
■ Deployed Mentor’s CDC solution
— Results same day
— Found 199 serious CDC bugs!
■ 45 Missing Synchronizers
■ 83 Incorrect Synchronizers
■ 76 Reconverging Signals
■ 11 other problems
— Most resulting from “more stressful” usage
■ In production:
— Commercial ASIC : Customer issue – device is erratic, locks up
— Avionics: Could result in an Airworthiness Directive

Summary
Recommendations
During design planning
1. Create systems/designs using 1 clk, 1 edge when possible
2. If multiple clocks are required, try to use 1 designer for all clock domains
3. When multi-clock design is required, plan for proper verification

During verification
1. Watch for multiple clocks in designs (Tip – Count PLLs)
2. Ask how CDC issues are mitigated (remember there are 3)

Utilize commercial tools designed for detecting these problems
1. Verify all 3 classes of CDC problems
   1. Structural Verification
   2. Protocols Verification
   3. Reconvergence Verification
2. Use reports to aid manual reviews
3. Use CDC tools to support ROBUSTNESS

In Conclusion …

■ Every multi-clock design is subject to metastability
■ Traditional verification methodologies CANNOT assure robustness
■ To properly mitigate the dangers of CDC, we strongly recommend a solution that…:
— Supports Manual Reviews
— Automatically reports all sources of CDC problems
— Has a proven CDC verification methodology & customer success
