Metastability and CDC-1
Devices
David Landoll
Applications Architect
Mentor Graphics Corp.
MAPLD 2009
Today’s FPGAs
[Figure: block diagram of an onboard computer built as a system on chip in an FPGA. Labeled blocks include an FPU, DMA controllers, and UART, SpaceWire, HDLC, and CAN controllers on AMBA AHB and AMBA APB buses with an address/control bus. External interfaces: RS422, SpaceWire, HDLC0/HDLC1 up/downlink, and CAN0/CAN1 through a CAN switch and dual CAN transceiver. Supporting parts: SDRAM data memory, SDRAM parity memory, configuration PROM, CLK generator, and 1.5V linear regulator.]
Integration Presents New Challenges
[Figure: avionics functions being integrated; labels include Flight Management, Weather Radar, and Flight Control & Avoidance Systems]
Metastability
What the heck is it, anyway?
■ What is a clock?
— Periodic pulsing signal
— Digital logic uniformly connected to this signal
— Acts as the symphony conductor: keeps logic in sync
— Action happens across the logic at one specific point
■ Typically the “rising edge”
[Figure: clock waveform swinging between Vcc/Vdd (+5V, +3.3V) and Vee/Vss (GND, 0V)]
Metastability
What the heck is it, anyway?
■ What’s in a register?
— (Also known as a latch, flip-flop, etc.)
— Contains transistors that “trap” the input value at the appropriate time
■ E.g. rising edge of the clock
— How does this happen?
Metastability
The Physics of a Register
■ Let’s take a look at a register -- simple D-type flip-flop
— CMOS D-type transmission gate flip-flop

    process(CLK)
    begin
      if rising_edge(CLK) then
        Q <= D;
      end if;
    end process;

[Figure: transmission-gate flip-flop schematic with D, CLK, and Q, animated over four frames with the values below]

    Frame   D   CLK   Q
    1       0   0     0
    2       1   0     0
    3       1   1     1
    4       0   0     1
Metastability
The Physics of a Register
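■ When setup/hold conditions are violated, the output of a storage element becomes unpredictable
[Figure: D, CLK, and Q waveforms with the setup/hold window marked; Q is unpredictable when D changes inside the window]

$\mathrm{MTBF} = \dfrac{1}{f_{clk} \cdot f_{in} \cdot t_d}$

$f_{clk}$ = Clock Frequency
$f_{in}$ = Input Signal Frequency
$t_d$ = Duration of critical time window

[Figure: two D flip-flops, each with D, Q, and CLK]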
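As a rough feel for the MTBF expression on this slide, here is a worked example. The three input values are assumptions picked only for illustration (they are not from the presentation): a 100 MHz receiving clock, an input toggling at 1 MHz, and a 100 ps critical window.

```latex
\[
\mathrm{MTBF}
  = \frac{1}{f_{clk} \cdot f_{in} \cdot t_d}
  = \frac{1}{(100 \times 10^{6}\,\mathrm{Hz})\,(1 \times 10^{6}\,\mathrm{Hz})\,(100 \times 10^{-12}\,\mathrm{s})}
  = \frac{1}{10^{4}\,\mathrm{s^{-1}}}
  = 100\ \mu\mathrm{s}
\]
```

Under these assumed numbers, an unsynchronized register would see a setup/hold violation on the order of ten thousand times per second. Raising either frequency shortens the interval in direct proportion.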
Mitigating Clock Domain Crossing Issues
Problem:
— Signals crossing a clock domain will violate set-up/hold
— Impact: Control/data signals will be dropped/corrupted
■ Loss of Data
Approaches:
— Avoid having systems that have multiple clocks
■ Although sensible, it’s becoming impossible
— Design around the problem
■ Designer can add “synchronizers” to the design
■ Metastability still happens, but nobody else sees it
— E.g. 2DFF, FIFO, etc.
— “Fences in” metastability
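A minimal sketch of the 2DFF synchronizer mentioned above, assuming a single-bit signal and a receiving-domain clock. The entity, port, and signal names (sync_2dff, clk_rx, d_async, q_sync, meta_ff, sync_ff) are hypothetical and do not come from the presentation.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical 2DFF synchronizer; all names are illustrative only.
entity sync_2dff is
  port (
    clk_rx  : in  std_logic;  -- receiving-domain clock
    d_async : in  std_logic;  -- single-bit signal launched from another clock domain
    q_sync  : out std_logic   -- copy of d_async intended for use in the clk_rx domain
  );
end entity sync_2dff;

architecture rtl of sync_2dff is
  signal meta_ff : std_logic := '0';  -- first stage: the flop that may go metastable
  signal sync_ff : std_logic := '0';  -- second stage: samples meta_ff one cycle later
begin
  process (clk_rx)
  begin
    if rising_edge(clk_rx) then
      meta_ff <= d_async;  -- may violate setup/hold; nothing else reads meta_ff
      sync_ff <= meta_ff;  -- meta_ff has had one clk_rx period to settle
    end if;
  end process;

  q_sync <= sync_ff;
end architecture rtl;
```

Only sync_ff feeds downstream logic, which is the “fencing in” idea: the first flop can still go metastable, but it has a full clock period to resolve before anything else samples it. The sketch covers one bit only; multi-bit buses need a different scheme such as the FIFO the slide mentions.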
Isolate Metastability: Synchronizers
Mitigating Clock Domain Crossing Issues
Isolate Metastability: Synchronizers
[Figure: timing diagram of Tx in the Clock A domain and Rx in the Clock B domain with the metastability window marked; data values i-1 through i+3 cross the boundary]
Synchronizer Delays Can Reconverge
with unexpected results
■ CDC signals cross with an assumed relationship
■ Can be combinational, sequential, or deeply sequential
■ Unpredictable delays on CDC paths lead to reconvergence errors
— Designs need logic to correctly handle reconvergence
— Can occur on single-bit or multiple-bit signals
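[Figure: tx_d0, tx_d1, and tx_d2 pass through a Gray Encoder, three separate synchronizers (Sync 0, Sync 1, Sync 2), and a Gray Decoder to the FSM input; signals S1 to S4 are marked. The waveforms show 3-bit values 000, 010, and 111, with one outcome annotated “Invalid Command” and another “Valid Command – but delayed”.]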
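The Gray encoder and decoder blocks named on this slide can be sketched as a pair of VHDL functions. This is an illustrative sketch, not the presenter's implementation; the package name gray_pkg and the function names bin_to_gray and gray_to_bin are assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical helper package; names are illustrative only.
package gray_pkg is
  function bin_to_gray (b : std_logic_vector) return std_logic_vector;
  function gray_to_bin (g : std_logic_vector) return std_logic_vector;
end package gray_pkg;

package body gray_pkg is

  -- Each Gray bit is the XOR of a binary bit and its more significant neighbor.
  function bin_to_gray (b : std_logic_vector) return std_logic_vector is
    alias bv   : std_logic_vector(b'length - 1 downto 0) is b;  -- normalize the index range
    variable g : std_logic_vector(b'length - 1 downto 0);
  begin
    g(g'high) := bv(bv'high);
    for i in g'high - 1 downto 0 loop
      g(i) := bv(i + 1) xor bv(i);
    end loop;
    return g;
  end function bin_to_gray;

  -- Decode from the MSB down: each binary bit depends on the binary bit above it.
  function gray_to_bin (g : std_logic_vector) return std_logic_vector is
    alias gv   : std_logic_vector(g'length - 1 downto 0) is g;  -- normalize the index range
    variable b : std_logic_vector(g'length - 1 downto 0);
  begin
    b(b'high) := gv(gv'high);
    for i in b'high - 1 downto 0 loop
      b(i) := b(i + 1) xor gv(i);
    end loop;
    return b;
  end function gray_to_bin;

end package body gray_pkg;
```

For example, binary 011 encodes to 010 and binary 100 encodes to 110, so stepping from 3 to 4 changes one bit on the crossing instead of three. That property only holds when the source value moves between adjacent codes, as a counter does. An arbitrary jump such as 000 to 111 can still be seen as an intermediate value by the receiver.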
And, Synchronizers Fail if Misused
Verification Must Cover
All Three CDC Problems
■ Missing sync problem
■ Possible protocol problem
■ Reconvergence problem
Mitigating Clock Domain Crossing Issues
■ Problem:
— Signals crossing a clock domain will violate set-up/hold
— Impact: Control/data signals will be dropped/corrupted
■ Approaches:
— Avoid having systems that have multiple clocks
— Designer can add “synchronizers” to the design
— Designer-added synchronizers + full CDC verification
■ Assures synchronizers are present and used correctly
Recommendations
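[Figure: schematic of a clock domain crossing with labeled signals: My_sig_Tx and My_data_Tx_Reg on the transmit side; My_data_Rx_sync1, My_data_Rx_sync2, and Comb_sig_Rx on the receive side]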
Verifying CDC Synchronization
■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals can create
unexpected behavior
■ Approaches:
— Simulation
■ Digital logic simulators do NOT model transistor behavior
■ Do not model “metastability”
For example …
[Figure: D, CLK, and Q waveforms for a setup violation and a hold violation, contrasting “Q in simulation” with “Q in silicon”]
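To make the gap between “Q in simulation” and “Q in silicon” concrete, here is a simulation-only sketch of a flip-flop model that drives 'X' when D changes too close to the clock edge. It is an illustration under stated assumptions, not something the presentation describes: the entity name dff_x_on_setup and the generic T_SETUP (with its 1 ns default) are hypothetical, and only the setup side of the window is checked.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Simulation-only, hypothetical model; not intended for synthesis.
entity dff_x_on_setup is
  generic (
    T_SETUP : time := 1 ns  -- assumed setup window, for illustration only
  );
  port (
    clk : in  std_logic;
    d   : in  std_logic;
    q   : out std_logic
  );
end entity dff_x_on_setup;

architecture sim of dff_x_on_setup is
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if d'stable(T_SETUP) then
        q <= d;    -- D was quiet for the whole setup window: normal capture
      else
        q <= 'X';  -- D moved inside the window: flag the captured value as unknown
      end if;
    end if;
  end process;
end architecture sim;
```

A plain RTL register, like the process shown earlier in the deck, always captures a clean 0 or 1 in a digital simulator. A model like this one at least exposes a violation as 'X', though it still does not reproduce what the silicon does, where the output eventually settles to 0 or 1 after an unpredictable delay.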
Verifying CDC Synchronization
■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals ➔ Control logic bugs
■ Approaches:
— Simulation
■ Won’t model CDCs correctly to detect errors
— Static Timing Analysis
■ Can be used to identify signals that cross domains
■ Can be used as input for a manual review
■ But… won’t detect missing or incorrectly used synchronizers, or reconvergence
Verifying CDC Synchronization
■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals ➔ Control logic bugs
■ Approaches:
— Simulation
■ Won’t model CDCs correctly to detect errors
— Static Timing Analysis
■ Identifies signals for manual review, but otherwise useless
— Manual Design Reviews
■ Error-prone (and very time-consuming)
■ Typically only identifies synchronizer structures; misses reconvergence and invalid sync protocol usage
■ Evidence suggests at least some synchronizers will be missed
For Example…
Trivial Reconvergence Error
Verifying CDC Synchronization
■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals ➔ Control logic bugs
■ Approaches:
— Simulation - Won’t model CDCs correctly to detect errors
— Timing Analysis - Identifies signals for review, but otherwise useless
— Manual Design Reviews - Error-prone, incomplete
— Lab Verification?
■ Problem is intermittent, debug is impossible
— SPICE simulation? – It *does* model transistors, but…
■ Where will you get the “SPICE deck”? (transistor-level model)
■ Would be far too slow on a large FPGA
Verifying CDC Synchronization
■ Problem:
— Missing synchronizers will create metastability
— Correctly placed but misused synchronizers won’t work
— Reconvergence of synchronized signals ➔ Control logic bugs
■ Approaches:
— So - we need a new method that reliably:
■ Identifies ALL CDC signals, structures, reconvergence
■ Assures ALL are connected and functioning correctly
■ Creates reports for manual reviews
■ ➔ The EDA industry has responded
— 6 commercial tools now available…and counting
Mentor’s CDC Verification Technology
Who’s using our technology?
■ Mil-Aero
— Honeywell, Inc.
— L-3 Communications
— Lockheed Martin Co
— Ministry of Aerospace & Aeronautics
— Northrop Grumman Corp
— Raytheon
— Rockwell Collins Inc.
— SAAB Group
— Thales
■ Commercial
— Widely used in commercial space
Example Value from One Customer
■ Design
— IEEE standard serial communications core
— Used in 50-60 other COMMERCIAL ASIC products
— Widely deployed (millions in use daily)
■ Placed core in a sensor guidance system
— Found issues in the lab
— Debugged FPGA for weeks
— Suspected a CDC issue, but not sure…
■ Deployed Mentor’s CDC solution
— Results same day
— Found 199 serious CDC bugs!
■ 45 Missing Synchronizers
■ 83 Incorrect Synchronizers
■ 76 Reconverging Signals
■ 11 other problems
— Most resulting from “more stressful” usage
■ In production:
— Commercial ASIC : Customer issue – device is erratic, locks up
— Avionics: Could result in an Airworthiness Directive
Summary
Recommendations
During design planning
1. Create systems/designs using 1 clk, 1 edge when possible
2. If multiple clocks are required, try to use 1 designer for all clock domains
3. When multi-clock design is required, plan for proper verification
During verification
1. Watch for multiple clocks in designs (Tip – Count PLLs)
2. Ask how CDC issues are mitigated (remember there are 3)
In Conclusion …