Wu 2009
Abstract – Multiple clocks are commonly used in complex systems. When synchronous signals in one clock domain are transferred to another clock domain, they become asynchronous signals. Asynchronous signals can cause metastable states, which lead to unpredictable results. This paper first describes how metastability leads to errors in a system. A PSpice simulation of an RS flip-flop then shows the metastability process in detail. To demonstrate how often metastability occurs, an FPGA-based experiment is carried out in which the placement of the flip-flops inside the FPGA is changed manually, which changes the propagation delay between them.

Keywords – metastability, FPGA, multi-clock domain.

I. INTRODUCTION

Digital systems are becoming more and more complicated, and one system often needs to communicate with several other systems simultaneously. For example, a computer has a UART port, a parallel port, a USB port, an Ethernet port, etc. to exchange data with different peripherals. Most of these communication buses run on clocks different from the computer's own, so it is important to build a reliable communication channel between asynchronous clock domains. There are two kinds of asynchronous clocks: two clocks with unrelated frequency and phase, and two clocks with the same frequency but unrelated phase.

An asynchronous signal is a signal that has no stable phase relation with the sampling clock. For a local clock domain, an asynchronous signal is an unstable signal, because it may well be sampled while it is changing state. In other words, because an asynchronous signal arrives at a random time relative to the local clock, the setup or hold time of a local flip-flop can be violated. When this happens, the flip-flop may go into one of three states: A) it recognizes the input signal and outputs the new value; B) it fails to recognize the input signal and keeps the old value; C) it fails to recognize the input signal and enters a metastable state.

When a flip-flop enters a metastable state, we cannot tell exactly when its output will settle to a new value. The time depends on how close the asynchronous signal edge is to the local clock edge: the closer the edges, the longer the time. It may be one millisecond, one second, one minute, one hour or even one year. So metastability can be defined as the condition in which the output of a flip-flop cannot reach a stable state within a definite time. This uncertain time always causes uncertain trouble [1].

tCO is the clock-to-output time, and it depends on the setup time (tSU) and hold time (tHD). When the input meets the flip-flop's tSU and tHD requirements, tCO is the typical output delay of the flip-flop, which is always given in its datasheet. As tSU and tHD decrease, tCO becomes longer. When tCO exceeds the maximum value specified for the flip-flop, the flip-flop is in a metastable state. The range of tSU and tHD values in which this happens is called the metastable window (W). As the input signal edge gets closer and closer to the clock edge, tCO grows longer and longer; tCO depends exponentially on the tSU and tHD parameters ((1) and Fig. 1 [2]). $T_W$ and $\tau_W$ are constants for a specific flip-flop.

$$\mathrm{Aperture} = T_W \cdot e^{-t_{CO}/\tau_W} \qquad (1)$$

Fig. 1. Metastable window

When the signal arrives before the aperture window, the output delay equals tCO; if the signal arrives within the aperture window, the output delay is greater than tCO.

II. PROBLEM DEFINITION

From Section I, we know how metastability arises. When a flip-flop enters a metastable state, it cannot reach a stable state within an acceptable time. This can throw the logic after the flip-flop into disorder [3].

A. Problem one
The Ninth International Conference on Electronic Measurement & Instruments ICEMI’2009
… controls. Let us take Fig. 2 as an example to show this problem.

In Fig. 2, four CPUs form a Symmetric Multi-Processing (SMP) system, and each CPU can build a dedicated data channel with any other CPU. The crossbar switch function is built in an FPGA. When one CPU wants to communicate with another, it first asserts a REQ signal to indicate that it wants to transfer data. If the other CPU is free, the FPGA responds with an ACK signal. After catching the ACK signal, the CPU can build a data channel.

If the FPGA and the CPU run in different clock domains, the REQ signal may be captured too late by the FPGA, so the CPU does not get the ACK signal from the FPGA and has to keep waiting. By the time the flip-flop in the FPGA resolves from its metastable state and its output settles, the CPU has already timed out.

B. Problem two

When an asynchronous signal is sampled by more than one flip-flop, the propagation delays differ, as in Fig. 3. The first path, from A through B and C to D, is delayed by an AND gate and a NOT gate, so its propagation delay is 2*TD. The second path, from A through E to F, is delayed only by a NOR gate, so its propagation delay is TD, which is shorter than that of the first path.

From the list in Table 1, under the condition

$$T_C - 2T_D - T_{CO} < T_S < T_C - T_D - T_{CO}$$

the outputs of REG1 and REG2 end up in different states: REG2 takes the new state while REG1 keeps the old state. If REG1 and REG2 act as state variables of a state machine, the state machine will enter an error state, which may cause a critical error [4].

III. PSPICE SIMULATION [5]

An RS latch consists of two NAND gates or two NOR gates. Fig. 4 shows a typical RS latch built from two NAND gates. It has two complementary outputs, Q and /Q, and two inputs, R and S: R is the reset input and S is the set input, and both are active low.

Fig. 4. The structure of an RS latch

The logic equations of the RS latch are:

$$Q = \overline{S \cdot \overline{Q}}, \qquad \overline{Q} = \overline{R \cdot Q} \qquad (2)$$

From (2), we obtain Table 2, which shows how the RS latch changes state.

Table 2. The state change of RS latch

Q /Q (last state) | RS=00 | RS=01 | RS=11 | RS=10
11 | 11 | 10 | 00 | 01
10 | 11 | 10 | 10 | 11
00 | 11 | 11 | 11 | 11
01 | 11 | 11 | 01 | 01
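The state changes in Table 2 can be reproduced with a small zero-delay logic model that updates both NAND outputs of (2) at the same time. This is a sketch for illustration, not the PSpice model used in this section, and the function names are hypothetical. Because the digit order behind labels such as "RS=01" is not spelled out, the sketch is only compared against the RS=00 and RS=11 columns, which do not depend on it. It also shows why releasing both inputs together (RS going from 00 to 11 while Q = /Q = 1) is the critical case: the ideal gates swap between 11 and 00 indefinitely.

```python
def nand(a: int, b: int) -> int:
    """Two-input NAND on 0/1 logic levels."""
    return 0 if (a and b) else 1


def next_state(q: int, qn: int, r: int, s: int) -> tuple[int, int]:
    """One simultaneous update of (2): Q = NAND(S, /Q), /Q = NAND(R, Q).

    R and S are active low, so 0 means "asserted".
    """
    return nand(s, qn), nand(r, q)


def trace(q: int, qn: int, r: int, s: int, steps: int) -> list[tuple[int, int]]:
    """Apply next_state repeatedly with fixed inputs and record (Q, /Q)."""
    states = [(q, qn)]
    for _ in range(steps):
        q, qn = next_state(q, qn, r, s)
        states.append((q, qn))
    return states


if __name__ == "__main__":
    # RS=11 column of Table 2: next state for every last state (Q, /Q)
    for q, qn in [(1, 1), (1, 0), (0, 0), (0, 1)]:
        print(f"last {q}{qn} -> {''.join(map(str, next_state(q, qn, r=1, s=1)))}")
    # Both inputs released together from the forced state Q = /Q = 1
    print(trace(1, 1, r=1, s=1, steps=4))  # [(1, 1), (0, 0), (1, 1), (0, 0), (1, 1)]
```

In this ideal model only an exactly simultaneous release keeps swapping; in real gates, small differences in delay decide the final state after a time that grows as the two release edges get closer.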
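[Figure: PSpice schematic of the RS latch. Two SN5400 NAND gates (U6A, U6B) supplied from a 5 Vdc source (V3); outputs Q and /Q are each connected to 1 kΩ and 20 pF (R3/C1, R4/C2). Inputs S and R are driven by pulse sources with V1 = 3.3 V, V2 = 0 V, TD = 0, TR = TF = 1 ns and PER = 400 ns; one source has PW = 100 ns and the other PW = {var}, with parameter var = 100 ns.]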
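The exponential form of (1) can be motivated with a common small-signal picture of a cross-coupled latch such as the one above: near its balance point the voltage difference between Q and /Q grows as v0·e^(t/τ), and the initial imbalance v0 is roughly proportional to how far apart the competing edges are. This linearized model and all of its constants (TAU, V_RESOLVED, K) are assumptions for illustration, not values extracted from the PSpice run. Under it, the extra delay is τ·ln(V_RESOLVED/v0), and the skew window that still needs more than t to resolve is (2·V_RESOLVED/K)·e^(−t/τ), which has the form of (1) with $T_W$ = 2·V_RESOLVED/K and $\tau_W$ = τ.

```python
import math

# Hypothetical latch parameters (assumptions, not values from the paper)
TAU = 0.1e-9        # regeneration time constant of the cross-coupled pair (s)
V_RESOLVED = 1.0    # imbalance at which the output counts as settled (V)
K = 1e10            # imbalance created per second of edge skew (V/s)


def resolve_time(skew: float) -> float:
    """Extra output delay (s) for a given edge skew (s) in the linearized model.

    The imbalance grows as v(t) = v0 * exp(t / TAU) with v0 = K * |skew|,
    and the output settles once v(t) reaches V_RESOLVED.
    """
    v0 = K * abs(skew)
    if v0 == 0.0:
        return math.inf        # perfect balance: never resolves in this ideal model
    if v0 >= V_RESOLVED:
        return 0.0             # imbalance already decisive: no extra delay
    return TAU * math.log(V_RESOLVED / v0)


def aperture(t_extra: float) -> float:
    """Total skew window (both sides of the edge) whose extra delay exceeds t_extra."""
    return 2.0 * (V_RESOLVED / K) * math.exp(-t_extra / TAU)


if __name__ == "__main__":
    for skew in (1e-12, 1e-13, 1e-14, 1e-15):
        print(f"skew {skew:.0e} s -> extra delay {resolve_time(skew) * 1e9:.3f} ns")
    print(f"window needing > 0.5 ns extra: {aperture(0.5e-9):.3e} s")
```

Each tenfold reduction in skew adds the same τ·ln 10 of delay (about 0.23 ns with TAU = 0.1 ns), so the delay grows without bound only as the skew approaches zero, while the window for a given delay shrinks exponentially.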
… frequency is about 300 MHz. At 300 MHz, we could not record a single metastable event in 30 minutes; when the clock is raised to 310 MHz, we record only about one metastable event per minute, which is almost the limit of the FPGA.

Since we cannot obtain enough data points by changing the clock frequency, we obtain them by changing the value of tC/2 − tSU − tPROP instead. Xilinx ISE software provides a way to place registers in the FPGA manually, and different register locations give different propagation delays.

Table 3. Measurement values

tMET (ns) | MTBF (ms)
0.05 | 1.7
0.23 | 13.14
0.57 | 311
0.73 | 7545

[Figure panel: (A) Far layout between registers]

From (1), if the period of the local clock is tC, the probability that an edge of the asynchronous signal falls into the metastable window is p = Aperture/tC. When this happens and tCO is greater than tC/2 − tSU − tPROP, the metastable state causes an error.

So the number of errors ne in one clock period is

$$n_e = n \cdot p = n \cdot \frac{\mathrm{Aperture}}{t_C}, \qquad n = \frac{f_d}{f_C} \qquad (3)$$

where
n = the number of state changes of the asynchronous input signal in one clock period,
fd = the frequency of state changes of the input signal,
fC = the clock frequency of the local flip-flop.

So the number of errors (Ne) in an operating period (toperation) can be calculated by (4) [7]:

$$N_e = \frac{t_{operation}}{t_C} \cdot n_e = t_{operation} \cdot f_d \cdot f_C \cdot \mathrm{Aperture} \qquad (4)$$

And the MTBF is

$$\mathrm{MTBF} = \frac{1}{f_d \cdot f_C \cdot \mathrm{Aperture}} = \frac{e^{t_{CO}/\tau_W}}{T_W \cdot f_d \cdot f_C} \qquad (5)$$

where $T_W$ and $\tau_W$ are the flip-flop constants of (1).

In Fig. 7, fC = 310 MHz and fd = 100 MHz, and the clock skew between registers is ignored. The register's maximum tCO (tCO_MAX) is 567 ps and its tSU is 314 ps. tMET is the time margin remaining at the maximum tCO:

$$t_{MET} = \frac{t_C}{2} - t_{CO\_MAX} - t_{PROP} - t_{SU} \qquad (6)$$

Replacing tCO with tMET, (5) becomes (7):

$$\ln(\mathrm{MTBF}) = \frac{t_{MET}}{\tau_W} - \ln(T_W \cdot f_d \cdot f_C) \qquad (7)$$

If we obtain different pairs of tMET and MTBF values, $\tau_W$ and $T_W$ can be obtained with (8):

$$\tau_W = \frac{t_{MET1} - t_{MET2}}{\ln(\mathrm{MTBF}_1) - \ln(\mathrm{MTBF}_2)}, \qquad T_W = \frac{e^{t_{MET}/\tau_W}}{\mathrm{MTBF} \cdot f_d \cdot f_C} \qquad (8)$$

The measured value pairs are shown in Table 3. With Table 3 and (8), the FPGA-related parameters are $\tau_W = 2.03 \times 10^{-10}$ and $T_W = 1.5 \times 10^{-18}$.
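Equation (8) can be applied to Table 3 directly. The sketch below assumes the first and last rows are used as the two pairs (the text does not say which pairs were chosen) and takes fd = 100 MHz and fC = 310 MHz from the Fig. 7 set-up. It converts the table to seconds, solves (8) for $\tau_W$ and $T_W$, and feeds them back into (5) so that the model can be compared with every row. The function names are hypothetical.

```python
import math

# Fig. 7 set-up quoted in the text
F_D = 100e6   # toggle rate of the asynchronous input, fd (Hz)
F_C = 310e6   # local clock frequency, fC (Hz)

# Table 3: (tMET in ns, MTBF in ms)
TABLE_3 = [(0.05, 1.7), (0.23, 13.14), (0.57, 311.0), (0.73, 7545.0)]


def fit_two_points(p1, p2, fd=F_D, fc=F_C):
    """Return (tau_w, t_w) in seconds from two (tMET ns, MTBF ms) pairs using (8)."""
    t1, m1 = p1[0] * 1e-9, p1[1] * 1e-3
    t2, m2 = p2[0] * 1e-9, p2[1] * 1e-3
    tau_w = (t1 - t2) / (math.log(m1) - math.log(m2))
    t_w = math.exp(t1 / tau_w) / (m1 * fd * fc)
    return tau_w, t_w


def mtbf(t_met, tau_w, t_w, fd=F_D, fc=F_C):
    """MTBF in seconds from (5), with tMET (s) in place of tCO."""
    return math.exp(t_met / tau_w) / (t_w * fd * fc)


if __name__ == "__main__":
    tau_w, t_w = fit_two_points(TABLE_3[0], TABLE_3[-1])
    print(f"tau_W = {tau_w:.3e} s, T_W = {t_w:.3e} s")
    for t_ns, m_ms in TABLE_3:
        model_ms = mtbf(t_ns * 1e-9, tau_w, t_w) * 1e3
        print(f"tMET {t_ns:.2f} ns: measured {m_ms:g} ms, model {model_ms:.3g} ms")
```

The two end rows are reproduced exactly by construction. The middle rows do not lie on the same straight line in ln(MTBF) versus tMET, so another choice of pair, or a least-squares fit of (7) over all four rows, gives somewhat different constants.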
Fig. 9 is the fitted MTBF curve. When tMET is 0.7 ns, there is about one error per minute; when tMET is 1.5 ns, about one error per day; and when tMET is 2.2 ns, about one error per decade. When the MTBF exceeds about 10 years, the system can be considered stable and reliable.
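Equation (7) can also be solved for the slack needed to reach a target MTBF: tMET = $\tau_W$ · ln(MTBF · $T_W$ · fd · fC). The sketch below assumes constants obtained by applying (8) to the first and last rows of Table 3, with fd = 100 MHz and fC = 310 MHz; a different fit would shift the results, so the printed values are estimates under those assumptions rather than a reproduction of Fig. 9. The function name is hypothetical.

```python
import math

F_D, F_C = 100e6, 310e6   # fd and fC from the Fig. 7 set-up (Hz)

# First and last rows of Table 3 in seconds (the choice of rows is an assumption)
T1, M1 = 0.05e-9, 1.7e-3
T2, M2 = 0.73e-9, 7.545
TAU_W = (T1 - T2) / (math.log(M1) - math.log(M2))   # (8)
T_W = math.exp(T1 / TAU_W) / (M1 * F_D * F_C)       # (8)


def required_t_met(target_mtbf_s, tau_w=TAU_W, t_w=T_W, fd=F_D, fc=F_C):
    """Slack tMET (s) needed for a target MTBF (s), rearranged from (7)."""
    return tau_w * math.log(target_mtbf_s * t_w * fd * fc)


if __name__ == "__main__":
    targets = {"1 minute": 60.0, "1 day": 86400.0, "10 years": 10 * 365.25 * 86400.0}
    for label, seconds in targets.items():
        print(f"MTBF of {label}: tMET = {required_t_met(seconds) * 1e9:.2f} ns")
```

Because tMET enters (7) linearly while MTBF enters through a logarithm, each tenfold increase in the target MTBF costs only a further $\tau_W$ · ln 10 of slack.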
V. SUMMARY
ACKNOWLEDGMENT
REFERENCES