Lec 8
[Figure: timing diagrams illustrating the propagation delay 𝑡PD (signals X and Y) and the clock-to-output delay 𝑡CO (signals clock and Q)]
These delays are caused by the time required to charge the parasitic capacitances of transistors and interconnects.

In the timing diagrams above the parallel lines indicate times during which a signal is held at a high or low level. The crossing lines indicate the times at which the signal changes.

Metastability, Setup and Hold Times

Consider the following implementation of an edge-triggered D flip-flop:¹

¹ This is a “master-slave” flip-flop. The second, “slave,” latch holds the previously latched value when the clock is 0.

To avoid metastability we must ensure the voltage at the latch input is at a valid level long enough to drive the latch output to a valid voltage level. The time required for this is called the “setup” time, 𝑡SU.

The input level must also be held at the correct level until the multiplexer has switched off completely. This is typically a much shorter time (often zero) and is called the “hold” time, 𝑡H.
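The two definitions above can be restated compactly as a window around the active clock edge during which the input must not change. This is only a sketch: the symbol for the edge time is introduced here for illustration and does not appear in the notes.

```latex
% t_edge (assumed name): time of the active clock edge
% The D input must be stable for all t in this window:
t_{\text{edge}} - t_{SU} \;\le\; t \;\le\; t_{\text{edge}} + t_{H}
```

With a hold time of zero the window simply ends at the clock edge.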
To avoid metastability almost all digital circuits are [...]

[Figure: two D flip-flops sharing a clock, with combinational logic between the Q output of the first (Q0) and the D input of the second (D1)]

By ensuring the propagation delays through the combinational logic meet the setup and hold requirements we can avoid metastable behaviour.

The timing diagram below shows the relationship between the clock edges and the valid times at the inputs and outputs of each flip-flop:

[Figure: timing diagram of clock and “Q0 or D1” showing 𝑡CO, 𝑡PD and the margin >𝑡SU between the launch edge and the latch edge]

Q changes 𝑡CO after the rising clock edge. 𝑡PD later the D input of the right flip-flop will have a valid (and correct) logic level. This must happen at least 𝑡SU before the next rising edge of the clock. This level must also be held for at least 𝑡H after that edge before it changes.

The diagram above identifies two clock edges, the “launch” and “latch” edges. In this example the edges are separated by the clock period. However, the clocks may arrive at different times due to different interconnect delays. This is known as “clock skew.” It’s also possible that the two clocks have different frequencies or latch on the falling edge.

This setup time is often called a “library” or “micro” setup time to distinguish it from the chip I/O setup and hold times.

Static Timing Analysis

Timing Paths

To avoid metastability we must compare the propagation delay along the data path to the propagation delay along the clock path as shown below:

[Figure: data path (through the combinational logic to D1) and clock path between the launch and latch flip-flops]

The time from the launch edge to the data arrival at the D flip-flop input is called the data arrival time. The time at which the latch clock edge arrives is called the clock arrival time. The delays included in calculating these times include interconnect delays, 𝑡CO, and 𝑡PD.

Timing Netlists

It is important to note that the only timing paths that need to be analyzed are those that start at a clock input (or a chip input pin, called an “input port”) and end at the D input of a flip-flop (or a chip output port).

However, there may be more than one path from a clock to a particular D input. For example, consider the half-adder shown in Figure 1. The numbers next to input pins are the delays from that input to the output, including interconnect delay to that input.²

² These are not the real values; I’m using round numbers to make the arithmetic easier.

The data structure used for STA is a directed (acyclic) graph where each node represents a pin. Edges represent delays and are labelled with the propagation delay (including both gate and interconnect). In the following graph each node represents an output and the values on the edges represent the delays:

[Figure: timing graph with nodes clock, a, b, t1, t2, carry_next, sum_next, carry and sum; each edge is labelled with its delay]

In this example there is only one clock, clock, and so all paths start at clock. There are four flip-flops but we will limit our analysis to carry and sum for now.

The sums of the delays along the data paths, working from output to input, are:
[Figure 1: half-adder schematic with input registers a and b (inputs a_in and b_in), gates t1, t2, sum_next and carry_next, and output registers carry~reg0 and sum~reg0 driving carry and sum, all clocked by clock; pins are annotated with delays]
carry.D (6) + carry_next (9) + a.clk (11) = 26
carry.D (6) + carry_next (10) + b.clk (7) = 23
sum.D (6) + sum_next (4) + t2 (5) + a.clk (11) = 26
sum.D (6) + sum_next (4) + t2 (5) + b.clk (7) = 22
sum.D (6) + sum_next (3) + t1 (7) + a.clk (11) = 27

Exercise 1: What is the remaining path and delay? What are the clock path delays?

For the purposes of timing analysis we only need to find the paths with the minimum and the maximum delay from each clock to each D input and from each clock to each flip-flop clock input. For the carry flip-flop the data path delays are 23 (min) and 26 (max). For the sum flip-flop these are 20 (min) and 27 (max). This reduces the graph to:

clock → carry.D     23/26
clock → sum.D       20/27
clock → carry.clk   24/24
clock → sum.clk     23/23

where the pairs of numbers are the minimum and maximum delays on each path.

STA Algorithm

A static timing analyzer finds the timing paths and min/max delays from a delay-annotated netlist, computes the time difference between clock and data arrival times and checks that the corresponding setup and hold requirements are met.

The minimum clock arrival time and maximum data arrival time are used when computing the setup time:

𝑡SU = 𝑡clock arrival (min) − 𝑡data arrival (max)

and the minimum data arrival and maximum clock arrival times are used when computing the hold time:

𝑡H = 𝑡data arrival (min) − 𝑡clock arrival (max)

The setup and hold times on each timing path are then compared to the required setup and hold times. The difference is called the “slack.” A positive slack means the requirement is met with margin to spare.

Since each clock has many launch and latch edges, the STA must pick an appropriate pair. The rule is to use the latch edge immediately following the launch edge when computing the setup time and to use the latch edge immediately before the launch edge when computing the hold time.

Exercise 2: Use numbers in the graph above to compute the setup time slack for carry if the clock period is 10 ns.

The following screen capture from TimeQuest, the Intel FPGA STA tool, shows the clock and data waveforms used to compute the setup time along the path from the clock input to the carry flip-flop. Note the two slightly different clock delays and the term “Data Required Time” which includes the setup time.
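The steps of this section (find min/max delays through the timing graph, then compare clock and data arrival times) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any real STA tool is implemented. The edge delays are taken from the worked sums for the half-adder, except the b-to-t1 delay of 4, which is inferred from the stated minimum of 20 for sum.D. The function names, the required setup time of 1 ns and the choice to treat the round-number delays as nanoseconds are all hypothetical. The sum flip-flop is used so that the carry exercise is left to the reader.

```python
# Delay-annotated timing graph: EDGES[src] = [(dst, delay), ...].
# Delays come from the worked sums in the notes; b -> t1 = 4 is an
# inferred value (assumption), chosen so that sum.D has a minimum of 20.
EDGES = {
    "clock": [("a", 11), ("b", 7)],
    "a": [("carry_next", 9), ("t2", 5), ("t1", 7)],
    "b": [("carry_next", 10), ("t2", 5), ("t1", 4)],
    "t1": [("sum_next", 3)],
    "t2": [("sum_next", 4)],
    "carry_next": [("carry.D", 6)],
    "sum_next": [("sum.D", 6)],
}


def topological_order(edges, source):
    """Nodes reachable from source, each listed after all its predecessors."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for nxt, _ in edges.get(node, []):
            visit(nxt)
        order.append(node)  # post-order: appended after all successors

    visit(source)
    return order[::-1]


def min_max_delays(edges, source):
    """Minimum and maximum accumulated delay from source to every node."""
    lo, hi = {source: 0}, {source: 0}
    for node in topological_order(edges, source):
        for nxt, delay in edges.get(node, []):
            lo[nxt] = min(lo.get(nxt, float("inf")), lo[node] + delay)
            hi[nxt] = max(hi.get(nxt, float("-inf")), hi[node] + delay)
    return lo, hi


def setup_time(period, clock_min, data_max):
    """Available setup time; the latch edge is taken one period after launch."""
    return (period + clock_min) - data_max


lo, hi = min_max_delays(EDGES, "clock")

SUM_CLK_DELAY = 23   # clock -> sum.clk (min and max) from the reduced graph
REQUIRED_TSU = 1     # hypothetical library setup time, not from the notes

available = setup_time(10, SUM_CLK_DELAY, hi["sum.D"])  # 10 + 23 - 27
slack = available - REQUIRED_TSU
print(lo["sum.D"], hi["sum.D"], available, slack)
```

Processing nodes in topological order means each node's min and max are final before its outgoing edges are relaxed, so the whole graph is handled in one pass without enumerating every path.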
The following screen capture shows how the data arrival time is computed by adding up the various propagation delays along the path:

In this example the clock period is 10 ns and the setup slack on this path is about 9 ns.

Closing Timing

“Closing” timing is the process of iterating a design until all paths have positive slack. There are various options when a design does not meet its timing requirements:

• ask the EDA software to spend more time (effort) optimizing the layout and routing

• use a larger or faster device or process, which makes it easier to optimize place and route (PAR)

• modify the design to speed up critical timing paths. This might mean having more logic in parallel or dividing up the computation into more clock cycles.

Asynchronous Clocks and Inputs

If all clocks are derived from the same source clock (e.g. through clock division or using a PLL) the time relationships between clocks remain constant and it’s possible to verify that timing constraints will be met.

However, if two clocks are physically independent then this is not possible: the setup and hold timing requirements of flip-flops with asynchronous inputs are bound to be violated at some point. Even though it’s not possible to do timing analysis on asynchronous signals, it is possible to determine how often timing violations happen when signals cross clock “domains” and the consequences. This topic will be covered in more detail later.

Timing Simulations

A timing-annotated netlist can be used by a simulator to run simulations that take into account delays. During the simulation the simulator can check that the setup and hold requirements of each flip-flop are met.

The advantage of this “dynamic” timing analysis is that the verification results are independent of, and can serve as a check on, user-provided timing constraints. The disadvantage is that the simulation may not cover all possible events. Timing simulations can be time-consuming for large designs and are primarily used for ASIC “sign-off.”
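The setup and hold check that a timing simulator performs on each flip-flop can be sketched as a comparison of data transition times against a window around each active clock edge. This is a minimal illustration only: the function name, the event lists and the 𝑡SU and 𝑡H values below are made up for the example and do not come from any simulator.

```python
def check_setup_hold(clock_edges, data_changes, t_su, t_h):
    """Return (edge, change, kind) for every data change inside a forbidden window.

    A change in (edge - t_su, edge] is reported as a setup violation and a
    change in (edge, edge + t_h) as a hold violation.
    """
    violations = []
    for edge in clock_edges:
        for change in data_changes:
            if edge - t_su < change <= edge:
                violations.append((edge, change, "setup"))
            elif edge < change < edge + t_h:
                violations.append((edge, change, "hold"))
    return violations


# Hypothetical event times in ns: active clock edges and D input transitions.
clock_edges = [10, 20, 30]
data_changes = [4, 19.5, 30.2]

violations = check_setup_hold(clock_edges, data_changes, t_su=1, t_h=0.5)
for edge, change, kind in violations:
    print(f"{kind} violation: data changed at {change} ns near clock edge at {edge} ns")
```

Only the transitions that actually occur in the simulated stimulus are checked, which is the coverage limitation of dynamic analysis noted above.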