Clocking and Latches: Notes For EEC180B Spring 1999 University of California Davis
by
EEC180B Notes
Prof. V.G.Oklobdzija
1. CLOCK SUB-SYSTEM
Proper timing and clock system design are among the most critical aspects of a digital system. This has been summarized in the words of Prof. Steven Unger: "Despite the deceptively simple outward appearance of the clocking system, it is often a source of considerable trouble in actual systems." This chapter is dedicated to the subject of clocking in digital systems.
The function of the clock in a digital system can be compared to the function of a metronome in music. It designates the beginning of a musical score and the exact moment when certain notes are to be played by particular instruments in the orchestra, and it designates the end of a part or section; that is, it synchronizes the various instruments in the orchestra during the various parts and periods of the score being performed. Similarly, in a digital system the clock designates the exact moment when a signal is to change and when its final value is to be captured, and when the logic is active or inactive. All logic operations have to finish before the tick of the clock, and the final values of the signals are captured at the tick of the clock. The clock therefore provides the time reference point that determines the movement of data in the digital system.
This definition fits the description of synchronous systems, which are the subject of this chapter. Asynchronous and self-timed systems are not covered here, though they have been attracting attention in research and academia for quite some time. The reason self-timed systems are drawing attention is the increasing difficulty of controlling the clock
Fig. 1.1. The trend in Clock frequency over the years. [Figure: clock frequency (axis ticks 0 to 1200) plotted against Year (1975 to 2000); labeled systems: Cray-1S, Cray-X-MP, IBM 3090, Alpha 21064, Alpha 21164, Alpha 21264, PentiumPro, K6.]
May 28, 1998 12:13 PM 2
skew and distribution of the clock as the operating frequency of today's systems keeps increasing, reaching 600MHz or even 1GHz and beyond for the next generation of systems currently under development. The trend in clocking speed is shown in Fig.1.1, which indicates exponential growth in clock frequency over the years.
Fig. 1.2. System clocking waveforms and general finite-state machine structures. [Figure: panels (a) to (d) show combinational logic with clocked storage elements; (a) and (b) use a single Clock with its Period marked, (c) and (d) use Clock phase 1 and Clock phase 2; the two-phase waveform is labeled with Period, W1, W2, g12 and g21.]
overhead for the clock-skew directly influences the machine cycle and therefore the performance of the overall system.
Fig. 1.3. Relationship between the Clock and the Machine cycle. [Figure: the machine cycle consists of I-Fetch, I-Decode and Execute; the clock waveform below it is labeled 0 and 1.]
Timing analysis strongly depends on the type of clocking used in the system. Single-phase systems and multi-phase overlapping systems require more extensive timing analysis than multi-phase non-overlapping and edge-triggered systems. The timing requirements of single-phase and multi-phase overlapping systems are bounded by both short and long paths. This constraint is illustrated in Fig.1.4 for the simplified case where setup and hold times are set to zero. The advantage of these systems over their non-overlapping counterparts is that they have a so-called cycle-stealing feature, which effectively reduces the clock cycle and boosts performance, at the price of more difficult timing analysis and less robust operation.
Fig.1.4 (a) shows the timing constraints for a single-phase system:
1. LS data available at t1
2. LS data must arrive at LD after t2 (or be latched in Cycle 1: short path)
3. LS data arrives at LD by t3 (or reduces the path length available in Cycle 2)
4. LS data must arrive at LD before t4 (or be latched in Cycle 3: long path)
Fig.1.4 (b) presents the timing constraints for two-phase overlapping systems:
1. L2S data available at t1
2. L2S data must arrive at L2D after t2 (or be latched in Cycle 1: short path)
3. L2S data must arrive at L2D by t4 (or violate the system cycle time requirement)
4. L2S data must arrive at L1D before t5 (or be latched in Cycle 3: long path)
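The single-phase constraints listed above can be sketched as a small check on a path delay. This is only an illustration under stated assumptions: since Fig.1.4 is not reproduced here, the sketch assumes that t1 is the rising edge of the clock in Cycle 1, t2 the following falling edge, t3 the next rising edge and t4 the next falling edge, that latches are transparent while the clock is high, and that setup, hold and latch delays are zero. The names `SinglePhaseClock` and `classify_arrival` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SinglePhaseClock:
    period: float       # time between rising edges (assumed t3 - t1)
    pulse_width: float  # time the clock stays high (assumed t2 - t1)


def classify_arrival(clock: SinglePhaseClock, path_delay: float):
    """Classify data launched from LS at t1 by its arrival time at LD.

    Returns (verdict, borrowed_time). The mapping of t1..t4 to clock
    edges is an assumption of this sketch, not taken from Fig.1.4.
    """
    t1 = 0.0
    t2 = t1 + clock.pulse_width   # LD closes for Cycle 1
    t3 = t1 + clock.period        # LD opens for Cycle 2
    t4 = t3 + clock.pulse_width   # LD closes for Cycle 2
    arrival = t1 + path_delay

    if arrival <= t2:
        # Constraint 2 violated: data races through in Cycle 1.
        return ("short-path violation", 0.0)
    if arrival <= t3:
        # Constraint 3 met: data waits at LD for the next cycle.
        return ("ok", 0.0)
    if arrival < t4:
        # Constraint 3 missed but 4 met: time is taken from Cycle 2.
        return ("ok, borrows from cycle 2", arrival - t3)
    # Constraint 4 violated: data would be latched in Cycle 3.
    return ("long-path violation", 0.0)


EXAMPLE_CLOCK = SinglePhaseClock(period=10.0, pulse_width=3.0)
```

With `EXAMPLE_CLOCK`, a path delay between 3 and 10 time units is safe, a delay between 10 and 13 is accepted at the cost of time in Cycle 2, and anything outside that range violates the short-path or long-path bound.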
Fig. 1.4. Timing constraints in single-phase and two-phase overlapping clocking techniques. [Figure: panels (a) and (b) show latches separated by combinational logic, the clock waveforms (Clock 1, Clock 2, Clock 3) over Cycle 1 and Cycle 2, and the time points t1 to t5; panel (b) labels the destination master latch L1D and the destination slave latch L2D.]
[Figure: latch symbol with Data and Clock inputs, and timing waveforms for Clock, Data and Q.]
Flip-Flop: a bi-stable memory element with the same inputs and outputs as a latch, Fig.1.6. However, the output Q responds to changes of D only at the moment the clock C makes a transition. We define this as being edge triggered. The internal mechanisms of a flip-flop and of a latch are entirely different. We further define a Flip-Flop as leading-edge triggered if the output Q assumes the value of the input D as a result of the transition of the clock C from 0 to 1. Conversely, in a trailing-edge (negative-edge) triggered Flip-Flop the output Q assumes the value of the input D as a result of the transition of the clock C from 1 to 0. It is also possible to build a double-edge triggered flip-flop that responds to both the leading and the trailing edge of the clock C. Such
flip-flop implementations, first published in 1981 by Unger, have recently started to gain attention, given the increasing demand for performance.
Fig. 1.6. [Figure: flip-flop (F-F) symbol with Data and Clock inputs and output Q, and timing waveforms for Clock, Data and Q.]
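The difference between a latch and the three kinds of flip-flop defined above can be illustrated with a small behavioral model on sampled waveforms. This is a sketch, not a circuit-level description: it assumes that the latch is transparent while the clock is 1, that each list element is one time sample, and that an edge-triggered element takes the value of D at the sample where the clock has just changed. The function name `simulate` and the example waveforms are hypothetical.

```python
def simulate(clock, data, kind, q0=0):
    """Return the sampled output Q of a storage element.

    kind is one of "latch", "leading", "trailing" or "double".
    clock and data are equal-length lists of 0/1 samples.
    """
    q = q0
    prev_c = clock[0]  # no edge is reported at the first sample
    out = []
    for c, d in zip(clock, data):
        rising = prev_c == 0 and c == 1
        falling = prev_c == 1 and c == 0
        if kind == "latch":
            if c == 1:             # level sensitive: transparent while C = 1
                q = d
        elif kind == "leading":
            if rising:             # responds only to the 0-to-1 transition
                q = d
        elif kind == "trailing":
            if falling:            # responds only to the 1-to-0 transition
                q = d
        elif kind == "double":
            if rising or falling:  # responds to both transitions
                q = d
        else:
            raise ValueError(f"unknown element kind: {kind}")
        out.append(q)
        prev_c = c
    return out


EXAMPLE_CLOCK_WAVE = [0, 1, 1, 0, 0, 1, 1, 0]
EXAMPLE_DATA_WAVE = [1, 1, 0, 0, 1, 1, 0, 1]
```

Running all four kinds on the example waveforms gives four different Q sequences: the latch follows every change of D while the clock is high, whereas each flip-flop changes only at its triggering edges.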
gate remains high. The output of the selected clocked NAND gate goes low at the rising edge of the clock, starting the regenerative process in the S-R latch.
Fig. 1.7. SN 7474, leading-edge triggered Flip-Flop. [Figure: schematic with clock input Clk and the two flip-flop outputs.]
On the leading edge of the clock, depending on the state of the input D, a flow of signals occurs and causes latching to a new state. If we assume that all the NAND gates have the same delay and that the leading edge of the clock occurs at time 0, the nodes of the circuit change after one, two, and three gate delays as the signal progresses through the circuit. This is illustrated in Fig.1.8 for the 0-to-1 transition of Q and in Fig.1.9 for the 1-to-0 transition. For the sake of clarity, only the first part of the clock cycle is shown in the diagrams. In the remaining part of the cycle, the low level on the Clk node forces both nodes S and R to the high level, making them ready for another cycle.
The key point in the edge-triggered behavior of this flip-flop is that once the clock has made a transition from 0 to 1, one of the nodes S or R (depending on the state of D before the leading edge of the clock) makes a 1-to-0 transition, thus disabling any further impact of D on the values of nodes S and R. In Fig.1.8, node S makes a 1-to-0 transition, thus disabling further changes of nodes A and R. In Fig.1.9, node R makes a 1-to-0 transition, thus disabling further changes of nodes B and S. In both cases input D is isolated and cannot influence the state of nodes S and R while the clock is 1.
The issue of races in this structure is worth mentioning. Races occur only if the Data input changes in a window either around the leading edge of the Clock (for Data:
0 to 1) or around one gate delay before the leading edge of the Clock (for Data: 1 to 0). The width of the critical window is determined by the parameters of the NAND gates in the first (multiplexing) stage of a Flip-Flop.
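The signal flow after the leading edge of the clock can be reproduced with a unit-delay simulation of the NAND gates. This is a sketch under assumptions: the schematic of Fig.1.7 is not recoverable here, so the wiring below is the usual six-NAND core of a 7474-style flip-flop (preset and clear omitted), chosen because it matches the behavior of nodes A, B, S and R described in the text. Every gate is given the same delay of one step, and `Qn` stands for the complementary output. All function names are hypothetical.

```python
def nand(*inputs):
    """NAND of any number of 0/1 inputs."""
    return 0 if all(inputs) else 1


NODES = ("A", "B", "S", "R", "Q", "Qn")


def step(state, d, clk):
    """Advance every gate by one gate delay (assumed wiring, see lead-in)."""
    return {
        "B": nand(d, state["R"]),
        "A": nand(state["B"], state["S"]),
        "S": nand(state["A"], clk),
        "R": nand(state["S"], clk, state["B"]),
        "Q": nand(state["S"], state["Qn"]),
        "Qn": nand(state["R"], state["Q"]),
    }


def settle(state, d, clk, max_steps=20):
    """Iterate until no node changes, i.e. the circuit is stable."""
    for _ in range(max_steps):
        nxt = step(state, d, clk)
        if nxt == state:
            return state
        state = nxt
    raise RuntimeError("circuit did not settle")


def leading_edge_events(d, q_before, max_steps=20):
    """List (time, node, new_value) after Clk goes 0 -> 1 at time 0.

    Time is counted in gate delays. D is held constant, so no race occurs.
    """
    state = {"A": 1, "B": 1, "S": 1, "R": 1, "Q": q_before, "Qn": 1 - q_before}
    state = settle(state, d, clk=0)        # stable state with the clock low
    events = []
    for t in range(1, max_steps + 1):
        nxt = step(state, d, clk=1)        # clock is high from time 0 on
        changed = [(t, n, nxt[n]) for n in NODES if nxt[n] != state[n]]
        if not changed:
            break
        events.extend(changed)
        state = nxt
    return events
```

Calling `leading_edge_events(d=1, q_before=0)` shows S falling after one gate delay, Q rising after two and the complementary output falling after three; `leading_edge_events(d=0, q_before=1)` shows the mirrored sequence starting with R. Because D is held constant in this sketch, it does not model the races discussed in the text, which occur between the discrete gate-delay points.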
Fig. 1.8. [Figure: signal-flow diagram with nodes A, B, S, R, Clk and D annotated with transition times, and timing waveforms for Clk, D, S, R, A, B and Q.]
Fig. 1.9. SN 7474, Signal flow for 1 to 0 transition of Q. [Figure: signal-flow diagram with nodes A, B, S, R, Clk and D annotated with transition times, and timing waveforms for Clk, D, R, S, A, B and Q.]
The races are discussed in Fig.1.10 and Fig.1.11. If Data changes in the first part of the critical window, the change has a chance to win the race. If Data changes in the second part of the window, the change causes a race but does not win it, and is not remembered. In the ideal case of equal delays of the NAND gates N1 and N4 for the 0-to-1 race and N2 and N3 for the 1-to-0 race, the first and last parts of the window are equal. Any change in the parameters of the NAND gates makes one or the other part of the window larger. The races are very hard to present on the logic level and
in terms of changes at discrete intervals of NAND gate delays, because they actually occur between these discrete points. This is why we have chosen to present the signals as windows.
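Fig. 1.10. [Figure: schematic with gates N1 to N4, nodes A, B and S, inputs D and Clk, and the 0-to-1 race marked, with timing waveforms for Clk, D, B, R, A and S.]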
For the case in Fig.1.10, the race occurs in the N1-N4 loop between the signals on nodes B and R. Both nodes are initially high. If both D and Clock go high within a time window of two gate delays, the positive-feedback loop is enabled and the race begins. The race ends when either B or R goes low, depending on the relative position of the Data and Clock transitions within the window. A similar situation occurs in the N2-N3 loop for the transition of D from 1 to 0 within the critical window. The difference is that the window is shifted one gate delay earlier, because signals B and Clk actually enable the positive-feedback loop. The race is between the signals on nodes A and S.
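Fig. 1.11. [Figure: schematic with gates N1 to N4, nodes B and S, inputs D and Clk, and the 1-to-0 race marked in the N2-N3 loop, with timing waveforms for Clk, D, B, R, A and S. Annotation: for D: 1 to 0, i.e. B: 0 to 1, there is a race in the N2-N3 loop in the window around one gate delay before Clk: 0 to 1.]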
The latch delay is usually taken as the worse of the two delays, DCQ or DDQ, that is, whichever is greater.
Fig. 1.13. Earle's Latch. [Figure: latch with Clock and Data inputs and output Out.]
The objective of minimizing latch delay has been the subject of work by Yuan and Svensson and later by Afghahi and Svensson [7,8]. They developed a very fast latch and flip-flop, better known as the True-Single-Phase-Clock (TSPC) latch (shown in Fig. 1.14). The same latch, with some improvements, was used in the first implementation of Digital's Alpha processor, achieving a 200MHz clock rate [10], as shown in Fig.1.15. An interesting development of a Hybrid Latch Flip-Flop element is presented in [9]. The development and modification of the clocking strategy, including the selection and development of an appropriate latch structure, is best illustrated by the later stages of Digital's Alpha processor development described in [11-13].
Fig. 1.14. True-Single-Phase-Clock (TSPC) Flip-Flop: (a) schematic diagram (b) timing. [Figure: panel (a) is a schematic supplied from +Vcc with input D and output Q; panel (b) shows timing for D and Q with the intervals U, H and DCQ marked.]
An attempt to reduce the power dissipated by the clock sub-system is described in [5,6], in which the clock signal swing is reduced. In [6] a differential sense-amplifier structure is used for latching the signal. A Flip-Flop developed by Toshiba was later used in Digital's 21264 Alpha processor, which runs at a 600MHz clock rate (shown in Fig. 1.16).
Fig. 1.15. Modified TSPC Flip-Flop as used in the first generation 21064 Alpha processor from Digital [10]. [Figure: transistor-level schematic with devices M1 to M9, inputs In1 and In2, output Out, and supply VDD.]
Finally, the synthesis of the clock tree in order to reduce clock skew is described in [4].
Fig. 1.16. Toshiba Flip-Flop [6] used in the third-generation Alpha processor from Digital, the 21264. The Flip-Flop is differential.
Further Reading:
1. Eby G. Friedman, Clock Distribution Networks in VLSI Circuits and Systems, IEEE Press, 1995.
2. Wagner, Clock System Design, IEEE Design & Test of Computers, October 1988.
3. S.H. Unger, C. Tan, Clocking Schemes for High-Speed Digital Systems, IEEE Transactions on Computers, Vol. C-35, No 10, October 1986.
4. Minami, M. Takano, Clock Tree Synthesis Based on RC Delay Balancing, Proceedings of IEEE Custom Integrated Circuits Conference, p. 28.3.1-28.3.4, May 1992.
5. Kojima, S. Tanaka, K. Sasaki, Half-Swing Clocking Scheme for 75% Power Saving in Clocking Circuitry, IEEE Journal of Solid-State Circuits, Vol. 30, No 4, April 1995.
6. H. Kawaguchi, T. Sakurai, A Reduced Clock-Swing Flip-Flop (RCSFF) for 63% Power Reduction, IEEE Journal of Solid-State Circuits, Vol. 33, No 5, May 1998.
7. M. Afghahi, C. Svensson, A Unified Single-Phase Clocking Scheme for VLSI Systems, IEEE Journal of Solid-State Circuits, Vol. 25, No 1, February 1990.
8. J. Yuan, C. Svensson, High-Speed CMOS Circuit Technique, IEEE Journal of Solid-State Circuits, Vol. 24, No 1, February 1989.
9. H. Partovi et al., Flow-Through Latch and Edge-Triggered Flip-Flop Hybrid Elements, Proceedings of 1996 IEEE International Solid-State Circuits Conference, San Francisco, California, February 1996.
10. D. Dobberpuhl et al., A 200MHz 64-b Dual-Issue CMOS Microprocessor, IEEE Journal of Solid-State Circuits, Vol. 27, No 11, November 1992.
11. B. J. Benschneider et al., A 300-MHz 64-b Quad-Issue CMOS RISC Microprocessor, IEEE Journal of Solid-State Circuits, Vol. 30, No 11, November 1995.
12. B. Gieske et al., A 600MHz Superscalar RISC Microprocessor with Out-of-Order Execution, 1997 ISSCC Dig. Tech. Papers, p. 176-177, February 7, 1997.
13. P.E. Gronowski et al., High-Performance Microprocessor Design, IEEE Journal of Solid-State Circuits, Vol. 33, No 5, May 1998.