Design Exercises
These exercises are allocated marks at Tripos examination level, with 20 marks making a full exam question.
Example answers are available to supervisors.
RTL Exercises
RTL1. Give a brief definition of RTL and Synthesisable RTL. Name two example languages. [4 Marks]
RTL2. Explain Verilog's blocking and non-blocking assignment statements. Show how to exchange the contents of
two registers using non-blocking assignment. Show the same using blocking assignment. [6 Marks]
RTL3. Synthesisable RTL standards require that a variable is updated by at most one thread: is this strictly
necessary ?
RTL4. Explain the terms structural hazard and non-fully pipelined. [4 Marks]
RTL5. Give a fragment of RTL that implements a counter that wraps after seven clock ticks. [3 Marks]
RTL6. Give a fragment of RTL that uses two multiply operators but where only one multiplier is needed in the
generated hardware. Sketch the output circuit. [3 Marks]
RTL7. Show an example piece of synchronous RTL before and after inserting an additional pipeline stage. [4 Marks]
RTL8. Give a concise abstract syntax for an RTL module that uses the synthesisable subset of Verilog or VHDL
(structural hierarchy may be ignored). [6 Marks]
RTL9. Describe possible sources of non-determinism that may arise in synthesisable RTL. [4 Marks]
RTL10. Give an RTL design for a component that accepts a five-bit input, a clock and a reset and gives a single-bit
output that holds when the running sum of the five-bit input exceeds 511. [6 Marks]
RTL11. Give a schematic (circuit) diagram for the design of RTL10. Use adders and/or ALU blocks rather than giving
full circuits for any such components. [7 Marks]
RTL12. Summarise the main differences between synthesisable RTL and general multi-threaded software in terms of
programming style and paradigms. [20 Marks]
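As background to RTL2, the difference between the two assignment styles can be illustrated with a small software model. The following is only a sketch in C, not Verilog and not a model answer: the struct and function names are hypothetical, and the model assumes that a non-blocking update reads every right-hand side from the current state and commits all results together at the clock edge.

```c
#include <assert.h>

/* Hypothetical two-register state, standing in for Verilog regs a and b. */
struct regs { unsigned a; unsigned b; };

/* Non-blocking style (a <= b; b <= a;): every right-hand side is read
   from the current state and all updates commit together. */
struct regs swap_nonblocking(struct regs cur)
{
    struct regs next;
    next.a = cur.b;
    next.b = cur.a;
    return next;
}

/* Blocking style: statements take effect in order, so a temporary
   is needed to preserve the old value of a. */
struct regs swap_blocking(struct regs cur)
{
    unsigned tmp = cur.a;
    cur.a = cur.b;
    cur.b = tmp;
    return cur;
}

/* Blocking style without a temporary (a = b; b = a;): the old value
   of a is overwritten before it is read, so both end up equal. */
struct regs swap_blocking_naive(struct regs cur)
{
    cur.a = cur.b;
    cur.b = cur.a;
    return cur;
}
```

Starting from a = 3, b = 5, the first two functions return a = 5, b = 3, whereas the naive blocking version returns a = 5, b = 5.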
SYSC9. Give synthesisable SystemC for a five-bit synchronous counter that counts up or down dependent on an input
signal. You should sketch C code that looks roughly like RTL rather than worrying about a precise definition
of synthesisable. [5 Marks]
SYSC10. Give the SystemC synthesisable equivalent design for the design of RTL10. You should sketch C code that
looks roughly like RTL rather than worrying about a precise definition of synthesisable. [7 Marks]
SYSC11. Repeat the previous exercise using a slightly more complex design: e.g. a long division component.
SYSC12. Define suitable nets for a simplex interface that transfers bytes using a four-phase handshake. Describe the
protocol. Answer this part using RTL or natural language. You may assume a suitably high-frequency clock
is available that will not alias the protocol. [5 Marks]
SYSC13. Sketch RTL for a counter module that writes its output to the four-phase interface of SYSC12. Precise syntax
and operational details are unimportant, but a sensible answer would be a Verilog module that increments once
for each output operation and wraps after decimal 255 back to zero. [5 Marks]
SYSC14. Sketch code for a blocking transactor that writes to a four-phase, net-level interface and also some client code
for it that, when the two are combined, gives it equivalent functionality to the module of SYSC13. Answer
this time using basic C-like syntax: later (ESL29.) you are asked to use a TLM library. [6 Marks]
SYSC15. Extend SYSC14. with further code for a transactor that owns its own thread and is a net-level client for the
four-phase handshake that makes an upcall to a user-provided function for each byte received. [4 Marks]
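The four-phase handshake referred to in SYSC12 to SYSC15 can be pictured with a small software model. This is a sketch in C under stated assumptions rather than a model answer: the net names (data, req, ack) and all function names are hypothetical, and the two ends are stepped one after the other in each tick instead of being separated by a clock edge as they would be in hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical nets of the simplex interface. */
struct link { uint8_t data; bool req; bool ack; };

/* Sender end, called once per tick. Returns true when it launches a byte. */
bool sender_tick(struct link *l, bool have_byte, uint8_t byte)
{
    if (!l->req && !l->ack) {          /* idle: phase 1, present data and raise req */
        if (have_byte) {
            l->data = byte;
            l->req = true;
            return true;
        }
        return false;
    }
    if (l->req && l->ack)              /* phase 3: ack seen, lower req */
        l->req = false;
    return false;
}

/* Receiver end, called once per tick. Returns true when it latches a byte. */
bool receiver_tick(struct link *l, uint8_t *out)
{
    if (l->req && !l->ack) {           /* phase 2: latch data and raise ack */
        *out = l->data;
        l->ack = true;
        return true;
    }
    if (!l->req && l->ack)             /* phase 4: req gone, lower ack */
        l->ack = false;
    return false;
}

/* Steps both ends until n bytes have crossed and the link is idle again. */
size_t run_transfer(const uint8_t *src, size_t n, uint8_t *dst)
{
    struct link l = { 0, false, false };
    size_t sent = 0, got = 0;
    uint8_t byte = 0;
    while (got < n || l.req || l.ack) {
        bool more = sent < n;
        if (sender_tick(&l, more, more ? src[sent] : 0))
            sent++;
        if (receiver_tick(&l, &byte))
            dst[got++] = byte;
    }
    return got;
}
```

Each byte costs a full req-up, ack-up, req-down, ack-down cycle, which is why the protocol tolerates arbitrary delay at either end but transfers at most one byte per four transitions.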
SOC1. What is meant by polled I/O and how does it compare with interrupt driven I/O ? [4 Marks]
SOC2. Sketch a set of typical macro definitions in C suitable for making low-level hardware access to a UART or
similar device that contains status, control and data registers. [4 Marks]
SOC3. Give a pair of short subroutines in C that perform polled-mode, blocking read and write operations using
your macros of SOC2. [4 Marks]
SOC4. Sketch the RTL or SystemC code for an interrupt arbiter that stores eight vectors with individual interrupt
enable flags. The arbiter monitors eight interrupt inputs and presents the highest-priority, non-masked
interrupt vector to the processor when the processor asserts an interrupt acknowledge signal. Fine details will
vary from answer to answer. Syntactic accuracy would not be expected in examination answers. [10 Marks]
SOC5. How does the processor set up the interrupt arbiter device of SOC4. and what must it do after servicing an
interrupt ? [4 Marks]
SOC6. How would you make an interrupt arbiter that shares work over two CPUs ? Is this always a good idea ?
[6 Marks]
SOC7. a) Give a programming model for a simple DMA controller with one control/status register and three operand
registers for block length and source and destination addresses. The DMA (direct memory access) controller,
when active, becomes a bus master and copies a block of data from one area to another, generating an interrupt
on completion. [4 Marks]
Answer: There's not much to say here: Just a gentle introduction to the
b) Sketch a full implementation of such a DMA controller that includes provision for slave access to the
programmable registers, active bus mastership and interrupt generation. Memory access should use a
high-level modelling style that ignores bus arbitration. Answer preferably using SystemC syntax, or pseudocode at
the same level of abstraction. Use RTL if and where needed or preferred. [7 Marks]
SOC8. Say, with justification, whether your SystemC DMA controller could be synthesised into RTL for use in a real
SoC. [3 Marks]
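The register-access style that SOC2 and SOC3 have in mind can be sketched as follows. This is one possible shape and not a model answer: the register order, the status bit positions and every name are hypothetical, and the device is replaced by a small array so that the fragment can be run on a workstation. On real hardware UART_BASE would instead be a fixed address taken from the SoC memory map.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the device registers (hypothetical layout). */
static volatile uint32_t fake_uart[3];
#define UART_BASE    ((volatile uint32_t *)fake_uart)

/* One macro per register, so that firmware reads like ordinary C. */
#define UART_STATUS  (UART_BASE[0])
#define UART_CONTROL (UART_BASE[1])
#define UART_DATA    (UART_BASE[2])

/* Hypothetical status bits. */
#define UART_STATUS_RX_READY (1u << 0)
#define UART_STATUS_TX_EMPTY (1u << 1)

/* Polled, blocking write: spin until the transmitter can take a byte. */
void uart_write(uint8_t c)
{
    while (!(UART_STATUS & UART_STATUS_TX_EMPTY))
        ;
    UART_DATA = c;
}

/* Polled, blocking read: spin until a received byte is available. */
uint8_t uart_read(void)
{
    while (!(UART_STATUS & UART_STATUS_RX_READY))
        ;
    return (uint8_t)UART_DATA;
}
```

The volatile qualifier matters: without it the compiler may hoist the status read out of the polling loop. With the array stand-in, both status bits must be set by the caller before either routine is used, otherwise the loops spin forever.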
ESL1. Define a transaction in Computer Science. How does the ESL use of this term differ ? [5 Marks]
ESL2. What is the difference between a blocking and non-blocking transaction in terms of implementation, efficiency
and callability? [6 Marks]
ESL3. Sketch SystemC code for a shim function that converts a transactional port from blocking to non-blocking,
or vice versa. [5 Marks]
ESL4. Add a simple transactional entry point to the five-bit counter of SYSC9. that allows a remote client to make
a five-bit, asynchronous parallel load of a value using a TLM call. [4 Marks]
ESL5. Restructure your answer of ESL4. so that the five-bit counter has a hardware-style parallel load and remains
synthesisable and use a separate transactor to convert the TLM parallel load into a net-level parallel load.
(You may ignore contention with other, simultaneous net-level operations on the counter.) [7 Marks]
ESL6. Give two ways that timing annotations embedded in a transaction-level call can be synchronised with system
global time. [5 Marks]
ESL7. Sketch a templated TLM SystemC model for a basic FIFO with capacity 8 items. [8 Marks]
ESL8. Sketch code that will join two such TLM FIFOs together to make a longer FIFO. [5 Marks]
ESL9. Sketch synthesisable SystemC or RTL-like code for such a FIFO (using either a circular buffer in a RAM or
else based on a multi-stage structure). This is a rather straightforward exercise, but it is useful preparation for
the next one! [5 Marks]
ESL10. Sketch code for a transactor (one of several possible) that enables interworking between the TLM and
synthesisable FIFOs of ESL7. and ESL9. [5 Marks]
ESL11. Sketch a SystemC model of a bus bridge and say what arbitration, queuing and address translation policies it
implements. Hint: a high-level model will likely lead to the shortest answer. Syntax details are unimportant
and pseudocode is acceptable. [8 Marks]
ESL12. Sketch a block diagram for a SoC containing at least two identical processor cores, a DRAM controller and
some amount of on-chip SRAM. Mark each end of each connection with a suitable port style to be used as
part of a TLM model (e.g. blocking, non-blocking, initiator, target). [10 Marks]
ESL13. Roughly estimate (order of magnitude) how many workstation instructions are used when modelling each
access to the DRAM. [5 Marks]
ESL14. Consider how back-door access to a DRAM (or other RAM model) might be implemented, whereby bus
cycles for certain traffic, such as instruction fetch, are modelled with less detail. [5 Marks]
ESL15. What is an ISS (instruction set simulator or emulator) ? [2 Marks]
ESL16. Consider what simulation performance an ISS might give. Can it ever be faster than real time ? (Perhaps
mention JIT mode). [5 Marks]
ESL17. Describe ways that caches can be modelled in conjunction with an ISS. [5 Marks]
ESL18. Describe a feasible high-level or TLM model of the subsystem of SOC12., whereby the sound can come out of
the sound card on the modelling workstation. What problems might arise ? Hint: There is a TLM example of
a music playing system, with TLM DAC model, in the additional material on the course web site. [4 Marks]
ESL19. a) When an ISS is embedded in a SoC design, what differences can we expect to see when compared with a
cycle-accurate model ? [5 Marks]
b) Why might embedded firmware be cross-compiled to native code for a workstation ? [5 Marks]
c) Give two or more ways hardware device access can be modelled when firmware is compiled for the modelling
platform (i.e. in a mixed-abstraction model). [5 Marks]
d) What issues of endianness might arise ? How can they be overcome ? [5 Marks]
ESL20. What problems might arise when using high-level models of systems that use dynamic code loading and
self-modifying code ? [5 Marks]
ESL21. Give alternative definitions of the blocking calls of SOC3. to produce a high-level C/C++ model of a UART
device (that just does console or file I/O rather than implementing a full serial port). [4 Marks]
ESL22. Explain how firmware can be conditionally compiled to either direct calls through the code of SOC3. or
instead call the code of ESL21. (Note, there are two answers to the latter half, where the bus interface
between the components is either modelled or not.) [10 Marks]
ESL23. Briefly describe each of: cycle-accurate, approximately-timed, loosely-timed, untimed. [8 Marks]
ESL24. Why might a transactional system exhibit different behaviour on the different models ? Is this good or bad ?
[2 Marks]
ESL25. What is the purpose and effect of the timing quantum in the loosely-timed model? [5 Marks]
ESL26. Explain how different timing models can be used (e.g. loose, approximate, cycle-accurate) in conjunction with
your answer to the DMA question (SOC7.) and what bugs in the system architecture might be exposed by
each form. [6 Marks]
ESL27. How can contention for a resource be modelled ?
ESL28. Sketch code that would measure traffic load and the number of transactions per millisecond at a contention
point in an ESL model.
ESL29. (Non-examinable) Re-answer SYSC14. using the full TLM 2.0 syntax with convenience sockets based on the
additional material on the course web page.
Clock cycles   Function
3              Sending row address
1              Read or write 16 bits in current row
2              Write back time when finished with row
Making some assumptions about the pattern of access that the processor will make of the memory, calculate
its performance in terms of instructions per second. [5 Marks]
d) If all instructions for inner loops are copied to a 32-bit wide on-chip SRAM (that provides true random
access at 400 MHz) at code start, what is the performance now ? [5 Marks]
e) If a cache structure with 98 percent instruction and 80 percent data hit rate is applied, what processor
performance is now achieved ? [5 Marks]
Value        Unit
0.08         µm
6 to 9       layers
400K         gates/mm^2
0.25         µm
0.25         µm
0.06         fF
0.03         fF
1            fF/mm
0.9 to 1.4   V
51           ps
21           nA/gate
A processor core in the above technology uses 200k gates, excluding cache memories. It has two operating
conditions: 100 MHz at 0.9 volts or 400 MHz at 1.4 volts. The average net activity ratio during halt is
negligible and 0.3 when running.
Give all working and intermediate results. State any additional assumptions you need or use.
a) Estimate the area of the processor. [2 Marks]
b) Compute the power consumed per gate at each operating condition when driving tracks of 0 mm and
1 mm. [2 Marks]
c) Estimate the power consumption of the processor core when halted and running for each operating condition. [3 Marks]
d) Compared with having the processor running at full performance all the time, how much power is saved
just by halting the processor when it is idle ? [2 Marks]
e) How much power is saved by dynamic frequency scaling ? [2 Marks]
f) How does dynamic frequency scaling compare with halting ? [2 Marks]
g) How much power is saved by combined dynamic voltage and frequency scaling ? [2 Marks]
h) How much power might be saved by power gating (i.e. power isolation) ? [2 Marks]
i) Estimate the relative costs of performing a 32-bit addition and sending the 32-bit result 1 mm over the
chip. [3 Marks]
TTE5. : FPGA
a) What are the principal differences between an FPGA and a masked ASIC for implementation of a SoC ?
[5 Marks]
b) How can a SoC design team use FPGAs to prototype their product before SoC fabrication ? [5 Marks]
c) When would it be sensible to ship an FPGA instead of a masked ASIC in production runs ? [5 Marks]
Gate     Parameter           Value
AND      propagation delay   0.1 ns
OR       propagation delay   0.1 ns
INV      propagation delay   0.05 ns
XOR      propagation delay   0.15 ns
D-type   clock-to-q time     0.2 ns
D-type   set-up time         0.05 ns
b) Describe the algorithm for a static timing analyser and show its operation on your circuit, giving the
maximum clock frequency. [7 Marks]
c) Draw a circuit where a static timing analyser will give an overly poor answer. [3 Marks]
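The longest-path calculation behind a static timing analyser can be sketched in a few lines. The fragment below is an illustration in C under stated assumptions, not a model answer: the circuit is hypothetical (an XOR and an inverter, each fed from flip-flop outputs, driving an AND gate whose output feeds a D input), gates are listed in topological order, and the delays are those tabulated above, expressed in picoseconds.

```c
#include <assert.h>

/* Flip-flop figures from the table above, in picoseconds. */
enum { CLK_TO_Q_PS = 200, SETUP_PS = 50 };

/* A gate listed in topological order. Each input is the index of an
   earlier gate, or -1 for a flip-flop output. For a one-input gate
   such as INV, set in1 equal to in0. */
struct gate { int delay_ps; int in0; int in1; };

/* Hypothetical example: g0 = XOR(q, q), g1 = INV(q), g2 = AND(g0, g1). */
static const struct gate example_circuit[3] = {
    { 150, -1, -1 },
    {  50, -1, -1 },
    { 100,  0,  1 },
};

/* Propagates worst-case arrival times forward through the gates and
   returns the minimum clock period. Taking the maximum over every gate
   is safe because arrival times never decrease along a path. */
int min_period_ps(const struct gate *g, int n, int *arrival)
{
    int worst = CLK_TO_Q_PS;
    for (int i = 0; i < n; i++) {
        int a0 = g[i].in0 < 0 ? CLK_TO_Q_PS : arrival[g[i].in0];
        int a1 = g[i].in1 < 0 ? CLK_TO_Q_PS : arrival[g[i].in1];
        arrival[i] = (a0 > a1 ? a0 : a1) + g[i].delay_ps;
        if (arrival[i] > worst)
            worst = arrival[i];
    }
    return worst + SETUP_PS;
}

/* Converts a period in picoseconds to a frequency in MHz. */
int max_freq_mhz(int period_ps)
{
    return 1000000 / period_ps;
}
```

For the example circuit the critical path is clock-to-q (200 ps), XOR (150 ps), AND (100 ps) and set-up (50 ps), giving a 500 ps period and a 2000 MHz maximum clock. The analysis is purely structural: it takes no account of whether a path can ever be exercised, which is the root of the pessimism that part c) asks about.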
TTE8. : Dynamic Clock Gating.
a) What is dynamic clock gating and why is it used ? [4 Marks]
b) Compare coarse-grained manual and fine-grained automatic clock gating. [4 Marks]
c) Describe some common clock-gate insertion transformations. [6 Marks]
d) Compare dynamic clock gating with power isolation in terms of automation, scale and functionality.
[6 Marks]
This section is not examinable in 09/10 and only parts may be lectured.
HLS7. : IP-XACT
a) What is the purpose of the IP-XACT specification ? [5 Marks]
b) How can device driver register definitions be kept in step with RTL implementations ? [5 Marks]
c) What alternatives to IP-XACT might be considered for structural netlists ? [5 Marks]
d) How might IP-XACT be used in conjunction with transactor synthesis ? [5 Marks]
Additional Material
The additional material is not examinable, except for specific examples of TLM modelling that were presented in
detail in lectures.
Additional Material: Multipliers and Adders
EP3. : Technology/Scaling.
a) What is meant by the term feature size in VLSI ? Give typical values. [5 Marks]
b) What are the main consequences of moving to a smaller feature size in VLSI fabrication ? [5 Marks]
c) What happens to the relative costs of computation and communication as features get smaller ? [5 Marks]
d) Why has parallel computation become more important than ever before ? [5 Marks]
END OF DOCUMENT.