Synthesis of Multi-FPGA Systems With Asynchronous Communications
Volume 2 of 2
by
April, 2007
T.B. Yee, 2007 Appendix A: Paper 248
Appendix A
Paper
This appendix contains the paper published in the proceedings of the International
Federation for Information Processing International Conference on Very Large Scale
Integration 2005 (IFIP VLSI-SOC 2005).
The following published papers were included in the bound thesis. These have
not been digitised due to copyright restrictions, but the links are provided.
Y. Tack Boon, M. Zwolinski, A.D. Brown (2005) “Multi-FPGA Synthesis with Asynchronous
Communication Subsystems.” IFIP International Conference on Very Large Scale Integration (VLSI-
SOC 2005).
T.B. Yee, 2007 Appendix B: Hardware demonstrator in detail
Appendix B
The first few sections of this appendix provide information on the JPEG decoder, a full profile of the test images, and photographs of the test images decoded by the multi-FPGA JPEG decoder. The rest of this appendix provides detailed information on the hardware demonstrator, including the circuit description of the BT121 VideoDAC on the I/O VGA peripheral board, the user manuals of the development boards, and the setting up of the hardware demonstrator.
The JPEG File Interchange Format (JFIF) is a minimal file format that enables JPEG bitstreams to be exchanged between a wide variety of platforms and applications. JFIF is fully compatible with the standard JPEG interchange format and conforms to the JPEG standard (ISO/IEC 10918-1 | ITU-T Recommendation T.81); the only additional requirement is the presence of a JFIF application segment identified by an APP0 marker.
The rest of this section provides the specifications and syntax of a JPEG file defined in Annex B of ISO/IEC 10918-1 | ITU-T Recommendation T.81, together with the JFIF application segment. The marker assignments supported by the lossy sequential DCT-based JPEG decoder, and their descriptions, are listed in Table B-1 below.
Symbol  Code assignment  Description
SOI     0xD8             Start of image
APP0    0xE0             JFIF application segment
APPn    0xE1 - 0xEF      Other APP segments
DQT     0xDB             Quantisation table
SOF0    0xC0             Start of frame
DHT     0xC4             Huffman table
SOS     0xDA             Start of scan
COM     0xFE             Comment, may be ignored (skipped)
EOI     0xD9             End of image
JFIF marker identifiers are preceded by an all-'1' byte (0xFF). A two-byte SOI header (0xFF, 0xD8) identifies the JFIF file format; the APP0 marker immediately follows the SOI header and is in turn followed by the other segments and markers. The end of the file is identified by the EOI marker (0xFF, 0xD9). Normally, the only marker identifier that should be found once the image data has started is the EOI marker. When a 0xFF byte is found followed by a zero byte, the zero byte must be discarded.
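As a minimal sketch of the byte-stuffing rule above, the Python function below removes the 0x00 that follows each 0xFF in a block of entropy-coded data. The function name `unstuff` is hypothetical, and the sketch assumes the input holds only entropy-coded data: any other marker (such as EOI) is not interpreted and is passed through unchanged.

```python
def unstuff(data: bytes) -> bytes:
    """Drop the stuffed 0x00 byte that follows every 0xFF data byte."""
    out = bytearray()
    i = 0
    while i < len(data):
        out.append(data[i])
        if data[i] == 0xFF and i + 1 < len(data) and data[i + 1] == 0x00:
            i += 2  # keep 0xFF as data, discard the following zero byte
        else:
            i += 1
    return bytes(out)


print(unstuff(bytes([0x12, 0xFF, 0x00, 0x34])).hex())  # 12ff34
```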
The following describes the JPEG file format and the key segments listed in Table B-1:
Segments: Following the SOI marker, there can be any number of segments or markers
described in Table B-1 above.
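The segment structure can be walked with a short loop: each segment starts with 0xFF and a marker code, followed by a two-byte big-endian length that counts the length field itself but not the marker (consistent with the SOF0 length of 8 + components*3 given below). The Python sketch below, with the hypothetical generator name `walk_segments`, assumes a well-formed file with no fill bytes between segments, and it stops after the SOS header, where the entropy-coded data begins.

```python
import struct


def walk_segments(data: bytes):
    """Yield (marker_code, payload) for each segment after SOI, up to and including SOS."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("file does not start with SOI (0xFF, 0xD8)")
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected 0xFF at offset {pos}")
        marker = data[pos + 1]
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])  # includes itself
        yield marker, data[pos + 4:pos + 2 + length]
        pos += 2 + length
        if marker == 0xDA:  # SOS: entropy-coded data follows
            break


sample = b"\xff\xd8" + b"\xff\xfe\x00\x04ab" + b"\xff\xda\x00\x03x" + b"\x12\x34"
for code, body in walk_segments(sample):
    print(hex(code), body)
```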
Field                                Size in byte(s)  Description
Marker identifier                    2                0xFF, 0xC0 to identify the SOF0 marker.
Length                               2                Equals 8 + (number of components)*3.
Data precision                       1                In bits/sample, usually 8.
Image height                         2                Must be > 0.
Image width                          2                Must be > 0.
Number of components                 1                Usually 1 = greyscale, 3 = colour YCbCr or YIQ, 4 = colour CMYK.
Each component                       3                For each component, read 3 bytes: component ID (1 byte) (1 = Y, 2 = Cb, 3 = Cr, 4 = I, 5 = Q), sampling factors (1 byte) (bits 0-3 vertical, bits 4-7 horizontal) and quantisation table number (1 byte).
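A minimal Python sketch of decoding the SOF0 fields listed above follows. The name `parse_sof0` is hypothetical, and `payload` is assumed to be the segment body after the two-byte length field, so its size is 6 + components*3; the usage line uses a hypothetical single-component 128 by 128 frame.

```python
import struct


def parse_sof0(payload: bytes) -> dict:
    """Decode the SOF0 fields that follow the 2-byte length field."""
    precision, height, width, ncomp = struct.unpack(">BHHB", payload[:6])
    if height == 0 or width == 0:
        raise ValueError("image height and width must be > 0")
    if len(payload) != 6 + 3 * ncomp:
        raise ValueError("length must equal 8 + components*3")
    components = []
    for i in range(ncomp):
        cid, sampling, qt = payload[6 + 3 * i:9 + 3 * i]
        components.append({"id": cid,
                           "h": sampling >> 4,    # bits 4-7: horizontal factor
                           "v": sampling & 0x0F,  # bits 0-3: vertical factor
                           "qt": qt})             # quantisation table number
    return {"precision": precision, "height": height, "width": width,
            "components": components}


print(parse_sof0(bytes([8, 0, 128, 0, 128, 1, 1, 0x11, 0])))
```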
Field                                Size in byte(s)  Description
Marker identifier                    2                0xFF, 0xE0 to identify the APP0 marker.
Length                               2                Must be >= 16.
File identifier mark                 5                Identifies JFIF: 'JFIF#0' (0x4A, 0x46, 0x49, 0x46, 0x00).
Major revision number                1                Should be 1, otherwise error.
Minor revision number                1                Should be 0 to 2, otherwise try to decode anyway.
Units for x/y densities              1                0 = no units (x/y-density specifies the aspect ratio instead); 1 = x/y-density in dots/inch; 2 = x/y-density in dots/cm.
X-density                            2                Should be > 0.
Y-density                            2                Should be > 0.
Thumbnail width                      1                -
Thumbnail height                     1                -
If there is no 'JFIF#0' in the file identifier, or the length is < 16, the segment is probably not a JFIF segment and should be ignored.
Normally units = 0, x-density = 1 and y-density = 1, which means the image has an aspect ratio of 1:1 (evenly scaled).
JFIF files that include thumbnails are very rare, so the thumbnail can usually be ignored. If there is no thumbnail, then width = 0 and height = 0.
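The APP0 checks above can be sketched in Python as follows. The sketch assumes the standard JFIF layout in which the major and minor revision numbers occupy one byte each, so the fields after the length field total 14 bytes and the minimum length is 16. The function name is hypothetical, `payload` is the body after the length field, and any thumbnail data is not read.

```python
import struct


def parse_app0(payload: bytes):
    """Return the JFIF fields of an APP0 body, or None if it is not a JFIF segment."""
    if len(payload) < 14 or payload[:5] != b"JFIF\x00":
        return None  # length < 16 or no 'JFIF#0': ignore the segment
    major, minor, units, x_density, y_density, thumb_w, thumb_h = struct.unpack(
        ">BBBHHBB", payload[5:14])
    if major != 1:
        raise ValueError("unsupported JFIF major revision")
    return {"version": (major, minor), "units": units,
            "density": (x_density, y_density), "thumbnail": (thumb_w, thumb_h)}


print(parse_app0(b"JFIF\x00" + bytes([1, 1, 0, 0, 1, 0, 1, 0, 0])))
```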
Field                                Size in byte(s)  Description
Marker identifier                    2                0xFF, 0xC4 to identify the DHT marker.
Length                               2                Specifies the length of the Huffman table segment.
Huffman table (HT) information       1                Bits 0-3: number of HT (0 to 3, otherwise error); bit 4: type of HT (0 = DC table, 1 = AC table); bits 5-7: not used, must be 0.
Number of symbols                    16               Number of symbols with codes of length 1 to 16; the sum (n) of these bytes is the total number of codes, which must be <= 256.
Symbols                              n                Table containing the symbols in order of increasing code length (n = total number of codes).
• A single DHT segment may contain multiple Huffman tables, each with its own information byte.
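The 16 counts and the symbol list of a DHT table are sufficient to rebuild the Huffman codes. The Python sketch below follows the canonical code assignment of ITU-T T.81 Annex C: codes of the same length are consecutive integers, and the code value is doubled (a 0 bit appended) when moving to the next length. The function name is hypothetical; the example uses the counts of the typical luminance DC table given in Annex K of the standard.

```python
def huffman_codes(counts, symbols) -> dict:
    """Map each symbol to (code, length) from the 16 DHT counts and the symbol list."""
    if len(counts) != 16 or sum(counts) != len(symbols) or len(symbols) > 256:
        raise ValueError("invalid DHT table")
    table = {}
    code = 0
    k = 0
    for length in range(1, 17):
        for _ in range(counts[length - 1]):
            table[symbols[k]] = (code, length)
            code += 1   # codes of equal length are consecutive
            k += 1
        code <<= 1      # append a 0 bit when moving to the next length
    return table


DC_LUMINANCE_COUNTS = [0, 1, 5, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
codes = huffman_codes(DC_LUMINANCE_COUNTS, list(range(12)))
for symbol in (0, 1, 11):
    code, length = codes[symbol]
    print(symbol, format(code, f"0{length}b"))
```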
Field                                Size in byte(s)  Description
Marker identifier                    2                0xFF, 0xDD to identify the DRI marker.
Length                               2                Must be 4.
Restart interval                     2                In units of MCU blocks: every n MCU blocks, an RSTn marker can be found. The first marker will be RST0, then RST1, etc.; after RST7, the sequence repeats from RST0.
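In the standard, RST0 to RST7 are the marker codes 0xD0 to 0xD7 (each preceded by 0xFF), so the restart marker expected at the k-th restart can be computed modulo 8. A minimal sketch with hypothetical function names:

```python
def restart_marker(k: int) -> int:
    """Marker code expected at the k-th restart (k = 0 gives RST0 = 0xD0)."""
    return 0xD0 + (k % 8)


def check_restart_sequence(codes) -> bool:
    """True if the observed RSTn codes follow RST0, RST1, ..., RST7, RST0, ..."""
    return all(code == restart_marker(k) for k, code in enumerate(codes))


print([hex(restart_marker(k)) for k in range(10)])
```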
Field                                Size in byte(s)  Description
Marker identifier                    2                0xFF, 0xDB to identify the DQT marker.
Length                               2                Specifies the length of the quantisation table segment.
Quantisation table (QT) information  1                Bits 0-3: number of QT (0 to 3, otherwise error); bits 4-7: precision of QT (0 = 8-bit, otherwise 16-bit).
Bytes                                n                QT values, n = 64*(precision + 1).
A single DQT segment may contain multiple quantisation tables, each with its own information byte.
For precision = 1 (16 bits), the order is high-low for each of the 64 words.
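A minimal Python sketch of splitting a DQT segment into its tables, following the field layout above, is given below. The name `parse_dqt` is hypothetical; the values are returned in the order stored in the segment (the zig-zag order defined by the standard) without reordering, and precision values other than 0 and 1 are rejected, which is stricter than the "otherwise 16-bit" wording in the table.

```python
def parse_dqt(payload: bytes) -> dict:
    """Split a DQT body (after the length field) into {table_number: [64 values]}."""
    tables = {}
    pos = 0
    while pos < len(payload):
        info = payload[pos]
        number, precision = info & 0x0F, info >> 4  # bits 0-3 and bits 4-7
        if number > 3 or precision > 1:
            raise ValueError("QT number must be 0-3 and precision 0 or 1")
        size = 64 * (precision + 1)                 # n = 64*(precision+1) bytes
        body = payload[pos + 1:pos + 1 + size]
        if len(body) != size:
            raise ValueError("truncated quantisation table")
        if precision == 0:
            values = list(body)
        else:
            # 16-bit entries are stored high byte first.
            values = [(body[2 * i] << 8) | body[2 * i + 1] for i in range(64)]
        tables[number] = values
        pos += 1 + size
    return tables


segment = bytes([0x00]) + bytes(range(64)) + bytes([0x11]) + bytes([0x01, 0x02] * 64)
print({k: v[:4] for k, v in parse_dqt(segment).items()})
```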
Field                                Size in byte(s)  Description
Marker identifier                    2                0xFF, 0xDA to identify the SOS marker.
Length                               2                Must be equal to 6 + 2*(number of components in scan).
Number of components in scan         1                Must be >= 1 and <= 4 (otherwise error), usually 1 or 3.
Each component                       2                For each component, read 2 bytes: component ID (1 byte) (1 = Y, 2 = Cb, 3 = Cr, 4 = I, 5 = Q) and the Huffman tables to use (1 byte) (bits 0-3: AC table 0 to 3; bits 4-7: DC table 0 to 3).
Ignorable bytes                      3                Skip the next 3 bytes.
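Similarly, the SOS header fields can be read as in the sketch below (hypothetical name `parse_sos`; `payload` is the body after the length field). The length check follows from the table: a length field of 6 + 2*(components), minus its own two bytes, leaves 1 + 2*(components) + 3 bytes.

```python
def parse_sos(payload: bytes) -> list:
    """Return [(component_id, dc_table, ac_table), ...] from an SOS body."""
    ncomp = payload[0]
    if not 1 <= ncomp <= 4:
        raise ValueError("number of components in scan must be 1 to 4")
    if len(payload) != 4 + 2 * ncomp:
        raise ValueError("unexpected SOS length")
    scan = []
    for i in range(ncomp):
        cid, tables = payload[1 + 2 * i], payload[2 + 2 * i]
        scan.append((cid, tables >> 4, tables & 0x0F))  # DC: bits 4-7, AC: bits 0-3
    return scan  # the final 3 bytes are skipped, as the table specifies


print(parse_sos(bytes([3, 1, 0x00, 2, 0x11, 3, 0x11, 0, 63, 0])))
```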
Figure B-4 and Figure B-5 illustrate two 128-pixel by 128-pixel test images
(SQUARES.jpg and SLOPE.jpg) decoded using the multi-FPGA JPEG decoder.
Post-MOODS multi-FPGA synthesis simulation results for the decoding of the LENA test image using the non-pipelined multi-FPGA JPEG decoder are given in Figure B-6, with a zoomed-in view in Figure B-7. The simulations show the signal transitions and data transfers of the various components (e.g. the UART RTL module, the frame buffer controller RTL module, etc.) and of the communication channels in the multi-FPGA JPEG decoder. The decoded pixel data are given in signal "/sim_top_level/decoded_data" (under the multi-FPGA JPEG decoder core divider) in Figure B-7, where cursors 1 and 2 mark the first decoded pixel value and the end of the eighth decoded pixel value in the test image respectively (the first to eighth pixel values read from the close-up view in Figure B-7 are 0x7C, 0x94, 0x8A, 0x6F, 0x8C, 0x88, 0x8E and 0x65). The two-phase data handshaking scheme for the inter-device data in subprogram communication channel 2 (under the SpC 2 divider) can also be seen clearly in Figure B-7.
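The two-phase handshaking visible in Figure B-7 can be illustrated with a small behavioural model. Assuming the conventional transition-signalling interpretation of a two-phase scheme, each transition of the request line (rising or falling) announces one data item and each transition of the acknowledge line completes it, with no return-to-zero phase. The Python class below is a hypothetical sketch of that protocol, not the VHDL of the communication channels synthesised by MOODS; it replays the eight pixel values read from Figure B-7 as sample data.

```python
class TwoPhaseChannel:
    """Behavioural model of a two-phase (transition-signalling) bundled-data link."""

    def __init__(self):
        self.req = 0     # toggled by the sender for every new item
        self.ack = 0     # toggled by the receiver for every accepted item
        self.data = None

    def send(self, value):
        if self.req != self.ack:
            raise RuntimeError("previous item has not been acknowledged")
        self.data = value   # data is placed on the bus before req changes
        self.req ^= 1       # a single transition signals "data valid"

    def receive(self):
        if self.req == self.ack:
            return None     # no transfer pending
        value = self.data
        self.ack ^= 1       # a single transition signals "data accepted"
        return value


channel = TwoPhaseChannel()
pixels = [0x7C, 0x94, 0x8A, 0x6F, 0x8C, 0x88, 0x8E, 0x65]
received = []
for p in pixels:
    channel.send(p)
    received.append(channel.receive())
print([hex(v) for v in received])
print(channel.req, channel.ack)  # 0 0 after an even number of transfers
```

Because every transfer costs only two transitions rather than four, the request and acknowledge lines return to their starting levels only after an even number of transfers.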
Simulation results for the 2-, 3- and 6-device implementations of the pipelined multi-FPGA JPEG decoder are given in Figure B-8 to Figure B-13. Cursors 1 and 2 in the zoomed-in views of the simulations mark the first decoded pixel value and the end of the eighth decoded pixel value respectively. Inter-device data sent through the explicit communication channels (ExCs) using the two-phase data handshaking scheme can be seen in the zoomed-in views of the simulations (e.g. ExC 4 in Figure B-9).
Figure B-10 Simulation of test image (LENA.JPG) decoding in a pipelined
multi-FPGA JPEG decoder (3-device implementation)
Figure B-13 Simulation (zoom view) of test image (LENA.JPG) decoding in a
pipelined multi-FPGA JPEG decoder (6-device implementation)
[Figure: hardware demonstrator set-up with D2-SB development boards 1, 2 and 3 and the I/O VGA peripheral board]
The following tables give the pin assignments of the three Digilent D2-SB development boards; connector pins that are not available (N/A) for user I/O assignments (e.g. VCC, GND) or are not connected (n/c) are highlighted in grey. Table B-2 lists signals assigned to connectors A1 and A2, Table B-3 lists signals assigned to connectors B1 and B2, and the signals assigned to connectors C1 and C2 on development board 1 are given in Table B-4.
Connector A1                              Connector A2
Conn pin  FPGA pin  Signal                Conn pin  FPGA pin  Signal
1         N/A       GND                   1         N/A       GND
2         N/A       VU                    2         N/A       VU
3         N/A       VCC33                 3         N/A       VCC33
4         P112                            4         P162
5         P111      vga_hsync_n           5         P161      SRAMAddr(1)
6         P110      vga_vsync_n           6         P160      SRAMAddr(0)
7         P109      pin_vga_gray(1)       7         P152      SRAMAddr(3)
8         P108      pin_vga_gray(0)       8         P151      SRAMAddr(2)
9         P102      pin_vga_gray(3)       9         P150      SRAMAddr(5)
10        P101      pin_vga_gray(2)       10        P149      SRAMAddr(4)
11        P100      pin_vga_gray(5)       11        P148      SRAMAddr(7)
12        P99       pin_vga_gray(4)       12        P147      SRAMAddr(6)
13        P98       pin_vga_gray(7)       13        P146      SRAMAddr(9)
14        P97       pin_vga_gray(6)       14        P145      SRAMAddr(8)
15        P96                             15        P141      SRAMAddr(11)
16        P95                             16        P140      SRAMAddr(10)
17        P94                             17        P139      SRAMAddr(13)
18        P93                             18        P138      SRAMAddr(12)
19        P89                             19        P136      SRAMAddr(15)
20        P181                            20        P135      SRAMAddr(14)
21        P87                             21        P134      SRAMAddr(17)
22        P180                            22        P133      SRAMAddr(16)
23        P179      SRAMData(12)          23        P132      SRAMData(1)
24        P178      SRAMData(13)          24        P129      SRAMData(0)
25        P176      SRAMData(14)          25        P127      SRAMData(3)
26        P175      SRAMData(15)          26        P126      SRAMData(2)
27        P174      SRAM_CE               27        P125      SRAMData(5)
28        P173      SRAM_WE               28        P123      SRAMData(4)
29        P169      SRAM_LB               29        P122      SRAMData(7)
30        P168      SRAM_UB               30        P121      SRAMData(6)
31        P167      SRAM_OE               31        P120      SRAMData(9)
32        P166                            32        P116      SRAMData(8)
33        P165      RD                    33        P115      SRAMData(11)
34        P164      TD                    34        P114      SRAMData(10)
35        P163      pin_vgaclk_25Mhz      35        P113
36        n/c                             36        n/c
37        n/c                             37        n/c
38        n/c                             38        n/c
39        n/c                             39        P80       GCLK0
40        n/c                             40        n/c
Table B-2 Pin assignment of signals to connector A1 and A2 of development board 1
Connector C1                              Connector C2
Conn pin  FPGA pin  Signal                Conn pin  FPGA pin  Signal
1         N/A       GND                   1         N/A       GND
2         N/A       VU                    2         N/A       VU
3         N/A       VCC33                 3         N/A       VCC33
4         P112                            4         P23
5         P111                            5         P22       decoded_data(5)
6         P110                            6         P21
7         P109                            7         P20       decoded_data(6)
8         P108                            8         P18       decoded_data(7)
9         P102                            9         P17       decoded_data(8)
10        P101                            10        P16       decoded_data(9)
11        P100                            11        P15       decoded_data(10)
12        P99                             12        P11       decoded_data(11)
13        P98                             13        P10       decoded_data(12)
14        P97                             14        P9        decoded_data(13)
15        P96       JFIF_eof              15        P8        decoded_data(14)
16        P95       Data_Symbol(8)        16        P7        decoded_data(15)
17        P94       Data_Symbol(9)        17        P6        end_conv
18        P93       Data_Symbol(10)       18        P5        s_sym_check
19        P89       Data_Symbol(11)       19        P4
20        P45       Data_Symbol(12)       20        P3        JPEG_start
21        P87       Data_Symbol(13)       21        P206
22        P44       Data_Symbol(14)       22        P205
23        P43       Data_Symbol(15)       23        P204
24        P42       JFIF_info(0)          24        P203
25        P41       JFIF_info(1)          25        P202
26        P40       JFIF_info(2)          26        P201
27        P36       JFIF_info(3)          27        P200
28        P35                             28        P199
29        P34       decoded_req           29        P198
30        P33       decoded_ack           30        P194
31        P31       decoded_data(0)       31        P193
32        P30       decoded_data(1)       32        P192
33        P29       decoded_data(2)       33        P191
34        P27       decoded_data(3)       34        P189
35        P24       decoded_data(4)       35        P188
36        n/c                             36        n/c
37        n/c                             37        n/c
38        n/c                             38        n/c
39        n/c                             39        P77       GCLK1
40        n/c                             40        n/c
Table B-4 Pin assignment of signals to connector C1 and C2 of development
board 1
Table B-5 lists signals assigned to connectors A1 and A2, Table B-6 lists signals assigned to connectors B1 and B2, and the signals assigned to connectors C1 and C2 on development board 2 are given in Table B-7.
35 n/c 35 n/c
36 n/c 36 n/c
37 n/c 37 n/c
38 n/c 38 n/c
39 n/c 39 n/c
40 n/c 40 n/c
Connector C1                              Connector C2
Conn pin  FPGA pin  Signal                Conn pin  FPGA pin  Signal
1         N/A       GND                   1         N/A       GND
2         N/A       VU                    2         N/A       VU
3         N/A       VCC33                 3         N/A       VCC33
4         P112                            4         P23
37        n/c                             37        n/c
38        n/c                             38        n/c
board 2
Table B-8 lists signals assigned to connectors A1 and A2, and Table B-9 lists signals assigned to connectors B1, B2, C1, and C2 on development board 3.
Connector A1                              Connector A2
Conn pin  FPGA pin  Signal                Conn pin  FPGA pin  Signal
1         N/A       GND                   1         N/A       GND
2         N/A       VU                    2         N/A       VU
3         N/A       VCC33                 3         N/A       VCC33
4         P112                            4         P162
5         P111      vga_hsync_n           5         P161      SRAMAddr(1)
6         P110      vga_vsync_n           6         P160      SRAMAddr(0)
7         P109      pin_vga_gray(1)       7         P152      SRAMAddr(3)
8         P108      pin_vga_gray(0)       8         P151      SRAMAddr(2)
9         P102      pin_vga_gray(3)       9         P150      SRAMAddr(5)
10        P101      pin_vga_gray(2)       10        P149      SRAMAddr(4)
11        P100      pin_vga_gray(5)       11        P148      SRAMAddr(7)
12        P99       pin_vga_gray(4)       12        P147      SRAMAddr(6)
13        P98       pin_vga_gray(7)       13        P146      SRAMAddr(9)
14        P97       pin_vga_gray(6)       14        P145      SRAMAddr(8)
15        P96                             15        P141      SRAMAddr(11)
16        P95                             16        P140      SRAMAddr(10)
17        P94                             17        P139      SRAMAddr(13)
18        P93                             18        P138      SRAMAddr(12)
19        P89                             19        P136      SRAMAddr(15)
20        P181                            20        P135      SRAMAddr(14)
21        P87                             21        P134      SRAMAddr(17)
22        P180                            22        P133      SRAMAddr(16)
23        P179      SRAMData(12)          23        P132      SRAMData(1)
24        P178      SRAMData(13)          24        P129      SRAMData(0)
25        P176      SRAMData(14)          25        P127      SRAMData(3)
26        P175      SRAMData(15)          26        P126      SRAMData(2)
27        P174      SRAM_CE               27        P125      SRAMData(5)
28        P173      SRAM_WE               28        P123      SRAMData(4)
29        P169      SRAM_LB               29        P122      SRAMData(7)
30        P168      SRAM_UB               30        P121      SRAMData(6)
31        P167      SRAM_OE               31        P120      SRAMData(9)
32        P166                            32        P116      SRAMData(8)
33        P165      RD                    33        P115      SRAMData(11)
34        P164      TD                    34        P114      SRAMData(10)
35        P163      pin_vgaclk_25Mhz      35        P113
36        n/c                             36        n/c
37        n/c                             37        n/c
38        n/c                             38        n/c
39        n/c                             39        P80       GCLK0
40        n/c                             40        n/c
36        n/c       36        n/c       36        n/c       36        n/c
38        n/c       38        n/c       38        n/c       38        n/c
Table B-9 Pin assignment of signals to connector B1, B2, C1, and C2 of
development board 3
[Figure: BT121 functional block diagram: an input register latching R0-R7, G0-G7, B0-B7, SYNC* and BLANK* on CLOCK, three 8-bit DACs driving IOR, IOG and IOB, and a 1.2 V reference amplifier (VREF, FS ADJUST); supplies VAA and AGND]
As illustrated in the functional block diagram, the BT121 contains three 8-bit D/A converters, input registers, and a reference amplifier. On the rising edge of CLOCK, 24 bits of colour information (R0-R7, G0-G7, and B0-B7) are latched into the device and presented to the three 8-bit D/A converters. The SYNC* and BLANK* inputs, also latched on the rising edge of CLOCK to maintain synchronisation with the colour data, add appropriately weighted currents to the analogue outputs, producing the specific output levels required for video applications.
The D/A converters on the BT121 use a segmented architecture in which bit currents are routed to either the outputs or GND by a sophisticated decoding scheme. This architecture eliminates the need for precision component ratios and greatly reduces the switching transients associated with turning current sources on and off. Monotonicity and low glitch are guaranteed by the use of identical current sources and by current steering their outputs. An on-chip operational amplifier stabilises the full-scale output current against temperature and power supply variations. The analogue outputs of the BT121 can directly drive a 37.5 Ω load, such as a doubly terminated 75 Ω coaxial cable. The pin diagram of the BT121 VideoDAC is illustrated in Figure B-16, and the pin descriptions are given in Table B-10.
[Figure B-16: BT121 pin diagram; recoverable labels include R7-R3 (pins 40-44), R2-R0 (pins 1-3), GND (pins 4, 5, 27 and 28), SYNC* (pin 6), CLOCK (pin 18) and B7-B0 (pins 19-26)]
The typical connection diagram using the internal voltage reference is shown in Figure B-17, and the parts list is given in Table B-11.
[Figure B-17: typical BT121 connection diagram using the internal voltage reference, with RSET, R1-R4, C1-C3, C6 and 1N4148/9 diodes; the IOR, IOG and IOB outputs go to the video monitor connector]
Note: The vendor numbers above are listed only as a guide. Substitution of devices
with similar characteristics will not affect the performance of the BT121.
Overview
The Digilent D2-SB circuit board provides a complete circuit development platform centered on a Xilinx Spartan 2E FPGA. D2-SB features include:
• A Xilinx XC2S200E-200 FPGA with 200K gates and 350MHz operation;
• 143 user I/Os routed to six standard 40-pin expansion connectors;
• A socket for a JTAG-programmable 18V02 configuration Flash ROM;
[Figure: D2-SB circuit board block diagram showing the Xilinx Spartan 2E XC2S200E-PQ208, the LED and the expansion connectors]
shown in the figure below. The primary configuration port (Port 1) uses a standard 6-pin JTAG header (J7) that can accommodate Digilent's JTAG3 cable (or cables from Xilinx or other vendors). The other three JTAG programming ports are available on the A1, B1, and C1 expansion connectors, and these ports are bi-directional. If no peripheral board is present, a buffer on
JTAG cable to the configuration software. Port modules can disable their JTAG drivers; if more than one JTAG driver is enabled on the scan chain, programming may fail.
the VCCO voltage derived from the 3.3V supply. If other VCCO voltages are required, the regulator output can be modified by changing R12 according to:
VCCO = 1.25(1 + R12/R11).
Refer to the LM317 data sheet and D2-SB schematic for further information.
Oscillators
The D2-SB provides a 50MHz SMD primary oscillator and a socket for a second oscillator. The primary oscillator is connected to the GCK2 input of the Spartan 2E (pin 182), and the secondary oscillator is connected to GCK3 (pin 185). Both clock inputs can drive the DLL on the Spartan 2E, allowing for internal frequencies up to four times higher than the external clock signals. Any 3.3V oscillator in a half-size DIP package can be loaded into the secondary oscillator socket.
Pushbutton and LED
A single pushbutton and LED are provided on the board, allowing basic status and control functions to be implemented without a peripheral board. As examples, the LED can be illuminated from a signal in the FPGA to verify that configuration has been successful, and the pushbutton can be used to provide a basic reset function independent of other inputs. The circuits are shown below.
VU on pin 2, and 3.3V on pin 3. Pins 4-35 route to FPGA I/O signals, and pins 36-40 are reserved for JTAG and/or clock signals. The expansion headers provide 192 signal connections, but the Spartan 2E-PQ208 has only 143 available I/O signals. Thus, some FPGA signals are routed to more than one connector. In particular, the lower 18 pins (pins 4-21) of the A1, B1, and C1 connectors are all connected to the same 18 FPGA pins, and they are designated as the "system bus" (a unique chip select signal is routed to each connector). Other than these 18 shared signals, all remaining FPGA signals are routed to individual expansion connector positions. The lower 18 pins of the A2, B2, and C2 connectors are designated as "peripheral busses", and each of these busses (named PA, PB, and PC) uses 18 unique signals. The 14 upper pins of each expansion connector (pins 22-35) have been designated as "module busses". The A1, A2, C1, and C2 connectors each have fully populated module busses (named MA1, MA2, MC1, and MC2). Insufficient FPGA pins were available to route full module busses to the B connectors; only the 8 data pins of MB1 are routed, and no pins are routed to the upper B2 expansion connector (i.e., MB2 is a "no connect").
System Bus
The "system bus" is a protocol used by certain expansion boards that mimics a simple 8-bit microprocessor bus. It uses eight data lines, six address lines, a write-enable (WE) strobe that can be used by the peripheral to latch written data, an output-enable (OE) strobe that can be used by the peripheral to enable read data, a chip select, and a clock to enable synchronous transfers.
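The LM317 relation VCCO = 1.25(1 + R12/R11) quoted in the D2-SB power supply description can be evaluated, or solved for R12, as in the Python sketch below. The resistor values are hypothetical (240 ohms is merely a common choice for the fixed LM317 resistor, not necessarily the D2-SB value), and, like the quoted formula, the sketch neglects the regulator's small adjust-pin current.

```python
def lm317_vcco(r11: float, r12: float, vref: float = 1.25) -> float:
    """VCCO = 1.25 * (1 + R12/R11); adjust-pin current neglected."""
    return vref * (1 + r12 / r11)


def r12_for_vcco(vcco: float, r11: float, vref: float = 1.25) -> float:
    """Solve the same relation for R12, given a target VCCO and a fixed R11."""
    return r11 * (vcco / vref - 1)


R11 = 240.0  # hypothetical fixed resistor value in ohms
print(lm317_vcco(R11, 240.0))            # 2.5
print(round(r12_for_vcco(3.3, R11), 1))  # 393.6
print(round(r12_for_vcco(2.5, R11), 1))  # 240.0
```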
[Figure: system bus write-cycle and read-cycle timing (CS, WE, OE and DB0-DB7, with timing parameters such as tw and tdoe), and the system bus connections between the Spartan 2E PQ208 and the expansion connectors]
Each entry gives the signal name followed by the FPGA pin number.
Pin #  A1             A2             B1             B2             C1             C2
1      GND            GND            GND            GND            GND            GND
2      VU             VU             VU             VU             VU             VU
3      VCC33          VCC33          VCC33          VCC33          VCC33          VCC33
4      ADR0 112       PAIO1 162      ADR0 112       PBIO1 71       ADR0 112       PCIO1 23
5      DB0 111        PAIO2 161      DB0 111        PBIO2 70       DB0 111        PCIO2 22
6      ADR1 110       PAIO3 160      ADR1 110       PBIO3 69       ADR1 110       PCIO3 21
7      DB1 109        PAIO4 152      DB1 109        PBIO4 68       DB1 109        PCIO4 20
8      ADR2 108       PAIO5 151      ADR2 108       PBIO5 64       ADR2 108       PCIO5 18
9      DB2 102        PAIO6 150      DB2 102        PBIO6 63       DB2 102        PCIO6 17
10     ADR3 101       PAIO7 149      ADR3 101       PBIO7 62       ADR3 101       PCIO7 16
11     DB3 100        PAIO8 148      DB3 100        PBIO8 61       DB3 100        PCIO8 15
12     ADR4 99        PAIO9 147      ADR4 99        PBIO9 60       ADR4 99        PCIO9 11
13     DB4 98         PAIO10 146     DB4 98         PBIO10 59      DB4 98         PCIO10 10
14     ADR5 97        PAIO11 145     ADR5 97        PBIO11 58      ADR5 97        PCIO11 9
15     DB5 96         PAIO12 141     DB5 96         PBIO12 57      DB5 96         PCIO12 8
16     WE 95          PAIO13 140     WE 95          PBIO13 56      WE 95          PCIO13 7
17     DB6 94         PAIO14 139     DB6 94         PBIO14 55      DB6 94         PCIO14 6
18     OE 93          PAIO15 138     OE 93          PBIO15 49      OE 93          PCIO15 5
19     DB7 89         PAIO16 136     DB7 89         PBIO16 48      DB7 89         PCIO16 4
20     CSA 181        PAIO17 135     CSB 88         PBIO17 47      CSC 45         PCIO17 3
21     LSBCLK 87      PAIO18 134     LSBCLK 87      PBIO18 46      LSBCLK 87      PCIO18 206
22     MA1DB0 180     MA2DB0 133     MB1DB0 86      -              MC1DB0 44      MC2DB0 205
23     MA1DB1 179     MA2DB1 132     MB1DB1 84      -              MC1DB1 43      MC2DB1 204
24     MA1DB2 178     MA2DB2 129     MB1DB2 83      -              MC1DB2 42      MC2DB2 203
25     MA1DB3 176     MA2DB3 127     MB1DB3 82      -              MC1DB3 41      MC2DB3 202
26     MA1DB4 175     MA2DB4 126     MB1DB4 81      -              MC1DB4 40      MC2DB4 201
27     MA1DB5 174     MA2DB5 125     MB1DB5 75      -              MC1DB5 36      MC2DB5 200
28     MA1DB6 173     MA2DB6 123     MB1DB6 74      -              MC1DB6 35      MC2DB6 199
29     MA1DB7 169     MA2DB7 122     MB1DB7 73      -              MC1DB7 34      MC2DB7 198
30     MA1ASTB 168    MA2ASTB 121    -              -              MC1ASTB 33     MC2ASTB 194
31     MA1DSTB 167    MA2DSTB 120    -              -              MC1DSTB 31     MC2DSTB 193
32     MA1WRT 166     MA2WRT 116     -              -              MC1WRT 30      MC2WRT 192
33     MA1WAIT 165    MA2WAIT 115    -              -              MC1WAIT 29     MC2WAIT 191
34     MA1RST 164     MA2RST 114     -              -              MC1RST 27      MC2RST 189
35     MA1INT 163     MA2INT 113     -              -              MC1INT 24      MC2INT 188
36     JTSELA                        JTSELB                        JTSELC
37     TMS                           TMS                           TMS
38     TCK                           TCK                           TCK
39     TDO            GCLK0 80       TDO                           TDO            GCLK1 77
40     TDI            GND            TDI                           TDI            GND
• 4 pushbuttons;
• 8 slide switches;
• 3-bit VGA port;
• PS/2 mouse or keyboard port;
[Figure: DIO4 circuit board block diagram]
Functional Description
The DIO4 can be attached to Digilent system boards to quickly and easily add several useful I/O devices. The DIO4 draws power from the system board, and signals from all I/O devices are routed to individual pins on the system board connectors. These features allow the DIO4 to be incorporated into system-board circuits with minimal effort.
All devices on the DIO4 use the 3.3V supply from the system board, except for the PS/2 port, which needs a 5VDC supply (the DIO4 contains a 5VDC regulator). Signals coming from the PS/2 port are routed through level-shifting buffers to protect system boards that do not have 5V-tolerant inputs.
Power Supplies
The DIO4 draws power from three pins on the 40-pin connectors: pin 37 supplies 3.3V, pin 39 provides system GND, and pin 40 supplies unregulated voltage (VU). VU is connected to a 5VDC LDO regulator to produce a 5VDC supply for the PS/2 interface. The 3.3V supply is used to drive all other I/O devices on the board. The DIO4 consumes 5-10mA from the VU supply,
Seven-Segment LED display
The DIO4 board contains a modular 4-digit, common anode seven-segment LED display. In a common anode display, the seven anodes of the LEDs forming each digit are tied together, so the four digits provide four common anode circuit nodes (labeled AN1 through AN4 on the DIO4). Each anode, and therefore each digit, can be independently turned on and off by driving these signals to a '1' or a '0'. The cathodes of similar segments on all four displays are also connected together into seven common circuit nodes labeled CA through CG. Thus, each cathode for all four displays can be turned on and off independently. This connection scheme creates a multiplexed display, where driving the anode signals and corresponding cathode patterns of each digit in a repeating, continuous succession can create a 4-digit display.
In order for each of the four digits to appear bright and continuously illuminated, all four digits should be driven once every 1 to 16ms (for a refresh frequency of 1kHz to 60Hz). For example, in a 60Hz refresh scheme, each digit would be illuminated for one quarter of the refresh cycle, or about 4ms. The controller must assure that the correct cathode pattern is present when the
[Figure: seven-segment display detail showing segments a to g and anodes AN1 to AN4]
Digit shown  Illuminated segment
             a  b  c  d  e  f  g
0            1  1  1  1  1  1  0
1            0  1  1  0  0  0  0
2            1  1  0  1  1  0  1
3            1  1  1  1  0  0  1
4            0  1  1  0  0  1  1
5            1  0  1  1  0  1  1
6            1  0  1  1  1  1  1
7            1  1  1  0  0  0  0
8            1  1  1  1  1  1  1
9            1  1  1  1  0  1  1
Seven-segment display detail and cathode patterns to display the decimal digits
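The segment table and the refresh guideline in the DIO4 text can be combined in a short Python sketch. It assumes, hypothetically, that a segment is lit by driving its cathode line (CA to CG) to '0' while the digit's anode is enabled, which is the usual arrangement for a common anode display; the actual drive polarity on the DIO4, including any anode driver transistors, should be taken from its schematic. The per-digit on-time for a 60Hz refresh evaluates to about 4.2ms, in line with the 4ms quoted.

```python
# Segment patterns a..g from the table above; "1" means the segment is illuminated.
SEGMENTS = {
    0: "1111110", 1: "0110000", 2: "1101101", 3: "1111001", 4: "0110011",
    5: "1011011", 6: "1011111", 7: "1110000", 8: "1111111", 9: "1111011",
}


def cathode_levels(digit: int) -> dict:
    """Drive levels for CA..CG, assuming a lit segment needs its cathode at '0'."""
    return {"C" + seg: 0 if lit == "1" else 1
            for seg, lit in zip("ABCDEFG", SEGMENTS[digit])}


def digit_on_time_ms(refresh_hz: float, digits: int = 4) -> float:
    """On-time of each digit when all digits are scanned once per refresh period."""
    return 1000.0 / refresh_hz / digits


print(cathode_levels(1))
print(round(digit_on_time_ms(60), 2))    # 4.17 ms at a 60Hz refresh rate
print(round(digit_on_time_ms(1000), 2))  # 0.25 ms at a 1kHz refresh rate
```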
Eight individual LEDs are provided for circuit outputs. The LED cathodes are tied to GND via 270-ohm resistors, and the LED anodes are driven from a 74HC373. The '373 allows LED data to be latched on the DIO4, so that the signals from the system board do not need to be driven continuously (the LD# signals use connector pins that are used in the "system bus" on some Digilent boards). If the system bus is not needed, then the LDG signal can be tied high.
The eight slide switches on the DIO4 can be used to generate logic high or logic low inputs to the attached system board. The switches exhibit about 2ms of bounce, and no active debouncing circuitry is employed. A 4.7K-ohm series resistor is used for nominal input protection.
[Figure: LED circuit (74HC373 latch with LD# inputs) and slide switch circuit]
PS/2 Port
[Figure: PS/2 port timing diagram]
Symbol  Parameter                  Min    Max
Tck     Clock time                 30 us  50 us
Tsu     Data-to-clock setup time   5 us   25 us
Thld    Clock-to-data hold time    5 us   25 us
If a key can be "shifted" to produce a new character (like a capital letter), then a shift character is sent in addition to the original scan code, and the host device must determine which character to use. Some keys, called extended keys, send an "E0" ahead of the scan code (and they may send more than one scan code). When an extended key is released, an "E0 F0" key-up code is sent, followed by the scan code. Scan codes for most keys are shown in the figure below.
[Figure: keyboard scan codes for most keys]
The keyboard should send data to the host only when both the data and clock lines are high (or idle). Since the host is the "bus master", the keyboard should check to see whether the host is sending data before driving the bus. To facilitate this, the clock line can be used as a "clear to send" signal. If the host pulls the clock line low, the keyboard must not send any data until the clock is released (host-to-keyboard data transmission will not be dealt with further here).
The keyboard sends data to the host in 11-bit words that contain a '0' start bit, followed by 8 bits of scan code (LSB first), followed by an odd parity bit and terminated with a '1' stop bit. The keyboard generates 11 clock transitions (at around 20 - 30 kHz) when the data is sent, and data is valid on the falling edge of the clock.
Mouse
The mouse outputs a clock and data signal when it is moved; otherwise, these signals remain at logic '1'. Each time the mouse is moved, three 11-bit words are sent from the mouse to the host device. Each of the 11-bit words contains a '0' start bit, followed by 8 bits of data (LSB first), followed by an odd parity bit, and terminated with a '1' stop bit. Thus, each data transmission contains 33 bits, where bits 0, 11, and 22 are '0' start bits, and bits 10, 21, and 32 are '1' stop bits.
The three 8-bit data fields contain movement data as shown below. Data is valid at the falling edge of the clock, and the clock frequency is 20 to 30 kHz.
The mouse assumes a relative coordinate system wherein moving the mouse to the right generates a positive number in the X field, and moving to the left generates a negative number. Likewise, moving the mouse up generates a positive number in the Y field, and moving down represents a negative number (the XS and YS bits in the status byte are the sign bits; a '1' indicates a negative number). The magnitude of the X and Y numbers represents the rate of mouse movement: the larger the number, the faster the mouse is moving (the XV and YV bits in the status byte are movement overflow indicators; a '1' means overflow has occurred). If the mouse moves continuously, the 33-bit transmissions are repeated every 50ms or so. The L and R fields in the status byte indicate Left and Right button presses (a '1' indicates the button is being pressed).
[Figure: mouse data format: a status byte (bits L, R, 0, 1, XS, YS, XV, YV), an X-direction byte and a Y-direction byte, each sent as an 11-bit word with a '0' start bit, an odd parity bit and a '1' stop bit]
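The keyboard and mouse framing rules described above (a '0' start bit, eight data bits LSB first, odd parity and a '1' stop bit, with three such words per 33-bit mouse transmission) are captured in the Python sketch below. It works on bit values already sampled on the falling clock edges; the function names are hypothetical, and the status-byte bit positions (L in bit 0, R in bit 1, bit 3 always '1', XS, YS, XV and YV in bits 4 to 7) are assumed from the usual PS/2 mouse convention. X and Y are treated as 9-bit two's-complement values with XS and YS as their sign bits.

```python
def ps2_frame(byte: int) -> list:
    """Build the 11-bit word: '0' start, 8 data bits (LSB first), odd parity, '1' stop."""
    data = [(byte >> i) & 1 for i in range(8)]
    parity = 1 - sum(data) % 2  # total number of 1s in data + parity is odd
    return [0] + data + [parity, 1]


def ps2_decode(bits: list) -> int:
    """Check one 11-bit word and return its data byte."""
    if len(bits) != 11 or bits[0] != 0 or bits[10] != 1:
        raise ValueError("bad start or stop bit")
    data = bits[1:9]
    if (sum(data) + bits[9]) % 2 != 1:
        raise ValueError("parity error")
    return sum(bit << i for i, bit in enumerate(data))


def decode_mouse(bits: list) -> dict:
    """Decode a 33-bit mouse transmission: status, X and Y words."""
    if len(bits) != 33:
        raise ValueError("expected 33 bits")
    status, x, y = (ps2_decode(bits[i:i + 11]) for i in (0, 11, 22))
    return {
        "left": bool(status & 0x01),
        "right": bool(status & 0x02),
        "dx": x - 256 if status & 0x10 else x,   # XS = 1: negative
        "dy": y - 256 if status & 0x20 else y,   # YS = 1: negative
        "x_overflow": bool(status & 0x40),       # XV
        "y_overflow": bool(status & 0x80),       # YV
    }


print(hex(ps2_decode(ps2_frame(0x1C))))  # 0x1c
print(decode_mouse(ps2_frame(0x19) + ps2_frame(0xFB) + ps2_frame(0x03)))
```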
VGA Port
The five standard VGA signals Red (R), Green (G), Blue (B), Horizontal Sync (HS), and Vertical Sync (VS) are routed directly to the VGA connector. A 270-ohm series resistor is used on each color signal. This resistor forms a divider with the 75-ohm VGA cable termination, resulting in a signal that conforms to the VGA specification (i.e., 0V for fully off and 0.7V for fully on). VGA signal timings are specified, published, copyrighted and sold by the VESA organization (www.vesa.org).
[Figure: VGA "DB15" connector with 270-ohm series resistors on the colour signals]
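The divider can be checked with a short calculation. Assuming, for illustration only, that the FPGA drives a colour pin to a 3.3V logic high and that the output driver's own resistance is negligible, the terminated level is 3.3V x 75/(270 + 75), or about 0.72V, close to the 0.7V full-scale level. The function name below is hypothetical.

```python
def vga_level(v_drive: float, r_series: float = 270.0, r_term: float = 75.0) -> float:
    """Voltage across the 75-ohm termination when driven through the series resistor."""
    return v_drive * r_term / (r_series + r_term)


print(round(vga_level(3.3), 3))  # 0.717
print(vga_level(0.0))            # 0.0 (fully off)
```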
The following VGA system timing information is provided as an example of how a VGA monitor might be driven in 640 by 480 mode. For more precise information, or for information on higher VGA frequencies, refer to the documents available at the VESA website (or experiment!).

VGA system timing
CRT-based VGA displays use amplitude-modulated, moving electron beams (or cathode rays) to display information on a phosphor-coated screen. LCD displays use an array of switches that can impose a voltage across a small amount of liquid crystal, thereby changing light permittivity through the crystal on a pixel-by-pixel basis. Although the following description is limited to CRT displays, LCD displays have evolved to use the same signal timings as CRT displays (so the "signals" discussion below pertains to both CRTs and LCDs).
CRT displays use electron beams (one for red, one for blue and one for green) to energize the phosphor that coats the inner side of the display end of a cathode ray tube (see drawing below). Electron beams emanate from "electron guns", which are finely pointed, heated cathodes placed in close proximity to a positively charged annular plate called a "grid".
The electrostatic force imposed by the grid pulls away rays of energized electrons as current flows into the cathodes. These particle rays are initially accelerated towards the grid, but they soon fall under the influence of the much larger electrostatic force that results from the entire phosphor-coated display surface of the CRT being charged to 20 kV (or more). The rays are focused to a fine beam as they pass through the center of the grids, and then they accelerate to impact on the phosphor-coated display surface.
The phosphor surface glows brightly at the impact point, and the phosphor continues to glow for several hundred microseconds after the beam is removed. The larger the current fed into the cathode, the brighter the phosphor will glow. Between the grid and the display surface, the beam passes through the neck of the CRT where two coils of wire produce orthogonal electromagnetic fields. Because cathode rays are composed of charged particles (electrons), they can be deflected by these magnetic fields. Current waveforms are passed through the coils to produce magnetic fields that interact with the cathode rays and cause them to traverse the display surface in a "raster" pattern, horizontally from left to right and vertically from top to bottom. As the cathode ray moves over the surface of the display, the current sent to the electron guns can be increased or decreased to change the brightness of the display at the cathode ray impact point.
[Figure: Cathode ray tube display system, showing the anode (entire screen), the deflection coils, the electron guns (red, blue, green) and the R, G, B signals (to guns)]
Information is only displayed when the beam is moving in the "forward" direction (left to right and top to bottom), and not during the time the beam is reset back to the left or top edge of the display. Much of the potential display time is therefore lost in "blanking" periods when the beam is reset and stabilized to begin a new horizontal or vertical display pass.
The size of the beams, the frequency at which the beam can be traced across the display, and the frequency at which the electron beam can be modulated determine the display resolution. Modern VGA displays can accommodate different resolutions, and a VGA controller circuit dictates the resolution by producing timing signals to control the raster patterns. The controller must produce synchronizing pulses at 3.3V (or 5V) to set the frequency at which current flows through the deflection coils, and it must ensure that video data is applied to the electron guns at the correct time. Raster video displays define a number of "rows" that corresponds to the number of horizontal passes the cathode makes over the display area, and a number of "columns" that corresponds to an area on each row that is assigned to one "picture element" or pixel. Typical displays use from 240 to 1200 rows, and from 320 to 1600 columns. The overall size of a display, and the number of rows and columns, determine the size of each pixel.
Video data typically comes from a video refresh memory, with one or more bytes assigned to each pixel location (the DIO4 board uses 3 bits per pixel). The controller must index into video memory as the beams move across the display, and retrieve and apply video data to the display at precisely the time the electron beam is moving across a given pixel.
A VGA controller circuit must generate the HS and VS timing signals and coordinate the delivery of video data based on the pixel clock. The pixel clock defines the time available to display 1 pixel of information. The VS signal defines the "refresh" frequency of the display, or the frequency at which all information on the display is redrawn.
[Figure: VGA display surface and horizontal timing, showing the horizontal deflection coil current (information is displayed during the display time) and the HS horizontal sync signal with its front porch, retrace and back porch]
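[Figure: sync signal timing, showing the display time (T_disp), front porch (T_fp), sync pulse width (T_pw) and back porch (T_bp)]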
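The following sketch shows the relationship between the pixel clock and the HS/VS periods. The porch and pulse widths are the widely used values for 640 by 480 at about 60 Hz with a 25 MHz pixel clock. They are assumptions for illustration, not figures taken from this document, and the `SyncTiming` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncTiming:
    display: int      # T_disp, in pixel clocks (horizontal) or lines (vertical)
    front_porch: int  # T_fp
    pulse_width: int  # T_pw
    back_porch: int   # T_bp

    @property
    def total(self) -> int:
        return self.display + self.front_porch + self.pulse_width + self.back_porch

    def in_sync_pulse(self, count: int) -> bool:
        """True while the counter is inside the sync pulse (polarity not modelled)."""
        start = self.display + self.front_porch
        return start <= count < start + self.pulse_width


PIXEL_CLOCK_HZ = 25_000_000      # assumed pixel clock
H = SyncTiming(640, 16, 96, 48)  # assumed horizontal timing, in pixel clocks
V = SyncTiming(480, 10, 2, 33)   # assumed vertical timing, in lines

line_period_s = H.total / PIXEL_CLOCK_HZ  # HS period
frame_period_s = V.total * line_period_s  # VS period
refresh_hz = 1.0 / frame_period_s
```

With these values, each line takes 800 pixel clocks (32 µs) and each frame takes 525 lines (16.8 ms). This gives a refresh rate of roughly 59.5 Hz.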
Expansion Connectors
Connector pinouts are shown below. Separately available tables show pass-through connections for the devices on the DIO4 board when it is attached to various system boards.
Appendix C
File formats
This appendix explains the format of various data files used within the MOODS synthesis environment. The first is the ICODE (Intermediate CODE) generated from the VHDL compiler. Two other data files are used within the multi-FPGA partitioning process. The first is the partitioning information (.par) file, which provides input information to the partitioning algorithm. The second is the module call list (.mcl) output file, generated by the MOODS synthesis tool, which lists the call structure in the module call graph.
C.1 ICODE
The ICODE file is a textual representation of the user's design generated by the source compiler. This input file to the MOODS synthesis system is a language-independent representation of the original source code, which allows translation from other high-level languages such as SystemC and Verilog. At present, the MOODS synthesis system only has a VHDL compiler, which converts a VHDL description into an equivalent ICODE representation.
The rest of this section provides a complete ICODE language grammar in Backus-Naur Form (BNF). Throughout this grammar, non-italicised entries refer either to other entries or to base entries. Italics are used to distinguish between different occurrences of a particular type of entry (e.g. label_name is a "name", width_number a "number"). The base entries used are:
real: floating-point number using the standard C++ formats for real numbers (including exponents).
actf_list ::=
ACTF act_list
alias_declaration ::=
component_declaration ::=
COMPONENT component_name io_list [ info ]
conditional_inst ::=
conditional_inst_name cond_var actt_list actf_list [ info ]
conditional_inst_name ::=
IF | IFNOT
T.B. Yee, 2007 Appendix C: File formats
constant ::=
number [ width_number ]
declaration ::=
io_port_declaration
| variable_declaration
declaration_part ::=
{ declaration [ info ] }
file_info ::=
ln decimal_integer
| pos decimal_integer
| file decimal_integer
filemap_info ::=
general_inst* ::=
(* General instructions are defined in the ICODE instruction database, ICInstDB, and may be enhanced as required.)
general_inst_name ::=
| SEQ | SNEQ | SLT | SLTE | SGT | SGTE
| USLL | USRL | USLA | USRA | UROL | UROR
| SSLL | SSRL | SSLA | SSRA | SROL | SROR
| UMINUS | UADD | USUB | UMUL
| UDIV | UMOD | UREM | UINC | UDEC
| SMINUS | SADD | SSUB | SMUL
| SDIV | SMOD | SREM | SABS | SINC | SDEC
index ::=
decimal_integer
info ::=
info_specification ::=
probability_info
| iteration_info
| filemap_info
| file_info
instruction ::=
general_inst
| memory_inst
| conditional_inst
| switch_inst
| protect_inst
| decode_inst
| moduleap_inst
instruction_part ::=
{ [ label_name ] instruction }
number
inport_declaration ::=
io_list ::=
term { term }
io_port_declaration ::=
inport_declaration | outport_declaration
iteration_info ::=
its decimal_integer
memory_data ::=
'[' constant { constant } ']'
memory_read_inst ::=
moduleap_inst ::=
name
string
outport_declaration ::=
probability_info ::=
pt | pf ':' real
program_declaration ::=
PROGRAM program_name io_list [ actt_list ] [ info ]
declaration_part
instruction_part
ENDMODULE [ program_name ] [ info ]
protect_instruction ::=
PROTECT real [ actt_list ]
ram_declaration ::=
RAM ram_var_name ADDRESS
range ::=
register_declaration ::=
REGISTER var_name var_range [ INIT constant ]
rom_declaration ::=
ROM rom_var_name data_range ADDRESS address_range DATA memory_data
submodule_declaration ::=
ENDCASE
term ::=
constant | var
var ::=
var_name
variable_declaration ::=
register_declaration
| alias_declaration
| ram_declaration
| rom_declaration
Notes:
Most instructions are defined in the ICODE database (ICInstDB), which also specifies the exact format of their parameter lists.
Note: uses the Windows ini file format
; PROCEDURE PROC1
proc2__1_4_4 =3
[Design_Profile]
TIME_STEP= 4
1 18= 1 1 4 1
18 25= 1 1 1 1
[Domain_Info]
DOMAIN= 4
dom_1= 500 20
dom_2= 400 20
dom_3= 200 50
dom_4= 200 30
Section names are enclosed in square brackets, and the items under each section are related to that section. Each subsequent line is broken into two parts: the key name and the key value(s). Multiple values for a key are separated by a space, and comments are introduced by a semicolon character. This input file provides various types of data to the K-way partitioner, and these are grouped under the different section headers listed below:
[Module lock] - Items under this section header are the module name (key name) and the domain number (key value) that the module is locked to during K-way partitioning.
[Pre-allocate] - Items under this section are similar to the ones mentioned above: the key names are module names in the design, but the key values under this section header are the initial domains that the modules are assigned to. This forms the starting partition of the K-way partitioning algorithm.
[Design Profile] - The items under this section header give the design activity profiling information. The first key name under this section header is TIME_STEP, and its key value gives the number of time steps in each profile data entry. The following lines are the profile data. The key names are the source-destination module node numbers, and the key values are the activation count values, with a space between each time step. The activation count value is the number of times the source module calls (or activates) the destination module. For example, Figure C-1 illustrates a design profile with 4 time steps: module 1 calls module 18 four times in time step 3 and only once in time steps 1, 2 and 4.
[Domain Info] - The domain info section contains information on the target devices available for the multi-FPGA system. The first key name under this section header is DOMAIN, and its key value gives the number of devices available. The next lines give the available area and I/O resources for each device. For device n, denoted by the key name dom_n, the first key value gives the area available and the second key value gives the I/O resources available.
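The .par layout described above is simple enough to read with a few lines of code. The following Python sketch is hypothetical and is not part of MOODS. It keeps section and key names as written, splits multiple key values on whitespace, strips ';' comments, and ignores lines that appear before the first section header.

```python
from typing import Dict, List, Tuple


def parse_par(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Parse an ini-style .par file into {section: {key name: [key values]}}."""
    sections: Dict[str, Dict[str, List[str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.split(";", 1)[0].strip()  # drop comments introduced by ';'
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1].strip(), {})
            continue
        if current is None or "=" not in line:
            continue  # no section yet, or not a key/value line
        key, _, value = line.partition("=")
        current[" ".join(key.split())] = value.split()  # "1 18" stays one key
    return sections


def activation_counts(profile: Dict[str, List[str]]) -> Dict[Tuple[int, int], List[int]]:
    """Turn Design_Profile entries into {(source, destination): [count per time step]}."""
    counts = {}
    for key, values in profile.items():
        if key == "TIME_STEP":
            continue
        src, dst = (int(n) for n in key.split())
        counts[(src, dst)] = [int(v) for v in values]
    return counts
```

With the example profile given above, the key `1 18` maps to the counts [1, 1, 4, 1], that is, four calls in time step 3.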
simulating a design to obtain the design activity information using the ModelSim simulation package.
[Figure: example module call list file (c:\CAD\Projects\m_call2\m_call2.mcl), headed "MODULE CALL LIST"]
Appendix D
This appendix gives some background on the example VHDL designs and an idea of their complexity and implementation methods. Post-MOODS synthesis simulation results of the multi-FPGA implementations are included for all the example designs.
The quadratic equation solver design solves quadratic equations using the formula of Equation D.1. The 32-bit, fixed-point quadratic equation solver example given in Figure D-3 uses the integer-maths library given in Figure D-1 and the quadratic procedure in the VHDL package given in Figure D-2.
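The standard form of the quadratic formula is given below for reference. It is assumed here to match Equation D.1, because the quadratici procedure in Figure D-2 computes the discriminant $d = b^2 - 4ac$ and the divisor $2a$:

$$x_{1,2} = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$$

The test inputs used later in this appendix are $a = 1$, $b = -25$ and $c = 150$. For these, $d = 625 - 600 = 25$, so $x_1 = (25 + 5)/2 = 15$ and $x_2 = (25 - 5)/2 = 10$.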
-- ************ Integer-maths library package ************
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
package c_types is
-- C-style integer and unsigned types
subtype int is signed(31 downto 0);
subtype uint is unsigned(31 downto 0);
end c_types;
use work.c_types.all;
package imath is
-- simple constants
constant neg: boolean := false;
constant pos: boolean := true;
return best;
end cbrti;
T.B. Yee, 2007 Appendix D: VHDL code listings
while(mask /= 0) loop
if (((best+mask)*(best+mask)) <= a) then
best := best or mask;
end if;
mask := mask srl 1;
end loop;
return best;
end sqrti;
if (a_int<acos_x1) then
x0 := acos_x0;
x1 := acos_x1;
y0 := acos_y0;
y1 := acos_y1;
elsif (a_int<acos_x2) then
x0 := acos_x1;
x1 := acos_x2;
y0 := acos_y1;
y1 := acos_y2;
elsif (a_int<acos_x3) then
x0 := acos_x2;
x1 := acos_x3;
y0 := acos_y2;
y1 := acos_y3;
elsif (a_int<acos_x4) then
x0 := acos_x3;
x1 := acos_x4;
y0 := acos_y3;
y1 := acos_y4;
else
x0 := acos_x4;
x1 := acos_x5;
y0 := acos_y4;
y1 := acos_y5;
end if;
y0b := shift_left(y0,8);
y1b := shift_left(y1,8);
result := shift_right(y0b + multi(sdivi(y1b-y0b,x1-x0),(a_int-x0)),8);
if (sb=neg) then
result := X"0003242F" - result;
end if;
return result;
end acosi;
if (a_int<spi_2) then
sb := pos;
elsif (a_int<spi) then
a_int := spi - a_int;
sb := neg;
elsif (a_int<s3pi_2) then
a_int := a_int - spi;
sb := neg;
else
a_int := s2pi - a_int;
sb := pos;
end if;
else
x0 := cos_x4;
x1 := cos_x5;
y0 := cos_y4;
y1 := cos_y5;
end if;
y0b := shift_left(y0,8);
y1b := shift_left(y1,8);
result := shift_right(y0b + multi(sdivi(y1b-y0b,x1-x0),(a_int-x0)),8);
if (sb=neg) then
result := -result;
end if;
return result;
end cosi;
if (sa=pos) then
ua := a;
else ua := -a;
end if;
if (sb=pos) then
ub := b;
else ub := -b;
end if;
temp := signed(udivi(unsigned(ua),unsigned(ub)));
if (sa=sb) then
return temp;
else return -temp;
end if;
end sdivi;
-- sign test
function sign
-- moods inline
(x: in int
) return boolean is
begin
return not to_bool(x(31));
end sign;
-- to_bool conversion
function to_bool
-- moods map move u%1 u%1
( a : in std_logic
) return boolean is
begin
if (a='1') then return true;
else return false;
end if;
end to_bool;
function sqi
( a: in int
) return int is
variable rl: signed(63 downto 0);
begin
rl := a*a;
return rl(31 downto 0);
end sqi;
function cbi
( a: in int
) return int is
variable rl: signed(95 downto 0);
begin
rl := a*a*a;
return rl(31 downto 0);
end cbi;
function multi
( a,b: in int
) return int is
variable rl: signed(63 downto 0);
begin
rl := a * b;
return rl(31 downto 0);
end multi;
function sqi
( a : in uint
) return uint is
variable rl: unsigned(63 downto 0);
begin
rl := a*a;
return rl(31 downto 0);
end sqi;
function cbi
( a : in uint
) return uint is
variable rl: unsigned(95 downto 0);
begin
rl := a*a*a;
return rl(31 downto 0);
end cbi;
function multi
( a,b: in uint
) return uint is
variable rl: unsigned(63 downto 0);
begin
rl := a * b;
return rl(31 downto 0);
end multi;
end imath;
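The sqrti loop shown above builds the root one bit at a time, from the most significant bit down. A bit is kept only if the square stays within the argument. The following Python model sketches the same idea. The starting mask (bit 15, enough for the root of any 32-bit unsigned value) is an assumption, because the initialisation of mask is not visible in the listing.

```python
def sqrti(a: int, width: int = 32) -> int:
    """Bit-by-bit integer square root, mirroring the sqrti loop in the imath package."""
    if a < 0:
        raise ValueError("sqrti expects a non-negative argument")
    best = 0
    mask = 1 << (width // 2 - 1)  # assumed starting mask: top bit of a 16-bit result
    while mask != 0:
        trial = best + mask       # equals best | mask, since lower bits of best are 0
        if trial * trial <= a:
            best |= mask          # keep this bit of the root
        mask >>= 1                # like "mask := mask srl 1"
    return best
```

Each iteration costs one multiplication and one comparison, so a 16-bit result always takes 16 iterations regardless of the input. This fixed latency suits a hardware loop.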
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.c_types.all;
use work.imath.all;
package algeqn_package is
procedure quadratici(a,b,c: in int; x1,x2: out int; no_real: out int);
procedure cubici(a1,a2,a3: in int; x1,x2,x3: out int; no_real: out int);
end algeqn_package;
package body algeqn_package is
procedure quadratici
-- moods inline
(
a,b,c: in int;
x1,x2: out int;
no_real: out int
) is
variable d, rd, a2 : int;
begin
d := sqi(b) - multi(multi(to_int(4),a),c);
a2 := multi(a,to_int(2));
procedure cubici
-- moods inline
(
a1,a2,a3: in int;
x1,x2,x3: out int;
no_real: out int
) is
variable q,r,q3,d,s,a1_3,srd,t_1,t_2,theta3,t1,t2: int;
begin
t_1 := multi(to_int(3),a2) - sqi(a1);
q := sdivi(t_1,to_int(9));
t_2 := multi(multi(to_int(9),a1),a2) - multi(to_int(27),a3) - multi(to_int(2),cbi(a1));
r := sdivi(t_2,to_int(54));
q3 := cbi(q);
d := q3 + sqi(r);
if (d=0) then
s := cbrti(r);
a1_3 := sdivi(a1,to_int(3));
x1 := shift_left(s,1) - a1_3;
t1 := -s - a1_3;
x2 := t1;
x3 := t1;
no_real := to_int(3);
elsif (d>0) then
srd := sqrti(d);
s := cbrti(r+srd);
t1 := cbrti(r-srd);
x1 := s+t1-sdivi(a1,to_int(3));
no_real := to_int(1);
else
theta3 := sdivi(acosi(sdivi(shift_left(r,16),sqrti(-q3))),to_int(3));
t1 := sdivi(a1,to_int(3));
t2 := shift_left(sqrti(-q),1);
x1 := shift_right(multi(t2,cosi(theta3)),16)-t1;
x2 := shift_right(multi(t2,cosi(theta3+X"00021828")),16)-t1;
x3 := shift_right(multi(t2,cosi(theta3+X"00043050")),16)-t1;
no_real := to_int(3);
end if;
end cubici;
end algeqn_package;
-- ****************************************
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.c_types.all;
use work.algeqn_package.all;
entity eq_solver is
port(
a1,a2,a3: in int;
x1,x2: out int;
no_real: out int
);
end eq_solver;
architecture behaviour of eq_solver is
begin
process is
variable b1: int := X"00000000";
variable b2: int := X"00000000";
variable b3: int := X"00000000";
variable y1: int := X"00000000";
variable y2: int := X"00000000";
variable vreal: int := X"00000000";
begin
b1 := a1;
b2 := a2;
b3 := a3;
quadratici(b1,b2,b3,y1,y2,vreal);
x1 <= y1;
x2 <= y2;
no_real <= vreal;
wait for 40 ns;
end process;
end behaviour;
Figure D-4 shows the post-MOODS synthesis simulation of the non-pipelined multi-FPGA quadratic equation solver. This two-device implementation has a single subprogram communication channel (SpC 1). Integer inputs a1, a2 and a3 of the quadratic equation solver are given the values 1, -25 and 150 respectively. Outputs x1, x2 and the number of real roots (no_real) are updated after 9100 ns. With a system clock period of 40 ns, the non-pipelined multi-FPGA quadratic equation solver takes 224 clock cycles (i.e. clock cycles = (9100 ns - 140 ns) / 40 ns) to complete the application and output the result.
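The following Python reference model of the integer quadratic solver makes the arithmetic of this test case easy to check. It is a hypothetical sketch, not the MOODS output, and rests on these assumptions:

- sdivi truncates toward zero, as the VHDL sdivi does by dividing magnitudes and then fixing the sign.
- 32-bit wrap-around is ignored.
- The real-root count convention (2, 1 or 0) is assumed, since the end of quadratici is not shown.

The `clock_cycles` helper reproduces the cycle-count formula used in the text.

```python
import math
from typing import Tuple


def sdivi(a: int, b: int) -> int:
    """Signed division truncating toward zero (magnitude divide, then apply sign)."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def quadratici(a: int, b: int, c: int) -> Tuple[int, int, int]:
    """Return (x1, x2, number of real roots) using integer arithmetic only."""
    d = b * b - 4 * a * c  # discriminant, as computed by d := sqi(b) - 4ac
    if d < 0:
        return 0, 0, 0     # assumed convention: no real roots
    rd = math.isqrt(d)     # stands in for the bitwise sqrti
    a2 = 2 * a
    x1 = sdivi(-b + rd, a2)
    x2 = sdivi(-b - rd, a2)
    return x1, x2, (1 if d == 0 else 2)


def clock_cycles(done_ns: int, offset_ns: int, period_ns: int = 40) -> int:
    """Cycle count as used in the text: (completion time - offset) / clock period."""
    return (done_ns - offset_ns) // period_ns
```

`quadratici(1, -25, 150)` returns (15, 10, 2) and `clock_cycles(9100, 140)` returns 224, matching the figures quoted above.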
The 32-bit, fixed-point cubic equation solver example given in Figure D-5 is capable of finding real solutions to Equation D.2. It uses the integer-maths library given in Figure D-1 and the cubic procedure in the VHDL package given in Figure D-2.
$$x^3 + ax^2 + bx + c = 0 \qquad \text{(D.2)}$$
-- ****************************************
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.c_types.all;
use work.algeqn_package.all;
entity eq_solver is
port(
a1,a2,a3: in int;
x1,x2,x3: out int;
no_real: out int);
end eq_solver;
architecture behaviour of eq_solver is
begin
process is
variable b1,b2,b3,y1,y2,y3: int;
variable vreal: int;
begin
b1 := a1;
b2 := a2;
b3 := a3;
cubici(b1,b2,b3,y1,y2,y3,vreal);
x1 <= y1;
x2 <= y2;
x3 <= y3;
no_real <= vreal;
wait for 40 ns;
end process;
end behaviour;
Figure D-6 shows the post-MOODS synthesis simulation of the non-pipelined multi-FPGA cubic equation solver. This two-device implementation has two subprogram communication channels (SpC 1 and SpC 2), and arbitration of these two shared communication channels is provided by two SpC arbiters. Integer inputs a1, a2 and a3 of the cubic equation solver are given the values -20, -100 and 2000 respectively. Outputs x1, x2, x3 and the number of real roots (no_real) are updated after 70900 ns. With a system clock period of 40 ns, the non-pipelined multi-FPGA cubic equation solver takes 1770 clock cycles (i.e. clock cycles = (70900 ns - 100 ns) / 40 ns) to complete the application.
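The three branches of cubici follow the classical Cardano and trigonometric solution of a monic cubic. The following floating-point Python sketch mirrors those branches so that the fixed-point results can be checked. It assumes that the cubic is monic with coefficients a1, a2 and a3, as the port names suggest. It does not model the 16.16 fixed-point scaling or the table-based acosi and cosi of the VHDL.

```python
import math
from typing import List


def cbrt(x: float) -> float:
    """Real cube root that also handles negative arguments."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)


def cubic_roots(a1: float, a2: float, a3: float) -> List[float]:
    """Real roots of x^3 + a1*x^2 + a2*x + a3 = 0, following the cubici branches."""
    q = (3 * a2 - a1 * a1) / 9
    r = (9 * a1 * a2 - 27 * a3 - 2 * a1 ** 3) / 54
    d = q ** 3 + r * r
    shift = a1 / 3
    if d > 0:  # one real root
        srd = math.sqrt(d)
        return [cbrt(r + srd) + cbrt(r - srd) - shift]
    if d == 0:  # repeated roots
        s = cbrt(r)
        return [2 * s - shift, -s - shift, -s - shift]
    theta3 = math.acos(r / math.sqrt(-q ** 3)) / 3  # three real roots
    t2 = 2 * math.sqrt(-q)
    return [t2 * math.cos(theta3 + k * 2 * math.pi / 3) - shift for k in range(3)]
```

For the simulation inputs a1 = -20, a2 = -100 and a3 = 2000, the cubic factors as (x - 20)(x - 10)(x + 10). `cubic_roots` returns values within rounding error of 20, -10 and 10.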
Figure D-6 Simulation of the non-pipelined multi-FPGA cubic equation solver
The 2-D IDCT architecture is adapted from [142]. The architecture is made up of a one-dimensional 8-point IDCT, followed by an internal double buffer memory, followed by another one-dimensional 8-point IDCT. The algorithm used for the calculation of the 2-D IDCT is based on Equation (D.3).
Equation (D.3) can be separated into the row part and column part as shown in equations (D.4) and (D.5). The 2-D IDCT is computed by first applying a 1-D IDCT on the rows and then on the columns.
where $K = 1/\sqrt{2}$ for $col = 0$, and $K = 1$ for $col \neq 0$.
The 2-D IDCT behavioural VHDL example is given in Figure D-8 and it uses the VHDL
package in Figure D-7.
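The constant multipliers in idct1_mult_add and idct2_mult_add appear to be the 8-point IDCT basis values scaled by 128 and rounded. For example, 91 ≈ 128/√2 for the DC term and 126 ≈ 128 cos(π/16). The following Python sketch regenerates one row of multipliers for a given output index and applies it. The scale factor of 128 and the names `idct_row_weights` and `idct_1d_point` are inferences made for illustration, not statements from the thesis.

```python
import math
from typing import List

SCALE = 128  # inferred fixed-point scale of the coefficient table


def idct_row_weights(x: int) -> List[int]:
    """Integer weights K*cos((2x+1)*u*pi/16)*SCALE for u = 0..7 (K = 1/sqrt(2) for u = 0)."""
    weights = []
    for u in range(8):
        k = 1 / math.sqrt(2) if u == 0 else 1.0
        weights.append(round(SCALE * k * math.cos((2 * x + 1) * u * math.pi / 16)))
    return weights


def idct_1d_point(coeffs: List[int], x: int) -> int:
    """One output of the 8-point IDCT, as the sum p1_tmp + ... + p8_tmp (unscaled)."""
    return sum(w * c for w, c in zip(idct_row_weights(x), coeffs))
```

Row index x selects the same multiplier set as the case index in the listing, so the "001" branch corresponds to `idct_row_weights(1)`. The 2-D transform applies this 1-D step to the rows and then to the columns, as described above.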
-- ************ VHDL package for 2-D inverse discrete cosine transform ************
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
package idct_package is
procedure idct1_mult_add (
signal index : in unsigned(2 downto 0);
signal in1a : in signed(11 downto 0);
signal in2a : in signed(11 downto 0);
signal in3a : in signed(11 downto 0);
signal in4a : in signed(11 downto 0);
signal in5a : in signed(11 downto 0);
signal in6a : in signed(11 downto 0);
signal in7a : in signed(11 downto 0);
signal in8a : in signed(11 downto 0);
result_a : out signed(21 downto 0));
procedure idct2_mult_add (
signal index : in unsigned(2 downto 0);
signal in1b : in signed(10 downto 0);
signal in2b : in signed(10 downto 0);
signal in3b : in signed(10 downto 0);
signal in4b : in signed(10 downto 0);
signal in5b : in signed(10 downto 0);
signal in6b : in signed(10 downto 0);
signal in7b : in signed(10 downto 0);
signal in8b : in signed(10 downto 0);
result_b : out signed(20 downto 0));
end idct_package;
package body idct_package is
procedure idct1_mult_add
(
signal index : in unsigned(2 downto 0);
signal in1a : in signed(11 downto 0);
signal in2a : in signed(11 downto 0);
signal in3a : in signed(11 downto 0);
signal in4a : in signed(11 downto 0);
signal in5a : in signed(11 downto 0);
signal in6a : in signed(11 downto 0);
signal in7a : in signed(11 downto 0);
signal in8a : in signed(11 downto 0);
result_a : out signed(21 downto 0)
) is
variable p1_tmp,p2_tmp,p3_tmp,p4_tmp,p5_tmp,p6_tmp,p7_tmp,p8_tmp : signed(21 downto 0);
begin
p1_tmp := resize(signed(in1a * (91)), 22);
case index is
when "000" =>
p2_tmp := resize(signed(in2a * (126)), 22);
p3_tmp := resize(signed(in3a * (118)), 22);
p4_tmp := resize(signed(in4a * (106)), 22);
p5_tmp := resize(signed(in5a * (91)), 22);
p6_tmp := resize(signed(in6a * (71)), 22);
p7_tmp := resize(signed(in7a * (49)), 22);
p8_tmp := resize(signed(in8a * (25)), 22);
procedure idct2_mult_add
(
signal index : in unsigned(2 downto 0);
signal in1b : in signed(10 downto 0);
signal in2b : in signed(10 downto 0);
signal in3b : in signed(10 downto 0);
signal in4b : in signed(10 downto 0);
signal in5b : in signed(10 downto 0);
signal in6b : in signed(10 downto 0);
signal in7b : in signed(10 downto 0);
signal in8b : in signed(10 downto 0);
result_b : out signed(20 downto 0)
) is
variable p1_tmp,p2_tmp,p3_tmp,p4_tmp,p5_tmp,p6_tmp,p7_tmp,p8_tmp : signed(20 downto 0);
begin
p1_tmp := resize(signed(in1b * (91)), 21);
case index is
when "000" =>
p2_tmp := resize(signed(in2b * (126)), 21);
p3_tmp := resize(signed(in3b * (118)), 21);
p4_tmp := resize(signed(in4b * (106)), 21);
p5_tmp := resize(signed(in5b * (91)), 21);
p6_tmp := resize(signed(in6b * (71)), 21);
p7_tmp := resize(signed(in7b * (49)), 21);
p8_tmp := resize(signed(in8b * (25)), 21);
when "001" =>
p2_tmp := resize(signed(in2b * (106)), 21);
p3_tmp := resize(signed(in3b * (49)), 21);
p4_tmp := resize(signed(in4b * (-25)), 21);
p5_tmp := resize(signed(in5b * (-91)), 21);
p6_tmp := resize(signed(in6b * (-126)), 21);
p7_tmp := resize(signed(in7b * (-118)), 21);
p8_tmp := resize(signed(in8b * (-71)), 21);
when "010" =>
p2_tmp := resize(signed(in2b * (71)), 21);
p3_tmp := resize(signed(in3b * (-49)), 21);
p4_tmp := resize(signed(in4b * (-126)), 21);
p5_tmp := resize(signed(in5b * (-91)), 21);
p6_tmp := resize(signed(in6b * (25)), 21);
p7_tmp := resize(signed(in7b * (118)), 21);
p8_tmp := resize(signed(in8b * (106)), 21);
when "011" =>
p2_tmp := resize(signed(in2b * (25)), 21);
p3_tmp := resize(signed(in3b * (-118)), 21);
p4_tmp := resize(signed(in4b * (-71)), 21);
p5_tmp := resize(signed(in5b * (91)), 21);
p6_tmp := resize(signed(in6b * (106)), 21);
p7_tmp := resize(signed(in7b * (-49)), 21);
p8_tmp := resize(signed(in8b * (-126)), 21);
when "100" =>
p2_tmp := resize(signed(in2b * (-25)), 21);
p3_tmp := resize(signed(in3b * (-118)), 21);
p4_tmp := resize(signed(in4b * (71)), 21);
p5_tmp := resize(signed(in5b * (91)), 21);
p6_tmp := resize(signed(in6b * (-106)), 21);
p7_tmp := resize(signed(in7b * (-49)), 21);
p8_tmp := resize(signed(in8b * (126)), 21);
when "101" =>
p2_tmp := resize(signed(in2b * (-71)), 21);
p3_tmp := resize(signed(in3b * (-49)), 21);
p4_tmp := resize(signed(in4b * (126)), 21);
p5_tmp := resize(signed(in5b * (-91)), 21);
p6_tmp := resize(signed(in6b * (-25)), 21);
p7_tmp := resize(signed(in7b * (118)), 21);
p8_tmp := resize(signed(in8b * (-106)), 21);
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use work.idct_package.all;
entity idct is
port(
in_hs_rdy: in unsigned(0 downto 0); -- Handshake ready
in_hs_rcv: buffer unsigned(0 downto 0) := "0"; -- Handshake receive
dct_2d_in: in signed(11 downto 0);
idct_out: out signed(7 downto 0) := (others=>'0'); -- 8-bit output
out_hs_rdy: buffer unsigned(0 downto 0) := "0"; -- Handshake ready
out_hs_rcv: in unsigned(0 downto 0); -- Handshake receive
sys_clock: in unsigned(0 downto 0);
-- moods clock
sys_reset: in unsigned(0 downto 0)
-- moods reset
);
end idct;
ARCHITECTURE behaviour of idct is
-- IDCT_1 signals
signal xa0_reg, xa1_reg, xa2_reg, xa3_reg,
xa4_reg, xa5_reg, xa6_reg, xa7_reg: signed(11 downto 0) := (others=>'0');
-- IDCT_2 signals
signal xb0_reg, xb1_reg, xb2_reg, xb3_reg,
xb4_reg, xb5_reg, xb6_reg, xb7_reg: signed(10 downto 0) := (others=>'0');
-- memory section
type RAM_mem_type is array (0 to 63) of signed(10 downto 0);
signal ID_ram1_mem: RAM_mem_type;
-- moods ram
signal ID_input_cnt: unsigned(3 downto 0) := "0000";
signal ID_wr_cntr: unsigned(6 downto 0) := "0000000";
signal ID_rd_cntr: unsigned(3 downto 0) := "0000";
-- Handshake signals
signal ID_stage2_rdy: unsigned(0 downto 0) := "0";
signal ID_stage2_rcv: unsigned(0 downto 0) := "0";
signal IDJndexJ : unsigned(3 downto 0) := "0000";
signal IDJndexJ : unsigned(3 downto 0) := "0000";
begin
-- ****************
-- Semaphore Master
while(ID_stage2_rdy /= ID_stage2_rcv) loop
wait until sys_clock'event and sys_clock = "1";
end loop;
end behaviour;
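The Semaphore Master loop above, like the input loop of the triple-DES core in Figure D-12, uses a toggle (two-phase) handshake:

- The sender flips its ready bit to announce data.
- The receiver flips its receive bit once the data has been taken.
- The channel is idle whenever the two bits are equal.

The following minimal Python model sketches that protocol. The `ToggleChannel` class and the `transfer` helper are hypothetical, and the model ignores clocking.

```python
from typing import List, Optional


class ToggleChannel:
    """Two-phase handshake: idle when rdy == rcv, data pending when they differ."""

    def __init__(self) -> None:
        self.rdy = 0  # toggled by the sender
        self.rcv = 0  # toggled by the receiver
        self.data: Optional[int] = None

    def can_send(self) -> bool:
        return self.rdy == self.rcv  # previous word already acknowledged

    def send(self, word: int) -> None:
        assert self.can_send(), "sender must wait while rdy /= rcv"
        self.data = word
        self.rdy ^= 1  # announce new data

    def has_data(self) -> bool:
        return self.rdy != self.rcv  # receiver waits while rdy = rcv

    def receive(self) -> int:
        assert self.has_data(), "receiver must wait while rdy = rcv"
        word = self.data
        self.rcv ^= 1  # acknowledge
        return word


def transfer(words: List[int]) -> List[int]:
    """Alternate sender and receiver steps until every word has crossed the channel."""
    channel, received, pending = ToggleChannel(), [], list(words)
    while pending or channel.has_data():
        if pending and channel.can_send():
            channel.send(pending.pop(0))
        if channel.has_data():
            received.append(channel.receive())
    return received
```

Because every transition carries meaning, the scheme needs no return-to-zero phase, which is a common reason for choosing it on links between separately clocked parts.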
The triple-data encryption standard core implements the triple data encryption algorithm
(TDEA) in the electronic codebook (ECB) mode [144]. The idea of triple DES is that data
is encrypted three times (i.e. encrypted, decrypted and then encrypted again) using two
different keys. In this case, the two encryptions use the first key and the decryption uses
the second key. The VHDL package of the triple-DES is given in Figure D-11 and the
behavioural VHDL of the triple-data encryption standard (triple-DES) core is given in
Figure D-12.
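The encrypt-decrypt-encrypt (EDE) ordering with two keys can be written as a composition of single-DES operations. The following Python sketch shows only that composition. `toy_encrypt` and `toy_decrypt` are hypothetical stand-ins for single DES. They are NOT DES and offer no security; they exist only so that the structure can be exercised.

```python
from typing import Callable

Cipher = Callable[[int, int], int]  # (block, key) -> block
MASK64 = (1 << 64) - 1


def ede2_encrypt(block: int, k1: int, k2: int, enc: Cipher, dec: Cipher) -> int:
    """Encrypt with key 1, decrypt with key 2, encrypt with key 1 again."""
    return enc(dec(enc(block, k1), k2), k1)


def ede2_decrypt(block: int, k1: int, k2: int, enc: Cipher, dec: Cipher) -> int:
    """Inverse of ede2_encrypt: decrypt with k1, encrypt with k2, decrypt with k1."""
    return dec(enc(dec(block, k1), k2), k1)


def toy_encrypt(block: int, key: int) -> int:
    """Placeholder invertible 64-bit step (XOR with key, rotate left 13); not DES."""
    x = (block ^ key) & MASK64
    return ((x << 13) | (x >> 51)) & MASK64


def toy_decrypt(block: int, key: int) -> int:
    """Inverse of toy_encrypt (rotate right 13, XOR with key); not DES."""
    x = ((block >> 13) | (block << 51)) & MASK64
    return x ^ key
```

With equal keys the middle decryption cancels the first encryption, so the triple operation collapses to a single encryption. This is why EDE with equal keys is backward compatible with single DES.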
-- ****************** VHDL package for Triple-DES ******************
library ieee;
use ieee.std_logic_1164.all;
package des_functions is
subtype vec56 is std_logic_vector(1 to 56);
subtype vec64 is std_logic_vector(1 to 64);
-- The key_reduce function reduces a 64-bit key to a 56-bit key by stripping off parity bits
function key_reduce1(key : in vec64) return vec56;
function key_reduce2(key : in vec64) return vec56;
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
package body des_functions is
subtype vec3 is std_logic_vector(1 to 3);
subtype vec4 is std_logic_vector(1 to 4);
subtype vec6 is std_logic_vector(1 to 6);
subtype vec28 is std_logic_vector(1 to 28);
subtype vec32 is std_logic_vector(1 to 32);
subtype vec48 is std_logic_vector(1 to 48);
std_logic_vector(to_unsigned(S_block4(to_integer(unsigned(data(25)&data(30)&data(26 to 29)))),4))&
std_logic_vector(to_unsigned(S_block5(to_integer(unsigned(data(31)&data(36)&data(32 to 35)))),4))&
std_logic_vector(to_unsigned(S_block6(to_integer(unsigned(data(37)&data(42)&data(38 to 41)))),4))&
std_logic_vector(to_unsigned(S_block7(to_integer(unsigned(data(43)&data(48)&data(44 to 47)))),4));
end;
function key_rotate(key : vec56; round : natural range 0 to 15; encrypt : std_logic) return vec56 is
type distance_type is array (natural range 0 to 31) of integer range 0 to 31;
constant shift_distance : distance_type :=
-- moods ROM
(0, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1,
27, 27, 26, 26, 26, 26, 26, 26, 27, 26, 26, 26, 26, 26, 26, 27);
variable distance : natural range 0 to 31;
begin
distance := shift_distance(to_integer(unsigned(encrypt & to_unsigned(round,4))));
return vec28(unsigned(key(1 to 28)) ror distance) & vec28(unsigned(key(29 to 56)) ror distance);
end;
-- ****************** Triple-DES ******************
library ieee;
use ieee.std_logic_1164.all;
use work.des_functions.all;
entity tdes_ede2 is
port(
plaintext: in std_logic_vector(1 to 32); -- now uses 32-bit input
keys: in std_logic_vector(1 to 32); -- now uses 32-bit key (4 x 32 bits = 128-bit key)
in_hs_rdy: in std_logic_vector(0 downto 0);
in_hs_rcv: buffer std_logic_vector(0 downto 0) := "0";
encrypt: in std_logic;
out_hs_rdy: buffer std_logic_vector(0 downto 0) := "0";
out_hs_rcv: in std_logic_vector(0 downto 0);
ciphertext: out std_logic_vector(1 to 32); -- now uses 32-bit output
sys_reset: in std_logic;
-- moods reset
sys_clock: in std_logic
-- moods clock
);
end;
architecture behaviour of tdes_ede2 is
begin
process
variable data, key1, key2 : vec64;
variable key : vec56;
variable mode : std_logic;
begin
reset_loop: loop
in_hs_rcv <= "0";
out_hs_rdy <= "0";
wait until sys_clock'event and sys_clock = '1';
exit reset_loop when sys_reset = '1';
main_loop: loop
for in_cnt in 0 to 3 loop
while(in_hs_rdy = in_hs_rcv) loop
wait until sys_clock'event and sys_clock = '1';
end loop;
case in_cnt is
when 0 =>
data(33 to 64) := plaintext(1 to 32);
key1(33 to 64) := keys(1 to 32);
when 1 =>
data(1 to 32) := plaintext(1 to 32);
key1(1 to 32) := keys(1 to 32);
when 2 =>
key2(33 to 64) := keys(1 to 32);
when 3 =>
key2(1 to 32) := keys(1 to 32);
when others => NULL;
end case;
in_hs_rcv <= not in_hs_rcv;
wait until sys_clock'event and sys_clock = '1';
end loop;
case out_cnt is
when 0 => ciphertext(1 to 32) <= data(1 to 32);
when others => ciphertext(1 to 32) <= data(33 to 64);
end case;
out_hs_rdy <= not out_hs_rdy;
wait until sys_clock'event and sys_clock = '1';
end loop;
wait until sys_clock'event and sys_clock = '1';
exit reset_loop when sys_reset = '1';
end loop;
end loop;
end process;
end;
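The shift_distance ROM in key_rotate encodes the DES key schedule as right rotations of the two 28-bit key halves. With encrypt = '1', the entries 27 and 26 are equivalent to left rotations by 1 and 2. With encrypt = '0', the small right rotations walk the schedule backwards for decryption. The following Python sketch models this. It assumes that the caller feeds each rotated key back in (key := key_rotate(key, round, encrypt)) on every round. The helper names `ror28` and `round_keys` are hypothetical.

```python
from typing import List, Tuple

SHIFT_DISTANCE = (0, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1,
                  27, 27, 26, 26, 26, 26, 26, 26, 27, 26, 26, 26, 26, 26, 26, 27)
MASK28 = (1 << 28) - 1


def ror28(x: int, d: int) -> int:
    """Rotate a 28-bit value right by d places (0 <= d < 28)."""
    return ((x >> d) | (x << (28 - d))) & MASK28


def key_rotate(c: int, d: int, rnd: int, encrypt: bool) -> Tuple[int, int]:
    """Rotate both halves by the ROM entry addressed by encrypt & round."""
    dist = SHIFT_DISTANCE[(16 if encrypt else 0) + rnd]
    return ror28(c, dist), ror28(d, dist)


def round_keys(c: int, d: int, encrypt: bool) -> List[Tuple[int, int]]:
    """Key halves used in rounds 0..15, rotating before each round (assumed usage)."""
    keys = []
    for rnd in range(16):
        c, d = key_rotate(c, d, rnd, encrypt)
        keys.append((c, d))
    return keys
```

Over the 16 encryption rounds the halves rotate left by a total of 28 places and return to their starting value. The decryption sequence produces the same round keys in reverse order.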
The 256-bit advanced encryption standard (AES) [146] implements the Rijndael algorithm
that processes data blocks of 128 bits using a 256-bit cipher key. The behavioural VHDL
of the 256-bit AES example is given in Figure D-16 and it uses the VHDL package in
Figure D-15.
-- ****************** VHDL package for 256-bit AES ******************
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
package aes_procedures is
subtype u_sign8 is unsigned(1 to 8);
subtype u_sign16 is unsigned(1 to 16);
subtype u_sign32 is unsigned(1 to 32);
subtype u_sign64 is unsigned(1 to 64);
subtype u_sign128 is unsigned(1 to 128);
type rom_tab_1 is array(0 to 255) of u_sign8;
type rom_tab_2 is array(0 to 29) of u_sign8;
type rom_tab_5 is array(0 to 255) of u_sign32;
type rom_tab_7 is array(0 to 255) of integer;
type tab_4 is array(0 to 3) of u_sign8;
type tab_a8 is array(0 to 7) of u_sign32;
type tab_a6 is array(0 to 5) of u_sign32;
type tab_a4 is array(0 to 3) of u_sign32;
type tab_90 is array(0 to 89) of u_sign32;
type tab_44 is array(0 to 43) of u_sign32;
type tab_64 is array(0 to 63) of u_sign32;
procedure rco(
a: in unsigned(4 downto 0);
a_out: out u_sign32 );
end aes_procedures;
package body aes_procedures is
procedure r_oneto24(
a: in u_sign32;
q_out: out u_sign32
) is
begin
q_out := a(25 to 32) & a(1 to 24);
end r_oneto24;
procedure r_oneto16(
a: in u_sign32;
q_out: out u_sign32
) is
begin
q_out := a(17 to 32) & a(1 to 16);
end r_oneto16;
procedure r_oneto8(
-- moods inline
a: in u_sign32;
q_out: out u_sign32
) is
begin
q_out := a(9 to 32) & a(1 to 8);
end r_oneto8;
procedure rco (
-- moods inline
a: in unsigned(4 downto 0);
a_out: out u_sign32
) is
constant rcotab: rom_tab_2 :=
-- moods rom
("00000001", "00000010", "00000100", "00001000", "00010000",
"00100000", "01000000", "10000000", "00011011", "00110110",
"01101100", "11011000", "10101011", "01001101", "10011010",
"00101111", "01011110", "10111100", "01100011", "11000110",
"10010111", "00110101", "01101010", "11010100", "10110011",
"01111101", "11111010", "11101111", "11000101", "10010001");
begin
a_out := rcotab(to_integer(a)) & "000000000000000000000000";
end rco;
end aes_procedures;
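The rotation procedures and rco can be cross-checked in software. In an unsigned(1 to 32) word element 1 is the most significant bit, so a(9 to 32) & a(1 to 8) moves the top byte to the bottom, i.e. a left rotation by 8 bits (RotWord). The Python sketch below (function names mirror the VHDL; the helpers rotl32, xtime and rcon_table are illustrative) regenerates the 30 entries of rcotab as successive powers of x in GF(2^8).

```python
MASK32 = 0xFFFFFFFF

def rotl32(word, n):
    """Rotate a 32-bit word left by n bits."""
    n %= 32
    return ((word << n) | (word >> (32 - n))) & MASK32

# a(9 to 32) & a(1 to 8): top byte moves to the bottom (rotate left by 8).
def r_oneto8(a):
    return rotl32(a, 8)

def r_oneto16(a):
    return rotl32(a, 16)

def r_oneto24(a):
    return rotl32(a, 24)

def xtime(b):
    """Multiply a byte by x in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b

def rcon_table(count=30):
    """Successive powers of x in GF(2^8), starting from 0x01."""
    table, value = [], 0x01
    for _ in range(count):
        table.append(value)
        value = xtime(value)
    return table

def rco(a):
    """rcotab(a) followed by 24 zero bits, as in the VHDL procedure."""
    return rcon_table()[a] << 24

print(hex(r_oneto8(0x01020304)), [hex(v) for v in rcon_table()[:10]])
```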
-- **************************************************
-- Encryption tables for the 256-bit AES example
-- **************************************************
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.aes_procedures.all;
package encryption_tables is
procedure ftable_quad(
a: in u_sign32;
q_out: out u_sign32
);
end encryption_tables;
b := to_integer(a);
q := ftabletab(b);
return q;
end ftable;
procedure ftable_quad (
-- moods inline
a: in u_sign32;
q_out: out u_sign32
) is
constant ftabletab : rom_tab_5 :=
-- moods rom
r := ftabletab(to_integer(a(1 to 8)));
s := ftabletab(to_integer(a(9 to 16)));
t := ftabletab(to_integer(a(17 to 24)));
u := ftabletab(to_integer(a(25 to 32)));
q_out(1 to 32):= (r(1 to 8) xor s(25 to 32) xor t(17 to 24) xor u(9 to 16)) &
(r(9 to 16) xor s(1 to 8) xor t(25 to 32) xor u(17 to 24)) &
(r(17 to 24) xor s(9 to 16) xor t(1 to 8) xor u(25 to 32)) &
(r(25 to 32) xor s(17 to 24) xor t(9 to 16) xor u(1 to 8));
end ftable_quad;
end encryption_tables;
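The contents of ftabletab are not reproduced in the listing. Assuming it holds the standard AES forward table, in which each entry packs the MixColumns column (2, 1, 1, 3) multiplied by S[x] with the most significant byte first, ftable_quad combines one lookup per input byte, rotated by 0, 8, 16 and 24 bits, to perform SubBytes and MixColumns on one column at once. The Python sketch below mirrors that byte-slice pattern; SBOX, FTABLE and the helpers are illustrative names, and the check uses the widely published MixColumns test column db 13 53 45 -> 8e 4d a1 bc.

```python
def xtime(b):
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b

def gmul(a, b):
    """Multiply two bytes in GF(2^8)."""
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product

def make_sbox():
    """AES S-box: inverse in GF(2^8) (x^254; 0 maps to 0), then the affine map."""
    def rotl8(v, n):
        return ((v << n) | (v >> (8 - n))) & 0xFF
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):
            inv = gmul(inv, x)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3)
                    ^ rotl8(inv, 4) ^ 0x63)
    return sbox

SBOX = make_sbox()
# Assumed contents of ftabletab: column (2, 1, 1, 3) * S[x], MSB first.
FTABLE = [(gmul(s, 2) << 24) | (s << 16) | (s << 8) | gmul(s, 3) for s in SBOX]

def rotr32(word, n):
    return ((word >> n) | (word << (32 - n))) & 0xFFFFFFFF

def ftable_quad(a):
    """Lookups for bytes 1-8, 9-16, 17-24, 25-32, rotated and XOR-ed as in the VHDL."""
    r = FTABLE[(a >> 24) & 0xFF]
    s = FTABLE[(a >> 16) & 0xFF]
    t = FTABLE[(a >> 8) & 0xFF]
    u = FTABLE[a & 0xFF]
    return r ^ rotr32(s, 8) ^ rotr32(t, 16) ^ rotr32(u, 24)

# Pre-images of the MixColumns test column db 13 53 45 under the S-box.
column = [SBOX.index(v) for v in (0xDB, 0x13, 0x53, 0x45)]
word = int.from_bytes(bytes(column), "big")
print(hex(ftable_quad(word)))
```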
-- *****************************
-- 256-Bit AES
-- *****************************
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.aes_procedures.all;
use work.encryption_tables.all;
entity aes256 is
port(
key, d_block: in u_sign32;
in_hs_rdy: in unsigned(0 downto 0);
in_hs_rcv: buffer unsigned(0 downto 0) := "0";
ciphertext: out u_sign32;
out_hs_rdy: buffer unsigned(0 downto 0) := "0";
out_hs_rcv: in unsigned(0 downto 0)
);
end aes256;
keyloop := "0000001";
while keyloop /= "0000100" loop
-- moods unroll
if (keyloop + i < "0111100") then
temp1 := keyloop + i - "0001000";
temp2 := keyloop + i - "0000001";
temp3 := keyloop + i;
fkey(to_integer(temp3(5 downto 0))) := fkey(to_integer(temp1(5 downto 0))) xor
fkey(to_integer(temp2(5 downto 0))); -- w[i] = w[i-Nk] xor temp
end if;
keyloop := keyloop + "0000001";
end loop;
-- i mod Nk = 4
if (i + "0000100" < "0111100") then
temp1 := i + "0000011";
cc := fkey(to_integer(temp1(5 downto 0)));
temp_vec1(1 to 32) := fbsub_quad(cc(1 to 32)); -- SubWord(w[i-1]); no RotWord when i mod Nk = 4
temp2 := i - "0000100";
temp3 := i + "0000100";
fkey(to_integer(temp3(5 downto 0))):= fkey(to_integer(temp2(5 downto 0))) xor
temp_vec1(1 to 32);
end if;
keyloop := "0000101";
while keyloop /= "0001000" loop
-- moods unroll
if (keyloop + i < "0111100") then
temp1 := keyloop + i - "0001000";
temp2 := keyloop + i - "0000001";
temp3 := keyloop + i;
fkey(to_integer(temp3(5 downto 0))) := fkey(to_integer(temp1(5 downto 0))) xor
fkey(to_integer(temp2(5 downto 0))); -- w[i] = w[i-Nk] xor temp
end if;
keyloop := keyloop + "0000001";
end loop;
i := i + "0001000"; -- increment by Nk
j := j + "00001";
end loop;
-- ======================== First Round ==========================
transition_state(1 to 32) := fkey(0) xor temp_transition_state(1 to 32); -- AddRoundKey
transition_state(33 to 64) := fkey(1) xor temp_transition_state(33 to 64);
transition_state(65 to 96) := fkey(2) xor temp_transition_state(65 to 96);
transition_state(97 to 128) := fkey(3) xor temp_transition_state(97 to 128);
-- ******************************************************
i := "0000100"; -- start with word 4; words 0 to 3 were used in the first round
case EncLoopS is
when 0 =>
temp_vec1(1 to 32) := transition_state(1 to 8) & transition_state(41 to 48) &
transition_state(81 to 88) & transition_state(121 to 128);
ftable_quad(temp_vec1, cc); -- Retrieve values from Forward Tables
temp_transition_state(1 to 32) := bb(1 to 32) xor cc(1 to 32);
when 1 =>
temp_vec1(1 to 32) := transition_state(33 to 40) & transition_state(73 to 80) &
transition_state(113 to 120) & transition_state(25 to 32);
ftable_quad(temp_vec1, cc); -- Retrieve values from Forward Tables
temp_transition_state(33 to 64) := bb(1 to 32) xor cc(1 to 32);
when 2 =>
temp_vec1(1 to 32) := transition_state(65 to 72) & transition_state(105 to 112) &
transition_state(17 to 24) & transition_state(57 to 64);
ftable_quad(temp_vec1, cc); -- Retrieve values from Forward Tables
temp_transition_state(65 to 96) := bb(1 to 32) xor cc(1 to 32);
when 3 =>
temp_vec1(1 to 32) := transition_state(97 to 104) & transition_state(9 to 16) &
transition_state(49 to 56) & transition_state(89 to 96);
ftable_quad(temp_vec1, cc); -- Retrieve values from Forward Tables
temp_transition_state(97 to 128) := bb(1 to 32) xor cc(1 to 32);
when others => NULL;
end case;
i := i + "0000001";
end loop;
-- ======================== Last Round ===========================
for EncLoopS in 0 to 3 loop
bb := fkey(to_integer(i(5 downto 0)));
case EncLoopS is
when 0 =>
temp_vec1 (1 to 32) := transition_state(1 to 8) & transition_state(41 to 48) &
transition_state(81 to 88) & transition_state(121 to 128);
dd(1 to 32) := fbsub_quad(temp_vec1); -- SubBytes (final round, no MixColumns)
temp_transition_state(1 to 32) := bb(1 to 32) xor dd(1 to 32); -- AddRoundKey
when 1 =>
temp_vec1(1 to 32) := transition_state(33 to 40) & transition_state(73 to 80) &
transition_state(113 to 120) & transition_state(25 to 32);
dd(1 to 32) := fbsub_quad(temp_vec1); -- SubBytes (final round, no MixColumns)
temp_transition_state(33 to 64) := bb(1 to 32) xor dd(1 to 32); -- AddRoundKey
when 2 =>
temp_vec1(1 to 32) := transition_state(65 to 72) & transition_state(105 to 112) &
transition_state(17 to 24) & transition_state(57 to 64);
dd(1 to 32) := fbsub_quad(temp_vec1); -- SubBytes (final round, no MixColumns)
temp_transition_state(65 to 96) := bb(1 to 32) xor dd(1 to 32); -- AddRoundKey
when 3 =>
temp_vec1(1 to 32) := transition_state(97 to 104) & transition_state(9 to 16) &
transition_state(49 to 56) & transition_state(89 to 96);
dd(1 to 32) := fbsub_quad(temp_vec1); -- SubBytes (final round, no MixColumns)
temp_transition_state(97 to 128) := bb(1 to 32) xor dd(1 to 32); -- AddRoundKey
when others => NULL;
end case;
i := i + "0000001";
end loop;
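As a software cross-check of the architecture above (a sketch under stated assumptions, not the thesis's own test bench), the Python model below follows the same structure: a 60-word key schedule in which words with i mod Nk = 4 receive SubWord only, an initial AddRoundKey with words 0 to 3, thirteen full rounds and a final round without MixColumns. Byte k of the state corresponds to transition_state(8k+1 to 8k+8), so column 0 picks bytes 0, 5, 10 and 15, exactly as in the case branches. The function names are hypothetical; FIPS-197 Appendix C.3 lists 8ea2b7ca516745bfeafc49904b496089 as the expected ciphertext for the inputs used here.

```python
MASK32 = 0xFFFFFFFF

def xtime(b):
    b <<= 1
    return (b ^ 0x11B) & 0xFF if b & 0x100 else b

def gmul(a, b):
    product = 0
    while b:
        if b & 1:
            product ^= a
        a = xtime(a)
        b >>= 1
    return product

def make_sbox():
    def rotl8(v, n):
        return ((v << n) | (v >> (8 - n))) & 0xFF
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):                 # x^254 = x^-1 (and 0 -> 0)
            inv = gmul(inv, x)
        sbox.append(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^ rotl8(inv, 3)
                    ^ rotl8(inv, 4) ^ 0x63)
    return sbox

SBOX = make_sbox()

def sub_word(w):
    return int.from_bytes(bytes(SBOX[b] for b in w.to_bytes(4, "big")), "big")

def expand_key_256(key):
    """Key schedule w[0..59] for Nk = 8 (the fkey array in the VHDL)."""
    nk = 8
    w = [int.from_bytes(key[4 * i:4 * i + 4], "big") for i in range(nk)]
    rcon = 0x01
    for i in range(nk, 60):
        temp = w[i - 1]
        if i % nk == 0:                      # SubWord(RotWord(temp)) xor Rcon
            temp = sub_word(((temp << 8) | (temp >> 24)) & MASK32) ^ (rcon << 24)
            rcon = xtime(rcon)
        elif i % nk == 4:                    # SubWord only
            temp = sub_word(temp)
        w.append(w[i - nk] ^ temp)
    return w

def encrypt_block_256(plaintext, key):
    w = expand_key_256(key)

    def add_round_key(state, rnd):
        round_key = b"".join(x.to_bytes(4, "big") for x in w[4 * rnd:4 * rnd + 4])
        return [s ^ k for s, k in zip(state, round_key)]

    state = add_round_key(list(plaintext), 0)        # First Round
    for rnd in range(1, 15):                         # Nr = 14
        new_state = []
        for c in range(4):                           # the four case branches
            # ShiftRows + SubBytes: bytes 0, 5, 10, 15 for c = 0, and so on.
            col = [SBOX[state[4 * ((c + r) % 4) + r]] for r in range(4)]
            if rnd < 14:                             # MixColumns except in the last round
                col = [gmul(col[r], 2) ^ gmul(col[(r + 1) % 4], 3)
                       ^ col[(r + 2) % 4] ^ col[(r + 3) % 4] for r in range(4)]
            new_state.extend(col)
        state = add_round_key(new_state, rnd)
    return bytes(state)

key = bytes(range(32))
plaintext = bytes.fromhex("00112233445566778899aabbccddeeff")
print(encrypt_block_256(plaintext, key).hex())
```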
With a system clock period of 200 ns, the non-pipelined multi-FPGA 256-bit AES takes
5257 clock cycles (i.e. clock cycles = (1055500 ns - 4100 ns) / 200 ns) to encrypt a
128-bit data block using a 256-bit cipher key.
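The cycle counts quoted in this appendix all follow the same arithmetic: the interval between two simulation time stamps divided by the clock period. A minimal Python helper (the name is illustrative) that reproduces the figure above and rejects intervals that are not a whole number of periods:

```python
def clock_cycles(end_ns, start_ns, period_ns):
    """Whole clock cycles between two simulation time stamps."""
    elapsed = end_ns - start_ns
    if elapsed % period_ns:
        raise ValueError("interval is not a whole number of clock periods")
    return elapsed // period_ns

aes_cycles = clock_cycles(1055500, 4100, 200)
print(aes_cycles)                          # 5257
print(aes_cycles * 200 / 1000, "us per 128-bit block")
```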
Figure D-17 Simulation of the non-pipelined multi-FPGA 256-bit AES core
library ieee;
use ieee.std_logic_1164.all;
package channel_package is
subtype semaphore is std_logic_vector(0 downto 0);
subtype int8 is integer range 0 to 255;
subtype channel_sem is std_logic_vector(0 downto 0);
subtype channel_ack is std_logic_vector(0 downto 0);
-- channel component
component channel
generic (width: positive := 1); -- width of channel data
port (send_sem: in channel_sem; -- send semaphore
procedure send(signal sem: out channel_sem; signal ack: in channel_ack; signal chan_data: out
std_logic_vector; d: in std_logic_vector) is
-- moods inline
begin
chan_data <= ch_send(d, ack, sem);
end procedure send;
procedure recv(signal sem: out channel_sem; signal ack: in channel_ack; signal chan_data: in
std_logic_vector; d: out std_logic_vector) is
-- moods inline
begin
d := ch_recv(chan_data, ack, sem);
end;
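The bodies of ch_send and ch_recv are not shown, so the following is only a behavioural sketch in Python of the blocking, single-word transfer that a semaphore/acknowledge channel between two pipeline stages is assumed to provide: send waits until the previous word has been taken, and recv waits until a word is available. The Channel class and the stage functions are hypothetical.

```python
import threading

class Channel:
    """Single-slot blocking channel: send() waits until the slot is empty,
    recv() waits until it is full."""

    def __init__(self):
        self._cond = threading.Condition()
        self._full = False
        self._data = None

    def send(self, value):
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._data, self._full = value, True    # raise the "semaphore"
            self._cond.notify_all()

    def recv(self):
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            value, self._full = self._data, False   # acknowledge the word
            self._cond.notify_all()
            return value


c1 = Channel()
results = []

def stage1():
    for n in range(5):
        c1.send(n * n)

def stage2():
    for _ in range(5):
        results.append(c1.recv())

threads = [threading.Thread(target=stage1), threading.Thread(target=stage2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)   # [0, 1, 4, 9, 16]
```

The single slot keeps the two stages in lock-step, so the words arrive in the order they were sent regardless of thread scheduling.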
The pipelined quadratic equation solver is a two-stage pipelined version of the quadratic
equation solver given in Section 6.2.1. The behavioural VHDL of the pipelined quadratic
equation solver example is given in Figure D-20.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.c_types.all;
use work.algeqn_package.all;
use work.imath.all;
use work.channel_package.all;
entity pipe_quad is
port(
a1,a2,a3: in int;
x1,x2: out int;
no_real: out int
);
end pipe_quad;
architecture behaviour of pipe_quad is
signal c1_send_sem, c1_recv_sem: channel_sem := "0";
signal c1_send_ack, c1_recv_ack: channel_ack := "0";
signal c1_send_data, c1_recv_data: std_logic_vector(95 downto 0) := (others => '0');
begin
-- Explicit communication channel
c1: entity work.SIMPLE_CHANNEL generic map (96)
port map(c1_send_sem, c1_recv_sem, c1_send_data, c1_send_ack, c1_recv_ack, c1_recv_data);
d1 := sqi(b2) - multl(multi(to_int(4), b1), b3);
d2 := multl(b1, to_int(2));
temp1 := std_logic_vector(b2 & d2 & d1);
send(c1_send_sem, c1_send_ack, c1_send_data, tempi);
wait for 40 ns;
end loop;
end process Prs_1;
forever: loop
recv(c1_recv_sem, c1_recv_ack, c1_recv_data, temp2);
e1 := int(temp2(31 downto 0));
e2 := int(temp2(63 downto 32));
f1 := int(temp2(95 downto 64));
Figure D-21 shows the post-MOODS synthesis simulation of the two-stage pipelined
multi-FPGA quadratic equation solver. This two-device multi-FPGA implementation has a
single explicit communication channel (ExC 1) connecting the pipeline stages. Integer
inputs a1, a2 and a3 of the quadratic equation solver are given the values 1, -25 and 150
respectively. Outputs x1, x2 and the number of real roots (no_real) are updated after
7660 ns. With a system clock period of 40 ns, the pipelined multi-FPGA quadratic equation
solver takes 189 clock cycles (i.e. clock cycles = (7660 ns - 100 ns) / 40 ns) to complete the
application and output the result.
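The listing packs b2, d2 and d1 into one 96-bit word (b2 in bits 95 downto 64, d2 in 63 downto 32, d1 in 31 downto 0) and the second stage unpacks them as f1, e2 and e1. The Python sketch below models that transfer, assuming b1, b2 and b3 are the first stage's copies of a1, a2 and a3 (not shown in the excerpt). The second stage's root computation is also not shown, so the textbook formula is used; stage1 and stage2 are hypothetical names. For the simulated inputs the discriminant is 625 - 600 = 25, giving roots 15 and 10.

```python
import math

MASK32 = 0xFFFFFFFF

def to_signed32(v):
    return v - (1 << 32) if v & 0x80000000 else v

def stage1(b1, b2, b3):
    """First stage: discriminant d1 and 2a (d2), packed like b2 & d2 & d1."""
    d1 = b2 * b2 - 4 * b1 * b3
    d2 = b1 * 2
    return ((b2 & MASK32) << 64) | ((d2 & MASK32) << 32) | (d1 & MASK32)

def stage2(packed):
    """Second stage: unpack e1, e2, f1 as in the listing and return real roots."""
    e1 = to_signed32(packed & MASK32)            # temp2(31 downto 0): discriminant
    e2 = to_signed32((packed >> 32) & MASK32)    # temp2(63 downto 32): 2a
    f1 = to_signed32((packed >> 64) & MASK32)    # temp2(95 downto 64): b
    if e1 < 0:
        return ()                                # no real roots
    root = math.sqrt(e1)
    return ((-f1 + root) / e2, (-f1 - root) / e2)

word = stage1(1, -25, 150)
print(stage2(word))                              # (15.0, 10.0)
print((7660 - 100) // 40, "clock cycles")        # 189 clock cycles
```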
This second pipelined VHDL example is the two-stage pipelined version of the inverse
discrete cosine transform (IDCT) core given in Section 6.2.3. The behavioural VHDL of
the pipelined inverse discrete cosine transform example is given in Figure D-22.
library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use work.channel_package.all;
use work.idct_package.all;
entity pipe_idct is
port (
in_hs_rdy: in unsigned(0 downto 0); -- Handshake ready
in_hs_rcv: buffer unsigned(0 downto 0) := "0"; -- Handshake receive
dct_2d_in: in signed(11 downto 0);
idct_out: out signed(7 downto 0) := (others => '0'); -- 8-bit output
out_hs_rdy: buffer unsigned(0 downto 0) := "0"; -- Handshake ready
out_hs_rcv: in unsigned(0 downto 0); -- Handshake receive
sys_clock: in unsigned(0 downto 0);
-- moods clock
sys_reset: in unsigned(0 downto 0)
-- moods reset
);
end pipe_idct;
I D J n d e x J := I D J n d e x J + "0001";
The last pipelined VHDL example is the two-stage pipelined version of the 256-bit
advanced encryption standard (AES) core given in Section 6.2.5. The behavioural VHDL
of the pipelined 256-bit AES core is given in Figure D-25.
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.channel_package.all;
use work.aes_procedures.all;
use work.encryption_tables.all;
entity pipe_aes256 is
port(
key, d_block: in u_sign32;
in_hs_rdy: in unsigned(0 downto 0);
in_hs_rcv: buffer unsigned(0 downto 0) := "0";
ciphertext: out u_sign32;
out_hs_rdy: buffer unsigned(0 downto 0):= "0";
out_hs_rcv: in unsigned(0 downto 0)
);
end pipe_aes256;
begin
init(c1_send_sem);
init(c2_send_sem);
forever: loop
keyloop := "0000001";
while keyloop /= "0000100" loop
-- moods unroll
if (keyloop + i < "0111100") then
temp_a1 := keyloop + i - "0001000";
temp_a2 := keyloop + i - "0000001";
temp_a3 := keyloop + i;
fkey(to_integer(temp_a3(5 downto 0))) := fkey(to_integer(temp_a1(5 downto 0))) xor
fkey(to_integer(temp_a2(5 downto 0))); -- w[i] = w[i-Nk] xor temp
end if;
keyloop := keyloop + "0000001";
end loop;
-- i mod Nk = 4
if (i + "0000100" < "0111100") then
temp_a1 := i + "0000011";
cc1 := fkey(to_integer(temp_a1(5 downto 0)));
temp_vec1(1 to 32) := fbsub_quad1(cc1(1 to 32)); -- SubWord(w[i-1]); no RotWord when i mod Nk = 4
temp_a2 := i - "0000100";
temp_a3 := i + "0000100";
fkey(to_integer(temp_a3(5 downto 0))) := fkey(to_integer(temp_a2(5 downto 0))) xor
temp_vec1(1 to 32); -- fkey
end if;
keyloop := "0000101";
while keyloop /= "0001000" loop
-- moods unroll
if (keyloop + i < "0111100") then
temp_a1 := keyloop + i - "0001000";
temp_a2 := keyloop + i - "0000001";
temp_a3 := keyloop + i;
fkey(to_integer(temp_a3(5 downto 0))) := fkey(to_integer(temp_a1(5 downto 0))) xor
fkey(to_integer(temp_a2(5 downto 0))); -- w[i] = w[i-Nk] xor temp
end if;
keyloop := keyloop + "0000001";
end loop;
i := i + "0001000"; -- increment by Nk
j := j + "00001";
end loop;
i := "0000100"; -- start with word 4; words 0 to 3 were used in the first round
for EncLoop1 in 1 to 14 loop -- for AES-256 (Nk = 8, Nr = 14)
for EncLoop2 in 0 to 3 loop
bb1 := fkey(to_integer(i(5 downto 0))); -- fkey
temp3 := std_logic_vector(bb1);
send(c2_send_sem, c2_send_ack, c2_send_data, temp3);
i := i + "0000001";
wait for 10 ns;
end loop;
end loop;
case EncLoop4 is
when 0 =>
temp_vec2(1 to 32) := transition_state(1 to 8) & transition_state(41 to 48) &
transition_state(81 to 88) & transition_state(121 to 128);
if (EncLoop3 = 14) then
dd2(1 to 32) := fbsub_quad2(temp_vec2); -- SubBytes (final round, no MixColumns)
ee2 := dd2;
else
ftable_quad(temp_vec2, cc2); -- Retrieve values from Forward Tables
ee2 := cc2;
end if;
temp_transition_state(1 to 32) := bb2(1 to 32) xor ee2(1 to 32);
when 1 =>
temp_vec2(1 to 32) := transition_state(33 to 40) & transition_state(73 to 80) &
transition_state(113 to 120) & transition_state(25 to 32);
if (EncLoop3 = 14) then
dd2(1 to 32) := fbsub_quad2(temp_vec2); -- SubBytes (final round, no MixColumns)
ee2 := dd2;
else
ftable_quad(temp_vec2, cc2); -- Retrieve values from Forward Tables
ee2 := cc2;
end if;
temp_transition_state(33 to 64) := bb2(1 to 32) xor ee2(1 to 32);
when 2 =>
temp_vec2(1 to 32) := transition_state(65 to 72) & transition_state(105 to 112) &
transition_state(17 to 24) & transition_state(57 to 64);
if (EncLoop3 = 14) then
dd2(1 to 32) := fbsub_quad2(temp_vec2); -- SubBytes (final round, no MixColumns)
ee2 := dd2;
else
ftable_quad(temp_vec2, cc2); -- Retrieve values from Forward Tables
ee2 := cc2;
end if;
temp_transition_state(65 to 96) := bb2(1 to 32) xor ee2(1 to 32);
when 3 =>
temp_vec2(1 to 32) := transition_state(97 to 104) & transition_state(9 to 16) &
transition_state(49 to 56) & transition_state(89 to 96);
if (EncLoop3 = 14) then
dd2(1 to 32) := fbsub_quad2(temp_vec2); -- SubBytes (final round, no MixColumns)
ee2 := dd2;
else
ftable_quad(temp_vec2, cc2); -- Retrieve values from Forward Tables
ee2 := cc2;
end if;
temp_transition_state(97 to 128) := bb2(1 to 32) xor ee2(1 to 32);
when others => NULL;
end case;
end loop; -- for EncLoop4 in 0 to 3 loop
transition_state(1 to 128) := temp_transition_state(1 to 128);
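In the pipelined version the key-expansion stage streams the round keys to the encryption stage over channel c2, one word per iteration of EncLoop2. Tracing the index i in Python (the function name is illustrative) shows that 56 words, w[4] to w[59], cross the channel; together with words 0 to 3, which the listing's comment says are used in the first round, this accounts for the full 60-word schedule.

```python
def streamed_key_words(rounds=14, words_per_round=4, first=4):
    """Indices i of fkey(i) sent over channel c2 by the key-expansion stage."""
    indices = []
    i = first                                    # i := "0000100"
    for _round in range(rounds):                 # for EncLoop1 in 1 to 14
        for _column in range(words_per_round):   # for EncLoop2 in 0 to 3
            indices.append(i)                    # send(c2_send_sem, ..., fkey(i))
            i += 1
    return indices

words = streamed_key_words()
print(len(words), words[0], words[-1])   # 56 4 59
```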
Figure D-26 Simulation of the pipelined multi-FPGA 256-bit AES core
Appendix E
This appendix presents the partitioning options added to the MOODS synthesis system for
multi-FPGA synthesis. It covers the complete set of commands for multi-FPGA synthesis
using the MOODS Command Line Interface (CLI); for completeness, the command-line
switches of the original MOODS synthesis core are briefly repeated where needed.
Background information and a more detailed guide to the original MOODS synthesis
system can be found in references [32, 39, 42, 161].
Once the synthesis project has been compiled and set up in the corresponding project
libraries, the MOODS optimiser, which is the heart of the MOODS synthesis suite, can be
invoked from the MOODS CLI with a DOS-prompt command of the following form:
-m "{project directory}\example.lmf"
-w example
-pre-opt
-mult2shift
-prn_al
-prn_nl
-vhdl_out
-design_profile
-exchannels
{other arguments}
The above command assumes that a top-level design has been compiled and that the main
project of the design is called example. File example.lmf contains information on the
directory locations of the library files used in the project; it is passed to the optimiser
through the -m argument, which precedes the location of the (.lmf) file. Argument -w
specifies the directory to which the output files generated by the synthesis are written.
The -pre-opt argument enables pre-scheduling optimisation of the design; at present, this
optimisation only improves designs with dynamic indexing of arrays and vectors.
Argument -mult2shift forces constant multiplications and divisions by a positive power of
two to be implemented as left and right shifts respectively, giving a significant hardware
reduction. Argument -prn_al appends a dump of control arcs to the design.cg output file.
Argument -prn_nl appends a dump of data path nets to the design.d^g output file.
Argument -vhdl_out specifies that multiple VHDL netlist output files are generated, one
for each target device. The first new argument, -design_profile, is incorporated into the
MOODS optimiser to enable multi-FPGA synthesis. It instructs MOODS to retrieve the
partitioning and design activity profile information (Section 4.5) from the design.par file
in the project directory. A module call list file, design.mcl, is generated by MOODS during
the prologue stage, when the initial data structures are built; details of the module call list
file can be found in Appendix C.3. The second new argument, -exchannels, enables the use
of explicit communication channels (Section 4.2.2.1).
Other arguments exist [161], but they are beyond the scope of this appendix.
1. Set up a "cost function" specifying the required target specification (e.g. target area
and/or delay).
8. Finish the design to produce final structural netlists suitable for targeting multiple
FPGA devices.
During the prologue stage in MOODS, the associated technology libraries are loaded and
the input design is read in, followed by the initialisation of data structures. A number of
messages about loading libraries and files, and preliminary tasks are displayed in the
console window. When it finishes, a command prompt will appear, e.g.:
The command "CF" is entered to get to the cost function definition menu of MOODS. At
any point in the synthesis session, typing "?" at a command prompt gives a list of all
available commands, as illustrated below in Figure E-1.
The cost function allows the user to specify what the final optimised implementation
should be like (e.g. how large or fast it is). Figure E-2 illustrates the typical steps to enter
an area delay cost function, and specify a clock period for optimising the design. The
following specifies area optimisation as the highest (first) priority with a target area of 0,
and delay optimisation as the second priority with a target total delay of 0. Both target
objectives are set to zero so that the final optimised implementation is as small and as fast
as possible. Of course, non-zero target values can be given instead. A clock period of 20
ns is specified using the "AC" command and entering a value of 20 when asked to enter
the new clock value at the subsequent prompt. With all of the cost function parameters set
up, command "F" finishes the cost function definition and returns to the main MOODS
prompt.
E.1.2 Optimisation
After finishing the cost function set-up, the user can proceed to set up the optimisation
algorithm and perform optimisation on the design. The MOODS synthesis core currently
provides two main optimisation algorithms (described in Section 2.3.6). The
quasi-exhaustive heuristic is the simpler of the two: MOODS proceeds to optimise the
design when the command "AOH" is entered at the MOODS prompt. Simulated annealing
is slower, more complex and more difficult to operate; however, it can produce better
results, and it allows the design to move in many different directions around the design
space. Figure E-3 illustrates the steps in setting up the optimisation parameters (annealing
schedule) using the "AI" command. Once these data are entered, the command "AO"
starts the annealing process, optimising the design.
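For readers unfamiliar with the annealing schedule being configured, the Python sketch below shows generic simulated annealing with a geometric cooling factor below 1, the kind of value the "AI" prompt asks for. It is not MOODS's implementation: the cost and neighbour functions, the parameter names and their defaults are hypothetical, and the toy problem merely searches for the integer closest to 7.

```python
import math
import random

def anneal(cost, neighbour, state, t_start=100.0, factor=0.9, steps=500, seed=1):
    """Minimise cost() by simulated annealing with geometric cooling."""
    rng = random.Random(seed)
    current, current_cost = state, cost(state)
    best, best_cost = current, current_cost
    t = t_start
    for _ in range(steps):
        candidate = neighbour(current, rng)
        delta = cost(candidate) - current_cost
        # Accept every improvement; accept a worse move with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t *= factor          # geometric temperature decrease (factor < 1)
    return best, best_cost

# Toy problem: find the integer closest to 7 using +/-1 moves.
best, best_cost = anneal(lambda x: (x - 7) ** 2,
                         lambda x, rng: x + rng.choice((-1, 1)),
                         state=0)
print(best, best_cost)
```

Accepting some uphill moves while the temperature is high is what lets annealing wander across the design space, at the price of many more cost evaluations than a purely greedy heuristic.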
Enter factor to decrease temp (<1) OR -n for No. of steps [ 100.0]: -100
FI - Finish optimisation
COMMAND  DESCRIPTION
DS  Displays the K-way partitioning set-up.
EX  The same as the top-level MOODS command; it is used to examine the data structures for the design.
EM  Leads to a further set of commands, given in Figure E-5.
ET  Leads to two further commands that allow the user to display and edit target device details (such as device area and I/O).
CT  Changes the number of target devices used to implement the multi-FPGA system.
CU  Changes the maximum percentage of device utilisation. The default value of 100 means that the total logic capacity (100%) may be utilised if required.
CL  Specifies the lowest percentage of device area utilisation. This value is used to determine the balance criterion in the K-way partitioning algorithm when a relaxed distribution of modules over the target devices is selected.
CA  Specifies the device area offset percentage.
A description of the complete set of commands in the K-way partitioning menu is given in
Table E-1. Figure E-5 illustrates the further set of commands associated with the "EM"
command in the main K-way partitioning menu. Process modules (locked/unlocked to the
top-level architectural module) in the design are displayed using command "B".
Commands "A" and "U" are used to lock and unlock process modules in the design.
Commands "P" and "L" display the pre-allocated and locked modules (if any) specified
in the partitioning information (.par) file respectively. Command "E" allows the user to
manually lock modules in the design to target devices, and a locked module can be
unlocked using the "R" command.
Examine --> ?
Examine -->
After the partitioning parameters have been set up and the K-way partitioning algorithm
has been run, the final partition of the design, together with the I/O utilisation and
estimated area utilisation of the target devices, is displayed in the console window. The
partitioning parameters can be altered and the K-way partitioning algorithm repeated to
obtain different partitioning results; otherwise, the command "FI" is entered at the K-way
partitioning prompt to begin the communication subsystem optimisation. Alternatively, the
MOODS optimisation process can be re-run using the "RM" command.
When the communication subsystem optimisation finishes, the system writes out VHDL
packages for the subprogram communication channel arbiter(s) (Section 5.4.3), final
netlists for all the target devices, and report files, leaving the system in the "EXAMINE"
mode. The "FI" command is typed once more to end the session.
Adopting the naming convention described in Section E.1, assume that the top-level
design has been created from a behavioural VHDL file, design.vhd. After the multi-FPGA
synthesis session in MOODS, the VHDL netlist output files design_synth_dom1.vhd,
design_synth_dom2.vhd, ..., design_synth_domk.vhd for a multi-FPGA design targeting
k devices are created in the project directory.
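The per-device netlist names follow a simple pattern, which can be useful when scripting the subsequent device-level tool flow. A minimal Python helper (the function name is hypothetical) that lists the expected files for k target devices:

```python
def netlist_names(design, k):
    """Per-device netlist file names following the design_synth_domN.vhd pattern."""
    return [f"{design}_synth_dom{n}.vhd" for n in range(1, k + 1)]

print(netlist_names("design", 3))
# ['design_synth_dom1.vhd', 'design_synth_dom2.vhd', 'design_synth_dom3.vhd']
```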