Synthesis of Multi-FPGA Systems With Asynchronous Communications

This document is a thesis by Tack Boon Yee submitted for a PhD at the University of Southampton, focusing on the synthesis of multi-FPGA systems with asynchronous communications. It includes appendices detailing a paper on multi-FPGA synthesis, hardware demonstrator implementation, and JPEG file format specifications. Additionally, it presents simulation results of test image decoding using the multi-FPGA JPEG decoder.


UNIVERSITY OF SOUTHAMPTON

Synthesis of Multi-FPGA Systems with

Asynchronous Communications

Volume 2 of 2

Tack Boon Yee

A thesis submitted for the degree of

Doctor of Philosophy.

School of Electronics and Computer Science,

University of Southampton

April, 2007
T.B. Yee, 2007 Appendix A: Paper 248

Appendix A

Paper

This appendix contains the paper published in the proceedings of the International
Federation for Information Processing International Conference on Very Large Scale
Integration 2005 (IFIP VLSI-SOC 2005).
The following published papers were included in the bound thesis. These have
not been digitised due to copyright restrictions, but the links are provided.

Y. Tack Boon, M. Zwolinski, A.D. Brown (2005) “Multi-FPGA Synthesis with Asynchronous
Communication Subsystems.” IFIP International Conference on Very Large Scale Integration
(VLSI-SOC 2005).
T.B. Yee, 2007 Appendix B: Hardware demonstrator in detail
256

Appendix B

Hardware demonstrator in detail

This appendix contains implementation details of the hardware demonstrator and the
Digilent D2-SB FPGA-based development board and DIO4 peripheral board used to
implement the JPEG decoder.

The first few sections of this appendix provide information on the JPEG decoder, a
full profile of the test images, and photographs of the test images decoded by the
multi-FPGA JPEG decoder. The rest of this appendix provides detailed information on the
hardware demonstrator, including a circuit description of the BT121 VideoDAC on the
I/O VGA peripheral board, user manuals of the development board, and the setting up of
the hardware demonstrator.

B.1 JFIF (JPEG File Interchange Format)

The JPEG File Interchange Format is a minimal file format, which enables JPEG
bitstreams to be exchanged between a wide variety of platforms and applications. The
JFIF is entirely compatible with the standard JPEG interchange format and it conforms to
the JPEG standard (ISO/IEC 10918-1 | ITU-T Recommendation T.81); the only additional
requirement is the presence of a JFIF application segment marked by an APP0 marker.
The rest of this section provides the specifications and syntax of a JPEG file defined in
Annex B of the ISO/IEC 10918-1 | ITU-T Recommendation T.81 and the JFIF application
segment. The set of marker assignments and their description supported by the lossy
sequential DCT-based JPEG decoder is listed in Table B-1 below.

Symbol   Code assignment   Description
SOI      0xD8              Start of image
APP0     0xE0              JFIF application segment
APPn     0xE1 - 0xEF       Other APP segments
DQT      0xDB              Quantisation table
SOF0     0xC0              Start of frame
DHT      0xC4              Huffman table
SOS      0xDA              Start of scan
COM      0xFE              Comment, may be ignored (skipped)
EOI      0xD9              End of image

Table B-1 Marker identifiers in the JFIF file

JFIF marker identifiers are preceded by an all-'1' byte (0xFF). A two-byte SOI header
(0xFF, 0xD8) identifies the JFIF file format; the APP0 marker immediately follows the
SOI header and is followed in turn by the other segments and markers. The end of file is
identified by the EOI (0xFF, 0xD9) marker. Normally, the only marker identifier that
should be found once the image data has started is the EOI marker. When a 0xFF byte is
found followed by a zero byte, the zero byte must be discarded.

The following describes the JPEG file format and gives descriptions of the key segments
listed in Table B-1:

Header: It occupies two bytes (SOI: start of image - 0xFF, 0xD8).

Segments: Following the SOI marker, there can be any number of the segments or markers
described in Table B-1 above.

Trailer: It occupies two bytes (EOI: end of image - 0xFF, 0xD9).
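
The header, segment and trailer layout described above can be walked with a few lines of software. The following is a minimal Python sketch, not the hardware decoder's implementation: the function names `iter_segments` and `unstuff` are hypothetical, and the code assumes the standard JPEG convention that every segment other than SOI, EOI and the restart markers carries a two-byte big-endian length that counts itself.

```python
import struct

# Markers that stand alone, with no length field: SOI, EOI and RST0..RST7.
STANDALONE = {0xD8, 0xD9} | set(range(0xD0, 0xD8))


def iter_segments(data: bytes):
    """Yield (marker_code, payload) pairs up to and including the SOS segment."""
    if data[:2] != b"\xFF\xD8":
        raise ValueError("missing SOI header (0xFF, 0xD8)")
    pos = 0
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected 0xFF marker prefix at offset {pos}")
        code = data[pos + 1]
        pos += 2
        if code in STANDALONE:
            yield code, b""
            continue
        # The length field counts its own two bytes but not the marker.
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        yield code, data[pos + 2:pos + length]
        pos += length
        if code == 0xDA:  # SOS: entropy-coded image data follows
            return


def unstuff(scan: bytes) -> bytes:
    """Discard the zero byte that follows a 0xFF byte in the image data."""
    return scan.replace(b"\xFF\x00", b"\xFF")
```

Stopping at SOS keeps the sketch short; locating the EOI trailer would additionally require scanning the image data for a 0xFF byte that is not followed by a zero byte.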


SOF0 (Start of Frame 0) marker

Field                  Size in byte(s)  Description
Marker identifier      2                0xFF, 0xC0 to identify the SOF0 marker.
Length                 2                This value equals 8 + 3 * (number of components).
Data precision         1                This is in bits/sample, usually 8.
Image height           2                This must be > 0.
Image width            2                This must be > 0.
Number of components   1                Usually 1 = greyscale, 3 = colour YCbCr or YIQ,
                                        4 = colour CMYK.
Each component         3                Read 3 bytes of data for each component. It contains:
                                        component ID (1 byte) (1 = Y, 2 = Cb, 3 = Cr, 4 = I,
                                        5 = Q), sampling factors (1 byte) (bits 0-3 vertical,
                                        bits 4-7 horizontal), quantisation table number
                                        (1 byte).

The JFIF uses either 1 component (Y, greyscale) or 3 components (YCbCr,
sometimes called YUV, colour).
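
As an illustration of the SOF0 field layout, the sketch below unpacks the bytes that follow the two-byte length field. It is a software sketch under stated assumptions rather than the decoder's own logic: the function name `parse_sof0` and the dictionary keys are hypothetical.

```python
import struct


def parse_sof0(payload: bytes) -> dict:
    """Parse the SOF0 bytes that follow the two-byte length field."""
    precision, height, width, ncomp = struct.unpack(">BHHB", payload[:6])
    # The length field (payload plus its own two bytes) must be 8 + 3 * components.
    if len(payload) + 2 != 8 + 3 * ncomp:
        raise ValueError("SOF0 length does not match the number of components")
    if height == 0 or width == 0:
        raise ValueError("image height and width must be > 0")
    components = []
    for i in range(ncomp):
        comp_id, sampling, qt = payload[6 + 3 * i:9 + 3 * i]
        components.append({
            "id": comp_id,
            "h": sampling >> 4,    # bits 4-7: horizontal sampling factor
            "v": sampling & 0x0F,  # bits 0-3: vertical sampling factor
            "qt": qt,              # quantisation table number
        })
    return {"precision": precision, "height": height,
            "width": width, "components": components}
```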

APP0 (JFIF segment) marker

Field                    Size in byte(s)  Description
Marker identifier        2                0xFF, 0xE0 to identify the APP0 marker.
Length                   2                This must be >= 16.
File identifier mark     5                This identifies JFIF: 'JFIF' followed by a zero byte
                                          (0x4A, 0x46, 0x49, 0x46, 0x00).
Major revision number    1                Should be 1, otherwise error.
Minor revision number    1                Should be 0 to 2, otherwise try to decode anyway.
Units for x/y densities  1                0 = no units, x/y-density specifies the aspect ratio
                                          instead; 1 = x/y-density are dots/inch; 2 = x/y-
                                          density are dots/cm.
X-density                2                It should be > 0.
Y-density                2                It should be > 0.
Thumbnail width          1                -
Thumbnail height         1                -
Bytes to be read         n                For thumbnails (RGB 24-bit), n = width * height * 3.
                                          The n bytes should be read immediately following the
                                          thumbnail height.

If there is no 'JFIF' followed by a zero byte in the file identifier, or the length is < 16,
then it is probably not a JFIF segment and should be ignored.

Normally units = 0, x-density = 1, y-density = 1, which means the image has an aspect ratio
of 1:1 (evenly scaled).

JFIF files including thumbnails are very rare, and the thumbnail can usually be ignored. If
there is no thumbnail, then width = 0 and height = 0.
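
The checks described for the JFIF segment can be sketched as follows. This is an illustrative Python sketch, not part of the hardware design: the function name `parse_app0` is hypothetical, and it assumes one byte each for the major and minor revision numbers, which is the layout that the minimum length of 16 implies (2 + 5 + 1 + 1 + 1 + 2 + 2 + 1 + 1).

```python
import struct


def parse_app0(payload: bytes):
    """Parse the APP0 bytes that follow the two-byte length field.

    Returns None when the segment does not look like a JFIF segment.
    """
    if len(payload) + 2 < 16 or payload[:5] != b"JFIF\x00":
        return None  # probably not a JFIF segment: ignore it
    major, minor, units, x_density, y_density, t_width, t_height = struct.unpack(
        ">BBBHHBB", payload[5:14])
    if major != 1:
        raise ValueError("unsupported JFIF major revision")
    # The optional thumbnail is RGB, 3 bytes per pixel.
    thumbnail = payload[14:14 + 3 * t_width * t_height]
    return {"version": (major, minor), "units": units,
            "density": (x_density, y_density),
            "thumbnail_size": (t_width, t_height), "thumbnail": thumbnail}
```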

DHT (Define Huffman Table) marker

Field                  Size in byte(s)  Description
Marker identifier      2                0xFF, 0xC4 to identify the DHT marker.
Length                 2                This specifies the length of the Huffman table.
Huffman Table (HT)     1                Bits 0-3: number of HT (0 to 3, otherwise error).
information                             Bit 4: type of HT (0 = DC table, 1 = AC table).
                                        Bits 5-7: not used, must be 0.
Number of symbols      16               Number of symbols with codes of length 1 to 16;
                                        the sum (n) of these bytes is the total number of
                                        codes, which must be <= 256.
Symbols                n                Table containing the symbols in order of
                                        increasing code length (n = total number of codes).

• A single DHT segment may contain multiple Huffman tables, each with its own
information byte.
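
Because one DHT segment may hold several tables, a parser has to loop until the payload is exhausted. The sketch below shows one way to do this in Python; the function name `parse_dht` and the returned dictionary keys are hypothetical, and the code only splits the segment into tables without building the Huffman codes themselves.

```python
def parse_dht(payload: bytes) -> list:
    """Split the DHT bytes that follow the length field into Huffman tables."""
    tables = []
    pos = 0
    while pos < len(payload):
        info = payload[pos]
        if info & 0xE0:
            raise ValueError("bits 5-7 of the HT information byte must be 0")
        table_id = info & 0x0F
        if table_id > 3:
            raise ValueError("Huffman table number must be 0 to 3")
        counts = list(payload[pos + 1:pos + 17])  # symbols per code length 1..16
        total = sum(counts)
        if total > 256:
            raise ValueError("total number of codes must be <= 256")
        symbols = list(payload[pos + 17:pos + 17 + total])
        tables.append({"id": table_id, "ac": bool(info & 0x10),
                       "counts": counts, "symbols": symbols})
        pos += 17 + total
    return tables
```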

DRI (Define Restart Interval) marker

Field                  Size in byte(s)  Description
Marker identifier      2                0xFF, 0xDD to identify the DRI marker.
Length                 2                This must be 4.
Restart interval       2                This is in units of MCU blocks: every n MCU
                                        blocks, an RSTn marker can be found. The first
                                        marker will be RST0, then RST1, etc.; after
                                        RST7, the sequence repeats from RST0.
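
The cyclic numbering of the restart markers can be illustrated with a short sketch. The function names below are hypothetical, and the sketch assumes that no restart marker is emitted after the final MCU of the scan.

```python
import struct


def parse_dri(payload: bytes) -> int:
    """Return the restart interval (in MCU blocks) from the DRI payload."""
    (interval,) = struct.unpack(">H", payload)
    return interval


def restart_schedule(interval: int, total_mcus: int) -> list:
    """List (mcus_decoded, n) pairs: RSTn is expected after mcus_decoded MCUs."""
    if interval == 0:
        return []  # a zero interval disables restart markers
    return [(m, (m // interval - 1) % 8)
            for m in range(interval, total_mcus, interval)]
```

For example, with an interval of 4 and 10 MCUs in the scan, RST0 is expected after the fourth MCU and RST1 after the eighth.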

DQT (Define Quantisation Table) marker

Field                    Size in byte(s)  Description
Marker identifier        2                0xFF, 0xDB to identify the DQT marker.
Length                   2                This specifies the length of the quantisation table.
Quantisation Table (QT)  1                Bits 0-3: number of QT (0 to 3, otherwise error).
information                               Bits 4-7: precision of QT (0 = 8-bit, otherwise
                                          16-bit).
Bytes                    n                This gives the QT values, n = 64 * (precision + 1).

A single DQT segment may contain multiple quantisation tables, each with its own
information byte.

For precision = 1 (16 bits), the order is high-low for each of the 64 words.
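
A DQT segment can be unpacked in the same looping style. The following Python sketch uses the hypothetical name `parse_dqt` and assumes the precision field is either 0 or 1, which is what the formula n = 64 * (precision + 1) requires; the values are returned in the order they are stored in the file.

```python
import struct


def parse_dqt(payload: bytes) -> dict:
    """Split the DQT bytes that follow the length field into tables by number."""
    tables = {}
    pos = 0
    while pos < len(payload):
        info = payload[pos]
        precision = info >> 4   # bits 4-7
        table_id = info & 0x0F  # bits 0-3
        if table_id > 3:
            raise ValueError("quantisation table number must be 0 to 3")
        if precision not in (0, 1):
            raise ValueError("this sketch assumes precision 0 (8-bit) or 1 (16-bit)")
        n = 64 * (precision + 1)
        raw = payload[pos + 1:pos + 1 + n]
        if precision == 0:
            values = list(raw)
        else:
            values = list(struct.unpack(">64H", raw))  # high byte first
        tables[table_id] = values
        pos += 1 + n
    return tables
```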

SOS (Start of Scan) marker

Field                  Size in byte(s)  Description
Marker identifier      2                0xFF, 0xDA to identify the SOS marker.
Length                 2                This must be equal to 6 + 2 * (number of
                                        components in scan).
Number of components   1                This must be >= 1 and <= 4 (otherwise error),
in scan                                 usually 1 or 3.
Each component         2                For each component, read 2 bytes. It contains
                                        1 byte: component ID (1 = Y, 2 = Cb, 3 = Cr, 4 = I,
                                        5 = Q), 1 byte: Huffman table to use (bits 0-3: AC
                                        table 0 to 3, bits 4-7: DC table 0 to 3).
Ignorable bytes        3                Skip the next 3 bytes.

• The image data (scans) immediately follows the SOS segment.
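
The SOS header fields can be checked and unpacked as sketched below. The function name `parse_sos` and the dictionary keys are hypothetical; the three ignorable bytes at the end of the segment are skipped, as the table describes.

```python
def parse_sos(payload: bytes) -> list:
    """Parse the SOS bytes that follow the two-byte length field."""
    ncomp = payload[0]
    if not 1 <= ncomp <= 4:
        raise ValueError("number of components in scan must be 1 to 4")
    if len(payload) + 2 != 6 + 2 * ncomp:
        raise ValueError("SOS length does not match the number of components")
    components = []
    for i in range(ncomp):
        comp_id, tables = payload[1 + 2 * i:3 + 2 * i]
        components.append({
            "id": comp_id,
            "dc": tables >> 4,    # bits 4-7: DC Huffman table
            "ac": tables & 0x0F,  # bits 0-3: AC Huffman table
        })
    return components  # the last 3 payload bytes are ignorable
```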

B.2 JFIF test images

A complete profile of the test images decoded by the multi-FPGA JPEG decoder is given
below. The following diagrams include the original JFIF file and photographs of the
decoded test image using the hardware demonstrator system. Figure B-1 to Figure B-3
illustrate three 64-pixel by 64-pixel test images (LENA.jpg, MANDRILL.jpg, and
DRAGON.jpg) decoded using the multi-FPGA JPEG decoder.

Original test image (LENA.jpg)

Figure B-1 JFIF test image (LENA.jpg)

Original test image (MANDRILL.jpg)

Figure B-2 JFIF test image (MANDRILL.jpg)

Original test image (DRAGON.jpg)

Figure B-3 JFIF test image (DRAGON.jpg)

Figure B-4 and Figure B-5 illustrate two 128-pixel by 128-pixel test images
(SQUARES.jpg and SLOPE.jpg) decoded using the multi-FPGA JPEG decoder.

Original test image (SQUARES.jpg)

Figure B-4 JFIF test image (SQUARES.jpg)


Figure B-5 JFIF test image (SLOPE.jpg)

B.3 Simulations of test image decoding

Post-MOODS multi-FPGA synthesis simulation results of the decoding of the LENA test
image using the non-pipelined multi-FPGA JPEG decoder are given in Figure B-6, with a
zoomed-in view in Figure B-7. The simulations show the signal transitions and data transfers
of the various components (e.g. the UART RTL module, the frame buffer controller RTL
module, etc.), and the communication channels in the multi-FPGA JPEG decoder. The
decoded pixel data are given in signal "/sim_top_level/decoded_data" (under the multi-FPGA
JPEG decoder core divider) in Figure B-7, where cursors 1 and 2 mark the first
decoded (pixel) value and the end of the eighth decoded (pixel) value in the test image
respectively (e.g. the first to the eighth pixel values obtained from the zoomed-in
view of the simulation in Figure B-7 are 0x7C, 0x94, 0x8A, 0x6F, 0x8C, 0x88, 0x8E,
0x65 respectively). The two-phase data handshaking scheme for the inter-device data in
subprogram communication channel 2 (under the SpC 2 divider) can also be seen clearly
in Figure B-7.
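
For readers unfamiliar with two-phase handshaking, the behavioural sketch below models the general idea in Python. It assumes the conventional transition-signalling form of the protocol, in which every toggle of a request line announces new data and every toggle of an acknowledge line confirms receipt; the class and method names are hypothetical, and the signal-level details of the channels synthesised in this work may differ.

```python
class TwoPhaseChannel:
    """Behavioural model of a two-phase (transition-signalling) data channel."""

    def __init__(self):
        self.req = 0      # toggled by the sender for each new data item
        self.ack = 0      # toggled by the receiver for each item accepted
        self.data = None

    def sender_ready(self) -> bool:
        # The channel is idle when req and ack hold the same level.
        return self.req == self.ack

    def send(self, value) -> None:
        if not self.sender_ready():
            raise RuntimeError("previous item has not been acknowledged")
        self.data = value  # data must be stable before the request transition
        self.req ^= 1

    def receive(self):
        if self.req == self.ack:
            return None    # no request pending
        value = self.data
        self.ack ^= 1      # a single transition completes the handshake
        return value


# Transfer the eight pixel values quoted above across the modelled channel.
pixels = [0x7C, 0x94, 0x8A, 0x6F, 0x8C, 0x88, 0x8E, 0x65]
channel = TwoPhaseChannel()
received = []
for p in pixels:
    channel.send(p)
    received.append(channel.receive())
```

Each transfer costs one transition on each line, with no return-to-zero phase, which is the property that distinguishes a two-phase scheme from a four-phase one.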

Simulation results for the 2-, 3- and 6-device implementations of the pipelined multi-FPGA
JPEG decoder are given in Figure B-8 to Figure B-13. Cursors 1 and 2 in the zoomed-in
views of the simulations mark the first decoded (pixel) value and the end of the eighth
decoded (pixel) value respectively. Inter-device data sent through the explicit
communication channels (ExCs) using the two-phase data handshaking scheme can be
seen in the zoomed-in views of the simulations (e.g. ExC 4 in Figure B-9).

Figure B-6 Simulation of test image (LENA.JPG) decoding in a non-pipelined
multi-FPGA JPEG decoder

Figure B-7 Simulation (zoom view) of test image (LENA.JPG) decoding in a
non-pipelined multi-FPGA JPEG decoder

Figure B-8 Simulation of test image (LENA.JPG) decoding in a pipelined
multi-FPGA JPEG decoder (2-device implementation)

Figure B-9 Simulation (zoom view) of test image (LENA.JPG) decoding in a
pipelined multi-FPGA JPEG decoder (2-device implementation)

Figure B-10 Simulation of test image (LENA.JPG) decoding in a pipelined
multi-FPGA JPEG decoder (3-device implementation)

Figure B-11 Simulation (zoom view) of test image (LENA.JPG) decoding in a
pipelined multi-FPGA JPEG decoder (3-device implementation)

Figure B-12 Simulation of test image (LENA.JPG) decoding in a pipelined
multi-FPGA JPEG decoder (6-device implementation)

Figure B-13 Simulation (zoom view) of test image (LENA.JPG) decoding in a
pipelined multi-FPGA JPEG decoder (6-device implementation)

B.4 Hardware demonstrator development board pin assignments

The multi-FPGA JPEG decoder hardware demonstrator is targeted onto three Digilent D2-SB
development boards, with one of the boards connected to the I/O VGA peripheral board,
as shown in the photograph of Figure B-14.

[Photograph: D2-SB development boards 3, 2 and 1, and the I/O VGA peripheral board]

Figure B-14 Multi-FPGA board connections

The following tables give the pin assignments of the three Digilent D2-SB development
boards; connectors that are not available (N/A) for user I/O assignments (e.g. VCC, GND)
or not connected (n/c) are highlighted in grey. Table B-2 lists signals assigned to
connectors A1 and A2, Table B-3 lists signals assigned to connectors B1 and B2, and
signals assigned to connectors C1 and C2 on development board 1 are given in Table B-4.

           Connector A1                  Connector A2
Conn. pin  FPGA pin  Signal              FPGA pin  Signal
1          N/A       GND                 N/A       GND
2          N/A       VU                  N/A       VU
3          N/A       VCC33               N/A       VCC33
4          P112      -                   P162      -
5          P111      vga_hsync_n         P161      SRAMAddr(1)
6          P110      vga_vsync_n         P160      SRAMAddr(0)
7          P109      pin_vga_gray(1)     P152      SRAMAddr(3)
8          P108      pin_vga_gray(0)     P151      SRAMAddr(2)
9          P102      pin_vga_gray(3)     P150      SRAMAddr(5)
10         P101      pin_vga_gray(2)     P149      SRAMAddr(4)
11         P100      pin_vga_gray(5)     P148      SRAMAddr(7)
12         P99       pin_vga_gray(4)     P147      SRAMAddr(6)
13         P98       pin_vga_gray(7)     P146      SRAMAddr(9)
14         P97       pin_vga_gray(6)     P145      SRAMAddr(8)
15         P96       -                   P141      SRAMAddr(11)
16         P95       -                   P140      SRAMAddr(10)
17         P94       -                   P139      SRAMAddr(13)
18         P93       -                   P138      SRAMAddr(12)
19         P89       -                   P136      SRAMAddr(15)
20         P181      -                   P135      SRAMAddr(14)
21         P87       -                   P134      SRAMAddr(17)
22         P180      -                   P133      SRAMAddr(16)
23         P179      SRAMData(12)        P132      SRAMData(1)
24         P178      SRAMData(13)        P129      SRAMData(0)
25         P176      SRAMData(14)        P127      SRAMData(3)
26         P175      SRAMData(15)        P126      SRAMData(2)
27         P174      SRAM_CE             P125      SRAMData(5)
28         P173      SRAM_WE             P123      SRAMData(4)
29         P169      SRAM_LB             P122      SRAMData(7)
30         P168      SRAM_UB             P121      SRAMData(6)
31         P167      SRAM_OE             P120      SRAMData(9)
32         P166      -                   P116      SRAMData(8)
33         P165      RD                  P115      SRAMData(11)
34         P164      TD                  P114      SRAMData(10)
35         P163      pin_vgaclk_25Mhz    P113      -
36         n/c       -                   n/c       -
37         n/c       -                   n/c       -
38         n/c       -                   n/c       -
39         n/c       -                   P80       GCLK0
40         n/c       -                   n/c       -

Table B-2 Pin assignment of signals to connector A1 and A2 of development
board 1

           Connector B1        Connector B2
Conn. pin  FPGA pin  Signal    FPGA pin  Signal
1          N/A       GND       N/A       GND
2          N/A       VU        N/A       VU
3          N/A       VCC33     N/A       VCC33
4          P112      -         P71       -
5          P111      -         P70       -
6          P110      -         P69       -
7          P109      -         P68       -
8          P108      -         P64       -
9          P102      -         P63       -
10         P101      -         P62       Data_Symbol(0)
11         P100      -         P61       Data_Symbol(1)
12         P99       -         P60       Data_Symbol(2)
13         P98       -         P59       Data_Symbol(3)
14         P97       -         P58       Data_Symbol(4)
15         P96       -         P57       Data_Symbol(5)
16         P95       -         P56       Data_Symbol(6)
17         P94       -         P55       Data_Symbol(7)
18         P93       -         P49       JFIF_hs_rdy
19         P89       -         P48       JFIF_hs_rcv
20         P88       -         P47       Symbol_hs_rdy
21         P87       -         P46       Symbol_hs_rcv
22         P86       -         n/c       -
23         P84       -         n/c       -
24         P83       -         n/c       -
25         P82       -         n/c       -
26         P81       -         n/c       -
27         P75       -         n/c       -
28         P74       -         n/c       -
29         P73       -         n/c       -
30         n/c       -         n/c       -
31         n/c       -         n/c       -
32         n/c       -         n/c       -
33         n/c       -         n/c       -
34         n/c       -         n/c       -
35         n/c       -         n/c       -
36         n/c       -         n/c       -
37         n/c       -         n/c       -
38         n/c       -         n/c       -
39         n/c       -         n/c       -
40         n/c       -         n/c       -

Table B-3 Pin assignment of signals to connector B1 and B2 of development
board 1

           Connector C1                  Connector C2
Conn. pin  FPGA pin  Signal              FPGA pin  Signal
1          N/A       GND                 N/A       GND
2          N/A       VU                  N/A       VU
3          N/A       VCC33               N/A       VCC33
4          P112      -                   P23       -
5          P111      -                   P22       decoded_data(5)
6          P110      -                   P21       -
7          P109      -                   P20       decoded_data(6)
8          P108      -                   P18       decoded_data(7)
9          P102      -                   P17       decoded_data(8)
10         P101      -                   P16       decoded_data(9)
11         P99       -                   P15       decoded_data(10)
12         P99       -                   P11       decoded_data(11)
13         P98       -                   P10       decoded_data(12)
14         P97       -                   P9        decoded_data(13)
15         P96       JFIF_eof            P8        decoded_data(14)
16         P95       Data_Symbol(8)      P7        decoded_data(15)
17         P94       Data_Symbol(9)      P6        end_conv
18         P93       Data_Symbol(10)     P5        s_sym_check
19         P89       Data_Symbol(11)     P4        -
20         P45       Data_Symbol(12)     P3        JPEG_start
21         P87       Data_Symbol(13)     P206      -
22         P44       Data_Symbol(14)     P205      -
23         P43       Data_Symbol(15)     P204      -
24         P42       JFIF_info(0)        P203      -
25         P41       JFIF_info(1)        P202      -
26         P40       JFIF_info(2)        P201      -
27         P36       JFIF_info(3)        P200      -
28         P35       -                   P199      -
29         P34       decoded_req         P198      -
30         P33       decoded_ack         P194      -
31         P31       decoded_data(0)     P193      -
32         P30       decoded_data(1)     P192      -
33         P29       decoded_data(2)     P191      -
34         P27       decoded_data(3)     P189      -
35         P24       decoded_data(4)     P188      -
36         n/c       -                   n/c       -
37         n/c       -                   n/c       -
38         n/c       -                   n/c       -
39         n/c       -                   P77       GCLK1
40         n/c       -                   n/c       -

Table B-4 Pin assignment of signals to connector C1 and C2 of development
board 1

Table B-5 lists signals assigned to connectors A1 and A2, Table B-6 lists signals assigned
to connectors B1 and B2, and signals assigned to connectors C1 and C2 on development
board 2 are given in Table B-7.

           Connector A1                  Connector A2
Conn. pin  FPGA pin  Signal              FPGA pin  Signal
1          N/A       GND                 N/A       GND
2          N/A       VU                  N/A       VU
3          N/A       VCC33               N/A       VCC33
4          P112      -                   P162      -
5          P111      -                   P161      decoded_data(4)
6          P110      -                   P160      -
7          P109      -                   P152      decoded_data(2)
8          P108      -                   P151      decoded_data(3)
9          P102      -                   P150      decoded_data(0)
10         P101      -                   P149      decoded_data(1)
11         P100      -                   P148      decoded_req
12         P99       -                   P147      decoded_ack
13         P98       -                   P146      JFIF_info(3)
14         P97       -                   P145      -
15         P96       -                   P141      JFIF_info(1)
16         P95       -                   P140      JFIF_info(2)
17         P94       -                   P139      Data_Symbol(15)
18         P93       -                   P138      JFIF_info(0)
19         P89       -                   P136      Data_Symbol(13)
20         P181      -                   P135      Data_Symbol(14)
21         P87       -                   P134      Data_Symbol(11)
22         P180      JPEG_start          P133      Data_Symbol(12)
23         P179      end_conv            P132      Data_Symbol(9)
24         P178      s_sym_check         P129      Data_Symbol(10)
25         P176      decoded_data(14)    P127      JFIF_eof
26         P175      decoded_data(15)    P126      Data_Symbol(8)
27         P174      decoded_data(12)    P125      -
28         P173      decoded_data(13)    P123      -
29         P169      decoded_data(10)    P122      -
30         P168      decoded_data(11)    P121      -
31         P167      decoded_data(8)     P120      -
32         P166      decoded_data(9)     P116      -
33         P165      decoded_data(6)     P115      -
34         P164      decoded_data(7)     P114      -
35         P163      decoded_data(5)     P113      -
36         n/c       -                   n/c       -
37         n/c       -                   n/c       -
38         n/c       -                   n/c       -
39         n/c       -                   P80       GCLK0
40         n/c       -                   n/c       -

Table B-5 Pin assignment of signals to connector A1 and A2 of development
board 2

           Connector B1        Connector B2
Conn. pin  FPGA pin  Signal    FPGA pin  Signal
1          N/A       GND       N/A       GND
2          N/A       VU        N/A       VU
3          N/A       VCC33     N/A       VCC33
4          P112      -         P71       -
5          P111      -         P70       -
6          P110      -         P69       -
7          P109      -         P68       -
8          P108      -         P64       -
9          P102      -         P63       -
10         P101      -         P62       Data_Symbol(0)
11         P100      -         P61       Data_Symbol(1)
12         P99       -         P60       Data_Symbol(2)
13         P98       -         P59       Data_Symbol(3)
14         P97       -         P58       Data_Symbol(4)
15         P96       -         P57       Data_Symbol(5)
16         P95       -         P56       Data_Symbol(6)
17         P94       -         P55       Data_Symbol(7)
18         P93       -         P49       JFIF_hs_rdy
19         P89       -         P48       JFIF_hs_rcv
20         P88       -         P47       Symbol_hs_rdy
21         P87       -         P46       Symbol_hs_rcv
22         P86       -         n/c       -
23         P84       -         n/c       -
24         P83       -         n/c       -
25         P82       -         n/c       -
26         P81       -         n/c       -
27         P75       -         n/c       -
28         P74       -         n/c       -
29         P73       -         n/c       -
30         n/c       -         n/c       -
31         n/c       -         n/c       -
32         n/c       -         n/c       -
33         n/c       -         n/c       -
34         n/c       -         n/c       -
35         n/c       -         n/c       -
36         n/c       -         n/c       -
37         n/c       -         n/c       -
38         n/c       -         n/c       -
39         n/c       -         n/c       -
40         n/c       -         n/c       -

Table B-6 Pin assignment of signals to connector B1 and B2 of development
board 2

           Connector C1                                 Connector C2
Conn. pin  FPGA pin  Signal                             FPGA pin  Signal
1          N/A       GND                                N/A       GND
2          N/A       VU                                 N/A       VU
3          N/A       VCC33                              N/A       VCC33
4          P112      -                                  P23       -
5          P111      -                                  P22       jpg_core_two_ba2_Data_inout(2)
6          P110      -                                  P21       -
7          P109      jpg_core_two_ba1_Data_req          P20       jpg_core_two_ba2_Data_inout(3)
8          P108      jpg_core_two_ba1_Data_ack          P18       jpg_core_two_ba2_Data_inout(4)
9          P102      jpg_core_two_ba1_Data_inout(0)     P17       jpg_core_two_ba2_Data_inout(5)
10         P101      jpg_core_two_ba1_Data_inout(1)     P16       jpg_core_two_ba2_Data_inout(6)
11         P99       jpg_core_two_ba1_Data_inout(2)     P15       jpg_core_two_ba2_Data_inout(7)
12         P99       jpg_core_two_ba1_Data_inout(3)     P11       jpg_core_two_ba2_Data_inout(8)
13         P98       jpg_core_two_ba1_Data_inout(4)     P10       jpg_core_two_ba2_Data_inout(9)
14         P97       jpg_core_two_ba1_Data_inout(5)     P9        jpg_core_two_ba2_Data_inout(10)
15         P96       jpg_core_two_ba1_Data_inout(6)     P8        jpg_core_two_ba2_Data_inout(11)
16         P95       jpg_core_two_ba1_Data_inout(7)     P7        jpg_core_two_ba2_Data_inout(12)
17         P94       jpg_core_two_ba1_Data_inout(8)     P6        jpg_core_two_ba2_Data_inout(13)
18         P93       jpg_core_two_ba1_Data_inout(9)     P5        jpg_core_two_ba2_Data_inout(14)
19         P89       jpg_core_two_ba1_Data_inout(10)    P4        jpg_core_two_ba2_Data_inout(15)
20         P45       jpg_core_two_ba1_Data_inout(11)    P3        jpg_core_two_ba2_Data_inout(16)
21         P87       jpg_core_two_ba1_Data_inout(12)    P206      jpg_core_two_ba2_Data_inout(17)
22         P44       jpg_core_two_ba1_Data_inout(13)    P205      jpg_core_two_ba2_Data_inout(18)
23         P43       jpg_core_two_ba1_Data_inout(14)    P204      jpg_core_two_ba2_Data_inout(19)
24         P42       jpg_core_two_ba1_Data_inout(15)    P203      jpg_core_two_ba2_Data_inout(20)
25         P41       jpg_core_two_ba1_Data_inout(16)    P202      jpg_core_two_ba2_Data_inout(21)
26         P40       jpg_core_two_ba1_Data_inout(17)    P201      jpg_core_two_ba2_Data_inout(22)
27         P36       jpg_core_two_ba1_Data_inout(18)    P200      jpg_core_two_ba2_Data_inout(23)
28         P35       jpg_core_two_ba1_Data_inout(19)    P199      jpg_core_two_ba2_Data_inout(24)
29         P34       jpg_core_two_ba1_Data_inout(20)    P198      jpg_core_two_ba2_Data_inout(25)
30         P33       jpg_core_two_ba1_txcell_req1(0)    P194      jpg_core_two_ba2_Data_inout(26)
31         P31       jpg_core_two_ba1_txcell_req1(1)    P193      jpg_core_two_ba2_Data_inout(27)
32         P30       jpg_core_two_ba1_txcell_ack1(0)    P192      jpg_core_two_ba2_txcell_req1
33         P29       jpg_core_two_ba1_txcell_ack1(1)    P191      jpg_core_two_ba2_txcell_ack1
34         P27       jpg_core_two_ba2_Data_inout(0)     P189      jpg_core_two_ba2_Data_req
35         P24       jpg_core_two_ba2_Data_inout(1)     P188      jpg_core_two_ba2_Data_ack
36         n/c       -                                  n/c       -
37         n/c       -                                  n/c       -
38         n/c       -                                  n/c       -
39         n/c       -                                  P77       GCLK1
40         n/c       -                                  n/c       -

Table B-7 Pin assignment of signals to connector C1 and C2 of development
board 2

Table B-8 lists signals assigned to connectors A1 and A2, and Table B-9 lists signals
assigned to connectors B1, B2, C1, and C2 on development board 3.

           Connector A1                  Connector A2
Conn. pin  FPGA pin  Signal              FPGA pin  Signal
1          N/A       GND                 N/A       GND
2          N/A       VU                  N/A       VU
3          N/A       VCC33               N/A       VCC33
4          P112      -                   P162      -
5          P111      vga_hsync_n         P161      SRAMAddr(1)
6          P110      vga_vsync_n         P160      SRAMAddr(0)
7          P109      pin_vga_gray(1)     P152      SRAMAddr(3)
8          P108      pin_vga_gray(0)     P151      SRAMAddr(2)
9          P102      pin_vga_gray(3)     P150      SRAMAddr(5)
10         P101      pin_vga_gray(2)     P149      SRAMAddr(4)
11         P100      pin_vga_gray(5)     P148      SRAMAddr(7)
12         P99       pin_vga_gray(4)     P147      SRAMAddr(6)
13         P98       pin_vga_gray(7)     P146      SRAMAddr(9)
14         P97       pin_vga_gray(6)     P145      SRAMAddr(8)
15         P96       -                   P141      SRAMAddr(11)
16         P95       -                   P140      SRAMAddr(10)
17         P94       -                   P139      SRAMAddr(13)
18         P93       -                   P138      SRAMAddr(12)
19         P89       -                   P136      SRAMAddr(15)
20         P181      -                   P135      SRAMAddr(14)
21         P87       -                   P134      SRAMAddr(17)
22         P180      -                   P133      SRAMAddr(16)
23         P179      SRAMData(12)        P132      SRAMData(1)
24         P178      SRAMData(13)        P129      SRAMData(0)
25         P176      SRAMData(14)        P127      SRAMData(3)
26         P175      SRAMData(15)        P126      SRAMData(2)
27         P174      SRAM_CE             P125      SRAMData(5)
28         P173      SRAM_WE             P123      SRAMData(4)
29         P169      SRAM_LB             P122      SRAMData(7)
30         P168      SRAM_UB             P121      SRAMData(6)
31         P167      SRAM_OE             P120      SRAMData(9)
32         P166      -                   P116      SRAMData(8)
33         P165      RD                  P115      SRAMData(11)
34         P164      TD                  P114      SRAMData(10)
35         P163      pin_vgaclk_25Mhz    P113      -
36         n/c       -                   n/c       -
37         n/c       -                   n/c       -
38         n/c       -                   n/c       -
39         n/c       -                   P80       GCLK0
40         n/c       -                   n/c       -

Table B-8 Pin assignment of signals to connector A1 and A2 of development
board 3

           Connector B1       Connector B2       Connector C1       Connector C2
Conn. pin  FPGA pin  Signal   FPGA pin  Signal   FPGA pin  Signal   FPGA pin  Signal
1          N/A       GND      N/A       GND      N/A       GND      N/A       GND
2          N/A       VU       N/A       VU       N/A       VU       N/A       VU
3          N/A       VCC33    N/A       VCC33    N/A       VCC33    N/A       VCC33
4          P112      -        P71       -        P112      -        P23       -
5          P111      -        P70       -        P111      -        P22       -
6          P110      -        P69       -        P110      -        P21       -
7          P109      -        P68       -        P109      -        P20       -
8          P108      -        P64       -        P108      -        P18       -
9          P102      -        P63       -        P102      -        P17       -
10         P101      -        P62       -        P101      -        P16       -
11         P100      -        P61       -        P99       -        P15       -
12         P99       -        P60       -        P99       -        P11       -
13         P98       -        P59       -        P98       -        P10       -
14         P97       -        P58       -        P97       -        P9        -
15         P96       -        P57       -        P96       -        P8        -
16         P95       -        P56       -        P95       -        P7        -
17         P94       -        P55       -        P94       -        P6        -
18         P93       -        P49       -        P93       -        P5        -
19         P89       -        P48       -        P89       -        P4        -
20         P88       -        P47       -        P45       -        P3        -
21         P87       -        P46       -        P87       -        P206      -
22         P86       -        n/c       -        P44       -        P205      -
23         P84       -        n/c       -        P43       -        P204      -
24         P83       -        n/c       -        P42       -        P203      -
25         P82       -        n/c       -        P41       -        P202      -
26         P81       -        n/c       -        P40       -        P201      -
27         P75       -        n/c       -        P36       -        P200      -
28         P74       -        n/c       -        P35       -        P199      -
29         P73       -        n/c       -        P34       -        P198      -
30         n/c       -        n/c       -        P33       -        P194      -
31         n/c       -        n/c       -        P31       -        P193      -
32         n/c       -        n/c       -        P30       -        P192      -
33         n/c       -        n/c       -        P29       -        P191      -
34         n/c       -        n/c       -        P27       -        P189      -
35         n/c       -        n/c       -        P24       -        P188      -
36         n/c       -        n/c       -        n/c       -        n/c       -
37         n/c       -        n/c       -        n/c       -        n/c       -
38         n/c       -        n/c       -        n/c       -        n/c       -
39         n/c       -        n/c       -        n/c       -        P77       GCLK1
40         n/c       -        n/c       -        n/c       -        n/c       -

Table B-9 Pin assignment of signals to connector B1, B2, C1, and C2 of
development board 3

B.5 Circuit description of the BT121 triple 8-bit VideoDAC

The BT121 is a triple 8-bit videoDAC designed specifically for high-performance,
high-resolution colour graphics. The BT121 generates RS-343A-compatible video signals into a
doubly-terminated 75 Ω load, and RS-170-compatible video signals into a singly-
terminated 75 Ω load, without requiring external buffering. Both the differential and
integral linearity errors of the D/A converters are guaranteed to be a maximum of ± 1 LSB
over the full temperature range. The functional block diagram of the BT121 is given in
Figure B-15.

[Block diagram: the CLOCK, R0-R7, G0-G7, B0-B7, SYNC* and BLANK* inputs feed an input
register and three 8-bit DACs with outputs IOR, IOG and IOB; VREF and FS ADJUST connect
to the 1.2 V reference amplifier; supply pins VAA and AGND]

Figure B-15 Functional block diagram of the BT121 videoDAC

As illustrated in the functional block diagram, the BT121 contains three 8-bit D/A
converters, input registers, and a reference amplifier. On the rising edge of CLOCK, 24
bits of colour information (R0-R7, G0-G7, and B0-B7) are latched into the device and
presented to the three 8-bit D/A converters. Latched on the rising edge of CLOCK to
maintain synchronisation with the colour data, the SYNC* and BLANK* inputs add
appropriately weighted currents to the analogue outputs, producing the specific output
levels required for video applications.

The D/A converters on the BT121 use a segmented architecture in which bit currents are
routed to either the outputs or GND by a sophisticated decoding scheme. This architecture
eliminates the need for precision component ratios and greatly reduces the switching
transients associated with turning current sources on and off. Monotonicity and low glitch
are guaranteed by the use of identical current sources and by current steering their outputs.
An on-chip operational amplifier stabilises the full-scale output current against temperature
and power supply variations. The analogue outputs of the BT121 can directly drive a 37.5 Ω
load, such as a doubly-terminated 75 Ω coaxial cable. The pin diagram of the BT121
videoDAC is illustrated in Figure B-16 and the pin descriptions are given in Table B-10.
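
The 37.5 Ω figure follows from the two 75 Ω terminations of a doubly-terminated cable appearing in parallel at the DAC output. The short sketch below checks the arithmetic with exact fractions; the helper name `parallel` is hypothetical.

```python
from fractions import Fraction


def parallel(*resistances):
    """Equivalent resistance of resistors connected in parallel."""
    return 1 / sum(Fraction(1, r) for r in resistances)


# Source termination and load termination, both 75 ohms.
doubly_terminated = parallel(75, 75)
singly_terminated = parallel(75)
```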

[Pin diagram: 44-pin package. Recoverable pin labels: R2-R0 on pins 1-3, GND on pins 4
and 5, SYNC* on pin 6, CLOCK on pin 18, B7-B0 on pins 19-26, GND on pins 27 and 28,
R7-R3 on pins 40-44]

Figure B-16 Pin diagram of the BT121 videoDAC

Pin name         Description

BLANK*           Composite blank control input (TTL compatible). A logical zero
                 drives the IOR, IOG, and IOB outputs to the blanking level. BLANK*
                 is latched on the rising edge of CLOCK. When BLANK* is a logical
                 zero, the R0-R7, G0-G7, and B0-B7 inputs are ignored.
SYNC*            Composite sync control input (TTL compatible). SYNC* does not
                 override any other control or data input. SYNC* should be asserted
                 only during the blanking interval. It is latched on the rising edge of
                 CLOCK.
R0-R7, G0-G7,    Red, green, and blue data inputs (TTL compatible). R0, G0, and B0
B0-B7            are the least-significant data bits. They are latched on the rising
                 edge of CLOCK. Coding is binary.
CLOCK            Clock input (TTL compatible). The rising edge of CLOCK latches the
                 R0-R7, G0-G7, B0-B7, SYNC*, and BLANK* inputs. It is typically the
                 pixel clock rate of the video system. It is recommended that the
                 CLOCK input be driven by a dedicated TTL buffer to avoid reflection-
                 induced jitter.
IOR, IOG, IOB    Red, green, and blue current outputs. These high-impedance current
                 sources can directly drive a doubly-terminated 75 Ω coaxial cable.
                 All outputs, whether used or not, should have a common output
                 load.
FS ADJUST        Full-scale adjust control. A resistor (RSET) connected between this
                 pin and GND controls the magnitude of the full-scale video signal.
COMP             Compensation pin. This pin provides compensation for the internal
                 reference amplifier. A 0.1 µF ceramic capacitor in series with a
                 resistor should be connected between this pin and the nearest VAA
                 pin (see Figure B-16) for optimum settling time. Connecting the
                 capacitor to VAA rather than to GND provides the highest possible
                 power supply noise rejection. The COMP resistor and capacitor
                 must be as close to the device as possible to keep lead lengths to
                 an absolute minimum.
VREF             Voltage reference input. The internal voltage reference is used and
                 this pin is only connected to a 0.1 µF ceramic capacitor that
                 decouples this input to GND.
GND              Analogue ground. All GND pins must be connected together on the
                 same PCB plane to prevent latchup.
VAA              Analogue power. All VAA pins must be connected on the same PCB
                 plane to prevent latchup.

Table B-10 Pin descriptions of the BT121

The typical connection diagram using the internal voltage reference is shown in
Figure B-17, and the parts are listed in Table B-11.

Figure B-17 Typical connection diagram with internal voltage reference


Location          Description                      Vendor part number

C1                33 µF tantalum capacitor         Mallory CSR13F336KM
C2, C3, C4, C5    0.1 µF ceramic capacitor         Erie RPE112Z5U104M50V
C6                10 µF capacitor                  Mallory CSR13G106KM
L1                Ferrite bead                     Fair-Rite 2743001111
R1, R2, R3        75 Ω 1% metal film resistor      Dale CMF-55C
R4                15 Ω 1% metal film resistor      Dale CMF-55C
RSET              143 Ω 1% metal film resistor     Dale CMF-55C

Note: The vendor numbers above are listed only as a guide. Substitution of devices
with similar characteristics will not affect the performance of the BT121.

Table B-11 Typical connection parts list


B.6 Digilent D2-SB system board reference manual

Overview

The Digilent D2-SB circuit board provides a complete circuit development
platform centered on a Xilinx Spartan 2E FPGA. D2-SB features include:
• A Xilinx XC2S200E-200 FPGA with 200K gates and 350MHz operation;
• 143 user I/Os routed to six standard 40-pin expansion connectors;
• A socket for a JTAG-programmable 18V02 configuration Flash ROM;
• Dual on-board 1.5A power regulators (1.8V and 3.3V);
• An SMD 50MHz oscillator, and a socket for a second oscillator;
• A JTAG programming port;
• A status LED and pushbutton for basic I/O.

The D2-SB has been designed to work seamlessly with all versions of the Xilinx
ISE CAD tools, including the free WebPack tools available from the Xilinx
website. A growing collection of low-cost expansion boards can be used with the
D2-SB to add analog and digital I/O capabilities, as well as various data ports
like Ethernet and USB. The D2-SB board ships with a power supply and
programming cable, so designs can be implemented immediately without the need
for any additional hardware.

D2-SB circuit board block diagram

Functional Description

The Digilab D2-SB provides a minimal system that can be used to rapidly
implement FPGA based circuits, or to gain exposure to Xilinx CAD tools and
Spartan 2E devices. The D2-SB provides only the essential supporting devices
for the Spartan 2E FPGA, including clock sources and power supplies. All
available I/O signals are routed to standard expansion connectors that mate
with 40-pin, 100 mil spaced DIP headers available from any catalog distributor.
A pushbutton and LED are also included for basic I/O. The D2-SB board has been
designed to serve primarily as a host for peripheral boards. Each of the six
expansion connectors provides the unregulated supply voltage (VU), 3.3V, GND,
and 32 FPGA I/O signals. Because there are more connector pins than FPGA pins,
the A1, B1 and C1 connectors share an 18-pin "system bus", and not all pins on
the B expansion connectors are used. JTAG signals are also routed to the A1,
B1, and C1 expansion connectors. This allows peripheral boards to drive the
scan chain, or to be configured along with the Spartan 2E FPGA.
Application-specific peripheral boards can be created to mate with the D2-SB,
or readymade peripheral boards that offer many standard functions can be
obtained from Digilent (see www.digilentinc.com).

JTAG Ports and Device Configuration

The Spartan 2E FPGA and the 18V00 ROM on the D2-SB, and any programmable
devices on peripheral boards attached to the D2-SB can be programmed via the
JTAG port. The JTAG scan chain is routed to the FPGA and ROM on the D2-SB and
then around the board to four connection ports as
shown in the figure below. The primary configuration port (Port 1) uses a
standard 6-pin JTAG header (J7) that can accommodate Digilent's JTAG3 cable (or
cables from Xilinx or other vendors). The other three JTAG programming ports
are available on the A1, B1, and C1 expansion connectors, and these ports are
bi-directional. If no peripheral board is present, a buffer on the D2-SB
removes the expansion connector from the JTAG chain. If a peripheral board with
a JTAG device is attached, the scan chain is driven out the expansion connector
so that any JTAG programmable parts can be configured. If a Digilent port
module is connected to one of the three JTAG-enabled expansion connectors, then
the port module can drive the JTAG chain to program all devices in the scan
chain (port modules include Ethernet, USB, EPP parallel, and serial modules -
see www.digilentinc.com for more information).

The scan chain can be driven from the primary port by powering on the D2-SB,
connecting it to a PC with a JTAG programming cable, and running the
"auto-detect" feature of the configuration software. The configuration software
allows devices in the scan chain to be selectively programmed with any
available configuration file. If no programming ROM is loaded in the ICS socket
(or if ROM is present but is not to be included in the scan chain),
jumper-shunts must be loaded at JP1 and JP2 in the "Bypass ROM" location to
route the JTAG chain around the ROM socket. If an 18V02 (or larger) ROM is
loaded in the ICS socket, it can be included in the scan chain by loading the
JP1 and JP2 jumper-shunts in the "Include ROM" positions. If a programming ROM
is present in the ICS socket, the FPGA will automatically access the ROM for
configuration data if jumper shunts are loaded in all three positions of J8
(M2, M1, and M0). Port modules attached to ports A1, B1, or C1 can drive the
scan chain if a jumper-shunt is installed on the primary JTAG header across the
TDI and TDO pins. In their default state, Digilent port modules will appear as
a JTAG cable to the configuration software. Port modules can disable their JTAG
drivers; if more than one JTAG driver is enabled on the scan chain, programming
may fail.

JTAG signal routing on D2-SB

Power Supplies

The D2-SB board uses two LM317 voltage regulators to produce a 1.8VDC supply
for the Spartan 2E core, and 3.3VDC supply for the I/O ring. Both regulators
have good bypass capacitance, allowing them to supply up to 1.5A of current
with less than 50mV of noise (typical). Power can be supplied from a low-cost
wall transformer supply. The external supply must use a 2.1mm center-positive
connector, and it must produce between 6VDC and 12VDC of unregulated voltage.
The D2-SB uses a four layer PCB, with the inner layers dedicated to VCC and GND
planes. Most of the VCC plane is at 3.3V, with an island under the FPGA at
1.8V. The FPGA and the other ICs on the board all have 0.047uF bypass
capacitors placed as close as possible to each VCC pin. Total board current is
dependent on FPGA configuration, clock frequency, and external connections. In
test circuits with roughly 50K gates routed, a 50MHz clock source, and a single
expansion board attached (the DIO5 board), approximately 200mA +/- 30% of
supply current is drawn from the 1.8V supply, and approximately 200mA +/- 50%
is drawn from the 3.3V supply. These currents are strongly dependent on FPGA
and peripheral board configurations. All FPGA I/O signals use
the VCCO voltage derived from the 3.3V supply. If other VCCO voltages are
required, the regulator output can be modified by changing R12 according to:

VCCO = 1.25(1 + R12/R11).

Refer to the LM317 data sheet and D2-SB schematic for further information.

Oscillators

The D2-SB provides a 50MHz SMD primary oscillator and a socket for a second
oscillator. The primary oscillator is connected to the GCK2 input of the
Spartan 2E (pin 182), and the secondary oscillator is connected to GCK3 (pin
185). Both clock inputs can drive the DLL on the Spartan 2E, allowing for
internal frequencies up to four times higher than the external clock signals.
Any 3.3V oscillator in a half-size DIP package can be loaded into the secondary
oscillator socket.

Pushbutton and LED

A single pushbutton and LED are provided on the board allowing basic status and
control functions to be implemented without a peripheral board. As examples,
the LED can be illuminated from a signal in the FPGA to verify that
configuration has been successful, and the pushbutton can be used to provide a
basic reset function independent of other inputs. The circuits are shown below.

[Figure: pushbutton and LED circuits]

Expansion Connectors

The six expansion connectors labeled A1-A2, B1-B2, and C1-C2 use 2x20
right-angle headers with 100 mil spacing. All six connectors have GND on pin 1,
VU on pin 2, and 3.3V on pin 3. Pins 4-35 route to FPGA I/O signals, and pins
36-40 are reserved for JTAG and/or clock signals. The expansion headers provide
192 signal connections, but the Spartan 2E-PQ208 has only 143 available I/O
signals. Thus, some FPGA signals are routed to more than one connector. In
particular, the lower 18 pins (pins 4-21) of the A1, B1, and C1 connectors are
all connected to the same 18 FPGA pins, and they are designated as the "system
bus" (a unique chip select signal is routed to each connector). Other than
these 18 shared signals, all remaining FPGA signals are routed to individual
expansion connector positions. The lower 18 pins of the A2, B2, and C2
connectors are designated as "peripheral busses", and each of these busses
(named PA, PB, and PC) use 18 unique signals. The 14 upper pins of each
expansion connector (pins 22-35) have been designated as "module busses". The
A1, A2, C1, and C2 connectors each have fully populated module busses (named
MA1, MA2, MC1, and MC2). Insufficient FPGA pins were available to route full
module buses to the B connectors; only the 8 data pins of MB1 are routed, and
no pins are routed to the upper B2 expansion connector (i.e., MB2 is a "no
connect").

System Bus

The "system bus" is a protocol used by certain expansion boards that mimics a
simple 8-bit microprocessor bus. It uses eight data lines, six address lines, a
write-enable (WE) strobe that can be used by the peripheral to latch written
data, an output-enable (OE) strobe that can be used by the peripheral to enable
read data, a chip select, and a clock to enable synchronous transfers.

The diagrams below show signal timings assumed by Digilent to create peripheral
devices. However, any bus and timing models can be used by modifying circuits
in the FPGA and attached peripheral devices.

[Figure: system bus write cycle and read cycle timing diagrams, showing CS, OE,
WE, and DB0-DB7]

Symbol   Parameter                               Time (typ)

ten      Time to enable after CS asserted        10 ns
th       Hold time                               1 ns
tdoe     Time to disable after OE de-asserted    10 ns
teoe     Time to enable after OE asserted        15 ns
tw       Write strobe time                       10 ns
tsu      Data setup time                         5 ns
twd      Write disable time                      0 ns
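The system bus described above can be illustrated with a small behavioural model. This is only a sketch under stated assumptions: the class name `SystemBusPeripheral`, its register-file behaviour, and the method names are hypothetical and are not taken from the Digilent manual; only the signal set (eight data lines, six address lines, WE, OE, and a chip select) comes from the text.

```python
ADDR_BITS = 6   # six address lines (ADR0-ADR5)
DATA_BITS = 8   # eight data lines (DB0-DB7)


class SystemBusPeripheral:
    """Hypothetical register-file peripheral attached to the system bus."""

    def __init__(self):
        # One 8-bit register per address the six address lines can select.
        self.regs = [0] * (1 << ADDR_BITS)

    def write(self, cs, we, addr, data):
        # Written data is latched only when the peripheral is selected
        # and the write-enable strobe is asserted.
        if cs and we:
            self.regs[addr % (1 << ADDR_BITS)] = data % (1 << DATA_BITS)

    def read(self, cs, oe, addr):
        # Read data is enabled onto the bus only when the peripheral is
        # selected and output-enable is asserted; None means "not driven".
        if cs and oe:
            return self.regs[addr % (1 << ADDR_BITS)]
        return None
```

For example, after `p.write(True, True, 5, 0xA5)` a call to `p.read(True, True, 5)` returns `0xA5`, while the same read with the chip select de-asserted returns `None`. The model ignores the timing parameters in the table; it only captures which strobes gate each transfer.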

Expansion Connector Signal Routing

Expansion connector pin locations


Each entry gives the signal name followed by the FPGA pin number.

Pin  A1            A2            B1            B2            C1            C2

 1   GND           GND           GND           GND           GND           GND
 2   VU            VU            VU            VU            VU            VU
 3   VCC33         VCC33         VCC33         VCC33         VCC33         VCC33
 4   ADR0 112      PAIO1 162     ADR0 112      PBIO1 71      ADR0 112      PCIO1 23
 5   DB0 111       PAIO2 161     DB0 111       PBIO2 70      DB0 111       PCIO2 22
 6   ADR1 110      PAIO3 160     ADR1 110      PBIO3 69      ADR1 110      PCIO3 21
 7   DB1 109       PAIO4 152     DB1 109       PBIO4 68      DB1 109       PCIO4 20
 8   ADR2 108      PAIO5 151     ADR2 108      PBIO5 64      ADR2 108      PCIO5 18
 9   DB2 102       PAIO6 150     DB2 102       PBIO6 63      DB2 102       PCIO6 17
10   ADR3 101      PAIO7 149     ADR3 101      PBIO7 62      ADR3 101      PCIO7 16
11   DB3 100       PAIO8 148     DB3 100       PBIO8 61      DB3 100       PCIO8 15
12   ADR4 99       PAIO9 147     ADR4 99       PBIO9 60      ADR4 99       PCIO9 11
13   DB4 98        PAIO10 146    DB4 98        PBIO10 59     DB4 98        PCIO10 10
14   ADR5 97       PAIO11 145    ADR5 97       PBIO11 58     ADR5 97       PCIO11 9
15   DB5 96        PAIO12 141    DB5 96        PBIO12 57     DB5 96        PCIO12 8
16   WE 95         PAIO13 140    WE 95         PBIO13 56     WE 95         PCIO13 7
17   DB6 94        PAIO14 139    DB6 94        PBIO14 55     DB6 94        PCIO14 6
18   OE 93         PAIO15 138    OE 93         PBIO15 49     OE 93         PCIO15 5
19   DB7 89        PAIO16 136    DB7 89        PBIO16 48     DB7 89        PCIO16 4
20   CSA 181       PAIO17 135    CSB 88        PBIO17 47     CSC 45        PCIO17 3
21   LSBCLK 87     PAIO18 134    LSBCLK 87     PBIO18 46     LSBCLK 87     PCIO18 206
22   MA1DB0 180    MA2DB0 133    MB1DB0 86                   MC1DB0 44     MC2DB0 205
23   MA1DB1 179    MA2DB1 132    MB1DB1 84                   MC1DB1 43     MC2DB1 204
24   MA1DB2 178    MA2DB2 129    MB1DB2 83                   MC1DB2 42     MC2DB2 203
25   MA1DB3 176    MA2DB3 127    MB1DB3 82                   MC1DB3 41     MC2DB3 202
26   MA1DB4 175    MA2DB4 126    MB1DB4 81                   MC1DB4 40     MC2DB4 201
27   MA1DB5 174    MA2DB5 125    MB1DB5 75                   MC1DB5 36     MC2DB5 200
28   MA1DB6 173    MA2DB6 123    MB1DB6 74                   MC1DB6 35     MC2DB6 199
29   MA1DB7 169    MA2DB7 122    MB1DB7 73                   MC1DB7 34     MC2DB7 198
30   MA1ASTB 168   MA2ASTB 121                               MC1ASTB 33    MC2ASTB 194
31   MA1DSTB 167   MA2DSTB 120                               MC1DSTB 31    MC2DSTB 193
32   MA1WRT 166    MA2WRT 116                                MC1WRT 30     MC2WRT 192
33   MA1WAIT 165   MA2WAIT 115                               MC1WAIT 29    MC2WAIT 191
34   MA1RST 164    MA2RST 114                                MC1RST 27     MC2RST 189
35   MA1INT 163    MA2INT 113                                MC1INT 24     MC2INT 188
36   JTSELA                      JTSELB                      JTSELC
37   TMS                         TMS                         TMS
38   TCK                         TCK                         TCK
39   TDO           GCLK0 80      TDO                         TDO           GCLK1 77
40   TDI           GND           TDI                         TDI           GND

D2-SB Expansion Connector Pinout
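The pin groupings stated in the Expansion Connectors text (GND, VU, and 3.3V on pins 1-3, bus signals on pins 4-21, module bus on pins 22-35, JTAG and/or clock on pins 36-40) can be captured in a small lookup. This is a minimal sketch; the function name `connector_pin_role` and the role strings are illustrative choices, not names used by Digilent.

```python
def connector_pin_role(pin):
    """Classify a pin (1-40) of a D2-SB 40-pin expansion connector."""
    if not 1 <= pin <= 40:
        raise ValueError("expansion connectors have pins 1 to 40")
    if pin == 1:
        return "GND"
    if pin == 2:
        return "VU (unregulated supply)"
    if pin == 3:
        return "3.3V"
    if pin <= 21:
        # System bus on A1/B1/C1, peripheral bus (PA/PB/PC) on A2/B2/C2.
        return "system or peripheral bus"
    if pin <= 35:
        return "module bus"
    return "JTAG and/or clock"


# Pins 4-35 carry FPGA I/O: 32 signals per connector, 192 over six connectors.
io_pins = [p for p in range(1, 41) if 4 <= p <= 35]
total_signal_connections = 6 * len(io_pins)
```

The last two lines reproduce the manual's figure of 192 signal connections, which exceeds the 143 available FPGA I/Os and is why the system bus pins are shared between the A1, B1, and C1 connectors.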


Pin  Function     Pin  Function     Pin  Function     Pin  Function

  1  GND           53  VCCO         105  VCCO         157  TDO
  2  TMS           54  M2           106  PROG         158  GND
  3  PC-IO17       55  PB-IO14      107  INIT         159  TDI
  4  PC-IO16       56  PB-IO13      108  ADR2         160  PA-IO3
  5  PC-IO15       57  PB-IO12      109  DB1          161  PA-IO2
  6  PC-IO14       58  PB-IO11      110  ADR1         162  PA-IO1
  7  PC-IO13       59  PB-IO10      111  DB0          163  MA1-INT
  8  PC-IO12       60  PB-IO9       112  ADR0         164  MA1-RST
  9  PC-IO11       61  PB-IO8       113  MA2-INT      165  MA1-WAIT
 10  PC-IO10       62  PB-IO7       114  MA2-RST      166  MA1-WRT
 11  PC-IO9        63  PB-IO6       115  MA2-WAIT     167  MA1-DSTB
 12  GND           64  PB-IO5       116  MA2-WRT      168  MA1-ASTB
 13  VCCO          65  GND          117  GND          169  MA1-DB7
 14  VCCINT        66  VCCO         118  VCCO         170  GND
 15  PC-IO8        67  VCCINT       119  VCCINT       171  VCCO
 16  PC-IO7        68  PB-IO4       120  MA2-DSTB     172  VCCINT
 17  PC-IO6        69  PB-IO3       121  MA2-ASTB     173  MA1-DB6
 18  PC-IO5        70  PB-IO2       122  MA2-DB7      174  MA1-DB5
 19  GND           71  PB-IO1       123  MA2-DB6      175  MA1-DB4
 20  PC-IO4        72  GND          124  GND          176  MA1-DB3
 21  PC-IO3        73  MB1-DB7      125  MA2-DB5      177  GND
 22  PC-IO2        74  MB1-DB6      126  MA2-DB4      178  MA1-DB2
 23  PC-IO1        75  MB1-DB5      127  MA2-DB3      179  MA1-DB1
 24  MC1-INT       76  VCCINT       128  VCCINT       180  MA1-DB0
 25  GND           77  GCLK1        129  MA2-DB2      181  CSA
 26  VCCO          78  VCCO         130  VCCO         182  GCLK2
 27  MC1-RST       79  GND          131  GND          183  GND
 28  VCCINT        80  GCLK0        132  MA2-DB1      184  VCCO
 29  MC1-WAIT      81  MB1-DB4      133  MA2-DB0      185  GCLK3
 30  MC1-WRT       82  MB1-DB3      134  PA-IO18      186  VCCINT
 31  MC1-DSTB      83  MB1-DB2      135  PA-IO17      187
 32  GND           84  MB1-DB1      136  PA-IO16      188  MC2-INT
 33  MC1-ASTB      85  GND          137  GND          189  MC2-RST
 34  MC1-DB7       86  MB1-DB0      138  PA-IO15      190  GND
 35  MC1-DB6       87  LSBCLK       139  PA-IO14      191  MC2-WAIT
 36  MC1-DB5       88  CSB          140  PA-IO13      192  MC2-WRT
 37  VCCINT        89  DB7          141  PA-IO12      193  MC2-DSTB
 38  VCCO          90  VCCINT       142  VCCINT       194  MC2-ASTB
 39  GND           91  VCCO         143  VCCO         195  VCCINT
 40  MC1-DB4       92  GND          144  GND          196  VCCO
 41  MC1-DB3       93  OE           145  PA-IO11      197  GND
 42  MC1-DB2       94  DB6          146  PA-IO10      198  MC2-DB7
 43  MC1-DB1       95  WE           147  PA-IO9       199  MC2-DB6
 44  MC1-DB0       96  DB5          148  PA-IO8       200  MC2-DB5
 45  CSC           97  ADR5         149  PA-IO7       201  MC2-DB4
 46  PB-IO18       98  DB4          150  PA-IO6       202  MC2-DB3
 47  PB-IO17       99  ADR4         151  PA-IO5       203  MC2-DB2
 48  PB-IO16      100  DB3          152  PA-IO4       204  MC2-DB1
 49  PB-IO15      101  ADR3         153  DIN          205  MC2-DB0
 50  M1           102  DB2          154  LED          206  PC-IO18
 51  GND          103  GND          155  CCLK         207  TCK
 52  M0           104  DONE         156  VCCO         208  VCCO

FPGA Pin Assignment


B.7 Digilent DIO4 peripheral board reference manual

Overview

The DIO4 circuit board provides a low-cost, ready-made source for many of the
most common I/O devices found in digital systems. It can be attached to a
Digilent system board to create a circuit design platform capable of hosting a
wide array of circuits. DIO4 features include:
• A 4-digit seven segment LED display;
• 8 individual LEDs;
• 4 pushbuttons;
• 8 slide switches;
• 3-bit VGA port;
• PS/2 mouse or keyboard port.

DIO4 circuit board block diagram

Functional Description

The DIO4 can be attached to Digilent system boards to quickly and easily add
several useful I/O devices. The DIO4 draws power from the system board, and
signals from all I/O devices are routed to individual pins on the system board
connectors. These features allow the DIO4 to be incorporated into system-board
circuits with minimal effort.

All devices on the DIO4 use the 3.3V supply from the system board, except for
the PS/2 port which needs a 5VDC supply (the DIO4 contains a 5VDC regulator).
Signals coming from the PS/2 port are routed through level shifting buffers to
protect system boards that do not have 5V tolerant inputs.

Power Supplies

The DIO4 draws power from three pins on the 40-pin connectors: pin 37 supplies
3.3V; pin 39 provides system GND, and pin 40 supplies unregulated voltage (VU).
VU is connected to a 5VDC LDO regulator to produce a 5VDC supply for the PS/2
interface. The 3.3V supply is used to drive all other I/O devices on the board.
The DIO4 consumes 5-10mA from the VU supply, and 10-50mA from the 3.3V supply
(depending on how many LEDs are illuminated).

Seven-Segment LED display

The DIO4 board contains a modular 4-digit, common anode seven-segment LED
display. In a common anode display, the seven anodes of the LEDs forming each
digit are connected to four common circuit nodes (labeled AN1 through AN4 on
the DIO4). Each anode, and therefore each digit, can be independently turned on
and off by driving these signals to a '1' or a '0'. The cathodes of similar
segments on all four displays are also connected together into seven common
circuit nodes labeled CA through CG. Thus, each cathode for all four displays
can be turned on and off independently. This connection scheme creates a
multiplexed display, where driving the anode signals and corresponding cathode
patterns of each digit in a repeating, continuous succession can create a
4-digit display. In order for each of the four digits to appear bright and
continuously illuminated, all four digits should be driven once every 1 to 16ms
(for a refresh frequency of 1 kHz to 60 Hz). For example, in a 60Hz refresh
scheme, each digit would be illuminated for 1/4 of the refresh cycle, or 4ms.
The controller must assure that the correct cathode pattern is present when the
corresponding anode signal is driven. To illustrate the process, if AN1 is
driven high while CB and CC are driven low, then a "1" will be displayed in
digit position 1. Then, if AN2 is driven high while CA, CB and CC are driven
low, then a "7" will be displayed in digit position 2. If AN1 and CB, CC are
driven for 4ms, and then AN2 and CA, CB, CC are driven for 4ms in an endless
succession, the display will show "17" in the first two digits. An example
timing diagram is provided below. When configured with the code shown in the
appendix, the CPLD on the DIO4 board implements a seven-segment controller,
provided a suitable clock (256Hz to 1 kHz) is supplied on the SCLK pin. The
controller accepts four 4-bit binary numbers in two successive registers, and
decodes and displays them.

Digit      Illuminated segment
shown      a  b  c  d  e  f  g

0          1  1  1  1  1  1  0
1          0  1  1  0  0  0  0
2          1  1  0  1  1  0  1
3          1  1  1  1  0  0  1
4          0  1  1  0  0  1  1
5          1  0  1  1  0  1  1
6          1  0  1  1  1  1  1
7          1  1  1  0  0  0  0
8          1  1  1  1  1  1  1
9          1  1  1  1  0  1  1

Seven-segment display detail and cathode patterns to display the decimal digits
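The multiplexing scheme described above can be sketched in software. The following is an illustrative model only, not the CPLD code referred to in the manual: the names `PATTERNS`, `cathode_levels`, and `scan_schedule` are hypothetical, the segment patterns are the standard seven-segment encodings for the decimal digits, and the default 16ms refresh period is one point in the 1 to 16ms range given in the text.

```python
# Segment order a..g; '1' marks an illuminated segment.
PATTERNS = {
    0: "1111110", 1: "0110000", 2: "1101101", 3: "1111001", 4: "0110011",
    5: "1011011", 6: "1011111", 7: "1110000", 8: "1111111", 9: "1111011",
}


def cathode_levels(digit):
    """Logic levels for CA..CG: a lit segment needs its cathode driven low."""
    return {"C" + seg: (0 if lit == "1" else 1)
            for seg, lit in zip("ABCDEFG", PATTERNS[digit])}


def scan_schedule(digits, refresh_ms=16.0):
    """One refresh cycle as (anode to drive high, cathode levels, on-time ms)."""
    slot_ms = refresh_ms / len(digits)
    return [("AN%d" % (i + 1), cathode_levels(d), slot_ms)
            for i, d in enumerate(digits)]
```

Calling `scan_schedule([1, 7, 0, 0])` gives four 4ms slots: AN1 with CB and CC low for the "1", then AN2 with CA, CB, and CC low for the "7", matching the worked example in the text.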

[Figure: display drive circuit. Anodes AN1 to AN4 are connected via transistors
for greater current; cathodes a to g and dp are connected to the Xilinx device
via 100 Ω resistors.]

Discrete LEDs

Eight individual LEDs are provided for circuit outputs. The LED cathodes are
tied to GND via 270-ohm resistors, and the LED anodes are driven from a
74HC373. The '373 allows LED data to be latched on the DIO4, so that the LD#
signals from the system board do not need to be driven continuously (the LD#
signals use connector pins that are used in the "system bus" on some Digilent
boards). If the system bus is not needed, then the LDG signal can be tied high.

[Figure: LED circuit with 74HC373 latch]

Button Inputs

The DIO4 contains 4 N.O. (normally open) pushbuttons. Button outputs are
connected to Vdd via a 4.7K resistor. When the button is pressed, the output is
connected directly to GND. This results in a logic signal that is low only
while the button is actively pressed and high at all other times. The buttons
are debounced with an RC filter and Schmitt trigger inverter as shown in the
figure below. This circuit creates a logic high signal when the button is
pressed. The debounce circuit provides ESD protection and creates a signal with
clean edges, so the BTN# signals can be used as clock signals if desired.

[Figure: pushbutton debounce circuit]

Switch Inputs

The eight slide switches on the DIO4 can be used to generate logic high or
logic low inputs to the attached system board. The switches exhibit about 2ms
of bounce, and no active debouncing circuit is employed. A 4.7K-ohm series
resistor is used for nominal input protection.

[Figure: slide switch circuit]

PS2 Port

The DIO4 board includes a 6-pin mini-DIN connector that can accommodate a PS2
mouse or PS2 keyboard connection. A 5VDC regulator and voltage-mapping buffers
are provided on the board to interface lower voltage system boards with
keyboards and/or mice.

PS2 Pin Definitions

Pin   Function
1     Data
2     Reserved
3     GND
4     Vdd
5     Clock
6     Reserved

PS2 connector hole pattern (bottom-up)

Both the mouse and keyboard use a two-wire serial bus (including clock and
data) to communicate with a host device, and both drive the bus with identical
signal timings. Both use 11-bit words that include a start, stop and odd parity
bit, but the data packets are organized differently, and the keyboard interface
allows bidirectional data transfers (so the host device can illuminate state
LEDs on the keyboard). Bus timings are shown below. The clock and data signals
are only driven when data transfers occur, and otherwise they are held in the
"idle" state at logic '1'. The timings define signal requirements for
mouse-to-host communications and bi-directional keyboard communications.

[Figure: PS2 bus timing diagram showing the clock and data signals, with the
'0' start bit and '1' stop bit]

Symbol   Parameter                   Min     Max

Tck      Clock time                  30 us   50 us
Tsu      Data-to-clock setup time    5 us    25 us
Thld     Clock-to-data hold time     5 us    25 us

Keyboard

The keyboard uses open collector drivers so that either the keyboard or an
attached host device can drive the two-wire bus (if the host device will not
send data to the keyboard, then the host can use simple input-only ports).

PS2-style keyboards use scan codes to communicate key press data (nearly all
keyboards in use today are PS2 style). Each key has a single, unique scan code
that is sent whenever the corresponding key is pressed. If the key is pressed
and held, the scan code will be sent repeatedly once every 100ms or so. When a
key is released, an "F0" key-up code is sent, followed by the scan code of the
released key. If a key can be "shifted" to produce a new character (like a
capital letter), then a shift character is sent in addition to the original
scan code, and the host device must determine which character to use. Some
keys, called extended keys, send an "E0" ahead of the scan code (and they may
send more than one scan code). When an extended key is released, an "E0 F0"
key-up code is sent, followed by the scan code. Scan codes for most keys are
shown in the figure below.

A host device can also send data to the keyboard. Below is a short list of some
often used commands:

ED   Set Num Lock, Caps Lock, and Scroll Lock LEDs. After receiving an "ED",
     the keyboard returns an "FA"; then the host sends a byte to set LED
     status: bit 0 sets Scroll Lock; bit 1 sets Num Lock; and bit 2 sets Caps
     Lock. Bits 3 to 7 are ignored.
EE   Echo. Upon receiving an echo command, the keyboard replies with "EE".
F3   Set scan code repeat rate. The keyboard acknowledges receipt of an "F3"
     by returning an "FA", after which the host sends a second byte to set
     the repeat rate.
FE   Resend. Upon receiving FE, the keyboard resends the last scan code sent.
FF   Reset. Resets the keyboard.
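The key-up ("F0") and extended-key ("E0") prefixes described above suggest a simple host-side state machine. The sketch below is an assumption about how a host might interpret the byte stream, not code from the Digilent manual; the function name `parse_scan_codes` and the event tuple layout are hypothetical.

```python
def parse_scan_codes(stream):
    """Turn a sequence of received bytes into (event, extended, code) tuples.

    event is "press" or "release"; extended is True when the code was
    preceded by the E0 prefix.
    """
    events = []
    extended = False
    release = False
    for byte in stream:
        if byte == 0xE0:          # extended-key prefix
            extended = True
        elif byte == 0xF0:        # key-up prefix
            release = True
        else:                     # the scan code itself
            events.append(("release" if release else "press", extended, byte))
            extended = False
            release = False
    return events
```

For instance, the stream `1C F0 1C` is reported as a press followed by a release of the key with scan code 1C, and `E0 75 E0 F0 75` as a press and release of an extended key with code 75. Keys that send more than one scan code would need extra handling that this sketch omits.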

[Figure: PS2 keyboard scan codes]

The keyboard should send data to the host only when both the data and clock
lines are high (or idle). Since the host is the "bus master", the keyboard
should check to see whether the host is sending data before driving the bus. To
facilitate this, the clock line can be used as a "clear to send" signal. If the
host pulls the clock line low, the keyboard must not send any data until the
clock is released (host-to-keyboard data transmission will not be dealt with
further here).

The keyboard sends data to the host in 11-bit words that contain a '0' start
bit, followed by 8 bits of scan code (LSB first), followed by an odd parity bit
and terminated with a '1' stop bit. The keyboard generates 11 clock transitions
(at around 20 - 30 kHz) when the data is sent, and data is valid on the falling
edge of the clock.

Mouse

The mouse outputs a clock and data signal when it is moved; otherwise, these
signals remain at logic '1'. Each time the mouse is moved, three 11-bit words
are sent from the mouse to the host device. Each of the 11-bit words contains a
'0' start bit, followed by 8 bits of data (LSB first), followed by an odd
parity bit, and terminated with a '1' stop bit. Thus, each data transmission
contains 33 bits, where bits 0, 11, and 22 are '0' start bits, and bits 10, 21,
and 32 are '1' stop bits.

The three 8-bit data fields contain movement data as shown below. Data is valid
at the falling edge of the clock, and the clock frequency is 20 to 30 kHz. The
mouse assumes a relative coordinate system wherein moving the mouse to the
right generates a positive number in the X field, and moving to the left
generates a negative number. Likewise, moving the mouse up generates a positive
number in the Y field, and moving down represents a negative number (the XS and
YS bits in the status byte are the sign bits - a '1' indicates a negative
number). The magnitude of the X and Y numbers represents the rate of mouse
movement - the larger the number, the faster the mouse is moving (the XV and YV
bits in the status byte are movement overflow indicators - a '1' means overflow
has occurred). If the mouse moves continuously, the 33-bit transmissions are
repeated every 50ms or so. The L and R fields in the status byte indicate Left
and Right button presses (a '1' indicates the button is being pressed).
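The 11-bit word format shared by the keyboard and mouse (a '0' start bit, 8 data bits LSB first, an odd parity bit, and a '1' stop bit) can be checked with a short encoder and decoder. This is a minimal sketch; the function names `encode_ps2_frame` and `decode_ps2_frame` are hypothetical, and bits are represented as a Python list in the order they appear on the data line.

```python
def encode_ps2_frame(byte):
    """Build the 11-bit word for one data byte, in transmission order."""
    data = [(byte >> i) & 1 for i in range(8)]      # LSB first
    parity = 1 - (sum(data) % 2)                    # makes the total of ones odd
    return [0] + data + [parity] + [1]


def decode_ps2_frame(bits):
    """Validate framing and parity, then return the data byte."""
    if len(bits) != 11:
        raise ValueError("a PS2 word has exactly 11 bits")
    start, data, parity, stop = bits[0], bits[1:9], bits[9], bits[10]
    if start != 0 or stop != 1:
        raise ValueError("framing error")
    if (sum(data) + parity) % 2 != 1:
        raise ValueError("parity error")
    return sum(bit << i for i, bit in enumerate(data))
```

A host would typically sample one bit on each falling clock edge, collect eleven of them, and pass the list to `decode_ps2_frame`; a raised `ValueError` is one point at which the FE (resend) command described earlier could be issued.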
[Figure: mouse data format, showing the status byte followed by the X and Y
movement bytes, each framed by a start bit, parity bit, and stop bit]
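Once the three data bytes have been extracted from the 33-bit transmission, the movement values can be recovered from the sign bits as described above. The sketch below is hedged: the bit positions used for L, R, XS, YS, XV, and YV follow the conventional PS/2 status-byte layout and are an assumption here, since the figure is not legible in this copy; the function name `decode_mouse_packet` is hypothetical.

```python
def decode_mouse_packet(status, x_byte, y_byte):
    """Interpret one three-byte mouse report.

    Assumed status-byte layout (conventional PS/2, not confirmed by this
    document): bit 0 = L, bit 1 = R, bit 4 = XS, bit 5 = YS,
    bit 6 = XV, bit 7 = YV.
    """
    x_negative = bool(status & 0x10)
    y_negative = bool(status & 0x20)
    return {
        "left": bool(status & 0x01),
        "right": bool(status & 0x02),
        # A set sign bit extends the 8-bit field to a negative 9-bit value.
        "dx": x_byte - 256 if x_negative else x_byte,
        "dy": y_byte - 256 if y_negative else y_byte,
        "x_overflow": bool(status & 0x40),
        "y_overflow": bool(status & 0x80),
    }
```

With these assumptions, a report with XS set and an X byte of 0xFB decodes to a movement of -5 (five counts to the left), while the same byte with XS clear would be read as +251.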

VGA Port

The five standard VGA signals Red (R), Green (G), Blue (B), Horizontal Sync
(HS), and Vertical Sync (VS) are routed directly to the VGA connector. A
270-ohm series resistor is used on each color signal. This resistor forms a
divider with the 75-ohm VGA cable termination, resulting in a signal that
conforms to the VGA specification (i.e., 0V for fully off and 0.7V for fully
on). VGA signal timings are specified, published, copyrighted and sold by the
VESA organization (www.vesa.org).

[Figure: VGA "DB15" connector wiring, with 270-ohm series resistors on the
color signals]
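The divider formed by the 270-ohm series resistor and the 75-ohm cable termination can be checked numerically. The sketch assumes the color signal is driven from a 3.3V logic output, which is consistent with the DIO4 using the 3.3V supply but is not stated explicitly for the VGA pins; the function name `divider_output` is illustrative.

```python
def divider_output(v_in, r_series=270.0, r_term=75.0):
    """Voltage across the termination of a series-resistor divider."""
    return v_in * r_term / (r_series + r_term)


# Assumed 3.3 V logic high on a color signal:
v_full_on = divider_output(3.3)    # 3.3 * 75 / 345, approximately 0.717 V
v_full_off = divider_output(0.0)   # 0 V
```

Under that assumption the fully-on level is about 0.72V, close to the 0.7V quoted for the VGA specification.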

The following VGA system timing information is provided as an example of how a
VGA monitor might be driven in 640 by 480 mode. For more precise information,
or for information on higher VGA frequencies, refer to documents available at
the VESA website (or experiment!).

VGA system timing

CRT-based VGA displays use amplitude modulated, moving electron beams (or
cathode rays) to display information on a phosphor-coated screen. LCD displays
use an array of switches that can impose a voltage across a small amount of
liquid crystal, thereby changing light permittivity through the crystal on a
pixel-by-pixel basis. Although the following description is limited to CRT
displays, LCD displays have evolved to use the same signal timings as CRT
displays (so the "signals" discussion below pertains to both CRTs and LCDs).

CRT displays use electron beams (one for red, one for blue and one for green)
to energize the phosphor that coats the inner side of the display end of a
cathode ray tube (see drawing below). Electron beams emanate from "electron
guns", which are finely pointed, heated cathodes placed in close proximity to a
positively charged annular plate called a "grid". The electrostatic force
imposed by the grid pulls away rays of energized electrons as current flows
into the cathodes. These particle rays are initially accelerated towards the
grid, but they soon fall under the influence of the much larger electrostatic
force that results from the entire phosphor coated display surface of the CRT
being charged to 20kV (or more). The rays are focused to a fine beam as they
pass through the center of the grids, and then they accelerate to impact on the
phosphor coated display surface.

The phosphor surface glows brightly at the impact point, and the phosphor
continues to glow for several hundred microseconds after the beam is removed.
The larger the current fed into the cathode, the brighter the phosphor will
glow. Between the grid and the display surface, the beam passes through the
neck of the CRT where two coils of wire produce orthogonal electromagnetic
fields. Because cathode rays are composed of charged particles (electrons),
they can be deflected by these magnetic fields. Current waveforms are passed
through the coils to produce magnetic fields that interact with the cathode
rays and cause them to traverse the display surface in a "raster" pattern,
horizontally from left to right and vertically from top to bottom. As the
cathode ray moves over the surface of the display, the current sent to the
electron guns can be increased or decreased to change the brightness of the
display at the cathode ray impact point.

Cathode ray tube display system

Information is only displayed when the beam is moving in the "forward"
direction (left to right and top to bottom), and not during the time the beam
is reset back to the left or top edge of the display. Much of the potential
display time is therefore lost in "blanking" periods when the beam is reset and
stabilized to begin a new horizontal or vertical display pass.

The size of the beams, the frequency at which the beam can be traced across the
display, and the frequency at which the electron beam can be modulated
determine the display resolution. Modern VGA displays can accommodate different
resolutions, and a VGA controller circuit dictates the resolution by producing
timing signals to control the raster patterns. The controller must produce
synchronizing pulses at 3.3V (or 5V) to set the frequency at which current
flows through the deflection coils, and it must ensure that video data is
applied to the electron guns at the correct time.

Raster video displays define a number of "rows" that corresponds to the number
of horizontal passes the cathode makes over the display area, and a number of
"columns" that corresponds to an area on each row that is assigned to one
"picture element" or pixel. Typical displays use from 240 to 1200 rows, and
from 320 to 1600 columns. The overall size of a display, and the number of rows
and columns, determines the size of each pixel.

Video data typically comes from a video refresh memory, with one or more bytes
assigned to each pixel location (the DIO4 board uses 3 bits per pixel). The
controller must index into video memory as the beams move across the display,
and retrieve and apply video data to the display at precisely the time the
electron beam is moving across a given pixel.

A VGA controller circuit must generate the HS and VS timing signals and
coordinate the delivery of video data based on the pixel clock. The pixel clock
defines the time available to display 1 pixel of information. The VS signal
defines the "refresh" frequency of the display, or the frequency at which all
information on the display is redrawn.

[Figure: VGA display surface and horizontal timing. 640 pixels are displayed each time the
beam traverses the screen. The diagram relates the horizontal deflection current, the
horizontal display time, the retrace time (no information is displayed during this time),
the total horizontal time, and the horizontal sync signal with its front porch and back
porch.]

The minimum refresh frequency is a function of the display's phosphor and electron beam
intensity, with practical refresh frequencies falling in the 50Hz to 120Hz range. The
number of lines to be displayed at a given refresh frequency defines the horizontal
"retrace" frequency. For a 640-pixel by 480-row display using a 25MHz pixel clock and
60 +/-1Hz refresh, the signal timings shown in the table below can be derived. Timings for
sync pulse width and front and back porch intervals (porch intervals are the pre- and
post-sync pulse times during which information cannot be displayed) are based on
observations taken from VGA displays.
A VGA controller circuit decodes the output of a horizontal-sync counter driven by the
pixel clock to generate HS signal timings. This counter can be used to locate any pixel
location on a given row. Likewise, the output of a vertical-sync counter that increments
with each HS pulse can be used to generate VS signal timings, and this counter can be used
to locate any given row.
These two continually running counters can be used to form an address into video RAM. No
time relationship between the onset of the HS pulse and the onset of the VS pulse is
specified, so the designer can arrange the counters to easily form video RAM addresses, or
to minimize decoding logic for sync pulse generation.

Symbol  Parameter         Vertical sync                  Horizontal sync
                          Time       Clocks   Lines      Time      Clocks
Ts      Sync pulse time   16.7 ms    416800   521        32 us     800
Tdisp   Display time      15.36 ms   384000   480        25.6 us   640
Tpw     VS pulse time     64 us      1600     2          3.84 us   96
Tfp     VS front porch    320 us     8000     10         640 ns    16
Tbp     VS back porch     928 us     23200    29         1.92 us   48

[Timing diagram: one sync period Ts, made up of the display time Tdisp, the front porch
Tfp, the sync pulse Tpw and the back porch Tbp.]
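The entries in the table can be cross-checked from the 25MHz pixel clock alone. The sketch below is illustrative only: the names (`H`, `V`, `sync_levels`) are hypothetical, and it assumes that each counter starts at 0 on the first displayed pixel or row and that the intervals follow in the order display, front porch, sync pulse, back porch. The text states that no fixed relationship between the HS and VS pulses is required, so other arrangements are equally valid.

```python
# Sketch: derive the VGA timings in the table from the clock counts.
# Assumption (not from the thesis): counters start at 0 on the first
# displayed pixel/row; order is display, front porch, pulse, back porch.
PIXEL_CLOCK_HZ = 25_000_000

H = {"disp": 640, "fp": 16, "pw": 96, "bp": 48}   # pixel clocks per line
V = {"disp": 480, "fp": 10, "pw": 2, "bp": 29}    # lines per frame

clocks_per_line = sum(H.values())                  # horizontal Ts in clocks
lines_per_frame = sum(V.values())                  # vertical Ts in lines
line_time_s = clocks_per_line / PIXEL_CLOCK_HZ     # horizontal Ts in seconds
frame_time_s = lines_per_frame * line_time_s       # vertical Ts in seconds
refresh_hz = 1.0 / frame_time_s


def sync_levels(h_count, v_count):
    """Return (hs_active, vs_active, video_on) for the two counter values."""
    hs_start = H["disp"] + H["fp"]
    vs_start = V["disp"] + V["fp"]
    hs_active = hs_start <= h_count < hs_start + H["pw"]
    vs_active = vs_start <= v_count < vs_start + V["pw"]
    video_on = h_count < H["disp"] and v_count < V["disp"]
    return hs_active, vs_active, video_on


if __name__ == "__main__":
    print(clocks_per_line, lines_per_frame)
    print(f"line time  = {line_time_s * 1e6:.1f} us")
    print(f"frame time = {frame_time_s * 1e3:.3f} ms")
    print(f"refresh    = {refresh_hz:.2f} Hz")
```

The counts sum to 800 clocks per line and 521 lines per frame, which is where the 32 us and 16.7 ms sync periods in the table come from.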

Expansion Connectors
Connector pinouts are shown below.
Separately available tables show pass-
through connections for the devices on
the DI04 board when it is attached to
various system boards.

P1   Signal  Dir      P2   Signal  Dir

1    nc               1    nc
2    nc               2    nc
3    nc               3    nc
4    nc               4    nc
5    nc               5    nc
6    nc               6    nc
7    nc               7    nc
8    nc               8    nc
9    nc               9    nc
10   nc               10   nc
11   nc               11   nc
12   nc               12   nc
13   AN3     in       13   VS      in
14   AN4     in       14   HS      in
15   AN1     in       15   GRN     in
16   AN2     in       16   RED     in
17   BTN4    out      17   PS2D    bidi
18   BTN5    out      18   BLU     in
19   nc               19   BTN2    out
20   BTN3    out      20   PS2C    bidi
21   LED8    in       21   DP      in
22   LEDG    in       22   BTN1    out
23   LED7    in       23   CG      in
24   nc               24   SW8     out
25   LED6    in       25   CF      in
26   nc               26   SW7     out
27   LED5    in       27   CE      in
28   nc               28   SW6     out
29   LED4    in       29   CD      in
30   nc               30   SW5     out
31   LED3    in       31   CC      in
32   nc               32   SW4     out
33   LED2    in       33   CB      in
34   nc               34   SW3     out
35   LED1    in       35   CA      in
36   nc               36   SW2     out
37   VCC33            37   VCC33
38   nc               38   SW1
39   GND              39   GND
40   VU               40   VU

DI04 Expansion Connector Pinout


Appendix C

File formats
This appendix explains the format of various data files used within the MOODS synthesis
environment. The first is the ICODE (Intermediate CODE) generated by the VHDL compiler.
Two other data files are used within the multi-FPGA partitioning process. The first is the
partitioning information (.par) file, which provides input information to the partitioning
algorithm. The second, generated by the MOODS synthesis tool, is a module call list (.mcl)
output file listing the call structure in the module call graph.

C.1 ICODE
The ICODE file is a textual representation of the user's design generated by the source
compiler. This input file to the MOODS synthesis system is a language-independent
representation of the original source code, which allows translation from other high-level
languages such as SystemC or Verilog. At present, the MOODS synthesis system only has a
VHDL compiler, which converts a VHDL description into an equivalent ICODE representation.

The rest of this section provides a complete ICODE language grammar in Backus-Naur Form
(BNF). Throughout this grammar, non-italicised entries refer either to other entries, or
base entries. Italics are used to distinguish between different occurrences of a
particular type of entry (e.g. label_name is a "name", width_number a "number"). The base
entries used are:

• string - any combination of ASCII characters not including ICODE delimiters.
  Delimiters may be used if preceded by the escape character, e.g. \" rather than ".

• integer - a binary/decimal/octal/hexadecimal integer number.

• real - floating-point number using the standard C++ formats for real numbers
  (including exponents).

ICODE_description ::=
    { info }
    program_declaration
    { submodule_declaration }
    { component_declaration }

act_list ::=
    label_name { label_name }

actf_list ::=
    ACTF act_list

actt_list ::=
    ACT act_list
    | ACTT act_list

alias_declaration ::=
    ALIAS alias_var_name [ alias_range ] FROM parent_var_name [ var_sub_range ]

component_declaration ::=
    COMPONENT component_name io_list [ info ]

conditional_inst ::=
    conditional_inst_name cond_var actt_list actf_list [ info ]

conditional_inst_name ::=
    IF | IFNOT

constant ::=
    number [ width_number ]

declaration ::=
    io_port_declaration
    | variable_declaration

declaration_part ::=
    { declaration [ info ] }

decode_inst ::=
    DECODE decode_var [ info ]
    { CASE constant actt_list [ info ] }
    ENDCASE

file_info ::=
    ln decimal_integer
    | pos decimal_integer
    | file decimal_integer

filemap_info ::=
    filemap ':' decimal_integer filename_string

general_inst (1) ::=
    general_inst_name io_list [ actt_list ] [ info ]

general_inst_name ::=
    NOOP | MOVE | UEXT | SEXT | CONCAT

(1) General instructions are defined in the ICODE instruction database, ICInstDB, and may
be enhanced as required.

index ::=
    decimal_integer

info ::=
    info_specification { %' info_specification }

info_specification ::=
    probability_info
    | iteration_info
    | filemap_info
    | file_info

instruction ::=
    general_inst
    | memory_inst
    | conditional_inst
    | switch_inst
    | protect_inst
    | decode_inst
    | moduleap_inst

instruction_part ::=
    { [ label_name ] instruction }

number ::=
    '%' binary_integer | decimal_integer | octal_integer | hexadecimal_integer

inport_declaration ::=
    INPORT io_port_name io_port_range [ CLOCK | RESET ]

io_list ::=
    term { term }

io_port_declaration ::=
    inport_declaration | outport_declaration

iteration_info ::=
    its decimal_integer

memory_data ::=
    '[' constant { constant } ']'

memory_inst ::=
    memory_read_inst | memory_write_inst

memory_read_inst ::=
    MEMREAD memory_var_name read_var_name [ info ]

memory_write_inst ::=
    MEMWRITE write_term memory_var_name [ info ]

moduleap_inst ::=
    MODULEAP module io_list [ actt_list ] [ info ]

name ::=
    string

outport_declaration ::=
    OUTPORT io_port_name io_port_range [ INIT constant ]

probability_info ::=
    pt | pf ':' real

program_declaration ::=
    PROGRAM program_name io_list [ actt_list ] [ info ]
    declaration_part
    instruction_part
    ENDMODULE [ program_name ] [ info ]

protect_instruction ::=
    PROTECT real [ actt_list ]

ram_declaration ::=
    RAM ram_var_name ADDRESS

range ::=
    msb_index index

register_declaration ::=
    REGISTER var_name var_range [ INIT constant ]

rom_declaration ::=
    ROM rom_var_name data_range ADDRESS address_range DATA memory_data

submodule_declaration ::=
    MODULE module_name io_list [ actt_list ] [ info ]
    declaration_part
    instruction_part
    [ label_name ] ENDMODULE [ module_name ] [ info ]

switch_inst ::=
    SWITCHON switch_var [ info ]
    { CASE constant actt_list [ info ] }
    DEFAULT actt_list [ info ]
    ENDCASE

term ::=
    constant | var

var ::=
    var_name

variable_declaration ::=
    register_declaration
    | alias_declaration
    | ram_declaration
    | rom_declaration

Notes:

• Each entry is considered to occupy one line unless extended using a continuation
  character.

• Comments may be included using the standard C++ delimiter '//'.

• Most instructions are defined in the ICODE database (ICInstDB), which also specifies
  the exact format of their parameter lists.

• CASES in DECODES must be in sequential ascending order with no gaps within the
  sequence. Any missing cases at the start or end of the sequence default to the first
  choice.

• Info entries may contain any form of application-dependent information such as source
  line numbers, variables etc. Syntactically, everything within the braces is ignored
  (although the key entries are identified in the BNF). In MOODS, info records specify
  instruction activation probabilities ("pt", "pf"), loop iterations ("its"), file
  mappings ("filemap") and back annotation information ("file", "ln", "pos").

C.2 Partitioning information (.par) file

The partitioning information (.par) file is an input file to the MOODS synthesis system.
The file format of the partitioning information file is similar to the standard Microsoft
initialisation (.ini) file.

;== Partitioning initialisation file ==
; File to be placed in the design folder
; Note: uses the windows ini file format

[Pre-allocate]
; PROGRAM module
m_call2 = 1
; PROCEDURE PROC1 module
proc1_0_4_4 = 2
; PROCEDURE PROC1 module
proc2_1_4_4 = 3

[Design_Profile]
TIME_STEP= 4
1 18 = 1 1 4 1
18 25 = 1 1 1 1

[Domain_Info]
DOMAIN= 4
dom_1 = 500 20
dom_2 = 400 20
dom_3 = 200 50
dom_4 = 200 30

(Callouts in the figure label the comment lines, a section name, a key name and a key with
multiple values.)

Figure C-1 Partitioning information (.par) file

Section names are enclosed in square brackets and the items under it are related to that
section. The next lines are broken into two parts: the key name and the key value(s).
Multiple values for a key are separated by a space and comments are introduced by a
semicolon character. This input file provides various types of data to the K-way partitioner
and these are grouped under different section headers listed below:

• [Module lock] - Items under this section header are the module name (key name) and the
  domain number (key value) that the module is locked to during K-way partitioning. This
  allows manual assignment of design modules to a fixed domain. This feature is useful in
  assigning modules that need special peripheral devices on a target device PCB (e.g. a
  VGA connector, external memory modules).

• [Pre-allocate] - Items under this section are similar to the ones mentioned above,
  where the key names are module names in the design, but the key values under this
  section header are the initial domains that the modules are assigned to. This forms the
  starting partition of the K-way partitioning algorithm.

• [Design_Profile] - The items under this section header give the design activity
  profiling information. The first key name under this section header is TIME_STEP and
  the key value gives the number of time steps in each profile data. The next lines are
  the profile data and these are made up of the source-destination module node numbers as
  the key names, and the key values are made up of activation count values with a space
  between each time step. The activation count value is the number of times the source
  module calls (or activates) the destination module (e.g. Figure C-1 illustrates a design
  profile with 4 time steps. Module 1 calls module 18 four times in time step 3 and only
  once in time steps 1, 2, and 4.).

• [Domain_Info] - The domain info section contains information on the target devices
  available for the multi-FPGA system. The first key name under this section header is
  DOMAIN and the key value gives the number of devices available. The next lines give the
  available area and I/O resources for each device. The first key value gives the area
  available, and the second key value gives the I/O resources available for device n
  denoted by the key name, dom_n.
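Because the .par file follows the ini conventions described above, it can be read with an off-the-shelf ini parser. The following is a minimal sketch using Python's standard `configparser`, not the reader used inside MOODS; the function name `parse_par` and the returned data layout are assumptions made for illustration, and the sample text repeats the values of Figure C-1.

```python
import configparser

# Sample contents taken from Figure C-1 (section and key names as in the figure).
PAR_TEXT = """\
; Partitioning initialisation file
[Pre-allocate]
m_call2 = 1
proc1_0_4_4 = 2
proc2_1_4_4 = 3

[Design_Profile]
TIME_STEP= 4
1 18 = 1 1 4 1
18 25 = 1 1 1 1

[Domain_Info]
DOMAIN= 4
dom_1 = 500 20
dom_2 = 400 20
dom_3 = 200 50
dom_4 = 200 30
"""


def parse_par(text):
    """Hypothetical helper: return (pre_allocation, profile, domains)."""
    cp = configparser.ConfigParser(interpolation=None)
    cp.optionxform = str          # keep key names such as TIME_STEP case-sensitive
    cp.read_string(text)

    # module name -> initial domain number
    pre_allocation = {name: int(dom) for name, dom in cp["Pre-allocate"].items()}

    # (source node, destination node) -> activation count per time step
    steps = int(cp["Design_Profile"]["TIME_STEP"])
    profile = {}
    for key, value in cp["Design_Profile"].items():
        if key == "TIME_STEP":
            continue
        src, dst = (int(n) for n in key.split())
        counts = [int(c) for c in value.split()]
        if len(counts) != steps:
            raise ValueError(f"profile {key!r} does not have {steps} time steps")
        profile[(src, dst)] = counts

    # domain number -> (area available, I/O resources available)
    domains = {}
    for n in range(1, int(cp["Domain_Info"]["DOMAIN"]) + 1):
        area, io = (int(v) for v in cp["Domain_Info"][f"dom_{n}"].split())
        domains[n] = (area, io)
    return pre_allocation, profile, domains


if __name__ == "__main__":
    pre_allocation, profile, domains = parse_par(PAR_TEXT)
    print(pre_allocation)
    print(profile)
    print(domains)
```

Setting `optionxform` to `str` matters here because `configparser` lower-cases key names by default, which would turn TIME_STEP and DOMAIN into different keys.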

C.3 Module call list (.mcl) file

The module call list (.mcl) file is an output file generated by the MOODS synthesis system
and it lists all the subprogram module calls in a design. The module node numbers of the
source and destination modules are used to identify modules with subprogram calls when
simulating a design to obtain the design activity information using the ModelSim
simulation package.

c:\CAD\Projects\m_call2\m_call2.mcl

MODULE CALL LIST

Mod m_call2(prog mod) --> Mod proc1_0_4_4
;Call node u11
1 18
Mod proc1_0_4_4 --> Mod proc2_1_4_4
;Call node u23
18 25

(Callouts in the figure label the module call list filename, the source and destination
module names, the control call node number, and the source and destination module
numbers.)

Figure C-2 Module call list (.mcl) file
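As an illustration of how the call list can be consumed, the sketch below extracts (source name, destination name, source node, destination node) tuples from text laid out like Figure C-2. It is not part of MOODS: the line layout is taken from the figure as far as it is legible, and the regular expressions and the function name `parse_mcl` are assumptions.

```python
import re

# Sample text following the layout of Figure C-2.
MCL_TEXT = """\
MODULE CALL LIST
Mod m_call2(prog mod) --> Mod proc1_0_4_4
;Call node u11
1 18
Mod proc1_0_4_4 --> Mod proc2_1_4_4
;Call node u23
18 25
"""

# "Mod <source>[(annotation)] --> Mod <destination>"
CALL_RE = re.compile(r"^Mod\s+(\S+?)(?:\(.*?\))?\s*-+>\s*Mod\s+(\S+)$")
# "<source node number> <destination node number>"
NODES_RE = re.compile(r"^(\d+)\s+(\d+)$")


def parse_mcl(text):
    """Hypothetical helper: list of (src_name, dst_name, src_node, dst_node)."""
    calls = []
    pending = None                      # call line waiting for its node numbers
    for raw in text.splitlines():
        line = raw.strip()
        call = CALL_RE.match(line)
        if call:
            pending = (call.group(1), call.group(2))
            continue
        nodes = NODES_RE.match(line)
        if nodes and pending is not None:
            calls.append((pending[0], pending[1],
                          int(nodes.group(1)), int(nodes.group(2))))
            pending = None
    return calls


if __name__ == "__main__":
    for entry in parse_mcl(MCL_TEXT):
        print(entry)
```

The node-number pairs recovered this way are the same pairs that appear as key names in the [Design_Profile] section of the .par file.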


Appendix D

VHDL code listings

This appendix gives a complete listing of all the example VHDL designs used in the
experiments conducted in Chapter 6. The VHDL code for the hardware demonstrator has been
omitted from this appendix due to its size (the behavioural VHDL of the JPEG decoder is
approximately 2000 lines of code).

This appendix gives some background on the complexity and implementation methods of the
example VHDL designs. Post-MOODS synthesis simulation results of the multi-FPGA
implementations are included for all the example designs.

D.1 Behavioural VHDL example designs

The five behavioural VHDL examples given in this section are used in experiments
(without explicit communication channels) described in Section 6.2. All the VHDL
packages which contain the definitions of constants, types, signals, functions, and
procedures are also included.

D.1.1 Quadratic equation solver

The design solves quadratic equations using the formula of Equation D.1. The 32-bit,
fixed-point quadratic equation solver example given in Figure D-3 uses the integer-maths
library given in Figure D-1 and the quadratic procedure in the VHDL package given in
Figure D-2.

\[ x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \qquad \text{(D.1)} \]

--*************************************
-- Integer-maths library package
--*************************************
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package c_types is
  -- c style integer and unsigned types
  subtype int is signed(31 downto 0);
  subtype uint is unsigned(31 downto 0);

  function to_int(arg: integer) return int;

end c_types;

package body c_types is

  function to_int
  -- moods inline
  ( arg: integer
  ) return int is
  begin
    return to_signed(arg,32);
  end to_int;
end c_types;

-- each primary design unit needs its own context clause
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.c_types.all;
package imath is

  -- simple constants
  constant neg: boolean := false;
  constant pos: boolean := true;

  -- constants for acosi function

  constant acos_x0: int := X"00000000";
  constant acos_x1: int := X"00003333";
  constant acos_x2: int := X"00006666";
  constant acos_x3: int := X"00009999";
  constant acos_x4: int := X"0000CCCC";
  constant acos_x5: int := X"0000FFFF";
  constant acos_y0: int := X"00019220";
  constant acos_y1: int := X"00015E94";
  constant acos_y2: int := X"000128C7";
  constant acos_y3: int := X"0000ED63";
  constant acos_y4: int := X"0000CCCD";
  constant acos_y5: int := X"00000000";

  -- constants for cosi function

  constant s2pi: int := X"0006487E";
  constant spi_2: int := X"0001921F";
  constant spi: int := X"0003243F";
  constant s3pi_2: int := X"0004B65F";
  constant cos_x0: int := X"00000000";
  constant cos_x1: int := X"0000506D";
  constant cos_x2: int := X"0000A0D9";
  constant cos_x3: int := X"0000F146";
  constant cos_x4: int := X"000141B3";
  constant cos_x5: int := X"00019220";
  constant cos_y0: int := X"0000FFFF";
  constant cos_y1: int := X"0000F378";
  constant cos_y2: int := X"0000CF1C";
  constant cos_y3: int := X"00009679";
  constant cos_y4: int := X"00004F1B";
  constant cos_y5: int := X"00000000";

  -- integer cubed root function
  function cbrti(a: in int) return int;
  -- integer square root function
  function sqrti(a: in int) return int;
  -- integer arccos function (inputs and outputs scaled by 65536)
  function acosi(a: in int) return int;
  -- integer cosine function (inputs and outputs scaled by 65536)
  function cosi(a: in int) return int;
  -- signed integer division
  function sdivi(a: in int; b: in int) return int;
  -- unsigned integer divide
  function udivi(a: in uint; b: in uint) return uint;
  -- sign test
  function sign(x: in int) return boolean;
  -- to_bool conversion
  function to_bool(a: in std_logic) return boolean;
  -- moods map move u:1 u:1
  -- signed sqi
  function sqi(a: in int) return int;
  -- signed cbi
  function cbi(a: in int) return int;
  -- signed multi
  function multi(a,b: in int) return int;
  -- unsigned sqi
  function sqi(a: in uint) return uint;
  -- unsigned cbi
  function cbi(a: in uint) return uint;
  -- unsigned multi
  function multi(a,b: in uint) return uint;
end imath;

package body imath is

  -- integer cubed root function

  function cbrti
  ( a: in int
  ) return int is
    variable mask: int := X"00000400";
    variable best: int := X"00000000";
    variable sb: boolean;
    variable a_int: int;
  begin
    -- a simple test for basic solutions
    if(a=0 or a=-1 or a=1) then return a; end if;
    if(a<0) then
      sb := neg;
      a_int := -a;
    else
      sb := pos;
      a_int := a;
    end if;

    -- try each result bit in turn, most significant first
    while (mask /= 0) loop
      if (cbi(best+mask) <= a_int) then
        best := best or mask;
      end if;
      mask := mask srl 1;
    end loop;

    if(not sb) then
      best := -best;
    end if;

    return best;
  end cbrti;

  -- integer square root function

  function sqrti
  ( a: in int
  ) return int is
    variable mask: int := X"00008000";
    variable best: int := X"00000000";
    variable sb: boolean;
    variable a_int: int;
  begin
    if (a <= 0) then return best; end if;

    -- try each result bit in turn, most significant first
    while(mask /= 0) loop
      if (((best+mask)*(best+mask)) <= a) then
        best := best or mask;
      end if;
      mask := mask srl 1;
    end loop;

    return best;
  end sqrti;

  -- integer arccos function (inputs and outputs scaled by 65536)

  function acosi
  ( a: in int
  ) return int is
    variable sb: boolean;
    variable a_int: int;
    variable x0,x1,y0,y1,y0b,y1b: int;
    variable result: int;
  begin
    if(a<0) then
      a_int := -a;
      sb := neg;
    else
      a_int := a;
      sb := pos;
    end if;

    -- select the segment of the piecewise-linear table
    if (a_int<acos_x1) then
      x0 := acos_x0;
      x1 := acos_x1;
      y0 := acos_y0;
      y1 := acos_y1;
    elsif (a_int<acos_x2) then
      x0 := acos_x1;
      x1 := acos_x2;
      y0 := acos_y1;
      y1 := acos_y2;
    elsif (a_int<acos_x3) then
      x0 := acos_x2;
      x1 := acos_x3;
      y0 := acos_y2;
      y1 := acos_y3;
    elsif (a_int<acos_x4) then
      x0 := acos_x3;
      x1 := acos_x4;
      y0 := acos_y3;
      y1 := acos_y4;
    else
      x0 := acos_x4;
      x1 := acos_x5;
      y0 := acos_y4;
      y1 := acos_y5;
    end if;

    -- linear interpolation within the selected segment
    y0b := shift_left(y0,8);
    y1b := shift_left(y1,8);
    result := shift_right(y0b + multi(sdivi(y1b-y0b,x1-x0),(a_int-x0)),8);

    if(sb=neg) then
      result := X"0003242F" - result;
    end if;

    return result;
  end acosi;

  -- integer cosine function (inputs and outputs scaled by 65536)

  function cosi
  ( a: in int
  ) return int is
    variable sb: boolean;
    variable a_int: int;
    variable temp: int;
    variable x0,x1,y0,y1,y0b,y1b: int;
    variable result: int;
  begin
    if (a<0) then
      a_int := -a;
    else
      a_int := a;
    end if;

    -- reduce the angle to the range 0 to 2*pi
    if(a_int > s2pi) then
      temp := signed(udivi(unsigned(a_int),unsigned(s2pi)));
      a_int := a_int - multi(temp,s2pi);
    end if;

    -- fold the angle into the first quadrant and record the sign
    if(a_int<spi_2) then
      sb := pos;
    elsif(a_int<spi) then
      a_int := spi - a_int;
      sb := neg;
    elsif(a_int<s3pi_2) then
      a_int := a_int - spi;
      sb := neg;
    else
      a_int := s2pi - a_int;
      sb := pos;
    end if;

    if(a_int < cos_x1) then
      x0 := cos_x0;
      x1 := cos_x1;
      y0 := cos_y0;
      y1 := cos_y1;
    elsif(a_int < cos_x2) then
      x0 := cos_x1;
      x1 := cos_x2;
      y0 := cos_y1;
      y1 := cos_y2;
    elsif(a_int < cos_x3) then
      x0 := cos_x2;
      x1 := cos_x3;
      y0 := cos_y2;
      y1 := cos_y3;
    elsif(a_int < cos_x4) then
      x0 := cos_x3;
      x1 := cos_x4;
      y0 := cos_y3;
      y1 := cos_y4;

    else
      x0 := cos_x4;
      x1 := cos_x5;
      y0 := cos_y4;
      y1 := cos_y5;
    end if;

    y0b := shift_left(y0,8);
    y1b := shift_left(y1,8);
    result := shift_right(y0b + multi(sdivi(y1b-y0b,x1-x0),(a_int-x0)),8);

    if(sb=neg) then
      result := -result;
    end if;

    return result;
  end cosi;

  -- signed integer division

  function sdivi
  ( a: in int;
    b: in int
  ) return int is
    variable sa,sb: boolean;
    variable ua,ub: int;
    variable temp: int;
  begin
    sa := sign(a);
    sb := sign(b);

    if(sa=pos) then
      ua := a;
    else ua := -a;
    end if;

    if(sb=pos) then
      ub := b;
    else ub := -b;
    end if;

    -- divide the magnitudes, then restore the sign
    temp := signed(udivi(unsigned(ua),unsigned(ub)));

    if(sa=sb) then
      return temp;
    else return -temp;
    end if;
  end sdivi;

  -- unsigned integer divide

  function udivi
  ( a: in uint;
    b: in uint
  ) return uint is
    variable mask: uint := X"40000000";
    variable best: uint := X"00000000";
  begin
    -- try each quotient bit in turn, most significant first
    while(mask/=0) loop
      if((best+mask)*b <= a) then
        best := best or mask;
      end if;
      mask := mask srl 1;
    end loop;
    return best;
  end udivi;

  -- sign test
  function sign
  -- moods inline
  ( x: in int
  ) return boolean is
  begin
    return not to_bool(x(31));
  end sign;

  -- to_bool conversion
  function to_bool
  -- moods map move u%1 u%1
  ( a: in std_logic
  ) return boolean is
  begin
    if(a='1') then return true;
    else return false;
    end if;
  end to_bool;

  -- signed square (lower 32 bits of the product)
  function sqi
  ( a: in int
  ) return int is
    variable rl: signed(63 downto 0);
  begin
    rl := a*a;
    return rl(31 downto 0);
  end sqi;

  -- signed cube (lower 32 bits of the product)
  function cbi
  ( a: in int
  ) return int is
    variable rl: signed(95 downto 0);
  begin
    rl := a*a*a;
    return rl(31 downto 0);
  end cbi;

  -- signed multiply (lower 32 bits of the product)
  function multi
  ( a,b: in int
  ) return int is
    variable rl: signed(63 downto 0);
  begin
    rl := a * b;
    return rl(31 downto 0);
  end multi;

  -- unsigned square (lower 32 bits of the product)
  function sqi
  ( a: in uint
  ) return uint is
    variable rl: unsigned(63 downto 0);
  begin
    rl := a*a;
    return rl(31 downto 0);
  end sqi;

  -- unsigned cube (lower 32 bits of the product)
  function cbi
  ( a: in uint
  ) return uint is
    variable rl: unsigned(95 downto 0);
  begin
    rl := a*a*a;
    return rl(31 downto 0);
  end cbi;

  -- unsigned multiply (lower 32 bits of the product)
  function multi
  ( a,b: in uint
  ) return uint is
    variable rl: unsigned(63 downto 0);
  begin
    rl := a * b;
    return rl(31 downto 0);
  end multi;
end imath;

Figure D-1 Integer-maths library package of quadratic and cubic equation solvers
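The sqrti and udivi functions in Figure D-1 both build their result one bit at a time, from the most significant candidate bit downwards, keeping a bit only if the trial value does not overshoot. The Python sketch below is a software model of that loop for checking purposes, not part of the thesis code. It uses unbounded Python integers, so it does not model the 32-bit wrap-around of the VHDL types.

```python
import math


def sqrti(a: int) -> int:
    """Model of the VHDL sqrti: largest r with r*r <= a, found bit by bit."""
    if a <= 0:
        return 0
    mask, best = 0x8000, 0          # same starting mask as the VHDL listing
    while mask != 0:
        trial = best + mask
        if trial * trial <= a:      # keep the bit if the square still fits
            best |= mask
        mask >>= 1
    return best


def udivi(a: int, b: int) -> int:
    """Model of the VHDL udivi: largest q with q*b <= a, found bit by bit."""
    mask, best = 0x40000000, 0      # same starting mask as the VHDL listing
    while mask != 0:
        if (best + mask) * b <= a:  # keep the bit if the product still fits
            best |= mask
        mask >>= 1
    return best


if __name__ == "__main__":
    print(sqrti(25), sqrti(26), udivi(100, 7))
    # Cross-check the model against Python's exact integer square root.
    print(all(sqrti(n) == math.isqrt(n) for n in range(10000)))
```

Each loop runs once per candidate bit regardless of the operand values, which makes the iteration count fixed: 16 iterations for sqrti and 31 for udivi with the masks shown.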

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.c_types.all;
use work.imath.all;

package algeqn_package is
  procedure quadratici(a,b,c: in int; x1,x2: out int; no_real: out int);
  procedure cubici(a1,a2,a3: in int; x1,x2,x3: out int; no_real: out int);
end algeqn_package;

package body algeqn_package is

  procedure quadratici
  -- moods inline
  (
    a,b,c: in int;
    x1,x2: out int;
    no_real: out int
  ) is
    variable d, rd, a2 : int;
  begin
    -- discriminant b^2 - 4ac and denominator 2a
    d := sqi(b) - multi(multi(to_int(4),a),c);
    a2 := multi(a,to_int(2));

    if(d < 0) then
      no_real := to_int(0);
    else
      rd := sqrti(d);
      x1 := sdivi((-b + rd),a2);
      x2 := sdivi((-b - rd),a2);
      no_real := to_int(2);
    end if;
  end quadratici;

  procedure cubici
  -- moods inline
  (
    a1,a2,a3: in int;
    x1,x2,x3: out int;
    no_real: out int
  ) is
    variable q,r,q3,d,s,a1_3,srd,t_1,t_2,theta3,t1,t2: int;
  begin
    t_1 := multi(to_int(3),a2) - sqi(a1);
    q := sdivi(t_1,to_int(9));
    t_2 := multi(multi(to_int(9),a1),a2) - multi(to_int(27),a3) - multi(to_int(2),cbi(a1));
    r := sdivi(t_2,to_int(54));

    q3 := cbi(q);
    d := q3 + sqi(r);

    if(d=0) then
      s := cbrti(r);
      a1_3 := sdivi(a1,to_int(3));
      x1 := shift_left(s,1) - a1_3;
      t1 := -s - a1_3;
      x2 := t1;
      x3 := t1;
      no_real := to_int(3);
    elsif (d >0) then
      srd := sqrti(d);
      s := cbrti(r+srd);
      t1 := cbrti(r-srd);
      x1 := s+t1-sdivi(a1,to_int(3));
      no_real := to_int(1);
    else
      theta3 := sdivi(acosi(sdivi(shift_left(r,16),sqrti(-q3))),to_int(3));
      t1 := sdivi(a1,to_int(3));
      t2 := shift_left(sqrti(-q),1);
      x1 := shift_right(multi(t2,cosi(theta3)),16)-t1;
      x2 := shift_right(multi(t2,cosi(theta3+X"00021828")),16)-t1;
      x3 := shift_right(multi(t2,cosi(theta3+X"00043050")),16)-t1;
      no_real := to_int(3);
    end if;
  end cubici;
end algeqn_package;

Figure D-2 VHDL package of quadratic and cubic equation solvers

--*************************************
-- Quadratic equation solver
--*************************************

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.c_types.all;
use work.algeqn_package.all;

entity eq_solver is
  port(
    a1,a2,a3: in int;
    x1,x2: out int;
    no_real: out int
  );
end eq_solver;
architecture behaviour of eq_solver is
begin
  process is
    variable b1: int := X"00000000";
    variable b2: int := X"00000000";
    variable b3: int := X"00000000";
    variable y1: int := X"00000000";
    variable y2: int := X"00000000";
    variable vreal: int := X"00000000";
  begin
    b1 := a1;
    b2 := a2;
    b3 := a3;
    quadratici(b1,b2,b3,y1,y2,vreal);
    x1 <= y1;
    x2 <= y2;
    no_real <= vreal;
    wait for 40 ns;
  end process;
end behaviour;

Figure D-3 VHDL of quadratic equation solver example

Figure D-4 shows the post-MOODS synthesis simulation of the non-pipelined multi-FPGA
quadratic equation solver. This two-device implementation has a single subprogram
communication channel (SpC 1). Integer inputs a1, a2, and a3 of the quadratic equation
solver are given values 1, -25 and 150 respectively. Outputs x1, x2 and the number of real
solutions (no_real) are updated after 9100 ns. With a system clock period of 40 ns, the
non-pipelined multi-FPGA quadratic equation solver takes 224 clock cycles (i.e. clock
cycles = (9100 ns - 140 ns) / 40 ns) to complete the application and output the result.
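The figures quoted above can be reproduced with a short software model. The sketch below is illustrative only: it follows the arithmetic of the quadratici procedure using unbounded Python integers, with a hypothetical helper `sdiv` standing in for the sign-magnitude division of sdivi, and it does not model the 32-bit fixed-point types or the synthesised hardware.

```python
import math


def sdiv(a: int, b: int) -> int:
    """Sign-magnitude division (truncates towards zero), as sdivi does."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b >= 0) else -q


def quadratici(a: int, b: int, c: int):
    """Return (x1, x2, no_real) following the steps of the VHDL procedure."""
    d = b * b - 4 * a * c           # discriminant
    a2 = 2 * a
    if d < 0:
        return None, None, 0        # no real solutions
    rd = math.isqrt(d)              # integer square root of the discriminant
    return sdiv(-b + rd, a2), sdiv(-b - rd, a2), 2


# Inputs used in the simulation described in the text.
x1, x2, no_real = quadratici(1, -25, 150)

# Clock-cycle count from the quoted timestamps and the 40 ns clock period.
clock_cycles = (9100 - 140) // 40

if __name__ == "__main__":
    print(x1, x2, no_real)
    print(clock_cycles)
```

For the inputs 1, -25 and 150 the discriminant is 25, so the model yields the two integer solutions 15 and 10.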
Figure D-4 Simulation of the non-pipelined multi-FPGA quadratic equation solver

D.1.2 Cubic equation solver

The 32-bit, fixed-point cubic equation solver example given in Figure D-5 is capable of
finding real solutions to Equation D.2. It uses the integer-maths library given in
Figure D-1 and the cubic procedure in the VHDL package given in Figure D-2.

\[ x^3 + ax^2 + bx + c = 0 \qquad \text{(D.2)} \]

--*************************************
-- Cubic equation solver
--*************************************

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.c_types.all;
use work.algeqn_package.all;
entity eq_solver is
  port(
    a1,a2,a3: in int;
    x1,x2,x3: out int;
    no_real: out int);
end eq_solver;
architecture behaviour of eq_solver is
begin
  process is
    variable b1,b2,b3,y1,y2,y3: int;
    variable vreal: int;
  begin
    b1 := a1;
    b2 := a2;
    b3 := a3;
    cubici(b1,b2,b3,y1,y2,y3,vreal);
    x1 <= y1;
    x2 <= y2;
    x3 <= y3;
    no_real <= vreal;
    wait for 40 ns;
  end process;
end behaviour;

Figure D-5 VHDL of Cubic equation solver example

Figure D-6 shows the post-MOODS synthesis simulation of the non-pipelined multi-FPGA cubic
equation solver. This 2-device implementation has two subprogram communication channels
(SpC 1 and SpC 2) and the arbitration of these two shared communication channels is
provided by two SpC arbiters. Integer inputs a1, a2, and a3 of the cubic equation solver
are given values -20, -100 and 2000 respectively. Outputs x1, x2, x3 and the number of
real solutions (no_real) are updated after 70900 ns. With a system clock period of 40 ns,
the non-pipelined multi-FPGA cubic equation solver takes 1770 clock cycles (i.e. clock
cycles = (70900 ns - 100 ns) / 40 ns) to complete the application.
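As a cross-check on these inputs, the sketch below searches for integer solutions of the cubic, assuming the monic form x^3 + a1*x^2 + a2*x + a3 = 0 that the cubici procedure in Figure D-2 operates on. It is a brute-force check in Python for illustration, not a model of the fixed-point trigonometric method used in the VHDL, and the search range is an arbitrary choice.

```python
def cubic(x: int, a1: int, a2: int, a3: int) -> int:
    """Evaluate the monic cubic x^3 + a1*x^2 + a2*x + a3."""
    return x ** 3 + a1 * x ** 2 + a2 * x + a3


def integer_roots(a1: int, a2: int, a3: int, limit: int = 100):
    """Return every integer x in [-limit, limit] that satisfies the cubic."""
    return [x for x in range(-limit, limit + 1) if cubic(x, a1, a2, a3) == 0]


# Inputs used in the simulation described in the text.
roots = integer_roots(-20, -100, 2000)

# Clock-cycle count from the quoted timestamps and the 40 ns clock period.
clock_cycles = (70900 - 100) // 40

if __name__ == "__main__":
    print(roots)
    print(clock_cycles)
```

With these coefficients the cubic factorises as (x - 20)(x - 10)(x + 10), so it has three real solutions, which corresponds to the final branch of cubici.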
Figure D-6 Simulation of the non-pipelined multi-FPGA cubic equation solver

D.1.3 Inverse discrete cosine transform

The 2-D IDCT architecture is adapted from [142]. The architecture is made up of a
one-dimensional 8-point IDCT followed by an internal double buffer memory, followed by
another one-dimensional 8-point IDCT. The algorithm used for the calculation of the 2-D
IDCT is based on Equation (D.3).

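\[ XC_{pq} = \sum_{m=0}^{M-1} \sum_{n=0}^{N-1} XN_{mn} \cdot \frac{c(p)\,c(q)}{4} \cdot \cos\left[\frac{(2m+1)\,p\,\pi}{2M}\right] \cos\left[\frac{(2n+1)\,q\,\pi}{2N}\right] \qquad \text{(D.3)} \]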

Equation (D.3) can be separated into the row part and column part as shown in equations
(D.4) and (D.5). The 2-D IDCT is computed by first applying 1-D IDCT on the rows and
then on the columns.

\[ C = K \cdot \cos\left(\frac{(2 \cdot \text{col number} + 1) \cdot \text{row number} \cdot \pi}{2M}\right) \qquad \text{(D.4)} \]

where $K = \sqrt{1/N}$ for row = 0, $K = \sqrt{2/N}$ for row $\neq$ 0.

\[ C = K \cdot \cos\left(\frac{(2 \cdot \text{row number} + 1) \cdot \text{col number} \cdot \pi}{2N}\right) \qquad \text{(D.5)} \]

where $K = \sqrt{1/M}$ for col = 0, $K = \sqrt{2/M}$ for col $\neq$ 0.

The 2-D IDCT behavioural VHDL example is given in Figure D-8 and it uses the VHDL
package in Figure D-7.
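The integer multipliers that appear in the idct1_mult_add procedure (91, 126, 118, 106, ...) can be related to the cosine coefficients of the 1-D transform. The sketch below is a reading of those constants rather than a statement of how they were produced: it assumes the usual 8-point normalisation, K = sqrt(1/8) for the zero-frequency term and K = sqrt(2/8) otherwise, and a fixed-point scale factor of 256, which is inferred here because it reproduces the listed values. The function name `idct_coefficients` is hypothetical.

```python
import math

N = 8          # transform length (8-point IDCT)
SCALE = 256    # assumed fixed-point scale, inferred from the listed constants


def idct_coefficients(index: int):
    """Scaled coefficients multiplying inputs 1..8 for output sample `index`."""
    row = []
    for k in range(N):
        norm = math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)
        coeff = norm * math.cos((2 * index + 1) * k * math.pi / (2 * N))
        row.append(round(SCALE * coeff))
    return row


if __name__ == "__main__":
    for index in range(N):
        print(format(index, "03b"), idct_coefficients(index))
```

Under these assumptions, index "000" gives 91, 126, 118, 106, 91, 71, 49, 25 and index "001" gives 91, 106, 49, -25, -91, -126, -118, -71, matching the multipliers in the corresponding case branches of the package.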

A ************* A **********

- VHDL package
_********** for 2-D Inverse*******
************************** discrete
*****cosine transform
************** *****
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
package idct_package is

procedure idct1_mult_add (
signal Index ; in unsigned(2 downto 0);
signal inia : in signed(11 downto 0);
signal in2a : in signed(11 downto 0);
signal In3a : in signed(11 downto 0);
signal in4a : in signed(11 downto 0);
signal in5a ; in signed(11 downto 0);
signal in6a : in signed(11 downto 0);
signal in7a ; in signed(11 downto 0);
signal in8a : in signed(11 downto 0);
result_a : out signed(21 downto 0));

procedure idct2_mult_add (
signal index : in unsigned(2 downto 0);
signal inib : in signed(10 downto 0);
signal in2b : in signed(10 downto 0);
signal inSb : in signed(10 downto 0);
signal in4b : in signed(10 downto 0);
signal inSb : in signed(10 downto 0);
signal in6b : in signed(10 downto 0);
signal in7b : in signed(10 downto 0);
signal inSb : in signed(10 downto 0);
result_b : out signed(20 downto 0));
end idct_package;

package body idct_package is

procedure idct1_mult_add
(
signal index : in unsigned(2 downto 0);
signal in1a : in signed(11 downto 0);
signal in2a : in signed(11 downto 0);
signal in3a : in signed(11 downto 0);
signal in4a : in signed(11 downto 0);
signal in5a : in signed(11 downto 0);
signal in6a : in signed(11 downto 0);
signal in7a : in signed(11 downto 0);
signal in8a : in signed(11 downto 0);
result_a : out signed(21 downto 0)
) is
variable p1_tmp,p2_tmp,p3_tmp,p4_tmp,p5_tmp,p6_tmp,p7_tmp,p8_tmp : signed(21 downto 0);
begin
p1_tmp := resize(signed(in1a * (91)), 22);
case index is
when "000" =>
p2_tmp := resize(signed(in2a * (126)), 22);
p3_tmp := resize(signed(in3a * (118)), 22);
p4_tmp := resize(signed(in4a * (106)), 22);
p5_tmp := resize(signed(in5a * (91)), 22);
p6_tmp := resize(signed(in6a * (71)), 22);
p7_tmp := resize(signed(in7a * (49)), 22);
p8_tmp := resize(signed(in8a * (25)), 22);

when "001" =>
p2_tmp := resize(signed(in2a * (106)), 22);
p3_tmp := resize(signed(in3a * (49)), 22);
p4_tmp := resize(signed(in4a * (-25)), 22);
p5_tmp := resize(signed(in5a * (-91)), 22);
p6_tmp := resize(signed(in6a * (-126)), 22);
p7_tmp := resize(signed(in7a * (-118)), 22);
p8_tmp := resize(signed(in8a * (-71)), 22);
when "010" =>
p2_tmp := resize(signed(in2a * (71)), 22);
p3_tmp := resize(signed(in3a * (-49)), 22);
p4_tmp := resize(signed(in4a * (-126)), 22);
p5_tmp := resize(signed(in5a * (-91)), 22);
p6_tmp := resize(signed(in6a * (25)), 22);
p7_tmp := resize(signed(in7a * (118)), 22);
p8_tmp := resize(signed(in8a * (106)), 22);
when "011" =>
p2_tmp := resize(signed(in2a * (25)), 22);
p3_tmp := resize(signed(in3a * (-118)), 22);
p4_tmp := resize(signed(in4a * (-71)), 22);
p5_tmp := resize(signed(in5a * (91)), 22);
p6_tmp := resize(signed(in6a * (106)), 22);
p7_tmp := resize(signed(in7a * (-49)), 22);
p8_tmp := resize(signed(in8a * (-126)), 22);
when "100" =>
p2_tmp := resize(signed(in2a * (-25)), 22);
p3_tmp := resize(signed(in3a * (-118)), 22);
p4_tmp := resize(signed(in4a * (71)), 22);
p5_tmp := resize(signed(in5a * (91)), 22);
p6_tmp := resize(signed(in6a * (-106)), 22);
p7_tmp := resize(signed(in7a * (-49)), 22);
p8_tmp := resize(signed(in8a * (126)), 22);
when "101" =>
p2_tmp := resize(signed(in2a * (-71)), 22);
p3_tmp := resize(signed(in3a * (-49)), 22);
p4_tmp := resize(signed(in4a * (126)), 22);
p5_tmp := resize(signed(in5a * (-91)), 22);
p6_tmp := resize(signed(in6a * (-25)), 22);
p7_tmp := resize(signed(in7a * (118)), 22);
p8_tmp := resize(signed(in8a * (-106)), 22);
when "110" =>
p2_tmp := resize(signed(in2a * (-106)), 22);
p3_tmp := resize(signed(in3a * (49)), 22);
p4_tmp := resize(signed(in4a * (25)), 22);
p5_tmp := resize(signed(in5a * (-91)), 22);
p6_tmp := resize(signed(in6a * (126)), 22);
p7_tmp := resize(signed(in7a * (-118)), 22);
p8_tmp := resize(signed(in8a * (71)), 22);
when "111" =>
p2_tmp := resize(signed(in2a * (-126)), 22);
p3_tmp := resize(signed(in3a * (118)), 22);
p4_tmp := resize(signed(in4a * (-106)), 22);
p5_tmp := resize(signed(in5a * (91)), 22);
p6_tmp := resize(signed(in6a * (-71)), 22);
p7_tmp := resize(signed(in7a * (49)), 22);
p8_tmp := resize(signed(in8a * (-25)), 22);
when others => NULL;
end case;

result_a := p1_tmp + p2_tmp + p3_tmp + p4_tmp + p5_tmp + p6_tmp + p7_tmp + p8_tmp;

end procedure idct1_mult_add;

procedure idct2_mult_add
(
signal index : in unsigned(2 downto 0);
signal in1b : in signed(10 downto 0);
signal in2b : in signed(10 downto 0);
signal in3b : in signed(10 downto 0);
signal in4b : in signed(10 downto 0);
signal in5b : in signed(10 downto 0);
signal in6b : in signed(10 downto 0);
signal in7b : in signed(10 downto 0);
signal in8b : in signed(10 downto 0);
result_b : out signed(20 downto 0)
) is
variable p1_tmp,p2_tmp,p3_tmp,p4_tmp,p5_tmp,p6_tmp,p7_tmp,p8_tmp : signed(20 downto 0);
begin
p1_tmp := resize(signed(in1b * (91)), 21);
case index is
when "000" =>
p2_tmp := resize(signed(in2b * (126)), 21);
p3_tmp := resize(signed(in3b * (118)), 21);
p4_tmp := resize(signed(in4b * (106)), 21);
p5_tmp := resize(signed(in5b * (91)), 21);
p6_tmp := resize(signed(in6b * (71)), 21);
p7_tmp := resize(signed(in7b * (49)), 21);
p8_tmp := resize(signed(in8b * (25)), 21);
when "001" =>
p2_tmp := resize(signed(in2b * (106)), 21);
p3_tmp := resize(signed(in3b * (49)), 21);
p4_tmp := resize(signed(in4b * (-25)), 21);
p5_tmp := resize(signed(in5b * (-91)), 21);
p6_tmp := resize(signed(in6b * (-126)), 21);
p7_tmp := resize(signed(in7b * (-118)), 21);
p8_tmp := resize(signed(in8b * (-71)), 21);
when "010" =>
p2_tmp := resize(signed(in2b * (71)), 21);
p3_tmp := resize(signed(in3b * (-49)), 21);
p4_tmp := resize(signed(in4b * (-126)), 21);
p5_tmp := resize(signed(in5b * (-91)), 21);
p6_tmp := resize(signed(in6b * (25)), 21);
p7_tmp := resize(signed(in7b * (118)), 21);
p8_tmp := resize(signed(in8b * (106)), 21);
when "011" =>
p2_tmp := resize(signed(in2b * (25)), 21);
p3_tmp := resize(signed(in3b * (-118)), 21);
p4_tmp := resize(signed(in4b * (-71)), 21);
p5_tmp := resize(signed(in5b * (91)), 21);
p6_tmp := resize(signed(in6b * (106)), 21);
p7_tmp := resize(signed(in7b * (-49)), 21);
p8_tmp := resize(signed(in8b * (-126)), 21);
when "100" =>
p2_tmp := resize(signed(in2b * (-25)), 21);
p3_tmp := resize(signed(in3b * (-118)), 21);
p4_tmp := resize(signed(in4b * (71)), 21);
p5_tmp := resize(signed(in5b * (91)), 21);
p6_tmp := resize(signed(in6b * (-106)), 21);
p7_tmp := resize(signed(in7b * (-49)), 21);
p8_tmp := resize(signed(in8b * (126)), 21);
when "101" =>
p2_tmp := resize(signed(in2b * (-71)), 21);
p3_tmp := resize(signed(in3b * (-49)), 21);
p4_tmp := resize(signed(in4b * (126)), 21);
p5_tmp := resize(signed(in5b * (-91)), 21);
p6_tmp := resize(signed(in6b * (-25)), 21);
p7_tmp := resize(signed(in7b * (118)), 21);
p8_tmp := resize(signed(in8b * (-106)), 21);

when "110" =>

p2_tmp := resize(signed(in2b * (-106)), 21);
p3_tmp := resize(signed(in3b * (49)), 21);
p4_tmp := resize(signed(in4b * (25)), 21);
p5_tmp := resize(signed(in5b * (-91)), 21);
p6_tmp := resize(signed(in6b * (126)), 21);
p7_tmp := resize(signed(in7b * (-118)), 21);
p8_tmp := resize(signed(in8b * (71)), 21);
when "111" =>
p2_tmp := resize(signed(in2b * (-126)), 21);
p3_tmp := resize(signed(in3b * (118)), 21);
p4_tmp := resize(signed(in4b * (-106)), 21);
p5_tmp := resize(signed(in5b * (91)), 21);
p6_tmp := resize(signed(in6b * (-71)), 21);
p7_tmp := resize(signed(in7b * (49)), 21);
p8_tmp := resize(signed(in8b * (-25)), 21);
when others => NULL;
end case;

result_b := p1_tmp + p2_tmp + p3_tmp + p4_tmp + p5_tmp + p6_tmp + p7_tmp + p8_tmp;

end procedure idct2_mult_add;

end idct_package;

Figure D-7 VHDL package for IDCT example

-- ************************************************************
-- 2-D Inverse discrete cosine transform
-- ************************************************************

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use work.idct_package.all;
entity idct is
port(
in_hs_rdy: in unsigned(0 downto 0); -- Handshake ready
in_hs_rcv: buffer unsigned(0 downto 0) := "0"; -- Handshake receive
dct_2d_in: in signed(11 downto 0);
idct_out: out signed(7 downto 0) := (others=>'0'); -- 8 bit output.
out_hs_rdy: buffer unsigned(0 downto 0) := "0"; -- Handshake ready
out_hs_rcv: in unsigned(0 downto 0); -- Handshake receive
sys_clock: in unsigned(0 downto 0);
--moods clock
sys_reset: in unsigned(0 downto 0)
--moods reset
);

end idct;
ARCHITECTURE behaviour of idct is
-- IDCT_1 signals
signal xa0_reg, xa1_reg, xa2_reg, xa3_reg,
xa4_reg, xa5_reg, xa6_reg, xa7_reg: signed(11 downto 0):= (others=>'0');
-- IDCT_2 signals
signal xb0_reg, xb1_reg, xb2_reg, xb3_reg,
xb4_reg, xb5_reg, xb6_reg, xb7_reg: signed(10 downto 0):= (others=>'0');

-- memory section
type RAM_mem_type is array (0 to 63) of signed(10 downto 0);
signal ID_ram1_mem: RAM_mem_type;
--moods ram
signal ID_input_cnt: unsigned(3 downto 0):= "0000";
signal ID_wr_cntr: unsigned(6 downto 0):= "0000000";
signal ID_rd_cntr: unsigned(3 downto 0):= "0000";
-- Handshake signals
signal ID_stage2_rdy: unsigned(0 downto 0):= "0";
signal ID_stage2_rcv: unsigned(0 downto 0):= "0";
signal ID_index_i : unsigned(3 downto 0):= "0000";
signal ID_index_j : unsigned(3 downto 0):= "0000";
begin
-- ************************************************************

ID1: process -- IDCT Process 1

variable z_out_int: signed(21 downto 0) := (others=>'0');
begin
reset_loop: loop

ID_wr_cntr <= "0000000";

in_hs_rcv <= "0";
ID_input_cnt(3 downto 0) <= "0000";
ID_rd_cntr(3 downto 0) <= "0000";
ID_stage2_rdy <= "0";
ID_index_i <= "0000";
wait until sys_clock'event and sys_clock = "1";
exit reset_loop when sys_reset = "1";
main_loop: loop
if(ID_wr_cntr(6) = '0') then

while(ID_input_cnt(3) = '0') loop

while(in_hs_rdy = in_hs_rcv) loop
wait until sys_clock'event and sys_clock = "1";
end loop;

case ID_input_cnt(2 downto 0) is

when "000" => xa0_reg <= dct_2d_in;
when "001" => xa1_reg <= dct_2d_in;
when "010" => xa2_reg <= dct_2d_in;
when "011" => xa3_reg <= dct_2d_in;
when "100" => xa4_reg <= dct_2d_in;
when "101" => xa5_reg <= dct_2d_in;
when "110" => xa6_reg <= dct_2d_in;
when "111" => xa7_reg <= dct_2d_in;
when others => NULL;
end case;

in_hs_rcv <= not in_hs_rcv;

ID_input_cnt(3 downto 0) <= ID_input_cnt(3 downto 0) + "0001";
wait until sys_clock'event and sys_clock = "1";
end loop;

while (ID_index_i /= "1000") loop

idct1_mult_add(ID_index_i(2 downto 0),xa0_reg,xa1_reg,xa2_reg,
xa3_reg,xa4_reg,xa5_reg,xa6_reg,xa7_reg,z_out_int);
if(z_out_int(20) = '0' and z_out_int(7) = '1') then
ID_ram1_mem(to_integer(ID_wr_cntr(5 downto 0))) <= z_out_int(18 downto 8)+to_signed(1,11);
ID_wr_cntr <= ID_wr_cntr + "0000001";
else
ID_ram1_mem(to_integer(ID_wr_cntr(5 downto 0))) <= z_out_int(18 downto 8);
ID_wr_cntr <= ID_wr_cntr + "0000001";
end if;

ID_index_i <= ID_index_i + "0001";

end loop;
ID_index_i <= "0000";
else
while(ID_rd_cntr(3) = '0') loop

-- Semaphore Master
while(ID_stage2_rdy /= ID_stage2_rcv) loop
wait until sys_clock'event and sys_clock = "1";
end loop;

case ID_rd_cntr(2 downto 0) is

when "000" => xb0_reg <= ID_ram1_mem(0);
xb1_reg <= ID_ram1_mem(8);
xb2_reg <= ID_ram1_mem(16);
xb3_reg <= ID_ram1_mem(24);
xb4_reg <= ID_ram1_mem(32);
xb5_reg <= ID_ram1_mem(40);
xb6_reg <= ID_ram1_mem(48);
xb7_reg <= ID_ram1_mem(56);
when "001" => xb0_reg <= ID_ram1_mem(1);
xb1_reg <= ID_ram1_mem(9);
xb2_reg <= ID_ram1_mem(17);
xb3_reg <= ID_ram1_mem(25);
xb4_reg <= ID_ram1_mem(33);
xb5_reg <= ID_ram1_mem(41);
xb6_reg <= ID_ram1_mem(49);
xb7_reg <= ID_ram1_mem(57);
when "010" => xb0_reg <= ID_ram1_mem(2);
xb1_reg <= ID_ram1_mem(10);
xb2_reg <= ID_ram1_mem(18);
xb3_reg <= ID_ram1_mem(26);
xb4_reg <= ID_ram1_mem(34);
xb5_reg <= ID_ram1_mem(42);
xb6_reg <= ID_ram1_mem(50);
xb7_reg <= ID_ram1_mem(58);
when "011" => xb0_reg <= ID_ram1_mem(3);
xb1_reg <= ID_ram1_mem(11);
xb2_reg <= ID_ram1_mem(19);
xb3_reg <= ID_ram1_mem(27);
xb4_reg <= ID_ram1_mem(35);
xb5_reg <= ID_ram1_mem(43);
xb6_reg <= ID_ram1_mem(51);
xb7_reg <= ID_ram1_mem(59);
when "100" => xb0_reg <= ID_ram1_mem(4);
xb1_reg <= ID_ram1_mem(12);
xb2_reg <= ID_ram1_mem(20);
xb3_reg <= ID_ram1_mem(28);
xb4_reg <= ID_ram1_mem(36);
xb5_reg <= ID_ram1_mem(44);
xb6_reg <= ID_ram1_mem(52);
xb7_reg <= ID_ram1_mem(60);
when "101" => xb0_reg <= ID_ram1_mem(5);
xb1_reg <= ID_ram1_mem(13);
xb2_reg <= ID_ram1_mem(21);
xb3_reg <= ID_ram1_mem(29);
xb4_reg <= ID_ram1_mem(37);
xb5_reg <= ID_ram1_mem(45);
xb6_reg <= ID_ram1_mem(53);
xb7_reg <= ID_ram1_mem(61);

when "110" => xb0_reg <= ID_ram1_mem(6);
xb1_reg <= ID_ram1_mem(14);
xb2_reg <= ID_ram1_mem(22);
xb3_reg <= ID_ram1_mem(30);
xb4_reg <= ID_ram1_mem(38);
xb5_reg <= ID_ram1_mem(46);
xb6_reg <= ID_ram1_mem(54);
xb7_reg <= ID_ram1_mem(62);
when "111" => xb0_reg <= ID_ram1_mem(7);
xb1_reg <= ID_ram1_mem(15);
xb2_reg <= ID_ram1_mem(23);
xb3_reg <= ID_ram1_mem(31);
xb4_reg <= ID_ram1_mem(39);
xb5_reg <= ID_ram1_mem(47);
xb6_reg <= ID_ram1_mem(55);
xb7_reg <= ID_ram1_mem(63);
when others => NULL;
end case;

ID_stage2_rdy <= not ID_stage2_rdy;

ID_rd_cntr(3 downto 0) <= ID_rd_cntr(3 downto 0) + "0001";
wait until sys_clock'event and sys_clock = "1";
end loop;
ID_input_cnt(3 downto 0) <= "0000";
ID_wr_cntr(6 downto 0) <= (others=>'0');
ID_rd_cntr(3 downto 0) <= (others=>'0');
end if;
wait until sys_clock'event and sys_clock = "1";
exit reset_loop when sys_reset = "1";
end loop;
end loop;
end process ID1;
-- ************************************************************

ID2: process -- IDCT Process 2

variable idct2d_int: signed(20 downto 0):= (others=>'0');
begin
reset_loop: loop
ID_stage2_rcv <= "0";
out_hs_rdy <= "0";
idct2d_int := (others=>'0');
ID_index_j <= "0000";
wait until sys_clock'event and sys_clock = "1";
exit reset_loop when sys_reset = "1";
main_loop: loop

while(ID_stage2_rdy = ID_stage2_rcv) loop

wait until sys_clock'event and sys_clock = "1";
end loop;

while (ID_index_j /= "1000") loop

idct2_mult_add(ID_index_j(2 downto 0),xb0_reg,xb1_reg,xb2_reg,xb3_reg,
xb4_reg,xb5_reg,xb6_reg,xb7_reg,idct2d_int);
while(out_hs_rdy /= out_hs_rcv) loop
wait until sys_clock'event and sys_clock = "1";
end loop;
idct_out <= signed(idct2d_int(15 downto 8));
out_hs_rdy <= not out_hs_rdy;
ID_index_j <= ID_index_j + "0001";
wait until sys_clock'event and sys_clock = "1";
end loop;

ID_index_j <= "0000";

ID_stage2_rcv <= not ID_stage2_rcv;
wait until sys_clock'event and sys_clock = "1";
exit reset_loop when sys_reset = "1";
end loop;
end loop;
end process ID2;
-- ************************************************************

end behaviour;

Figure D-8 VHDL of IDCT example
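The listing in Figure D-8 transfers data with a pair of toggling flags (`in_hs_rdy`/`in_hs_rcv` and `out_hs_rdy`/`out_hs_rcv`): an item is pending whenever the two flags differ, and the channel is free whenever they are equal. The following is a minimal software model of that reading of the code. The class `ToggleChannel` and its method names are hypothetical and are not part of the thesis or of MOODS; clocking and multi-cycle waits are deliberately left out.

```python
class ToggleChannel:
    """Toggle (two-phase) handshake: an item is pending when rdy != rcv."""

    def __init__(self):
        self.rdy = 0      # toggled by the sender after placing data
        self.rcv = 0      # toggled by the receiver after taking data
        self.data = None

    def can_send(self):
        # Sender side of the listing: wait while rdy /= rcv.
        return self.rdy == self.rcv

    def send(self, value):
        if not self.can_send():
            raise RuntimeError("previous item has not been acknowledged")
        self.data = value
        self.rdy ^= 1

    def can_receive(self):
        # Receiver side of the listing: wait while rdy = rcv.
        return self.rdy != self.rcv

    def receive(self):
        if not self.can_receive():
            raise RuntimeError("no item is pending")
        value = self.data
        self.rcv ^= 1
        return value


if __name__ == "__main__":
    channel = ToggleChannel()
    for word in (0x123, 0x456):
        channel.send(word)
        print(hex(channel.receive()), channel.rdy, channel.rcv)
```

Because only a change of level carries meaning, neither flag has to return to zero between transfers, so each transfer needs one toggle per side.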

The post-MOODS synthesis simulation of the non-pipelined multi-FPGA IDCT is given
in Figure D-9. Zoomed-in views of the simulation, showing the input and output updates,
are given in Figure D-10. The multi-FPGA IDCT has a single subprogram communication
channel (SpC 1) and a single channel arbiter. With a system clock period of 40 ns, the
non-pipelined multi-FPGA IDCT takes 4175 clock cycles (i.e. (167480 ns - 480 ns) / 40
ns) to complete the application.
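The cycle count quoted above follows directly from the two simulation timestamps and the clock period. A small sketch of the arithmetic is given below; `clock_cycles` is a hypothetical helper name, and the timestamps are those quoted in the text.

```python
def clock_cycles(start_ns, end_ns, period_ns):
    """Number of whole clock periods between two simulation timestamps."""
    if period_ns <= 0:
        raise ValueError("clock period must be positive")
    elapsed = end_ns - start_ns
    if elapsed < 0:
        raise ValueError("end time precedes start time")
    cycles, remainder = divmod(elapsed, period_ns)
    if remainder:
        raise ValueError("interval is not a whole number of clock periods")
    return cycles


if __name__ == "__main__":
    # IDCT example: (167480 ns - 480 ns) / 40 ns
    print(clock_cycles(480, 167480, 40))
```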
Figure D-9 Simulation of the non-pipelined multi-FPGA IDCT example

Figure D-10 Simulation (zoomed-in views) of the non-pipelined multi-FPGA IDCT example

D.1.4 Triple-Data Encryption Standard

The triple-data encryption standard core implements the triple data encryption algorithm
(TDEA) in the electronic codebook (ECB) mode [144]. The idea of triple DES is that data
is encrypted three times (i.e. encrypted, decrypted and then encrypted again) using two
different keys. In this case, the two encryptions use the first key and the decryption uses
the second key. The VHDL package of the triple-DES is given in Figure D-11 and the
behavioural VHDL of the triple-data encryption standard (triple-DES) core is given in
Figure D-12.
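The encrypt-decrypt-encrypt ordering described above can be sketched independently of DES itself. In the example below, `toy_encrypt` and `toy_decrypt` are a deliberately trivial, insecure stand-in for the DES core (an XOR followed by a rotation); they are assumptions made purely so that the structure can be executed, and only the two-key composition in `ede2_encrypt` and `ede2_decrypt` reflects the scheme in the text.

```python
MASK64 = (1 << 64) - 1


def _rotl64(value, amount):
    return ((value << amount) | (value >> (64 - amount))) & MASK64


def _rotr64(value, amount):
    return ((value >> amount) | (value << (64 - amount))) & MASK64


def toy_encrypt(block, key):
    """Placeholder for a DES encryption pass (NOT DES, NOT secure)."""
    return _rotl64((block ^ key) & MASK64, 7)


def toy_decrypt(block, key):
    """Inverse of toy_encrypt."""
    return (_rotr64(block & MASK64, 7) ^ key) & MASK64


def ede2_encrypt(block, key1, key2):
    """Encrypt with key1, decrypt with key2, encrypt with key1 again."""
    return toy_encrypt(toy_decrypt(toy_encrypt(block, key1), key2), key1)


def ede2_decrypt(block, key1, key2):
    """Reverse order: decrypt with key1, encrypt with key2, decrypt with key1."""
    return toy_decrypt(toy_encrypt(toy_decrypt(block, key1), key2), key1)


if __name__ == "__main__":
    plain, k1, k2 = 0x0123456789ABCDEF, 0x1111222233334444, 0x5555666677778888
    cipher = ede2_encrypt(plain, k1, k2)
    print(hex(cipher), hex(ede2_decrypt(cipher, k1, k2)))
```

One consequence of the middle stage being a decryption is that setting both keys equal collapses the scheme to a single encryption pass, which the model reproduces.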

-- ************************************************************
-- VHDL package for Triple-DES
-- ************************************************************

library ieee;
use ieee.std_logic_1164.all;
package des_functions is
subtype vec56 is std_logic_vector(1 to 56);
subtype vec64 is std_logic_vector(1 to 64);

-- The key_reduce function reduces a 64-bit key to a 56-bit key by stripping off parity bits
function key_reduce1(key : in vec64) return vec56;
function key_reduce2(key : in vec64) return vec56;

-- The des_core function implements a DES encrypt/decrypt cycle

function des_core(plaintext: vec64; key : vec56; encrypt: std_logic) return vec64;
end;

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
package body des_functions is
subtype vec3 is std_logic_vector(1 to 3);
subtype vec4 is std_logic_vector(1 to 4);
subtype vec6 is std_logic_vector(1 to 6);
subtype vec28 is std_logic_vector(1 to 28);
subtype vec32 is std_logic_vector(1 to 32);
subtype vec48 is std_logic_vector(1 to 48);

function initial_permutation(data : vec64) return vec64 is

begin
return
data(58) & data(50) & data(42) & data(34) & data(26) & data(18) & data(10) & data(2) &
data(60) & data(52) & data(44) & data(36) & data(28) & data(20) & data(12) & data(4) &
data(62) & data(54) & data(46) & data(38) & data(30) & data(22) & data(14) & data(6) &
data(64) & data(56) & data(48) & data(40) & data(32) & data(24) & data(16) & data(8) &
data(57) & data(49) & data(41) & data(33) & data(25) & data(17) & data(9) & data(1) &
data(59) & data(51) & data(43) & data(35) & data(27) & data(19) & data(11) & data(3) &
data(61) & data(53) & data(45) & data(37) & data(29) & data(21) & data(13) & data(5) &
data(63) & data(55) & data(47) & data(39) & data(31) & data(23) & data(15) & data(7);
end;

function final_permutation(data : in vec64) return vec64 is

begin
return
data(40) & data(8) & data(48) & data(16) & data(56) & data(24) & data(64) & data(32) &
data(39) & data(7) & data(47) & data(15) & data(55) & data(23) & data(63) & data(31) &
data(38) & data(6) & data(46) & data(14) & data(54) & data(22) & data(62) & data(30) &
data(37) & data(5) & data(45) & data(13) & data(53) & data(21) & data(61) & data(29) &
data(36) & data(4) & data(44) & data(12) & data(52) & data(20) & data(60) & data(28) &
data(35) & data(3) & data(43) & data(11) & data(51) & data(19) & data(59) & data(27) &
data(34) & data(2) & data(42) & data(10) & data(50) & data(18) & data(58) & data(26) &
data(33) & data(1) & data(41) & data(9) & data(49) & data(17) & data(57) & data(25);
end;

function expand(data : vec32) return vec48 is

begin
return
data(32) & data(1) & data(2) & data(3) & data(4) & data(5) & data(4) & data(5) &
data(6) & data(7) & data(8) & data(9) & data(8) & data(9) & data(10) & data(11) &
data(12) & data(13) & data(12) & data(13) & data(14) & data(15) & data(16) & data(17) &
data(16) & data(17) & data(18) & data(19) & data(20) & data(21) & data(20) & data(21) &
data(22) & data(23) & data(24) & data(25) & data(24) & data(25) & data(26) & data(27) &
data(28) & data(29) & data(28) & data(29) & data(30) & data(31) & data(32) & data(1);
end;

function substitute(data : vec48) return vec32 is

type S_block_type is array(0 to 63) of natural range 0 to 15;
constant S_block0 : S_block_type :=
--moods ROM
(14, 4, 13, 1, 2, 15, 11, 8, 3, 10, 6, 12, 5, 9, 0, 7, 0, 15, 7, 4, 14, 2, 13, 1, 10, 6, 12, 11, 9, 5, 3, 8,
4, 1, 14, 8, 13, 6, 2, 11, 15, 12, 9, 7, 3, 10, 5, 0, 15, 12, 8, 2, 4, 9, 1, 7, 5, 11, 3, 14, 10, 0, 6, 13);
constant S_block1 : S_block_type :=
--moods ROM
(15, 1, 8, 14, 6, 11, 3, 4, 9, 7, 2, 13, 12, 0, 5, 10, 3, 13, 4, 7, 15, 2, 8, 14, 12, 0, 1, 10, 6, 9, 11, 5,
0, 14, 7, 11, 10, 4, 13, 1, 5, 8, 12, 6, 9, 3, 2, 15, 13, 8, 10, 1, 3, 15, 4, 2, 11, 6, 7, 12, 0, 5, 14, 9);
constant S_block2 : S_block_type :=
--moods ROM
(10, 0, 9, 14, 6, 3, 15, 5, 1, 13, 12, 7, 11, 4, 2, 8, 13, 7, 0, 9, 3, 4, 6, 10, 2, 8, 5, 14, 12, 11, 15, 1,
13, 6, 4, 9, 8, 15, 3, 0, 11, 1, 2, 12, 5, 10, 14, 7, 1, 10, 13, 0, 6, 9, 8, 7, 4, 15, 14, 3, 11, 5, 2, 12);
constant S_block3 : S_block_type :=
--moods ROM
(7, 13, 14, 3, 0, 6, 9, 10, 1, 2, 8, 5, 11, 12, 4, 15, 13, 8, 11, 5, 6, 15, 0, 3, 4, 7, 2, 12, 1, 10, 14, 9,
10, 6, 9, 0, 12, 11, 7, 13, 15, 1, 3, 14, 5, 2, 8, 4, 3, 15, 0, 6, 10, 1, 13, 8, 9, 4, 5, 11, 12, 7, 2, 14);
constant S_block4 : S_block_type :=
--moods ROM
(2, 12, 4, 1, 7, 10, 11, 6, 8, 5, 3, 15, 13, 0, 14, 9, 14, 11, 2, 12, 4, 7, 13, 1, 5, 0, 15, 10, 3, 9, 8, 6,
4, 2, 1, 11, 10, 13, 7, 8, 15, 9, 12, 5, 6, 3, 0, 14, 11, 8, 12, 7, 1, 14, 2, 13, 6, 15, 0, 9, 10, 4, 5, 3);
constant S_block5 : S_block_type :=
--moods ROM
(12, 1, 10, 15, 9, 2, 6, 8, 0, 13, 3, 4, 14, 7, 5, 11, 10, 15, 4, 2, 7, 12, 9, 5, 6, 1, 13, 14, 0, 11, 3, 8,
9, 14, 15, 5, 2, 8, 12, 3, 7, 0, 4, 10, 1, 13, 11, 6, 4, 3, 2, 12, 9, 5, 15, 10, 11, 14, 1, 7, 6, 0, 8, 13);
constant S_block6 : S_block_type :=
--moods ROM
(4, 11, 2, 14, 15, 0, 8, 13, 3, 12, 9, 7, 5, 10, 6, 1, 13, 0, 11, 7, 4, 9, 1, 10, 14, 3, 5, 12, 2, 15, 8, 6,
1, 4, 11, 13, 12, 3, 7, 14, 10, 15, 6, 8, 0, 5, 9, 2, 6, 11, 13, 8, 1, 4, 10, 7, 9, 5, 0, 15, 14, 2, 3, 12);
constant S_block7 : S_block_type :=
--moods ROM
(13, 2, 8, 4, 6, 15, 11, 1, 10, 9, 3, 14, 5, 0, 12, 7, 1, 15, 13, 8, 10, 3, 7, 4, 12, 5, 6, 11, 0, 14, 9, 2,
7, 11, 4, 1, 9, 12, 14, 2, 0, 6, 10, 13, 15, 3, 5, 8, 2, 1, 14, 7, 4, 10, 8, 13, 15, 12, 9, 0, 3, 5, 6, 11);
begin
return
std_logic_vector(to_unsigned(S_block0(to_integer(unsigned(data(1)&data(6)&data(2 to 5)))),4))&
std_logic_vector(to_unsigned(S_block1(to_integer(unsigned(data(7)&data(12)&data(8 to 11)))),4))&
std_logic_vector(to_unsigned(S_block2(to_integer(unsigned(data(13)&data(18)&data(14 to 17)))),4))&
std_logic_vector(to_unsigned(S_block3(to_integer(unsigned(data(19)&data(24)&data(20 to 23)))),4))&

std_logic_vector(to_unsigned(S_block4(to_integer(unsigned(data(25)&data(30)&data(26 to 29)))),4))&
std_logic_vector(to_unsigned(S_block5(to_integer(unsigned(data(31)&data(36)&data(32 to 35)))),4))&
std_logic_vector(to_unsigned(S_block6(to_integer(unsigned(data(37)&data(42)&data(38 to 41)))),4))&
std_logic_vector(to_unsigned(S_block7(to_integer(unsigned(data(43)&data(48)&data(44 to 47)))),4));
end;

function permute (data : in vec32) return vec32 is

begin
return
data(16) & data(7) & data(20) & data(21) & data(29) & data(12) & data(28) & data(17) &
data(1) & data(15) & data(23) & data(26) & data(5) & data(18) & data(31) & data(10) &
data(2) & data(8) & data(24) & data(14) & data(32) & data(27) & data(3) & data(9) &
data(19) & data(13) & data(30) & data(6) & data(22) & data(11) & data(4) & data(25); end;

function f(data : vec32; subkey : vec48) return vec32 is -- Cipher function, f

begin
return permute(substitute(expand(data) xor subkey)); end;

function key_reduce1(key : in vec64) return vec56 is

begin
return
key(57) & key(49) & key(41) & key(33) & key(25) & key(17) & key(9) & key(1) &
key(58) & key(50) & key(42) & key(34) & key(26) & key(18) & key(10) & key(2) &
key(59) & key(51) & key(43) & key(35) & key(27) & key(19) & key(11) & key(3) &
key(60) & key(52) & key(44) & key(36) & key(63) & key(55) & key(47) & key(39) &
key(31) & key(23) & key(15) & key(7) & key(62) & key(54) & key(46) & key(38) &
key(30) & key(22) & key(14) & key(6) & key(61) & key(53) & key(45) & key(37) &
key(29) & key(21) & key(13) & key(5) & key(28) & key(20) & key(12) & key(4);
end;

function key_reduce2(key : in vec64) return vec56 is

begin
return
key(57) & key(49) & key(41) & key(33) & key(25) & key(17) & key(9) & key(1) &
key(58) & key(50) & key(42) & key(34) & key(26) & key(18) & key(10) & key(2) &
key(59) & key(51) & key(43) & key(35) & key(27) & key(19) & key(11) & key(3) &
key(60) & key(52) & key(44) & key(36) & key(63) & key(55) & key(47) & key(39) &
key(31) & key(23) & key(15) & key(7) & key(62) & key(54) & key(46) & key(38) &
key(30) & key(22) & key(14) & key(6) & key(61) & key(53) & key(45) & key(37) &
key(29) & key(21) & key(13) & key(5) & key(28) & key(20) & key(12) & key(4);
end;

function key_rotate(key : vec56; round : natural range 0 to 15; encrypt : std_logic) return vec56 is
type distance_type is array (natural range 0 to 31) of integer range 0 to 31;
constant shift_distance : distance_type :=
--moods ROM
(0, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1,
27, 27, 26, 26, 26, 26, 26, 26, 27, 26, 26, 26, 26, 26, 26, 27);
variable distance : natural range 0 to 31;
begin
distance := shift_distance(to_integer(unsigned(encrypt & to_unsigned(round,4))));
return vec28(unsigned(key(1 to 28)) ror distance) & vec28(unsigned(key(29 to 56)) ror distance);
end;

function key_compress(key : in vec56) return vec48 is

begin
return
key(14) & key(17) & key(11) & key(24) & key(1) & key(5) & key(3) & key(28) &
key(15) & key(6) & key(21) & key(10) & key(23) & key(19) & key(12) & key(4) &
key(26) & key(8) & key(16) & key(7) & key(27) & key(20) & key(13) & key(2) &
key(41) & key(52) & key(31) & key(37) & key(47) & key(55) & key(30) & key(40) &
key(51) & key(45) & key(33) & key(48) & key(44) & key(49) & key(39) & key(56) &
key(34) & key(53) & key(46) & key(42) & key(50) & key(36) & key(29) & key(32);
end;

function des_core(plaintext: vec64; key : vec56; encrypt: std_logic) return vec64 is

--moods inline
variable data : vec64;
variable working_key : vec56 := key;
begin
data := initial_permutation(plaintext);
for round in 0 to 15 loop
working_key := key_rotate(working_key,round,encrypt);
data := data(33 to 64) & (f(data(33 to 64),key_compress(working_key)) xor data(1 to 32));
end loop;
return final_permutation(data(33 to 64) & data(1 to 32));
end;
end;

Figure D-11 VHDL package for triple-DES example
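The `shift_distance` table in `key_rotate` holds right-rotation distances for the two 28-bit key halves. Read from the listing, the upper sixteen entries (27 or 26) are selected when `encrypt` is '1' and the lower sixteen when it is '0'. The sketch below models one 28-bit half in software to show two properties of that table: a right rotation by 27 or 26 equals a left rotation by 1 or 2, and the lower sixteen entries revisit the same half-key values in reverse order. The helper names `ror28` and `half_key_states` are hypothetical.

```python
MASK28 = (1 << 28) - 1

# Distances copied from the shift_distance table in the listing.
ROR_DECRYPT = [0, 1, 2, 2, 2, 2, 2, 2, 1, 2, 2, 2, 2, 2, 2, 1]
ROR_ENCRYPT = [27, 27, 26, 26, 26, 26, 26, 26, 27, 26, 26, 26, 26, 26, 26, 27]


def ror28(value, distance):
    """Rotate a 28-bit value right by `distance` bit positions."""
    distance %= 28
    return ((value >> distance) | (value << (28 - distance))) & MASK28


def half_key_states(start, distances):
    """Half-key value after each round, applying the distances cumulatively."""
    states = []
    value = start & MASK28
    for distance in distances:
        value = ror28(value, distance)
        states.append(value)
    return states


if __name__ == "__main__":
    half = 0x0F0CCAA
    forward = half_key_states(half, ROR_ENCRYPT)
    backward = half_key_states(half, ROR_DECRYPT)
    print(backward == forward[::-1])   # same values, opposite order
    print(forward[-1] == half)         # 16 rounds rotate by 28 bits in total
```

The total left rotation over the sixteen rounds is 28 bits, so the half returns to its starting value after a full pass, which is what allows the reverse schedule to begin from the unrotated key.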

-- ************************************************************
-- Triple-DES
-- ************************************************************

library ieee;
use ieee.std_logic_1164.all;
use work.des_functions.all;
entity tdes_ede2 is
port(
plaintext: in std_logic_vector(1 to 32); -- now uses 32-bit input
keys: in std_logic_vector(1 to 32); -- now uses 32-bit key (4 x 32-bits = 128-bit key)
in_hs_rdy: in std_logic_vector(0 downto 0);
in_hs_rcv: buffer std_logic_vector(0 downto 0) := "0";
encrypt: in std_logic;
out_hs_rdy: buffer std_logic_vector(0 downto 0) := "0";
out_hs_rcv: in std_logic_vector(0 downto 0);
ciphertext: out std_logic_vector(1 to 32); -- now uses 32-bit
sys_reset: in std_logic;
--moods reset
sys_clock: in std_logic
--moods clock
);
end;
architecture behaviour of tdes_ede2 is
begin
process
variable data, key1, key2 : vec64;
variable key : vec56;
variable mode : std_logic;
begin
reset_loop: loop
in_hs_rcv <= "0";
out_hs_rdy <= "0";
wait until sys_clock'event and sys_clock = '1';
exit reset_loop when sys_reset = '1';
main_loop: loop
for in_cnt in 0 to 3 loop
while(in_hs_rdy = in_hs_rcv) loop
wait until sys_clock'event and sys_clock = '1';
end loop;
case in_cnt is
when 0 =>
data(33 to 64) := plaintext(1 to 32);
key1(33 to 64) := keys(1 to 32);

when 1 =>
data(1 to 32) := plaintext(1 to 32);
key1(1 to 32) := keys(1 to 32);
when 2 =>
key2(33 to 64) := keys(1 to 32);
when 3 =>
key2(1 to 32) := keys(1 to 32);
when others => NULL;
end case;
in_hs_rcv <= not in_hs_rcv;
wait until sys_clock'event and sys_clock = '1';
end loop;

for loop_cnt in 0 to 2 loop

case loop_cnt is
when 1 =>
key := key_reduce2(key2);
mode := not encrypt;
when others =>
key := key_reduce1(key1);
mode := encrypt;
end case;
data := des_core(data, key, mode);
end loop;
for out_cnt in 0 to 1 loop
while(out_hs_rdy /= out_hs_rcv) loop
wait until sys_clock'event and sys_clock = '1';
end loop;

case out_cnt is
when 0 => ciphertext(1 to 32) <= data(1 to 32);
when others => ciphertext(1 to 32) <= data(33 to 64);
end case;
out_hs_rdy <= not out_hs_rdy;
wait until sys_clock'event and sys_clock = '1';
end loop;
wait until sys_clock'event and sys_clock = '1';
exit reset_loop when sys_reset = '1';
end loop;
end loop;
end process;
end;

Figure D-12 VHDL of triple-DES example

The post-MOODS synthesis simulation of the non-pipelined multi-FPGA triple-DES core
is given in Figure D-13. Zoomed-in views of the simulation, showing the input and output
updates, are given in Figure D-14. With a system clock period of 40 ns, the non-pipelined
multi-FPGA triple-DES core takes 3950 clock cycles (i.e. (158420 ns - 420 ns) / 40 ns)
to encrypt a 64-bit plaintext block using a 128-bit key.
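As a rough derived figure, the cycle count above can be turned into a per-block latency and an equivalent data rate. The sketch assumes that blocks are processed strictly one after another with no overlap, which matches the non-pipelined design but is otherwise an assumption; `block_latency_s` and `block_throughput_bits_per_s` are hypothetical helper names, and the resulting numbers are not quoted in the thesis.

```python
def block_latency_s(cycles, period_ns):
    """Time to process one block, in seconds."""
    return cycles * period_ns * 1e-9


def block_throughput_bits_per_s(block_bits, cycles, period_ns):
    """Bits processed per second when blocks are handled back to back."""
    return block_bits / block_latency_s(cycles, period_ns)


if __name__ == "__main__":
    latency = block_latency_s(3950, 40)
    rate = block_throughput_bits_per_s(64, 3950, 40)
    print(f"latency = {latency * 1e6:.1f} us, rate = {rate / 1e3:.1f} kbit/s")
```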
Figure D-13 Simulation of the non-pipelined multi-FPGA Triple-DES

Figure D-14 Simulation (zoomed-in views) of the non-pipelined multi-FPGA Triple-DES

D.1.5 256-bit Advanced encryption standard

The 256-bit advanced encryption standard (AES) [146] implements the Rijndael algorithm
that processes data blocks of 128 bits using a 256-bit cipher key. The behavioural VHDL
of the 256-bit AES example is given in Figure D-16 and it uses the VHDL package in
Figure D-15.
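The `aes_procedures` package in the listing that follows stores a 30-entry round-constant ROM (`rcotab`). Its entries are successive doublings of 1 in the Rijndael finite field GF(2^8), reduced by the polynomial 0x11B. The sketch below regenerates them; `xtime` is the conventional name for the doubling step, while `round_constants` is a hypothetical helper that is not part of the thesis code.

```python
def xtime(byte):
    """Multiply a GF(2^8) element by x (i.e. by 2), reducing modulo 0x11B."""
    byte <<= 1
    if byte & 0x100:
        byte ^= 0x11B
    return byte & 0xFF


def round_constants(count):
    """First `count` round constants: 1, x, x^2, ... in GF(2^8)."""
    constants = []
    value = 0x01
    for _ in range(count):
        constants.append(value)
        value = xtime(value)
    return constants


if __name__ == "__main__":
    for index, value in enumerate(round_constants(30)):
        print(index, format(value, "08b"))
```

Printing the values as 8-bit binary strings allows a direct comparison with the string literals in the ROM.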

-- ************************************************************
-- VHDL package for 256-AES packages
-- ************************************************************

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

package aes_procedures is
subtype u_sign8 is unsigned(1 to 8);
subtype u_sign16 is unsigned(1 to 16);
subtype u_sign32 is unsigned(1 to 32);
subtype u_sign64 is unsigned(1 to 64);
subtype u_sign128 is unsigned(1 to 128);
type rom_tab_1 is array(0 to 255) of u_sign8;
type rom_tab_2 is array(0 to 29) of u_sign8;
type rom_tab_5 is array(0 to 255) of u_sign32;
type rom_tab_7 is array(0 to 255) of integer;
type tab_4 is array(0 to 3) of u_sign8;
type tab_a8 is array(0 to 7) of u_sign32;
type tab_a6 is array(0 to 5) of u_sign32;
type tab_a4 is array(0 to 3) of u_sign32;
type tab_90 is array(0 to 89) of u_sign32;
type tab_44 is array(0 to 43) of u_sign32;
type tab_64 is array(0 to 63) of u_sign32;

function word (a : in u_sign8) return u_sign32;

procedure r_oneto24(a: in u_sign32; q_out: out u_sign32);

procedure r_oneto16(a: in u_sign32; q_out: out u_sign32);

procedure r_oneto8(a: in u_sign32; q_out: out u_sign32);

procedure rco(
a: in unsigned(4 downto 0);
a_out: out u_sign32 );

end aes_procedures;

package body aes_procedures is

function word (a : in u_sign8) return u_sign32 is

variable q : u_sign32;
begin
q := a(1 to 8) & "000000000000000000000000";
return q;
end word;

procedure r_oneto24(
a: in u_sign32;
q_out: out u_sign32
) is
begin
q_out := a(25 to 32) & a(1 to 24);
end r_oneto24;

procedure r_oneto16(
a: in u_sign32;
q_out: out u_sign32
) is
begin
q_out := a(17 to 32) & a(1 to 16);
end r_oneto16;

procedure r_oneto8(
-- moods inline
a: in u_sign32;
q_out: out u_sign32
) is
begin
q_out := a(9 to 32) & a(1 to 8);
end r_oneto8;

procedure rco (
-- moods inline
a: in unsigned(4 downto 0);
a_out: out u_sign32
) is
constant rcotab: rom_tab_2 :=
-- moods rom
("00000001", "00000010", "00000100", "00001000", "00010000",
"00100000", "01000000", "10000000", "00011011", "00110110",
"01101100", "11011000", "10101011", "01001101", "10011010",
"00101111", "01011110", "10111100", "01100011", "11000110",
"10010111", "00110101", "01101010", "11010100", "10110011",
"01111101", "11111010", "11101111", "11000101", "10010001");
begin
a_out := rcotab(to_integer(a)) & "000000000000000000000000";
end rco;
end aes_procedures;

-- ************************************************************
-- Encryption tables for 256-AES example
-- ************************************************************

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

use work.aes_procedures.all;

package encryption_tables is

function fbsub(a: in u_sign8 ) return u_sign8;

function fbsub_quad(a: in u_sign32) return u_sign32;

function ftable( a : in u_sign8 ) return u_sign32;

function ftable_double( a, b : in u_sign8)return u_sign64;

procedure ftable_quad(
a: in u_sign32;
q_out: out u_sign32
);
end encryption_tables;

package body encryption_tables is

function fbsub ( a : in u_sign8) return u_sign8 is

-- moods inline
constant fbsubtab : rom_tab_1 :=
-- moods rom
("01100011","01111100","01110111","01111011","11110010","01101011","01101111","11000101",
"00110000","00000001","01100111","00101011","11111110","11010111","10101011","01110110",
"11001010","10000010","11001001","01111101","11111010","01011001","01000111","11110000",
"10101101","11010100","10100010","10101111","10011100","10100100","01110010","11000000",
"10110111","11111101","10010011","00100110","00110110","00111111","11110111","11001100",
"00110100","10100101","11100101","11110001","01110001","11011000","00110001","00010101",
"00000100","11000111","00100011","11000011","00011000","10010110","00000101","10011010",
"00000111","00010010","10000000","11100010","11101011","00100111","10110010","01110101",
"00001001","10000011","00101100","00011010","00011011","01101110","01011010","10100000",
"01010010","00111011","11010110","10110011","00101001","11100011","00101111","10000100",
"01010011","11010001","00000000","11101101","00100000","11111100","10110001","01011011",
"01101010","11001011","10111110","00111001","01001010","01001100","01011000","11001111",
"11010000","11101111","10101010","11111011","01000011","01001101","00110011","10000101",
"01000101","11111001","00000010","01111111","01010000","00111100","10011111","10101000",
"01010001","10100011","01000000","10001111","10010010","10011101","00111000","11110101",
"10111100","10110110","11011010","00100001","00010000","11111111","11110011","11010010",
"11001101","00001100","00010011","11101100","01011111","10010111","01000100","00010111",
"11000100","10100111","01111110","00111101","01100100","01011101","00011001","01110011",
"01100000","10000001","01001111","11011100","00100010","00101010","10010000","10001000",
"01000110","11101110","10111000","00010100","11011110","01011110","00001011","11011011",
"11100000","00110010","00111010","00001010","01001001","00000110","00100100","01011100",
"11000010","11010011","10101100","01100010","10010001","10010101","11100100","01111001",
"11100111","11001000","00110111","01101101","10001101","11010101","01001110","10101001",
"01101100","01010110","11110100","11101010","01100101","01111010","10101110","00001000",
"10111010","01111000","00100101","00101110","00011100","10100110","10110100","11000110",
"11101000","11011101","01110100","00011111","01001011","10111101","10001011","10001010",
"01110000","00111110","10110101","01100110","01001000","00000011","11110110","00001110",
"01100001","00110101","01010111","10111001","10000110","11000001","00011101","10011110",
"11100001","11111000","10011000","00010001","01101001","11011001","10001110","10010100",
"10011011","00011110","10000111","11101001","11001110","01010101","00101000","11011111",
"10001100","10100001","10001001","00001101","10111111","11100110","01000010","01101000",
"01000001","10011001","00101101","00001111","10110000","01010100","10111011","00010110");

variable b : natural range 0 to 255;

variable q : u_sign8;
begin
b := to_integer(a);
q := fbsubtab(b);
return q;
end fbsub;

function fbsub_quad ( a : in u_sign32) return u_sign32 is

-- moods inline
constant fbsubtab : rom_tab_1 :=
-- moods rom
("01100011","01111100","01110111","01111011","11110010","01101011","01101111","11000101",
"00110000","00000001","01100111","00101011","11111110","11010111","10101011","01110110",
"11001010","10000010","11001001","01111101","11111010","01011001","01000111","11110000",
"10101101","11010100","10100010","10101111","10011100","10100100","01110010","11000000",
"10110111","11111101","10010011","00100110","00110110","00111111","11110111","11001100",
"00110100","10100101","11100101","11110001","01110001","11011000","00110001","00010101",
"00000100","11000111","00100011","11000011","00011000","10010110","00000101","10011010",
"00000111","00010010","10000000","11100010","11101011","00100111","10110010","01110101",
"00001001","10000011","00101100","00011010","00011011","01101110","01011010","10100000",
"01010010","00111011","11010110","10110011","00101001","11100011","00101111","10000100",
"01010011","11010001","00000000","11101101","00100000","11111100","10110001","01011011",
"01101010","11001011","10111110","00111001","01001010","01001100","01011000","11001111",
"11010000","11101111","10101010","11111011","01000011","01001101","00110011","10000101",
"01000101","11111001","00000010","01111111","01010000","00111100","10011111","10101000",
"01010001","10100011","01000000","10001111","10010010","10011101","00111000","11110101",
"10111100","10110110","11011010","00100001","00010000","11111111","11110011","11010010",
"11001101","00001100","00010011","11101100","01011111","10010111","01000100","00010111",
"11000100","10100111","01111110","00111101","01100100","01011101","00011001","01110011",
"01100000","10000001","01001111","11011100","00100010","00101010","10010000","10001000",
"01000110","11101110","10111000","00010100","11011110","01011110","00001011","11011011",
"11100000","00110010","00111010","00001010","01001001","00000110","00100100","01011100",
"11000010","11010011","10101100","01100010","10010001","10010101","11100100","01111001",
"11100111","11001000","00110111","01101101","10001101","11010101","01001110","10101001",
"01101100","01010110","11110100","11101010","01100101","01111010","10101110","00001000",
"10111010","01111000","00100101","00101110","00011100","10100110","10110100","11000110",
"11101000","11011101","01110100","00011111","01001011","10111101","10001011","10001010",
"01110000","00111110","10110101","01100110","01001000","00000011","11110110","00001110",
"01100001","00110101","01010111","10111001","10000110","11000001","00011101","10011110",
"11100001","11111000","10011000","00010001","01101001","11011001","10001110","10010100",
"10011011","00011110","10000111","11101001","11001110","01010101","00101000","11011111",
"10001100","10100001","10001001","00001101","10111111","11100110","01000010","01101000",
"01000001","10011001","00101101","00001111","10110000","01010100","10111011","00010110");
variable q: u_sign32;
begin
q(1 to 8) := fbsubtab(to_integer(a(1 to 8)));
q(9 to 16) := fbsubtab(to_integer(a(9 to 16)));
q(17 to 24) := fbsubtab(to_integer(a(17 to 24)));
q(25 to 32) := fbsubtab(to_integer(a(25 to 32)));
return q;
end fbsub_quad;

function ftable(a : in u_sign8) return u_sign32 is

-- moods inline
constant ftabletab : rom_tab_5 :=
-- moods rom
(
-- Hex(C6,63,63,A5), Hex(F8,7C,7C,84), Hex(EE,77,77,99), Hex(F6,7B,7B,8D)
"11000110011000110110001110100101", "11111000011111000111110010000100",
"11101110011101110111011110011001", "11110110011110110111101110001101",
-- Hex(FF,F2,F2,0D), Hex(D6,6B,6B,BD), Hex(DE,6F,6F,B1), Hex(91,C5,C5,54)
"11111111111100101111001000001101", "11010110011010110110101110111101",
"11011110011011110110111110110001", "10010001110001011100010101010100",
-- Hex(60,30,30,50), Hex(02,01,01,03), Hex(CE,67,67,A9), Hex(56,2B,2B,7D)
"01100000001100000011000001010000", "00000010000000010000000100000011",
"11001110011001110110011110101001", "01010110001010110010101101111101",
-- Hex(E7,FE,FE,19), Hex(B5,D7,D7,62), Hex(4D,AB,AB,E6), Hex(EC,76,76,9A)
"11100111111111101111111000011001", "10110101110101111101011101100010",
"01001101101010111010101111100110", "11101100011101100111011010011010",
-- Hex(8F,CA,CA,45), Hex(1F,82,82,9D), Hex(89,C9,C9,40), Hex(FA,7D,7D,87)
"10001111110010101100101001000101", "00011111100000101000001010011101",
"10001001110010011100100101000000", "11111010011111010111110110000111",
-- Hex(EF,FA,FA,15), Hex(B2,59,59,EB), Hex(8E,47,47,C9), Hex(FB,F0,F0,0B)
"11101111111110101111101000010101", "10110010010110010101100111101011",
"10001110010001110100011111001001", "11111011111100001111000000001011",
-- Hex(41,AD,AD,EC), Hex(B3,D4,D4,67), Hex(5F,A2,A2,FD), Hex(45,AF,AF,EA)
"01000001101011011010110111101100", "10110011110101001101010001100111",
"01011111101000101010001011111101", "01000101101011111010111111101010",
-- Hex(23,9C,9C,BF), Hex(53,A4,A4,F7), Hex(E4,72,72,96), Hex(9B,C0,C0,5B)
"00100011100111001001110010111111", "01010011101001001010010011110111",
"11100100011100100111001010010110", "10011011110000001100000001011011",
-- Hex(75,B7,B7,C2), Hex(E1,FD,FD,1C), Hex(3D,93,93,AE), Hex(4C,26,26,6A)
"01110101101101111011011111000010", "11100001111111011111110100011100",
"00111101100100111001001110101110", "01001100001001100010011001101010",
-- Hex(6C,36,36,5A), Hex(7E,3F,3F,41), Hex(F5,F7,F7,02), Hex(83,CC,CC,4F)
"01101100001101100011011001011010", "01111110001111110011111101000001",
"11110101111101111111011100000010", "10000011110011001100110001001111",
-- Hex(68,34,34,5C), Hex(51,A5,A5,F4), Hex(D1,E5,E5,34), Hex(F9,F1,F1,08)
"01101000001101000011010001011100", "01010001101001011010010111110100",
"11010001111001011110010100110100", "11111001111100011111000100001000",
-- Hex(E2,71,71,93), Hex(AB,D8,D8,73), Hex(62,31,31,53), Hex(2A,15,15,3F)
"11100010011100010111000110010011", "10101011110110001101100001110011",
"01100010001100010011000101010011", "00101010000101010001010100111111",
-- Hex(08,04,04,0C), Hex(95,C7,C7,52), Hex(46,23,23,65), Hex(9D,C3,C3,5E)
"00001000000001000000010000001100", "10010101110001111100011101010010",
"01000110001000110010001101100101", "10011101110000111100001101011110",
-- Hex(30,18,18,28), Hex(37,96,96,A1), Hex(0A,05,05,0F), Hex(2F,9A,9A,B5)
"00110000000110000001100000101000", "00110111100101101001011010100001",
"00001010000001010000010100001111", "00101111100110101001101010110101",
-- Hex(0E,07,07,09), Hex(24,12,12,36), Hex(1B,80,80,9B), Hex(DF,E2,E2,3D)
"00001110000001110000011100001001", "00100100000100100001001000110110",
"00011011100000001000000010011011", "11011111111000101110001000111101",
-- Hex(CD,EB,EB,26), Hex(4E,27,27,69), Hex(7F,B2,B2,CD), Hex(EA,75,75,9F)
"11001101111010111110101100100110", "01001110001001110010011101101001",
"01111111101100101011001011001101", "11101010011101010111010110011111",
-- Hex(12,09,09,1B), Hex(1D,83,83,9E), Hex(58,2C,2C,74), Hex(34,1A,1A,2E)
"00010010000010010000100100011011", "00011101100000111000001110011110",
"01011000001011000010110001110100", "00110100000110100001101000101110",
-- Hex(36,1B,1B,2D), Hex(DC,6E,6E,B2), Hex(B4,5A,5A,EE), Hex(5B,A0,A0,FB)
"00110110000110110001101100101101", "11011100011011100110111010110010",
"10110100010110100101101011101110", "01011011101000001010000011111011",
-- Hex(A4,52,52,F6), Hex(76,3B,3B,4D), Hex(B7,D6,D6,61), Hex(7D,B3,B3,CE)
"10100100010100100101001011110110", "01110110001110110011101101001101",
"10110111110101101101011001100001", "01111101101100111011001111001110",
-- Hex(52,29,29,7B), Hex(DD,E3,E3,3E), Hex(5E,2F,2F,71), Hex(13,84,84,97)
"01010010001010010010100101111011", "11011101111000111110001100111110",
"01011110001011110010111101110001", "00010011100001001000010010010111",
-- Hex(A6,53,53,F5), Hex(B9,D1,D1,68), Hex(00,00,00,00), Hex(C1,ED,ED,2C)
"10100110010100110101001111110101", "10111001110100011101000101101000",
"00000000000000000000000000000000", "11000001111011011110110100101100",
-- Hex(40,20,20,60), Hex(E3,FC,FC,1F), Hex(79,B1,B1,C8), Hex(B6,5B,5B,ED)
"01000000001000000010000001100000", "11100011111111001111110000011111",
"01111001101100011011000111001000", "10110110010110110101101111101101",
-- Hex(D4,6A,6A,BE), Hex(8D,CB,CB,46), Hex(67,BE,BE,D9), Hex(72,39,39,4B)
"11010100011010100110101010111110", "10001101110010111100101101000110",
"01100111101111101011111011011001", "01110010001110010011100101001011",
-- Hex(94,4A,4A,DE), Hex(98,4C,4C,D4), Hex(B0,58,58,E8), Hex(85,CF,CF,4A)
"10010100010010100100101011011110", "10011000010011000100110011010100",
"10110000010110000101100011101000", "10000101110011111100111101001010",
-- Hex(BB,D0,D0,6B), Hex(C5,EF,EF,2A), Hex(4F,AA,AA,E5), Hex(ED,FB,FB,16)
"10111011110100001101000001101011", "11000101111011111110111100101010",
"01001111101010101010101011100101", "11101101111110111111101100010110",
-- Hex(86,43,43,C5), Hex(9A,4D,4D,D7), Hex(66,33,33,55), Hex(11,85,85,94)
"10000110010000110100001111000101", "10011010010011010100110111010111",
"01100110001100110011001101010101", "00010001100001011000010110010100",
-- Hex(8A,45,45,CF), Hex(E9,F9,F9,10), Hex(04,02,02,06), Hex(FE,7F,7F,81)
"10001010010001010100010111001111", "11101001111110011111100100010000",
"00000100000000100000001000000110", "11111110011111110111111110000001",
-- Hex(A0,50,50,F0), Hex(78,3C,3C,44), Hex(25,9F,9F,BA), Hex(4B,A8,A8,E3)
"10100000010100000101000011110000", "01111000001111000011110001000100",
"00100101100111111001111110111010", "01001011101010001010100011100011",
-- Hex(A2,51,51,F3), Hex(5D,A3,A3,FE), Hex(80,40,40,C0), Hex(05,8F,8F,8A)
"10100010010100010101000111110011", "01011101101000111010001111111110",
"10000000010000000100000011000000", "00000101100011111000111110001010",
-- Hex(3F,92,92,AD), Hex(21,9D,9D,BC), Hex(70,38,38,48), Hex(F1,F5,F5,04)
"00111111100100101001001010101101", "00100001100111011001110110111100",
"01110000001110000011100001001000", "11110001111101011111010100000100",
-- Hex(63,BC,BC,DF), Hex(77,B6,B6,C1), Hex(AF,DA,DA,75), Hex(42,21,21,63)
"01100011101111001011110011011111", "01110111101101101011011011000001",
"10101111110110101101101001110101", "01000010001000010010000101100011",
-- Hex(20,10,10,30), Hex(E5,FF,FF,1A), Hex(FD,F3,F3,0E), Hex(BF,D2,D2,6D)
"00100000000100000001000000110000", "11100101111111111111111100011010",
"11111101111100111111001100001110", "10111111110100101101001001101101",
-- Hex(81,CD,CD,4C), Hex(18,0C,0C,14), Hex(26,13,13,35), Hex(C3,EC,EC,2F)
"10000001110011011100110101001100", "00011000000011000000110000010100",
"00100110000100110001001100110101", "11000011111011001110110000101111",
-- Hex(BE,5F,5F,E1), Hex(35,97,97,A2), Hex(88,44,44,CC), Hex(2E,17,17,39)
"10111110010111110101111111100001", "00110101100101111001011110100010",
"10001000010001000100010011001100", "00101110000101110001011100111001",
-- Hex(93,C4,C4,57), Hex(55,A7,A7,F2), Hex(FC,7E,7E,82), Hex(7A,3D,3D,47)
"10010011110001001100010001010111", "01010101101001111010011111110010",
"11111100011111100111111010000010", "01111010001111010011110101000111",
-- Hex(C8,64,64,AC), Hex(BA,5D,5D,E7), Hex(32,19,19,2B), Hex(E6,73,73,95)
"11001000011001000110010010101100", "10111010010111010101110111100111",
"00110010000110010001100100101011", "11100110011100110111001110010101",
-- Hex(C0,60,60,A0), Hex(19,81,81,98), Hex(9E,4F,4F,D1), Hex(A3,DC,DC,7F)
"11000000011000000110000010100000", "00011001100000011000000110011000",
"10011110010011110100111111010001", "10100011110111001101110001111111",
-- Hex(44,22,22,66), Hex(54,2A,2A,7E), Hex(3B,90,90,AB), Hex(0B,88,88,83)
"01000100001000100010001001100110", "01010100001010100010101001111110",
"00111011100100001001000010101011", "00001011100010001000100010000011",
-- Hex(8C,46,46,CA), Hex(C7,EE,EE,29), Hex(6B,B8,B8,D3), Hex(28,14,14,3C)
"10001100010001100100011011001010", "11000111111011101110111000101001",
"01101011101110001011100011010011", "00101000000101000001010000111100",
-- Hex(A7,DE,DE,79), Hex(BC,5E,5E,E2), Hex(16,0B,0B,1D), Hex(AD,DB,DB,76)
"10100111110111101101111001111001", "10111100010111100101111011100010",
"00010110000010110000101100011101", "10101101110110111101101101110110",
-- Hex(DB,E0,E0,3B), Hex(64,32,32,56), Hex(74,3A,3A,4E), Hex(14,0A,0A,1E)
"11011011111000001110000000111011", "01100100001100100011001001010110",
"01110100001110100011101001001110", "00010100000010100000101000011110",
-- Hex(92,49,49,DB), Hex(0C,06,06,0A), Hex(48,24,24,6C), Hex(B8,5C,5C,E4)
"10010010010010010100100111011011", "00001100000001100000011000001010",
"01001000001001000010010001101100", "10111000010111000101110011100100",
-- Hex(9F,C2,C2,5D), Hex(BD,D3,D3,6E), Hex(43,AC,AC,EF), Hex(C4,62,62,A6)
"10011111110000101100001001011101", "10111101110100111101001101101110",
"01000011101011001010110011101111", "11000100011000100110001010100110",
-- Hex(39,91,91,A8), Hex(31,95,95,A4), Hex(D3,E4,E4,37), Hex(F2,79,79,8B)
"00111001100100011001000110101000", "00110001100101011001010110100100",
"11010011111001001110010000110111", "11110010011110010111100110001011",
-- Hex(D5,E7,E7,32), Hex(8B,C8,C8,43), Hex(6E,37,37,59), Hex(DA,6D,6D,B7)
"11010101111001111110011100110010", "10001011110010001100100001000011",
"01101110001101110011011101011001", "11011010011011010110110110110111",
-- Hex(01,8D,8D,8C), Hex(B1,D5,D5,64), Hex(9C,4E,4E,D2), Hex(49,A9,A9,E0)
"00000001100011011000110110001100", "10110001110101011101010101100100",
"10011100010011100100111011010010", "01001001101010011010100111100000",
-- Hex(D8,6C,6C,B4), Hex(AC,56,56,FA), Hex(F3,F4,F4,07), Hex(CF,EA,EA,25)
"11011000011011000110110010110100", "10101100010101100101011011111010",
"11110011111101001111010000000111", "11001111111010101110101000100101",
-- Hex(CA,65,65,AF), Hex(F4,7A,7A,8E), Hex(47,AE,AE,E9), Hex(10,08,08,18)
"11001010011001010110010110101111", "11110100011110100111101010001110",
"01000111101011101010111011101001", "00010000000010000000100000011000",
-- Hex(6F,BA,BA,D5), Hex(F0,78,78,88), Hex(4A,25,25,6F), Hex(5C,2E,2E,72)
"01101111101110101011101011010101", "11110000011110000111100010001000",
"01001010001001010010010101101111", "01011100001011100010111001110010",
-- Hex(38,1C,1C,24), Hex(57,A6,A6,F1), Hex(73,B4,B4,C7), Hex(97,C6,C6,51)
"00111000000111000001110000100100", "01010111101001101010011011110001",
"01110011101101001011010011000111", "10010111110001101100011001010001",
-- Hex(CB,E8,E8,23), Hex(A1,DD,DD,7C), Hex(E8,74,74,9C), Hex(3E,1F,1F,21)
"11001011111010001110100000100011", "10100001110111011101110101111100",
"11101000011101000111010010011100", "00111110000111110001111100100001",
-- Hex(96,4B,4B,DD), Hex(61,BD,BD,DC), Hex(0D,8B,8B,86), Hex(0F,8A,8A,85)
"10010110010010110100101111011101", "01100001101111011011110111011100",
"00001101100010111000101110000110", "00001111100010101000101010000101",
-- Hex(E0,70,70,90), Hex(7C,3E,3E,42), Hex(71,B5,B5,C4), Hex(CC,66,66,AA)
"11100000011100000111000010010000", "01111100001111100011111001000010",
"01110001101101011011010111000100", "11001100011001100110011010101010",
-- Hex(90,48,48,D8), Hex(06,03,03,05), Hex(F7,F6,F6,01), Hex(1C,0E,0E,12)
"10010000010010000100100011011000", "00000110000000110000001100000101",
"11110111111101101111011000000001", "00011100000011100000111000010010",
-- Hex(C2,61,61,A3), Hex(6A,35,35,5F), Hex(AE,57,57,F9), Hex(69,B9,B9,D0)
"11000010011000010110000110100011", "01101010001101010011010101011111",
"10101110010101110101011111111001", "01101001101110011011100111010000",
-- Hex(17,86,86,91), Hex(99,C1,C1,58), Hex(3A,1D,1D,27), Hex(27,9E,9E,B9)
"00010111100001101000011010010001", "10011001110000011100000101011000",
"00111010000111010001110100100111", "00100111100111101001111010111001",
-- Hex(D9,E1,E1,38), Hex(EB,F8,F8,13), Hex(2B,98,98,B3), Hex(22,11,11,33)
"11011001111000011110000100111000", "11101011111110001111100000010011",
"00101011100110001001100010110011", "00100010000100010001000100110011",
-- Hex(D2,69,69,BB), Hex(A9,D9,D9,70), Hex(07,8E,8E,89), Hex(33,94,94,A7)
"11010010011010010110100110111011", "10101001110110011101100101110000",
"00000111100011101000111010001001", "00110011100101001001010010100111",
-- Hex(2D,9B,9B,B6), Hex(3C,1E,1E,22), Hex(15,87,87,92), Hex(C9,E9,E9,20)
"00101101100110111001101110110110", "00111100000111100001111000100010",
"00010101100001111000011110010010", "11001001111010011110100100100000",
-- Hex(87,CE,CE,49), Hex(AA,55,55,FF), Hex(50,28,28,78), Hex(A5,DF,DF,7A)
"10000111110011101100111001001001", "10101010010101010101010111111111",
"01010000001010000010100001111000", "10100101110111111101111101111010",
-- Hex(03,8C,8C,8F), Hex(59,A1,A1,F8), Hex(09,89,89,80), Hex(1A,0D,0D,17)
"00000011100011001000110010001111", "01011001101000011010000111111000",
"00001001100010011000100110000000", "00011010000011010000110100010111",
-- Hex(65,BF,BF,DA), Hex(D7,E6,E6,31), Hex(84,42,42,C6), Hex(D0,68,68,B8)
"01100101101111111011111111011010", "11010111111001101110011000110001",
"10000100010000100100001011000110", "11010000011010000110100010111000",
-- Hex(82,41,41,C3), Hex(29,99,99,B0), Hex(5A,2D,2D,77), Hex(1E,0F,0F,11)
"10000010010000010100000111000011", "00101001100110011001100110110000",
"01011010001011010010110101110111", "00011110000011110000111100010001",
-- Hex(7B,B0,B0,CB), Hex(A8,54,54,FC), Hex(6D,BB,BB,D6), Hex(2C,16,16,3A)
"01111011101100001011000011001011", "10101000010101000101010011111100",
"01101101101110111011101111010110", "00101100000101100001011000111010");

variable b : natural range 0 to 255;

variable q : u_sign32;
begin

b := to_integer(a);
q := ftabletab(b);
return q;

end ftable;

function ftable_double( a, b : in u_sign8 ) return u_sign64 is

-- moods inline
constant ftabletab : rom_tab_5 :=
-- moods rom
(
-- Hex(C6,63,63,A5), Hex(F8,7C,7C,84), Hex(EE,77,77,99), Hex(F6,7B,7B,8D)
"11000110011000110110001110100101", "11111000011111000111110010000100",
"11101110011101110111011110011001", "11110110011110110111101110001101",
-- Hex(FF,F2,F2,0D), Hex(D6,6B,6B,BD), Hex(DE,6F,6F,B1), Hex(91,C5,C5,54)
"11111111111100101111001000001101", "11010110011010110110101110111101",
"11011110011011110110111110110001", "10010001110001011100010101010100",
-- Hex(60,30,30,50), Hex(02,01,01,03), Hex(CE,67,67,A9), Hex(56,2B,2B,7D)
"01100000001100000011000001010000", "00000010000000010000000100000011",
"11001110011001110110011110101001", "01010110001010110010101101111101",
-- Hex(E7,FE,FE,19), Hex(B5,D7,D7,62), Hex(4D,AB,AB,E6), Hex(EC,76,76,9A)
"11100111111111101111111000011001", "10110101110101111101011101100010",
"01001101101010111010101111100110", "11101100011101100111011010011010",
-- Hex(8F,CA,CA,45), Hex(1F,82,82,9D), Hex(89,C9,C9,40), Hex(FA,7D,7D,87)
"10001111110010101100101001000101", "00011111100000101000001010011101",
"10001001110010011100100101000000", "11111010011111010111110110000111",
-- Hex(EF,FA,FA,15), Hex(B2,59,59,EB), Hex(8E,47,47,C9), Hex(FB,F0,F0,0B)
"11101111111110101111101000010101", "10110010010110010101100111101011",
"10001110010001110100011111001001", "11111011111100001111000000001011",
-- Hex(41,AD,AD,EC), Hex(B3,D4,D4,67), Hex(5F,A2,A2,FD), Hex(45,AF,AF,EA)
"01000001101011011010110111101100", "10110011110101001101010001100111",
"01011111101000101010001011111101", "01000101101011111010111111101010",
-- Hex(23,9C,9C,BF), Hex(53,A4,A4,F7), Hex(E4,72,72,96), Hex(9B,C0,C0,5B)
"00100011100111001001110010111111", "01010011101001001010010011110111",
"11100100011100100111001010010110", "10011011110000001100000001011011",
-- Hex(75,B7,B7,C2), Hex(E1,FD,FD,1C), Hex(3D,93,93,AE), Hex(4C,26,26,6A)
"01110101101101111011011111000010", "11100001111111011111110100011100",
"00111101100100111001001110101110", "01001100001001100010011001101010",
-- Hex(6C,36,36,5A), Hex(7E,3F,3F,41), Hex(F5,F7,F7,02), Hex(83,CC,CC,4F)
"01101100001101100011011001011010", "01111110001111110011111101000001",
"11110101111101111111011100000010", "10000011110011001100110001001111",
-- Hex(68,34,34,5C), Hex(51,A5,A5,F4), Hex(D1,E5,E5,34), Hex(F9,F1,F1,08)
"01101000001101000011010001011100", "01010001101001011010010111110100",
"11010001111001011110010100110100", "11111001111100011111000100001000",
-- Hex(E2,71,71,93), Hex(AB,D8,D8,73), Hex(62,31,31,53), Hex(2A,15,15,3F)
"11100010011100010111000110010011", "10101011110110001101100001110011",
"01100010001100010011000101010011", "00101010000101010001010100111111",
-- Hex(08,04,04,0C), Hex(95,C7,C7,52), Hex(46,23,23,65), Hex(9D,C3,C3,5E)
"00001000000001000000010000001100", "10010101110001111100011101010010",
"01000110001000110010001101100101", "10011101110000111100001101011110",
-- Hex(30,18,18,28), Hex(37,96,96,A1), Hex(0A,05,05,0F), Hex(2F,9A,9A,B5)
"00110000000110000001100000101000", "00110111100101101001011010100001",
"00001010000001010000010100001111", "00101111100110101001101010110101",
-- Hex(0E,07,07,09), Hex(24,12,12,36), Hex(1B,80,80,9B), Hex(DF,E2,E2,3D)
"00001110000001110000011100001001", "00100100000100100001001000110110",
"00011011100000001000000010011011", "11011111111000101110001000111101",
-- Hex(CD,EB,EB,26), Hex(4E,27,27,69), Hex(7F,B2,B2,CD), Hex(EA,75,75,9F)
"11001101111010111110101100100110", "01001110001001110010011101101001",
"01111111101100101011001011001101", "11101010011101010111010110011111",
-- Hex(12,09,09,1B), Hex(1D,83,83,9E), Hex(58,2C,2C,74), Hex(34,1A,1A,2E)
"00010010000010010000100100011011", "00011101100000111000001110011110",
"01011000001011000010110001110100", "00110100000110100001101000101110",
-- Hex(36,1B,1B,2D), Hex(DC,6E,6E,B2), Hex(B4,5A,5A,EE), Hex(5B,A0,A0,FB)
"00110110000110110001101100101101", "11011100011011100110111010110010",
"10110100010110100101101011101110", "01011011101000001010000011111011",
-- Hex(A4,52,52,F6), Hex(76,3B,3B,4D), Hex(B7,D6,D6,61), Hex(7D,B3,B3,CE)
"10100100010100100101001011110110", "01110110001110110011101101001101",
"10110111110101101101011001100001", "01111101101100111011001111001110",
-- Hex(52,29,29,7B), Hex(DD,E3,E3,3E), Hex(5E,2F,2F,71), Hex(13,84,84,97)
"01010010001010010010100101111011", "11011101111000111110001100111110",
"01011110001011110010111101110001", "00010011100001001000010010010111",
-- Hex(A6,53,53,F5), Hex(B9,D1,D1,68), Hex(00,00,00,00), Hex(C1,ED,ED,2C)
"10100110010100110101001111110101", "10111001110100011101000101101000",
"00000000000000000000000000000000", "11000001111011011110110100101100",
-- Hex(40,20,20,60), Hex(E3,FC,FC,1F), Hex(79,B1,B1,C8), Hex(B6,5B,5B,ED)
"01000000001000000010000001100000", "11100011111111001111110000011111",
"01111001101100011011000111001000", "10110110010110110101101111101101",
-- Hex(D4,6A,6A,BE), Hex(8D,CB,CB,46), Hex(67,BE,BE,D9), Hex(72,39,39,4B)
"11010100011010100110101010111110", "10001101110010111100101101000110",
"01100111101111101011111011011001", "01110010001110010011100101001011",
-- Hex(94,4A,4A,DE), Hex(98,4C,4C,D4), Hex(B0,58,58,E8), Hex(85,CF,CF,4A)
"10010100010010100100101011011110", "10011000010011000100110011010100",
"10110000010110000101100011101000", "10000101110011111100111101001010",
-- Hex(BB,D0,D0,6B), Hex(C5,EF,EF,2A), Hex(4F,AA,AA,E5), Hex(ED,FB,FB,16)
"10111011110100001101000001101011", "11000101111011111110111100101010",
"01001111101010101010101011100101", "11101101111110111111101100010110",
-- Hex(86,43,43,C5), Hex(9A,4D,4D,D7), Hex(66,33,33,55), Hex(11,85,85,94)
"10000110010000110100001111000101", "10011010010011010100110111010111",
"01100110001100110011001101010101", "00010001100001011000010110010100",
-- Hex(8A,45,45,CF), Hex(E9,F9,F9,10), Hex(04,02,02,06), Hex(FE,7F,7F,81)
"10001010010001010100010111001111", "11101001111110011111100100010000",
"00000100000000100000001000000110", "11111110011111110111111110000001",
-- Hex(A0,50,50,F0), Hex(78,3C,3C,44), Hex(25,9F,9F,BA), Hex(4B,A8,A8,E3)
"10100000010100000101000011110000", "01111000001111000011110001000100",
"00100101100111111001111110111010", "01001011101010001010100011100011",
-- Hex(A2,51,51,F3), Hex(5D,A3,A3,FE), Hex(80,40,40,C0), Hex(05,8F,8F,8A)
"10100010010100010101000111110011", "01011101101000111010001111111110",
"10000000010000000100000011000000", "00000101100011111000111110001010",
-- Hex(3F,92,92,AD), Hex(21,9D,9D,BC), Hex(70,38,38,48), Hex(F1,F5,F5,04)
"00111111100100101001001010101101", "00100001100111011001110110111100",
"01110000001110000011100001001000", "11110001111101011111010100000100",
-- Hex(63,BC,BC,DF), Hex(77,B6,B6,C1), Hex(AF,DA,DA,75), Hex(42,21,21,63)
"01100011101111001011110011011111", "01110111101101101011011011000001",
"10101111110110101101101001110101", "01000010001000010010000101100011",
-- Hex(20,10,10,30), Hex(E5,FF,FF,1A), Hex(FD,F3,F3,0E), Hex(BF,D2,D2,6D)
"00100000000100000001000000110000", "11100101111111111111111100011010",
"11111101111100111111001100001110", "10111111110100101101001001101101",
-- Hex(81,CD,CD,4C), Hex(18,0C,0C,14), Hex(26,13,13,35), Hex(C3,EC,EC,2F)
"10000001110011011100110101001100", "00011000000011000000110000010100",
"00100110000100110001001100110101", "11000011111011001110110000101111",
-- Hex(BE,5F,5F,E1), Hex(35,97,97,A2), Hex(88,44,44,CC), Hex(2E,17,17,39)
"10111110010111110101111111100001", "00110101100101111001011110100010",
"10001000010001000100010011001100", "00101110000101110001011100111001",
-- Hex(93,C4,C4,57), Hex(55,A7,A7,F2), Hex(FC,7E,7E,82), Hex(7A,3D,3D,47)
"10010011110001001100010001010111", "01010101101001111010011111110010",
"11111100011111100111111010000010", "01111010001111010011110101000111",
-- Hex(C8,64,64,AC), Hex(BA,5D,5D,E7), Hex(32,19,19,2B), Hex(E6,73,73,95)
"11001000011001000110010010101100", "10111010010111010101110111100111",
"00110010000110010001100100101011", "11100110011100110111001110010101",
-- Hex(C0,60,60,A0), Hex(19,81,81,98), Hex(9E,4F,4F,D1), Hex(A3,DC,DC,7F)
"11000000011000000110000010100000", "00011001100000011000000110011000",
"10011110010011110100111111010001", "10100011110111001101110001111111",
- Hex(44,22,22,66), Hex(54,2A,2A,7E), Hex(3B,90,90,AB), Hex(0B.88,88,83)

"01000100001000100010001001100110", "01010100001010100010101001111110",
"00111011100100001001000010101011", "00001011100010001000100010000011",
- Hex(8C,46,46.CA), Hex(C7,EE,EE,29), Hex(6B.B8,B8,D3), Hex(28.14.14,3C)
"10001100010001100100011011001010", "11000111111011101110111000101001",
"01101011101110001011100011010011", "00101000000101000001010000111100",
- Hex(A7,DE,DE,79), Hex(BC,5E,5E,E2), Hex(16,0B,0B,1D), Hex(AD,DB,DB,76)
"10100111110111101101111001111001", "10111100010111100101111011100010",
"00010110000010110000101100011101"10101101110110111101101101110110",
- Hex(DB,E0,E0,3B), Hex(64,32.32,56), Hex(74,3A,3A.4E), Hex(14.0A,0A,1E)
"11011011111000001110000000111011", "01100100001100100011001001010110",
"01110100001110100011101001001110", "00010100000010100000101000011110",
- Hex(92,49,49,DB). Hex(0C,06,06,0A), Hex(48,24,24,6C), Hex(B8,5C,5C,E4)
"10010010010010010100100111011011", "00001100000001100000011000001010",
"01001000001001000010010001101100", "10111000010111000101110011100100",
- Hex(9F,C2,C2,5D), Hex(BD,D3,D3,6E). Hex(43,AC,AC,EF), Hex(C4,62,62.A6)
"10011111110000101100001001011101", "10111101110100111101001101101110",
"01000011101011001010110011101111", "11000100011000100110001010100110",
- Hex(39,91,91 ,A8). Hex(31,95,95,A4), Hex(D3,E4,E4,37), Hex(F2,79.79.8B)
"00111001100100011001000110101000", "00110001100101011001010110100100",
"11010011111001001110010000110111". "11110010011110010111100110001011",
- Hex(D5,E7,E7,32), Hex(8B,C8,C8,43), Hex(6E,37,37,59), Hex(DA,6D,6D,B7)
"11010101111001111110011100110010", "10001011110010001100100001000011",
"01101110001101110011011101011001", "11011010011011010110110110110111".
- Hex(01,8D,8D,8C), Hex(B1,D5.D5,64), Hex(9C.4E,4E,D2), Hex(49,A9,A9,E0)
"00000001100011011000110110001100", "10110001110101011101010101100100",
"10011100010011100100111011010010", "01001001101010011010100111100000",
- Hex(D8,6C.6C.B4), Hex(AC,56,56,FA), Hex(F3,F4,F4,07), Hex(CF,EA,EA,25)
"11011000011011000110110010110100", "10101100010101100101011011111010",
"11110011111101001111010000000111", "11001111111010101110101000100101",
- Hex(CA,65,65,AF). Hex(F4,7A,7A.8E), Hex(47,AE,AE,E9), Hex(10,08,08,18)
"11001010011001010110010110101111", "11110100011110100111101010001110",
"01000111101011101010111011101001", "00010000000010000000100000011000",
- Hex(6F,BA,BA,D5), Hex(F0,78,78,88), Hex(4A,25,25,6F), Hex(5C.2E,2E,72)
"01101111101110101011101011010101", "11110000011110000111100010001000",
"01001010001001010010010101101111", "01011100001011100010111001110010",
- Hex(38,1C,1C,24), Hex(57,A6,A6.F1). Hex(73,B4,B4,C7), Hex(97,C6.C6,51)
"00111000000111000001110000100100", "01010111101001101010011011110001",
"01110011101101001011010011000111", "10010111110001101100011001010001",
- Hex(CB,E8,E8,23), Hex(A1,DD,DD,7C), Hex(E8,74,74,9C), Hex(3E,1F,1F,21)
"11001011111010001110100000100011", "10100001110111011101110101111100",
"11101000011101000111010010011100", "00111110000111110001111100100001",
- Hex(96,4B,4B,DD), Hex(61.BD,BD,DC), Hex(0D,8B,8B,86), Hex(0F,8A,8A,85)
"10010110010010110100101111011101", "01100001101111011011110111011100",
"00001101100010111000101110000110", "00001111100010101000101010000101",
- Hex(E0,70,70,90), Hex(7C,3E,3E,42), Hex(71,B5,B5,C4), Hex(CC.66.66,AA)
"11100000011100000111000010010000", "01111100001111100011111001000010",
"01110001101101011011010111000100", "11001100011001100110011010101010",
- Hex(90,48,48,08), Hex(06,03,03,05), Hex(F7,F6,F6,01), Hex(1C,0E.0E,12)
"10010000010010000100100011011000", "00000110000000110000001100000101",
"11110111111101101111011000000001", "00011100000011100000111000010010",
-- Hex(C2,61,61,A3), Hex(6A,35,35,5F), Hex(AE,57,57,F9), Hex(69,B9,B9,D0)
"11000010011000010110000110100011", "01101010001101010011010101011111",
"10101110010101110101011111111001", "01101001101110011011100111010000",
-- Hex(17,86,86,91), Hex(99,C1,C1,58), Hex(3A,1D,1D,27), Hex(27,9E,9E,B9)
"00010111100001101000011010010001", "10011001110000011100000101011000",
"00111010000111010001110100100111", "00100111100111101001111010111001",
-- Hex(D9,E1,E1,38), Hex(EB,F8,F8,13), Hex(2B,98,98,B3), Hex(22,11,11,33)
"11011001111000011110000100111000", "11101011111110001111100000010011",
"00101011100110001001100010110011", "00100010000100010001000100110011",
-- Hex(D2,69,69,BB), Hex(A9,D9,D9,70), Hex(07,8E,8E,89), Hex(33,94,94,A7)
"11010010011010010110100110111011", "10101001110110011101100101110000",
"00000111100011101000111010001001", "00110011100101001001010010100111",
- Hex(2D,9B,9B,B6), Hex(3C,1E,1E,22), Hex(15,87,87,92), Hex(C9,E9,E9,20)

"00101101100110111001101110110110", "00111100000111100001111000100010",
"00010101100001111000011110010010", "11001001111010011110100100100000",
- Hex(87,CE,CE,49), Hex(AA,55,55,FF), Hex(50,28,28,78), Hex(A5,DF,DF,7A)
"10000111110011101100111001001001", "10101010010101010101010111111111",
"01010000001010000010100001111000", "10100101110111111101111101111010",
-- Hex(03,8C,8C,8F), Hex(59,A1,A1,F8), Hex(09,89,89,80), Hex(1A,0D,0D,17)
"00000011100011001000110010001111", "01011001101000011010000111111000",
"00001001100010011000100110000000", "00011010000011010000110100010111",
- Hex(65,BF,BF,DA), Hex(D7,E6,E6,31), Hex(84,42,42,C6), Hex(D0,68,68,B8)
"01100101101111111011111111011010", "11010111111001101110011000110001",
"10000100010000100100001011000110", "11010000011010000110100010111000",
- Hex(82,41,41,C3), Hex(29,99,99,B0), Hex(5A,2D,2D,77), Hex(1E,0F,0F,11)
"10000010010000010100000111000011", "00101001100110011001100110110000",
"01011010001011010010110101110111", "00011110000011110000111100010001",
- Hex(7B,B0,B0,CB), Hex(A8,54,54,FC), Hex(6D,BB,BB,D6), Hex(2C,16,16,3A)
"01111011101100001011000011001011", "10101000010101000101010011111100",
"01101101101110111011101111010110", "00101100000101100001011000111010"
);
variable c : natural range 0 to 255;
variable r : u_sign32;
variable q : u_sign64;
begin
c := to_integer(a);
r := ftabletab(c);
c := to_integer(b);
q := ftabletab(c) & r;
return q;

end ftable_double;

procedure ftable_quad (
-- moods inline
a: in u_sign32;
q_out: out u_sign32
) is
constant ftabletab : rom_tab_5 :=
-- moods rom
(

-- Hex(C6,63,63,A5), Hex(F8,7C,7C,84), Hex(EE,77,77,99), Hex(F6,7B,7B,8D)

"11000110011000110110001110100101", "11111000011111000111110010000100",
"11101110011101110111011110011001", "11110110011110110111101110001101",
-- Hex(FF,F2,F2,0D), Hex(D6,6B,6B,BD), Hex(DE,6F,6F,B1), Hex(91,C5,C5,54)
"11111111111100101111001000001101", "11010110011010110110101110111101",
"11011110011011110110111110110001", "10010001110001011100010101010100",
-- Hex(60,30,30,50), Hex(02,01,01,03), Hex(CE,67,67,A9), Hex(56,2B,2B,7D)
"01100000001100000011000001010000", "00000010000000010000000100000011",
"11001110011001110110011110101001", "01010110001010110010101101111101",
-- Hex(E7,FE,FE,19), Hex(B5,D7,D7,62), Hex(4D,AB,AB,E6), Hex(EC,76,76,9A)
"11100111111111101111111000011001", "10110101110101111101011101100010",
"01001101101010111010101111100110", "11101100011101100111011010011010",
-- Hex(8F,CA,CA,45), Hex(1F,82,82,9D), Hex(89,C9,C9,40), Hex(FA,7D,7D,87)
"10001111110010101100101001000101", "00011111100000101000001010011101",
"10001001110010011100100101000000", "11111010011111010111110110000111",
-- Hex(EF,FA,FA,15), Hex(B2,59,59,EB), Hex(8E,47,47,C9), Hex(FB,F0,F0,0B)
"11101111111110101111101000010101", "10110010010110010101100111101011",
"10001110010001110100011111001001", "11111011111100001111000000001011",
-- Hex(41,AD,AD,EC), Hex(B3,D4,D4,67), Hex(5F,A2,A2,FD), Hex(45,AF,AF,EA)
"01000001101011011010110111101100", "10110011110101001101010001100111",
"01011111101000101010001011111101", "01000101101011111010111111101010",
-- Hex(23,9C,9C,BF), Hex(53,A4,A4,F7), Hex(E4,72,72,96), Hex(9B,C0,C0,5B)
"00100011100111001001110010111111", "01010011101001001010010011110111",
"11100100011100100111001010010110", "10011011110000001100000001011011",
-- Hex(75,B7,B7,C2), Hex(E1,FD,FD,1C), Hex(3D,93,93,AE), Hex(4C,26,26,6A)

"01110101101101111011011111000010", "11100001111111011111110100011100",
"00111101100100111001001110101110", "01001100001001100010011001101010",
- Hex(6C,36,36,5A), Hex(7E,3F,3F,41), Hex(F5,F7,F7,02), Hex(83,CC,CC,4F)
"01101100001101100011011001011010", "01111110001111110011111101000001",
"11110101111101111111011100000010", "10000011110011001100110001001111",
-- Hex(68,34,34,5C), Hex(51,A5,A5,F4), Hex(D1,E5,E5,34), Hex(F9,F1,F1,08)
"01101000001101000011010001011100", "01010001101001011010010111110100",
"11010001111001011110010100110100", "11111001111100011111000100001000",
-- Hex(E2,71,71,93), Hex(AB,D8,D8,73), Hex(62,31,31,53), Hex(2A,15,15,3F)
"11100010011100010111000110010011", "10101011110110001101100001110011",
"01100010001100010011000101010011", "00101010000101010001010100111111",
-- Hex(08,04,04,0C), Hex(95,C7,C7,52), Hex(46,23,23,65), Hex(9D,C3,C3,5E)
"00001000000001000000010000001100", "10010101110001111100011101010010",
"01000110001000110010001101100101", "10011101110000111100001101011110",
- Hex(30,18,18,28), Hex(37,96,96,A1), Hex(0A,05,05,0F), Hex(2F,9A,9A,B5)
"00110000000110000001100000101000", "00110111100101101001011010100001",
"00001010000001010000010100001111", "00101111100110101001101010110101",
-- Hex(0E,07,07,09), Hex(24,12,12,36), Hex(1B,80,80,9B), Hex(DF,E2,E2,3D)
"00001110000001110000011100001001", "00100100000100100001001000110110",
"00011011100000001000000010011011", "11011111111000101110001000111101",
-- Hex(CD,EB,EB,26), Hex(4E,27,27,69), Hex(7F,B2,B2,CD), Hex(EA,75,75,9F)
"11001101111010111110101100100110", "01001110001001110010011101101001",
"01111111101100101011001011001101", "11101010011101010111010110011111",
-- Hex(12,09,09,1B), Hex(1D,83,83,9E), Hex(58,2C,2C,74), Hex(34,1A,1A,2E)
"00010010000010010000100100011011", "00011101100000111000001110011110",
"01011000001011000010110001110100", "00110100000110100001101000101110",
-- Hex(36,1B,1B,2D), Hex(DC,6E,6E,B2), Hex(B4,5A,5A,EE), Hex(5B,A0,A0,FB)
"00110110000110110001101100101101", "11011100011011100110111010110010",
"10110100010110100101101011101110", "01011011101000001010000011111011",
-- Hex(A4,52,52,F6), Hex(76,3B,3B,4D), Hex(B7,D6,D6,61), Hex(7D,B3,B3,CE)
"10100100010100100101001011110110", "01110110001110110011101101001101",
"10110111110101101101011001100001", "01111101101100111011001111001110",
-- Hex(52,29,29,7B), Hex(DD,E3,E3,3E), Hex(5E,2F,2F,71), Hex(13,84,84,97)
"01010010001010010010100101111011", "11011101111000111110001100111110",
"01011110001011110010111101110001", "00010011100001001000010010010111",
- Hex(A6,53,53,F5), Hex(B9,D1,D1,68), Hex(00,00,00,00), Hex(C1,ED,ED,2C)
"10100110010100110101001111110101", "10111001110100011101000101101000",
"00000000000000000000000000000000", "11000001111011011110110100101100",
-- Hex(40,20,20,60), Hex(E3,FC,FC,1F), Hex(79,B1,B1,C8), Hex(B6,5B,5B,ED)
"01000000001000000010000001100000", "11100011111111001111110000011111",
"01111001101100011011000111001000", "10110110010110110101101111101101",
-- Hex(D4,6A,6A,BE), Hex(8D,CB,CB,46), Hex(67,BE,BE,D9), Hex(72,39,39,4B)
"11010100011010100110101010111110", "10001101110010111100101101000110",
"01100111101111101011111011011001", "01110010001110010011100101001011",
-- Hex(94,4A,4A,DE), Hex(98,4C,4C,D4), Hex(B0,58,58,E8), Hex(85,CF,CF,4A)
"10010100010010100100101011011110", "10011000010011000100110011010100",
"10110000010110000101100011101000", "10000101110011111100111101001010",
-- Hex(BB,D0,D0,6B), Hex(C5,EF,EF,2A), Hex(4F,AA,AA,E5), Hex(ED,FB,FB,16)
"10111011110100001101000001101011", "11000101111011111110111100101010",
"01001111101010101010101011100101", "11101101111110111111101100010110",
-- Hex(86,43,43,C5), Hex(9A,4D,4D,D7), Hex(66,33,33,55), Hex(11,85,85,94)
"10000110010000110100001111000101", "10011010010011010100110111010111",
"01100110001100110011001101010101", "00010001100001011000010110010100",
-- Hex(8A,45,45,CF), Hex(E9,F9,F9,10), Hex(04,02,02,06), Hex(FE,7F,7F,81)
"10001010010001010100010111001111", "11101001111110011111100100010000",
"00000100000000100000001000000110", "11111110011111110111111110000001",
-- Hex(A0,50,50,F0), Hex(78,3C,3C,44), Hex(25,9F,9F,BA), Hex(4B,A8,A8,E3)
"10100000010100000101000011110000", "01111000001111000011110001000100",
"00100101100111111001111110111010", "01001011101010001010100011100011",
- Hex(A2,51,51,F3), Hex(5D,A3,A3,FE), Hex(80,40,40,C0), Hex(05,8F,8F,8A)
"10100010010100010101000111110011", "01011101101000111010001111111110",
"10000000010000000100000011000000", "00000101100011111000111110001010",
- Hex(3F,92,92,AD), Hex(21,9D,9D,BC), Hex(70,38,38,48), Hex(F1,F5,F5,04)

"00111111100100101001001010101101", "00100001100111011001110110111100",
"01110000001110000011100001001000", "11110001111101011111010100000100",
-- Hex(63,BC,BC,DF), Hex(77,B6,B6,C1), Hex(AF,DA,DA,75), Hex(42,21,21,63)
"01100011101111001011110011011111", "01110111101101101011011011000001",
"10101111110110101101101001110101", "01000010001000010010000101100011",
- Hex(20,10,10,30), Hex(E5,FF,FF,1A), Hex(FD,F3,F3,0E), Hex(BF,D2,D2,6D)
"00100000000100000001000000110000", "11100101111111111111111100011010",
"11111101111100111111001100001110", "10111111110100101101001001101101",
-- Hex(81,CD,CD,4C), Hex(18,0C,0C,14), Hex(26,13,13,35), Hex(C3,EC,EC,2F)
"10000001110011011100110101001100", "00011000000011000000110000010100",
"00100110000100110001001100110101", "11000011111011001110110000101111",
- Hex(BE,5F,5F,E1), Hex(35,97,97,A2), Hex(88,44,44,CC), Hex(2E,17,17,39)
"10111110010111110101111111100001", "00110101100101111001011110100010",
"10001000010001000100010011001100", "00101110000101110001011100111001",
-- Hex(93,C4,C4,57), Hex(55,A7,A7,F2), Hex(FC,7E,7E,82), Hex(7A,3D,3D,47)
"10010011110001001100010001010111", "01010101101001111010011111110010",
"11111100011111100111111010000010", "01111010001111010011110101000111",
-- Hex(C8,64,64,AC), Hex(BA,5D,5D,E7), Hex(32,19,19,2B), Hex(E6,73,73,95)
"11001000011001000110010010101100", "10111010010111010101110111100111",
"00110010000110010001100100101011", "11100110011100110111001110010101",
- Hex(C0,60,60,A0), Hex(19,81,81,98), Hex(9E,4F,4F,D1), Hex(A3,DC,DC,7F)
"11000000011000000110000010100000", "00011001100000011000000110011000",
"10011110010011110100111111010001", "10100011110111001101110001111111",
-- Hex(44,22,22,66), Hex(54,2A,2A,7E), Hex(3B,90,90,AB), Hex(0B,88,88,83)
"01000100001000100010001001100110", "01010100001010100010101001111110",
"00111011100100001001000010101011", "00001011100010001000100010000011",
-- Hex(8C,46,46,CA), Hex(C7,EE,EE,29), Hex(6B,B8,B8,D3), Hex(28,14,14,3C)
"10001100010001100100011011001010", "11000111111011101110111000101001",
"01101011101110001011100011010011", "00101000000101000001010000111100",
-- Hex(A7,DE,DE,79), Hex(BC,5E,5E,E2), Hex(16,0B,0B,1D), Hex(AD,DB,DB,76)
"10100111110111101101111001111001", "10111100010111100101111011100010",
"00010110000010110000101100011101", "10101101110110111101101101110110",
-- Hex(DB,E0,E0,3B), Hex(64,32,32,56), Hex(74,3A,3A,4E), Hex(14,0A,0A,1E)
"11011011111000001110000000111011", "01100100001100100011001001010110",
"01110100001110100011101001001110", "00010100000010100000101000011110",
-- Hex(92,49,49,DB), Hex(0C,06,06,0A), Hex(48,24,24,6C), Hex(B8,5C,5C,E4)
"10010010010010010100100111011011", "00001100000001100000011000001010",
"01001000001001000010010001101100", "10111000010111000101110011100100",
-- Hex(9F,C2,C2,5D), Hex(BD,D3,D3,6E), Hex(43,AC,AC,EF), Hex(C4,62,62,A6)
"10011111110000101100001001011101", "10111101110100111101001101101110",
"01000011101011001010110011101111", "11000100011000100110001010100110",
-- Hex(39,91,91,A8), Hex(31,95,95,A4), Hex(D3,E4,E4,37), Hex(F2,79,79,8B)
"00111001100100011001000110101000", "00110001100101011001010110100100",
"11010011111001001110010000110111", "11110010011110010111100110001011",
-- Hex(D5,E7,E7,32), Hex(8B,C8,C8,43), Hex(6E,37,37,59), Hex(DA,6D,6D,B7)
"11010101111001111110011100110010", "10001011110010001100100001000011",
"01101110001101110011011101011001", "11011010011011010110110110110111",
- Hex(01,8D,8D,8C), Hex(B1,D5,D5,64), Hex(9C,4E,4E,D2), Hex(49,A9,A9,E0)
"00000001100011011000110110001100", "10110001110101011101010101100100",
"10011100010011100100111011010010", "01001001101010011010100111100000",
-- Hex(D8,6C,6C,B4), Hex(AC,56,56,FA), Hex(F3,F4,F4,07), Hex(CF,EA,EA,25)
"11011000011011000110110010110100", "10101100010101100101011011111010",
"11110011111101001111010000000111", "11001111111010101110101000100101",
-- Hex(CA,65,65,AF), Hex(F4,7A,7A,8E), Hex(47,AE,AE,E9), Hex(10,08,08,18)
"11001010011001010110010110101111", "11110100011110100111101010001110",
"01000111101011101010111011101001", "00010000000010000000100000011000",
-- Hex(6F,BA,BA,D5), Hex(F0,78,78,88), Hex(4A,25,25,6F), Hex(5C,2E,2E,72)
"01101111101110101011101011010101", "11110000011110000111100010001000",
"01001010001001010010010101101111", "01011100001011100010111001110010",
-- Hex(38,1C,1C,24), Hex(57,A6,A6,F1), Hex(73,B4,B4,C7), Hex(97,C6,C6,51)
"00111000000111000001110000100100", "01010111101001101010011011110001",
"01110011101101001011010011000111", "10010111110001101100011001010001",
- Hex(CB,E8,E8,23), Hex(A1,DD,DD,7C), Hex(E8,74,74,9C), Hex(3E,1F,1F,21)

"11001011111010001110100000100011", "10100001110111011101110101111100",
"11101000011101000111010010011100", "00111110000111110001111100100001",
-- Hex(96,4B,4B,DD), Hex(61,BD,BD,DC), Hex(0D,8B,8B,86), Hex(0F,8A,8A,85)
"10010110010010110100101111011101", "01100001101111011011110111011100",
"00001101100010111000101110000110", "00001111100010101000101010000101",
- Hex(E0,70,70,90), Hex(7C,3E,3E,42), Hex(71,B5,B5,C4), Hex(CC,66,66,AA)
"11100000011100000111000010010000", "01111100001111100011111001000010",
"01110001101101011011010111000100", "11001100011001100110011010101010",
-- Hex(90,48,48,D8), Hex(06,03,03,05), Hex(F7,F6,F6,01), Hex(1C,0E,0E,12)
"10010000010010000100100011011000", "00000110000000110000001100000101",
"11110111111101101111011000000001", "00011100000011100000111000010010",
- Hex(C2,61,61,A3), Hex(6A,35,35,5F), Hex(AE,57,57,F9), Hex(69,B9,B9,D0)
"11000010011000010110000110100011", "01101010001101010011010101011111",
"10101110010101110101011111111001", "01101001101110011011100111010000",
-- Hex(17,86,86,91), Hex(99,C1,C1,58), Hex(3A,1D,1D,27), Hex(27,9E,9E,B9)
"00010111100001101000011010010001", "10011001110000011100000101011000",
"00111010000111010001110100100111", "00100111100111101001111010111001",
-- Hex(D9,E1,E1,38), Hex(EB,F8,F8,13), Hex(2B,98,98,B3), Hex(22,11,11,33)
"11011001111000011110000100111000", "11101011111110001111100000010011",
"00101011100110001001100010110011", "00100010000100010001000100110011",
-- Hex(D2,69,69,BB), Hex(A9,D9,D9,70), Hex(07,8E,8E,89), Hex(33,94,94,A7)
"11010010011010010110100110111011", "10101001110110011101100101110000",
"00000111100011101000111010001001", "00110011100101001001010010100111",
-- Hex(2D,9B,9B,B6), Hex(3C,1E,1E,22), Hex(15,87,87,92), Hex(C9,E9,E9,20)
"00101101100110111001101110110110", "00111100000111100001111000100010",
"00010101100001111000011110010010", "11001001111010011110100100100000",
-- Hex(87,CE,CE,49), Hex(AA,55,55,FF), Hex(50,28,28,78), Hex(A5,DF,DF,7A)
"10000111110011101100111001001001", "10101010010101010101010111111111",
"01010000001010000010100001111000", "10100101110111111101111101111010",
-- Hex(03,8C,8C,8F), Hex(59,A1,A1,F8), Hex(09,89,89,80), Hex(1A,0D,0D,17)
"00000011100011001000110010001111", "01011001101000011010000111111000",
"00001001100010011000100110000000", "00011010000011010000110100010111",
- Hex(65,BF,BF,DA), Hex(D7,E6,E6,31), Hex(84,42,42,C6), Hex(D0,68,68,B8)
"01100101101111111011111111011010", "11010111111001101110011000110001",
"10000100010000100100001011000110", "11010000011010000110100010111000",
- Hex(82,41,41,C3), Hex(29,99,99,B0), Hex(5A,2D,2D,77), Hex(1E,0F,0F,11)
"10000010010000010100000111000011", "00101001100110011001100110110000",
"01011010001011010010110101110111", "00011110000011110000111100010001",
-- Hex(7B,B0,B0,CB), Hex(A8,54,54,FC), Hex(6D,BB,BB,D6), Hex(2C,16,16,3A)
"01111011101100001011000011001011", "10101000010101000101010011111100",
"01101101101110111011101111010110", "00101100000101100001011000111010"
);
variable r, s, t, u : u_sign32;
begin

r := ftabletab(to_integer(a(1 to 8)));
s := ftabletab(to_integer(a(9 to 16)));
t := ftabletab(to_integer(a(17 to 24)));
u := ftabletab(to_integer(a(25 to 32)));

q_out(1 to 32):= (r(1 to 8) xor s(25 to 32) xor t(17 to 24) xor u(9 to 16)) &
(r(9 to 16) xor s(1 to 8) xor t(25 to 32) xor u(17 to 24)) &
(r(17 to 24) xor s(9 to 16) xor t(1 to 8) xor u(25 to 32)) &
(r(25 to 32) xor s(17 to 24) xor t(9 to 16) xor u(1 to 8));
end ftable_quad;
end encryption_tables;

Figure D-15 VHDL package for 256-bit AES example
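
The Hex(...) comments in the forward table of Figure D-15 follow a regular pattern: each 32-bit entry packs the bytes (2·s, s, s, 3·s), where the products are taken in GF(2^8). The sketch below regenerates an entry from s so that individual table rows can be cross-checked. It assumes that s is the AES S-box output for the table index and that the reduction polynomial is the standard AES one (0x11B); the function names are illustrative and do not appear in the thesis code.

```python
def xtime(b: int) -> int:
    """Multiply a byte by 2 in GF(2^8), reducing by the AES polynomial 0x11B."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b & 0xFF


def ftable_entry(s: int) -> int:
    """Pack the bytes (2*s, s, s, 3*s) into a 32-bit word, first byte most significant."""
    two_s = xtime(s)
    three_s = two_s ^ s  # 3*s = 2*s xor s in GF(2^8)
    return (two_s << 24) | (s << 16) | (s << 8) | three_s


def ftable_bits(s: int) -> str:
    """Render an entry as the 32-character bit string used in the listing."""
    return format(ftable_entry(s), "032b")


if __name__ == "__main__":
    # Assumed S-box outputs for the first entries of the table.
    for s in (0x63, 0x7C, 0xF2):
        print(f"{s:02X}: {ftable_entry(s):08X} {ftable_bits(s)}")
```

For s = 0x63 this gives C66363A5, which matches the first comment and bit string of the table.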

-- *****************
-- 256-Bit AES --
-- *****************

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

use work.aes_procedures.all;
use work.encryption_tables.all;
entity aes256 is
port (
key, d_block: in u_sign32;
in_hs_rdy: in unsigned(0 downto 0);
in_hs_rcv: buffer unsigned(0 downto 0) := "0";
ciphertext: out u_sign32;
out_hs_rdy: buffer unsigned(0 downto 0) := "0";
out_hs_rcv: in unsigned(0 downto 0)
);
end aes256;

architecture behaviour of aes256 is

begin
process
variable bb, cc, dd, ee, ff, temp_vec1, temp_vec2: u_sign32;
--variable transition_state, temp_transition_state : tab_a4;
variable transition_state, temp_transition_state : u_sign128;
variable i: unsigned(6 downto 0) := "0000000";
variable j: unsigned(4 downto 0) := "00000";

variable fkey : tab_64;

-- moods ram
variable temp1, temp2, temp3, keyloop: unsigned(6 downto 0) := "0000000";
variable index1, index2, index3 : unsigned(1 downto 0) := "00";
begin

for keyloop1 in 0 to 7 loop

while(in_hs_rdy = in_hs_rcv) loop
wait for 10 ns;
end loop;
fkey(keyloop1) := key;
case keyloop1 is
when 0 =>
temp_transition_state(1 to 32) := d_block;
when 1 =>
temp_transition_state(33 to 64) := d_block;
when 2 =>
temp_transition_state(65 to 96) := d_block;
when 3 =>
temp_transition_state(97 to 128) := d_block;
when others => NULL;
end case;

in_hs_rcv <= not in_hs_rcv;

wait for 10 ns;
end loop;
-- For AES-256 (Nk=8, Nr=14)
-- For 256-bit: Nb * (Nr + 1) = 4 * 15 = 60 ("0111100")
-- For 192-bit: = 4 * 13 = 52 ("0110100")
-- For 128-bit: = 4 * 11 = 44 ("0101100")

i := "0001000"; -- start off with the value of Nk, in this case = 8

j := "00000"; -- round counter
while(i < "0111100") loop -- i < Nb * (Nr +1 )

temp1 := i - "0000001";
wait for 10 ns;
-- i mod Nk = 0 --
bb := fkey(to_integer(temp1(5 downto 0)));
r_oneto8(bb, dd); -- RotWord(w[i-1])
temp_vec1(1 to 32) := fbsub_quad(dd(1 to 32)); -- SubWord(RotWord(w[i-1]))
rco(j, cc); -- Rcon
temp1 := i - "0001000";
fkey(to_integer(i(5 downto 0))) := fkey(to_integer(temp1(5 downto 0))) xor temp_vec1(1 to 32) xor
cc; -- w[i-Nk] xor SubWord xor Rcon

keyloop := "0000001";
while keyloop /= "0000100" loop
-- moods unroll
if (keyloop + i < "0111100") then
temp1 := keyloop + i - "0001000";
temp2 := keyloop + i - "0000001";
temp3 := keyloop + i;
fkey(to_integer(temp3(5 downto 0))) := fkey(to_integer(temp1(5 downto 0))) xor
fkey(to_integer(temp2(5 downto 0))); -- w[i] = w[i-Nk] xor temp
end if;
keyloop := keyloop + "0000001";
end loop;

-- i mod Nk = 4 --
if(i + "0000100" < "0111100") then
temp1 := i + "0000011";
cc := fkey(to_integer(temp1(5 downto 0)));
temp_vec1(1 to 32):= fbsub_quad(cc(1 to 32)); --SubWord(RotWord(w[i-1]))

temp2 := i - "0000100";
temp3 := i + "0000100";
fkey(to_integer(temp3(5 downto 0))):= fkey(to_integer(temp2(5 downto 0))) xor
temp_vec1(1 to 32);
end if;

keyloop := "0000101";
while keyloop /= "0001000" loop
-- moods unroll
if(keyloop + i < "0111100") then
temp1 := keyloop + i - "0001000";
temp2 := keyloop + i - "0000001";
temp3 := keyloop + i;
fkey(to_integer(temp3(5 downto 0))) := fkey(to_integer(temp1(5 downto 0))) xor
fkey(to_integer(temp2(5 downto 0))); -- w[i] = w[i-Nk] xor temp
end if;
keyloop := keyloop + "0000001";
end loop;

i := i + "0001000"; -- increment by Nk
j := j + "00001";
end loop;
-- ======================== First Round ========================== --
transition_state(1 to 32) := fkey(0) xor temp_transition_state(1 to 32); -- AddRoundKey
transition_state(33 to 64) := fkey(1) xor temp_transition_state(33 to 64);
transition_state(65 to 96) := fkey(2) xor temp_transition_state(65 to 96);
transition_state(97 to 128) := fkey(3) xor temp_transition_state(97 to 128);
-- ************************************************************* --

i := "0000100"; -- start off with the 4th key, 3 used in the first round
for EncLoop2 in 1 to 13 loop -- For AES-256 (Nk=8, Nr=14)

for EncLoop3 in 0 to 3 loop

bb := fkey(to_integer(i(5 downto 0)));

case EncLoop3 is
when 0 =>
temp_vec1(1 to 32) := transition_state(1 to 8) & transition_state(41 to 48) &
transition_state(81 to 88) & transition_state(121 to 128);
ftable_quad(temp_vec1, cc); -- Retrieve values from Forward Tables
temp_transition_state(1 to 32) := bb(1 to 32) xor cc(1 to 32);
when 1 =>
temp_vec1(1 to 32) := transition_state(33 to 40) & transition_state(73 to 80) &
transition_state(113 to 120) & transition_state(25 to 32);
ftable_quad(temp_vec1, cc); -- Retrieve values from Forward Tables
temp_transition_state(33 to 64) := bb(1 to 32) xor cc(1 to 32);
when 2 =>
temp_vec1(1 to 32) := transition_state(65 to 72) & transition_state(105 to 112) &
transition_state(17 to 24) & transition_state(57 to 64);
ftable_quad(temp_vec1, cc); -- Retrieve values from Forward Tables
temp_transition_state(65 to 96) := bb(1 to 32) xor cc(1 to 32);
when 3 =>
temp_vec1(1 to 32) := transition_state(97 to 104) & transition_state(9 to 16) &
transition_state(49 to 56) & transition_state(89 to 96);
ftable_quad(temp_vec1, cc); -- Retrieve values from Forward Tables
temp_transition_state(97 to 128) := bb(1 to 32) xor cc(1 to 32);
when others => NULL;
end case;
i := i + "0000001";
end loop;

transition_state(1 to 128) := temp_transition_state(1 to 128);

end loop;

-- Last Round --
for EncLoop5 in 0 to 3 loop
bb := fkey(to_integer(i(5 downto 0)));

case EncLoop5 is
when 0 =>
temp_vec1(1 to 32) := transition_state(1 to 8) & transition_state(41 to 48) &
transition_state(81 to 88) & transition_state(121 to 128);
dd(1 to 32) := fbsub_quad( temp_vec1 ); -- w[i-1] = SubWord(w[i-1])
temp_transition_state(1 to 32) := bb(1 to 32) xor dd(1 to 32); -- AddRoundKey
when 1 =>
temp_vec1(1 to 32) := transition_state(33 to 40) & transition_state(73 to 80) &
transition_state(113 to 120) & transition_state(25 to 32);
dd(1 to 32) := fbsub_quad( temp_vec1 ); -- w[i-1] = SubWord(w[i-1])
temp_transition_state(33 to 64) := bb(1 to 32) xor dd(1 to 32); -- AddRoundKey
when 2 =>
temp_vec1(1 to 32) := transition_state(65 to 72) & transition_state(105 to 112) &
transition_state(17 to 24) & transition_state(57 to 64);
dd(1 to 32) := fbsub_quad( temp_vec1 ); -- w[i-1] = SubWord(w[i-1])
temp_transition_state(65 to 96) := bb(1 to 32) xor dd(1 to 32); -- AddRoundKey
when 3 =>
temp_vec1(1 to 32) := transition_state(97 to 104) & transition_state(9 to 16) &
transition_state(49 to 56) & transition_state(89 to 96);
dd(1 to 32) := fbsub_quad( temp_vec1 ); -- w[i-1] = SubWord(w[i-1])
temp_transition_state(97 to 128) := bb(1 to 32) xor dd(1 to 32); -- AddRoundKey
when others => NULL;
end case;
i := i + "0000001";
end loop;
for EncLoop6 in 0 to 3 loop

while(out_hs_rdy /= out_hs_rcv) loop
wait for 10 ns;
end loop;
case EncLoop6 is
when 0 =>
ciphertext <= temp_transition_state(1 to 32);
when 1 =>
ciphertext <= temp_transition_state(33 to 64);
when 2 =>
ciphertext <= temp_transition_state(65 to 96);
when 3 =>
ciphertext <= temp_transition_state(97 to 128);
when others => NULL;
end case;

out_hs_rdy <= not out_hs_rdy;

wait for 10 ns;
end loop;
end process;
end behaviour;

Figure D-16 VHDL of 256-Bit AES example
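
The input and output loops of Figure D-16 use a toggle (two-phase) handshake: a word is pending whenever the ready and receive flags differ, the sender toggles the ready flag after driving new data, and the receiver toggles the receive flag after taking it. The Python sketch below models only that flag protocol. The class and method names are hypothetical, and timing (the `wait for 10 ns` polling) is not modelled.

```python
class ToggleHandshake:
    """Software model of a one-word link guarded by two toggle flags."""

    def __init__(self) -> None:
        self.rdy = 0      # toggled by the sender when a new word is driven
        self.rcv = 0      # toggled by the receiver once the word is taken
        self.data = None

    def pending(self) -> bool:
        """A word is waiting whenever the two flags differ."""
        return self.rdy != self.rcv

    def send(self, word: int) -> bool:
        """Drive a word if the previous one was acknowledged; report success."""
        if self.pending():
            return False  # the VHDL would keep polling here
        self.data = word
        self.rdy ^= 1
        return True

    def receive(self):
        """Take the pending word and acknowledge it, or return None."""
        if not self.pending():
            return None   # the VHDL would keep polling here
        word = self.data
        self.rcv ^= 1
        return word


if __name__ == "__main__":
    link = ToggleHandshake()
    received = []
    for word in (0x00112233, 0x44556677, 0x8899AABB, 0xCCDDEEFF):
        assert link.send(word)
        received.append(link.receive())
    print([f"{w:08X}" for w in received])
```

Because each transfer needs only one flag transition per side, four 32-bit words can be passed back to back without returning the flags to zero between words.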

The post-MOODS synthesis simulation of the non-pipelined multi-FPGA 256-bit AES
example is given in Figure D-17. Zoomed-in views of the simulation showing the input and
output updates are given in Figure D-18. The simulation input values (shown in
hexadecimal) are taken from Appendix C.3 (AES-256) of the AES specification [146]:

Input plaintext (d_block) values: 00112233, 44556677, 8899AABB, CCDDEEFF.

Key: 00010203, 04050607, 08090A0B, 0C0D0E0F, 10111213, 14151617, 18191A1B,
1C1D1E1F.

The output ciphertext is: 8EA2B7CA, 516745BF, EAFC4990, 4B496089.

With a system clock period of 200 ns, the non-pipelined multi-FPGA 256-bit AES takes
5257 clock cycles (i.e. clock cycles = (1055500 ns - 4100 ns) / 200 ns) to process the 128-
bit data block using a 256-bit cipher key.
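
The cycle count quoted above is the elapsed simulation time divided by the clock period. A small Python sketch of that arithmetic follows; the function name is illustrative, and the numbers are the ones given in the text.

```python
def clock_cycles(t_end_ns: int, t_start_ns: int, period_ns: int) -> int:
    """Return the number of whole clock periods between two simulation times."""
    elapsed = t_end_ns - t_start_ns
    if elapsed % period_ns != 0:
        raise ValueError("elapsed time is not a whole number of clock periods")
    return elapsed // period_ns


if __name__ == "__main__":
    # Non-pipelined 256-bit AES core: (1055500 ns - 4100 ns) / 200 ns
    print(clock_cycles(1_055_500, 4_100, 200))
```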
Figure D-17 Simulation of the non-pipelined multi-FPGA 256-bit AES core

Figure D-18 Simulation (zoom in views) of the non-pipelined multi-FPGA 256-bit AES core

D.2 Behavioural pipelined VHDL examples

The three behavioural pipelined VHDL examples given in this section are used in the
experiments (with explicit communication channels) described in Section 6.3. The
VHDL packages containing the definitions of constants, types, signals, functions, and
procedures are similar to those of the non-pipelined implementations and can be found in the
previous section. The explicit communication channel VHDL package used by all the
pipelined VHDL examples in this section is given in Figure D-19.

library ieee;
use ieee.std_logic_1164.all;
package channel_package is
subtype semaphore is std_logic_vector(0 downto 0);
subtype int8 is integer range 0 to 255;
subtype channel_sem is std_logic_vector(0 downto 0);
subtype channel_ack is std_logic_vector(0 downto 0);

-- initialise channel semaphore

procedure init(signal sem: out channel_sem); -- channel semaphore
-- send data
procedure send(signal sem: out channel_sem; -- channel send semaphore
signal ack: in channel_ack; -- channel send acknowledge
signal chan_data: out std_logic_vector; -- channel send data
d: in std_logic_vector); -- data to send
-- recv data
procedure recv(signal sem: out channel_sem; -- channel receive semaphore
signal ack: in channel_ack; -- channel receive acknowledge
signal chan_data: in std_logic_vector; -- channel receive data
d: out std_logic_vector); -- data received

function ch_send(d: std_logic_vector; signal chan_sem_in: semaphore; signal chan_sem_out:
semaphore) return std_logic_vector;
-- moods map ch_send u:* u:1 u:1 u:%1

function ch_recv(signal chan_data: std_logic_vector; signal chan_sem_in: semaphore; signal
chan_sem_out: semaphore) return std_logic_vector;
-- moods map ch_recv u:* u:1 u:1 u:%1

function ch_init(signal chan_sem_in: semaphore) return semaphore;
-- moods map ch_init u:1 u:1

-- channel component
component channel
generic (width: positive := 1); -- width of channel data
port (send_sem: in channel_sem; -- send semaphore

procedure send(signal sem: out channel_sem; signal ack: in channel_ack; signal chan_data: out
std_logic_vector; d: in std_logic_vector) is
-- moods inline
begin
chan_data <= ch_send(d,ack,sem);
end procedure send;

procedure recv(signal sem: out channel_sem; signal ack: in channel_ack; signal chan_data: in
std_logic_vector; d: out std_logic_vector) is
-- moods inline
begin
d := ch_recv(chan_data, ack, sem);
end;

procedure init(signal sem: out channel_sem) is

-- moods inline
variable init_sig : channel_sem := "0";
begin
--sem <= ch_init("0");
sem <= init_sig;
end;

function ch_send(d: std_logic_vector; signal chan_sem_in: semaphore; signal chan_sem_out:
semaphore) return std_logic_vector is
-- moods map ch_send u:* u:1 u:1 u:%1
begin
return d;
end;

function ch_recv(signal chan_data: std_logic_vector; signal chan_sem_in: semaphore; signal
chan_sem_out: semaphore) return std_logic_vector is
-- moods map ch_recv u:* u:1 u:1 u:%1
begin
return chan_data;
end;
function ch_init(signal chan_sem_in: semaphore) return semaphore is
-- moods map ch_init u:1 u:1
begin
return "0";
end;
end package body channel_package;

Figure D-19 VHDL package of the explicit communication channel

D.2.1 Pipelined quadratic equation solver

The pipelined quadratic equation solver is a two-stage pipelined version of the quadratic
equation solver given in Section 6.2.1. The behavioural VHDL of the pipelined quadratic
equation solver example is given in Figure D-20.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.c_types.all;
use work.algeqn_package.all;
use work.imath.all;
use work.channel_package.all;

entity pipe_quad is
port (
a1,a2,a3: in int;
x1,x2: out int;
no_real: out int
);
end pipe_quad;
architecture behaviour of pipe_quad is
signal c1_send_sem, c1_recv_sem: channel_sem := "0";
signal c1_send_ack, c1_recv_ack: channel_ack := "0";
signal c1_send_data, c1_recv_data: std_logic_vector(95 downto 0) := (others=>'0');
begin
-- Explicit communication channel
c1: entity work.SIMPLE_CHANNEL generic map (96)
port map(c1_send_sem, c1_recv_sem, c1_send_data, c1_send_ack, c1_recv_ack, c1_recv_data);

Prs_1: process -- Process module p_MOD_1

variable temp1 : std_logic_vector(95 downto 0);
variable b1: int := X"00000000";
variable b2: int := X"00000000";
variable b3: int := X"00000000";
variable d1,d2: int;
begin
init(c1_send_sem);
forever: loop
b1 := a1;
b2 := a2;
b3 := a3;

d1 := sqi(b2) - multi(multi(to_int(4),b1),b3);
d2 := multi(b1,to_int(2));
temp1 := std_logic_vector(b2 & d2 & d1);
send(c1_send_sem, c1_send_ack, c1_send_data, temp1);
wait for 40 ns;
end loop;
end process Prs_1;

Prs_2: process -- Process module p_MOD_2

variable temp2 : std_logic_vector(95 downto 0);
variable e1: int := X"00000000";
variable e2: int := X"00000000";
variable rd: int;
variable f1: int;
begin
init(c1_recv_sem);
forever: loop
recv(c1_recv_sem, c1_recv_ack, c1_recv_data, temp2);
e1 := int(temp2(31 downto 0));
e2 := int(temp2(63 downto 32));
f1 := int(temp2(95 downto 64));

if(e1 < 0) then

no_real <= to_int(0);
else
rd := sqrti(e1);
x1 <= sdivi((-f1 + rd),e2);
x2 <= sdivi((-f1 - rd),e2);
no_real <= to_int(2);
end if;
wait for 40 ns;
end loop;
end process Prs_2;
end behaviour;

Figure D-20 VHDL of pipelined quadratic equation solver

Figure D-21 shows the post-MOODS synthesis simulation of the two-stage pipelined
multi-FPGA quadratic equation solver. This two-device multi-FPGA implementation has a
single explicit communication channel (ExC 1) connecting the pipeline stages. Integer
inputs a1, a2, and a3 of the quadratic equation solver are given the values 1, -25 and 150
respectively. Outputs x1, x2 and the number of real roots (no_real) are updated after 7660
ns. With a system clock period of 40 ns, the pipelined multi-FPGA quadratic equation
solver takes 189 clock cycles (i.e. clock cycles = (7660 ns - 100 ns) / 40 ns) to complete the
application and output the result.
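
The two pipeline stages of Figure D-20 can be checked against the simulation inputs with a small functional model. The Python sketch below mirrors the data passed over the channel (b2, d2 and d1 packed in that order) but not the channel or its timing. It assumes that `sqi` squares its argument, that `sqrti` is an integer square root and that `sdivi` truncates toward zero; these behaviours are not shown in this section, and the Python function names are illustrative.

```python
from math import isqrt


def sdiv(n: int, d: int) -> int:
    """Signed integer division truncating toward zero (assumed sdivi behaviour)."""
    q = abs(n) // abs(d)
    return q if (n < 0) == (d < 0) else -q


def stage1(a1: int, a2: int, a3: int):
    """First stage: discriminant and denominator, packed as (b2, d2, d1)."""
    d1 = a2 * a2 - 4 * a1 * a3   # discriminant
    d2 = 2 * a1                  # denominator
    return a2, d2, d1


def stage2(f1: int, e2: int, e1: int):
    """Second stage: roots from the unpacked channel word."""
    if e1 < 0:
        # The VHDL leaves x1 and x2 unchanged here; None marks "no value".
        return None, None, 0
    rd = isqrt(e1)
    return sdiv(-f1 + rd, e2), sdiv(-f1 - rd, e2), 2


def solve(a1: int, a2: int, a3: int):
    return stage2(*stage1(a1, a2, a3))


if __name__ == "__main__":
    print(solve(1, -25, 150))
```

With the simulation inputs 1, -25 and 150 the discriminant is 25, so the model returns the roots 15 and 10 with no_real = 2.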
Figure D-21 Simulation of the pipelined multi-FPGA quadratic equation solver

D.2.2 Pipelined inverse discrete cosine transform

This second pipelined VHDL example is the two-stage pipelined version of the inverse
discrete cosine transform (IDCT) core given in Section 6.2.3. The behavioural VHDL of
the pipelined inverse discrete cosine transform example is given in Figure D-22.

library IEEE;
use IEEE.std_logic_1164.all;
use IEEE.numeric_std.all;
use work.channel_package.all;
use work.idct_package.all;
entity pipe_idct is
port (
  in_hs_rdy: in unsigned(0 downto 0); -- Handshake ready
  in_hs_rcv: buffer unsigned(0 downto 0) := "0"; -- Handshake receive
  dct_2d_in: in signed(11 downto 0);
  idct_out: out signed(7 downto 0) := (others=>'0'); -- 8 bit output.
  out_hs_rdy: buffer unsigned(0 downto 0) := "0"; -- Handshake ready
  out_hs_rcv: in unsigned(0 downto 0); -- Handshake receive
  sys_clock: in unsigned(0 downto 0);
  -- moods clock
  sys_reset: in unsigned(0 downto 0)
  -- moods reset
);
end pipe_idct;

ARCHITECTURE behaviour of pipe_idct is

signal c1_send_sem, c1_recv_sem: channel_sem := "0";
signal c1_send_ack, c1_recv_ack: channel_ack := "0";
signal c1_send_data, c1_recv_data: std_logic_vector(10 downto 0) := (others=>'0');
-- memory section
type RAM_mem_type is array (0 to 63) of signed(10 downto 0);
begin

c1: entity work.SIMPLE_CHANNEL generic map (11) port map(c1_send_sem, c1_recv_sem,
    c1_send_data, c1_send_ack, c1_recv_ack, c1_recv_data);

Prs_1: process -- Process module p_MOD_1
  -- IDCT_2 signals
  variable xa0_reg, xa1_reg, xa2_reg, xa3_reg,
           xa4_reg, xa5_reg, xa6_reg, xa7_reg: signed(11 downto 0) := (others=>'0');
  variable ID_input_cnt: unsigned(3 downto 0) := "0000";
  variable z_out_int: signed(21 downto 0) := (others=>'0');
  variable temp1 : std_logic_vector(10 downto 0);
  variable cnt_64: unsigned(6 downto 0) := "0000000";
  variable ID_index_i: unsigned(3 downto 0) := "0000";
begin
  reset_loop: loop
    in_hs_rcv <= "0";
    ID_input_cnt(3 downto 0) := "0000";
    cnt_64 := "0000000";
    ID_index_i := "0000";
    wait until sys_clock'event and sys_clock = "1";
    exit reset_loop when sys_reset = "1";
    init(c1_send_sem);
    main_loop: loop
      while(cnt_64(6) = '0') loop
        while(ID_input_cnt(3) = '0') loop
          while(in_hs_rdy = in_hs_rcv) loop
            wait until sys_clock'event and sys_clock = "1";
          end loop;
          case ID_input_cnt(2 downto 0) is
            when "000" => xa0_reg := dct_2d_in;
            when "001" => xa1_reg := dct_2d_in;
            when "010" => xa2_reg := dct_2d_in;
            when "011" => xa3_reg := dct_2d_in;
            when "100" => xa4_reg := dct_2d_in;
            when "101" => xa5_reg := dct_2d_in;
            when "110" => xa6_reg := dct_2d_in;
            when "111" => xa7_reg := dct_2d_in;
            when others => NULL;
          end case;

          in_hs_rcv <= not in_hs_rcv;

          ID_input_cnt(3 downto 0) := ID_input_cnt(3 downto 0) + "0001";
          wait until sys_clock'event and sys_clock = "1";
        end loop;

        while (ID_index_i /= "1000") loop
          idct1_mult_add(ID_index_i(2 downto 0),xa0_reg,xa1_reg,xa2_reg,
                         xa3_reg,xa4_reg,xa5_reg,xa6_reg,xa7_reg,z_out_int);

          ID_index_i := ID_index_i + "0001";

          if(z_out_int(20) = '0' and z_out_int(7) = '1') then
            temp1 := std_logic_vector(z_out_int(18 downto 8) + to_signed(1,11));
          else
            temp1 := std_logic_vector(z_out_int(18 downto 8));
          end if;

          send(c1_send_sem, c1_send_ack, c1_send_data, temp1);

          -- wait until sys_clock'event and sys_clock = "1";
        end loop;
        ID_index_i := "0000";
        cnt_64 := cnt_64 + "0000001";
      end loop;
      ID_input_cnt(3 downto 0) := "0000";
      cnt_64 := "0000000";
      wait until sys_clock'event and sys_clock = "1";
      exit reset_loop when sys_reset = "1";
    end loop;
  end loop;
end process Prs_1;

Prs_2: process -- Process module p_MOD_2
  -- IDCT_2 signals
  variable xb0_reg, xb1_reg, xb2_reg, xb3_reg,
           xb4_reg, xb5_reg, xb6_reg, xb7_reg: signed(10 downto 0) := (others=>'0');
  variable temp2 : std_logic_vector(10 downto 0);
  variable rcv_z_out: signed(10 downto 0) := (others=>'0');
  variable ID_wr_cntr: unsigned(6 downto 0) := (others=>'0');
  variable ID_rd_cntr: unsigned(3 downto 0) := (others=>'0');
  variable ID_index_i: unsigned(3 downto 0) := "0000";
  variable idct2d_int: signed(20 downto 0) := (others=>'0');
  variable ID_ram1_mem: RAM_mem_type;
  -- moods ram
begin
  reset_loop: loop
    ID_wr_cntr := "0000000";
    ID_rd_cntr := "0000";
    out_hs_rdy <= "0";
    idct2d_int := (others=>'0');
    ID_index_i := "0000";
    wait until sys_clock'event and sys_clock = "1";
    exit reset_loop when sys_reset = "1";
    init(c1_recv_sem);
    main_loop: loop

      if(ID_wr_cntr(6) = '0') then
        recv(c1_recv_sem, c1_recv_ack, c1_recv_data, temp2);
        rcv_z_out := signed(temp2);
        ID_ram1_mem(to_integer(ID_wr_cntr(5 downto 0))) := rcv_z_out;
        ID_wr_cntr := ID_wr_cntr + "0000001";
      else
        while(ID_rd_cntr(3) = '0') loop

          case ID_rd_cntr(2 downto 0) is
            when "000" => xb0_reg := ID_ram1_mem(0);
                          xb1_reg := ID_ram1_mem(8);
                          xb2_reg := ID_ram1_mem(16);
                          xb3_reg := ID_ram1_mem(24);
                          xb4_reg := ID_ram1_mem(32);
                          xb5_reg := ID_ram1_mem(40);
                          xb6_reg := ID_ram1_mem(48);
                          xb7_reg := ID_ram1_mem(56);
            when "001" => xb0_reg := ID_ram1_mem(1);
                          xb1_reg := ID_ram1_mem(9);
                          xb2_reg := ID_ram1_mem(17);
                          xb3_reg := ID_ram1_mem(25);
                          xb4_reg := ID_ram1_mem(33);
                          xb5_reg := ID_ram1_mem(41);
                          xb6_reg := ID_ram1_mem(49);
                          xb7_reg := ID_ram1_mem(57);
            when "010" => xb0_reg := ID_ram1_mem(2);
                          xb1_reg := ID_ram1_mem(10);
                          xb2_reg := ID_ram1_mem(18);
                          xb3_reg := ID_ram1_mem(26);
                          xb4_reg := ID_ram1_mem(34);
                          xb5_reg := ID_ram1_mem(42);
                          xb6_reg := ID_ram1_mem(50);
                          xb7_reg := ID_ram1_mem(58);
            when "011" => xb0_reg := ID_ram1_mem(3);
                          xb1_reg := ID_ram1_mem(11);
                          xb2_reg := ID_ram1_mem(19);
                          xb3_reg := ID_ram1_mem(27);
                          xb4_reg := ID_ram1_mem(35);
                          xb5_reg := ID_ram1_mem(43);
                          xb6_reg := ID_ram1_mem(51);
                          xb7_reg := ID_ram1_mem(59);
            when "100" => xb0_reg := ID_ram1_mem(4);
                          xb1_reg := ID_ram1_mem(12);
                          xb2_reg := ID_ram1_mem(20);
                          xb3_reg := ID_ram1_mem(28);
                          xb4_reg := ID_ram1_mem(36);
                          xb5_reg := ID_ram1_mem(44);
                          xb6_reg := ID_ram1_mem(52);
                          xb7_reg := ID_ram1_mem(60);
            when "101" => xb0_reg := ID_ram1_mem(5);
                          xb1_reg := ID_ram1_mem(13);
                          xb2_reg := ID_ram1_mem(21);
                          xb3_reg := ID_ram1_mem(29);
                          xb4_reg := ID_ram1_mem(37);
                          xb5_reg := ID_ram1_mem(45);
                          xb6_reg := ID_ram1_mem(53);
                          xb7_reg := ID_ram1_mem(61);
            when "110" => xb0_reg := ID_ram1_mem(6);
                          xb1_reg := ID_ram1_mem(14);
                          xb2_reg := ID_ram1_mem(22);
                          xb3_reg := ID_ram1_mem(30);
                          xb4_reg := ID_ram1_mem(38);
                          xb5_reg := ID_ram1_mem(46);
                          xb6_reg := ID_ram1_mem(54);
                          xb7_reg := ID_ram1_mem(62);
            when "111" => xb0_reg := ID_ram1_mem(7);
                          xb1_reg := ID_ram1_mem(15);
                          xb2_reg := ID_ram1_mem(23);
                          xb3_reg := ID_ram1_mem(31);
                          xb4_reg := ID_ram1_mem(39);
                          xb5_reg := ID_ram1_mem(47);
                          xb6_reg := ID_ram1_mem(55);
                          xb7_reg := ID_ram1_mem(63);
            when others => NULL;
          end case;

          ID_rd_cntr(3 downto 0) := ID_rd_cntr(3 downto 0) + "0001";

          while (ID_index_i /= "1000") loop
            idct2_mult_add(ID_index_i(2 downto 0),xb0_reg,xb1_reg,xb2_reg,xb3_reg,
                           xb4_reg,xb5_reg,xb6_reg,xb7_reg,idct2d_int);

            while(out_hs_rdy /= out_hs_rcv) loop
              wait until sys_clock'event and sys_clock = "1";
            end loop;
            idct_out <= signed(idct2d_int(15 downto 8));
            out_hs_rdy <= not out_hs_rdy;
            ID_index_i := ID_index_i + "0001";
          end loop;
          wait until sys_clock'event and sys_clock = "1";
        end loop;
        ID_index_i := "0000";
        ID_wr_cntr(6 downto 0) := (others=>'0');
        ID_rd_cntr(3 downto 0) := (others=>'0');
      end if;

      wait until sys_clock'event and sys_clock = "1";

      exit reset_loop when sys_reset = "1";
    end loop;
  end loop;
end process Prs_2;
end behaviour;

Figure D-22 VHDL of pipelined inverse discrete cosine transform example

The post-MOODS synthesis simulation of the 2-stage pipelined multi-FPGA IDCT is
given in Figure D-23. Zoomed-in views of the simulation showing input and output
updates are given in Figure D-24. The pipelined multi-FPGA IDCT has a single explicit
communication channel (ExC 1) connecting the pipeline stages. With a system clock
period of 40 ns, the pipelined multi-FPGA IDCT takes 1167 clock cycles (i.e. clock cycles
= (47160 ns - 480 ns) / 40 ns) to complete the application.
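
The second stage in Figure D-22 stores the 64 received values in row order and then reads them back one column at a time, which is the transposition step of a row-column IDCT. The short Python sketch below generates the RAM addresses that each branch of the case statement reads; the function name `column_addresses` is hypothetical and the sketch only illustrates the addressing pattern, not the MOODS implementation.

```python
def column_addresses(column, size=8):
    """Addresses of one column in a size x size block stored row by row."""
    if not 0 <= column < size:
        raise ValueError("column index out of range")
    # Element (row, column) lives at address row * size + column.
    return [row * size + column for row in range(size)]


for column in range(8):
    print(column, column_addresses(column))
```

Each column starts at its own index and advances by the row length of 8, so the last column read covers addresses 7, 15, up to 63.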

Figure D-23 Simulation of the pipelined multi-FPGA IDCT example

Figure D-24 Simulation (zoom in views) of the pipelined multi-FPGA IDCT example

D.2.3 Pipelined 256-bit advanced encryption standard

The last pipelined VHDL example is the two-stage pipelined version of the 256-bit
advanced encryption standard (AES) core given in Section 6.2.5. The behavioural VHDL
of the pipelined 256-bit AES core is given in Figure D-25.

library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.channel_package.all;
use work.aes_procedures.all;
use work.encryption_tables.all;
entity pipe_aes256 is
port(
  key, d_block: in u_sign32;
  in_hs_rdy: in unsigned(0 downto 0);
  in_hs_rcv: buffer unsigned(0 downto 0) := "0";
  ciphertext: out u_sign32;
  out_hs_rdy: buffer unsigned(0 downto 0) := "0";
  out_hs_rcv: in unsigned(0 downto 0)
);

end pipe_aes256;

architecture behaviour of pipe_aes256 is

signal c1_send_sem, c1_recv_sem: channel_sem := "0";
signal c1_send_ack, c1_recv_ack: channel_ack := "0";
signal c1_send_data, c1_recv_data: std_logic_vector(31 downto 0) := (others=>'0');
signal c2_send_sem, c2_recv_sem: channel_sem := "0";
signal c2_send_ack, c2_recv_ack: channel_ack := "0";
signal c2_send_data, c2_recv_data: std_logic_vector(31 downto 0) := (others=>'0');
begin

c1: entity work.SIMPLE_CHANNEL generic map (32) port map
    (c1_send_sem, c1_recv_sem, c1_send_data, c1_send_ack, c1_recv_ack, c1_recv_data);

c2: entity work.SIMPLE_CHANNEL generic map (32) port map
    (c2_send_sem, c2_recv_sem, c2_send_data, c2_send_ack, c2_recv_ack, c2_recv_data);

Prs_1: process -- Process module p_MOD_2
  variable bb1, cc1, dd1, temp_vec1: u_sign32;
  variable temp_t_state : u_sign128;
  variable fkey : tab_64;
  -- moods ram
  variable temp1, temp3: std_logic_vector(31 downto 0);
  variable i: unsigned(6 downto 0) := "0000000"; -- loop counters
  variable j: unsigned(4 downto 0) := "00000"; -- loop counter
  variable temp_a1, temp_a2, temp_a3, keyloop: unsigned(6 downto 0) := "0000000";
begin
  init(c1_send_sem);
  init(c2_send_sem);
  forever: loop
    for loopcnt1 in 0 to 7 loop
      while(in_hs_rdy = in_hs_rcv) loop
        wait for 10 ns;
      end loop;
      fkey(loopcnt1) := key;
      case loopcnt1 is
        when 0 => temp_t_state(1 to 32) := d_block;
        when 1 => temp_t_state(33 to 64) := d_block;
        when 2 => temp_t_state(65 to 96) := d_block;
        when 3 => temp_t_state(97 to 128) := d_block;
        when others => NULL;
      end case;

      in_hs_rcv <= not in_hs_rcv;

      wait for 10 ns;
    end loop;
    -- For AES-256 (Nk=8, Nr=14)
    -- For 256-bit Nb*(Nr+1) = 4 * 15 = 60 ("0111100")
    -- For 192-bit = 4 * 13 = 52 ("0110100")
    -- For 128-bit = 4 * 11 = 44 ("0101100")

    i := "0001000"; -- start off with the value of Nk, in this case = 8
    j := "00000"; -- round counter

    while(i < "0111100") loop -- i < Nb * (Nr + 1)
      temp_a1 := i - "0000001";
      wait for 10 ns;
      -- i mod Nk = 0
      bb1 := fkey(to_integer(temp_a1(5 downto 0)));
      r_oneto8(bb1, dd1); -- RotWord(w[i-1])
      temp_vec1(1 to 32) := fbsub_quad1(dd1(1 to 32)); -- SubWord(RotWord(w[i-1]))
      rco(j, cc1); -- Rcon
      temp_a1 := i - "0001000";
      fkey(to_integer(i(5 downto 0))) := fkey(to_integer(temp_a1(5 downto 0))) xor
          temp_vec1(1 to 32) xor cc1; -- w[i-Nk] xor SubWord xor Rcon

      keyloop := "0000001";
      while keyloop /= "0000100" loop
        -- moods unroll
        if (keyloop + i < "0111100") then
          temp_a1 := keyloop + i - "0001000";
          temp_a2 := keyloop + i - "0000001";
          temp_a3 := keyloop + i;
          fkey(to_integer(temp_a3(5 downto 0))) := fkey(to_integer(temp_a1(5 downto 0))) xor
              fkey(to_integer(temp_a2(5 downto 0))); -- w[i] = w[i-Nk] xor temp
        end if;
        keyloop := keyloop + "0000001";
      end loop;

      -- i mod Nk = 4
      if(i + "0000100" < "0111100") then
        temp_a1 := i + "0000011";
        cc1 := fkey(to_integer(temp_a1(5 downto 0)));
        temp_vec1(1 to 32) := fbsub_quad1(cc1(1 to 32)); -- SubWord(w[i-1]), no rotation in this branch

        temp_a2 := i - "0000100";
        temp_a3 := i + "0000100";
        fkey(to_integer(temp_a3(5 downto 0))) := fkey(to_integer(temp_a2(5 downto 0))) xor
            temp_vec1(1 to 32); -- fkey
      end if;

      keyloop := "0000101";
      while keyloop /= "0001000" loop
        -- moods unroll
        if(keyloop + i < "0111100") then
          temp_a1 := keyloop + i - "0001000";
          temp_a2 := keyloop + i - "0000001";
          temp_a3 := keyloop + i;
          fkey(to_integer(temp_a3(5 downto 0))) := fkey(to_integer(temp_a1(5 downto 0))) xor
              fkey(to_integer(temp_a2(5 downto 0))); -- w[i] = w[i-Nk] xor temp
        end if;
        keyloop := keyloop + "0000001";
      end loop;

      i := i + "0001000"; -- increment by Nk
      j := j + "00001";
    end loop;

    for loopcnt2 in 0 to 3 loop
      case loopcnt2 is
        when 0 => temp1 := std_logic_vector(fkey(0) xor temp_t_state(1 to 32));
        when 1 => temp1 := std_logic_vector(fkey(1) xor temp_t_state(33 to 64));
        when 2 => temp1 := std_logic_vector(fkey(2) xor temp_t_state(65 to 96));
        when 3 => temp1 := std_logic_vector(fkey(3) xor temp_t_state(97 to 128));
      end case;
      send(c1_send_sem, c1_send_ack, c1_send_data, temp1);
    end loop;

    i := "0000100"; -- start off with the 4th key, Keys 0 to 3 used in the first round
    for EncLoop1 in 1 to 14 loop -- For AES-256 (Nk=8, Nr=14)
      for EncLoop2 in 0 to 3 loop
        bb1 := fkey(to_integer(i(5 downto 0))); -- fkey
        temp3 := std_logic_vector(bb1);
        send(c2_send_sem, c2_send_ack, c2_send_data, temp3);
        i := i + "0000001";
        wait for 10 ns;
      end loop;
    end loop;

  end loop forever;

end process Prs_1;
--======================= ENCRYPTION ==============================--
Prs_2: process -- Process module p_MOD_3
  variable bb2, cc2, dd2, ee2, temp_vec2: u_sign32;
  variable transition_state, temp_transition_state : u_sign128;
  variable temp2, temp4: std_logic_vector(31 downto 0);
begin
  init(c1_recv_sem);
  init(c2_recv_sem);
  forever: loop

    for loopcnt3 in 0 to 3 loop
      recv(c1_recv_sem, c1_recv_ack, c1_recv_data, temp2);
      case loopcnt3 is
        when 0 => transition_state(1 to 32) := unsigned(temp2);
        when 1 => transition_state(33 to 64) := unsigned(temp2);
        when 2 => transition_state(65 to 96) := unsigned(temp2);
        when 3 => transition_state(97 to 128) := unsigned(temp2);
        when others => NULL;
      end case;
    end loop;

    for EncLoop3 in 1 to 14 loop -- For AES-256 (Nk=8, Nr=14)

      for EncLoop4 in 0 to 3 loop

        recv(c2_recv_sem, c2_recv_ack, c2_recv_data, temp4);
        bb2 := unsigned(temp4); -- fkey

        case EncLoop4 is
          when 0 =>
            temp_vec2(1 to 32) := transition_state(1 to 8) & transition_state(41 to 48) &
                transition_state(81 to 88) & transition_state(121 to 128);
            if(EncLoop3 = 14) then
              dd2(1 to 32) := fbsub_quad2( temp_vec2 ); -- w[i-1] = SubWord(w[i-1])
              ee2 := dd2;
            else
              ftable_quad(temp_vec2, cc2 ); -- Retrieve values from Forward Tables
              ee2 := cc2;
            end if;
            temp_transition_state(1 to 32) := bb2(1 to 32) xor ee2(1 to 32);
          when 1 =>
            temp_vec2(1 to 32) := transition_state(33 to 40) & transition_state(73 to 80) &
                transition_state(113 to 120) & transition_state(25 to 32);
            if(EncLoop3 = 14) then
              dd2(1 to 32) := fbsub_quad2( temp_vec2 ); -- w[i-1] = SubWord(w[i-1])
              ee2 := dd2;
            else
              ftable_quad(temp_vec2, cc2 ); -- Retrieve values from Forward Tables
              ee2 := cc2;
            end if;
            temp_transition_state(33 to 64) := bb2(1 to 32) xor ee2(1 to 32);
          when 2 =>
            temp_vec2(1 to 32) := transition_state(65 to 72) & transition_state(105 to 112) &
                transition_state(17 to 24) & transition_state(57 to 64);
            if(EncLoop3 = 14) then
              dd2(1 to 32) := fbsub_quad2( temp_vec2 ); -- w[i-1] = SubWord(w[i-1])
              ee2 := dd2;
            else
              ftable_quad(temp_vec2, cc2 ); -- Retrieve values from Forward Tables
              ee2 := cc2;
            end if;
            temp_transition_state(65 to 96) := bb2(1 to 32) xor ee2(1 to 32);
          when 3 =>
            temp_vec2(1 to 32) := transition_state(97 to 104) & transition_state(9 to 16) &
                transition_state(49 to 56) & transition_state(89 to 96);
            if(EncLoop3 = 14) then
              dd2(1 to 32) := fbsub_quad2( temp_vec2 ); -- w[i-1] = SubWord(w[i-1])
              ee2 := dd2;
            else
              ftable_quad(temp_vec2, cc2 ); -- Retrieve values from Forward Tables
              ee2 := cc2;
            end if;
            temp_transition_state(97 to 128) := bb2(1 to 32) xor ee2(1 to 32);
          when others => NULL;
        end case;
      end loop; -- for EncLoop4 in 0 to 3 loop
      transition_state(1 to 128) := temp_transition_state(1 to 128);

    end loop; -- for EncLoop3 in 1 to 14 loop

    for loopcnt4 in 0 to 3 loop

      while(out_hs_rdy /= out_hs_rcv) loop
        wait for 10 ns;
      end loop;
      case loopcnt4 is
        when 0 => ciphertext <= transition_state(1 to 32);
        when 1 => ciphertext <= transition_state(33 to 64);
        when 2 => ciphertext <= transition_state(65 to 96);
        when 3 => ciphertext <= transition_state(97 to 128);
        when others => NULL;
      end case;
      out_hs_rdy <= not out_hs_rdy;
      wait for 10 ns;
    end loop; -- for loopcnt4 in 0 to 3 loop
  end loop forever;
end process Prs_2;
end behaviour;

Figure D-25 VHDL of pipelined 256-bit advanced encryption standard example

The post-MOODS synthesis simulation of the pipelined multi-FPGA 256-bit AES
example is given in Figure D-26. Zoomed-in views of the simulation showing input and
output updates are given in Figure D-27. This 3-device multi-FPGA implementation has
two explicit communication channels (ExC 1 and ExC 2) connecting the pipeline stages.
With a system clock period of 200 ns, the pipelined multi-FPGA 256-bit AES takes 1137
clock cycles (i.e. clock cycles = (231100 ns - 3700 ns) / 200 ns) to process the 128-bit data
block using a 256-bit cipher key.
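
The figures above can be checked with a few lines of arithmetic. The Python sketch below counts the 32-bit words that cross each explicit channel per data block, as read from the loops in Figure D-25, and repeats the clock-cycle calculation. The variable names are hypothetical; the constants Nb = 4, Nk = 8 and Nr = 14 are the AES-256 parameters quoted in the listing comments.

```python
NB, NK, NR = 4, 8, 14  # AES-256: state columns, key words, rounds

# Size of the expanded key schedule held in fkey.
expanded_key_words = NB * (NR + 1)

# Channel 1 carries the state after the initial round-key addition:
# four 32-bit words (loopcnt2 in 0 to 3).
c1_words = NB

# Channel 2 carries the remaining round keys, one word at a time:
# 14 rounds (EncLoop1) of four words each (EncLoop2).
c2_words = NR * NB

# Latency in clock cycles, from the simulation timestamps quoted above.
cycles = (231100 - 3700) // 200

print(expanded_key_words, c1_words, c2_words, cycles)
```

The four round-key words consumed in the first stage plus the 56 words sent over the second channel account for all 60 words of the expanded key.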

Figure D-26 Simulation of the pipelined multi-FPGA 256-bit AES core

Figure D-27 Simulation (zoom in views) of the pipelined multi-FPGA 256-bit AES core
T.B. Yee, 2007 Appendix E: MOODS multi-FPGA synthesis guide 380

Appendix E

MOODS multi-FPGA synthesis guide

This appendix presents the partitioning options added to the MOODS synthesis system for
multi-FPGA synthesis. The appendix covers the complete set of commands for multi-
FPGA synthesis using the MOODS Command Line Interface (CLI) and the command line
switches for the original MOODS synthesis core are briefly repeated when needed for the
sake of completeness. Background information and a more detailed guide to the original
MOODS synthesis system can be found in references [32, 39, 42, 161].

E.1 The MOODS optimiser

The MOODS Synthesis Suite organises the user designs into a project-based workspace
environment with the inclusion and compilation of all the behavioural VHDL source files
within the main project. Other projects can be imported, as subprojects, into an existing
main project in the workspace to use the libraries associated with these imported projects.
All the project files are compiled and assembled into a library structure. Details on the
compilation of designs and project workspace can be found in [161].

Having the synthesis project compiled and set up into the corresponding project libraries,
the MOODS optimiser, which is the heart of the MOODS Synthesis suite, can be invoked
using the MOODS CLI in the form of a DOS-prompt command given below:

(MOODS root directory)\Bin\Moods design
    -m "(project directory)\example.lmf"
    -w example
    -pre-opt
    -mult2shift
    -prn_al
    -prn_nl
    -vhdl_out
    -design_profile
    -exchannels
    {other arguments}

The above command assumes that a top-level design called design has been compiled and
that the main project name of the design is example. File example.lmf contains
information on the directory location of the library files used in the project, and it is passed
to the optimiser through the -m argument preceding the location of the (.lmf) file.
Argument -w specifies the directory to which the output files generated by the synthesis
are written. The -pre-opt argument allows pre-scheduling optimisation to be
performed on the design. At present, the pre-scheduling optimisation only improves
designs with array and vector dynamic indexing. Argument -mult2shift forces constant
multiplies or divides by a positive power of two to be implemented as shift-left or
shift-right operations respectively, which gives a significant hardware reduction. Argument
-prn_al is included to append a dump of control arcs to the design.cg output file. Argument
-prn_nl is included to append a dump of data path nets to the design.d^g output file.
Argument -vhdl_out specifies that multiple VHDL netlist output files are generated, one for
each target device. The first new argument, -design_profile, is incorporated into the MOODS
optimiser to enable multi-FPGA synthesis. It instructs MOODS to retrieve partitioning and
design activity profile information (Section 4.5) from the design.par file in the project
directory. A module call list file, design.mcl, is generated by MOODS during the prologue
stage when the initial data structures are built. Details of the module call list file can be
found in Appendix C.3. The second new argument, -exchannels, enables the use of explicit
communication channels (Section 4.2.2.1).
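
For scripted runs, the invocation above can be assembled programmatically. The Python sketch below only builds the argument list from the switches described in this section; it does not launch MOODS. The function name `moods_command`, the installation path and the project path are hypothetical placeholders, not values taken from the MOODS documentation.

```python
from pathlib import PureWindowsPath


def moods_command(moods_root, library_file, project, design, multi_fpga=True):
    """Build the MOODS CLI argument list described in Section E.1."""
    exe = PureWindowsPath(moods_root) / "Bin" / "Moods"
    args = [
        str(exe), design,
        "-m", library_file,   # library location file for the project
        "-w", project,        # output directory
        "-pre-opt",           # pre-scheduling optimisation
        "-mult2shift",        # power-of-two multiply/divide as shifts
        "-prn_al",            # dump control arcs
        "-prn_nl",            # dump data path nets
        "-vhdl_out",          # one VHDL netlist per target device
    ]
    if multi_fpga:
        # The two switches added for multi-FPGA synthesis.
        args += ["-design_profile", "-exchannels"]
    return args


# Hypothetical paths, for illustration only.
cmd = moods_command(r"C:\MOODS", r"C:\work\example\example.lmf", "example", "design")
print(" ".join(cmd))
```

Keeping the two multi-FPGA switches behind a flag makes it easy to compare a single-device run against a partitioned run of the same design.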

Other arguments exist [161], but exceed the scope of this appendix.

The basic steps in optimisation are:

1. Set up a "cost function" specifying the required target specification (e.g. target area
and/or delay).

2. Run an optimisation algorithm.

3. Repeat the above if desired to achieve different results.

4. Set up and run the K-way partitioning process.

5. Repeat step 4 if desired to get different partitioning results.

6. Repeat steps 1 to 5 if desired to achieve different synthesis and partitioning results.

7. Run the communication subsystem optimisation algorithm.

8. Finish the design to produce final structural netlists suitable for targeting multiple
FPGA devices.

E.1.1 Setting up a cost function

During the prologue stage in MOODS, the associated technology libraries are loaded and
the input design is read in, followed by the initialisation of data structures. A number of
messages about loading libraries and files, and preliminary tasks are displayed in the
console window. When it finishes, a command prompt will appear, e.g.:

MOODS "c:\CAD\jpeg_demo\jpg_core_two\jpg_core_two" -->

The command "CF" is entered to get to the cost function definition menu of MOODS. At
any point in the synthesis session, typing "?" at a command prompt gives a list of all
available commands, as illustrated below in Figure E-1.

MOODS "c:\CAD\jpeg_demo\jpg_core_two\jpg_core_two" --> CF

Enter cost function command []: ?

SETTING THE COST FUNCTION

Type a two character string:

first character: A - to add a criterion
(action) D - to delete a criterion
C - change target
S - show the cost function
F - to finish
second character: D - Total CP delay
(criterion) B - Delay between insts
A - Area
P - Power
N - Nets (no. of DP nets)
C - Clock period

Enter cost function command [?]:

Figure E-1 Cost function menu

The cost function allows the user to specify what the final optimised implementation
should be like (e.g. how large or fast it is). Figure E-2 illustrates the typical steps to enter
an area delay cost function, and specify a clock period for optimising the design. The
following specifies area optimisation as the highest (first) priority with a target area of 0,
and delay optimisation as the second priority with a target total delay of 0. Both target
objectives are set to zero so that the final optimised implementation is as small and as fast
as possible. Of course, non-zero target values can be given instead. A clock period of 20
ns is specified using the "AC" command and entering a value of 20 when asked to enter
the new clock value at the subsequent prompt. With all of the cost function parameters set
up, command "F" finishes the cost function definition and returns to the main MOODS
prompt.

Enter cost function command [?]: AA

Enter priority level (1 is highest) [1]: 1

Initial total area is: 34505.6 Slices
Enter target area (Slices) [ 34505.6]: 0

Enter cost function command [aa]: AD

Enter priority level (1 is highest) [1]: 2

Initial total CP delay is: 3016.2 ns
Enter target total delay (ns) [ 3016.2]: 0

Enter cost function command [ad]: AC

Clock period has priority 1 and units in ns.

Enter new clock value (ns) [ 10.1]: 20

Enter cost function command [ac]: F

Figure E-2 Steps in setting a cost function in MOODS

E.1.2 Optimisation

After finishing the cost function set-up, the user can proceed to set up the optimisation
algorithm and perform optimisation on the design. There are currently two main
optimisation algorithms (described in Section 2.3.6) provided by the MOODS synthesis
core. The quasi-exhaustive heuristic is the simpler of the two, and MOODS proceeds to
optimise the design when the command "AOH" is entered at the MOODS prompt. Simulated
annealing is slower, more complex and more difficult to operate; however, it can produce
better results and also allows the design to be moved in many different directions around the
design space. Figure E-3 illustrates the steps in setting up the optimisation parameters
(annealing schedule) using the "AI" command. Once this data is entered, command "AO"
starts the annealing process, optimising the design.

MOODS "c:\CAD\jpeg_demo\jpg_core_two\jpg_core_two" --> AI

Initializing Optimisation Data

Enter start temperature [ 0.0]: 50

Enter terminating temperature [ 0.0]: 0

Enter factor to decrease temp (<1) OR -n for No. of steps [ 100.0]: -100

Enter maximum iteration per temperature range [0] : 500

MOODS "c:\CAD\jpeg_demo\jpg_core_two\jpg_core_two" -->

Figure E-3 Steps in setting up the annealing schedule in MOODS
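
To make the schedule parameters concrete, the Python sketch below expands the values entered in Figure E-3 into a list of temperatures. It is a sketch under explicit assumptions, not the MOODS algorithm: the source only states that the third entry is either a decrease factor below 1 or a negative step count, so the linear spacing used for the step-count case and the geometric cooling used for the factor case are both assumptions, and the function name is hypothetical.

```python
def annealing_temperatures(start, end, entry):
    """Expand an annealing schedule entry into a list of temperatures.

    entry < 0     : -entry equally spaced steps (ASSUMED linear spacing)
    0 < entry < 1 : multiply the temperature by entry each step (ASSUMED)
    """
    if entry < 0:
        steps = int(-entry)
        delta = (start - end) / steps
        return [start - k * delta for k in range(steps)]
    if not 0 < entry < 1:
        raise ValueError("entry must be a factor below 1 or a negative step count")
    if end <= 0:
        raise ValueError("geometric cooling needs a positive terminating temperature")
    temps = []
    t = start
    while t > end:
        temps.append(t)
        t *= entry
    return temps


temps = annealing_temperatures(50, 0, -100)   # values from Figure E-3
max_iterations = len(temps) * 500             # 500 iterations per temperature
print(len(temps), temps[0], temps[-1], max_iterations)
```

Under these assumptions the schedule in Figure E-3 bounds the run at 100 temperature steps of at most 500 iterations each.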

E.2 K-way partitioning

When all the optimisation is completed, typing command "FI" at the MOODS prompt
brings up the K-way partitioning prompt and typing "?" at the command prompt gives a
list of all available commands, as illustrated below in Figure E-4.

K-way partitioning --> ?

K-way Partitioning Menu

DS - Display K-way partitioning setup

EX - Examine data structures
EM - Examine modules for partitions
ET - Examine target device details
CT - Change number of target devices
CU - Change max device utilisation (100 percent) value, D_max
CL - Change min device utilisation (20 percent) value, D_min
CA - Change offset target device areas
CW - Change data width
TS - Change to Strict balanced distribution over targeted devices
MD - Disable Multiple Subprogram Comm. Channels

K-way partitioning (Optimised)

KON - with no added options
KOL - with locked modules

K-way partitioning (with 2 partitions)

KFN - with no added options
KFP - with pre-allocated modules
KFL - with locked modules
KFB - with initial and locked modules
RM - Re-run MOODS optimisation

FI - Finish optimisation

K-way partitioning -->

Figure E-4 K-way partitioning menu

COMMAND   DESCRIPTION

DS        Displays the K-way partitioning set-up.
EX        This command is the same as the top-level MOODS command and it is used to
          examine the data structures for the design.
EM        The "EM" command leads to a set of further commands given in Figure E-5.
ET        This command leads to two further commands that allow the user to display and edit
          target device details (such as device area and I/O).
CT        This command is used to change the number of target devices used to implement the
          multi-FPGA system.
CU        This command is used to change the maximum percentage of device utilisation. The
          default value of 100 means the total logic (100%) capacity may be utilised if
          required.
CL        This command is used to specify the lowest percentage of the device area utilisation.
          This value is used to determine the balanced criterion in the K-way partitioning
          algorithm when a relaxed distribution of modules over the target devices is selected.
CA        This command is used to specify the device area offset percentage.
CW        CW is used to assign a fixed data bus width in the subprogram communication
          channel(s) for inter-device transfers.
TS/TR     Command TS changes the balanced criterion in the K-way partitioning algorithm to
          enforce a strict balanced distribution of modules over targeted devices. Command
          TR allows a relaxed distribution of modules over targeted devices.
MD/ME     Command MD disables the generation of multiple subprogram communication
          channels, thereby connecting all communication cells to a single primary
          communication channel. Command ME enables the generation of multiple
          subprogram communication channels.
KON       This command invokes the K-way partitioning algorithm to partition the design with
          no pre-allocated and locked modules, and generate an optimised multi-FPGA system
          with the least number of target devices required.
KOL       This command invokes the K-way partitioning algorithm to partition the design with
          module(s) locked to specified target device(s), and generate an optimised multi-
          FPGA system with the least number of target devices required.
KFN       This command invokes the K-way partitioning algorithm to partition the design with
          no pre-allocated and locked modules, and generate an optimised multi-FPGA system
          using a fixed number of target devices.
KFP       This command invokes the K-way partitioning algorithm to partition the design with
          pre-allocated modules, and generate an optimised multi-FPGA system using a fixed
          number of target devices.
KFL       This command invokes the K-way partitioning algorithm to partition the design with
          module(s) locked to specified target device(s), and generate an optimised multi-
          FPGA system using a fixed number of target devices.
KFB       This command invokes the K-way partitioning algorithm to partition the design with
          pre-allocated and locked modules, and generate an optimised multi-FPGA system
          using a fixed number of target devices.
RM        This command is used to re-run the MOODS optimisation process.
FI        This command finishes and ends the K-way partitioning phase.

Table E-1 Complete set of commands in the K-way partitioning menu


A description of the complete set of commands in the K-way partitioning menu is given in
Table E-1. Figure E-5 illustrates the further set of commands associated with the "EM"
command in the main K-way partitioning menu. Process modules (locked/unlocked to the
top-level architectural module) in the design are displayed using command "B".
Commands "A" and "U" are used to lock and unlock process modules in the design.
Commands "P" and "L" display the pre-allocated and locked modules (if any) specified
in the partitioning information (.par) file respectively. Command "E" allows the user to
manually lock modules in the design to target devices, and a locked module can be
unlocked using the "R" command.

K-way partitioning --> EM

Examine --> ?

Modules for k partitions

B - Display process modules

A - Lock process modules
U - Unlock process modules
P - Display pre-allocated modules
L - Display locked modules
E - Edit locked modules
R - Remove locked modules
F - Exit.

Examine -->

Figure E-5 Examine modules for partitioning menu

After the partitioning parameters have been set up and the K-way partitioning algorithm
has run, the final partition of the design is displayed in the console window, together
with the I/O utilisation and estimated area utilisation of the target devices. The
partitioning parameters can be altered and the K-way partitioning algorithm repeated to
obtain different partitioning results. Otherwise, command "FI" is entered at the K-way
partitioning prompt to begin the communication subsystem optimisation. Alternatively,
the MOODS optimisation process can be re-run using the "RM" command.

When the communication subsystem optimisation finishes, the system writes out VHDL
packages for the subprogram communication channel arbiter(s) (Section 5.4.3), final
netlists for all the target devices, and report files, leaving the system in the "EXAMINE"
mode. The "FI" command is typed once more to end the session.
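The command flow described above can be pictured as a small state machine. The sketch
below is an illustrative model only, not MOODS code: the state names "KWAY", "EXAMINE"
and "DONE", the functions next_state and run_session, and the assumption that "RM"
returns to the K-way partitioning prompt are all hypothetical. It covers only the
commands named in this section.

```python
# Illustrative model of the session flow; state names and transitions are
# assumptions drawn from the text, not taken from the MOODS implementation.
PARTITION_COMMANDS = {"KFP", "KFL", "KFB"}  # each runs the K-way partitioning algorithm


def next_state(state: str, command: str) -> str:
    """Return the session state reached after entering one command."""
    if state == "KWAY":
        if command in PARTITION_COMMANDS:
            return "KWAY"  # results are displayed; partitioning may be repeated
        if command == "RM":
            return "KWAY"  # assumed: re-running the optimisation returns to the prompt
        if command == "FI":
            return "EXAMINE"  # communication subsystem optimisation, outputs written
    elif state == "EXAMINE" and command == "FI":
        return "DONE"  # the second "FI" ends the session
    raise ValueError(f"command {command!r} is not modelled in state {state!r}")


def run_session(commands):
    """Apply a sequence of commands starting at the K-way partitioning prompt."""
    state = "KWAY"
    for command in commands:
        state = next_state(state, command)
    return state
```

For example, run_session(["KFP", "KFL", "FI", "FI"]) models two partitioning runs
followed by the two "FI" commands that end the session.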

Following the naming convention described in Section E.1, assume the top-level design
has been created from a behavioural VHDL file, design.vhd. After the multi-FPGA
synthesis session in MOODS, the VHDL netlist output files design_synth_dom1.vhd,
design_synth_dom2.vhd, ..., design_synth_domk.vhd for a multi-FPGA design targeting
k devices are created in the project directory.
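As a quick check of the project directory after a session, the expected netlist names
can be generated from the source file name and the number of target devices (called k
here). This is a minimal sketch that assumes the "_synth_dom" pattern quoted above; the
function name netlist_names is hypothetical and not part of MOODS.

```python
from pathlib import Path


def netlist_names(design_file: str, k: int) -> list[str]:
    """List the netlist file names expected for a design split over k devices."""
    if k < 1:
        raise ValueError("k must be at least 1")
    base = Path(design_file).stem  # "design.vhd" -> "design"
    # One netlist per target device, numbered from 1 to k.
    return [f"{base}_synth_dom{i}.vhd" for i in range(1, k + 1)]
```

Calling netlist_names("design.vhd", 3) gives the three names design_synth_dom1.vhd,
design_synth_dom2.vhd and design_synth_dom3.vhd.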
