Alford, C. Digital Design VHDL Laboratory Notes. 1996
CERL/EE
OCTOBER 16, 1996
VERSION 1.01
COPYRIGHT 1996
DR. CECIL ALFORD
TSAI CHI HUANG
CERL / EE
TABLE OF CONTENTS
1. VHDL - HARDWARE DESIGN SOFTWARE APPROACH
1.1 INTRODUCTION
1.1.1 SEQUENTIAL PROGRAMMING
1.1.2 PARALLEL PROGRAMMING
1.1.3 OBJECT BASED PROGRAMMING
1.1.4 PROGRAMMABLE CIRCUITS
1.2 PROBLEMS
2.1 INTRODUCTION
2.2 PROBLEMS
4.1 INTRODUCTION
4.2 SRAM MEMORY
4.3 VHDL SRAM MEMORY DESIGN
4.4 SRAM VHDL LISTING
4.5 PROBLEMS
5.1 INTRODUCTION
5.2 VHDL FEATURES
5.3 ALU FUNCTIONS
5.4 ALU COMPONENT IMPLEMENTATION EXAMPLES
5.4.1 RIPPLE CARRY ADDER IMPLEMENTATION
5.4.2 PACKAGE IARITH_PARTS_PKG LISTING
Copyright 1996, CERL / EE, V1.00
6.1 INTRODUCTION
6.2 INSTRUCTION SET ARCHITECTURE (ISA)
6.3 A COMPUTER ARCHITECTURE IMPLEMENTATION
6.4 IMPLEMENTATION EXAMPLE
6.4.1 4X4 REGISTER FILE AND 8X4 MEMORY BLOCK
6.4.2 CONTROL MODULE
6.4.3 EXECUTION EXAMPLE
6.4.4 VHDL LISTING
6.4.4.1 regf4x4.vhd
6.4.4.2 mem4x2.vhd
6.4.4.3 ctrlm.vhd
6.4.4.4 BIA.vhd
6.5 PROBLEMS
7. APPENDICES
LIST OF FIGURES
Figure 1-1.
Figure 1-2.
Figure 1-3.
Figure 1-4.
Figure 1-5.
Figure 2-1.
Figure 2-2.
Figure 2-3.
Figure 2-4.
Figure 2-5.
Figure 3-1.
Figure 3-2.
Figure 3-3.
Figure 3-4.
Figure 3-5.
Figure 3-6.
Figure 3-7.
Figure 3-8.
Figure 3-9.
Figure 4-1.
Figure 4-2.
Figure 4-3.
Figure 5-1.
Figure 5-2.
Figure 5-3.
Figure 6-1.
Figure 6-2.
Figure 6-3.
Figure 6-4.
Figure 6-5.
Figure 7-1.
Figure 7-2.
its name; statements are executed in parallel. Moreover, executing statements in parallel
mimics real-world phenomena and thus coincides better with the behavior of digital circuits. In
parallel processing, the application of parallel programming, programs executed in parallel have the
advantage of faster execution than the equivalent sequential program. In
VHDL programming, the goal is to mimic the physical digital system. Because hardware
operates in parallel, VHDL must provide features that facilitate the modeling of such behavior
in order to capture the functionality of the hardware.
Parallel program statements, as the name suggests, are statements that can be executed in parallel.
Parallel statements are less intuitive to use because of their nondeterministic nature. Ideally, all parallel
statements should be executed at the same time. However, on the popular Von Neumann machine, or
sequential computer (e.g., PCs and workstations), only sequential statements can be executed directly.
Therefore, parallel programming must be simulated on a sequential machine. Because the simulator
executes parallel statements in an arbitrary order, it is crucial that the
execution order be irrelevant and not affect the result.
An example of a parallel system is shown in Figure 1-1. It describes a digital hardware circuit with four
inputs and one output. Inputs A and B are fed into one AND gate, and inputs C and D into another. The outputs of
these AND gates, E and F, are then fed into an OR gate whose output is G.
Figure 1-1. AND-OR Gates (inputs A, B, C, D; intermediate outputs E, F; output G)
A piece of simple VHDL code shown in Listing 1-1 describes the circuit above. Notice the section after
dataflow_machine architecture in the code. Three statements, a, b, and c describe AND and OR logical
assignments. These statements can be rearranged in any order, and the simulated output G should still be
the same.
-- Sum of product example
entity sop is
  port (
    a, b, c, d: in bit;
    e, f: inout bit;
    g: out bit);
end sop;

architecture dataflow_machine of sop is
begin
  -- Begin parallel statements
  g <= e or f;   -- Statement a
  e <= a and b;  -- Statement b
  f <= c and d;  -- Statement c
end dataflow_machine;
The result G will be the same if the statements assigning the E, F, and G outputs are placed in any order.
Any data arriving at the inputs A, B, C, and D will pass through the AND and OR gates in continuous-time
fashion. Whatever shows up at the A, B, C, or D inputs will be reflected at the intermediate outputs E
and F and at the final output G. Thus, according to the essence of parallel programming, output G
should change instantly. In real hardware, however, there is always a delay when data pass
through gates. For hardware synthesis, the VHDL code is mapped into a Cypress 375 CPLD device and
fitted with optimization disabled. The simulation of the mapped Cypress 375 CPLD shows that there is a
delay of about 5 ticks for each logic gate. Notice that ticks in the VHDL simulation below do not
represent any real time. Instead, a simulation tick is simply a unit of delay in simulation time. In the real
world, gate delays are on the order of nanoseconds (10^-9 sec).
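The effect of gate delay can also be written into the model itself. The following is a minimal sketch, not part of the original listing: it assumes a hypothetical entity name sop_delay, keeps E and F as internal signals, and uses 5 ns purely as a stand-in for the 5-tick gate delay mentioned above.

```vhdl
-- Sketch only: the AND-OR circuit with an assumed delay on every gate.
-- The entity name and the 5 ns figure are illustrative assumptions.
entity sop_delay is
  port (
    a, b, c, d : in  bit;
    g          : out bit);
end sop_delay;

architecture dataflow of sop_delay is
  signal e, f : bit;              -- intermediate AND gate outputs
begin
  -- The statements are still parallel; their order does not matter.
  g <= e or f  after 5 ns;        -- OR gate
  e <= a and b after 5 ns;        -- first AND gate
  f <= c and d after 5 ns;        -- second AND gate
end dataflow;
```

With these after clauses, a change on A reaches E after 5 ns and G after 10 ns, because the signal passes through two gates. Synthesis tools generally ignore after clauses, so this form is useful for simulation only.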
system development time. In summary, object based features are extremely useful for VHDL code
development, management, and reusability.
Ability to exactly perform their tasks, as defined by the requirement and specification.
Robustness
Extendibility
Reusability
Compatibility
PLA's
Programming
Points
1.2 Problems
1) Binary Coded Decimal (BCD) is a useful decimal number representation using binary numbers.
Many times, it is the only representation that digital hardware uses to communicate with humans,
since humans are accustomed to the decimal numbering system. One example is the use of BCD to control
the decimal LED display panel of a calculator. Each LED decimal digit has eight segments, and each
segment contains an LED source as shown in Figure 1-4.
+5 V
Input
Figure 1-5. LED Segment Control Label (segment labels recovered from the figure: a, f, b, c, d)
The input consists of 5 bits, and the output consists of 7 bits. Of the 5 input bits, 4 bits are used to
specify the binary value, and one bit is used to blank all LED segments. Of the output bits, one bit is
used to control each of the segments, and one bit is used to indicate overflow of the input binary
number, that is, a binary input greater than 9. The LED logic specification is shown in Table 1-1.
An LED segment control decoder is used to control a one-digit LED display panel. Construct such a
decoder using VHDL. This can be done using if statements nested inside of a process statement or
using when statements alone. The form of such a decoder could look like the VHDL code segment in Listing
1-3.
entity XXX is
  port( .... );
end XXX;

architecture XXXX of XXX is
begin
  -- option 1
  option1: process (input)
  begin
    if <condition> then
      <output> <= <input>;
    elsif <condition> then
      <output> <= <input>;
    .....
    else
      <output> <= <input>;
    end if;
  end process option1;

  -- Or use option 2
  output <= <input> when <condition> else
            <input> when <condition> else
            .....
            <input> when <condition> else
            <input>;
end XXXX;
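To make the two forms concrete without giving away the LED decoder itself, here is a minimal sketch on a smaller, hypothetical circuit: a 2-to-4 decoder. The entity name dec2to4 and the port names sel and y are assumptions made for illustration only.

```vhdl
-- Hypothetical 2-to-4 decoder used only to illustrate the two coding forms.
entity dec2to4 is
  port (
    sel : in  bit_vector(1 downto 0);
    y   : out bit_vector(3 downto 0));
end dec2to4;

-- Option 1: if statements nested inside a process statement
architecture with_process of dec2to4 is
begin
  decode: process (sel)
  begin
    if sel = "00" then
      y <= "0001";
    elsif sel = "01" then
      y <= "0010";
    elsif sel = "10" then
      y <= "0100";
    else
      y <= "1000";
    end if;
  end process decode;
end with_process;

-- Option 2: a single conditional signal assignment (when ... else)
architecture with_when of dec2to4 is
begin
  y <= "0001" when sel = "00" else
       "0010" when sel = "01" else
       "0100" when sel = "10" else
       "1000";
end with_when;
```

Both architectures describe the same hardware. The final else branch in each form guarantees that the output is assigned for every input value, which is what keeps the logic purely combinational.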
INPUT                     OUTPUT
Blank  Binary value       Overflow  Segment control
1      x x x x            0         1 1 1 1 1 1 1
0      0 0 0 0            0         0 0 0 0 0 0 1
0      0 0 0 1            0         1 0 0 1 1 1 1
0      0 0 1 0            0         0 0 1 0 0 1 0
0      0 0 1 1            0         0 0 0 0 1 1 0
0      0 1 0 0            0         1 0 0 1 1 0 0
0      0 1 0 1            0         0 1 0 0 1 0 0
0      0 1 1 0            0         0 1 0 0 0 0 0
0      0 1 1 1            0         0 0 0 1 1 1 1
0      1 0 0 0            0         0 0 0 0 0 0 0
0      1 0 0 1            0         0 0 0 1 1 0 0
0      1 0 1 0            1         0 0 0 0 0 1 0
0      1 0 1 1            1         1 1 0 0 0 0 0
0      1 1 0 0            1         0 1 1 0 0 0 1
0      1 1 0 1            1         1 0 0 0 0 1 0
0      1 1 1 0            1         0 0 1 0 0 0 0
0      1 1 1 1            1         0 1 1 1 0 0 0
OR2 Gate (inputs x, y; output z)
AND2 Gate (inputs x, y; output z)
RS Flip Flop
R  S  q  ->  q (next)
0  0  0      0
0  0  1      1
0  1  0      1
0  1  1      1
1  0  0      0
1  0  1      0
1  1  0      x
1  1  1      x
(Gate instance labels recovered from the figures: n1, n2, n3, a1, a2, a3, o1, o2; outputs nq, n_q)
running synchronously with the system clock. Since flip flops activate at an instant in time to acquire
data, flip flops are, relatively speaking, more useful than latches because the output of a flip flop is held
constant for a full clock cycle instead of half a clock cycle.
Cascading two D latches can produce a D flip flop. The clock signal that runs through these two latches
is inverted so data can pass through the latches in pipeline fashion. Because input data is held during the
low phase of the clock signal, the input signal of the first latch is preserved at the instant when the clock signal
goes down. Simultaneously, the signal output from the first latch is acquired by the second latch, and the
signal is held constant until the second latch's clock goes down. A D flip flop made by cascading
two D latches is shown below in Figure 2-5.
Figure 2-5. D Flip-flop (two cascaded D latches; signals d, c, reset, nq)
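The cascade described above can be sketched structurally. This is an illustrative sketch under stated assumptions, not the notes' own listing: the names dlatch and dff2lat are hypothetical, the reset input of Figure 2-5 is left out for brevity, and the first latch is assumed to be the one driven by the non-inverted clock, as the description suggests.

```vhdl
-- Hypothetical level-sensitive D latch: transparent while c = '1'.
entity dlatch is
  port (c, d : in bit; q : out bit);
end dlatch;

architecture behavior of dlatch is
begin
  process (c, d)
  begin
    if c = '1' then
      q <= d;
    end if;
  end process;
end behavior;

-- D flip flop sketch: two latches on opposite clock phases (no reset).
entity dff2lat is
  port (c, d : in bit; q : out bit);
end dff2lat;

architecture struct of dff2lat is
  component dlatch
    port (c, d : in bit; q : out bit);
  end component;
  signal nc : bit;   -- inverted clock for the second latch
  signal q1 : bit;   -- output of the first latch
begin
  nc <= not c;
  first:  dlatch port map (c => c,  d => d,  q => q1);
  second: dlatch port map (c => nc, d => q1, q => q);
end struct;
```

With this wiring, the first latch follows d while c is high and freezes when c goes low; at that moment the second latch opens and passes the frozen value to q. The output therefore changes only at the falling edge of c. Swapping the two clock connections would give a rising-edge version.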
Now that the memory basics have been explained, it is time to model some of the memory blocks using
VHDL. The difference between structure, dataflow, and behavior models will be shown in later examples.
For example, in Listing 2-1, each discrete logic component, AND2, OR2 and INV1, is defined at the
beginning using the dataflow model as stated before. Notice that a VHDL program is declared
with an entity statement first, followed by an architecture statement. Signals defined in the entity block
under the keyword port are visible inside of the architecture block.
-- INV1 Gate
--
entity inv1 is
  port(
    x: in bit;
    nx: out bit);
end inv1;
x, y: in bit;
z: out bit);
end or2;
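The gate listing is incomplete in this copy. As a rough guide, the following sketch shows what dataflow definitions of the three gates might look like. It assumes only the port names that appear in the component declarations of the structural latch listing (x, y, z, and nx); the architecture name dataflow is an assumption.

```vhdl
-- INV1 gate (sketch)
entity inv1 is
  port (x : in bit; nx : out bit);
end inv1;

architecture dataflow of inv1 is
begin
  nx <= not x;
end dataflow;

-- AND2 gate (sketch)
entity and2 is
  port (x, y : in bit; z : out bit);
end and2;

architecture dataflow of and2 is
begin
  z <= x and y;
end dataflow;

-- OR2 gate (sketch)
entity or2 is
  port (x, y : in bit; z : out bit);
end or2;

architecture dataflow of or2 is
begin
  z <= x or y;
end dataflow;
```

Each gate is a single concurrent signal assignment, which is why the dataflow style suits such small building blocks.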
entity lat_ds1 is
port(
c, d: in bit; -- input data
q, nq: inout bit);
end lat_ds1;
architecture struct of lat_ds1 is
signal i1_nx: bit;
signal i2_nx: bit;
signal i3_nx: bit;
signal a1_z: bit;
signal a2_z: bit;
signal o1_z: bit;
signal o2_z: bit;
component and2
port(x, y: in bit; z: out bit);
end component;
component or2
port(x, y: in bit; z: out bit);
end component;
component inv1
port(x: in bit; nx: out bit);
end component;
begin
-- structure connection
a1: and2 port map(c, i1_nx, a1_z);
a2: and2 port map(c, d, a2_z);
o1: or2 port map(a1_z, nq, o1_z);
o2: or2 port map(a2_z, q, o2_z);
i1: inv1 port map(d, i1_nx);
i2: inv1 port map(o1_z, q);
i3: inv1 port map(o2_z, nq);
end struct;
Gate components used here must also be declared. The declaration of a component is similar to
C, FORTRAN or Pascal programming, where all procedures or subroutines are required to be declared
before use. However, it is not exactly the same as programming because each component declared
could be a physical part rather than a conceptual abstraction. For example, the ports declared by a component
have a corresponding physical meaning, namely wires that carry signals. This concept should be understood
from the digital hardware engineer's perspective.
Now, consider taking the same D latch circuit and describing or modeling it behaviorally in VHDL. This
code is shown in Listing 2-3. Notice that the VHDL code here is shorter than its corresponding
structure model because the circuit behavior is directly described. It is a higher level of modeling, and it can
be done easily if the designer knows exactly how the circuit behaves. Many times, the digital circuit
designer knows how the circuit should behave but is not sure whether the actual circuit behaves as predicted.
Therefore, it is useful to construct the circuit as a structure model and verify its working order. Behavior
modeling saves time, and it is based on the assumption that the designer knows the circuit's behavior well.
If the assumption about the circuit is wrong, then behavior modeling can lead to a disastrous result.
Remember, behavior modeling should be used as much as possible when modeling digital circuits because
it will save much development time.
In the field of VLSI design, the translation of a behavior model to a structure model is called hardware
synthesis, and it relies heavily on the intelligence of the software. During the process of mapping a VHDL circuit
to the actual hardware, it is best to specify as much structure as possible if the speed and size
requirements of the circuit are a concern. As an analogy, think of a programmer who wants the code to be
as fast and efficient as possible. The programmer does better to write the code in assembly (or, even better, machine language) instead of relying on a high level language compiler. For the exercises here,
problems mainly focus on translating VHDL code to hardware. In other words, VHDL will be used as a
programming language to map into hardware. Again, remember the concept of hardware synthesis. In
hardware synthesis, the digital designer needs to be conscious of the final form of the mapped circuit.
--
-- behavior modeling of D latch
--
entity lat_db1 is
  port(
    c, d: in bit;       -- input data
    q, nq: inout bit
  );
end lat_db1;

architecture behavior of lat_db1 is
begin
  -- behavior model
  -- d is in the sensitivity list so that q follows d while c is high
  behavior: process (c, d)
  begin
    if c='1' then
      q  <= d;
      nq <= not d;
    end if;
  end process behavior;
end behavior;
Another example of behavior modeling is again the D flip flop, whose construction from two D latches
was illustrated earlier in Figure 2-5. Listing 2-4, however, shows the construction of a D flip flop without
using any D latches. By using the behavior modeling technique, the VHDL code length is reduced
drastically. A circuit similar to a D flip flop in size would be tedious to describe entirely in a
structural model.
--
-- behavior modeling of D flip flop with reset
--
entity rdffb is
  port(
    reset: in bit;      -- reset signal
    c, d: in bit;       -- input data, c=clock, d=data
    q, nq: out bit );   -- output q and not q
end rdffb;

-- behavior model
architecture behavior of rdffb is
begin
  behavior: process (c, reset)
  begin
    if reset='1' then
      q  <= '0';
      nq <= '1';
    elsif (c='1' and c'event) then
      q  <= d;
      nq <= not d;
    end if;
  end process behavior;
end behavior;
2.2 Problems
1) Implement a D latch with reset using VHDL structure modeling. Demonstrate its functionality in the
Cypress NOVA simulator by resetting the circuit, latching in a high value, and resetting again at the
next clock cycle when the input is kept high. Save the simulation session to a stimulus file (*.SIM)
and trace file (*.PSD) using the NOVA FILE menu items.
2) A T flip flop can be defined as a flip flop that toggles the output at the edge of the clock cycle, for
example, rising or falling edge. Implement T flip flop behaviorally. Again, save the simulation
session in both stimulus file and trace file.
3) Implement an RS flip flop using an AND gate, and model it structurally. Demonstrate and show its
functionality.
4) In VHDL, implement the circuit described below using two D flip flops with /RESET input. The
circuit should first be implemented in a behavior model and, then, in a structure model. Show the
functionality using the VHDL simulator. (Hint: Utilize /RESET input).
INPUT: Two input lines, A and B.
OUTPUT: One output, C.
CONDITIONS:
a) The output goes to 1 on every positive transition of the A line.
b) The output goes to 0 on every positive transition of the B line.
c) The output must be capable of being set to 1 again after it has been set to 0.
d) Input B takes precedence over Input A in affecting the output.
Delay Element
Combinational Logic
Input
1/0
0/0
HIGH
LOW
1/1
1) Put the state machine diagram of edge detector in Figure 3-2 into table form, as shown in Table 3-1.
Realize that the table entry is constructed based on state transitions. Because there are four
transitions, four table entries (four columns) are shown.
2) Combine the last two rows of table entries, output state and output value, to form new Moore machine
states. According to the table, three unique states result: HIGH0, HIGH1, and LOW0.
Table 3-1.
Entry #         1      2      3      4
Input State     HIGH   LOW    LOW    HIGH
Input Value     1      1      0      0
Output State    HIGH   HIGH   LOW    LOW
Output Value    0      1      0      0
3) Create the new Moore machine state table from Table 3-1 using the new states. For example, the
Mealy state machine transition table entry column 1 contains four values: input state, input value,
output state, and output value. As shown before, output state and output value are used to generate
the corresponding Moore machine state, HIGH0. Input state and input value are to be used to
generate the corresponding Moore machine state transitions. Here, Mealy input state HIGH is
replaced by the Moore state HIGH0 and HIGH1 (HIGH plus all possible outputs). Now, two
corresponding Moore state transitions are derived; HIGH0 and input 1 goes to HIGH0, HIGH1 and
input 1 goes to HIGH0. The same process applies to the other entry columns (2, 3, and 4), see Table
3-2.
Table 3-2.
Moore States   Input 0   Input 1   Output Value
LOW0           LOW0      HIGH1     0
HIGH0          LOW0      HIGH0     0
HIGH1          LOW0      HIGH0     1
4) Make a Moore machine state diagram based on this state transition table, as shown in Figure 3-4.
[Figure 3-4. Moore machine state diagram with states LOW0, HIGH0, and HIGH1]
6) Express the Moore machine operation in the form of a timing diagram. The timing Diagram of the
derived Moore machine is shown in Figure 3-5. In the figure, the output signal coincides with the
state transition, as is expected from a Moore state machine.
[Figure 3-5. Timing diagram showing input/output, State0 and State1, and the output of the Moore machine and the Mealy machine]
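The Moore machine derived in the steps above can be sketched in VHDL as follows. This is an illustration only: the entity name edge_moore, the port names clk, reset, x, and z, and the choice of LOW0 as the reset state are assumptions. The transitions follow Table 3-2, and the output is taken from the digit at the end of each state name.

```vhdl
-- Sketch of the edge detector as a Moore machine (names are assumed).
entity edge_moore is
  port (
    clk, reset : in  bit;
    x          : in  bit;    -- input value
    z          : out bit);   -- output value
end edge_moore;

architecture behavior of edge_moore is
  type state_type is (LOW0, HIGH0, HIGH1);
  signal state : state_type;
begin
  -- State register and next-state logic, one branch per row of Table 3-2
  transition: process (clk, reset)
  begin
    if reset = '1' then
      state <= LOW0;
    elsif clk = '1' and clk'event then
      case state is
        when LOW0 =>
          if x = '1' then
            state <= HIGH1;
          else
            state <= LOW0;
          end if;
        when HIGH0 | HIGH1 =>
          if x = '1' then
            state <= HIGH0;
          else
            state <= LOW0;
          end if;
      end case;
    end if;
  end process transition;

  -- Moore output: a function of the current state only
  z <= '1' when state = HIGH1 else '0';
end behavior;
```

Because z depends only on the state, it changes together with the state transition, which is the Moore behavior noted in step 6. A Mealy version would compute z from both the state and x.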
3.4 Problems
1) Convert the Moore machine in Figure 3-7 into a Mealy machine. Implement the Mealy machine in
VHDL and simulate it with Cypress NOVA simulator.
0/1
Start
q1
0/0
q0
1/0
0/0
1/1
1/0
q2
0
1
q0/0
q1/1
q2/2
Start
Table 3-3.
frame
IDLE
/frame
frame
DECODE
/hit
hit
/frame
/frame
XFER
BUSY
State / Output
OE
GO
ACT
IDLE
DECODE
BUSY
XFER
XFER2
XFER2
Signal    Type     Signal Description
NCE       IN
NWrite    IN
Address   IN
Data      IN/OUT
due to the hardware design application. Notice that we have discussed a VHDL method for declaring
constants in earlier lessons. The generic statement is different because it specifies where those constants are
valid. In other words, the generic statement defines the scope of its constants. The generic statement is
used inside of an entity statement (see previous lessons), and the architecture body associated with
this entity is able to see and use these generic constants. It is a very powerful concept because it
allows generalized hardware modeling. In the SRAM example, the size of the SRAM does not need to be
fixed at 8 words deep and 2 bits per word. SRAMs of different sizes can be created by changing only the
declaration inside of the generic statement.
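As a small illustration of the generic statement, the sketch below parameterizes the width of a simple register rather than a full SRAM. The names regn, reg4_top, and width are hypothetical and are not taken from the SRAM listing.

```vhdl
-- Hypothetical parameterized register; "width" plays the role that the
-- word size plays in the SRAM example.
entity regn is
  generic (width : integer := 2);              -- bits per word
  port (
    clk : in  bit;
    d   : in  bit_vector(width - 1 downto 0);
    q   : out bit_vector(width - 1 downto 0));
end regn;

architecture behavior of regn is
begin
  process (clk)
  begin
    if clk = '1' and clk'event then
      q <= d;
    end if;
  end process;
end behavior;

-- A 4-bit instance: only the generic value changes.
entity reg4_top is
  port (
    clk : in  bit;
    d   : in  bit_vector(3 downto 0);
    q   : out bit_vector(3 downto 0));
end reg4_top;

architecture struct of reg4_top is
  component regn
    generic (width : integer := 2);
    port (
      clk : in  bit;
      d   : in  bit_vector(width - 1 downto 0);
      q   : out bit_vector(width - 1 downto 0));
  end component;
begin
  r4: regn generic map (width => 4)
           port map (clk => clk, d => d, q => q);
end struct;
```

The default value (2) applies when no generic map is given, so the same entity serves as a 2-bit register or, as here, a 4-bit one without touching its architecture.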
The second concept is the use of the sub-program procedure statement. Although the component
statement was introduced first, the procedure statement is the preferred construct due to its simplicity.
The component structure is very much the same as the architecture structure. It uses an entity statement
to specify input and output signals and an architecture body to define the relationship(s) of these signals.
Declaration of a component is also needed in the main architecture body. As clear as the component
concept may seem, there are many hassles in creating and using components for simulation. Unless the
component is useful or has already been created and put into a library, a procedure is the preferred choice.
Another reason why a procedure construct is preferred over a component construct is ease of
understanding the VHDL code. In general, VHDL code for structure modeling is harder to
comprehend than the corresponding behavior modeling, which is naturally associated with the procedure
statement. The fact that behavior modeling is easier to grasp was shown in the previous lesson, where a D
flip-flop was represented in different VHDL models.
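To show how little ceremony a procedure needs compared with a component, here is a minimal sketch. The entity maj3 and the procedure majority are hypothetical examples and do not appear in the SRAM listing.

```vhdl
-- Hypothetical example: a three-input majority function written as a
-- procedure and invoked with a concurrent procedure call.
entity maj3 is
  port (a, b, c : in bit; m : out bit);
end maj3;

architecture behavior of maj3 is
  -- Declared once in the architecture; no separate entity, architecture,
  -- or component declaration is required.
  procedure majority (signal x, y, z : in  bit;
                      signal r       : out bit) is
  begin
    r <= (x and y) or (y and z) or (x and z);
  end majority;
begin
  -- Concurrent call: executed again whenever a, b, or c changes.
  vote: majority (a, b, c, m);
end behavior;
```

The parameters are declared with the signal class so that the procedure can drive m with a signal assignment. The equivalent component version would need its own entity and architecture pair plus a component declaration before it could be instantiated.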
The third concept is the use of library routines. The use statement can be found between the entity
statement and the architecture statement. Since the Cypress VHDL compiler is mainly geared towards
hardware synthesis, the behavior modeling aspect of the Cypress VHDL compiler is weak because it doesn't
have good support for modeling a resistor. For example, in the SRAM modeling below, to multiplex the bidirectional line, bidata, the tristate buffer component, bufoe, from the Cypress library rtlpkg has to be
used.
);
end sram0;
use work.rtlpkg.all;
-- bufoe(x, oe, y, yfb): Three state buffer with feedback.
use work.int_math.all; -- See Cypress VHDL synthesis reference page 4-86
-- i2bv(i,w): converts integer i to w width binary
end sram0arch;
Listing 4-1. SRAM VHDL Code
4.5 Problems
1) The Cypress Warp tool uses the time unit tick. Figure 4-2 shows an example of tick value 68. From
Figure 4-1, use the Warp simulator to determine the minimum times for signals Tnce2datain, Tnce2nwr, and
Tnce2addr that produce a valid Tnce2dataout. Also, what is the value of Tnce2dataout?
2) The timing diagram in Figure 4-3 below shows a modified SRAM that contains an extra field for
storing a valid bit. This memory is to be named SRAMvc (SRAM with valid control) memory. To
put data into a memory location, the signal nvalid has to be set to zero prior to signal nce going active.
During the SRAMvc reading cycle, if the signal nvalid at the requested address location is high,
bidata should output zero. During the SRAMvc writing cycle, data is written into the memory along
with the nvalid signal. If the signal nvalid is high, the maximum data value (all bits set to one)
should be written into the corresponding memory location.
Before SRAMvc is used, a signal nreset should be applied to initialize the nvalid bit of all memory
locations (for example, nvalid<=0). As a result, all nvalid bits will be set to zero, and all bidata values
will be set to max. See Figure 4-3 for the timing relationship of nreset and nce.
Construct SRAMvc using VHDL and prove its functionality by simulation. The simulator should
produce timing waveforms similar to those shown in Figure 4-3.
From the simulation of your SRAMvc VHDL model, find all the timing parameters in Figure 4-3.
Operand 1
Function Select
ALU
ALU Output
controller, on the other hand, would probably have an overall balance of arithmetic and logic functions,
with emphasis on integer manipulation.
There are many ways to construct an ALU. During the 1980s, when Large Scale Integration (LSI)
and Medium Scale Integration (MSI) integrated circuits (ICs) were popular, bit-slice ALU design was the main
technology due to its modularity for LSI and MSI design. The bit-slice design concept was used because it
reduced computations to individual bits so as to minimize each bit's overlapping functionality. In the
current trend of integrated circuit technology (e.g., VLSI), ALU designers focus less on the ALU's bit
versatility and more on its overall speed performance. An ALU design example is shown in
Figure 5-2.
Operand 1
Operand 2
F0
Fn
F1
Function Select
ALU Output
they have been placed inside of the library package iarith_parts_pkg. The listing of the iarith_parts_pkg
package is shown in Section 5.4.2.
In iarith_parts_pkg, components half_adder and full_adder are combined to produce a 4-bit ripple carry
adder, ripp_adder4. One example of hardware reuse is to test the functionality of ripp_adder4 without reimplementing the same module. This is done by creating a VHDL module, addtest.vhd, which is listed in
Section 5.4.3. In file addtest.vhd, after the entity declaration, the line beginning with the VHDL keyword
use tells the VHDL compiler that all modules (procedure, function, and component) inside of the package
iarith_parts_pkg are visible to file addtest.vhd and therefore are available for use.
Only the module ripp_adder4 is used structurally inside of the addtest module. The remaining modules, such
as full_adder and half_adder, do not appear inside of the addtest module; nevertheless, they are
important in building the ripple carry adder ripp_adder4. The test waveform that illustrates the
adder's functionality is shown in Figure 5-3. In Figure 5-3, ain is set to 1111 and bin is set to 0011.
The output aout becomes 0010 with the overflow flag set to 1 (15 + 3 = 18, which is 10010 in binary). Notice the glitches on the aout lines
caused by the asynchronous ain and bin input signals.
out bit);
component ripp_adder4
port(signal Ain, Bin: in bit_vector (3 downto 0);
signal Sout:
out bit_vector (3 downto 0);
signal OverFlow: out bit);
end component;
end iarith_parts_pkg;
----------------------------------------------------------
-- Half adder component
entity half_adder is
port(signal Ain, Bin: in bit;
signal Cout, Sout: out bit);
end half_adder;
architecture half_adder_arch of half_adder is
begin
process (Ain, Bin)
begin
if Ain='1' and Bin='1' then
Sout <= '0';
Cout <= '1';
else
Cout <= '0';
Sout <= Ain or Bin;
end if;
end process;
end half_adder_arch;
----------------------------------------------------------
-- Full adder component
entity full_adder is
port(signal Cin, Ain, Bin: in bit;
signal Cout, Sout: out bit);
end full_adder;
architecture full_adder_arch of full_adder is
begin
process (Ain, Bin, Cin)
begin
if Ain='1' and Bin='1' and Cin='1' then
Sout <= '1';
Cout <= '1';
elsif Cin='0' then
if Ain='1' and Bin='1' then
Sout <= '0';
Cout <= '1';
else
Sout <= Ain or Bin;
Cout <= '0';
end if;
else -- Cin='1' and either Ain='1' or Bin='1'
Sout <= not (Ain or Bin);
Cout <= Ain or Bin;
end if;
end process;
end full_adder_arch;
----------------------------------------------------------
fa1: full_adder
port map(Cin=>Cout(1), Ain=>Ain(1), Bin=>Bin(1),
Cout=>Cout(2), Sout=>Sout(1));
fa2: full_adder
port map(Cin=>Cout(2), Ain=>Ain(2), Bin=>Bin(2),
Cout=>Cout(3), Sout=>Sout(2));
fa3: full_adder
port map(Cin=>Cout(3), Ain=>Ain(3), Bin=>Bin(3),
Cout=>OverFLow, Sout=>Sout(3));
end ripp_adder4_arch;
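For comparison with the if-based full_adder in the listing, the same truth table can be written in dataflow style. The sketch below is an alternative formulation, not the notes' own code; the entity name full_adder_df is hypothetical, while the port names follow the listing.

```vhdl
-- Alternative full adder sketch using Boolean equations.
entity full_adder_df is
  port (Cin, Ain, Bin : in  bit;
        Cout, Sout    : out bit);
end full_adder_df;

architecture dataflow of full_adder_df is
begin
  -- The sum is 1 when an odd number of inputs are 1.
  Sout <= Ain xor Bin xor Cin;
  -- A carry is generated by Ain and Bin, or propagated from Cin
  -- when exactly one of Ain and Bin is 1.
  Cout <= (Ain and Bin) or (Cin and (Ain xor Bin));
end dataflow;
```

Both versions produce the same outputs for all eight input combinations. The dataflow form is shorter and makes the generate and propagate terms of the carry explicit, which is helpful when reasoning about the ripple delay through the four stages.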
end addtest;
use work.iarith_parts_pkg.all;
architecture addtest_arch of addtest is
begin
op4: ripp_adder4 port map(Ain, Bin, Aout, OVF);
end;
b b a a a a a a a a
i i i i i i o o o o
n n n n n n u u u u
_ _ _ _ _ _ t t t t
0 1 0 1 2 3 _ _ _ _
| | | | | | 0 1 2 3
| | | | | | | | | |
^^^^^^^^^^^^^^^^^^^^^^^^^^^
0: 0 1 0 0 1 0 L L L L
1: 0 1 0 0 1 0 L L L L
2: 0 1 0 0 1 0 L L L L
3: 0 1 0 0 1 0 H L L L
4: 0 1 0 0 1 0 H L L L
5: 0 1 0 0 1 0 H L L L
6: 0 1 0 0 1 0 H L L L
7: 0 1 0 0 1 0 H L L L
8: 0 1 0 0 1 0 H L L L
9: 0 1 0 0 1 0 H L L L
10: 0 1 0 0 1 0 H L L L
11: 0 1 0 0 1 0 H L L L
12: 0 1 0 0 1 0 H L L L
13: 0 1 0 0 1 0 H L L L
14: 0 1 0 0 1 0 H L L L
end cshift_arch;
----------------------------------------------------
-- Bitwise AND
----------------------------------------------------
entity bAND is
  port( Ain: in bit_vector (3 downto 0);
        Bin: in bit_vector (3 downto 0);
        Cout: out bit_vector (3 downto 0));
end bAND;
----------------------------------------------------
architecture bAND_arch of bAND is
begin
  process (Ain, Bin)
  begin
    Cout(3 downto 0) <= Ain(3 downto 0) and Bin(3 downto 0);
  end process;
end bAND_arch;
----------------------------------------------------
-- Bitwise OR
----------------------------------------------------
entity bOR is
  port( Ain: in bit_vector (3 downto 0);
        Bin: in bit_vector (3 downto 0);
        Cout: out bit_vector (3 downto 0));
end bOR;
----------------------------------------------------
architecture bOR_arch of bOR is
begin
  process (Ain, Bin)
  begin
    Cout(3 downto 0) <= Ain(3 downto 0) or Bin(3 downto 0);
  end process;
end bOR_arch;
use work.logic_parts_pkg.all;
use work.iarith_parts_pkg.all;
----------------------------------------------------
-- alu_comp
--
-- Input:
--   Ain, Bin - Input bits
--   sel_bit  - Calculation type selection bit
--              0001 - Bitwise AND operation
--              0010 - Bitwise OR operation
--              0100 - Left circular shift
--              0101 - Right circular shift
--              1000 - Integer add.
-- Output:
--   flags - Bit 0 for Ain all zero flag, all time
--           Bit 1 for Bin all zero flag, all time
--           Bit 2 for latched adder overflow flag
----------------------------------------------------
entity alu_comp is
  generic ( data_wd: integer := 3);
  port( Ain, Bin: in bit_vector (data_wd downto 0);
        sel_bit: in bit_vector (data_wd downto 0);
        clk: in bit;
        Aout: out bit_vector (data_wd downto 0);
        flags: out bit_vector (2 downto 0));
end alu_comp;
----------------------------------------------------
architecture alu_body of alu_comp is
----------------------------------------------------
-- calc_ctrl
--
-- Calculation type decoder
--
-- logic_ctrl - * Bit 0 for Bitwise AND tri-buffer
--              * Bit 1 for Bitwise OR tri-buffer
--              * Bit 2,3 for circular shift controller
--                Bit 4 for circular shift tri-buffer
--              * Bit 5 for integer add tri-buffer
----------------------------------------------------
-- ctrl is declared as a signal parameter because it is assigned with <=
procedure calc_ctrl
  (sel_bit: in bit_vector (3 downto 0);
   signal ctrl: out bit_vector (5 downto 0)) is
begin
  case sel_bit is
    when "0001" => -- Bitwise AND operation
      ctrl <= "000001";
    when "0010" => -- Bitwise OR operation
      ctrl <= "000010";
    when "0100" => -- Left circular shift operation
begin
-- Synchronize data
process(clk)
begin
if clk='1' and clk'event then
Ain_lat(3 downto 0) <= Ain(3 downto 0);
Bin_lat(3 downto 0) <= Bin(3 downto 0);
sel_bit_lat(3 downto 0) <= sel_bit(3 downto 0);
end if;
end process;
-- Operations assignment
op1: bAND port map(Ain_lat, Bin_lat, bAND_out);
op2: bOR port map(Ain_lat, Bin_lat, bOR_out);
op3: cshift port map(Ain_lat, Bin_lat (1 downto 0), ctrl(3), cshift_out);
op4: ripp_adder4 port map(Ain_lat, Bin_lat, ripp_adder4_out, overflow);
-- Output stage
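-- The unit outputs are in the sensitivity list so that Aout is updated
-- when a selected result settles after ctrl has already changed.
process(ctrl, bAND_out, bOR_out, cshift_out, ripp_adder4_out, Ain)
begin
  if ctrl(0)='1' then
    Aout <= bAND_out;
  elsif ctrl(1)='1' then
    Aout <= bOR_out;
  elsif ctrl(4)='1' then
    Aout <= cshift_out;
  elsif ctrl(5)='1' then
    Aout <= ripp_adder4_out;
  else
    Aout <= Ain;
  end if;
end process;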
process(Ain)
begin
  if Ain="0000" then
    flags(0) <= '1';
  else
    flags(0) <= '0';
  end if;
end process;
process(Bin)
begin
  if Bin="0000" then
    flags(1) <= '1';
  else
    flags(1) <= '0';
  end if;
end process;
end alu_body;
5.6 Problems
1) Test component cshift by constructing a VHDL module cshtest.vhd. Test both shift directions, left
and right, and make sure the test covers the circular wrap effect. The test vectors should be collected
in a trace file after running the cshtest.vhd simulation. In the trace file, write a comment at the side
of the test vectors that corresponds to each distinct operating period.
Specifically, how long is the input-to-output delay? Does the delay depend on the number of shifts
and/or the shift direction?
2) Implement a 4-bit arithmetic shift component, ashift. Ashift takes inputs similar to those of cshift, except that it
does an arithmetic shift instead of a circular shift. This component should also be included in the
package logic_parts_pkg.
3) Implement a 4-bit subtract unit that utilizes components already built, such as the adder. Notice that
one can subtract two numbers by complementing the subtrahend and then adding the numbers. Again,
this unit should be included in the package iarith_parts_pkg.
4) Implement the ALU shown in Figure 4.18 of Computer Organization and Design by Patterson and
Hennessy using the control logic described in Figure 4.19.
What is the worst-case operational propagation delay in the device that you just implemented?
Field   Description    Length   Bit Location
OP      Opcode         4 bits   [15:12]
rs      Operand 1      2 bits   [11:10]
rt      Operand 2      2 bits   [9:8]
rd      Destination    2 bits   [7:6]
sh      Shift amount   2 bits   [5:4]
funct   Function       4 bits   [3:0]
Instruction   Description     Format   OP code   Function
lw            load word                14
sw            store word               15
add           add
sub           subtract
or            logical or
sll
srl
andi          and immediate            12
ori           or immediate             13
included because the current CPLD fitter only supports up to 128 micro-cell devices. The BIA VHDL
implementation, which is shown in a later section, includes everything shown in Figure 6-1 with a basic
ALU block that mimics ALU functions.
implements functions dictating the BIA state flow. Entity ctrl_module uses the state value produced in
st_mach to generate control signals that go to every module of the BIA CPU.
Instruction       Comment                Opcode    rs       rt     rd     sh     function
                                         [15:12]   [11:10]  [9:8]  [7:6]  [5:4]  [3:0]
ori               ; $3 <- $0 OR 6        1101      00       00     11     00     0110
ori               ; $2 <- $0 OR 4        1101      00       00     10     00     0001
or  r1, r2, r3    ; $1 <- $2 OR $3       0000      10       11     01     00     0010
sw  r1, #4(r0)    ; Memory[$0+4] = $1    1111      00       00     01     00     0001
lw  r3, #4(r0)    ; $3 = Memory[$0+4]    1110      00       00     11     00     0001
                  ; no op                0001      00       00     00     00     0000
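The bit positions listed above map directly onto VHDL slices of the 16-bit instruction word. The following sketch is for illustration only; the entity name ir_fields and its port names are assumptions and are not taken from the BIA listing.

```vhdl
-- Hypothetical field splitter for the 16-bit BIA instruction word.
entity ir_fields is
  port (
    ir    : in  bit_vector(15 downto 0);
    op    : out bit_vector(3 downto 0);   -- opcode       [15:12]
    rs    : out bit_vector(1 downto 0);   -- operand 1    [11:10]
    rt    : out bit_vector(1 downto 0);   -- operand 2    [9:8]
    rd    : out bit_vector(1 downto 0);   -- destination  [7:6]
    sh    : out bit_vector(1 downto 0);   -- shift amount [5:4]
    funct : out bit_vector(3 downto 0));  -- function     [3:0]
end ir_fields;

architecture dataflow of ir_fields is
begin
  op    <= ir(15 downto 12);
  rs    <= ir(11 downto 10);
  rt    <= ir(9 downto 8);
  rd    <= ir(7 downto 6);
  sh    <= ir(5 downto 4);
  funct <= ir(3 downto 0);
end dataflow;
```

As a check against the first row of the table, the word 0xd0c6 is 1101 0000 1100 0110 in binary, which splits into op = 1101, rs = 00, rt = 00, rd = 11, sh = 00, and funct = 0110.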
two cycles. Instructions are latched because the control signals used by the various modules at every
stage, or clock cycle, are generated from the instruction fields. To execute each BIA instruction
correctly, the instruction must be held constant for the whole instruction cycle. For example,
instructions such as ori, or, and sw must be held for two clock cycles, and the instruction lw must be
held for three clock cycles. The timing diagram in Figure 6-5 shows that ir holds the value 0xd0c6 for
two cycles (cycles 3 and 4). Notice that this operation is different from the pipeline concept to be
learned later.
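The idea of freezing the instruction can be sketched as a register with a load enable: the register
takes a new value only when the enable is asserted and otherwise keeps its contents. This is a generic
illustration under stated assumptions, not the BIA ir_process itself. The entity name ir_hold and the
load input are hypothetical; in BIA the decision of when to load depends on the instruction being
executed.

```vhdl
-- Hypothetical hold register: ir follows inst only when load = '1'.
entity ir_hold is
port(
clk:  in bit;
load: in bit;                          -- assumed enable: '1' = take new instruction
inst: in bit_vector(15 downto 0);
ir:   out bit_vector(15 downto 0));
end ir_hold;

architecture ir_hold_arch of ir_hold is
begin
process (clk)
begin
if clk'event and clk='1' then
-- With load = '0' no assignment is made, so ir keeps its old value
-- for as many clock cycles as the instruction requires.
if load='1' then
ir <= inst;
end if;
end if;
end process;
end ir_hold_arch;
```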
The signals regf4x4_regd1, regf4x4_regd2, and regf4x4_regd3 hold the contents of registers 1, 2, and 3
of the register file. Register 0 always contains zero, and therefore it is not shown in the figure.
Memory contents are stored in the signals mem2x4_regd0 and mem2x4_regd1. A memory word is 4 bits
wide. However, only two words are implemented in BIA because of a limitation of the Cypress VHDL
compiler (a BIA with more memory words cannot be fitted and simulated properly).
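One way to picture a register file in which register 0 always reads as zero is a read multiplexer
whose address "00" selects a constant. The sketch below is an assumption about how such a read port
could look, not the regf4x4 architecture from the original listing. The entity name reg_read and its
port names are hypothetical.

```vhdl
-- Hypothetical read port: address "00" returns zero, so only
-- registers 1, 2, and 3 need storage.
entity reg_read is
port(
regd1, regd2, regd3: in bit_vector(3 downto 0);  -- stored register values
addr: in bit_vector(1 downto 0);                 -- register number to read
dout: out bit_vector(3 downto 0));
end reg_read;

architecture reg_read_arch of reg_read is
begin
with addr select
dout <= "0000" when "00",   -- register 0 is hard-wired to zero
        regd1  when "01",
        regd2  when "10",
        regd3  when "11";
end reg_read_arch;
```

Hard-wiring register 0 saves one 4-bit register, which matters when the target device has few
macrocells.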
Although a more complete ALU could be used, the ALU implemented here supports only a few simple
operations, such as the or, ori, and, and andi instructions. Because of the limited amount of logic that
fits into a Cypress CPLD family part, only these representative operations are used and shown in this
example. The same applies to the memory word depth. The BIA module originally contained a memory
block 4 bits wide and 8 words deep (4x8), but the fitter cannot fit that BIA logic into a 37x CPLD.
Therefore, the memory block size was reduced to 4x2 to make the logic fit.
Another comment about simulating the six BIA instructions in Table 6-3: the Cypress VHDL simulator,
NOVA, takes about three minutes to run the simulation on a Pentium-90 processor with 16 megabytes of
RAM under Windows 95. NOVA is not a graphically incremental simulator, so the user does not see the
timing waveforms change incrementally. In a typical simulation session, one starts the simulation and
lets it run. While the machine is simulating, the output timing waveforms stay frozen for approximately
three minutes, and then they are updated all at once when the simulation completes.
6.4.4.1 regf4x4.vhd
Below is the VHDL listing for the 4x4 register file block.
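package regf_type is
constant bw: integer := 3;
constant adw: integer := 1;
-- component ports mirror the ports of entity regf4x4 below
component regf4x4
port(
clk : in bit;
nreset: in bit;
nrw: in bit;
regin: in bit_vector(bw downto 0);
reg1, reg2: out bit_vector(bw downto 0);
reg1a, reg2a, regina: in bit_vector(adw downto 0));
end component;
end regf_type;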
use work.regf_type.all;
entity regf4x4 is
port(
clk : in bit;
nreset: in bit;
nrw: in bit;
regin: in bit_vector(bw downto 0);
reg1, reg2: out bit_vector(bw downto 0);
reg1a, reg2a, regina: in bit_vector(adw downto 0));
end regf4x4;
6.4.4.2 mem4x2.vhd
Below is the construction of both 2x4 and 8x4 memory blocks in VHDL. Although the 2x4 memory block
is the one used in BIA.vhd, the 8x4 memory block illustrates a more generic method of constructing
memory of any word depth.
package memory_type is
constant bw: integer := 3;
constant adw: integer := 1;
constant total_word: integer := 3;
type one_mem_data is array (bw downto 0)of bit;
type mem_data is array (total_word downto 0) of one_mem_data;
component mem2x4
port(
clk: in bit;
mdatain: in bit_vector(3 downto 0);
mdataout: out bit_vector(3 downto 0);
address: in bit;
nw: in bit);
end component;
component mem8x4
port(
clk: in bit;
mdatain: in one_mem_data;
mdataout: out one_mem_data;
address: in bit_vector (adw downto 0);
nw: in bit);
end component;
end memory_type;
use work.int_math.all;
use work.memory_type.all;
entity mem8x4 is
port(
clk: in bit;
mdatain: in one_mem_data;
mdataout: out one_mem_data;
address: in bit_vector (adw downto 0);
nw: in bit);
end mem8x4;
architecture mem8x4_arch of mem8x4 is
signal mword: mem_data;
signal madd: bit_vector (adw downto 0);
begin
-- clocked memory!
process (clk)
variable j: integer;
begin
if clk'event and clk='1' then
-- write process
if nw='0' then
for j in 0 to total_word loop
if address=i2bv(j,2) then
mword(j) <= mdatain;
end if;
end loop;
else
for j in 0 to total_word loop
if address=i2bv(j,2) then
mdataout <= mword(j);
end if;
end loop;
end if;
end if;
end process;
end mem8x4_arch;
-----------------------------------------------------------------------------
use work.memory_type.all;
entity mem2x4 is
port(
clk: in bit;
mdatain: in bit_vector(3 downto 0);
mdataout: out bit_vector(3 downto 0);
address: in bit;
nw: in bit);
end mem2x4;
architecture mem2x4_arch of mem2x4 is
6.4.4.3 ctrlm.vhd
The VHDL program ctrlm.vhd below, with its package ctrl_pkg, contains the main BIA state control
logic. It determines the BIA state from independent inputs such as instructions. At each BIA state,
ctrlm.vhd also generates control signals for other logic blocks such as the ALU, the memory, and the
register block. These processes are contained in the component st_mach. Notice that st_mach executes
according to the flow diagram in Figure 6-4.
One more important piece of code is the alu_module block. alu_module is very similar to the one
described in the previous lesson, except that it contains operations specific to the BIA architecture.
Also, operations such as addition and circular shift are not included (commented out) because of the
limited CPLD capacity. The remaining components are simple multiplexers of various bit widths.
package ctrl_pkg is
component alu_module
port (
ain, bin: in bit_vector (3 downto 0);
sel_bit: in bit_vector (3 downto 0);
shamt: in bit_vector (1 downto 0);
cout: out bit_vector (3 downto 0);
flags: out bit_vector (2 downto 0));
end component;
component mux1_2x1
port (
ain, bin: in bit;
cout: out bit;
sel: in bit);
end component;
component mux4_2x1
port (
ain, bin: in bit_vector (3 downto 0);
cout: out bit_vector (3 downto 0);
sel: in bit);
end component;
component mux2_2x1
port (
ain, bin: in bit_vector (1 downto 0);
cout: out bit_vector (1 downto 0);
sel: in bit);
end component;
component st_mach
---------------------------------------------------------------
entity mux4_2x1 is
port(
ain, bin: in bit_vector (3 downto 0);
cout: out bit_vector (3 downto 0);
sel: in bit);
end mux4_2x1;
architecture mux4_2x1_arch of mux4_2x1 is
begin
-- ain and bin are listed so that cout also follows data changes in simulation
process (sel, ain, bin)
begin
if sel='0' then
cout <= ain;
else
cout <= bin;
end if;
end process;
end mux4_2x1_arch;
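A process-based multiplexer reacts only to the signals named in its sensitivity list, so the data
inputs must be listed along with sel for the output to follow them in simulation. As an alternative
sketch, the same 2-to-1 selection can be written as a concurrent conditional assignment, which is
sensitive to all of its inputs by definition. The entity name mux4_2x1_alt is hypothetical and is used
only to keep this example separate from the listing above.

```vhdl
-- Hypothetical alternative form of the 4-bit 2-to-1 multiplexer.
entity mux4_2x1_alt is
port(
ain, bin: in bit_vector (3 downto 0);
cout: out bit_vector (3 downto 0);
sel: in bit);
end mux4_2x1_alt;

architecture mux4_2x1_alt_arch of mux4_2x1_alt is
begin
-- Re-evaluated whenever ain, bin, or sel changes.
cout <= ain when sel='0' else bin;
end mux4_2x1_alt_arch;
```

Both forms describe the same combinational logic; the concurrent form is simply shorter and leaves no
sensitivity list to maintain.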
process(clk)
begin
if clk'event and clk='1' then
if nreset='0' then
state <= "000";
alufunc <= "0101";
regdst <= '1';
nregwr <= '1';
nmemwr <= '1';
signmux <= '1';
aluselb <= '1';
mem2reg <= '1';
else
-- lw instruction
if state="000" and aluop="1110" then
state <= "001";
elsif state="001" and aluop="1110" then
state <= "010";
mem2reg <='0';
regdst <='0';
nregwr <= '0';
-- sw instruction
elsif state="000" and aluop="1111" then
state <= "011";
nmemwr <= '0';
-- ORI instruct
elsif state="000" and aluop="1101" then
state <= "110";
alufunc <= "0010";
nregwr <= '0';
signmux <= '0';
-- R-type instruction
elsif state="000" and aluop="0000" then
state <= "100";
alufunc <= ir3_0 (3 downto 0);
aluselb <= '0';
nregwr <= '0';
-- back to starting state
else
state <= "000";
alufunc <= "0101";
regdst <= '1';
nregwr <= '1';
nmemwr <= '1';
signmux <= '1';
aluselb <= '1';
mem2reg <= '1';
end if;
end if;
end if;
end process;
end st_mach_arch;
-----------------------------------------------------------------
-- test program
-----------------------------------------------------------------
-- st_mach is declared as a component in ctrl_pkg
use work.ctrl_pkg.all;
entity cmtest is
port (
whatop: inout bit_vector (3 downto 0);
state: inout bit_vector (2 downto 0);
ir3_0: in bit_vector (3 downto 0);
aluop: in bit_vector (3 downto 0);
alufunc: inout bit_vector (3 downto 0);
aluselb: out bit;
mem2reg: out bit;
regdst: out bit;
nregwr: out bit;
signmux: out bit;
nmemwr: out bit;
clk: in bit;
dummy: out bit;
nreset: in bit );
end cmtest;
architecture cmtest_arch of cmtest is
begin
state1: st_mach port map (state, aluop, clk, nreset, ir3_0, aluselb, alufunc,
mem2reg, regdst, nregwr, nmemwr, signmux);
end cmtest_arch;
6.4.4.4 BIA.vhd
The VHDL code in BIA.vhd describes the topmost BIA structure. BIA.vhd connects all major component
blocks, with extra glue logic to make these blocks compatible. For example, an instruction is fed
through the instruction register ir. ir is used as an input by a few components, such as alu_module and
st_mach, listed in the previous section. However, ir is one clock cycle behind the actual input inst
shown in Figure 6-5 because of latching. Once inst is latched, ir must hold its value for either two or
three clock cycles, depending on the instruction. The state machine ir_process inside bia.vhd serves
this purpose. Moreover, ir_process also delays the reset signal to match the delayed, latched
instruction in ir.
See the problems section for the BIA.vhd VHDL code listing.
6.5 Problems
1) The program BIA.vhd is not given in section 6.4.4.4. However, the entity and architecture header
statements are given below. Use all the information from this lesson to complete the program
BIA.vhd. The BIA.vhd JEDEC file is provided to demonstrate the simulation.
use work.regf_type.all;
use work.ctrl_pkg.all;
use work.memory_type.all;
entity bia is
port (
nrw, nmw: inout bit;
aluselb, signmux: inout bit;
nreset_delay: inout bit;
clk, nreset: in bit;
inst: in bit_vector(15 downto 0);
regb2out, rega1out, aluselbout: inout bit_vector(3 downto 0);
address: inout bit_vector (3 downto 0);
mem2regout: inout bit_vector (3 downto 0);
state: inout bit_vector (2 downto 0));
end bia;
architecture bia_arch of bia is
Simulate your compiled program in the NOVA simulator and produce a timing waveform similar to
the one in Figure 6-5.
7. APPENDICES
The sections below describe miscellaneous procedures for beginners using the Cypress WARP2 VHDL
development system, version 4.0. The WARP2 program supports both Windows 3.1x and Windows 95.
OK button if this is the first time WARP has been started. After that, a project window will appear on
the screen. See Figure 7-1.
3) Click on the Project item in the top Galaxy menu bar, then select the New command. Another window
will pop up to prompt for the project name. Type prj_name for the project name. After that, another
interactive project development window named prj_name will pop up.
4) Now click on the New button inside the Edit box on the right-hand side of the prj_name project
window to start the editor.
5) Enter VHDL code, such as Listing 1-1 from lesson one.
6) After typing in the VHDL program, click on the top File button and select the Save As command to
save the file as fil_name.vhd. Quit the editor after the file has been saved. This returns program
control to the project manager window prj_name.
7) To include the VHDL program in the project manager window prj_name for compiling, click on the
Files command on the top menu bar and select the Add button to open the VHDL program add
window. In the add window, first select the VHDL file by clicking on it (e.g. fil_name.vhd) in the left
box, and then click on the button to move the VHDL file to the right box. Afterwards, select the OK
button to go back to the project window.
11) If there are VHDL syntax errors, one can easily go back to the VHDL program by selecting the
Error buttons, which have the magnifying glass icon, on the compilation window menu bar.
12) After the compilation process is completed, quit the compilation window by clicking on the Close
button on the compilation window menu bar.
13) To start the NOVA simulator, click on the top Tools button and select the NOVA command to get to
the NOVA window. In the NOVA window, click on the Files button and select Open to specify the
simulation object file, which has the file extension *.jed. In this example, fil_name.jed is used.
14) In the NOVA simulator, one can examine the circuit's functionality by studying the relationship
between inputs and outputs. To simulate the circuit, first set the input waveform(s), then click on the
Simulate item in the top menu bar and select the Execute command. After the simulation completes,
NOVA displays the simulated waveform in red.