Project CMPEN 331 Computer Organization and Design - Penn State University - School of Electrical Engineering and Computer Science
Late submissions are not accepted and will receive no credit for the project.
In the project, you only need to implement what was described in the honor option section of lab 5, and in addition run
the implementation and generate the bit stream without errors.
The following part is not mandatory, but any student who chooses to do it in addition to the previous
part will receive 5 extra points toward the total grade of the course:
In this extra-points project, students implement a pipelined CPU using the Xilinx design package for FPGAs. You
can use any information available in previous labs if needed.
3. Pipelining
As described in lab 3
As described in lab 3
As described in lab 3
As described in lab 4
As described in lab 4
As described in lab 5
where the rsrtequ signal indicates whether da and db are equal. Both da and db should be the most
up-to-date data. Referring to figures 3 and 4, we use the outputs of the multiplexers for internal forwarding as da and
db. This is why we put the forwarding in the ID stage instead of the EXE stage. Because of the
delayed branch, the return address of the MIPS jal instruction is PC+8. Figure 5 illustrates the execution of the
jal instruction. The instruction located in the delay slot (PC + 4) has already been executed before control is transferred to
a function (or a subroutine). The return address should be PC+8, which is written into register $31 in the WB
stage by the jal instruction. The return from the subroutine can be done by the instruction jr $31. The jr rs
instruction reads the content of register rs and writes it into the PC.
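The ID-stage forwarding and equality test described above can be sketched as follows. This is only an illustration: the module name and the select encoding (00 register file, 01 ealu, 10 malu, 11 mmo) are assumptions, and the forwarded sources are taken from the signal names in the circuit figure.

```verilog
// Sketch only: ID-stage forwarding multiplexers and the rs/rt equality test.
// Assumed select encoding: 00 = register file, 01 = ealu, 10 = malu, 11 = mmo.
module id_forward_equ (
    input  wire [31:0] qa, qb,      // values read from the register file
    input  wire [31:0] ealu,        // result available in the EXE stage
    input  wire [31:0] malu,        // ALU result held in the MEM stage
    input  wire [31:0] mmo,         // data memory output in the MEM stage
    input  wire [1:0]  fwda, fwdb,  // forwarding selects from the control unit
    output reg  [31:0] da, db,      // up-to-date operands
    output wire        rsrtequ      // 1 when da equals db
);
    always @(*) begin
        case (fwda)
            2'b00:   da = qa;
            2'b01:   da = ealu;
            2'b10:   da = malu;
            default: da = mmo;
        endcase
        case (fwdb)
            2'b00:   db = qb;
            2'b01:   db = ealu;
            2'b10:   db = malu;
            default: db = mmo;
        endcase
    end

    // The comparison uses the forwarded values, so beq/bne see current data.
    assign rsrtequ = (da == db);
endmodule
```

Because da is taken after the multiplexer, a jr $31 that follows closely after the instruction producing $31 would also read the forwarded value rather than a stale register.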
[Figure: pipeline timing of a beq instruction with a delay slot. One panel shows the instructions at a (beq), a+4, a+8, and a+12; the other shows a (beq), a+4, and then the target address t and t+4.]
[Figure: circuit that determines the branch condition in the ID stage. Labeled signals include pcsrc, rsrtequ, wpcir, fwda, fwdb, npc, bpc, jpc, pc4, dpc4, epc8, da, and db; stages shown are IF, ID, EXE, MEM, and WB.]
[Figure 5: pipeline timing of the jal instruction. Rows: PC (jal), PC + 4 (delay slot), PC + 8 (return address).]
For your reference, figure 6 illustrates the detailed circuit of the pipelined CPU, plus instruction
memory and data memory. The PC can be considered as the first pipeline register, located in front of the IF
stage, and a register of the register file can be considered as the sixth (last) pipeline register, at the end of
the WB stage.
In the IF stage, an instruction is fetched from instruction memory, and the PC is incremented by 4 if the
instruction in the ID stage is neither a branch nor a jump instruction, and there is no pipeline stall. There
are four sources for the next PC:
pc4: PC+4
bpc: branch target address of a beq or bne instruction
da: target address in register of a jr instruction
jpc: jump target address of a j or jal instruction
The selection of the next PC (npc) is done by a 32-bit 4-to-1 multiplexer whose selection signal is
pcsrc (PC source), generated by the control unit in the ID stage.
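A minimal sketch of this next-PC multiplexer is shown below. The module name is hypothetical, and the pcsrc encoding is an assumption that simply follows the order in which the four sources are listed above.

```verilog
// Sketch only: 32-bit 4-to-1 multiplexer selecting the next PC.
// Assumed encoding: 00 = pc4, 01 = bpc, 10 = da, 11 = jpc.
module next_pc_mux (
    input  wire [31:0] pc4,    // PC + 4
    input  wire [31:0] bpc,    // branch target of beq or bne
    input  wire [31:0] da,     // register value used by jr
    input  wire [31:0] jpc,    // jump target of j or jal
    input  wire [1:0]  pcsrc,  // select generated in the ID stage
    output reg  [31:0] npc     // next PC
);
    always @(*) begin
        case (pcsrc)
            2'b00:   npc = pc4;
            2'b01:   npc = bpc;
            2'b10:   npc = da;
            default: npc = jpc;
        endcase
    end
endmodule
```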
In the ID stage, two register operands are read from the register file based on rs and rt; the
immediate (imm) is extended and the instruction is decoded based on op (and func) by the control
unit.
The selection signal of the multiplexer for the ALU’s input A is named fwda (forward A), and the one
for the ALU’s input B is named fwdb (forward B). If there is no data hazard, the multiplexer selects the
data read from the register file. The inverse of the stall signal is used as the write enable for the PC and
the IF/ID pipeline register (wpcir). The stall signal becomes true when an instruction in the ID
stage uses the result of an lw instruction that is in the EXE stage. Thus, the stall signal can be
generated by the following Verilog HDL code.
stall = ewreg & em2reg & (ern != 0) & (i_rs & (ern == rs) | i_rt & (ern == rt));
where i_rs and i_rt indicate that an instruction uses the contents of the rs register and the rt register
respectively.
There is an important point we must not forget. The pipeline stall is implemented by prohibiting the
updates of the PC and the IF/ID pipeline register. But the instruction that is already in the IF/ID register
will be decoded and fed to the next pipeline stage. This will result in an instruction being executed
twice. To prevent an instruction from being executed twice, we must cancel the first instruction.
Canceling an instruction is easy: prevent it from updating the states of the CPU and memory.
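One possible way to combine the stall detection with the cancellation is sketched below. The inputs wreg_dec and wmem_dec are hypothetical names for the write enables produced by the decoder before cancellation; the sketch assumes that clearing the register-write and memory-write enables is sufficient to cancel the instruction.

```verilog
// Sketch only: load-use stall detection plus cancellation of the stalled
// instruction. wreg_dec and wmem_dec are hypothetical decoder outputs.
module stall_cancel (
    input  wire       ewreg,     // EXE stage: register write enable
    input  wire       em2reg,    // EXE stage: result comes from memory (lw)
    input  wire [4:0] ern,       // EXE stage: destination register number
    input  wire [4:0] rs, rt,    // source register numbers in the ID stage
    input  wire       i_rs,      // ID instruction uses the rs register
    input  wire       i_rt,      // ID instruction uses the rt register
    input  wire       wreg_dec,  // decoded register write enable
    input  wire       wmem_dec,  // decoded memory write enable
    output wire       stall,     // 1 when the pipeline must stall
    output wire       wpcir,     // write enable for the PC and IF/ID register
    output wire       wreg,      // register write enable after cancellation
    output wire       wmem       // memory write enable after cancellation
);
    assign stall = ewreg & em2reg & (ern != 5'd0) &
                   ((i_rs & (ern == rs)) | (i_rt & (ern == rt)));

    // The PC and the IF/ID register hold their values during a stall.
    assign wpcir = ~stall;

    // The copy of the instruction sent to EXE during the stall must not
    // change the register file or the data memory.
    assign wreg = wreg_dec & wpcir;
    assign wmem = wmem_dec & wpcir;
endmodule
```

With this gating, the instruction proceeds through the pipeline as a harmless bubble during the stall cycle and is decoded again, with the forwarded lw data, in the following cycle.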
All the control signals that will be used in the following stages are saved into the ID/EXE registers.
In the EXE stage, in addition to the operation performed by the ALU, the PC+8 operation is carried out
by an adder for generating the return address for the jal instruction. The shift amount (sa) for a shift
instruction can be extracted from the immediate field (eimm). If the instruction in the EXE stage is a
jal, PC+8 is selected and the destination register number (ern) is set to 31 (done by the f component).
Otherwise, the ALU output is selected and ern = ern0 (rd or rt in the EXE stage).
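The jal handling in the EXE stage can be sketched as follows. The module name and the input alu_out are hypothetical; epc4, epc8, ejal, ealu, ern0, and ern follow the signal names used in the circuit figure.

```verilog
// Sketch only: return-address adder and jal selection in the EXE stage.
// alu_out is a hypothetical name for the raw ALU result.
module exe_jal_select (
    input  wire [31:0] epc4,     // PC + 4 carried into the EXE stage
    input  wire [31:0] alu_out,  // raw ALU result
    input  wire [4:0]  ern0,     // rd or rt chosen in the ID stage
    input  wire        ejal,     // 1 when the EXE instruction is jal
    output wire [31:0] epc8,     // PC + 8, the return address
    output wire [31:0] ealu,     // value passed on to the MEM stage
    output wire [4:0]  ern       // destination register number
);
    assign epc8 = epc4 + 32'd4;
    assign ealu = ejal ? epc8 : alu_out;
    assign ern  = ejal ? 5'd31 : ern0;   // role of the f component
endmodule
```

For a jal located at PC = 0x00000008, epc4 is 0x0000000c and epc8 is 0x00000010, so 0x00000010 is the value written to $31.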
In the MEM stage, if the instruction is an sw, the data mb will be written into the data memory
addressed by malu. If the instruction is an lw, the memory data addressed by malu is read out. Other
instructions do nothing in this stage.
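A data memory with this behavior might look like the sketch below. The depth of 32 words, the word addressing by malu[6:2], the asynchronous read, and the write on the rising clock edge are all assumptions made for illustration; the memory used in the labs may differ.

```verilog
// Sketch only: data memory accessed in the MEM stage.
// Assumes 32 words, word-aligned accesses, and asynchronous read.
module data_mem_sketch (
    input  wire        clk,
    input  wire        mwmem,  // 1 when the MEM instruction is sw
    input  wire [31:0] malu,   // address computed by the ALU
    input  wire [31:0] mb,     // store data
    output wire [31:0] mmo     // load data
);
    reg [31:0] ram [0:31];

    // lw: read the word addressed by malu.
    assign mmo = ram[malu[6:2]];

    // sw: write mb into the word addressed by malu.
    always @(posedge clk) begin
        if (mwmem)
            ram[malu[6:2]] <= mb;
    end
endmodule
```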
In the WB stage, an instruction is graduated (completed) by writing its result, either the ALU result or the memory
data, into the register file. The destination register number is wrn (register number in the WB stage), and
the write enable signal is wwreg (register write enable in the WB stage).
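The selection of the write-back data can be sketched as a 2-to-1 multiplexer. The input names walu and wmo are hypothetical labels for the ALU result and memory data held in the MEM/WB register; wm2reg and wdi appear in the circuit figure.

```verilog
// Sketch only: choose the value written to the register file in WB.
// walu and wmo are hypothetical names for the MEM/WB register outputs.
module wb_select (
    input  wire [31:0] walu,    // ALU result carried to the WB stage
    input  wire [31:0] wmo,     // memory data carried to the WB stage
    input  wire        wm2reg,  // 1 selects memory data (lw)
    output wire [31:0] wdi      // data written to register wrn when wwreg is 1
);
    assign wdi = wm2reg ? wmo : walu;
endmodule
```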
Below is the test data that should be stored in the data memory. Four 32-bit words in the memory will be read by
lw instructions. The test program will store a word in the location next to the four words.
[Figure 6: detailed circuit of the pipelined CPU with instruction memory and data memory. Stages: IF, ID, EXE, MEM, WB. Labeled control signals include m2reg/em2reg/mm2reg/wm2reg, wmem/ewmem/mwmem, jal/ejal, aluc/ealuc, aluimm/ealuimm, shift/eshift, regrt, sext, rsrtequ, fwda, and fwdb. Labeled data signals include npc, pc4, dpc4, epc4, epc8, bpc, jpc, da/ea, db/eb, dimm/eimm, sa, ealu, malu, mmo, wdi, and drn/ern0/ern/mrn/wrn.]
Figure 7 illustrates an example of waveforms when the pipelined CPU executes the jal instruction (PC =
0x00000008). The instruction in the delay slot (PC = 0x0000000c) is also executed. The target address of the jal
instruction is 0x0000006c, the entry of a subroutine (sum). The result at the EXE stage of the jal instruction is
0x00000010, which is the return address (from the subroutine).
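[Figure 7: waveforms of the pipelined CPU executing the jal instruction, with signals grouped by stage: IF, ID, EXE, MEM, WB.]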
11. Write Verilog code that implements the instructions shown in item number 8, with the corresponding
initialization of the data memory, using the design shown in Figure 6. You need to show your outputs in a similar way
to Figure 7, with the same signals, when the pipelined CPU executes the lw $9, 0($4) instruction (PC =
0x00000070) and the instruction that follows it, add $8, $8, $9, in the fourth (last) iteration of the for loop.
14. Upload the whole project design as a zipped file, together with the report as a Word file or as LaTeX source with its PDF file.