
ECE429 Final Project

This document describes a case study for a 32-bit pipelined CPU design with a new ALU architecture. It presents the objectives, circuit description, and building blocks of the CPU design. The CPU uses a memory file and ALU as primary building blocks. It is synchronized by a clock signal and instructions are executed over two clock cycles - memory access in one cycle and ALU operation in the next. The goal is to understand the design and synchronize signals with critical path delays.


Case Study for 32-bit Pipelined CPU Design with New ALU Architecture


by

Dheeraj Sadashivappa Gaddi


A20329707
12/08/2014

The project report is prepared for ECE 429 Final Project


Fall 2014, Illinois Institute of Technology, Chicago
Master of Science in Computer Engineering

DEPARTMENT OF ELECTRICAL AND ELECTRONICS

Illinois Institute of Technology

Dheeraj Sadashivappa Gaddi

ECE429 Final Project

TABLE OF CONTENTS

CHAPTER 1: INTRODUCTION
CHAPTER 2: CIRCUIT DESCRIPTION
CHAPTER 3: MEMORY FILE
CHAPTER 4: ARITHMETIC LOGIC UNIT (ALU)
CHAPTER 5: SYNCHRONIZATION
CHAPTER 6: CASE STUDY 1
6.1 RTL Simulation
CHAPTER 7: CASE STUDY 2
7.1 Structure of 4-bit comparator
7.2 The structure view of the 32-bit comparator
7.3 The new ALU design
7.4 Code Implemented
CHAPTER 8: RESULT


LIST OF FIGURES
Figure 1: Overview of the primary blocks and signals
Figure 2: Memory file configuration
Figure 3: Block diagram of the ALU circuit
Figure 4: Instruction word contents
Figure 5: RTL Simulation result
Figure 6: Simvision Waveform for CLA
Figure 7: Simvision Waveform for CRA
Figure 8: Simvision Waveform for CSA
Figure 9: Simvision Waveform for CSeA
Figure 10: Cell.rep file generated
Figure 11: timing.rep file generated
Figure 12: post-synthesis simulation
Figure 13: timing.rep.5.final
Figure 14: post-P&R simulation
Figure 15: post-P&R simulation Simvision Output
Figure 16: Log file for CLA for New test Bench
Figure 17: Log file for CRA for New test Bench
Figure 18: Log file for CSA for New test Bench
Figure 19: Log file for CSeA for New test Bench
Figure 20: Simvision Waveform for CLA for New test Bench
Figure 21: Simvision Waveform for CRA for New test Bench
Figure 22: Simvision Waveform for CSA for New test Bench
Figure 23: Simvision Waveform for CSeA for New test Bench
Figure 24: Post P&R Simulation
Figure 25: Structure of 4-bit comparator
Figure 26: Structure of 32-bit comparator
Figure 27: Block diagram of the new ALU circuit
Figure 28: New Test bench values
Figure 29: Simulation result for New Testbench
Figure 30: Simvision Output for New Testbench
Figure 31: Power Values after simulation
Figure 32: All Ports Matched Report
Figure 33: P & R simvision output
Figure 34: Final Routed Chip Design


LIST OF TABLES
Table 1: Comparator condition
Table 2: Final results


CHAPTER 1: INTRODUCTION
This report describes the case study for a 32-bit pipelined CPU design with a new ALU
architecture. The objective of this project is to understand a 32-bit pipelined Central Processing Unit
(CPU). As the name of the design indicates, the word length of the data used in the circuits is 32 bits.
Furthermore, since the circuit is pipelined, more than one instruction can be in execution at the same
time. The operation of the circuit is synchronized by an externally set clock signal. The instruction
signals for addressing the memory file, selecting the Arithmetic Logic Unit (ALU) operands and specifying
the operation of the ALU are also external. One of the objectives of the project is to synchronize those
signals correctly with the critical data path delay of the circuit, which determines the minimum
operating period.


CHAPTER 2: CIRCUIT DESCRIPTION


An overview of the primary building blocks and signals of the CPU is shown in Fig. 1. As shown there,
the primary building blocks are the memory file and the ALU. The external clock signal synchronizes the
capture and release of data within the memory file block. The circuit is pipelined and each instruction is
executed in two clock cycles. In the first clock cycle, the two decoders decode the external address
selection signals that specify which contents of the memory file are read at the memory ports in each
clock cycle. Additionally, multiplexer blocks select the operands for the ALU. In the next clock cycle,
the ALU executes the specified operation. The ALU results can be read from outside the CPU through a
tri-state buffer, based upon the value of the externally specified OEN (output enable) signal. Finally,
the ALU results can be written back to the memory file in the word specified by Address B.

Figure 1: Overview of the primary blocks and signals


CHAPTER 3: MEMORY FILE


The memory file of this design stores 32 words of 32 bits each. The memory file has two read
ports and one write port. The words to be read in each clock cycle are specified by the
external 5-bit addresses, Address A and Address B. The internal configuration of the memory
file is illustrated in Fig. 2.
As illustrated in Fig. 2, the primary storage element within the memory file is a D-register.
The output of each D-register is connected to the two output ports of the memory file
through tri-state buffers. The tri-state buffers are enabled by the decoded addresses (signals A
and B), so that the contents of the selected D-registers appear at output ports A and B respectively.
Furthermore, the value of Address B specifies the word address in the memory file where the
result of the ALU computation is stored in the second cycle of instruction execution. The
writing of the ALU result into the memory file is synchronized by the clock signal.

Figure 2: Memory file configuration
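
The behaviour of the memory file described above can be sketched as a small Python model. This is only an illustrative behavioural model, not the project's Verilog: the class name RegisterFile and its method names are assumptions made for this sketch, and the tri-state buffers and decoders are reduced to simple list indexing.

```python
class RegisterFile:
    """Behavioural model: 32 words of 32 bits, two read ports, one write port."""

    WORDS = 32
    MASK32 = 0xFFFFFFFF  # keeps every stored word 32 bits wide

    def __init__(self):
        self.words = [0] * self.WORDS

    def read(self, addr_a, addr_b):
        """Return the words selected by the 5-bit addresses A and B."""
        return self.words[addr_a & 0x1F], self.words[addr_b & 0x1F]

    def clock_edge(self, addr_b, alu_result):
        """On a positive clock edge, store the ALU result at Address B."""
        self.words[addr_b & 0x1F] = alu_result & self.MASK32


rf = RegisterFile()
rf.clock_edge(3, 0x5555_5555)
rf.clock_edge(7, 0x1_0000_0005)  # a 33-bit value is truncated to 32 bits
port_a, port_b = rf.read(3, 7)
```

Both read ports are served in the same call, which mirrors the two tri-state output paths, while the single clock_edge method reflects that only the word at Address B can be written.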


CHAPTER 4: ARITHMETIC LOGIC UNIT (ALU)


The ALU of the circuit has two operands, A and B, and can implement the following eight
functions:

A * B : multiplication
A + B : addition
A - B : subtraction
B - A : subtraction
A or B : logic OR function
A and B : logic AND function
A xor B : logic XOR function
A xnor B : logic XNOR function

As illustrated in Fig. 1, the operands of the ALU can be selected as follows:
Operand A: selected between read port A of the memory file and the externally defined
data input. The selection is done by the external signal ASEL.
Operand B: selected between read port B of the memory file and the logic zero value.
The selection is done by the external signal BSEL.
Internally the ALU has three primary operation blocks: the multiplier, the adder and the logic
function block. These blocks are illustrated in Fig. 3. The multiplier can be implemented as a
32-by-32 array-based multiplier and executes the multiplication function. Notice, however,
that the result of the multiplication is 64 bits wide, so two clock cycles are needed to store
it back to the 32-bit memory file. The multiplication instruction is therefore executed in
3 clock cycles. This is made possible by pipelining the multiplier unit so that it produces the
32 least significant bits (LSB) of the result in one clock cycle and the 32 most significant
bits (MSB) in the next cycle. Notice also that the instruction immediately following a
multiplication should select the MSB word of the multiplier at the output of the ALU. It should
also specify the storage address of the MSB word in the memory file.
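
To make the two-word write-back concrete, here is a minimal Python sketch of how a 64-bit product divides into the two words that are stored in consecutive cycles. It assumes unsigned operands and that each stored word is 32 bits wide, as the 32-bit memory word implies; the function name multiply_split is hypothetical.

```python
MASK32 = 0xFFFFFFFF


def multiply_split(a, b):
    """Return (low_word, high_word) of the 64-bit product of two unsigned 32-bit operands."""
    product = (a & MASK32) * (b & MASK32)
    low_word = product & MASK32            # stored first
    high_word = (product >> 32) & MASK32   # stored by the following instruction
    return low_word, high_word


low, high = multiply_split(0xFFFF_FFFF, 0xFFFF_FFFF)
```

For the largest operands the product is 0xFFFF_FFFE_0000_0001, so the low word is 0x0000_0001 and the high word is 0xFFFF_FFFE.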


Figure 3: Block diagram of the ALU circuit


The adder circuit within the ALU is a 32-bit adder/subtractor circuit. It executes the addition
and subtraction operations. The operation is selected by the two externally defined operation
select signals OPSEL. The same signals specify the operation executed within the logic function
block of the ALU. The final output of the ALU is specified by the output select OUTSEL signals
that control the final 4-to-1 multiplexer within the ALU.
Finally, the ALU creates a control output signal that can be used outside the CPU: the adder
overflow signal, which is 1 if there is an adder overflow.

CHAPTER 5: SYNCHRONIZATION
To better understand the circuit synchronization sequence described below, please refer to
Fig. 1. The operation of the CPU is synchronized by the external clock signal.
An instruction to be executed by the CPU is determined by the external control signals. An
example of an instruction word is illustrated in Fig. 4. After the clock switches high, the
instruction is applied (fetched) to the signals that control the CPU operation. Since each
instruction is executed in two steps, some of these control signals need to be stored in the
internal registers of the CPU. In the first step of the instruction, these signals specify the
contents of the memory file that are read from read ports A and B, as well as the operands of
the ALU. In the second cycle of the operation, the control signals determine the operation
executed inside the ALU and the value of the ALU output.

Figure 4: Instruction word contents


The result of the ALU is available at the output of the circuit if the OEN signal is set, and it
is also written back to the memory file. The address in memory where the ALU result is written is
specified by the Address B value. The data is written to that word at the next positive edge
of the clock signal.
The control signals are set after a positive edge of the clock signal and should not change
before the next positive edge. The period of the clock signal is determined by the longest data
path within the circuit.
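
The two-step sequence can be illustrated with a short Python model of one instruction. This is a sketch under stated assumptions rather than the actual design: the operation names are hypothetical labels (the report does not give the OPSEL and OUTSEL encodings), the polarity of ASEL and BSEL is assumed, and multiplication is left out because it takes an extra cycle.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical operation labels standing in for the OPSEL/OUTSEL encodings.
ALU_OPS = {
    "add":    lambda a, b: a + b,
    "sub_ab": lambda a, b: a - b,
    "sub_ba": lambda a, b: b - a,
    "or":     lambda a, b: a | b,
    "and":    lambda a, b: a & b,
    "xor":    lambda a, b: a ^ b,
    "xnor":   lambda a, b: ~(a ^ b),
}


def run_instruction(memory, addr_a, addr_b, data_in, asel, bsel, operation):
    """Model the two steps of one instruction on a 32-word memory list."""
    # Step 1: read both ports and select the operands.
    # Assumed polarity: asel = 1 picks data_in, bsel = 1 picks logic zero.
    operand_a = data_in if asel else memory[addr_a]
    operand_b = 0 if bsel else memory[addr_b]
    # Step 2: the ALU executes, and the result is written to the word at
    # Address B on the next positive clock edge.
    result = ALU_OPS[operation](operand_a, operand_b) & MASK32
    memory[addr_b] = result
    return result


memory = [0] * 32
run_instruction(memory, 0, 8, 0x5555_5555, 1, 1, "add")   # store data_in + 0 at word 8
run_instruction(memory, 0, 10, 0x0000_0005, 1, 1, "add")  # store data_in + 0 at word 10
total = run_instruction(memory, 8, 10, 0, 0, 0, "add")    # word 10 = word 8 + word 10
```

In this model a store is simply an addition of the external data input and the logic zero operand, which is one way the operand multiplexers described earlier can be used to load a value into the memory file.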


CHAPTER 6: CASE STUDY 1


32-bit CPU design with Different Adders - Carry Ripple Adder, Carry Lookahead Adder,
Carry Skip Adder, and Carry Select Adder
The professor provided the Verilog source code for the CPU design with a Carry Ripple Adder
(cpu_CRA.v), Carry Lookahead Adder (cpu_CLA.v), Carry Skip Adder (cpu_CSA.v) and Carry
Select Adder (cpu_CSeA.v), together with the Verilog test bench (tb_cpu.v). We performed the
logic synthesis and physical synthesis using the IIT-ECE429 ASIC flow, referring to the
previous lab programs.
6.1 RTL Simulation
1. We ensured that there is no bug in the design before synthesis by running the
test bench to verify correctness. The whole CPU design is provided in Verilog as
cpu_xxx.v, along with a test bench for verifying the CPU, tb_cpu.v, in which we
tested the store, read, addition and subtraction functions.

The screenshots for the RTL simulation are attached below.


Figure 5: RTL Simulation result


Figure 6: Simvision Waveform for CLA


Figure 7: Simvision Waveform for CRA

Figure 8: Simvision Waveform for CSA


Figure 9: Simvision Waveform for CSeA


Figure 10: Cell.rep file generated


Figure 11: timing.rep file generated


Figure 12: post-synthesis simulation


Figure 13: timing.rep.5.final


As given in the timing.rep.5.final file, the maximum clock frequency is determined by the
worst-case path. The minimum time required to complete a cycle is 28.976 ns, so the maximum
frequency of operation is
Fmax = 1 / 28.976 ns ≈ 34.51 MHz
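
The relation between the worst-case path delay and the maximum clock frequency is a simple reciprocal. As a quick check, the following Python sketch takes the 28.976 ns period quoted from timing.rep.5.final as its only input; the function name is hypothetical.

```python
def max_frequency_mhz(period_ns):
    """Maximum clock frequency in MHz for a minimum clock period given in ns."""
    return 1000.0 / period_ns  # 1 / (period in ns) expressed in MHz


fmax = max_frequency_mhz(28.976)
print(f"Fmax = {fmax:.2f} MHz")
```

With a period of 28.976 ns the reciprocal comes to roughly 34.5 MHz.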


Figure 14: post-P&R simulation

Figure 15: post-P&R simulation Simvision Output

Now we have changed the test values in tb_cpu from the original set to a new set.

Figure 16: Log file for CLA for New test Bench

Figure 17: Log file for CRA for New test Bench


Figure 18: Log file for CSA for New test Bench


Figure 19: Log file for CSeA for New test Bench

Figure 20: Simvision Waveform for CLA for New test Bench


Figure 21 : Simvision Waveform for CRA for New test Bench


Figure 22: Simvision Waveform for CSA for New test Bench

Figure 23: Simvision Waveform for CSeA for New test Bench


Figure 24: Post P&R Simulation

Path Delay for Each Operation (Post-Synthesis Gate-Level Delay)

Operation                  CLA (ns)   CRA (ns)   CSA (ns)   CSeA (ns)
5555_5555 + 5              2.10       2.11       2.19       1.2
AAAA_AAAA + 5555_5555      2.85       2.90       2.36       1.65
0000_00C8 + 0000_012C      2.45       2.76       2.13       1.70
5 + 0000_000A              2.69       2.59       2.18       1.69
FFFF_FFFF - 0000_0001      2.68       2.4        2.24       2.07
FFFF_FFFF + 0000_0001      2.55       2.32       2.28       1.85
5555_5555 - 5              2.13       2.35       1.89       1.64
AAAA_AAAB + 5555_5555      2.42       2.39       1.8        1.65

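A. As observed above, the path delay depends on the operand values: in general, the more bits
that take part in the operation, the longer the delay in the path.
B. The CSeA shows the smallest path delay for every operation in the table, because a
carry-select adder computes the sums for both possible carry-in values in parallel and only
selects between them once the carry arrives.
C. The CLA is faster than the CRA for four of the eight operations, so it is not uniformly the
faster of the two in these measurements.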
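
The measured delays can be cross-checked with a few lines of Python. The values below are copied from the path delay table; the dictionary layout and variable names are choices made for this sketch only.

```python
# Post-synthesis gate-level path delays in ns, copied from the table above.
DELAYS_NS = {
    "5555_5555 + 5":         {"CLA": 2.10, "CRA": 2.11, "CSA": 2.19, "CSeA": 1.20},
    "AAAA_AAAA + 5555_5555": {"CLA": 2.85, "CRA": 2.90, "CSA": 2.36, "CSeA": 1.65},
    "0000_00C8 + 0000_012C": {"CLA": 2.45, "CRA": 2.76, "CSA": 2.13, "CSeA": 1.70},
    "5 + 0000_000A":         {"CLA": 2.69, "CRA": 2.59, "CSA": 2.18, "CSeA": 1.69},
    "FFFF_FFFF - 0000_0001": {"CLA": 2.68, "CRA": 2.40, "CSA": 2.24, "CSeA": 2.07},
    "FFFF_FFFF + 0000_0001": {"CLA": 2.55, "CRA": 2.32, "CSA": 2.28, "CSeA": 1.85},
    "5555_5555 - 5":         {"CLA": 2.13, "CRA": 2.35, "CSA": 1.89, "CSeA": 1.64},
    "AAAA_AAAB + 5555_5555": {"CLA": 2.42, "CRA": 2.39, "CSA": 1.80, "CSeA": 1.65},
}

# Fastest adder for each operation.
fastest = {op: min(row, key=row.get) for op, row in DELAYS_NS.items()}

# Number of operations in which the CLA is faster than the CRA.
cla_beats_cra = sum(1 for row in DELAYS_NS.values() if row["CLA"] < row["CRA"])

for op, adder in fastest.items():
    print(f"{op:24s} fastest: {adder}")
print("CLA faster than CRA in", cla_beats_cra, "of", len(DELAYS_NS), "operations")
```

Tabulating the rows this way makes it easy to see which adder wins each operation without reading the figures side by side.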


CHAPTER 7: CASE STUDY 2


Comparator Design in the ALU for the 32-bit
The function of the 32-bit comparator in Verilog is shown in Table 1. Suppose we have
two 32-bit inputs A and B (we assume them to be unsigned in this project). The result
of comparing them can be A > B, A = B or A < B, so two bits are needed to represent
the comparison result (two outputs, f1 and f0). Note that f1 = 1 means the two
integers are equal. Otherwise, f0 is used to determine the relation of A and B.
In this project, we have designed the 32-bit comparator in a structural way. First,
we explain the structure using a 4-bit comparator. Then we give the structure view of
the 32-bit comparator and complete the Verilog coding according to that structure.
Condition   f1   f0
A > B       0    1
B > A       0    0
A = B       1    0

Table 1: Comparator condition


7.1 Structure of 4-bit comparator

Figure 25: Structure of 4-bit comparator

The structure of the 4-bit comparator is shown in Fig. 25. It is designed as a tree
structure. At the bottom level (Level 2), there are 4 one-bit comparators. Each of
them compares the corresponding bits of A and B. The meaning of the outputs f1 and f0
is the same as in Table 1 (f1f0=10 means a=b, f1f0=00 means a<b, f1f0=01 means a>b).
Notice that the final comparison result depends on the most significant bit position
at which the two integers differ, since that bit determines their relation. Take the
4-bit comparator shown in Fig. 25 as an example. If the result from the MSBs A[3] and
B[3] shows that A[3] > B[3] or A[3] < B[3] (in other words, f1 = 0), then A > B or
A < B respectively. On the other hand, if A[3] = B[3] (f1 = 1), then we have to refer
to the comparison result of the next significant bits A[2] and B[2]. If A[2] and B[2]
are equal, we have to compare A[1] and B[1], and so on. If all 4 bits are equal (f1
from all four one-bit comparators is 1), then A = B. In fact, rather than comparing
bits from MSB to LSB sequentially, we can do the comparison in parallel in order to
save time, as we can see from Fig. 25. Remember that the f1 and f0 results from the
left (higher) part always have higher priority than those from the right (lower) part.
To be more specific, for the mux_4to2 component in Fig. 25, if hi_f1 = 0, which means
the relation of A and B has already been determined, then its outputs f1 and f0 should
be consistent with hi_f1 and hi_f0. Otherwise, f1 and f0 should be consistent with
lo_f1 and lo_f0.
From Fig. 25, we can see that the number of mux_4to2 components is 3, which is equal
to 4 - 1, and the number of levels in the tree is 3, which is equal to log2(4) + 1.
More generally, if two N-bit unsigned integers (N a power of 2) are compared, the tree
comparator will have log2(N) + 1 levels and will consist of N - 1 mux_4to2 and
N one_bit_comp components.
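
The tree structure described above can be modelled behaviourally in Python to check both the comparison result and the component counts. This is a sketch, not the Verilog implementation: the function names mirror the module names one_bit_comp, mux_4to2 and tree_comp, while the tuple-based interface and the returned counters are assumptions made here.

```python
def one_bit_comp(a, b):
    """f1 = 1 when the bits are equal; f0 = a, which is meaningful only when f1 = 0."""
    f1 = 1 - (a ^ b)
    f0 = a
    return f1, f0


def mux_4to2(hi, lo):
    """Pass the high pair if it already decides the result (hi_f1 = 0), else the low pair."""
    hi_f1, hi_f0 = hi
    if hi_f1 == 0:
        return hi_f1, hi_f0
    return lo


def tree_comp(a, b, n=32):
    """Compare two unsigned n-bit integers (n a power of 2) with a mux tree."""
    level = [one_bit_comp((a >> i) & 1, (b >> i) & 1) for i in range(n)]  # index 0 = LSB
    mux_count = 0
    levels = 1
    while len(level) > 1:
        # Pair neighbours: the higher-indexed result has priority.
        level = [mux_4to2(level[i + 1], level[i]) for i in range(0, len(level), 2)]
        mux_count += len(level)
        levels += 1
    f1, f0 = level[0]
    return f1, f0, mux_count, levels


less = tree_comp(0x0000_0001, 0x5555_5555)     # A < B
greater = tree_comp(0x0000_012C, 0x0000_000A)  # A > B
equal = tree_comp(0x5555_5555, 0x5555_5555)    # A = B, f0 is ignored when f1 = 1
```

For n = 32 the model uses 16 + 8 + 4 + 2 + 1 = 31 multiplexers in 6 levels, in line with the N - 1 and log2(N) + 1 expressions.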
7.2 The structure view of the 32-bit comparator

The structure of the 32-bit comparator is shown in Fig. 26. You are supposed to
finish the Verilog coding of this structure and include it in the ALU design. There
should be three modules in your Verilog code: one_bit_comp, mux_4to2, and
tree_comp. The definition part of each module is included in the file cpu_comp.v, and
they are listed in Fig. You should finish the code in order to complete your new CPU
design.


Figure 26: Structure of 32-bit comparator

7.3 The new ALU design

After adding the 32-bit comparator to the ALU design, the new ALU will look like
Fig. 27. Note that the two-bit output {f1, f0} should be extended to a 32-bit result.
Moreover, the original 4-to-1 MUX in the ALU should be changed to a 5-to-1 MUX,
and its select signal OUTSEL should now be 3 bits wide. The ALU design has already
been modified so that you can focus on the comparator design.


Figure 27: Block diagram of the new ALU circuit

7.4 Code Implemented

//Level 5: 32 one_bit_comp go here


one_bit_comp Level5_1(A[0], B[0], f1_L5[0], f0_L5[0]);
one_bit_comp Level5_2(A[1], B[1], f1_L5[1], f0_L5[1]);
one_bit_comp Level5_3(A[2], B[2], f1_L5[2], f0_L5[2]);
one_bit_comp Level5_4(A[3], B[3], f1_L5[3], f0_L5[3]);
one_bit_comp Level5_5(A[4], B[4], f1_L5[4], f0_L5[4]);
one_bit_comp Level5_6(A[5], B[5], f1_L5[5], f0_L5[5]);
one_bit_comp Level5_7(A[6], B[6], f1_L5[6], f0_L5[6]);
one_bit_comp Level5_8(A[7], B[7], f1_L5[7], f0_L5[7]);
one_bit_comp Level5_9(A[8], B[8], f1_L5[8], f0_L5[8]);
one_bit_comp Level5_10(A[9], B[9], f1_L5[9], f0_L5[9]);
one_bit_comp Level5_32(A[10], B[10], f1_L5[10], f0_L5[10]);
one_bit_comp Level5_11(A[11], B[11], f1_L5[11], f0_L5[11]);
one_bit_comp Level5_12(A[12], B[12], f1_L5[12], f0_L5[12]);
one_bit_comp Level5_13(A[13], B[13], f1_L5[13], f0_L5[13]);
one_bit_comp Level5_14(A[14], B[14], f1_L5[14], f0_L5[14]);
one_bit_comp Level5_15(A[15], B[15], f1_L5[15], f0_L5[15]);
one_bit_comp Level5_16(A[16], B[16], f1_L5[16], f0_L5[16]);


one_bit_comp Level5_17(A[17], B[17], f1_L5[17], f0_L5[17]);


one_bit_comp Level5_18(A[18], B[18], f1_L5[18], f0_L5[18]);
one_bit_comp Level5_19(A[19], B[19], f1_L5[19], f0_L5[19]);
one_bit_comp Level5_20(A[20], B[20], f1_L5[20], f0_L5[20]);
one_bit_comp Level5_21(A[21], B[21], f1_L5[21], f0_L5[21]);
one_bit_comp Level5_22(A[22], B[22], f1_L5[22], f0_L5[22]);
one_bit_comp Level5_23(A[23], B[23], f1_L5[23], f0_L5[23]);
one_bit_comp Level5_24(A[24], B[24], f1_L5[24], f0_L5[24]);
one_bit_comp Level5_25(A[25], B[25], f1_L5[25], f0_L5[25]);
one_bit_comp Level5_26(A[26], B[26], f1_L5[26], f0_L5[26]);
one_bit_comp Level5_27(A[27], B[27], f1_L5[27], f0_L5[27]);
one_bit_comp Level5_28(A[28], B[28], f1_L5[28], f0_L5[28]);
one_bit_comp Level5_29(A[29], B[29], f1_L5[29], f0_L5[29]);
one_bit_comp Level5_30(A[30], B[30], f1_L5[30], f0_L5[30]);
one_bit_comp Level5_31(A[31], B[31], f1_L5[31], f0_L5[31]);

//Level 4: 16 mux_4to2 go here


mux_4to2 Level4_1(f1_L5[1], f0_L5[1],f1_L5[0], f0_L5[0], f1_L4[0], f0_L4[0]);
mux_4to2 Level4_2(f1_L5[3], f0_L5[3],f1_L5[2], f0_L5[2], f1_L4[1], f0_L4[1]);
mux_4to2 Level4_3(f1_L5[5], f0_L5[5],f1_L5[4], f0_L5[4], f1_L4[2], f0_L4[2]);
mux_4to2 Level4_4(f1_L5[7], f0_L5[7],f1_L5[6], f0_L5[6], f1_L4[3], f0_L4[3]);
mux_4to2 Level4_5(f1_L5[9], f0_L5[9],f1_L5[8], f0_L5[8], f1_L4[4], f0_L4[4]);
mux_4to2 Level4_6(f1_L5[11], f0_L5[11],f1_L5[10], f0_L5[10], f1_L4[5], f0_L4[5]);
mux_4to2 Level4_7(f1_L5[13], f0_L5[13],f1_L5[12], f0_L5[12], f1_L4[6], f0_L4[6]);
mux_4to2 Level4_8(f1_L5[15], f0_L5[15],f1_L5[14], f0_L5[14], f1_L4[7], f0_L4[7]);
mux_4to2 Level4_9(f1_L5[17], f0_L5[17],f1_L5[16], f0_L5[16], f1_L4[8], f0_L4[8]);
mux_4to2 Level4_10(f1_L5[19], f0_L5[19],f1_L5[18], f0_L5[18], f1_L4[9], f0_L4[9]);
mux_4to2 Level4_11(f1_L5[21], f0_L5[21],f1_L5[20], f0_L5[20], f1_L4[10], f0_L4[10]);
mux_4to2 Level4_12(f1_L5[23], f0_L5[23],f1_L5[22], f0_L5[22], f1_L4[11], f0_L4[11]);
mux_4to2 Level4_13(f1_L5[25], f0_L5[25],f1_L5[24], f0_L5[24], f1_L4[12], f0_L4[12]);
mux_4to2 Level4_14(f1_L5[27], f0_L5[27],f1_L5[26], f0_L5[26], f1_L4[13], f0_L4[13]);


mux_4to2 Level4_15(f1_L5[29], f0_L5[29],f1_L5[28], f0_L5[28], f1_L4[14], f0_L4[14]);


mux_4to2 Level4_16(f1_L5[31], f0_L5[31],f1_L5[30], f0_L5[30], f1_L4[15], f0_L4[15]);

//Level 3: 8 mux_4to2 go here


mux_4to2 Level3_1(f1_L4[1], f0_L4[1], f1_L4[0], f0_L4[0], f1_L3[0], f0_L3[0]);
mux_4to2 Level3_2(f1_L4[3], f0_L4[3], f1_L4[2], f0_L4[2], f1_L3[1], f0_L3[1]);
mux_4to2 Level3_3(f1_L4[5], f0_L4[5], f1_L4[4], f0_L4[4], f1_L3[2], f0_L3[2]);
mux_4to2 Level3_4(f1_L4[7], f0_L4[7], f1_L4[6], f0_L4[6], f1_L3[3], f0_L3[3]);
mux_4to2 Level3_5(f1_L4[9], f0_L4[9], f1_L4[8], f0_L4[8], f1_L3[4], f0_L3[4]);
mux_4to2 Level3_6(f1_L4[11], f0_L4[11], f1_L4[10], f0_L4[10], f1_L3[5], f0_L3[5]);
mux_4to2 Level3_7(f1_L4[13], f0_L4[13], f1_L4[12], f0_L4[12], f1_L3[6], f0_L3[6]);
mux_4to2 Level3_8(f1_L4[15], f0_L4[15], f1_L4[14], f0_L4[14], f1_L3[7], f0_L3[7]);

//Level 2: 4 mux_4to2 go here


mux_4to2 Level2_1(f1_L3[1], f0_L3[1], f1_L3[0], f0_L3[0], f1_L2[0], f0_L2[0]);
mux_4to2 Level2_2(f1_L3[3], f0_L3[3], f1_L3[2], f0_L3[2], f1_L2[1], f0_L2[1]);
mux_4to2 Level2_3(f1_L3[5], f0_L3[5], f1_L3[4], f0_L3[4], f1_L2[2], f0_L2[2]);
mux_4to2 Level2_4(f1_L3[7], f0_L3[7], f1_L3[6], f0_L3[6], f1_L2[3], f0_L2[3]);

//Level 1: 2 mux_4to2 go here


mux_4to2 Level1_1(f1_L2[1], f0_L2[1], f1_L2[0], f0_L2[0], f1_L1[0], f0_L1[0]);
mux_4to2 Level1_2(f1_L2[3], f0_L2[3], f1_L2[2], f0_L2[2], f1_L1[1], f0_L1[1]);

//Level 0: 1 mux_4to2 goes here


mux_4to2 Level0_1(f1_L1[1], f0_L1[1], f1_L1[0], f0_L1[0], f1, f0);

//mux to select the f1 f0 outputs
module mux_4to2(hi_f1, hi_f0, lo_f1, lo_f0, f1, f0);
input hi_f1, hi_f0, lo_f1, lo_f0;
output f1, f0;
//use hi_f1 to select the correct outputs
assign f1 = hi_f1 & lo_f1;
assign f0 = (hi_f1 & lo_f0) | ((~hi_f1) & hi_f0);
endmodule

//one-bit comparator
module one_bit_comp(a, b, f1, f0);
input a;
input b;
output f1, f0;
assign f1 = a ~^ b; //f1 = 1 when the two bits are equal
assign f0 = a;      //when the bits differ, a = 1 means a > b
endmodule

By reducing f1 and f0 with a Karnaugh map we obtain the equations

f1 = (hi_f1)*(lo_f1)
f0 = ((hi_f1)*(lo_f0)) + ((~hi_f1)*(hi_f0))
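
As a sanity check on the reduction, the Boolean equations used in the assign statements of mux_4to2 can be compared exhaustively against the intended selection behaviour over all 16 input combinations. The Python sketch below is illustrative only; the two function names are hypothetical.

```python
from itertools import product


def mux_select(hi_f1, hi_f0, lo_f1, lo_f0):
    """Intended behaviour: pass the high pair if hi_f1 = 0, else the low pair."""
    return (hi_f1, hi_f0) if hi_f1 == 0 else (lo_f1, lo_f0)


def mux_equations(hi_f1, hi_f0, lo_f1, lo_f0):
    """Reduced equations, as written in the assign statements of mux_4to2."""
    f1 = hi_f1 & lo_f1
    f0 = (hi_f1 & lo_f0) | ((1 - hi_f1) & hi_f0)
    return f1, f0


# Collect every input combination where the two descriptions disagree.
mismatches = [
    bits for bits in product((0, 1), repeat=4)
    if mux_select(*bits) != mux_equations(*bits)
]
print("mismatching input combinations:", mismatches)
```

An empty list means the reduced equations reproduce the selection rule for every input, which is what the Karnaugh map reduction is expected to guarantee.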

A new test bench, tb_cpu, was created for the values and functions given below.

Figure 28: New Test bench values


After running the newly created test bench in the RTL simulator, the test bench completes
successfully and the output is error free.

Figure 29: Simulation result for New Testbench


Figure 30: Simvision Output for New Testbench


Figure 31: Power Values after simulation


Figure 32: All Ports Matched Report


Figure 33: P & R simvision output

Figure 34: Final Routed Chip Design


CHAPTER 8: RESULT
After simulating the conditions given in the test bench, we obtained the following results in
SimVision.
Location   Operation                    f1   f0   Expected
[8][10]    0000_0001  5555_5555  CMP              True
[4][2]     0000_0001  5555_5555  CMP              True
[3][7]     0000_000A  0000_012C  CMP              True

Table 2: Final results


The obtained results match the expected results defined in Table 1, and hence the code is
working properly.
