
Module 4_1The Processor

The document provides an overview of the central processing unit (CPU) and its instruction execution model, detailing how the program counter (PC) manages instruction flow and the roles of various components such as the arithmetic/logic unit (ALU) and register file. It explains the implementation of the RISC-V architecture, including the data path for different instruction types, control signals, and the significance of multiplexors in managing data flow. Additionally, it covers logic design conventions and the structure of instruction formats used in RISC-V, emphasizing the operations of load, store, and arithmetic-logical instructions.

Uploaded by

mrtbsekati
Copyright
© All Rights Reserved

Module 4

• Introduction

The central processing unit (CPU), or simply processor, is the engine
that interprets (or executes) instructions stored in main memory.
At its core is a word-size storage device (or register) called the
program counter (PC). At any point in time, the PC points at (contains
the address of) some machine-language instruction in main memory.
From the time that power is applied to the system until the time that
the power is shut off, a processor repeatedly executes the instruction
pointed at by the program counter and updates the program counter
to point to the next instruction.
A processor appears to operate according to a very simple instruction
execution model, defined by its instruction set architecture. In this
model, instructions execute in strict sequence.
Module 4: The Processor
• Introduction
Executing a single instruction involves performing a series of
steps. The processor reads the instruction from memory pointed
at by the program counter (PC), interprets the bits in the
instruction, performs some simple operation dictated by the
instruction, and then updates the PC to point to the next
instruction, which may or may not be contiguous in memory to
the instruction that was just executed.
There are only a few of these simple operations, and they revolve
around main memory, the register file, and the arithmetic/logic
unit (ALU). The register file is a small storage device that
consists of a collection of word-size registers, each with its own
unique name. The ALU computes new data and address values.
The Processor
Here are some examples of the simple operations that the CPU might carry out at
the request of an instruction:
• Load: Copy a byte or a word from main memory into a register, overwriting the
previous contents of the register.
• Store: Copy a byte or a word from a register to a location in main memory,
overwriting the previous contents of that location.
• Operate: Copy the contents of two registers to the ALU, perform an arithmetic
operation on the two words, and store the result in a register, overwriting the
previous contents of that register.
• Jump: Extract a word from the instruction itself and copy that word into the
program counter (PC), overwriting the previous value of the PC.
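The four operations above can be sketched as a tiny interpreter. This is only an illustration: the tuple encoding, the register names, and the run function are hypothetical and are not RISC-V. The loop fetches the instruction the PC points at, interprets it, performs the operation, and updates the PC.

```python
# Toy model of load, store, operate, and jump; the encoding is hypothetical.
def run(program, memory, registers, max_steps=100):
    """Execute toy instructions until the PC leaves the program."""
    pc = 0
    steps = 0
    while 0 <= pc < len(program) and steps < max_steps:
        op, *args = program[pc]      # fetch and interpret the instruction
        pc += 1                      # default: next sequential instruction
        if op == "load":             # register <- memory[address]
            reg, addr = args
            registers[reg] = memory[addr]
        elif op == "store":          # memory[address] <- register
            reg, addr = args
            memory[addr] = registers[reg]
        elif op == "add":            # operate: rd <- rs1 + rs2 (the ALU's job)
            rd, rs1, rs2 = args
            registers[rd] = registers[rs1] + registers[rs2]
        elif op == "jump":           # PC <- word taken from the instruction
            (target,) = args
            pc = target
        else:
            raise ValueError(f"unknown operation: {op}")
        steps += 1
    return registers, memory


memory = {0: 5, 1: 7, 2: 0}
registers = {"r1": 0, "r2": 0, "r3": 0}
program = [
    ("load", "r1", 0),
    ("load", "r2", 1),
    ("add", "r3", "r1", "r2"),
    ("jump", 5),               # the next instruction is not contiguous
    ("store", "r1", 2),        # skipped by the jump
    ("store", "r3", 2),
]
run(program, memory, registers)
print(memory[2])  # 12
```

The jump shows why the next instruction "may or may not be contiguous in memory": the store at index 4 is never executed.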
We say that a processor appears to be a simple implementation of its instruction
set architecture, but in fact modern processors use far more complex mechanisms
to speed up program execution. Thus, we can distinguish the processor’s
instruction set architecture, describing the effect of each machine-code instruction,
from its microarchitecture, describing how the processor is actually implemented.
Module 4: Implementation of a RISC-V Subset
• Figure 4.1.1: An abstract view of the implementation of the RISC-V subset
showing the major functional units and the major connections between them
(COD Figure 4.1).
• All instructions start by using the program counter to supply the instruction address to
the instruction memory. After the instruction is fetched, the register operands used by
an instruction are specified by fields of that instruction.
• Once the register operands have been fetched, they can be operated on to compute a
memory address (for a load or store), to compute an arithmetic result (for an integer
arithmetic-logical instruction), or to perform an equality check (for a branch).
• If the instruction is an arithmetic-logical instruction, the result from the ALU must be written
to a register. If the operation is a load or store, the ALU result is used as an address to
either load a value from memory into the registers or store a value from the registers into memory.
• The result from the ALU or memory is written back into the register file. Branches require
the use of the ALU output to determine the next instruction address, which comes either
from the adder (where the PC and branch offset are summed) or from an adder that
increments the current PC by four.
• The thick lines interconnecting the functional units represent buses, which consist of
multiple signals. The arrows are used to guide the reader in knowing how information flows.
Since signal lines may cross, we explicitly show when crossing lines are connected by the
presence of a dot where the lines cross.
Fig 4.1.2 Basic implementation of the RISC-V subset with multiplexors
The top multiplexor ("Mux") controls what value replaces the PC (PC + 4
or the branch destination address); the multiplexor is controlled by the gate
that "ANDs" together the Zero output of the ALU and a control signal that
indicates that the instruction is a branch.
The middle multiplexor, whose output returns to the register file, is used
to steer the output of the ALU (in the case of an arithmetic-logical
instruction) or the output of the data memory (in the case of a load) for
writing into the register file.
Finally, the bottom-most multiplexor is used to determine whether the
second ALU input is from the registers (for an arithmetic-logical instruction
or a branch) or from the offset field of the instruction (for a load or store).
The added control lines are straightforward and determine the operation
performed at the ALU, whether the data memory should read or write, and
whether the registers should perform a write operation. The control lines
are shown in color to make them easier to see.
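The three multiplexors described above can be modelled as plain selection functions. This is a minimal sketch, not the figure's actual logic: the function names and the select-signal names (branch, mem_to_reg, alu_src) are assumptions chosen for illustration.

```python
def mux(select, input0, input1):
    """Two-way multiplexor: pass input1 when select is asserted, else input0."""
    return input1 if select else input0


def pc_mux(pc_plus_4, branch_target, branch, zero):
    # Top mux: the select line is the AND of the branch control signal and Zero.
    return mux(branch and zero, pc_plus_4, branch_target)


def writeback_mux(alu_result, memory_data, mem_to_reg):
    # Middle mux: ALU result (arithmetic-logical) or data memory output (load).
    return mux(mem_to_reg, alu_result, memory_data)


def alu_input_mux(register_value, offset, alu_src):
    # Bottom mux: register (arithmetic-logical or branch) or offset (load/store).
    return mux(alu_src, register_value, offset)


print(pc_mux(104, 200, branch=True, zero=True))    # 200
print(pc_mux(104, 200, branch=True, zero=False))   # 104
```

Note that the PC only takes the branch destination when both conditions hold: the instruction is a branch and the ALU reports Zero.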
4.2 Logic Design Conventions
• Combinational element
Combinational element: An operational element, such as an
AND gate or an ALU.
• State element
State element: A memory element, such as a register or a
memory.
• Clocking methodology
Clocking methodology : The approach used to determine
when data are valid and stable relative to the clock.
• Edge-triggered clocking
Edge-triggered clocking: A clocking scheme in which all
state changes occur on a clock edge.
• Control signal
Control signal: A signal used for multiplexor selection or
for directing the operation of a functional unit; contrasts
with a data signal, which contains information that is
operated on by a functional unit.
• Asserted
Asserted: The signal is logically high or true.
• Deasserted
Deasserted: The signal is logically low or false.
4.3 Building a Data Path
• Datapath element
Datapath element: A unit used to operate on or hold data
within a processor. In the RISC-V implementation, the
datapath elements include the instruction and data
memories, the register file, the ALU, and adders.
• Program counter / PC
Program counter (PC): The register containing the
address of the next instruction in the program being
executed.
• Figure 4.3.1: Two state elements are needed to store
and access instructions, and an adder is needed to
compute the next instruction address
• The two state elements are the instruction memory and the
program counter.
• The instruction memory need only provide read access because
the datapath does not write instructions. Since the instruction
memory is only read, we treat it as combinational logic: the output at
any time reflects the contents of the location specified by the
address input, and no read control signal is needed. (We will need to
write the instruction memory when we load the program; this is not
hard to add, and we ignore it for simplicity.)
• The program counter is a 32-bit register that is written at the end
of every clock cycle and thus does not need a write control signal.
• The adder is an ALU wired to always add its two 32-bit inputs and
place the sum on its output.
Fig 4.3.1 Building a Data Path
• Register file
Register file: A state element that consists of a set of registers that can
be read and written by supplying a register number to be accessed.

4.3.3: The two elements needed to implement R-format ALU
operations are the register file and the ALU (COD Figure 4.7).

Remember the Instruction formats (see link on Moodle):


Sign-extend
• Sign-extend: To increase the size of a data item by replicating the high-order sign bit of the
original data item in the high-order bits of the larger, destination data item.
• Figure 4.3.2: The two units needed to implement loads and stores, in addition to the
register file and ALU of (COD Figure 4.7), are the data memory unit and the immediate
generation unit (COD Figure 4.8).
• The memory unit is a state element with inputs for the address and the write data, and a
single output for the read result.
• There are separate read and write controls, although only one of these may be asserted on
any given clock.
• The memory unit needs a read signal, since, unlike the register file, reading the value of an
invalid address can cause problems, as we will see in COD Chapter 5 (Large and Fast:
Exploiting Memory Hierarchy).
• The immediate generation unit (ImmGen) has a 32-bit instruction as input that selects a
12-bit field for load, store, and branch if equal that is sign-extended into a 32-bit result
appearing on the output (see COD Chapter 2 (Instructions: Language of the Computer)).
• We assume the data memory is edge-triggered for writes. Standard memory chips actually
have a write enable signal that is used for writes. Although the write enable is not edge-
triggered, our edge-triggered design could easily be adapted to work with real memory
chips. See COD Section A.8 (Memory elements: Flip-flops, latches, and registers) of
Appendix A for further discussion of how real memory chips work.
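Sign extension, as performed on the 12-bit field by the immediate generation unit, can be sketched as follows. The function name and its default widths are assumptions for illustration; the result is returned as a 32-bit pattern.

```python
def sign_extend(value, bits=12, width=32):
    """Sign-extend the low `bits` bits of value to a `width`-bit pattern."""
    value &= (1 << bits) - 1                 # keep only the source field
    if value & (1 << (bits - 1)):            # sign bit set: replicate it upward
        value |= ((1 << width) - 1) & ~((1 << bits) - 1)
    return value


print(hex(sign_extend(0x008)))  # 0x8
print(hex(sign_extend(0xFFC)))  # 0xfffffffc
```

The field 0xFFC has its high-order bit set, so the upper 20 bits of the result are filled with ones; the field 0x008 does not, so they are filled with zeros.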
Fig 4.3.2

• Branch target address
Branch target address: The address specified in a branch, which
becomes the new program counter (PC) if the branch is taken. In
the RISC-V architecture, the branch target is given by the sum of
the offset field of the instruction and the address of the branch.
• Branch taken
Branch taken: A branch where the branch condition is satisfied and
the program counter (PC) becomes the branch target. All
unconditional branches are taken branches.
• Branch not taken / untaken branch
Branch not taken (or untaken branch): A branch where the branch
condition is false and the program counter (PC) becomes the
address of the instruction that sequentially follows the branch.
Figure 4.3.3: The portion of a datapath for a branch uses the ALU to evaluate the
branch condition and a separate adder to compute the branch target as the sum
of the PC and immediate (the branch displacement)(COD Figure 4.9).

Control logic is used to decide whether the incremented PC or branch target should replace
the PC, based on the Zero output of the ALU.
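The branch portion of the datapath can be sketched in a few lines. This assumes the ALU evaluates the condition by subtracting the two register values and raising Zero when the difference is zero, and that the offset passed in is already a signed byte displacement; the function name and the 32-bit mask are illustrative choices.

```python
MASK32 = 0xFFFFFFFF


def beq_next_pc(pc, rs1_value, rs2_value, offset):
    """Return the next PC for branch-if-equal; offset is a signed byte displacement."""
    zero = ((rs1_value - rs2_value) & MASK32) == 0   # ALU subtracts; Zero flags equality
    branch_target = (pc + offset) & MASK32           # separate adder: PC + displacement
    incremented_pc = (pc + 4) & MASK32               # adder that increments the PC by four
    return branch_target if zero else incremented_pc


print(beq_next_pc(0x100, 7, 7, 16))   # 272 (branch taken)
print(beq_next_pc(0x100, 7, 9, 16))   # 260 (branch not taken)
```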
Example 4.3.1: Building a datapath.

• The datapaths for the arithmetic-logical (or R-type) instructions and the memory
instructions are quite similar. The key differences are the following:
• The arithmetic-logical instructions use the ALU, with the inputs coming from the two
registers. The memory instructions can also use the ALU to do the address calculation,
although the second input is the sign-extended 12-bit offset field from the instruction.
• The value stored into a destination register comes from the ALU (for an R-type
instruction) or the memory (for a load).
• Show how to build a datapath for the operational portion of the memory-reference and
arithmetic-logical instructions that uses a single register file and a single ALU to
handle both types of instructions, adding any necessary multiplexors.
• Answer
• To create a datapath with only a single register file and a single ALU, we must support
two different sources for the second ALU input, as well as two different sources for the
data stored into the register file. Thus, one multiplexor is placed at the ALU input and
another at the data input to the register file. The figure below shows the operational
portion of the combined datapath.
Fig. 4.3.4 Data Path for memory instructions and R-type instructions
• Figure 4.3.4: The datapath for the memory instructions
and the R-type instructions (COD Figure 4.10).
• This example shows how a single datapath can be
assembled from the pieces in COD Figures 4.7 (The two
elements needed to implement R-format ALU operations
…) and 4.8 (The two units needed to implement loads
and stores …) by adding multiplexors. Two multiplexors
are needed, as described in the example.
Figure 4.3.5: The simple datapath for the core RISC-V architecture
combines the elements required by different instruction classes
(COD Figure 4.11).
Remarks on Fig 4.3.5
1. The two units needed to implement R-format ALU operations: the
register file and the ALU.
2. The four units needed to implement loads and stores: the register file,
the ALU, the data memory unit, and the immediate generation unit.
Wires and multiplexors are also needed.
3. The five units needed for a branch. The ALU evaluates
the branch condition, and the adder computes the branch target
address.
4. The remaining datapath portion fetches instructions and
increments the program counter.
Nearly the full RISC-V datapath is now shown.
4.4 A Simple Implementation
Scheme of a RISC-V Subset
• Uses the datapath of Section 4.3 and adds a simple
control function
• The simple implementation covers
• load word (lw)
• store word (sw)
• branch if equal (beq)
• the arithmetic-logical instructions add, sub, and, or

• The ALU Control Figure 4.4.1


4.4 A Simple Implementation Scheme of a RISC-V Subset: Explanation
Recall: RISC-V Instruction Formats
RISC-V Instruction Formats: Explanation
• Figure 4.4.3: The four instruction classes (arithmetic, load, store, and conditional
branch) use four different instruction formats (COD Figure 4.14).
• (a) Instruction format for R-type arithmetic instructions (opcode = 51 in decimal), which have
three register operands: rs1, rs2, and rd. Fields rs1 and rs2 are sources, and rd is the
destination. The ALU function is in the funct3 and funct7 fields and is decoded by the
ALU control design in the previous section. The R-type instructions that we implement
are add, sub, and, and or.
• (b) Instruction format for I-type load instructions (opcode = 3 in decimal). The register rs1 is
the base register that is added to the 12-bit immediate field to form the memory
address. Field rd is the destination register for the loaded value.
• (c) Instruction format for S-type store instructions (opcode = 35 in decimal). The register rs1 is
the base register that is added to the 12-bit immediate field to form the memory
address. (The immediate field is split into a 7-bit piece and a 5-bit piece.) Field rs2 is
the source register whose value should be stored into memory.
• (d) Instruction format for SB-type conditional branch instructions (opcode = 99 in decimal). The
registers rs1 and rs2 are compared. The 12-bit immediate address field is sign-extended,
shifted left 1 bit, and added to the PC to compute the branch target address. Figures
4.17 and 4.18 give the rationale for the unusual bit ordering for SB-type.
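The four formats can be made concrete with a small field decoder. This is a sketch under assumptions: the bit positions follow the standard RISC-V base encoding rather than anything stated in these slides, and the helper names (bits, to_signed, decode) are hypothetical. For branches, the returned immediate is the byte offset, that is, the 12-bit field after sign extension and the 1-bit left shift.

```python
def bits(word, high, low):
    """Extract word[high:low] (inclusive) as an unsigned integer."""
    return (word >> low) & ((1 << (high - low + 1)) - 1)


def to_signed(value, width):
    """Interpret a width-bit pattern as a two's-complement integer."""
    return value - (1 << width) if value & (1 << (width - 1)) else value


def decode(word):
    """Split a 32-bit instruction into the fields of its format."""
    opcode = bits(word, 6, 0)
    fields = {"opcode": opcode, "rs1": bits(word, 19, 15), "funct3": bits(word, 14, 12)}
    if opcode == 51:      # R-type: three register operands
        fields.update(rd=bits(word, 11, 7), rs2=bits(word, 24, 20),
                      funct7=bits(word, 31, 25))
    elif opcode == 3:     # I-type load: 12-bit immediate in the top bits
        fields.update(rd=bits(word, 11, 7), imm=to_signed(bits(word, 31, 20), 12))
    elif opcode == 35:    # S-type store: immediate split into 7-bit and 5-bit pieces
        imm = (bits(word, 31, 25) << 5) | bits(word, 11, 7)
        fields.update(rs2=bits(word, 24, 20), imm=to_signed(imm, 12))
    elif opcode == 99:    # SB-type branch: reassemble the reordered 12-bit field
        imm12 = ((bits(word, 31, 31) << 11) | (bits(word, 7, 7) << 10)
                 | (bits(word, 30, 25) << 4) | bits(word, 11, 8))
        # sign-extend, then shift left 1 bit to obtain the byte offset
        fields.update(rs2=bits(word, 24, 20), imm=to_signed(imm12, 12) << 1)
    else:
        raise ValueError(f"opcode {opcode} is outside the subset")
    return fields


print(decode(0x002081B3))         # add x3, x1, x2
print(decode(0x00832283)["imm"])  # lw x5, 8(x6)  -> 8
print(decode(0xFE000EE3)["imm"])  # beq x0, x0, -4 -> -4
```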
Figure 4.4.5: The actual RISC-V formats.
The R-, I-, S-, and U-types are straightforward.

• Actual RISCV formats


The data path in operation for an R-type instruction
Figure 4.4.12: The datapath in operation for a load
instruction

• The control lines, datapath units, and connections that
are active are highlighted.
• A store instruction would operate very similarly.
• The main difference would be that the memory control
would indicate a write rather than a read, the second
register value read would be used for the data to store,
and the operation of writing the data memory value to
the register file would not occur.
• See figure below
Figure 4.4.13: The datapath in operation for a branch-if-
equal instruction

• The control lines, datapath units, and connections that
are active are highlighted. After using the register file
and ALU to perform the compare, the Zero output is
used to select the next program counter from between
the two candidates.
• See fig below
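The differences between a load, a store, and a branch described above can be summarised as control-signal settings. The table below is a sketch: the signal names (ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch) and their values are assumed from the usual textbook single-cycle design, not taken from these slides, and None marks a "don't care" value.

```python
# Assumed control settings per instruction class; None marks a "don't care".
CONTROL = {
    "R-type": dict(ALUSrc=0, MemtoReg=0,    RegWrite=1, MemRead=0, MemWrite=0, Branch=0),
    "lw":     dict(ALUSrc=1, MemtoReg=1,    RegWrite=1, MemRead=1, MemWrite=0, Branch=0),
    "sw":     dict(ALUSrc=1, MemtoReg=None, RegWrite=0, MemRead=0, MemWrite=1, Branch=0),
    "beq":    dict(ALUSrc=0, MemtoReg=None, RegWrite=0, MemRead=0, MemWrite=0, Branch=1),
}


def differences(a, b):
    """Names of the control signals whose settings differ between two classes."""
    return sorted(k for k in CONTROL[a] if CONTROL[a][k] != CONTROL[b][k])


print(differences("lw", "sw"))  # ['MemRead', 'MemWrite', 'MemtoReg', 'RegWrite']
```

Comparing lw with sw shows the point made for stores: the memory control indicates a write rather than a read, and the register file is not written.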
4.6 Multicycle datapath
• Figure 4.5.1: The high-level view of the multicycle
datapath (COD Figure e4.5.1).
• This picture shows the key elements of the datapath: a
shared memory unit, a single ALU shared among
instructions, and the connections among these shared
units. The use of shared functional units requires the
addition or widening of multiplexors as well as new
temporary registers that hold data between clock cycles
of the same instruction. The additional registers are the
Instruction register (IR), the Memory data register
(MDR), A, B, and ALUOut.
Figure 4.5.2: Multicycle datapath for
RISC–V handles the basic
instructions
• Although this datapath supports normal incrementing of
the PC, a few more connections and a multiplexor will
be needed for branches and jumps; we will add these
shortly.
• The additions versus the single-clock datapath include
several registers (IR, MDR, A, B, ALUOut), a multiplexor
for the memory address, a multiplexor for the top ALU
input, and expanding the multiplexor on the bottom ALU
input into a four-way selector. These small additions
allow us to remove two adders and a memory unit.
Figure 4.5.3: The multicycle datapath from
COD Figure e4.5.2 with the control lines
• The signals ALUOp and ALUSrcB are 2-bit control signals,
while all the other control lines are 1-bit signals. Neither
register A nor B requires a write signal, since their contents
are only read on the cycle immediately after they are written. The
memory data register has been added to hold the data from a
load when the data returns from memory. Data from a load
returning from memory cannot be written directly into the
register file since the clock cycle cannot accommodate the
time required for both the memory access and the register
file write. The MemRead signal has been moved to the top of
the memory unit to simplify the figures. The full set of
datapaths and control lines for branches will be added shortly.
Figure 4.5.4: The complete datapath for the
multicycle implementation together with the
necessary control lines
• The control lines of COD Figure e4.5.3 are attached to
the control unit, and the control and datapath elements
needed to effect changes to the PC are included.
• The major additions from Figure e4.5.3 include the
multiplexor used to select the source of a new PC value;
gates used to combine the PC write signals; and the
control signals PCSource, PCWrite, and PCWriteCond.
The PCWriteCond signal is used to decide whether a
conditional branch should be taken.
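How the gates might combine the PC write signals can be sketched as below. The exact gate arrangement is an assumption (an unconditional write ORed with a conditional write that is ANDed with the ALU's Zero output), and the function names and the list of candidate PC values are hypothetical.

```python
def pc_write_enable(pc_write, pc_write_cond, zero):
    """Assumed gating: write on PCWrite, or on PCWriteCond when Zero is asserted."""
    return bool(pc_write or (pc_write_cond and zero))


def next_pc(pc, candidates, pc_source, pc_write, pc_write_cond, zero):
    # PCSource selects among the candidate values; the PC holds when not enabled.
    if pc_write_enable(pc_write, pc_write_cond, zero):
        return candidates[pc_source]
    return pc


candidates = [0x104, 0x140]   # illustrative: incremented PC, branch target
print(hex(next_pc(0x100, candidates, 1, False, True, True)))    # 0x140
print(hex(next_pc(0x100, candidates, 1, False, True, False)))   # 0x100
```

With PCWriteCond asserted, the branch target only reaches the PC when Zero is also asserted; otherwise the PC keeps its current value for that cycle.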
