Unit 2-Basic Processing Unit

The CPU fetches instructions from memory one at a time and executes them by performing more basic operations. It keeps track of the next instruction address using the program counter. Executing an instruction involves operations like register transfers to move data, ALU operations, loading data from memory, and storing data to memory. The CPU contains components like the ALU, registers, program counter, and instruction register to perform these operations according to the instructions.


Basic Processing Unit

UNIT II
Overview
 Instruction Set Processor (ISP) or Central Processing
Unit (CPU) – the unit which executes instructions and
coordinates the activities of the other units
 A typical computing task consists of a series of steps
specified by a sequence of machine instructions that
constitute a program.
 An instruction is executed by carrying out a sequence
of more fundamental operations.
Some Fundamental
Concepts
Fundamental Concepts
 The processor fetches one instruction at a time and performs the operation specified.
 Instructions are fetched from successive memory
locations until a branch or a jump instruction is
encountered.
 Processor keeps track of the address of the memory
location containing the next instruction to be fetched
using Program Counter (PC).
 Instruction Register (IR) – holds the instruction fetched
from memory
Executing an Instruction
 Fetch the contents of the memory location pointed to by
the PC. The contents of this location are loaded into the
IR (fetch phase).
IR ← [[PC]]
 Assuming that an instruction is 4 bytes long, increment the contents of the PC by 4 to point to the next instruction (fetch phase).
PC ← [PC] + 4
 Carry out the actions specified by the instruction in the IR
(execution phase).
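
As an illustration only, here is a minimal Python sketch of the fetch phase described above. The memory contents, register names, and the 4-byte instruction size are assumptions for the example, not part of any real machine.

```python
# Hypothetical sketch of the fetch phase: IR <- [[PC]], PC <- [PC] + 4.
# 'memory' maps byte addresses of 4-byte instructions to instruction words.

memory = {0: "Add (R3), R1", 4: "Move (R1), R2", 8: "Branch LOOP"}

PC = 0          # program counter: address of the next instruction
IR = None       # instruction register

def fetch():
    """Load the word addressed by PC into IR, then advance PC by 4."""
    global PC, IR
    IR = memory[PC]   # IR <- [[PC]]
    PC = PC + 4       # PC <- [PC] + 4 (assuming 4-byte instructions)

fetch()
print(IR, PC)   # -> Add (R3), R1   4
```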
Processor Organization
Internal organization of the
processor
 ALU – used to perform arithmetic and logic operations.
 Registers for temporary storage.
 Various digital circuits for executing different micro-operations (gates, MUX, decoders, counters).
 Internal paths for movement of data between the ALU and registers.
 Driver circuits for transmitting signals to external units.
 Receiver circuits for incoming signals from external units.
 PC:
 Keeps track of execution of a program
 Contains the memory address of the next instruction to be fetched and
executed.

 MAR:
 Holds the address of the memory location to be accessed.
 The input of MAR is connected to the internal bus and its output to the external memory bus.

 MDR:
 Contains data to be written into or read out of the addressed location.
 It has two inputs and two outputs.
 Data can be loaded into MDR either from the memory bus or from the internal processor bus.

 The data and address lines are connected to the internal bus via MDR
and MAR
Registers:
 The number and use of the processor registers R0 to Rn-1 vary considerably from one processor to another.
 Some registers are general-purpose, available for use by the programmer.
 Others are special-purpose registers, such as index and stack registers.
 Registers Y, Z, and TEMP are temporary registers used by the processor during the execution of some instructions.
Multiplexer:
 Select either the output of the register Y or a constant value 4 to be
provided as input A of the ALU.
 Constant 4 is used by the processor to increment the contents of PC.
Data Path:
 The registers, ALU and interconnecting bus are collectively
referred to as the data path.
Executing an Instruction
 Execution of an instruction involves one or more of the
following operations:

 Register Transfers: Transfer a word of data from one processor


register to another or to the ALU.

 ALU operations: Perform an arithmetic or a logic operation


and store the result in a processor register.

 Load operation: Fetch the contents of a given memory location


and load them into a processor register.
 Store operation: Store a word of data from a processor register
into a given memory location.
Register Transfers

[Figure 7.2: Input and output gating for the registers in Figure 7.1 – each register Ri connects to the internal processor bus through gates controlled by Riin and Riout; register Y (gated by Yin) and the constant 4 feed the Select MUX, whose output is ALU input A, the bus supplies input B, and the ALU result is gated into register Z by Zin and back onto the bus by Zout.]
 The input and output gates for register Ri are controlled by signals Riin and Riout.
 When Riin is set to 1, the data available on the internal bus are loaded into Ri.
 When Riout is set to 1, the contents of register Ri are placed on the bus.
Data transfer between two registers
EX: Transfer the contents of R1 to R4.
1. Enable the output of register R1 by setting R1out = 1. This places the contents of R1 on the processor bus.
2. Enable the input of register R4 by setting R4in = 1. This loads the data from the processor bus into register R4.
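
The gating just described can be mimicked in a few lines of Python. This is only an illustrative sketch: the register contents and the single `bus` variable are modelling assumptions, not the actual hardware.

```python
# Sketch of a single-bus register transfer R1 -> R4 using Riout / Riin gating.
registers = {"R1": 25, "R4": 0}
bus = None

def gate_out(reg):        # Riout = 1: place the register contents on the bus
    global bus
    bus = registers[reg]

def gate_in(reg):         # Riin = 1: load the bus contents into the register
    registers[reg] = bus

gate_out("R1")            # R1out = 1
gate_in("R4")             # R4in = 1
print(registers["R4"])    # -> 25
```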
Performing an Arithmetic or
Logic Operation
 The ALU is a combinational circuit that has no internal
storage.
 The ALU gets its two operands from the MUX (input A) and from the bus (input B). The result is temporarily stored in register Z.
 What is the sequence of operations to add the contents of
register R1 to those of R2 and store the result in R3? i.e.
Add R1, R2, R3
1. R1out, Yin

2. R2out, SelectY, Add, Zin

3. Zout, R3in
 Step 1: The output of register R1 and the input of register Y are enabled, causing the contents of R1 to be transferred over the bus to Y.

 Step 2: The multiplexer's Select signal is set to SelectY, gating the contents of Y to input A of the ALU. At the same time, R2out places the contents of R2 on the bus (input B), the ALU performs the addition, and Zin loads the result into register Z.

 Step 3: The contents of Z are transferred to the destination register R3 (Zout, R3in).
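
Continuing the same toy model, the three control steps for Add R1, R2, R3 can be traced step by step. Again this is only a sketch, with Y, Z, the bus, and the MUX modelled as plain variables.

```python
# Sketch of the three control steps for Add R1, R2, R3 on the single-bus datapath.
registers = {"R1": 10, "R2": 32, "R3": 0}
Y = Z = bus = None

# Step 1: R1out, Yin        -- copy R1 over the bus into Y
bus = registers["R1"]
Y = bus

# Step 2: R2out, SelectY, Add, Zin -- ALU adds Y (via MUX, input A) and the bus (input B)
bus = registers["R2"]
mux_a = Y                  # SelectY: the MUX gates Y to ALU input A
Z = mux_a + bus            # Add, Zin: the result is latched into Z

# Step 3: Zout, R3in        -- copy Z over the bus into R3
bus = Z
registers["R3"] = bus

print(registers["R3"])     # -> 42
```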
Load Operation - Fetching a Word
from Memory
 Address into MAR; issue Read operation; data into MDR.
[Figure 7.4: Connection and control signals for register MDR – MDR is connected to the memory-bus data lines through gates controlled by MDRoutE and MDRinE, and to the internal processor bus through gates controlled by MDRout and MDRin.]
Fetching a Word from Memory
 The response time of each memory access varies (cache miss)
 To accommodate this, the processor waits until it receives an
indication that the requested operation has been completed
(Memory-Function-Completed, MFC).
 Move (R1), R2
 MAR ← [R1]; start a Read operation on the memory bus: R1out, MARin, Read
 Wait for the MFC response from the memory and load MDR from the memory bus: MDRinE, WMFC
 R2 ← [MDR]: MDRout, R2in

[Figure 7.5: Timing of a memory Read operation – in step 1, MARin loads the address (MAR ← [R1]); MAR is assumed to be always available on the address lines of the memory bus. The Read signal then starts a Read operation on the memory bus and MDRinE is asserted. The processor waits for the MFC response from the memory, after which the data are in MDR and MDRout transfers them (R2 ← [MDR]).]


Storing a word in memory
 Move R2,(R1)

1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
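
Below is a sketch of the Move (R1), R2 load and the Move R2, (R1) store from the two sequences above. The `Memory` class and its immediate response stand in for the memory and the MFC handshake; they are illustrative assumptions only.

```python
# Sketch of memory Read/Write with a Wait-for-Memory-Function-Completed (WMFC) step.
class Memory:
    def __init__(self):
        self.cells = {100: 7}

    def read(self, addr):          # returning models "MFC received, data valid"
        return self.cells[addr]

    def write(self, addr, data):
        self.cells[addr] = data

mem = Memory()
registers = {"R1": 100, "R2": 0}
MAR = MDR = None

# Move (R1), R2  --  load
MAR = registers["R1"]              # R1out, MARin, Read
MDR = mem.read(MAR)                # MDRinE; WMFC: wait for MFC, then MDR holds the data
registers["R2"] = MDR              # MDRout, R2in

# Move R2, (R1)  --  store
MAR = registers["R1"]              # R1out, MARin
MDR = registers["R2"]              # R2out, MDRin, Write
mem.write(MAR, MDR)                # MDRoutE; WMFC

print(registers["R2"], mem.cells[100])   # -> 7 7
```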
Execution of a Complete
Instruction
 Add (R3), R1
 Fetch the instruction
 Fetch the first operand (the contents of the
memory location pointed to by R3)
 Perform the addition
 Store the result into R1
Execution of a Complete Instruction
Add (R3), R1

Step   Action
1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      R3out, MARin, Read
5      R1out, Yin, WMFC
6      MDRout, SelectY, Add, Zin
7      Zout, R1in, End

Figure 7.6. Control sequence for execution of the instruction Add (R3), R1.

[Figure 7.1: Single-bus organization of the datapath inside a processor – the internal processor bus connects the PC, MAR, MDR, IR, the general-purpose registers R0 to Rn-1, and the temporary registers Y, Z, and TEMP; the instruction decoder and control logic generate the control signals; a MUX selects either register Y or the constant 4 as ALU input A, and MAR/MDR connect to the external memory-bus address and data lines.]


Execution of Branch Instructions
 A branch instruction replaces the contents of PC
with the branch target address, which is usually
obtained by adding an offset X given in the
branch instruction.
 The offset X is usually the difference between the
branch target address and the address immediately
following the branch instruction.
 Unconditional branch – Jump instructions
Execution of unconditional
Branch Instruction

Step   Action

1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      Offset-field-of-IRout, SelectY, Add, Zin
5      Zout, PCin, End

Figure 7.7(a). Control sequence for an unconditional branch instruction.


Execution of conditional Branch
Instruction
Step   Action

1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      Offset-field-of-IRout, SelectY, Add, Zin, If N=0 then End
5      Zout, PCin, End

Figure 7.7(b). Control sequence for a conditional branch instruction.

If N=1, the branch is taken; the next instruction is fetched from the branch target.
If N=0, the branch is not taken; execution continues with the sequentially next instruction.
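
A small sketch of how the branch target is formed from the updated PC and the offset X, and how the N flag decides a conditional branch. The numeric values are made up purely for illustration.

```python
# Sketch: branch target = (address following the branch) + offset X.
PC = 1004            # PC already points past the 4-byte branch instruction at 1000
X = 24               # offset field taken from the IR
N = 1                # condition code flag (1 = last result was negative)

def branch(pc, offset, conditional=False, n_flag=0):
    """Return the next PC for an unconditional or a branch-on-negative instruction."""
    if conditional and n_flag == 0:
        return pc            # branch not taken: continue sequentially
    return pc + offset       # branch taken: PC <- [PC] + X

print(branch(PC, X))                                # unconditional -> 1028
print(branch(PC, X, conditional=True, n_flag=N))    # taken         -> 1028
print(branch(PC, X, conditional=True, n_flag=0))    # not taken     -> 1004
```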
Multiple-Bus Organization
[Figure 7.8: Three-bus organization of the datapath – buses A, B, and C connect a register file with two output ports and one input port, the PC with its incrementer, the constant 4, MUX A, the ALU with result register R, the IR with its instruction decoder, and MDR and MAR, which interface to the memory-bus data and address lines.]

 General purpose registers are combined into a single block
called register file.
 Two output ports allow the contents of two different registers
to be accessed simultaneously and have their contents placed
on buses A and B.
 Third port allows the data on bus C to be loaded into a third
register during the same clock cycle.

 Bus A & B are used to transfer the source operands to A & B


inputs of the ALU.
 The result of ALU operation is transferred to the destination
over the bus C.
 ALU may simply pass one of its 2 input operands
unmodified to bus C.
 The ALU control signals for such an operation R=A or R=B.
 Incrementer unit is used to increment the PC by 4.
 Using the incrementer eliminates the need to add the
constant value 4 to the PC using the main ALU.
 The source for the constant 4 at the ALU input multiplexer
can be used to increment other address such as loadmultiple
& storemultiple
Multiple-Bus Organization
 Add R4, R5, R6

Control sequence for the instruction Add R4, R5, R6 on the three-bus organization (see Figure 7.8):

Step   Action
1      PCout, R=B, MARin, Read, IncPC
2      WMFC
3      MDRoutB, R=B, IRin
4      R4outA, R5outB, SelectA, Add, R6in, End

 Add (R3), R2, R1   i.e. R1 ← [[R3]] + [R2]

Step   Action
1      PCout, R=B, MARin, Read, IncPC
2      WMFC
3      MDRoutB, R=B, IRin
4      R3outB, R=B, MARin, Read
5      WMFC
6      MDRoutB, R2outA, Add, R1in, End


Quiz
 What is the control sequence for execution of the instruction Add R1, R2, including the instruction fetch phase? (Assume the single-bus architecture of Figure 7.1.)

On the single bus (R2 ← [R1] + [R2]):

1. PCout, MARin, Read, Select4, Add, Zin
2. Zout, PCin, Yin, WMFC
3. MDRout, IRin
4. R1out, Yin
5. R2out, SelectY, Add, Zin
6. Zout, R2in, End

Hardwired Control
Overview
 To execute instructions, the processor must have
some means of generating the control signals
needed in the proper sequence.
 Computer designers use a wide variety of techniques to solve this problem.
 These approaches fall into two categories: hardwired control and microprogrammed control.
 A hardwired system can operate at high speed, but with little flexibility.
 The control unit uses a fixed logic circuit to interpret instructions and generate the sequence of control signals from them.
 Each step in this sequence is completed in one clock cycle.
 A counter may be used to keep track of the control steps (refer to Figure 7.10).
 Each count of this counter corresponds to one control step.
The required control signals are determined by the following information:
1. Contents of the control step counter
2. Contents of the instruction register
3. Contents of the condition code flags
4. External input signals, such as MFC and interrupt requests
Hardwired Control Unit Organization

[Figure 7.10: Control unit organization – the IR and a control step counter feed a decoder/encoder block, together with the condition codes and external inputs such as MFC and interrupt requests; the block generates the control signals.]
 The decoder/encoder block is a combinational circuit that generates the required control outputs, depending on the state of all its inputs.
 In Figure 7.11, the decoding and encoding functions are separated.
 The step decoder provides a separate signal line for each step in the control sequence.
 The output of the instruction decoder consists of a separate line for each machine instruction.
 For any instruction loaded in the IR, one of the output lines INS1 through INSm is set to 1, and all other lines are set to 0.
 The encoder generates the appropriate control signals (Yin, PCout, Add, End, and so on) by combining the input signals.
Detailed Block Diagram
Example: Generation of the Zin control signal by the encoder for the single-bus processor
 Zin = T1 + T6 • ADD + T4 • BR + …

[Logic diagram: the Add and Branch instruction-decoder outputs are ANDed with timing signals T6 and T4 respectively, and the results are ORed with T1 to produce Zin.]

Zin is asserted during time slot T1 for all instructions, during T6 for an Add instruction, during T4 for an unconditional branch instruction, and so on.
Control signals for Add (R3), R1 – see Figure 7.6.

Control signals for a conditional branch:

Step   Action
1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      Offset-field-of-IRout, SelectY, Add, Zin, If N=0 then End
5      Zout, PCin, End
Generating the End control signal
 End = T7 • ADD + T5 • BR + (T5 • N + T4 • N′) • BRN + …
 The End signal starts a new instruction fetch cycle by resetting the control step counter to its starting value.
 RUN, when set to 1, causes the counter to be incremented by 1 at the end of every clock cycle; when RUN is set to 0, the counter stops counting. This is needed whenever the WMFC signal is issued, so that the counter waits for the memory to complete its operation.
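
A sketch of the encoder equations above, expressed as plain Boolean functions. The names mirror the equations (ADD, BR, BRN are instruction-decoder outputs, T1 through T7 are step-decoder outputs); the dictionaries are just a convenient way to model the signal lines.

```python
# Sketch of the hardwired encoder logic for the Zin and End control signals.
def z_in(T, ins):
    """Zin = T1 + T6*ADD + T4*BR + ...  (T: timing signals, ins: decoded instruction lines)."""
    return T["T1"] or (T["T6"] and ins["ADD"]) or (T["T4"] and ins["BR"])

def end(T, ins, N):
    """End = T7*ADD + T5*BR + (T5*N + T4*not N)*BRN + ..."""
    return ((T["T7"] and ins["ADD"]) or
            (T["T5"] and ins["BR"]) or
            (((T["T5"] and N) or (T["T4"] and not N)) and ins["BRN"]))

# Example: time slot T6 of an Add instruction asserts Zin but not End.
T = {f"T{i}": (i == 6) for i in range(1, 8)}
ins = {"ADD": True, "BR": False, "BRN": False}
print(z_in(T, ins), end(T, ins, N=False))   # -> True False
```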
A Complete Processor

[Figure 7.14: Block diagram of a complete processor – an instruction unit, an integer unit, and a floating-point unit are connected to separate instruction and data caches; a bus interface links the processor to the system bus, which also connects the main memory and input/output.]


Microprogrammed
Control
Overview
 Control signals are generated by a program similar to machine language
programs.
 Control Word (CW); microroutine; microinstruction

Microinstruction  PCin  PCout  MARin  Read  MDRout  IRin  Yin  Select  Add  Zin  Zout  R1out  R1in  R3out  WMFC  End
1                 0     1      1      1     0       0     0    1       1    1    0     0      0     0      0     0
2                 1     0      0      0     0       0     1    0       0    0    1     0      0     0      1     0
3                 0     0      0      0     1       1     0    0       0    0    0     0      0     0      0     0
4                 0     0      1      1     0       0     0    0       0    0    0     0      0     1      0     0
5                 0     0      0      0     0       0     1    0       0    0    0     1      0     0      1     0
6                 0     0      0      0     1       0     0    0       1    1    0     0      0     0      0     0
7                 0     0      0      0     0       0     0    0       0    0    1     0      1     0      0     1

(Select: 1 = Select4, 0 = SelectY.)

Figure 7.15. An example of microinstructions for Figure 7.6.


Overview

Step   Action

1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      R3out, MARin, Read
5      R1out, Yin, WMFC
6      MDRout, SelectY, Add, Zin
7      Zout, R1in, End

Figure 7.6. Control sequence for execution of the instruction Add (R3), R1.
 Control Word (CW) is a word whose individual
bits represent various control signals.
 Every instruction will need a sequence of CWs for
its execution.
 At every step, some control signals are asserted
(=1) and all others are 0
 Sequence of CWs for an instruction forms the
microroutine for that instruction.
 Each CW in this microroutine is referred to as a
microinstruction.
 Every instruction will have its own microroutine
which is made up of microinstructions.
 Microroutines for all instructions in the instruction set
of a computer are stored in a special memory called
Control Store.
 Control signals are generated by sequentially reading
the CWs of the corresponding microroutine from the
control store.
Basic organization of a microprogrammed
control unit
 Microprogram counter (mPC) is used to read
CWs from control store sequentially.
 When a new instruction is loaded into IR, starting
address generator generates the starting address of
the microroutine.
 This address is loaded into the mPC. mPC is
automatically incremented by the clock, so
successive microinstructions are read from the
control store.
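
A sketch of the basic microprogrammed control loop: the starting address generator maps the opcode in the IR to a control-store address, and the μPC then steps through the microroutine. The control store contents here are a toy excerpt, not the actual Figure 7.15 encoding.

```python
# Sketch of a microprogrammed control unit: uPC indexes a control store of control words.
control_store = [
    {"PCout", "MARin", "Read", "Select4", "Add", "Zin"},   # 0: fetch, part 1
    {"Zout", "PCin", "Yin", "WMFC"},                        # 1: fetch, part 2
    {"MDRout", "IRin"},                                     # 2: load IR
    {"R1out", "Yin"},                                       # 3: toy microroutine for Add R1, R2
    {"R2out", "SelectY", "Add", "Zin"},                     # 4
    {"Zout", "R2in", "End"},                                # 5
]

starting_address = {"Add": 3}    # starting address generator (toy mapping)

def run_instruction(opcode):
    uPC = 0                      # every instruction begins with the fetch microinstructions
    while True:
        cw = control_store[uPC]  # read the control word and assert its signals
        print(uPC, sorted(cw))
        if "End" in cw:
            break                # End resets the sequencing for the next instruction
        if "IRin" in cw:         # new instruction loaded: jump to its microroutine
            uPC = starting_address[opcode]
        else:
            uPC += 1             # uPC is incremented by the clock

run_instruction("Add")
```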
Execution of conditional Branch
Instruction
Step   Action

1      PCout, MARin, Read, Select4, Add, Zin
2      Zout, PCin, Yin, WMFC
3      MDRout, IRin
4      Offset-field-of-IRout, SelectY, Add, Zin, If N=0 then End
5      Zout, PCin, End

Figure 7.7(b). Control sequence for a conditional branch instruction.

If N=1, the branch is taken; the next instruction is fetched from the branch target.
If N=0, the branch is not taken; execution continues with the sequentially next instruction.
Overview
 The previous organization cannot handle the situation when the control unit is
required to check the status of the condition codes or external inputs to choose
between alternative courses of action.
 Use conditional branch microinstruction.
Address   Microinstruction

0         PCout, MARin, Read, Select4, Add, Zin
1         Zout, PCin, Yin, WMFC
2         MDRout, IRin
3         Branch to the starting address of the appropriate microroutine
…         …
25        If N=0, then branch to microinstruction 0
26        Offset-field-of-IRout, SelectY, Add, Zin
27        Zout, PCin, End

Figure 7.17. Microroutine for the instruction Branch<0.


 Control words at locations 25, 26, and 27 implement the microroutine for Branch<0.
 The control word at location 25 tests the N bit of the condition code register.
 If this bit is 0, a branch takes place to location 0 to fetch a new instruction.
 Otherwise, the control word at location 26 is executed, to fetch the instruction from the branch target address.
Overview

[Figure 7.18: Organization of the control unit to allow conditional branching in the microprogram – the IR, the condition codes, and external inputs feed a starting and branch address generator; its output is loaded into the μPC, which is clocked and addresses the control store to produce each CW.]
Thank you
Pipelining
Overview
• Pipelining is widely used in modern
processors.

• Pipelining improves system performance in


terms of throughput.

• Pipelined organization requires sophisticated


compilation techniques.
Basic Concepts
Making the Execution of Programs
Faster
• To increase the speed of execution either
– Use faster circuit technology to build the processor
and the main memory
or
– Arrange the hardware so that more than one operation
can be performed at the same time.

• In the latter way, the number of operations


performed per second is increased even though
the elapsed time needed to perform any one
operation is not changed.
Traditional Pipeline Concept
• Laundry Example
• Ann, Brian, Cathy, and Dave each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• “Folder” takes 20 minutes


Traditional Pipeline Concept

[Timeline: sequential laundry – each load goes through 30 + 40 + 20 minutes, one load after another, from 6 PM to midnight.]
• Sequential laundry takes 6 hours for 4 loads
• If they learned pipelining, how long would laundry take?
Traditional Pipeline Concept

[Timeline: pipelined laundry – after the first 30-minute wash, a new load finishes every 40 minutes (the slowest stage), with a final 20-minute fold.]
• Pipelined laundry takes 3.5 hours for 4 loads
Traditional Pipeline Concept
• Pipelining doesn’t help the latency of a single task; it helps the throughput of the entire workload
• Pipeline rate is limited by the slowest pipeline stage
• Multiple tasks operate simultaneously using different resources
• Potential speedup = number of pipe stages
• Unbalanced lengths of pipe stages reduce speedup
• Stalls occur because of dependences
Use the Idea of Pipelining in a Computer
Fetch + Execution

[Figure 8.1: Basic idea of instruction pipelining – (a) sequential execution: each instruction completes its fetch (F) and execute (E) steps before the next begins; (b) hardware organization: an instruction fetch unit and an execution unit separated by interstage buffer B1; (c) pipelined execution: while instruction I1 is executed (E1), instruction I2 is fetched (F2), so the two units work in parallel.]
Use the Idea of Pipelining in a Computer
Fetch + Decode + Execution + Write

[Figure 8.2: A 4-stage pipeline (textbook page 457) – (a) instruction execution divided into four steps: F (fetch instruction), D (decode instruction and fetch operands), E (execute operation), W (write results); instructions I1 to I4 move through the stages in successive clock cycles, so that in clock cycle 4 all four stages are busy; (b) hardware organization with interstage buffers B1, B2, and B3 between the stages.]
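
A sketch that prints an ideal 4-stage pipeline timing table in the spirit of Figure 8.2(a). Each instruction simply enters stage F one cycle after its predecessor; no hazards are modelled.

```python
# Sketch of ideal 4-stage pipelining: instruction i occupies stage s in cycle i + s.
STAGES = ["F", "D", "E", "W"]

def pipeline_table(n_instructions):
    total_cycles = n_instructions + len(STAGES) - 1
    for i in range(n_instructions):
        row = ["  "] * total_cycles
        for s, stage in enumerate(STAGES):
            row[i + s] = f"{stage}{i + 1}"
        print(f"I{i + 1}: " + " ".join(row))

pipeline_table(4)
# I1: F1 D1 E1 W1
# I2:    F2 D2 E2 W2
# I3:       F3 D3 E3 W3
# I4:          F4 D4 E4 W4
```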


Role of Cache Memory

• Each pipeline stage is expected to complete in one clock


cycle.
• The clock period should be long enough to let the slowest pipeline stage complete.
• Faster stages can only wait for the slowest one to complete.
• Since main memory is very slow compared to execution, if each instruction had to be fetched from main memory, the pipeline would be almost useless.
• Fortunately, we have cache.
Pipeline Performance

• The potential increase in performance


resulting from pipelining is proportional to the
number of pipeline stages.
• However, this increase would be achieved only
if all pipeline stages require the same time to
complete, and there is no interruption
throughout program execution.
• Unfortunately, this is not true.
Pipeline Performance

[Figure 8.3: Effect of an execution operation taking more than one clock cycle – instruction I2’s Execute stage takes three cycles, so instructions I3, I4, and I5 behind it are delayed.]
Pipeline Performance

• The previous pipeline is said to have been stalled for two clock cycles.

• Any condition that causes a pipeline to stall is called a hazard.

• Data hazard – any condition in which either the source or the


destination operands of an instruction are not available at the time
expected in the pipeline. So some operation has to be delayed, and the
pipeline stalls.

• Instruction (control) hazard – a delay in the availability of an


instruction causes the pipeline to stall.

• Structural hazard – the situation when two instructions require the use
of a given hardware resource at the same time.
Pipeline Performance
Instruction hazard example:

[Figure 8.4: Pipeline stall caused by a cache miss in F2 – (a) instruction execution steps in successive clock cycles: I1 proceeds normally, but fetching I2 takes four cycles (cycles 2 through 5), delaying I2 and I3; (b) function performed by each processor stage in successive clock cycles: while F2 is repeated, the Decode, Execute, and Write stages sit idle – these idle periods are called stalls, or bubbles, in the pipeline.]


Pipeline Performance
Structural hazard example: Load X(R1), R2

[Figure 8.5: Effect of a Load instruction on pipeline timing – the Load (I2) needs an extra memory access step M2 between E2 and W2, so two instructions end up requiring the same hardware resource in the same cycle and the instructions behind the Load are delayed.]


Pipeline Performance

• Again, pipelining does not result in individual


instructions being executed faster; rather, it is the
throughput that increases.

• Throughput is measured by the rate at which instruction


execution is completed.

• Pipeline stall causes degradation in pipeline performance.

• We need to identify all hazards that may cause the


pipeline to stall and to find ways to minimize their
impact.
Quiz
• There are four instructions; I2 takes two clock cycles for its execution. Draw the figure for a 4-stage pipeline, and figure out the total number of cycles needed for the four instructions to complete.
Solution

Clock cycle   1   2   3   4   5   6   7   8
I1            F   D   E   W
I2                F   D   E   E   W
I3                    F   D   –   E   W
I4                        F   D   –   E   W

(– marks a stall cycle; 8 clock cycles are needed in total.)
Data Hazards
• We must ensure that the results obtained when instructions are executed
in a pipelined processor are identical to those obtained when the same
instructions are executed sequentially.
• Hazard occurs
A←3+A
B←4×A
• No hazard
A←5×C
B ← 20 + C
• When two operations depend on each other, they must be executed
sequentially in the correct order.
• Another example:
Mul R2, R3, R4
Add R5, R4, R6
Data Hazards

[Figure 8.6: Pipeline stalled by data dependency between D2 and W1 – the Add (I2) needs the result of the Mul (I1), so its decode/operand-fetch step is stretched (shown as D2 followed by D2A) until W1 completes, delaying I2, I3, and I4.]
Operand Forwarding
• Instead of from the register file, the second
instruction can get data directly from the
output of ALU after the previous instruction is
completed.
• A special arrangement needs to be made to
“forward” the output of ALU to the input of
ALU.
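
A sketch of the forwarding decision for the Mul/Add pair above: if the destination of the previous instruction matches a source of the current one, the ALU result is taken from the forwarding path instead of the register file. The tuple encoding of instructions is a made-up convenience for the example.

```python
# Sketch of operand forwarding: (opcode, src1, src2, dest) tuples, register file as a dict.
regs = {"R2": 6, "R3": 7, "R4": 0, "R5": 5, "R6": 0}

def execute(instr, prev_dest=None, prev_result=None):
    """Return (dest, result); forward prev_result when a source matches prev_dest."""
    op, src1, src2, dest = instr
    a = prev_result if src1 == prev_dest else regs[src1]
    b = prev_result if src2 == prev_dest else regs[src2]
    result = a * b if op == "Mul" else a + b
    return dest, result

d1, r1 = execute(("Mul", "R2", "R3", "R4"))          # R4 <- 42, not yet written back
d2, r2 = execute(("Add", "R5", "R4", "R6"), d1, r1)  # R4 forwarded from the ALU output
regs[d1], regs[d2] = r1, r2
print(regs["R6"])                                    # -> 47
```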
[Figure 8.7: Operand forwarding in a pipelined processor – (a) datapath: source registers SRC1 and SRC2 feed the ALU, whose result goes to RSLT and then to the destination in the register file; (b) position of the source and result registers in the processor pipeline: SRC1 and SRC2 belong to the Execute stage and RSLT to the Write stage, with a forwarding path routing the ALU result back to the ALU input.]


Handling Data Hazards in Software
• Let the compiler detect and handle the hazard:
I1: Mul R2, R3, R4
NOP
NOP
I2: Add R5, R4, R6
• The compiler can reorder the instructions to
perform some useful work during the NOP
slots.
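
A sketch of what the software approach amounts to: scan the instruction stream and insert NOPs whenever an instruction reads a register written by the immediately preceding instruction. The two-slot delay and the tuple encoding are assumptions made purely for illustration.

```python
# Sketch: insert NOPs after an instruction whose destination is read by the next one.
def insert_nops(program, delay_slots=2):
    """program: list of (opcode, src1, src2, dest) tuples, destination written last."""
    out = []
    for i, instr in enumerate(program):
        out.append(instr)
        if i + 1 < len(program):
            dest = instr[3]
            next_sources = program[i + 1][1:3]
            if dest in next_sources:                 # RAW dependence on the next instruction
                out.extend([("NOP",)] * delay_slots)
    return out

prog = [("Mul", "R2", "R3", "R4"), ("Add", "R5", "R4", "R6")]
for instr in insert_nops(prog):
    print(instr)
# ('Mul', 'R2', 'R3', 'R4') / ('NOP',) / ('NOP',) / ('Add', 'R5', 'R4', 'R6')
```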
Moore’s Law

• Gordon Moore, cofounder of Intel, noticed a trend in IC manufacturing
• Every 2 years the number of components on an area of silicon doubled
• He published this observation in 1965 – it became known as Moore’s Law
• His predictions were for 10 years into the future
• His work predicted personal computers and fast telecommunication networks
Graph of Moore’s Law

Single-core computer
Limits
• Both physical and practical reasons pose significant
constraints to simply building ever faster serial computers:
– Transmission speeds - the speed of a serial computer is directly
dependent upon how fast data can move through hardware.
Absolute limits are the speed of light (30 cm/nanosecond) and
the transmission limit of copper wire (9 cm/nanosecond).
Increasing speeds necessitate increasing proximity of processing
elements.
– Limits to miniaturization - processor technology is allowing an
increasing number of transistors to be placed on a chip.
However, even with molecular or atomic-level components, a
limit will be reached on how small components can be.
– Economic limitations - it is increasingly expensive to make a
single processor faster. Using a larger number of moderately fast
commodity processors to achieve the same (or better)
performance is less expensive.
Limits
• The relation between frequency of operation and power consumption is given by:
  Power = ½ × Capacitive load × Voltage² × Frequency
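
A quick numerical illustration of the dynamic power relation P = ½ · C · V² · f. The capacitance, voltage, and frequency values below are made up for the example.

```python
# Illustrative dynamic power calculation: P = 1/2 * C * V^2 * f.
def dynamic_power(capacitive_load, voltage, frequency):
    return 0.5 * capacitive_load * voltage ** 2 * frequency

C = 1e-9        # 1 nF effective switched capacitance (assumed)
V = 1.2         # supply voltage in volts (assumed)
f = 3e9         # clock frequency: 3 GHz

print(dynamic_power(C, V, f))          # -> 2.16 W at 3 GHz
print(dynamic_power(C, V, 2 * f))      # -> 4.32 W: doubling f doubles the power
```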

• Power consumption is a huge problem when


frequency is increased
• The more power consumed, the more heat is generated
• Power consumption poses significant constraint for
simply building ever faster serial computers
Why multi-core ?
• Difficult to make single-core clock frequencies even
higher

• Deeply pipelined circuits:


– heat problems
– speed of light problems
– difficult design and verification
– large design teams necessary
– server farms need expensive
air-conditioning
• Many new applications are multithreaded

• General trend in computer architecture (shift towards


more parallelism)
Multi-core architectures
• Replicate multiple processor cores of moderate
speed on a single die.

[Diagram: a multi-core CPU chip with Core 1, Core 2, Core 3, and Core 4 on a single die.]
Unicore vs. Multicore

(a) Uni-core processor; (b) multicore processor.

Conventional processor            Multicore processor
• Single core                     • At least two cores
• Dedicated caches                • Shared caches
• One thread at a time            • Many threads simultaneously
Multi-core CPU chip
• The cores fit on a single processor socket
What is a thread?
• A process is a program in execution.
• A process is also known as a heavyweight process.
• A process can contain many lightweight processes, or threads.
• Each thread performs a subtask of the problem to be solved.
• Each thread has private local data, and all threads share a common address space.
• Hence, the overhead of thread creation is lower than that of creating heavyweight processes.
• Also, since threads share the resources of the process to which they belong, the resources are utilized optimally.
• Above all, threads communicate with each other through shared memory locations.
• Hence the communication is faster and the associated overhead is lower than that of processes.
Thread level parallelism – multiple threads run on
separate cores in parallel
Within each core, threads can be time-sliced
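
A minimal Python threading sketch of the idea that threads share a common address space while each works on a subtask. The subtask split and the variable names are illustrative assumptions.

```python
# Sketch: several threads sharing one address space, each summing a slice of the data.
import threading

data = list(range(1, 101))          # shared data
partial_sums = [0, 0, 0, 0]         # shared result array, one slot per thread

def worker(thread_id, lo, hi):
    """Each thread performs a subtask: sum data[lo:hi] into its own slot."""
    partial_sums[thread_id] = sum(data[lo:hi])

threads = []
chunk = len(data) // 4
for i in range(4):
    t = threading.Thread(target=worker, args=(i, i * chunk, (i + 1) * chunk))
    threads.append(t)
    t.start()

for t in threads:
    t.join()

print(sum(partial_sums))            # -> 5050: all threads saw the same shared memory
```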
