EE457Unit9a OoO

Uploaded by

Shaurya Chandra

1

EE 457 Unit 9a

Exploiting ILP
Out-of-Order Execution
2

Credits
• Some of the material in this presentation is taken from:
– Computer Architecture: A Quantitative Approach
• John Hennessy & David Patterson
• Some of the material in this presentation is derived from
course notes and slides from
– Prof. Michel Dubois (USC)
– Prof. Murali Annavaram (USC)
– Prof. David Patterson (UC Berkeley)
3

Exploiting Parallelism
• With the increasing transistor budgets of modern processors (i.e., we
can do more things at the same time), the question becomes:
how do we find enough useful tasks to increase performance?
Put another way, what is the most effective way of
exploiting parallelism?
• Many types of parallelism available
– Instruction Level Parallelism (ILP): Overlapping instructions within a
single process/thread of execution
– Thread Level Parallelism (TLP): Overlap execution of multiple
processes/threads
– Data Level Parallelism (DLP): Overlap an operation (instruction) that is
to be applied independently to multiple data values (usually, an array)
for (int i=0; i < MAX; i++) { A[i] = A[i] + 5; }

• We'll focus on ILP in this unit

Outline
• Instruction Level Parallelism
– In-order (IO) pipeline
• From academic 5-stage pipeline
• To 8-stage MIPS R4000 pipeline
• Superscalar, superpipelined
– Out-of-Order (OoO) Execution
• This unit: OoO Execution (compute the result) AND
OoO Completion (write result to memory or a register)
(Problem: Exceptions)
• Next Unit: OoO Execution BUT In-order completion
5

Instruction Level Parallelism (ILP)

• Although a program defines a sequential ordering of instructions, in reality
many instructions can be executed in parallel (i.e. out of (program) order).
• ILP refers to the process of finding instructions from a single program/thread
of execution that can be executed in parallel
• Data flow (data dependencies) limits out-of-order execution
• Independent instructions (no data dependencies) can be executed at the
same time
• Control hazards also provide some ordering constraints

Program Order (In-order):
    lw  $s3,0($s4)
    and $t3,$t2,$t3
    add $t0,$t0,$s4
    or  $t5,$t3,$t2
    sub $t1,$t1,$t2
    beq $t0,$t8,L1
    xor $s0,$t1,$s2

Dependency Graph (one level per row):
    LW    ADD    SUB    AND
    BEQ (needs ADD)    OR (needs AND)
    XOR (needs SUB)

We may perform execution out-of-order:
    Cycle 1: lw $s3,0($s4) / add $t0,$t0,$s4 / sub $t1,$t1,$t2 / and $t3,$t2,$t3
    Cycle 2: beq $t0,$t8,L1 / or $t5,$t3,$t2
    Cycle 3: xor $s0,$t1,$s2
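The three-cycle schedule above can be reproduced with a small sketch. It is only an illustration under stated assumptions: every instruction takes one cycle, there are unlimited functional units, only RAW dependencies and branch ordering are honored, and the tuple encoding and the function name `earliest_cycles` are made up for this example.

```python
# Each entry: (mnemonic, destination register or None, source registers)
program = [
    ("lw",  "$s3", ["$s4"]),
    ("and", "$t3", ["$t2", "$t3"]),
    ("add", "$t0", ["$t0", "$s4"]),
    ("or",  "$t5", ["$t3", "$t2"]),
    ("sub", "$t1", ["$t1", "$t2"]),
    ("beq", None,  ["$t0", "$t8"]),
    ("xor", "$s0", ["$t1", "$s2"]),
]

def earliest_cycles(program):
    """Earliest cycle per instruction, honoring RAW and branch ordering only."""
    ready = {}        # register -> first cycle in which its new value can be read
    last_branch = 0   # cycle of the most recent branch (0 = no branch seen yet)
    cycles = []
    for op, dest, srcs in program:
        # Wait for every source value and for any earlier branch to resolve.
        cycle = max([1, last_branch + 1] + [ready.get(r, 1) for r in srcs])
        cycles.append(cycle)
        if dest is not None:
            ready[dest] = cycle + 1   # assumption: 1-cycle latency
        if op == "beq":
            last_branch = cycle
    return cycles

schedule = earliest_cycles(program)
for (op, _, _), cycle in zip(program, schedule):
    print("cycle", cycle, op)
```

With these assumptions lw, and, add and sub land in cycle 1, or and beq in cycle 2, and xor in cycle 3 (it waits for the branch, not only for sub).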
6

Basic Blocks
• Basic Block (def.) = Sequence of instructions that will
always be executed together
– No conditional branches out
– No branch targets coming in
– Also called “straight-line” code
– Average size: 5-7 instrucs.

        lw  $s3,0($s4)
        and $t3,$t2,$t3
    L1: add $t0,$t0,$s4     <- This is a basic block
        or  $t5,$t3,$t2        (starts w/ target,
        sub $t1,$t1,$t2         ends with branch)
        beq $t0,$t8,L1
        xor $s0,$t1,$s2

• Instructions in a basic block can be overlapped if
there are no data dependencies
• Control dependences really limit our window of
possible instructions to overlap
– W/o extra hardware, we can only overlap execution of
instructions within a basic block
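A minimal sketch of how the block boundaries on this slide could be found mechanically. The rule used (a label starts a block, a branch ends one) follows the definition above; the input encoding, the set of branch mnemonics, and the name `split_basic_blocks` are assumptions for illustration.

```python
def split_basic_blocks(instrs):
    """instrs: list of (label or None, instruction text) in program order.
    Returns a list of basic blocks, each a list of instruction texts."""
    branch_ops = {"beq", "bne", "j"}   # assumption: only these end a block
    blocks, current = [], []
    for label, text in instrs:
        if label is not None and current:
            # A branch target coming in starts a new block.
            blocks.append(current)
            current = []
        current.append(text)
        if text.split()[0] in branch_ops:
            # A branch going out ends the current block.
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks

code = [
    (None, "lw $s3,0($s4)"),
    (None, "and $t3,$t2,$t3"),
    ("L1", "add $t0,$t0,$s4"),
    (None, "or $t5,$t3,$t2"),
    (None, "sub $t1,$t1,$t2"),
    (None, "beq $t0,$t8,L1"),
    (None, "xor $s0,$t1,$s2"),
]
blocks = split_basic_blocks(code)
for b in blocks:
    print(b)
```

For the slide's code this yields three blocks; the middle one runs from the `L1:` target through the `beq`.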
7

Other In-Order techniques

SUPERSCALAR & SUPERPIPELINING

Overview
• Superscalar = More than 1 instruction completing per clock cycle (IPC > 1)
– 2-way superscalar = Proc. that can issue 2 instructions per clock cycle
– Success is sensitive to ability to find independent instructions to issue in the same cycle
• Superpipelining = Many small stages to boost clock freq.
– Success depends on finding instructions to schedule in the shadow of data and control hazards

Superscalar: Executing more than 1 instruction per clock cycle (CPI < 1 or IPC > 1)
    Instruction 1: Instruc. Fetch | Instruc. Decode | Execute | Data Memory | Write back
    Instruction 2: Instruc. Fetch | Instruc. Decode | Execute | Data Memory | Write back

Superpipelining: Divide logic into many short stages (Higher Clock Frequency)
    Instruction 1: IF1 IF2 ID EX DM1 DM2 DM3 WB
    Instruction 2: IF1 IF2 ID EX DM1 DM2 DM3 WB
9

2-way Superscalar
• Ex: One ALU & Data transfer (LW/SW) instruction can be issued at the same time
• Relies on the compiler to find and reorder appropriate instructions (using nops if no
appropriate instruction can be found)
Instruction Pipeline Stages
ALU or branch IF ID EX MEM WB
LW/SW IF ID EX MEM WB
ALU or branch IF ID EX MEM WB
LW/SW IF ID EX MEM WB
ALU or branch IF ID EX MEM WB
LW/SW IF ID EX MEM WB
[Figure: 2-way superscalar datapath. PC and I-Cache fetch 2 instructions; Reg. File (4 Read, 2 Write); Integer Slot with ALU; LD/ST Slot with Addr. Calc. and D-Cache]
10

Sample Scheduling
• Compiler can reorder instructions to find integer and memory
instructions to pair together so that they can be run down the
pipeline at the same time

C code:
    void f1(int *A, int n) {
      do {
        *A += 5;
        A++;
        n--;
      } while (n != 0);
    }

Original assembly:
    # $6 = A
    # $7 = n = # of iterations
    L1: lw   $9, 0($6)
        addi $9, $9, 5
        sw   $9, 0($6)
        addi $6, $6, 4
        addi $7, $7, -1
        bne  $0, $7, L1

Schedule w/ modifications and code movement:
    time   Int./Branch Slot    LD/ST Slot
    1      addi $7, $7, -1     lw $9, 0($6)
    2      addi $6, $6, 4
    3      addi $9, $9, 5
    4      bne  $0, $7, L1     sw $9, -4($6)

IPC = 6 instrucs. / 4 cycles = 1.5
11

Scheduling Strategies
• Static Scheduling
– Compiler re-orders instructions in such a way that no
dependencies will be violated and allows for OoOE
• Dynamic Scheduling
– HW implementing the Tomasulo algorithm or other similar
approach will re-order instructions to allow for OoOE
• More Advanced Concepts
– Branch prediction and speculative execution (execution beyond
a branch, flushing if incorrect) will be covered later
12

Static Scheduling
• Strengths
– Hardware simplicity [Better clock rate]
• Power/energy advantage
• Compiler has a global view of the program anyway, so it should be able to
do a “good” job
– Very predictable: static performance predictions are reliable
• Weaknesses
– Requires re-compilation to take advantage of new/modified
architecture
– Cannot foresee dynamic (data-dependent) events
• Cache miss, conditional branches (can only reschedule instructions in a basic
block)
– Cannot precompute memory addresses
– No good solution for precise exceptions with out-of-order completion
13

OUT-OF-ORDER EXECUTION
14

Out-of-Order Motivation
• We will focus on dynamically scheduled, OoO processors
• Hide the impact of dynamic events such as a cache miss
– Let independent instructions behind a stalled instruction execute
• Separate functional units (ALU, MUL, DMEM, etc.)
• "Queues" where instructions wait until they are ready, at which point
they can execute "out-of-order"

    LW  $4,0($5)    // cache miss
    ADD $6,$7,$4
    SUB $1,$2,$3
    MUL $9,$7,$2

[Figure: IM and Reg feed Queues + Functional Units (ALU, MUL, DIV, DMEM (Cache)), then Reg. ADD and SUB wait in the ALU queue, MUL in the MUL queue, and LW in the DMEM queue]
15

Dispatch, Execution, and Completion

• "Execution" here means producing the results not necessarily
writing them to a register or memory
• Completion means committing/writing the results to register
file or memory
• While we say out-of-order execution we really mean/want:
– In-order (Program order) Issue/Dispatch (IoD)
– Out-of-Order Execution (OoOE)
– In-order Completion (IoC) [hard]
• So we'll start with the easier Out-of-Order Completion (OoOC)

    LW  $4,0($5)    // cache miss
    ADD $6,$7,$4
    SUB $1,$2,$3
    MUL $9,$7,$2

[Figure: Issue/Dispatch (In-order) -> Execution (Out-of-Order) -> Completion (In-order)]
16

Branch Handling
• We will present the concept of OoOC (out-of-order
completion) which is a bit easier and then come back to the
desired approach of In-Order Completion (IoC)
• OoOC Issues
– Branches…we should not commit an instruction that came after (in
program order) a branch
– Solution: Stall dispatching instructions after a branch until we
resolve the outcome

    LW  $4,0($5)    // cache miss
    BEQ $4,$0,L1
    ADD $6,$7,$8    // What if we execute this ADD out of order

[Figure: Issue/Dispatch (In-order; stall branches here) -> Execution (Out-of-Order) -> Completion (In-order)]
17

Data Hazard Stalling

• In our 5-stage pipeline (in-order execution) RAW dependency
was solved by
– Forwarding (preferably) or
– Stalling (LW followed by dependent instruction)
• Dependent instructions stalled in the ID stage if necessary
• Do we want to stall in the decode stage in our OoO processor?
– No! Doing so would necessarily stall everyone behind us
[Figure: 5-stage pipeline IM | Reg | ALU | DM | Reg. ADD $1,$3,$4 is held in decode (Stall here), a bubble follows, and LW $4 is ahead of it. Stalling here would plug up the pipeline]
18

EX Stage Stalling
• In our 5-stage pipeline, could we have stalled in the EX stage?
• No! If ADD depended on an instruction in WB then it has no place to store
that forwarded data while it stalls
• Why? What if ADD was also dependent on the instruction in WB… ADD has no
place to buffer that forwarded value
• Thus we stall in ID so we can use the Register File to grab dependent
values. Further, stalling in ID incurs only 1 cycle penalty, as would
stalling in EX.

[Figure: 5-stage pipelined datapath with PC, I-Cache, Register File, ALU, D-Cache, pipeline stage registers, HDU (PCWrite, IRWrite, Stall, IF.Flush) and Forwarding Unit (ALUSelA, ALUSelB)]
19

Where to Stall?
• But to implement OoO execution, we cannot stall in the decode stage
since that would prevent any further issuing of instructions
• Thus, now we will issue to queues for each of the multiple functional units
and have the instruction stall in the queue until it is ready
[Figure: IM and Reg feed Queues + Functional Units (ALU, MUL, DIV, DMEM (Cache)), then Reg. Callout: Stalling here would plug up the pipeline]
20

Forwarding in OoO Execution

• In the 5-stage pipeline, later instructions carried their source register IDs into the
EX stage to be compared with destination register IDs of their earlier
instructions
• But in OoO execution, we may have many (earlier) instructions in front of us
and would require more complex hardware to determine who is producing
the data we need (especially when multiple producers exist and we want the
latest version)
• Instead, the dispatch unit will explicitly tell the dependent instruction who to
get data from using part of Tomasulo's algorithm

[Figure: 5-stage pipelined datapath with PC, I-Cache, Register File, ALU, D-Cache, pipeline stage registers, HDU (PCWrite, IRWrite, Stall, IF.Flush) and Forwarding Unit (ALUSelA, ALUSelB)]
21

Tomasulo’s Plan
• OoO Execution
• Multiple functional units
– Integer ALU, Data memory, Multiplier, Divider
• Queues between ID and EX stages (in place of ID/EX
register)
– Allows later instructions to keep issuing even if earlier ones
are stalled
• Method for dealing with RAW data hazards by
specifying who dependent instructions should get
data from
– But with OoO execution, new hazards arise!
22

WAR and WAW

NEW DATA HAZARDS

RAW, WAR, and WAW

• RAW = Read After Write
– lw $8, 40($2)
– add $9, $8, $7
• WAR = Write After Read
– add $9, $8, $6   <- say $6 is not available yet, can LW execute?
– lw $8, 40($2)
• WAW = Write After Write
– add $9, $8, $6   <- say $6 is not available yet, can LW execute?
– lw $9, 40($2)
Why would anyone produce one result in $9 without utilizing
that result? Why would they overwrite it with another result?
How is this possible?
24

WAW can easily occur

• How is WAW possible?
• Example 1
– Say a company gives a standard bonus to most of the employees and a
higher bonus to managers
– The software may set a default value to the standard bonus and then
overwrite it for the special case

    int x = standard_bonus;
    if (manager)
        x = special_bonus;
    set_bonus(x);

• Example 2
– Consider multiple iterations of a loop body

    for(i=MAX; i != 0; i--)
        A[i] = A[i] * 3;

Original Code:
    L1: lw   $2, 40($1)
        mult $4, $2, $3
        sw   $4, 40($1)
        addi $1, $1,-4
        bne  $1, $0,L1

Instruction stream over successive iterations (the same body repeats, so
$2 and $4 are written again each time):
    L1: lw   $2, 40($1)
        mult $4, $2, $3
        sw   $4, 40($1)
        addi $1, $1,-4
        bne  $1, $0,L1

    L1: lw   $2, 40($1)
        mult $4, $2, $3
        sw   $4, 40($1)
        addi $1, $1,-4
        bne  $1, $0,L1
25

RAW, WAR, and WAW

• Some terminology to remember
• RAW = Read After Write: a true dependency
– lw $8, 40($2)
– add $9, $8, $7
• WAR = Write After Read: an anti-dependency (a name dependency)
– add $9, $8, $6
– lw $8, 40($2)
• WAW = Write After Write: an output dependency (a name dependency)
– add $9, $8, $6
– lw $9, 40($2)
Note: No information is communicated in WAR/WAW hazards.
If no info is communicated can we somehow solve these hazards?
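The three definitions can be checked mechanically for a pair of instructions. This is a small sketch, not part of the original slides: each instruction is reduced to a hypothetical `(dest, sources)` pair and `hazards` is an illustrative name.

```python
def hazards(first, second):
    """first, second: (dest, sources) for two instructions in program order.
    Returns the set of register hazards the later one has on the earlier one."""
    d1, s1 = first
    d2, s2 = second
    found = set()
    if d1 is not None and d1 in s2:
        found.add("RAW")   # second reads what first writes (true dependency)
    if d2 is not None and d2 in s1:
        found.add("WAR")   # second overwrites what first still has to read
    if d1 is not None and d1 == d2:
        found.add("WAW")   # both write the same register
    return found

# The three pairs from the slide.
raw = hazards(("$8", ["$2"]), ("$9", ["$8", "$7"]))   # lw $8 ; add $9,$8,$7
war = hazards(("$9", ["$8", "$6"]), ("$8", ["$2"]))   # add $9,$8,$6 ; lw $8
waw = hazards(("$9", ["$8", "$6"]), ("$9", ["$2"]))   # add $9,$8,$6 ; lw $9
print(raw, war, waw)
```

Only the RAW case carries a value from one instruction to the other; the WAR and WAW cases are detected purely from register names.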
26

RAW, WAR, and WAW

• In-order execution:
– We need to deal with RAW only
• Out-of-order execution
– Now we need to deal with WAR and WAW hazards besides RAW
– Any of these hazards seems to prevent re-ordering instructions and
executing them out-of-order
27

Register Renaming
• WAR and WAW hazards can always be solved by simply choosing a
DIFFERENT register since no data is being communicated but we were
simply "reusing" a register

WAR = Write After Read
    add $9, $8, $6
    lw  $48, 40($2)     <- was $8
WAW = Write After Write
    add $9, $8, $6
    lw  $49, 40($2)     <- was $9

This is an example of a name-dependency

• If we had 64 registers instead of 32 registers, then perhaps
the compiler might have used $48 instead of $8 and we could
have executed the second part of the code before the first part

First iteration:
    lw  $8, 40($2)
    add $8, $8, $8
    sw  $8, 40($2)
Second iteration (using alternate register, $48):
    lw  $48, 60($3)
    add $48, $48, $48
    sw  $48, 60($3)
28

Register Renaming
• Renaming requires more registers
• We have limited architectural registers
– Registers the instruction set is aware of
• We could have more physical registers
– Actual registers part of the register file
    lw  $8, 40($2)      <- Assume $2 is delayed
    add $8, $8, $8
    sw  $8, 40($2)

    lw  $8, 60($3)
    add $8, $8, $8
    sw  $8, 60($3)

• It is clear the compiler is using $8 as a temporary register
• If there is a delay in obtaining $2 the first part of the code cannot proceed
• Unfortunately, the second part of the code cannot proceed because of the
name dependency for $8
29

Increasing Number of Registers

• Can a later implementation provide 64
registers (instead of 32) while maintaining
binary compatibility with previously compiled
code?
• Answer: No
• Why? Machine code has 5-bit fields for register IDs

R-Type fields (bits): opcode=6 | rs=5 | rt=5 | rd=5 | shamt=5 | func=6

Register Renaming
• Rather than creating new architectural registers, let
us internally provide multiple "versions" of the same
architectural register
– $8v1 = $8 version 1
– $8v2 = $8 version 2

    lw  $8v1, 40($2)
    add $8v2, $8v1, $8v1
    sw  $8v2, 40($2)

    lw  $8v3, 60($3)
    add $8v4, $8v3, $8v3
    sw  $8v4, 60($3)

[Figure: the single "Arch. Reg" $8 is backed by four Phys Regs: $8v1, $8v2, $8v3, $8v4]
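The version numbering above can be sketched as a simple pass over the code. This is an illustration only: real hardware uses tags rather than per-register counters, and the tuple encoding and the name `rename` are assumptions made for this example.

```python
def rename(program):
    """program: list of (op, dest or None, sources).
    Every write to a register creates a new version; reads use the latest one."""
    version = {}   # architectural register -> latest version number
    renamed = []
    for op, dest, srcs in program:
        # Read sources first, so "add $8,$8,$8" reads the previous version of $8.
        new_srcs = ["%sv%d" % (r, version[r]) if r in version else r for r in srcs]
        new_dest = None
        if dest is not None:
            version[dest] = version.get(dest, 0) + 1
            new_dest = "%sv%d" % (dest, version[dest])
        renamed.append((op, new_dest, new_srcs))
    return renamed

code = [
    ("lw",  "$8", ["$2"]),
    ("add", "$8", ["$8", "$8"]),
    ("sw",  None, ["$8", "$2"]),
    ("lw",  "$8", ["$3"]),
    ("add", "$8", ["$8", "$8"]),
    ("sw",  None, ["$8", "$3"]),
]
renamed = rename(code)
for entry in renamed:
    print(entry)
```

After renaming, the second lw/add/sw group touches only $8v3 and $8v4, so it no longer has a name dependency on the first group.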
31

Tomasulo's Approach to Renaming

• Cannot change the number of architectural registers

• Instead we will perform

Register Renaming through Tagging Registers
– This solves name dependency problems (WAR and WAW)
while attending to true dependency (RAW) through waiting
in queues
– Please be sure you understand this!
32

OoO Execution & Tomasulo's Algorithm

[Figure: block diagram, adapted from Prof. Michel Dubois (simplified for EE457): I-Cache, Instruc. Queue, Dispatch with Reg. File and Register Status Table, Int. / L/S / Mult. / Div Queues, Issue Unit, Integer/Branch, D-Cache, Mul and Div units, Common Data Bus]

Annotations:
• Register Status Table: Uses "tags" to track which instruction is the
latest producer (version) of a register. (Helps solve RAW, WAR, WAW
dependencies)
• I-Cache / Instruc. Queue: Fetch multiple instructions per clock cycle in
PROGRAM ORDER (i.e. normal order generated by the compiler)
• Dispatch: Decode & dispatch multiple instructions per cycle, tracking
dependencies on earlier instructions
• Queues: Instructions wait in queues until their respective functional unit
(the hardware that will compute their value) is free AND they have their
data available (from the instructions they depend upon). These act as
additional "physical registers"
• Common Data Bus: Results and TAGs of multiple instructions can be written
back per cycle. Results are broadcast to any instruction waiting for that
result.
33

Tomasulo’s Algorithm
• Dispatch/Issue unit decodes and dispatches instructions
• Assign a binary code (aka TAG) to each instruction producing a register
value using the TAG FIFO
• Adds a Register Status Table (RST) that holds the TAG of the instruction that
is producing the LATEST version of each architectural register or NULL if the
LATEST version is in the register file
• The destination operand is represented by the TAG but not the actual
register name
• For source operands, an instruction carries either the values (if TAG is null in
RST) or TAGs of the operands (but not the actual register name)
• When an instruction executes and produces a result it broadcasts the result
and its destination TAG
– Any instruction waiting can compare its SRC tags with the destination tag and
grab the value if they match
– If entry in RST matches the TAG then this instruction is the latest producer of
the register and the value will be written to the register file
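A toy model of the tagging rules in the bullets above, assuming a simple counter in place of the TAG FIFO and ignoring queues, timing, and the wake-up of waiting instructions. The class name `Dispatcher` and its methods are hypothetical, chosen only for this sketch.

```python
class Dispatcher:
    """Dispatch-time tagging with an RST and RF only (no queues, no timing)."""

    def __init__(self):
        self.rst = {}       # register -> TAG of latest producer; missing key = NULL
        self.rf = {}        # register -> value held in the register file
        self.next_tag = 1   # assumption: a counter stands in for the TAG FIFO

    def dispatch(self, op, dest, srcs):
        # Sources are looked up BEFORE the destination is retagged,
        # so "add $8, $8, $8" picks up the previous producer of $8.
        operands = [self.rst.get(r, r + " val") for r in srcs]
        tag = None
        if dest is not None:
            tag = "T%d" % self.next_tag
            self.next_tag += 1
            self.rst[dest] = tag   # this instruction is now the latest producer
        return tag, op, operands

    def broadcast(self, tag, value):
        """A result appears on the CDB; the RF takes it only from the latest producer."""
        updated = False
        for reg, latest in list(self.rst.items()):
            if latest == tag:
                self.rf[reg] = value
                del self.rst[reg]   # RST entry back to NULL
                updated = True
        return updated


d = Dispatcher()
program = [
    ("sqrt", "$2", ["$10"]),
    ("lw",   "$8", ["$2"]),
    ("add",  "$8", ["$8", "$8"]),
    ("sw",   None, ["$8", "$2"]),
    ("lw",   "$8", ["$3"]),
    ("add",  "$8", ["$8", "$8"]),
]
entries = [d.dispatch(*instr) for instr in program]
latest_written = d.broadcast("T5", 0x2222)   # latest producer of $8: RF updated
stale_written = d.broadcast("T2", 0x5678)    # older producer of $8: RF ignores it
```

After the six dispatches the RST holds only the latest producers ($2 from T1, $8 from T5), which is why the T5 broadcast updates the register file and the later T2 broadcast does not.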
34

Tagging process
Program:
    sqrt $2, $10
    lw   $8, 40($2)
    add  $8, $8, $8
    sw   $8, 40($2)
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)

RST (identifies latest version of a reg.): entries $1 to $31, all empty
RF: entries $1 to $31

Issue Logic queues:
    INT ALU:        (empty)
    MUL/DIV/SQRT:   T1: SQRT $2 Val / $10 Val
    Load/Store:     (empty)

RST = Register Status Table, RF = Register File
35

Tagging process: CC1

Program:
    sqrt $2, $10
    lw   $8, 40($2)
    add  $8, $8, $8
    sw   $8, 40($2)
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)

RST (identifies latest version of a reg.): $2 = T1
RF: entries $1 to $31

Issue Logic queues:
    INT ALU:        (empty)
    MUL/DIV/SQRT:   T1: SQRT $2 Val / $10 Val
    Load/Store:     (empty)

Note: An instruction that will write to a destination register takes a TAG
and enters that TAG into the RST to track the latest version/producer

RST = Register Status Table, RF = Register File
36

Tagging process: CC2

Program:
    sqrt $2, $10
    lw   $8, 40($2)
    add  $8, $8, $8
    sw   $8, 40($2)
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)

RST: $2 = T1, $8 = T2
RF: entries $1 to $31

Issue Logic queues:
    INT ALU:        (empty)
    MUL/DIV/SQRT:   T1: SQRT $2 Val / $10 Val
    Load/Store:     T2: LW T1 / 40

RST = Register Status Table, RF = Register File
37

Tagging process: CC3

Program:
    sqrt $2, $10
    lw   $8, 40($2)
    add  $8, $8, $8
    sw   $8, 40($2)
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)

RST: $2 = T1, $8 = T3 (replaces T2)
RF: entries $1 to $31

Issue Logic queues:
    INT ALU:        T3: ADD T2 / T2
    MUL/DIV/SQRT:   T1: SQRT $2 Val / $10 Val
    Load/Store:     T2: LW T1 / 40

Note: Notice the RST only stores the TAG of the LATEST producer/version.
Solves WAR/WAW hazards by not accepting a writeback unless it is from the
latest producer

RST = Register Status Table, RF = Register File
38

Tagging process: CC4

Program:
    sqrt $2, $10
    lw   $8, 40($2)
    add  $8, $8, $8
    sw   $8, 40($2)
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)

RST: $2 = T1, $8 = T3
RF: entries $1 to $31

Issue Logic queues:
    INT ALU:        T3: ADD T2 / T2
    MUL/DIV/SQRT:   T1: SQRT $2 Val / $10 Val
    Load/Store:     T2: LW T1 / 40
                    SW T3 / T1 / 40

RST = Register Status Table, RF = Register File
39

Tagging process: CC5

Program:
    sqrt $2, $10
    lw   $8, 40($2)
    add  $8, $8, $8
    sw   $8, 40($2)
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)

RST: $2 = T1, $8 = T4 (replaces T3)
RF: entries $1 to $31

Issue Logic queues:
    INT ALU:        T3: ADD T2 / T2
    MUL/DIV/SQRT:   T1: SQRT $2 Val / $10 Val
    Load/Store:     T2: LW T1 / 40
                    SW T3 / T1 / 40
                    T4: LW $3 val / 60

RST = Register Status Table, RF = Register File
40

Tagging process: CC6

Program:
    sqrt $2, $10
    lw   $8, 40($2)
    add  $8, $8, $8
    sw   $8, 40($2)
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)

RST: $2 = T1, $8 = T5 (replaces T4)
RF: entries $1 to $31

Issue Logic queues:
    INT ALU:        T3: ADD T2 / T2
                    T5: ADD T4 / T4
    MUL/DIV/SQRT:   T1: SQRT $2 Val / $10 Val
    Load/Store:     T2: LW T1 / 40
                    SW T3 / T1 / 40
                    T4: LW $3 val / 60

RST = Register Status Table, RF = Register File
41

Tagging process: CC7

Program:
    sqrt $2, $10
    lw   $8, 40($2)
    add  $8, $8, $8
    sw   $8, 40($2)
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)

RST: $2 = T1, $8 = T5
RF: entries $1 to $31

Issue Logic queues:
    INT ALU:        T3: ADD T2 / T2
                    T5: ADD T4 / T4
    MUL/DIV/SQRT:   T1: SQRT $2 Val / $10 Val
    Load/Store:     T2: LW T1 / 40
                    SW T3 / T1 / 40
                    T4: LW $3 val / 60

Result broadcast: T4: Read 0x1111

RST = Register Status Table, RF = Register File

Tagging process: CC8

Program:
    sqrt $2, $10
    lw   $8, 40($2)
    add  $8, $8, $8
    sw   $8, 40($2)
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)

RST: $2 = T1, $8 = T5 => null
RF: $8 = 0x2222

Issue Logic queues:
    INT ALU:        T3: ADD T2 / T2
                    T5: ADD 0x1111 / 0x1111
    MUL/DIV/SQRT:   T1: SQRT $2 Val / $10 Val
    Load/Store:     T2: LW T1 / 40
                    SW T3 / T1 / 40

Result broadcast: T5: Sum 0x2222

Note: When the latest producer writes to the register, we reset the RST
entry to NULL (indicates that the RF has the latest value and issuing
instructions can just take that value from the RF)

RST = Register Status Table, RF = Register File

Tagging process: CC9

Program:
    sqrt $2, $10
    lw   $8, 40($2)
    add  $8, $8, $8
    sw   $8, 40($2)
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)

RST: $2 = T1, $8 = null
RF: $8 = 0x2222

Issue Logic queues:
    INT ALU:        T3: ADD T2 / T2
    MUL/DIV/SQRT:   T1: SQRT $2 Val / $10 Val
    Load/Store:     T2: LW T1 / 40
                    SW T3 / T1 / 40
                    SW 0x2222, $3 val / 60

Result broadcast: T5: Sum 0x2222

RST = Register Status Table, RF = Register File

Tagging process: CC10

Program:
    sqrt $2, $10
    lw   $8, 40($2)
    add  $8, $8, $8
    sw   $8, 40($2)
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)

RST: $2 = T1 => null
RF: $8 = 0x2222

Issue Logic queues:
    INT ALU:        T3: ADD T2 / T2
    MUL/DIV/SQRT:   T1: SQRT $2 Val / $10 Val
    Load/Store:     T2: LW T1 / 40
                    SW T3 / T1 / 40

Result broadcast: T1: SQRT 0xacd0

RST = Register Status Table, RF = Register File

Tagging process: CC11

Program:
    sqrt $2, $10
    lw   $8, 40($2)
    add  $8, $8, $8
    sw   $8, 40($2)
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)

RST: all entries null
RF: $2 = 0xacd0, $8 = 0x2222

Issue Logic queues:
    INT ALU:        T3: ADD T2 / T2
    MUL/DIV/SQRT:   (empty)
    Load/Store:     T2: LW 0xacd0 / 40
                    SW T3 / 0xacd0 / 40

Result broadcast: T2: Read 0x5678

Note: Since the RST entry for $8 is NULL, the RF will not update when LW
attempts to write back.

RST = Register Status Table, RF = Register File

Tagging process: CC12

Program:
    sqrt $2, $10
    lw   $8, 40($2)
    add  $8, $8, $8
    sw   $8, 40($2)
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)

RST: all entries null
RF: $2 = 0xacd0, $8 = 0x2222

Issue Logic queues:
    INT ALU:        T3: ADD 0x5678 / 0x5678
    MUL/DIV/SQRT:   (empty)
    Load/Store:     SW T3 / 0xacd0 / 40

Result broadcast: T3: Sum 0xACF0

RST = Register Status Table, RF = Register File

Tagging process: CC13

Program:
    sqrt $2, $10
    lw   $8, 40($2)
    add  $8, $8, $8
    sw   $8, 40($2)
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)

RST: all entries null
RF: $2 = 0xacd0, $8 = 0x2222

Issue Logic queues:
    INT ALU:        (empty)
    MUL/DIV/SQRT:   (empty)
    Load/Store:     SW 0xacf0 / 0xacd0 / 40

RST = Register Status Table, RF = Register File
48

Register Renaming
Program:
    sqrt $2, $10
    add  $2, $2, $2
    add  $2, $2, $2
    add  $2, $2, $2
    add  $2, $2, $2

RST: $2 = T1, then T2, then T3, then T4 (each new producer replaces the previous TAG)
RF: entries $1 to $31

Issue Logic queues:
    INT ALU:        T2: ADD T1 / T1
                    T3: ADD T2 / T2
                    T4: ADD T3 / T3
    MUL/DIV/SQRT:   T1: SQRT $2 Val / $10 Val
    Load/Store:     (empty)

RST = Register Status Table, RF = Register File
49

Unique TAGs
• Like SSN, we need a unique TAG
• SSN’s are reused.
• Similarly TAGS can be reused
• TAGs are similar to numbered TOKENs

Number token: Helps to create a virtual queue. We do not need that here.

Bank token: In State Bank of India, the cashier issues a brass token to
customers trying to draw money as an ID (and not at all to put them in any
virtual queue / ordering). Token numbers are in random order. The cashier
verifies the signature in the record rooms, returns with money, calls the
token number and issues the money. Tokens are reclaimed & reused.
50

Tags (= Tokens)
• How many tokens should the bank cashier
have to start with?
• What happens if the tokens run out?
• Does the cashier need to have any order in
holding tokens and issuing tokens?
• Do they have to collect the tokens back?
51

TAG FIFO
FIFO’s are taught in EE 560

• To issue and collect tokens (TAGS) use a circular FIFO (First-

In/First-Out) unit
– While the FIFO order is not important here, a FIFO is the easiest to
implement in hardware compared to a random order in a pile
• Filled (with say) 64 tokens (in any order) initially on reset
• Tokens return in any order
• Put tokens back in the FIFO and reissue
TAG FIFO states (entries 0 to 63; wp = write pointer, rp = read pointer):
    FULL:                wp at 0, rp at 0
    2 Tokens issued:     wp at 0, rp at 2
    1 Token returned:    wp at 1, rp at 2
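The pointer movement in the three states can be sketched with a small circular FIFO. This is an illustrative model, not the course's hardware design: the class `TagFifo`, its method names, and the choice to return `None` when no token is free (so that dispatch would have to stall) are assumptions.

```python
class TagFifo:
    """Circular FIFO of free TAGs, filled with every tag on reset."""

    def __init__(self, size=64):
        self.size = size
        self.slots = list(range(size))   # any initial order would do
        self.rp = 0                      # read pointer: next tag to issue
        self.wp = 0                      # write pointer: slot for the next returned tag
        self.count = size                # number of free tags currently held

    def issue(self):
        if self.count == 0:
            return None                  # assumption: caller stalls dispatch
        tag = self.slots[self.rp]
        self.rp = (self.rp + 1) % self.size
        self.count -= 1
        return tag

    def give_back(self, tag):
        if self.count == self.size:
            raise ValueError("FIFO already full")
        self.slots[self.wp] = tag
        self.wp = (self.wp + 1) % self.size
        self.count += 1


fifo = TagFifo()
state_full = (fifo.wp, fifo.rp)          # FULL
first, second = fifo.issue(), fifo.issue()
state_issued = (fifo.wp, fifo.rp)        # 2 tokens issued
fifo.give_back(second)                   # tokens may return in any order
state_returned = (fifo.wp, fifo.rp)      # 1 token returned
```

Issuing advances only rp and returning advances only wp, so the hardware never has to search for a free tag.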
52

Organization for OoO Execution

[Figure: block diagram, adapted from Prof. Michel Dubois (simplified for EE 457): I-Cache, Instruc. Queue, TAG FIFO, Reg. File, Int. / L/S / Mult. / Div Queues, Issue Unit, Integer/Branch, D-Cache, Div and Mul units, CDB]
53

Front-End & Back-End

• IFQ (Instruction Fetch Queue)
– A FIFO structure
• Dispatch (Issue) Unit
– Includes RST, RF, Tag FIFO
• Load/Store and other Issue Queues
• Issue Units
• Functional units
• CDB (Common Data Bus)
– Like a public address system that everyone can see/hear
when data is produced
54

More Tomasulo Algorithm

• Front End
– Instructions are fetched
– They are stored in a FIFO (IFQ)
– When an instruction reaches the head of the IFQ it is
• Decoded
• Dispatched to an issue queue/functional unit
• Even if some of the inputs are not ready (takes TAGs)
• Back End
– Instructions in issue queues wait for their input operands
– Once register operands are ready instructions can be scheduled for execution provided
they will not conflict for the CDB or their functional unit
– Instructions execute in their functional unit and their result is put on the CDB
– All instructions in queues and the register file “watch” the CDB and grab the value they
are waiting for when it is produced
• Bottleneck in Tomasulo's algorithm?
– The CDB!!!
– Do all instructions use the CDB? No, not SW, J (jump), BEQ
55

Data hazards and memory

MEMORY DISAMBIGUATION
56

Load/Store Queue (LSQ)

• For our course, the LSQ performs
– Address calculation
– Memory disambiguation
• RAW, WAR, WAW hazards due to memory reads and
writes

// Is there a dependency here?

SW $2,0($5)
LW $8,0($5)
// What about here?
SW $2, 1000($4)
LW $3, 0($6)
57

Memory Disambiguation
• Data hazards (RAW, WAR, WAW) can occur in memory just as
with registers, and hazards in memory are much harder to deal
with since many combinations could produce the same address
RAW: This later lw can proceed only if there is no store ahead of it with
the same address
    sw $2, 2000($0)
    lw $8, 2000($0)

WAW: This later sw can proceed only if there is no store ahead of it with
the same address
    sw $2, 2000($0)
    sw $8, 2000($0)

WAR: This later sw can proceed only if there is no load ahead of it with
the same address
    lw $2, 2000($0)
    sw $8, 2000($0)
58

Address Calculation for LW/SW

• EE 557 approach for address calculation
– Loads & store in 2 sub-instructions
• 1 instruction computes address and is dispatched to
integer ALU
• 1 instruction access data cache and is issued to LSQ
• Address is communicated from integer ALU to LSQ via
CDB forwarding using a tag
• EE 560/457 approach
– Use a dedicated adder in the LSQ to compute
address (so just 1 dispatched instruction)
59

Memory Disambiguation
• When can the LSQ issue a LW or SW to the cache?
– Loads can issue to the cache when their address is ready
– Stores can issue to the cache when both address & data are ready
– Memory hazards (RAW, WAR, WAW) are resolved in the LSQ
• Load can issue to cache if no store with same address is before it
• Store can issue to cache if no store or load with same address before it
• Otherwise, access waits in LSQ
– If an address is unknown it is assumed to be the same
• Worst case to enforce correctness
– The process of figuring out and comparing memory address is called
“disambiguation”
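The issue rules above can be written as one check per LSQ entry. This is a sketch under assumptions: the queue is a list in program order that holds only accesses not yet sent to the cache, each entry is a hypothetical dict with `op`, `addr` (None while unknown) and `data_ready`, and `can_issue` is an illustrative name.

```python
def can_issue(lsq, index):
    """True if the access at lsq[index] may go to the cache now."""
    entry = lsq[index]
    if entry["addr"] is None:
        return False                      # own address not computed yet
    if entry["op"] == "sw" and not entry["data_ready"]:
        return False                      # stores also need their data
    for older in lsq[:index]:
        # An unknown older address is assumed to be the same (worst case).
        same = older["addr"] is None or older["addr"] == entry["addr"]
        if not same:
            continue
        if older["op"] == "sw":
            return False                  # RAW (for a load) or WAW (for a store)
        if entry["op"] == "sw":
            return False                  # older load, younger store: WAR
    return True

raw_case = [
    {"op": "sw", "addr": 2000, "data_ready": True},
    {"op": "lw", "addr": 2000, "data_ready": True},
]
unknown_case = [
    {"op": "sw", "addr": None, "data_ready": True},
    {"op": "lw", "addr": 3000, "data_ready": True},
]
war_case = [
    {"op": "lw", "addr": 2000, "data_ready": True},
    {"op": "sw", "addr": 2000, "data_ready": True},
]
independent_case = [
    {"op": "lw", "addr": 1000, "data_ready": True},
    {"op": "sw", "addr": 2000, "data_ready": True},
]
```

Two loads to the same address never block each other in this check, since neither one writes memory.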
60

Issue Queue priority, Branches, etc.

LAST CONSIDERATIONS FOR

OUT-OF-ORDER
EXECUTION/COMPLETION
61

Issue Unit
• How do we determine when to issue an instruction to the
functional unit?
– Is the instruction ready
– Is the functional unit free to start the operation?
– CDB availability constraint
• Will there be room on the CDB when the operation finishes?
– Priority/conflict resolution
• If many instructions are available, which should be chosen? (Is round-
robin priority adequate?)

How do we prioritize
instructions that are ready?
62

Issue Queue Priority

• Priority (based on the order of arrival among
ready instructions)
– Is it necessary or just desirable?
– Local priority within queues?
– Global priority across the queues?

How do we prioritize
instructions that are ready?
63

LSQ Ordering/Priority
• Maintaining instructions in the order of arrival
– Issue order/program order in a queue
• Is this necessary and/or desirable?
– In the case of LSQ?
• Necessary! To enforce memory disambiguation
– In the case of Integer, MUL, DIV queues?
• Desirable, so that an earlier instruction gets executed
whenever possible, thereby reducing queue pressure
from too many instructions waiting on it
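One possible local priority for the Integer, MUL and DIV queues is "oldest ready first". The sketch below assumes the queue is kept in order of arrival and that an operand still waiting on a producer is stored as a TAG string while a captured value is stored as an int; `select_oldest_ready` is a hypothetical name, and other priority schemes are equally possible.

```python
def select_oldest_ready(queue):
    """queue: entries in order of arrival; returns the first ready entry or None."""
    for entry in queue:
        # Ready means no operand is still a TAG (string) waiting for a broadcast.
        if all(not isinstance(x, str) for x in entry["operands"]):
            return entry
    return None

int_queue = [
    {"name": "T3: ADD", "operands": ["T2", "T2"]},      # still waiting on T2
    {"name": "T5: ADD", "operands": [0x1111, 0x1111]},  # both values captured
]
chosen = select_oldest_ready(int_queue)
```

Here the younger T5 is chosen because the older T3 is not ready; once T3 becomes ready it would win over any younger ready instruction.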
64

Conditional Branches
• Dispatcher stalls when it reaches a branch (and waits until it is resolved)
• Branches are dispatched to integer queue where they wait for their
operands (if necessary)
• When branch executes it puts its outcome & target on CDB
– If untaken, dispatch unit resumes
– If taken, then dispatch flushes the IFQ and resumes at the target
• Since we stop dispatching instructions after a branch, does it mean that
this branch is the last instruction to be executed in the back-end?
• Is it possible that the back-end simultaneously holds
– A. Some instructions dispatched before
the branch .. AND ..
– B. Some instructions issued after
the branch?
ADD $4,$5,$5
BEQ $6,$7,L1
...
L1: SUB $1,$2,$3
MUL $9,$7,$2
65

Structural Hazards + Exceptions

• Structural Stalls
– Dispatch must stall if IFQ empty OR all
entries in the desired functional unit’s
issue queue are occupied AND an
instruction of that type is attempting to
dispatch
– Fetch unit must stall if the IFQ is full
– Functional units stall when no ready
instructions in the queue or CDB
scheduling conflicts
• Precise exceptions not supported
– Some instructions after the offending
instruction may have updated registers
or memory! BAD!
– We'll handle this in the next unit
66

BACKUP
67

Tagging Registers: CC1

Orange means dispatched and SQRT is a long-latency computation

Destination TAG | Instruction | Dependent source TAGs:
    DOG    | sqrt $2, $10    |
           | lw $8, 40($2)   |
           | add $8, $8, $8  |
           | sw $8, 40($2)   |
           | lw $8, 60($3)   |
           | add $8, $8, $8  |
           | sw $8, 60($3)   |

RST: $2 = DOG
RF: entries $1 to $31

RST = Register Status Table
RF = Register File
68

Tagging Registers: CC2

Orange means dispatched and SQRT is a long-latency computation

Destination TAG | Instruction | Dependent source TAGs:
    DOG    | sqrt $2, $10    |
    LION   | lw $8, 40($2)   | DOG
           | add $8, $8, $8  |
           | sw $8, 40($2)   |
           | lw $8, 60($3)   |
           | add $8, $8, $8  |
           | sw $8, 60($3)   |

RST: $2 = DOG, $8 = LION
RF: entries $1 to $31

RST = Register Status Table
RF = Register File
69

Tagging Registers: CC3

Orange means dispatched and SQRT is a long-latency computation

Destination TAG | Instruction | Dependent source TAGs:
    DOG    | sqrt $2, $10    |
    LION   | lw $8, 40($2)   | DOG
    TIGER  | add $8, $8, $8  | LION LION
           | sw $8, 40($2)   |
           | lw $8, 60($3)   |
           | add $8, $8, $8  |
           | sw $8, 60($3)   |

RST: $2 = DOG, $8 = TIGER
RF: entries $1 to $31

RST = Register Status Table
RF = Register File
70

Tagging Registers: CC4

• Orange means dispatched; SQRT is a long-latency computation
• Instruction stream with the tags assigned so far:
    sqrt $2, $10       dest tag: DOG
    lw   $8, 40($2)    dest tag: LION; source tag: DOG
    add  $8, $8, $8    dest tag: TIGER; source tags: LION, LION
    sw   $8, 40($2)    source tag: TIGER
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)
• RST: $2 = DOG, $8 = TIGER; all other entries hold no tag
• Figure legend: destination tag, dependent source tag
• RST = Register Status Table
• RF = Register File
71

Tagging Registers Review

• Instruction stream with the tags assigned so far:
    sqrt $2, $10       dest tag: DOG
    lw   $8, 40($2)    dest tag: LION; source tag: DOG
    add  $8, $8, $8    dest tag: TIGER; source tags: LION, LION
    sw   $8, 40($2)    source tag: TIGER
    lw   $8, 60($3)
    add  $8, $8, $8
    sw   $8, 60($3)
• RST: $2 = DOG, $8 = TIGER; all other entries hold no tag

• Dispatch unit decodes and dispatches instructions
• For its destination operand, an instruction carries a
TAG (but not the actual register name)
• For source operands, an instruction carries either
the values (if no TAG in RST) or the TAGs of the
operands (but not the actual register name)
• When
72
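
The tagging walk-through on the preceding slides can be replayed with a small Python model. It is a sketch under stated assumptions rather than the actual dispatch hardware: the `Dispatcher` class, the `val(...)` placeholder for a value read from the register file, and the rule that a store's base register is looked up in the RST like any other source are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class Dispatcher:
    free_tags: list                           # tags handed out in order (stand-in for the TAG FIFO)
    rst: dict = field(default_factory=dict)   # register -> tag of its pending producer

    def source(self, reg):
        # A source carries the producer's tag if one is pending,
        # otherwise the value read from the register file.
        return self.rst.get(reg, f"val({reg})")

    def dispatch(self, op, dest, srcs):
        src_fields = [self.source(r) for r in srcs]   # read the RST before updating it
        tag = None
        if dest is not None:                  # stores have no destination register
            tag = self.free_tags.pop(0)
            self.rst[dest] = tag              # later readers of dest now wait on this tag
        return op, tag, src_fields

d = Dispatcher(["DOG", "LION", "TIGER"])
trace = [
    d.dispatch("sqrt", "$2", ["$10"]),        # CC1
    d.dispatch("lw", "$8", ["$2"]),           # CC2
    d.dispatch("add", "$8", ["$8", "$8"]),    # CC3
    d.dispatch("sw", None, ["$8", "$2"]),     # CC4
]
```

Reading the RST before writing the new destination tag is what lets `add $8, $8, $8` pick up LION for its sources while installing TIGER as the new producer of $8.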

Organization for OoO Execution

Block diagram adapted from Prof. Michel Dubois (simplified for EE 457)

Figure components:
• I-Cache, Instruc. Queue, TAG FIFO, Reg. File
• Int. Queue, L/S Queue, Mult. Queue, Div Queue
• Issue Unit
• Integer / Branch, D-Cache, Mul, Div
• CDB
73

Multiple Functional Units

• We now provide multiple functional units
• After decode, issue to a queue, stalling if the unit is busy or
is waiting for a data dependency to resolve

Figure components: IM, Reg, Queues + Functional Units (ALU, MUL, DIV, DMEM (Cache)), Reg
74

Figure components: IM, Reg, Queues + Functional Units (ALU, MUL, DIV, Addr. Calc.), DM, Reg

Figure annotation: Stalling here would plug up the pipeline
76

Functional Unit Latencies

• Pipeline stages per functional unit (from the figure)
– Int. ALU, Addr. Calc.: EX
– FP Add: A1 A2 A3 A4
– Int. & FP MUL: M1 M2 M3 M4 M5 M6 M7
– Int. & FP DIV
• An added complication of out-of-order execution & completion:
WAW & WAR hazards
• Look Ahead: Tomasulo Algorithm will help absorb the latency of
different functional units and cache miss latency by allowing
other ready instructions to proceed out-of-order

Functional Unit    Latency    Initiation Interval
Integer ALU        0          1
FP Add             3          1
FP Mul.            6          1
FP Div.            24         25

• Latency = required stall cycles between dependent [RAW] instructions
• Initiation Interval = distance between 2 independent instructions
requiring the same FU
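
As a worked reading of the table, the short Python sketch below turns the two columns into start cycles. The helper names `dependent_start` and `same_unit_start` are hypothetical, and the model assumes cycle numbers count from the cycle in which the first instruction enters its functional unit.

```python
from collections import namedtuple

Timing = namedtuple("Timing", ["latency", "interval"])

# Values copied from the table above.
FU = {
    "Integer ALU": Timing(0, 1),
    "FP Add": Timing(3, 1),
    "FP Mul": Timing(6, 1),
    "FP Div": Timing(24, 25),
}

def dependent_start(t, fu):
    """Earliest cycle a RAW-dependent instruction can start if its producer
    started in unit `fu` at cycle t: one cycle later plus `latency` stall cycles."""
    return t + 1 + FU[fu].latency

def same_unit_start(t, fu):
    """Earliest cycle an independent instruction can start on the same unit."""
    return t + FU[fu].interval

mul_consumer = dependent_start(0, "FP Mul")    # 6 stall cycles after the producer
second_div = same_unit_start(0, "FP Div")      # the divider is not pipelined
second_mul = same_unit_start(0, "FP Mul")      # the multiplier accepts one per cycle
```

The initiation interval of 25 for FP Div against 1 for FP Mul is what makes the divider a likely structural bottleneck when divides appear back to back.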
77

OoO Execution w/ ROB

• ROB allows for OoO execution but in-order completion

Figure components:
• I-Cache, Instruc. Queue, Br. Pred. Buffer, Dispatch
• Reg. File, ROB (Reorder Buffer)
• Int. Queue, L/S Queue, Mult. Queue, Div Queue
• Addr. Buffer, Issue Unit
• Exec. Unit: Integer / Branch, D-Cache, Mul, Div
• L/S Buffer, D-Cache, CDB

Figure annotation: Exceptions? No problem
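
The idea of out-of-order execution with in-order completion can be illustrated with a minimal reorder-buffer model in Python. This is a sketch only: the `ReorderBuffer` class, its dictionary entries, and the `done` flag are assumptions for illustration and say nothing about how the ROB in the diagram is actually organized.

```python
from collections import deque

class ReorderBuffer:
    """Hypothetical ROB: entries enter in program order and leave in program order."""

    def __init__(self):
        self.entries = deque()                 # oldest instruction at the left

    def dispatch(self, name):
        entry = {"name": name, "done": False}
        self.entries.append(entry)             # allocate at the tail, in program order
        return entry

    def commit(self):
        committed = []
        # Only the head may commit, so a finished younger instruction
        # waits behind an unfinished older one.
        while self.entries and self.entries[0]["done"]:
            committed.append(self.entries.popleft()["name"])
        return committed

rob = ReorderBuffer()
div = rob.dispatch("div")
add = rob.dispatch("add")

add["done"] = True            # the add finishes executing first (out of order)
first = rob.commit()          # nothing commits: the older div is still pending

div["done"] = True
second = rob.commit()         # now both commit, in program order
```

Because results update architectural state only at commit, an exception raised by the older instruction can discard the younger one before it has changed any register or memory location.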
