Solution Set To Homework #4 (Updated 5/5/03)
move, comp, br, beq, blt, bgt, bne, bge, ble, add, sub, lshift, and
rshift
where COMP compares its two operands and stores the results of the
comparison in a hidden register (condition code) for future use by a
conditional branch, BR is an unconditional branch, and BEQ, BLT, BGT, BNE,
BGE, and BLE are conditional branches with one operand such that the branch
is taken if the result of the most recent COMP operation is equal, less
than, greater than, not equal, greater than or equal, or less than or
equal, respectively.
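The condition-code behaviour described above can be modelled in a few lines of Python. This is only an illustrative sketch, not part of the assignment's instruction set definition: the class name ConditionCode and the method names are hypothetical, and it assumes that "less than" means the first COMP operand is less than the second.

```python
class ConditionCode:
    """Models the hidden register written by COMP and read by branches."""

    def __init__(self):
        # -1, 0, or +1 after a COMP; None until the first COMP executes.
        self.result = None

    def comp(self, a, b):
        # Assumption: the comparison is "first operand relative to second".
        self.result = (a > b) - (a < b)

    def taken(self, op):
        """Return True if the branch instruction `op` would be taken."""
        if op == "br":          # unconditional branch ignores the condition code
            return True
        if self.result is None:
            raise RuntimeError("conditional branch before any COMP")
        r = self.result
        tests = {
            "beq": r == 0,
            "blt": r < 0,
            "bgt": r > 0,
            "bne": r != 0,
            "bge": r >= 0,
            "ble": r <= 0,
        }
        return tests[op]
```

For example, after cc.comp(3, 5) the branches blt, ble, and bne are taken, while beq, bgt, and bge fall through.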
Assume that the first element of A is stored in A_start, that size contains
the number of elements in A, and that x, the value for the search, is stored
in value.*
NFound: move R4, 0x0 ; Set the fail flag and halt.
halt 0
LessT: br Loop ; Here for clarity. The algorithm
; “naturally” handles this case with
; the progressive halving of R1
2. Consider Figures 6.2 and 6.3 on page 439 of PH, and the load word (lw)
instruction of MIPS. Suppose that a clever designer manages to combine the
last two components of the instruction cycle (Data access and Register
write) into one component (call it DARW) that takes 2 ns to complete; i.e.,
a lw instruction will now have 4 components, Instruction fetch, Register
read, ALU operation, and DARW, and will take a total of 7 ns.
(a) Redraw both parts of Figure 6.3 assuming this new design.
IF - ID - ALU - DARW
     IF - ID - ALU - DARW              = 12 ns, total
          IF - ID - ALU - DARW
There were 3 ways to answer this. The first was to compare the old (non-DARW)
machine to the new one, both in their pipelined versions. Then, the answer
is:
Speed-up = (10/5)/(8/4) = 1x
Note that the DARW improvement actually doesn’t help in the limit as the
number of instructions approaches infinity. The reason for this is that there
are no hazards that would force the stopping of execution. Thus, after the
pipe is full, an instruction is retired every clock cycle without fail. The
length of the clock cycle is the only real factor here, and the clock cycle is
the same in both machines (2 ns).
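A short Python sketch makes the limiting behaviour concrete. It assumes the usual hazard-free model in which the first instruction needs one cycle per stage and every later instruction retires one cycle after the previous one; the function names are hypothetical.

```python
def pipelined_time_ns(num_stages, cycle_ns, num_instructions):
    """Total time for a hazard-free pipeline: fill the pipe, then one
    instruction retires per clock cycle."""
    return (num_stages + num_instructions - 1) * cycle_ns


def darw_speedup(num_instructions, cycle_ns=2):
    """Old 5-stage pipeline time divided by new 4-stage (DARW) pipeline time."""
    old = pipelined_time_ns(5, cycle_ns, num_instructions)
    new = pipelined_time_ns(4, cycle_ns, num_instructions)
    return old / new


if __name__ == "__main__":
    for n in (3, 100, 1_000_000):
        print(n, darw_speedup(n))
```

For three instructions the 4-stage pipeline takes 12 ns against 14 ns for the 5-stage one, a small gain that comes only from filling a shorter pipe. As the instruction count grows, the ratio tends to 1.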
Note that the original posted values were wrong, because they assumed the
validity of the above equations when the pipeline stages are of different
lengths. This doesn’t work, because a proper pipeline acts like a ceiling
function, forcing the clock cycle up to the length of the longest stage. To
make it work, the stages that are shorter than the cycle time have to be
“padded” out to the full cycle time.
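The padding can be checked numerically. The sketch below is an illustration only: the individual stage times (2, 1, 2, 2, 1 ns) are assumed from the figure in PH rather than stated in this solution set, which gives only the 2 ns DARW stage, the 7 ns total, and the 2 ns clock cycle. The helper name is hypothetical.

```python
# Assumed stage latencies in ns: IF, register read, ALU, data access, register write.
OLD_STAGES_NS = [2, 1, 2, 2, 1]
# New design: data access and register write merged into a 2 ns DARW stage.
NEW_STAGES_NS = [2, 1, 2, 2]


def padded_pipeline(stages_ns):
    """Return (cycle time, padded per-instruction latency) for a pipeline.

    The clock must accommodate the slowest stage, so every stage is
    padded up to that length.
    """
    cycle = max(stages_ns)
    return cycle, cycle * len(stages_ns)


old_cycle, old_latency = padded_pipeline(OLD_STAGES_NS)   # 2 ns, 10 ns
new_cycle, new_latency = padded_pipeline(NEW_STAGES_NS)   # 2 ns, 8 ns

# Time per retired instruction in steady state is latency / stage count,
# which is just the cycle time.
speedup = (old_latency / len(OLD_STAGES_NS)) / (new_latency / len(NEW_STAGES_NS))
print(old_latency, new_latency, speedup)
```

With padding, the old machine's latency is 10 ns rather than 8 ns and the new machine's is 8 ns rather than 7 ns, which is where the (10/5)/(8/4) figures come from.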
The second gives the theoretical maximum speed-up in a 4 stage pipeline (ideal
conditions), which is simply 4x. This assumes that the ideal situation is
achievable, which means that the work is evenly distributed among the stages
and the cycle time goes down to 1.75 ns.
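The ideal figure follows from dividing the 7 ns of work evenly across the stages. A minimal sketch of that arithmetic, with a hypothetical function name:

```python
def ideal_pipeline(total_work_ns, num_stages):
    """Ideal case: work split evenly, so cycle = total / stages and the
    steady-state speed-up over the non-pipelined machine equals the
    number of stages."""
    cycle = total_work_ns / num_stages
    speedup = total_work_ns / cycle
    return cycle, speedup


cycle_ns, speedup = ideal_pipeline(7, 4)
print(cycle_ns, speedup)
```

This is an upper bound: the real DARW stages are not balanced, so the actual cycle time stays at 2 ns.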
The third simply compares the DARW pipeline with the DARW standard system.
This yields:
*NOTE: There has been some confusion with the instructions and the addressing
modes we were looking for. Here are the standard addresses for this
assignment, as they were posted on the list, with examples. The original
solution to problem #1 used non-standard semantics, which implied extra
indirection.