Solution Set To Homework #4 (Updated 5/5/03)

CSE410 HOMEWORK #4 Due: Thursday, May 1

1. Assume a 2-address machine, multiple general purpose registers, and all
types of operand addressing (direct, immediate, indirect, and base
displacement). Each memory and register cell is large enough to hold an
integer or memory address. The architecture supports the instructions:

move, comp, br, beq, blt, bgt, bne, bge, ble, add, sub, lshift, and
rshift

where COMP compares its two operands and stores the result of the
comparison in a hidden register (condition code) for future use by a
conditional branch, BR is an unconditional branch, and BEQ, BLT, BGT, BNE,
BGE, and BLE are conditional branches with one operand such that the branch
is taken if the result of the most recent comp operation is equal, less
than, greater than, not equal, greater than or equal, or less than or
equal, respectively.

You are given an array A of integers, sorted in ascending sequence. A is
stored in sequential locations in main memory. Write an assembly language
program that performs a binary search over A, looking for a given value x.
If x is not in A, your program should leave the integer 0 in a register.
If x is in A, then the program should leave a 1 in the register and the
address, say adr, such that A[adr]=x in a memory location called match (in
assembly language).

Assume that the first element of A is stored in A_start, that size contains
the number of elements in A, and that x, the value for the search, is stored
in value.*

        move   R0, #A_Start   ; Take important values from memory
        move   R1, size       ; and keep them in registers for the
        sub    R1, #1         ; purposes of the program. Will need
        move   R2, value      ; size-1 rather than size.

Loop:   comp   R1, #0         ; If the value to shift is 0 or less,
        ble    NFound         ; the value wasn’t found. Break to avoid
                              ; infinite loop.

        move   R3, R0         ; Get A_Start + (size-1)/2 in a register.
        rshift R1, #1         ;
        add    R3, R1         ;

        comp   (R3), R2       ; Compare the value we’re looking at.
        beq    Found          ; If they’re the same, we found it.
        blt    LessT          ; Otherwise, choose the half to look in.
        bgt    GreatT         ; Assumes stability of hidden register.

Found:  move   match, R3      ; Put the address in match. Assumes that
                              ; move takes an address or register as dest.
        move   R4, #1         ; Put 1 in R4, the return register.
        halt   0              ; We’re done.

NFound: move   R4, #0         ; Set the fail flag and halt.
        halt   0

LessT:  br     Loop           ; Here for clarity. The algorithm
                              ; “naturally” handles this case with
                              ; the progressive halving of R1.

GreatT: move   R0, R3         ; Set the base to the midpoint.
        br     Loop           ; Do it again.
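
For reference, the behavior the problem asks for can be sketched in a high-level language. This is a minimal sketch of a conventional low/high binary search, not a transcription of the base-plus-halving scheme used in the assembly above. The function name `binary_search` and the `(flag, index)` return pair are assumptions made for illustration; the flag plays the role of R4 and the index the role of match.

```python
def binary_search(a, x):
    """Search the ascending list a for x.

    Returns (1, index) when x is found and (0, None) otherwise,
    mirroring the 1/0 flag the assignment leaves in a register.
    """
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if a[mid] == x:
            return 1, mid          # found: flag 1 plus the position
        if a[mid] < x:
            lo = mid + 1           # x can only be in the upper half
        else:
            hi = mid - 1           # x can only be in the lower half
    return 0, None                 # range exhausted: flag 0


print(binary_search([1, 3, 5, 7, 9], 7))  # (1, 3)
print(binary_search([1, 3, 5, 7, 9], 4))  # (0, None)
```

Keeping explicit lower and upper bounds makes the termination condition (`lo > hi`) easy to check, including for one-element and empty arrays.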

2. Consider Figures 6.2 and 6.3 on page 439 of PH, and the load word (lw)
instruction of MIPS. Suppose that a clever designer manages to combine the
last two components of the instruction cycle (Data access and Register
write) into one component (call it DARW) that takes 2 ns to complete; i.e.,
a lw instruction will now have 4 components, Instruction fetch, Register
read, ALU operation, and DARW, and will take a total of 7 ns.

(a) Redraw both parts of Figure 6.3 assuming this new design.

IF – ID – ALU – DARW | IF – ID – ALU – DARW | IF – ID – ALU – DARW = 21 ns

IF – ID – ALU – DARW
     IF – ID – ALU – DARW                = 12 ns, total
          IF – ID – ALU – DARW
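
The two totals in the diagrams can be checked with a short calculation. This is only a sketch: the problem gives the 2 ns DARW stage and the 7 ns total, while the individual latencies assumed here for instruction fetch, register read, and ALU operation (2, 1, and 2 ns) are chosen to be consistent with that total. The function names are hypothetical.

```python
# Stage latencies in ns: IF, register read, ALU, DARW.
# Only DARW = 2 ns and the 7 ns total are given; the rest are assumed.
STAGES_NS = [2, 1, 2, 2]


def sequential_time(n, stages):
    """Total time when each instruction finishes before the next starts."""
    return n * sum(stages)


def pipelined_time(n, stages):
    """Total time when one instruction enters the pipe per clock cycle.

    The clock cycle is set by the slowest stage; the first instruction
    needs len(stages) cycles and each later one adds a single cycle.
    """
    cycle = max(stages)
    return (len(stages) + n - 1) * cycle


print(sequential_time(3, STAGES_NS))  # 21
print(pipelined_time(3, STAGES_NS))   # 12
```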

(b) What is the approximate speed-up under ideal conditions in a pipelined
execution? Show your work.

There were 3 ways to answer this. The first was to compare the old (non-DARW)
machine to the new one, both in their pipelined versions. Then, the answer
is:

Speed-up = time between instr(pipe, non-DARW) / time between instr(pipe, DARW)

time between instr(pipe, non-DARW)
    = time between instr(std, non-DARW) / number of stages
    = (8+2)/5 = 2 ns

time between instr(pipe, DARW)
    = time between instr(std, DARW) / number of stages
    = (7+1)/4 = 2 ns

Speed-up = (10/5)/(8/4) = 1x

Here the 8 ns and 7 ns instruction times are padded by 2 ns and 1 ns,
respectively, so that every stage lasts a full 2 ns clock cycle.

Note that the DARW improvement actually doesn’t help in the limit as the
number of instructions approaches infinity. The reason for this is that there
are no hazards that would force execution to stall. Thus, once the pipe is
full, an instruction is retired every clock cycle without fail. The length of
the clock cycle is the only real factor here, and the clock cycle is the same
in both machines (2 ns).

Note that the original posted values were wrong, because they assumed the
validity of the above equations when the pipeline stages are of different
lengths. This doesn’t work, because a proper pipeline acts like a ceiling
function, forcing the clock cycle up to the length of the longest stage. To
make it work, the stages that are shorter than the cycle time have to be
“padded” up to the cycle time.
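
The padding argument can be made concrete with a small sketch. The per-stage latencies below are assumptions consistent with the 8 ns and 7 ns instruction times used in this solution (the problem itself only fixes DARW at 2 ns), and the helper names are hypothetical.

```python
# Assumed stage latencies in ns.
OLD_STAGES_NS = [2, 1, 2, 2, 1]   # IF, reg read, ALU, data access, reg write
DARW_STAGES_NS = [2, 1, 2, 2]     # IF, reg read, ALU, DARW


def cycle_time(stages):
    """The clock cycle must be long enough for the slowest stage."""
    return max(stages)


def padded_latency(stages):
    """Instruction latency once every stage is padded to the cycle time."""
    return len(stages) * cycle_time(stages)


def steady_state_speedup(old, new):
    """Ratio of the times between completed instructions in a full pipe."""
    return cycle_time(old) / cycle_time(new)


print(padded_latency(OLD_STAGES_NS))    # 10, i.e. 8 + 2
print(padded_latency(DARW_STAGES_NS))   # 8, i.e. 7 + 1
print(steady_state_speedup(OLD_STAGES_NS, DARW_STAGES_NS))  # 1.0
```

Both designs end up with the same 2 ns cycle, which is why merging the last two stages gives no steady-state gain.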

The second gives the theoretical maximum speed-up in a 4-stage pipeline (ideal
conditions), which is simply 4x. This assumes that the ideal situation is
achievable, which means that the work is evenly distributed among the stages
and the cycle time goes down to 1.75 ns.

The third simply compares the DARW pipeline with the DARW standard system.
This yields:

Speed-up = time between instr(std)/time between instr(pipe) = 7/2 = 3.5x.
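
The second and third answers differ only in the cycle time they assume, which a few lines of arithmetic make explicit. This is a sketch; the constant and function names are hypothetical, and the numbers are the ones quoted above.

```python
LW_TIME_NS = 7.0   # non-pipelined lw time in the DARW design
NUM_STAGES = 4


def speedup(unpipelined_ns, cycle_ns):
    """Speed-up = time between instructions (std) / time between (pipe)."""
    return unpipelined_ns / cycle_ns


# Second answer: work split perfectly evenly across the four stages.
ideal_cycle_ns = LW_TIME_NS / NUM_STAGES
print(ideal_cycle_ns)                       # 1.75
print(speedup(LW_TIME_NS, ideal_cycle_ns))  # 4.0

# Third answer: the cycle is pinned to the 2 ns slowest stage.
print(speedup(LW_TIME_NS, 2.0))             # 3.5
```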

*NOTE: There has been some confusion about the instructions and the addressing
modes we were looking for. Here are the standard addressing modes for this
assignment, as they were posted on the list, with examples. The original
solution to problem #1 used non-standard semantics, which implied extra
indirection.

Presume the following setup:

0:       dc 1     ; The value 1 is at address 0.

A_start: dc 0     ; A[0] = 0. A[0] is contained in A_start.
         dc 12    ; A[1] = 12.

A_start is placed in address 100 by the assembler.

A_start – directly addressed.
This gets the value of the constant defined there, A[0].

(A_start) – indirectly addressed.
This gets the value resulting from dereferencing A_start. *A[0] = 1; (in C)

#A_start – immediately addressed.
This gets the pointer the assembler has assigned to A_start. &A[0] = 100;

1(A_start) – index addressed.
This gets the value of the next element in the array starting at A_start.
A[1] = 12.
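
The four modes can be mimicked with a toy memory model. This is a sketch under stated assumptions: memory is a dictionary from address to value, each cell holds one integer so A[1] sits at address 101, and the helper names are hypothetical stand-ins for the operand syntax above.

```python
# Memory image from the setup above: address -> stored value.
MEMORY = {0: 1, 100: 0, 101: 12}
A_START = 100  # the address the assembler assigned to the label A_start


def immediate(label):
    """#A_start: the address itself."""
    return label


def direct(label):
    """A_start: the value stored at the address."""
    return MEMORY[label]


def indirect(label):
    """(A_start): treat the stored value as an address and read that cell."""
    return MEMORY[MEMORY[label]]


def indexed(offset, label):
    """1(A_start): the value stored offset cells past the address."""
    return MEMORY[label + offset]


print(direct(A_START))      # 0
print(indirect(A_START))    # 1
print(immediate(A_START))   # 100
print(indexed(1, A_START))  # 12
```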
