Ece 7373 HW#4
1. In this exercise we will look at how variations on Tomasulo's algorithm perform when running a common vector loop. The loop is the so-called DAXPY loop (double-precision aX plus Y) and is the central operation in Gaussian elimination. The following code implements the operation Y = aX + Y for a vector of length 100. Initially, R1 = 0 and F0 contains a.

foo:  L.D     F2, 0(R1)      ; load X(i)
      MUL.D   F4, F2, F0     ; multiply a*X(i)
      L.D     F6, 0(R2)      ; load Y(i)
      ADD.D   F6, F4, F6     ; add a*X(i) + Y(i)
      S.D     F6, 0(R2)      ; store Y(i)
      DADDUI  R1, R1, #8     ; increment X index
      DADDUI  R2, R2, #8     ; increment Y index
      DSGTUI  R3, R1, #800   ; test if done
      BEQZ    R3, foo        ; loop if not done
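As a reference for what the loop computes, here is a minimal Python sketch of the same operation, Y = aX + Y over 100 elements. The function name daxpy and the sample data are illustrative assumptions, not part of the assignment. The 8-byte stride and the bound of 800 in the assembly correspond to stepping through 100 double-precision elements.

```python
def daxpy(a, x, y):
    """Update y in place so that y[i] = a * x[i] + y[i] (hypothetical helper)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i in range(len(x)):      # one pass corresponds to one iteration of the foo loop
        y[i] = a * x[i] + y[i]   # MUL.D, then ADD.D, then S.D back into Y(i)
    return y


# Illustrative data only: 100 elements, i.e. 800 bytes at 8 bytes per double.
x = [float(i) for i in range(100)]
y = [1.0] * 100
daxpy(2.0, x, y)
```

Each pass through the Python loop body stands for the two loads, the multiply, the add, and the store of one assembly iteration. The index updates and the loop test are implicit in the for statement.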
i) Assume a single-issue Tomasulo MIPS pipeline with the specifications shown in Table 1.

FU Type         Cycles in EX   Number of FUs   Number of Reservation Stations
Integer         1              1               5
FP Adder        5              1               3
FP Multiplier   7              1               2

Table 1

Show the number of stall cycles for each instruction and the clock cycle in which each instruction begins execution. Show your answer in the form of a table like the one shown in Fig. 3.19 of your textbook (2.20 in the 4th Ed.), giving the clock cycle in which each instruction issues, executes, accesses memory (for loads or stores), and finally writes its result on the CDB, for one iteration of the loop and the first instruction of the second iteration. Assume also the following:
- There is a separate adder for effective address calculation and integer operations (ADD, SUB).
- Loads and stores take one clock cycle for effective address calculation (during EX) and one clock cycle for memory access, as in Fig. 3.19.
- The issue (IS) and write result stages each take one clock cycle.
- There is no forwarding between functional units; results are communicated by the CDB.
- If an instruction needs the value computed by a functional unit, it can have it after the value is written by the CDB (as in Fig. 3.19, where DADDIU executes at clock cycle 5, after the LD writes the CDB at clock cycle 4).
- An instruction that follows a branch can issue before the outcome of the branch is known, but it can execute only after the branch outcome is known.
- Functional units are not pipelined.
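The timing rules above can be sketched as a small Python helper for filling in one row of the table. This is only an illustration under simplifying assumptions that go beyond the problem statement: the name schedule and its parameters are hypothetical, a functional unit is treated as busy through its last EX cycle, CDB contention is ignored, and stores and the branch restriction are not modeled.

```python
def schedule(issue, operands_ready, latency, fu_busy_until=0, mem_access=False):
    """Return (ex_start, ex_end, mem, write) clock cycles for one instruction.

    issue          -- cycle in which the instruction issues
    operands_ready -- cycles in which pending source operands are written on the CDB
                      (empty if every operand is already available at issue)
    latency        -- cycles spent in EX (Table 1; 1 for address calculation)
    fu_busy_until  -- assumption: last cycle in which the unpipelined FU is occupied
    mem_access     -- True for loads, which take one memory cycle after EX
    """
    # EX starts the cycle after issue, after the last operand is written on the
    # CDB, and after the functional unit becomes free, whichever is latest.
    ex_start = max([issue, fu_busy_until] + list(operands_ready)) + 1
    ex_end = ex_start + latency - 1
    mem = ex_end + 1 if mem_access else None
    # Write result takes the cycle after the last EX or memory cycle.
    write = (mem if mem_access else ex_end) + 1
    return ex_start, ex_end, mem, write


# A load that issues in cycle 1 with its base register ready.
load_x = schedule(issue=1, operands_ready=[], latency=1, mem_access=True)
print(load_x)   # (2, 2, 3, 4)

# An FP multiply (7 cycles in EX) that issues in cycle 2 and waits for that load.
mul = schedule(issue=2, operands_ready=[load_x[3]], latency=7)
print(mul)      # (5, 11, None, 12)
```

The load row reproduces the pattern cited from Fig. 3.19: issue in cycle 1, write on the CDB in cycle 4, and a dependent instruction starting EX in cycle 5. The remaining rows still need the structural hazards (reservation stations, unpipelined units) and the branch rule checked by hand.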