Vector Code Example
Grade:
Example:
Consider this piece of C code:
for (i=0; i< 128; i++)
{
z[i] = a*x[i] + y[i];
}
a) Develop the MIPS scalar assembly code for this C code.
      L.D     F0,a          ; load scalar a
      DADDUI  R10,R0,128    ; 128 elements to process
      DADDUI  R1,R0,1000    ; load address of array x
      DADDUI  R2,R0,2000    ; load address of array y
      DADDUI  R3,R0,3000    ; load address of array z
Loop: L.D     F1,0(R1)      ; load x[i]
      MUL.D   F2,F1,F0      ; scalar-scalar multiply: a * x[i]
      L.D     F3,0(R2)      ; load y[i]
      ADD.D   F4,F2,F3      ; add: a * x[i] + y[i]
      S.D     F4,0(R3)      ; store the result in z[i]
      DADDUI  R1,R1,8       ; increment array pointer for x[]
      DADDUI  R2,R2,8       ; increment array pointer for y[]
      DADDUI  R3,R3,8       ; increment array pointer for z[]
      DADDUI  R10,R10,-1    ; decrement loop counter
      BNEZ    R10,Loop      ; branch if R10 != zero
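b) Develop the MIPS vector assembly code for this C code, assuming vector registers that hold 16 elements.
      L.D     F0,a          ; load scalar a
      DADDUI  R10,R0,128    ; 128 elements to process
      DADDUI  R1,R0,1000    ; load address of array x
      DADDUI  R2,R0,2000    ; load address of array y
      DADDUI  R3,R0,3000    ; load address of array z
Loop: LV      V1,R1         ; load vector X
      MULVS.D V2,V1,F0      ; vector-scalar multiply
      LV      V3,R2         ; load vector Y
      ADDVV.D V4,V2,V3      ; add
      SV      R3,V4         ; store the result
      DADDUI  R1,R1,16*8    ; increment array pointer for x[]
      DADDUI  R2,R2,16*8    ; increment array pointer for y[]
      DADDUI  R3,R3,16*8    ; increment array pointer for z[]
      DADDUI  R10,R10,-16   ; decrement loop counter
      BNEZ    R10,Loop      ; branch if R10 != zero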
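To make the effect of the vector loop concrete, here is a minimal C sketch that processes the arrays in chunks of 16 elements, one chunk per loop iteration. The function name `axpy_strip_mined`, the constant `VLEN`, and the temporary arrays standing in for V1 to V4 are illustrative assumptions, not part of the original exercise; the sketch also assumes the element count is a multiple of 16, as 128 is.

```c
#include <assert.h>
#include <stddef.h>

#define VLEN 16 /* assumed vector register length, in elements */

/* Computes z[i] = a*x[i] + y[i] in chunks of VLEN elements and
 * returns the number of loop iterations executed.
 * Assumption: n is a multiple of VLEN (128 = 8 * 16 in the example). */
size_t axpy_strip_mined(size_t n, double a,
                        const double *x, const double *y, double *z)
{
    double v1[VLEN], v2[VLEN], v3[VLEN], v4[VLEN]; /* stand-ins for V1..V4 */
    size_t iterations = 0;

    assert(n % VLEN == 0);
    for (size_t i = 0; i < n; i += VLEN) {
        for (size_t k = 0; k < VLEN; k++) v1[k] = x[i + k];      /* LV      V1,R1    */
        for (size_t k = 0; k < VLEN; k++) v2[k] = v1[k] * a;     /* MULVS.D V2,V1,F0 */
        for (size_t k = 0; k < VLEN; k++) v3[k] = y[i + k];      /* LV      V3,R2    */
        for (size_t k = 0; k < VLEN; k++) v4[k] = v2[k] + v3[k]; /* ADDVV.D V4,V2,V3 */
        for (size_t k = 0; k < VLEN; k++) z[i + k] = v4[k];      /* SV      R3,V4    */
        iterations++;
    }
    return iterations;
}
```

Called with n = 128, the outer loop runs 128/16 = 8 times, which is the count the vector loop body is executed.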
c) How many cycles does it take to execute the scalar code, assuming no memory latencies?
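Instruction                                                     Number of times executed
      L.D     F0,a          ; load scalar a                     1
      DADDUI  R10,R0,128    ; 128 elements to process           1
      DADDUI  R1,R0,1000    ; load address of array x           1
      DADDUI  R2,R0,2000    ; load address of array y           1
      DADDUI  R3,R0,3000    ; load address of array z           1
Loop: L.D     F1,0(R1)      ; load x[i]                         128
      MUL.D   F2,F1,F0      ; scalar-scalar multiply            128
      L.D     F3,0(R2)      ; load y[i]                         128
      ADD.D   F4,F2,F3      ; add                               128
      S.D     F4,0(R3)      ; store the result                  128
      DADDUI  R1,R1,8       ; increment array pointer for x[]   128
      DADDUI  R2,R2,8       ; increment array pointer for y[]   128
      DADDUI  R3,R3,8       ; increment array pointer for z[]   128
      DADDUI  R10,R10,-1    ; decrement loop counter            128
      BNEZ    R10,Loop      ; branch if R10 != zero             127*2 + 1*1
BNEZ is counted as 2 cycles each of the 127 times it is taken and 1 cycle the one time it falls through.
Total: 5*1 + 9*128 + (127*2 + 1*1) = 1412 cycles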
d) How many cycles does it take to execute the vector code, assuming no memory latencies?
Instruction                                                     Number of times executed
      L.D     F0,a          ; load scalar a                     1
      DADDUI  R10,R0,128    ; 128 elements to process           1
      DADDUI  R1,R0,1000    ; load address of array x           1
      DADDUI  R2,R0,2000    ; load address of array y           1
      DADDUI  R3,R0,3000    ; load address of array z           1
Loop: LV      V1,R1         ; load vector X                     8
      MULVS.D V2,V1,F0      ; vector-scalar multiply            8
      LV      V3,R2         ; load vector Y                     8
      ADDVV.D V4,V2,V3      ; add                               8
      SV      R3,V4         ; store the result                  8
      DADDUI  R1,R1,16*8    ; increment array pointer for x[]   8
      DADDUI  R2,R2,16*8    ; increment array pointer for y[]   8
      DADDUI  R3,R3,16*8    ; increment array pointer for z[]   8
      DADDUI  R10,R10,-16   ; decrement loop counter            8
      BNEZ    R10,Loop      ; branch if R10 != zero             7*2 + 1*1
Total: 5*1 + 9*8 + (7*2 + 1*1) = 92 cycles
e) Speedup of the vector code over the scalar code: 1412/92 ≈ 15.35
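The cycle counts behind this ratio can be checked with a small C sketch. It assumes the cost model implied by the counts above: every instruction takes 1 cycle, except the loop branch, which takes 2 cycles when taken and 1 cycle on the final fall-through. The function name `loop_cycles` and its parameters are hypothetical names introduced for illustration.

```c
#include <assert.h>

/* Cycle count for a counted loop under the assumed cost model:
 *   - setup_instrs instructions run once before the loop, 1 cycle each
 *   - body_instrs non-branch instructions run every iteration, 1 cycle each
 *   - the closing branch costs 2 cycles when taken, 1 when not taken
 * n is the number of elements, elems_per_iter the elements handled per
 * iteration (1 for the scalar loop, 16 for the vector loop here). */
long loop_cycles(long n, long elems_per_iter, long setup_instrs, long body_instrs)
{
    assert(n > 0 && elems_per_iter > 0 && n % elems_per_iter == 0);

    long iterations = n / elems_per_iter;
    /* The branch is taken on every iteration except the last one. */
    long branch_cycles = (iterations - 1) * 2 + 1;

    return setup_instrs + body_instrs * iterations + branch_cycles;
}
```

With 5 setup instructions and 9 non-branch instructions in the loop body, `loop_cycles(128, 1, 5, 9)` gives the scalar count of 1412 and `loop_cycles(128, 16, 5, 9)` gives the vector count of 92.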
f) This one is left for you to work out. Remember that each loop iteration now processes 32 elements instead of 16.
Exercise:
Repeat the previous example for this piece of C code:
for (i=0; i< 128; i++)
{
z[i] = (x[i] + y[i]) * w[i];
}
a) Develop the MIPS scalar assembly code for this C code.
      DADDUI  R10,R0,128    ; 128 elements to process
      DADDUI  R1,R0,1000    ; load address of array x
      DADDUI  R2,R0,2000    ; load address of array y
      DADDUI  R3,R0,3000    ; load address of array z
      DADDUI  R4,R0,4000    ; load address of array w
Loop: L.D     F1,0(R1)      ; load x[i]
      L.D     F2,0(R2)      ; load y[i]
      L.D     F4,0(R4)      ; load w[i]
      ADD.D   F3,F1,F2      ; add x[i] and y[i]
      MUL.D   F5,F3,F4      ; z[i] = (x[i] + y[i]) * w[i]
      S.D     F5,0(R3)      ; store the result
      DADDUI  R1,R1,8       ; increment array pointer for x[]
      DADDUI  R2,R2,8       ; increment array pointer for y[]
      DADDUI  R3,R3,8       ; increment array pointer for z[]
      DADDUI  R4,R4,8       ; increment array pointer for w[]
      DADDUI  R10,R10,-1    ; decrement loop counter
      BNEZ    R10,Loop      ; branch if R10 != zero
b) Develop the MIPS vector assembly code for this C code, assuming vector registers that hold 16 elements.
      DADDUI  R10,R0,128    ; 128 elements to process
      DADDUI  R1,R0,1000    ; load address of array x
      DADDUI  R2,R0,2000    ; load address of array y
      DADDUI  R3,R0,3000    ; load address of array z
      DADDUI  R4,R0,4000    ; load address of array w
Loop: LV      V1,R1         ; load vector X
      LV      V2,R2         ; load vector Y
      LV      V4,R4         ; load vector W
      ADDVV.D V3,V1,V2      ; vector-vector add
      MULVV.D V5,V3,V4      ; vector-vector multiply
      SV      R3,V5         ; store the result
      DADDUI  R1,R1,16*8    ; increment array pointer for x[]
      DADDUI  R2,R2,16*8    ; increment array pointer for y[]
      DADDUI  R3,R3,16*8    ; increment array pointer for z[]
      DADDUI  R4,R4,16*8    ; increment array pointer for w[]
      DADDUI  R10,R10,-16   ; decrement loop counter
      BNEZ    R10,Loop      ; branch if R10 != zero
c) How many cycles does it take to execute the scalar code, assuming no memory latencies?
Instruction                                                     Number of times executed
d) How many cycles does it take to execute the vector code, assuming no memory latencies?
Instruction                                                     Number of times executed
e)
f)