MIPS Assembly
Small benchmarks
nice for architects and designers
easy to standardize
can be abused!
Benchmark suites
Perfect Club: set of application codes
Livermore Loops: 24 loop kernels
Linpack: linear algebra package
SPEC: mix of code from industry organization
SPEC (Standard Performance
Evaluation Corporation)
Sponsored by industry but independent and self-managed –
trusted by code developers and machine vendors
Clear guides for testing, see www.spec.org
Regular updates (benchmarks are dropped and new ones added
periodically according to relevance)
Specialized benchmarks for particular classes of applications
Can still be abused by selective optimization!
SPEC History
First Round: SPEC CPU89
10 programs yielding a single number
Second Round: SPEC CPU92
SPEC CINT92 (6 integer programs) and SPEC CFP92 (14 floating
point programs)
compiler flags can be set differently for different programs
Third Round: SPEC CPU95
new set of programs: SPEC CINT95 (8 integer programs) and SPEC
CFP95 (10 floating point)
single flag setting for all programs
Fourth Round: SPEC CPU2000
new set of programs: SPEC CINT2000 (12 integer programs) and
SPEC CFP2000 (14 floating point)
single flag setting for all programs
programs in C, C++, Fortran 77, and Fortran 90
CINT2000 (Integer component
of SPEC CPU2000)
Program Language What It Is
164.gzip C Compression
175.vpr C FPGA Circuit Placement and Routing
176.gcc C C Programming Language Compiler
181.mcf C Combinatorial Optimization
186.crafty C Game Playing: Chess
197.parser C Word Processing
252.eon C++ Computer Visualization
253.perlbmk C PERL Programming Language
254.gap C Group Theory, Interpreter
255.vortex C Object-oriented Database
256.bzip2 C Compression
300.twolf C Place and Route Simulator
CFP2000 (Floating point
component of SPEC CPU2000)
Program Language What It Is
168.wupwise Fortran 77 Physics / Quantum Chromodynamics
171.swim Fortran 77 Shallow Water Modeling
172.mgrid Fortran 77 Multi-grid Solver: 3D Potential Field
173.applu Fortran 77 Parabolic / Elliptic Differential Equations
177.mesa C 3-D Graphics Library
178.galgel Fortran 90 Computational Fluid Dynamics
179.art C Image Recognition / Neural Networks
183.equake C Seismic Wave Propagation Simulation
187.facerec Fortran 90 Image Processing: Face Recognition
188.ammp C Computational Chemistry
189.lucas Fortran 90 Number Theory / Primality Testing
191.fma3d Fortran 90 Finite-element Crash Simulation
200.sixtrack Fortran 77 High Energy Physics Accelerator Design
301.apsi Fortran 77 Meteorology: Pollutant Distribution
SPEC CPU2000 reporting
Refer to the SPEC website, www.spec.org, for documentation
Single number result – geometric mean of normalized ratios for
each code in the suite
Report precise description of machine
Report compiler flag setting
CINT2006 for Opteron X4 2356
Name        Description                    IC×10^9  CPI    Tc (ns)  Exec time  Ref time  SPECratio
perl        Interpreted string processing  2,118    0.75   0.40     637        9,777     15.3
bzip2       Block-sorting compression      2,389    0.85   0.40     817        9,650     11.8
gcc         GNU C Compiler                 1,050    1.72   0.40     724        8,050     11.1
mcf         Combinatorial optimization     336      10.00  0.40     1,345      9,120     6.8
go          Go game (AI)                   1,658    1.09   0.40     721        10,490    14.6
hmmer       Search gene sequence           2,783    0.80   0.40     890        9,330     10.5
sjeng       Chess game (AI)                2,176    0.96   0.40     837        12,100    14.5
libquantum  Quantum computer simulation    1,623    1.61   0.40     1,047      20,720    19.8
h264avc     Video compression              3,102    0.80   0.40     993        22,130    22.3
omnetpp     Discrete event simulation      587      2.94   0.40     690        6,250     9.1
astar       Games/path finding             1,082    1.79   0.40     773        7,020     9.1
xalancbmk   XML parsing                    1,058    2.70   0.40     1,143      6,900     6.0
Geometric mean                                                                           11.7
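The columns of the CINT2006 table are related by simple arithmetic: execution time is instruction count × CPI × clock cycle time, SPECratio is reference time divided by execution time, and the summary number is the geometric mean of the SPECratios. A minimal Python sketch of those relations, using values copied from the table (the function names are made up for this illustration and are not part of any SPEC tool):

```python
import math

# (name, instruction count in billions, CPI, clock cycle time in ns, reference time in s)
# Three rows copied from the CINT2006 table.
rows = [
    ("perl", 2118, 0.75, 0.40, 9777),
    ("mcf", 336, 10.00, 0.40, 9120),
    ("h264avc", 3102, 0.80, 0.40, 22130),
]

def exec_time_seconds(ic_billions, cpi, tc_ns):
    # (IC x 10^9 instructions) x CPI x (Tc x 10^-9 s): the powers of ten cancel.
    return ic_billions * cpi * tc_ns

def spec_ratio(ref_time, exec_time):
    # Larger is better: how many times faster than the reference machine.
    return ref_time / exec_time

def geometric_mean(values):
    # n-th root of the product, computed through logarithms.
    return math.exp(sum(math.log(v) for v in values) / len(values))

# SPECratio column of the table, all twelve programs.
table_ratios = [15.3, 11.8, 11.1, 6.8, 14.6, 10.5, 14.5, 19.8, 22.3, 9.1, 9.1, 6.0]

for name, ic, cpi, tc, ref in rows:
    t = exec_time_seconds(ic, cpi, tc)
    print(f"{name}: exec time ~{t:.0f} s, SPECratio ~{spec_ratio(ref, t):.1f}")
print(f"geometric mean ~{geometric_mean(table_ratios):.1f}")
```

The recomputed execution times can differ from the table by a few seconds, presumably because the table's CPI values are rounded to two decimals.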
SPEC ’95
Does doubling the clock rate double the performance?
Can a machine with a slower clock rate have better performance?
[Figure: SPECint and SPECfp ratings (0 to 10) versus clock rate (50 to 250 MHz) for Pentium and Pentium Pro]
Specialized SPEC Benchmarks
I/O
Network
Graphics
Java
Web server
Transaction processing (databases)
Instructions: Language of the Machine
Instructions: Overview
Language of the machine
More primitive than higher level languages, e.g., no
sophisticated control flow such as while or for loops
Very restrictive
e.g., MIPS arithmetic instructions
We’ll be working with the MIPS instruction set architecture
inspired most architectures developed since the 80's
used by NEC, Nintendo, Silicon Graphics, Sony
the name is not related to millions of instructions per second!
it stands for microprocessor without interlocked pipeline stages!
Design goals: maximize performance, minimize cost, and
reduce design time
MIPS Arithmetic
Example:
[Figure: components of a computer: control and datapath (the processor), memory, input and output (I/O)]
Memory Organization
Viewed as a large single-dimension array with access by address
A memory address is an index into the memory array
Byte addressing means that the index points to a byte of
memory, and that the unit of memory accessed by a load/store
is a byte
0 8 bits of data
1 8 bits of data
2 8 bits of data
3 8 bits of data
4 8 bits of data
5 8 bits of data
6 8 bits of data
...
Memory Organization
Bytes are load/store units, but most data items use larger words
For MIPS, a word is 32 bits or 4 bytes.
0 32 bits of data
4 32 bits of data
8 32 bits of data
12 32 bits of data
...
Registers correspondingly hold 32 bits of data
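The two views of memory (an array of bytes, and words at addresses 0, 4, 8, 12, ...) can be sketched in Python as below. This is an illustration only: the helper names are hypothetical, the 16-byte memory size is arbitrary, and big-endian byte order is an assumption made for the sketch (the slides do not say which byte order is used).

```python
# A toy byte-addressed memory: one entry per byte, indexed by address.
memory = bytearray(16)

def store_word(mem, address, value):
    # A MIPS word is 4 bytes and starts at an address that is a multiple of 4.
    if address % 4 != 0:
        raise ValueError("unaligned word address")
    # Big-endian byte order is an assumption of this sketch.
    mem[address:address + 4] = value.to_bytes(4, "big")

def load_word(mem, address):
    if address % 4 != 0:
        raise ValueError("unaligned word address")
    return int.from_bytes(mem[address:address + 4], "big")

store_word(memory, 4, 0x00622020)   # fills the bytes at addresses 4, 5, 6, 7
word = load_word(memory, 4)         # the same 32 bits read back as one word
byte_at_5 = memory[5]               # a single byte of that word
```

Sequential words sit 4 addresses apart because each one occupies four byte-sized slots of the underlying array.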
Instruction Meaning
add $4, $3, $2    (rd = $4, rs = $3, rt = $2)
Bits:   31-26   25-21  20-16  15-11  10-6   5-0
Field:  opcode  rs     rt     rd     shamt  funct
Value:  000000  00011  00010  00100  00000  100000
Encoding = 0x00622020
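The encoding of add $4, $3, $2 can be reproduced by shifting each field to its bit position and OR-ing the pieces together. A minimal Python sketch, assuming a hypothetical helper named encode_r_type (not part of any real assembler):

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    # Check that every field fits in its width: 6, 5, 5, 5, 5 and 6 bits.
    for field, width in ((opcode, 6), (rs, 5), (rt, 5), (rd, 5), (shamt, 5), (funct, 6)):
        if not 0 <= field < (1 << width):
            raise ValueError("field out of range")
    # Place the fields at bits 31-26, 25-21, 20-16, 15-11, 10-6 and 5-0.
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

ADD_FUNCT = 0b100000  # funct bits of add, as in the bit pattern for add $4, $3, $2

# add $4, $3, $2: the destination rd comes first in assembly, but sits after rs and rt in the word.
word = encode_r_type(rs=3, rt=2, rd=4, shamt=0, funct=ADD_FUNCT)
print(f"0x{word:08X}")  # prints 0x00622020
```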
Machine Language
Consider the load-word and store-word instructions.
What would the regularity principle have us do?
Reusing the R-type layout would leave only 5 or 6 bits for the offset from a base
register, which is too few, so these instructions use a 16-bit immediate field instead.
lw $5, 3000($2)    (rt = $5, immediate = 3000, rs = $2)
Bits:   31-26   25-21  20-16  15-0
Field:  opcode  rs     rt     immediate value
Value:  100011  00010  00101  0000101110111000
Encoding = 0x8C450BB8
MIPS Encoding: I-Type
sw $5, 3000($2)    (rt = $5, immediate = 3000, rs = $2)
Bits:   31-26   25-21  20-16  15-0
Field:  opcode  rs     rt     immediate value
Value:  101011  00010  00101  0000101110111000
Encoding = 0xAC450BB8
The immediate value is signed
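The lw and sw encodings differ only in the opcode bits, and the signed 16-bit immediate is stored in two's complement. A Python sketch under those assumptions (encode_i_type and sign_extend_16 are hypothetical helper names chosen for this illustration):

```python
LW_OPCODE = 0b100011  # opcode bits of lw, from the lw $5, 3000($2) bit pattern
SW_OPCODE = 0b101011  # opcode bits of sw, from the sw $5, 3000($2) bit pattern

def encode_i_type(opcode, rs, rt, immediate):
    # The immediate is signed, so it must fit in 16 two's-complement bits.
    if not -(1 << 15) <= immediate < (1 << 15):
        raise ValueError("immediate does not fit in 16 signed bits")
    return (opcode << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)

def sign_extend_16(value):
    # Interpret the low 16 bits as a two's-complement number.
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

lw_word = encode_i_type(LW_OPCODE, rs=2, rt=5, immediate=3000)  # lw $5, 3000($2)
sw_word = encode_i_type(SW_OPCODE, rs=2, rt=5, immediate=3000)  # sw $5, 3000($2)
print(f"0x{lw_word:08X} 0x{sw_word:08X}")  # prints 0x8C450BB8 0xAC450BB8
```

A negative offset such as -4 is stored as 0xFFFC in the immediate field, and sign_extend_16 recovers -4 from it.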
Control: Conditional Branch
Decision making instructions
alter the control flow,
i.e., change the next instruction to be executed
beq $0, $9, 40    (rs = $0, rt = $9, offset = 40)
The byte offset 40 is encoded as the word offset 40/4 = 10
Bits:   31-26   25-21  20-16  15-0
Field:  opcode  rs     rt     immediate value
Value:  000100  00000  01001  0000000000001010
Encoding = 0x1009000A
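The beq example stores the branch distance in words rather than bytes, and the hardware scales it back up relative to PC + 4. A Python sketch of both directions; the helper names and the PC value 0x1000 are hypothetical, chosen only for illustration:

```python
BEQ_OPCODE = 0b000100  # opcode bits of beq, from the beq $0, $9, 40 bit pattern

def encode_branch_offset(byte_offset):
    # Instructions are 4 bytes long, so a branch distance is always a multiple of 4
    # and the immediate field stores the distance in words.
    if byte_offset % 4 != 0:
        raise ValueError("branch offset must be a multiple of 4")
    return (byte_offset // 4) & 0xFFFF

def branch_target(pc, immediate):
    # Sign-extend the 16-bit field, then scale back to bytes relative to PC + 4.
    signed = immediate - 0x10000 if immediate & 0x8000 else immediate
    return (pc + 4 + signed * 4) & 0xFFFFFFFF

imm = encode_branch_offset(40)                              # 40 / 4 = 10
word = (BEQ_OPCODE << 26) | (0 << 21) | (9 << 16) | imm     # beq $0, $9, 40
target = branch_target(0x1000, imm)                         # 0x1000 + 4 + 40
print(f"0x{word:08X} -> target 0x{target:X}")
```

Storing words instead of bytes lets the same 16 bits reach four times as far from the branch.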
Control: Unconditional Branch (Jump)
J op 26 bit address
MIPS Instructions:
addi $29, $29, 4
slti $8, $18, 10
andi $29, $29, 6
ori $29, $29, 4
op rs rt 16 bit number
How about larger constants?
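First we need to load a 32 bit constant into a register
Must use two instructions for this: first the new load upper immediate
(lui) instruction sets the upper 16 bits
lui $t0, 1010101010101010
$t0 = 1010101010101010 0000000000000000    (lower 16 bits filled with zeros)
Then ori sets the lower 16 bits:
  1010101010101010 0000000000000000    ($t0 after lui)
  0000000000000000 1010101010101010    (ori immediate, zero-extended)
  1010101010101010 1010101010101010    (result)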
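The lui/ori pair can be modeled in a few lines of Python. This is a sketch of the register arithmetic only; the function names mirror the mnemonics but are hypothetical helpers, and load_constant is an assumed name for the two-instruction sequence:

```python
def lui(immediate):
    # Place the 16-bit immediate in the upper half; the lower half is zeros.
    return (immediate & 0xFFFF) << 16

def ori(register_value, immediate):
    # ori zero-extends its 16-bit immediate before OR-ing.
    return (register_value | (immediate & 0xFFFF)) & 0xFFFFFFFF

def load_constant(value):
    # Split a 32-bit constant into halves and rebuild it with lui followed by ori.
    upper = (value >> 16) & 0xFFFF
    lower = value & 0xFFFF
    return ori(lui(upper), lower)

pattern = 0b1010101010101010
t0 = ori(lui(pattern), pattern)
print(f"{t0:032b}")  # prints 10101010101010101010101010101010
```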
Formats:
R op rs rt rd shamt funct
I op rs rt 16 bit address
J op 26 bit address
Logical Operations
Shift Left Logical (sll $s1, $s2, 10)
Shift Right Logical (srl $s1, $s2, 10)
AND (and $s1, $s2, $s3)
OR (or $s1, $s2, $s3)
NOR (nor $s1, $s2, $s3)
AND Immediate (andi $s1, $s2, 100)
OR Immediate (ori $s1, $s2, 100)
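The effect of the shift and bitwise instructions on a 32-bit register can be sketched in Python by masking results to 32 bits. The function names mirror the mnemonics but are hypothetical helpers written for this illustration:

```python
MASK32 = 0xFFFFFFFF  # registers hold 32 bits

def sll(value, shamt):
    # Shift left; bits pushed past bit 31 are lost.
    return (value << shamt) & MASK32

def srl(value, shamt):
    # Shift right, filling the vacated upper bits with zeros.
    return (value & MASK32) >> shamt

def nor(a, b):
    # NOT of OR; with one operand equal to 0 this is a plain bitwise NOT.
    return ~(a | b) & MASK32

def andi(value, immediate):
    # The 16-bit immediate is zero-extended, so the upper 16 result bits are 0.
    return value & (immediate & 0xFFFF)

print(sll(1, 10))                  # prints 1024, i.e. multiply by 2^10
print(hex(srl(0x80000000, 10)))    # prints 0x200000
print(hex(nor(0x0000FFFF, 0)))     # prints 0xffff0000
print(andi(0xFFFFFFFF, 100))       # prints 100
```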
Control Flow
We have: beq, bne. What about branch-if-less-than?
New instruction:
slt $t0, $s1, $s2
if $s1 < $s2 then $t0 = 1
else $t0 = 0
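A branch-if-less-than can be built from slt followed by bne against $zero. The Python sketch below models that pair; it assumes slt compares its operands as signed two's-complement numbers, and the helper names are hypothetical:

```python
def to_signed32(value):
    # Interpret a 32-bit pattern as a two's-complement number.
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value

def slt(rs_value, rt_value):
    # Set on less than: the result is 1 or 0, written to the destination register.
    return 1 if to_signed32(rs_value) < to_signed32(rt_value) else 0

def branch_if_less_than(s1, s2):
    # Models:  slt $t0, $s1, $s2
    #          bne $t0, $zero, target
    t0 = slt(s1, s2)
    return t0 != 0  # True means the branch is taken

print(slt(3, 5), slt(5, 3))   # prints 1 0
print(slt(0xFFFFFFFF, 0))     # prints 1, because the pattern 0xFFFFFFFF is -1
```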
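2. Register addressing
   Format: op rs rt rd ... funct
   Operand: a register
3. Base addressing
   Format: op rs rt Address
   Operand: the memory location at register + Address
4. PC-relative addressing
   Format: op rs rt Address
   Target: the memory word at PC + Address
5. Pseudodirect addressing
   Format: op Address
   Target: the memory word at PC combined with Address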
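For pseudodirect addressing, the 26-bit address field is shifted left by 2 and combined with the upper 4 bits of the PC to form a 32-bit target. A Python sketch, assuming the upper bits are taken from PC + 4 and that the jump opcode bits are 000010; the helper names and the sample addresses are hypothetical:

```python
J_OPCODE = 0b000010  # opcode bits of j (stated here as an assumption of the sketch)

def encode_jump(target, opcode=J_OPCODE):
    # Drop the two low bits (always 0 for a word address) and keep 26 bits.
    return (opcode << 26) | ((target >> 2) & 0x03FFFFFF)

def jump_target(pc, address26):
    # Upper 4 bits come from PC + 4; the remaining 28 come from the field shifted left by 2.
    return ((pc + 4) & 0xF0000000) | ((address26 & 0x03FFFFFF) << 2)

word = encode_jump(0x00400020)
target = jump_target(0x00400000, word & 0x03FFFFFF)
print(f"0x{word:08X} -> 0x{target:08X}")  # prints 0x08100008 -> 0x00400020
```

Because only 28 bits come from the instruction, a jump can reach any word inside the current 256 MB region but not beyond it.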
Overview of MIPS
Simple instructions – all 32 bits wide
Very structured – no unnecessary baggage
Only three instruction formats
R op rs rt rd shamt funct
I op rs rt 16 bit address
J op 26 bit address
Summarize MIPS:
MIPS operands
Name: 32 registers
Example: $s0-$s7, $t0-$t9, $zero, $a0-$a3, $v0-$v1, $gp, $fp, $sp, $ra, $at
Comments: Fast locations for data. In MIPS, data must be in registers to perform
arithmetic. MIPS register $zero always equals 0. Register $at is
reserved for the assembler to handle large constants.
Name: 2^30 memory words
Example: Memory[0], Memory[4], ..., Memory[4294967292]
Comments: Accessed only by data transfer instructions. MIPS uses byte addresses, so
sequential words differ by 4. Memory holds data structures, such as arrays,
and spilled registers, such as those saved on procedure calls.
MIPS assembly language
Category            Instruction              Example              Meaning                               Comments
Arithmetic          add                      add $s1, $s2, $s3    $s1 = $s2 + $s3                       Three operands; data in registers
Arithmetic          subtract                 sub $s1, $s2, $s3    $s1 = $s2 - $s3                       Three operands; data in registers
Arithmetic          add immediate            addi $s1, $s2, 100   $s1 = $s2 + 100                       Used to add constants
Data transfer       load word                lw $s1, 100($s2)     $s1 = Memory[$s2 + 100]               Word from memory to register
Data transfer       store word               sw $s1, 100($s2)     Memory[$s2 + 100] = $s1               Word from register to memory
Data transfer       load byte                lb $s1, 100($s2)     $s1 = Memory[$s2 + 100]               Byte from memory to register
Data transfer       store byte               sb $s1, 100($s2)     Memory[$s2 + 100] = $s1               Byte from register to memory
Data transfer       load upper immediate     lui $s1, 100         $s1 = 100 * 2^16                      Loads constant in upper 16 bits
Conditional branch  branch on equal          beq $s1, $s2, 25     if ($s1 == $s2) go to PC + 4 + 100    Equal test; PC-relative branch
Conditional branch  branch on not equal      bne $s1, $s2, 25     if ($s1 != $s2) go to PC + 4 + 100    Not equal test; PC-relative
Conditional branch  set on less than         slt $s1, $s2, $s3    if ($s2 < $s3) $s1 = 1; else $s1 = 0  Compare less than; for beq, bne
Conditional branch  set less than immediate  slti $s1, $s2, 100   if ($s2 < 100) $s1 = 1; else $s1 = 0  Compare less than constant
Design Principles:
simplicity favors regularity
smaller is faster
good design demands compromise
make the common case fast