This document provides a tutorial on using WinMIPS64, a simulator for a pipelined MIPS64 processor. It describes the various windows in the simulator interface and how they display information about the processor's pipeline, code memory, data memory, registers, statistics and more. It then walks through loading and running a sample MIPS64 assembly program that calculates the sum of two numbers, examining what occurs in the pipeline and other components on a cycle-by-cycle basis. Disabling forwarding reveals its impact on performance.
This exercise introduces WinMIPS64, a Windows-based simulator of a pipelined implementation of the MIPS64 64-bit processor.
1. Starting and configuring WinMIPS64
Start WinMIPS64 from the task bar.
A window (denoted the main window) appears with seven child windows and a status line at the bottom. The seven windows are Pipeline, Code, Data, Registers, Statistics, Cycles and Terminal.
Pipeline window
This window shows a schematic representation of the five pipeline stages of the MIPS64 processor and the units for floating point operations (addition / subtraction, multiplication and division). It shows which instruction is in each stage of the pipeline.
Code window
This window shows a three column representation of the code memory, showing from left to right 1) a byte address, 2) a hex number giving the 32-bit machine code representation of the instruction, and 3) the assembly language statement. Double-left-clicking on an instruction sets or clears break-points.
Data window
This window shows the contents of data memory, byte addressable, but displayed in 64-bit chunks, as appropriate for a 64-bit processor. To edit an integer value double-left-click. To display and edit as a floating-point number, double-right-click.
Register window
This window shows the values stored in the registers. If the register is displayed in grey, then it is in the process of being written to by an instruction. If displayed using a colour, the colour indicates the stage in the pipeline from which this value is available for forwarding. This window allows you to interactively change the contents of those 64-bit integer and floating-point registers that are not in the process of being written to, or being forwarded. To do this, double-left-click on the register you want to change and a pop-up window will ask you for new content. Press OK to confirm the change.
Clock Cycle diagram
This window gives a representation of the timing behaviour of the pipeline. It records the history of instructions as they enter and emerge from the pipeline. An instruction that causes a stall is highlighted in blue; instructions held up as a result of a stall are greyed.
Statistics
This window provides statistics on the number of simulation cycles, instructions, the average Cycles Per Instruction (CPI), the types of stalls, and numbers of conditional branches and Load/Store instructions.
Terminal
This window mimics a dumb terminal I/O device with some limited graphics capability.
Status Line
The status line at the bottom normally displays "Ready", but during program simulation it will provide useful information on the current status of the simulation.
To make sure the simulation is reset, click on the File menu and click Reset MIPS64.
WinMIPS64 can be configured in many ways. You can change the structure and time requirements of the floating-point pipeline, and the code/data memory size. To view or change the standard settings, click Configure/Architecture (read this as: click Configure to open the menu, then click Architecture) and you will see the following settings:
You can change the settings by clicking in the appropriate field and editing the given numbers. Any changes to the Floating-point latencies will be reflected in the Pipeline window. The Code Address Bus refers to the actual number of wires in the address bus. So a value of 10 means that 2^10 = 1024 bytes of code memory will be displayed in the Code window. When you are finished, click OK to return to the main window.
Four more options in the Configure menu can be selected: Multi-Step, Enable Forwarding, Enable Branch Target Buffer and Enable Delay Slot. Of these, Enable Forwarding should be enabled, that is, a small tick (check mark) should be shown beside it. If this is not the case, click on the option.
You can change the size and/or position of child windows or bring up only one window using the maximise option for that window.
2. Loading a test program.
Use a standard text editor to create the file sum.s, a MIPS64 program that calculates the sum of two integers A and B from memory and stores the result in memory at location C.
        .data
A:      .word 10
B:      .word 8
C:      .word 0

        .text
main:
        ld r4,A(r0)
        ld r5,B(r0)
        dadd r3,r4,r5
        sd r3,C(r0)
        halt
A small command line utility asm.exe is provided to test a program for syntactical correctness. To check this program, type:
H:>asm sum.s
In order to be able to start the simulation, the program must be loaded into the main memory. To accomplish this, select File/Open. A list of assembler programs in the current directory appears in a window, including sum.s.
To load this file into WinMIPS64, do the following:
Click on sum.s
Click the Open button
The program is now loaded into the memory and the simulation is ready to begin.
You can view the content of code memory using the Code window, and observe the program data in the Data Window.
3. Simulation
3.1 Cycle-by-cycle Simulation
At any stage you can press F10 to restart the simulation from the beginning.
At the start you will note that the first line in the Code window with the address 0000 is coloured yellow. The IF stage in the Pipeline window is also coloured in yellow and contains the assembler mnemonic of the first instruction in the program. Now inspect the Code window and observe the first instruction ld r4,A(r0). Look in the Data window to find the program variable A.
Clock 1:
Pressing Execute/Single Cycle (or simply pressing F7) advances the simulation for one time step or one clock tick; in the Code Window, the colour of the first instruction is changed to blue and the second instruction is coloured in yellow. These colours indicate the pipeline stage the instruction is in (yellow for IF, blue for ID, red for EX, green for MEM, and purple for WB).
If you look in the IF stage in the Pipeline window, you can see that the second instruction ld r5, B(r0) is in the IF stage and the first instruction ld r4,A(r0) has advanced to the second stage, ID.
Clock 2:
Pressing F7 again will re-arrange the colours in the Code window, introducing red for the third pipeline stage EX. Instruction dadd r3,r4,r5 enters the pipeline. Note that the colour of an instruction indicates the stage in the pipeline that it will complete on the next clock tick.
Clock 3:
Pressing F7 again will re-arrange the colours in the Code window, introducing green for the fourth pipeline stage MEM. Instruction sd r3,C(r0) enters the pipeline. Observe the Clock Cycle Diagram which shows a history of which instruction was in each stage before each clock tick.
Clock 4:
Press F7 again. Each stage in the pipeline is now active with an instruction. The value that will end up in r4 has been read from memory, but has not yet been written back to r4. However it is available for forwarding from the MEM stage. Hence observe that r4 is displayed as green (the colour for MEM) in the Registers window. Can you explain the value of r4? Note that the last instruction halt has already entered the pipeline.
Clock 5:
Press F7 again. Something interesting happens. The value destined for r5 becomes available for forwarding. However the value for r5 was not available in time for the dadd r3,r4,r5 instruction to execute in EX. So it remains in EX, stalled. The status line reads "RAW stall in EX (R5)", indicating where the stall occurred, and which register's unavailability was responsible for it.
The Clock Cycle Diagram and the Pipeline window both show clearly that the dadd instruction is stalled in EX, and that the instructions behind it in the pipeline are also unable to progress. In the Clock Cycle Diagram, the dadd instruction is highlighted in blue, and the instructions behind it are shown in grey.
Clock 6:
Press F7. The dadd r3,r4,r5 instruction executes and its output, destined for r3, becomes available for forwarding. This value is 12 hex, which is 10 + 8 = 18 in decimal. This is our answer.
Clock 7:
Press F7. The halt instruction entering IF has had the effect of "freezing" the pipeline, so no new instructions are accepted into it.
Clock 8:
Press F7. Examine Data memory, and observe that the variable C now has the value 12 hex. The sd r3,C(r0) instruction wrote it to memory in the MEM stage of the pipeline, using the forwarded value for r3.
Clock 9:
Press F7.
Clock 10:
Press F7. The program is finished.
Look at the Statistics window and note that there has been one RAW stall. 10 clock cycles were needed to execute 5 instructions (5 instructions, plus 4 cycles to fill the five-stage pipeline, plus 1 stall cycle), so CPI = 10/5 = 2. This is artificially high due to the one-off start-up cost in clock cycles needed to initially fill the pipeline.
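The RAW stall arises because dadd reaches EX one cycle before the preceding ld can forward r5. The sketch below is a variant of sum.s that places an independent instruction between the second load and the dadd. The daddi r10,r0,1 filler is purely illustrative and is not part of the original sum.s; with forwarding enabled one would expect the RAW stall to disappear, but this should be checked in the simulator rather than taken as given.

```asm
        .data
A:      .word 10
B:      .word 8
C:      .word 0

        .text
main:
        ld r4,A(r0)
        ld r5,B(r0)
        daddi r10,r0,1    ; independent filler instruction (illustrative, not in sum.s)
        dadd r3,r4,r5     ; r5 should be available for forwarding when dadd reaches EX
        sd r3,C(r0)
        halt
```

Load this variant, step through it with F7, and compare the RAW stall count in the Statistics window with the count for the original sum.s.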
The statistics window is extremely useful for comparing the effects of changes in the configuration. Let us examine the effect of forwarding in the example. Until now, we have used this feature; what would the execution time have been without forwarding?
To accomplish this, click on Configure. To disable forwarding, click on Enable Forwarding (the tick must vanish). Repeat the cycle-by-cycle program execution, re-examine the Statistics window and compare the results. Note that there are more stalls, as instructions are held up in ID waiting for a register, and hence waiting for an earlier instruction to complete WB. The advantages of forwarding should be obvious.
3.2 Other execution modes
Click on File/Reset MIPS64. If you click on File/Full Reset, you will delete the data memory, so you will have to repeat the procedure for program loading. Clicking on File/Reload or F10 is a handy way to restart a simulation.
You can run the simulation for a specified number of cycles. Use Execute/Multi cycle... for this. The number of cycles stepped through can be changed via Configure/Multi-step.
You can run the whole program by a single key-press - press F4. Alternatively click on Execute/Run to.
Also, you can set breakpoints. Press F10. To set a break-point, double-left-click on the instruction, for example on dadd r3,r4,r5. Now press F4. The program will halt when this instruction enters IF. To clear the break-point, double-left-click on the same instruction again.
3.3 Terminal Output
The simulator supports a simple I/O device, which works like a simple dumb terminal screen, with some graphical capability. The output of a program can appear on this screen. To output the result of the previous program, modify it like this
        .data
A:      .word 10
B:      .word 8
C:      .word 0
CR:     .word32 0x10000
DR:     .word32 0x10008

        .text
main:
        ld r4,A(r0)
        ld r5,B(r0)
        dadd r3,r4,r5
        sd r3,C(r0)

        lwu r1,CR(r0)     ; Control Register
        lwu r2,DR(r0)     ; Data Register
        daddi r10,r0,1
        sd r3,(r2)        ; r3 output..
        sd r10,(r1)       ; ..to screen

        halt
After this program is executed you can see the result of the addition printed in decimal on the Terminal window. For a more complete example of the I/O capabilities, see the testio.s and hail.s example programs.
The Instruction set
The following assembler directives are supported
.data                 - start of data segment
.text                 - start of code segment
.code                 - start of code segment (same as .text)
.org    <n>           - start address
.space  <n>           - leave n empty bytes
.asciiz <s>           - enters zero terminated ascii string
.ascii  <s>           - enter ascii string
.align  <n>           - align to n-byte boundary
.word   <n1>,<n2>..   - enters word(s) of data (64-bits)
.byte   <n1>,<n2>..   - enter bytes
.word32 <n1>,<n2>..   - enters 32 bit number(s)
.word16 <n1>,<n2>..   - enters 16 bit number(s)
.double <n1>,<n2>..   - enters floating-point number(s)
where <n> denotes a number like 24, <s> denotes a string like "fred", and <n1>,<n2>.. denotes numbers separated by commas.
The integer registers can be referred to as r0-r31, or R0-R31, or $0-$31, or using standard MIPS pseudo-names, like $zero for r0, $t0 for r8 etc. Note that the size of an immediate is limited to 16 bits. The maximum size of an immediate register shift is 5 bits (so a shift by greater than 31 bits is illegal).
Floating point registers can be referred to as f0-f31, or F0-F31.
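As an illustration of the directives, the register naming and the 16-bit immediate limit, here is a minimal sketch. All labels and values (count, small, flags, name, buf, pi, 0x12345678) are made up for the example. The constant is built from two 16-bit pieces whose top bit is clear, so the result does not depend on how the immediates are extended.

```asm
        .data
count:  .word   3              ; one 64-bit word
small:  .word32 0x1234,0x5678  ; two 32-bit numbers
flags:  .byte   1,0,1          ; three bytes
name:   .asciiz "fred"         ; zero terminated string
buf:    .space  16             ; 16 empty bytes
        .align  8              ; continue on an 8-byte boundary
pi:     .double 3.14159        ; floating-point number

        .text
main:
        ld    $t0,count($zero) ; pseudo-names: $t0 is r8, $zero is r0
        l.d   f1,pi(r0)        ; load the double into FP register f1
        daddi r1,r0,0x1234     ; upper 16 bits of the constant
        dsll  r1,r1,16         ; shift amount 16 fits in the 5-bit field
        ori   r1,r1,0x5678     ; r1 = 0x12345678
        halt
```

A constant wider than 16 bits cannot be given as a single immediate, so it is assembled in steps. Likewise, a shift of 32 or more bits would need to be split into several shifts of at most 31 bits each.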
The following instructions are supported. Note reg is an integer register, freg is a floating-point (FP) register, and imm is an immediate value.
lb reg,imm(reg)         - load byte
lbu reg,imm(reg)        - load byte unsigned
sb reg,imm(reg)         - store byte
lh reg,imm(reg)         - load 16-bit half-word
lhu reg,imm(reg)        - load 16-bit half-word unsigned
sh reg,imm(reg)         - store 16-bit half-word
lw reg,imm(reg)         - load 32-bit word
lwu reg,imm(reg)        - load 32-bit word unsigned
sw reg,imm(reg)         - store 32-bit word
ld reg,imm(reg)         - load 64-bit double-word
sd reg,imm(reg)         - store 64-bit double-word
l.d freg,imm(reg)       - load 64-bit floating-point
s.d freg,imm(reg)       - store 64-bit floating-point
halt                    - stops the program
daddi reg,reg,imm       - add immediate
daddui reg,reg,imm      - add immediate unsigned
andi reg,reg,imm        - logical and immediate
ori reg,reg,imm         - logical or immediate
xori reg,reg,imm        - exclusive or immediate
lui reg,imm             - load upper half of register immediate
slti reg,reg,imm        - set if less than immediate
sltiu reg,reg,imm       - set if less than immediate unsigned
beq reg,reg,imm         - branch if pair of registers are equal
bne reg,reg,imm         - branch if pair of registers are not equal
beqz reg,imm            - branch if register is equal to zero
bnez reg,imm            - branch if register is not equal to zero
j imm                   - jump to address
jr reg                  - jump to address in register
jal imm                 - jump and link to address (call subroutine)
jalr reg                - jump and link to address in register
dsll reg,reg,imm        - shift left logical
dsrl reg,reg,imm        - shift right logical
dsra reg,reg,imm        - shift right arithmetic
dsllv reg,reg,reg       - shift left logical by variable amount
dsrlv reg,reg,reg       - shift right logical by variable amount
dsrav reg,reg,reg       - shift right arithmetic by variable amount
movz reg,reg,reg        - move if register equals zero
movn reg,reg,reg        - move if register not equal to zero
nop                     - no operation
and reg,reg,reg         - logical and
or reg,reg,reg          - logical or
xor reg,reg,reg         - logical xor
slt reg,reg,reg         - set if less than
sltu reg,reg,reg        - set if less than unsigned
dadd reg,reg,reg        - add integers
daddu reg,reg,reg       - add integers unsigned
dsub reg,reg,reg        - subtract integers
dsubu reg,reg,reg       - subtract integers unsigned
dmul reg,reg,reg        - signed integer multiplication
dmulu reg,reg,reg       - unsigned integer multiplication
ddiv reg,reg,reg        - signed integer division
ddivu reg,reg,reg       - unsigned integer division
add.d freg,freg,freg    - add floating-point
sub.d freg,freg,freg    - subtract floating-point
mul.d freg,freg,freg    - multiply floating-point
div.d freg,freg,freg    - divide floating-point
mov.d freg,freg         - move floating-point
cvt.d.l freg,freg       - convert 64-bit integer to a double FP format
cvt.l.d freg,freg       - convert double FP to a 64-bit integer format
c.lt.d freg,freg        - set FP flag if less than
c.le.d freg,freg        - set FP flag if less than or equal to
c.eq.d freg,freg        - set FP flag if equal to
bc1f imm                - branch to address if FP flag is FALSE
bc1t imm                - branch to address if FP flag is TRUE
mtc1 reg,freg           - move data from integer register to FP register
mfc1 reg,freg           - move data from FP register to integer register
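To show several of these instructions working together, the following sketch sums a small array with a loop. The labels vals, total and loop and the array contents are invented for the example, and it assumes Enable Delay Slot is switched off in the Configure menu.

```asm
        .data
vals:   .word 1,2,3,4        ; four 64-bit values
total:  .word 0

        .text
main:
        daddi r1,r0,0        ; r1 = byte offset into vals
        daddi r2,r0,4        ; r2 = number of elements remaining
        daddi r3,r0,0        ; r3 = running sum
loop:   ld    r4,vals(r1)    ; load the next element
        dadd  r3,r3,r4       ; add it to the sum
        daddi r1,r1,8        ; advance by one 64-bit word
        daddi r2,r2,-1       ; one fewer element to go
        bnez  r2,loop        ; repeat until the count reaches zero
        sd    r3,total(r0)   ; store the sum, 1+2+3+4 = 10 (a hex)
        halt
```

Stepping through this program is a convenient way to observe branch behaviour in the Clock Cycle Diagram, and to compare the Statistics window with and without Enable Branch Target Buffer.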
Set CONTROL = 1, Set DATA to Unsigned Integer to be output
Set CONTROL = 2, Set DATA to Signed Integer to be output
Set CONTROL = 3, Set DATA to Floating Point to be output
Set CONTROL = 4, Set DATA to address of string to be output
Set CONTROL = 5, Set DATA+5 to x coordinate, DATA+4 to y coordinate, and DATA to RGB colour to be output
Set CONTROL = 6, Clears the terminal screen
Set CONTROL = 7, Clears the graphics screen
Set CONTROL = 8, read the DATA (either an integer or a floating-point) from the keyboard
Set CONTROL = 9, read one byte from DATA, no character echo.
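As a sketch of how the CONTROL values above might be used, the following program writes a string to the Terminal window using CONTROL = 4. It reuses the control and data register addresses (0x10000 and 0x10008) from the terminal output example in section 3.3. The label msg and its text are made up for the example, and passing a label as the immediate of daddi is assumed to yield the address of the string.

```asm
        .data
msg:    .asciiz "Hello"    ; zero terminated string to print
CR:     .word32 0x10000    ; address of the control register
DR:     .word32 0x10008    ; address of the data register

        .text
main:
        lwu   r1,CR(r0)    ; r1 = address of control register
        lwu   r2,DR(r0)    ; r2 = address of data register
        daddi r3,r0,msg    ; r3 = address of the string (assumed label-as-immediate)
        sd    r3,(r2)      ; DATA = address of string
        daddi r10,r0,4     ; CONTROL value 4 = output string
        sd    r10,(r1)     ; writing CONTROL triggers the output
        halt
```

As in the earlier example, DATA is written first and CONTROL last, since it is the write to CONTROL that makes the output happen.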
Notes on the Pipeline Simulation
The pipeline simulation attempts to mimic as far as possible that described in Appendix A of Computer Architecture: A Quantitative Approach.
However in a few places alternative strategies were suggested, and we had to choose one or the other.
Stalls are handled where they arise in the pipeline, not necessarily in ID.
We decided to allow floating-point instructions to issue out of ID into their own pipelines, if available. There they either proceed or stall, waiting for their operands to become available. This has the advantage of allowing out-of-order completion to be demonstrated, but it can cause WAR hazards to arise. However the student can thus learn the advantages of register renaming.
Consider this simple program fragment:
        .text
        add.d f7,f7,f3
        add.d f7,f7,f4
        mul.d f4,f5,f6    ; WAR on f4
If the mul.d is allowed to issue, it could "overtake" the second add.d and write to f4 first. Therefore in this case the mul.d must be stalled in ID.
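Register renaming removes this hazard by giving the mul.d a destination that the earlier instructions do not read. In this sketch the choice of f8 is arbitrary and assumes that f8 is otherwise unused; any later instruction that needs the product would then have to read f8 instead of f4.

```asm
        .text
        add.d f7,f7,f3
        add.d f7,f7,f4
        mul.d f8,f5,f6    ; destination renamed from f4 to f8, so no WAR on f4
```

With the destination renamed, the mul.d no longer writes a register that the second add.d still has to read, so the reason for holding it in ID no longer applies.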
Structural hazards arise at the MEM stage bottleneck, as instructions attempt to exit more than one of the execute stage pipelines at the same time. Our simple rule is longest latency first. See page A-52
Installation
On your own computer, just install anywhere convenient, and create a short-cut to point at it. Note that winmips64 will write two initialisation files into this directory: winmips64.ini, which stores architectural details, and winmips64.las, which remembers the last .s file accessed.
On a network drive, install winmips64.exe into a suitable system directory. Then use a standard text editor to create a file called winmips64.pth, and put this file in the same directory.
The read-only file winmips64.pth should contain a single line path to a read-write directory private to any logged-in user. This directory will then be used to store their .ini and .las files.
For example winmips64.pth might contain
H:
or
c:\temp
But remember only a single line - don't press return at the end of it!