Playstation Emulation Guide

Lionel Flandrin
July 24, 2019

Contents
1 Introduction
1.1 Isn’t emulation complicated?
1.2 Feedback

2 The CPU: Instructions and the memory
2.1 What is a CPU, anyway?
2.2 Architecture
2.3 The code
2.4 The Program Counter register
2.4.1 Reset value of the PC
2.5 The Playstation memory map
2.5.1 Implementing the memory map
2.6 The BIOS
2.7 Loading the BIOS
2.8 The interconnect
2.9 Gluing the interconnect to the CPU
2.10 Instruction decoding
2.11 General purpose registers
2.11.1 The $zero register
2.11.2 The $ra register
2.12 Special purpose registers
2.13 Implementing the general purpose registers
2.14 LUI instruction
2.15 ORI instruction
2.16 Writing to memory
2.16.1 Unaligned memory access
2.16.2 Expansion mapping
2.17 Sign extension
2.18 SW instruction
2.19 SLL instruction
2.20 ADDIU instruction
2.21 RAM configuration register
2.22 J instruction
2.23 Branch delay slots
2.24 OR instruction
2.25 Type safety in the register interface
2.26 CACHE CONTROL register
2.27 The coprocessors
2.28 MTC0 instruction
2.29 BNE instruction
2.30 ADDI instruction
2.31 Memory loads
2.32 Load delay slots
2.33 LW instruction
2.34 The RAM
2.35 The coprocessor 0 registers
2.36 SLTU instruction
2.37 ADDU instruction
2.38 Regions
2.39 SH instruction
2.40 SPU registers
2.41 JAL instruction
2.42 ANDI instruction
2.43 SB instruction
2.44 Expansion 2
2.45 JR instruction
2.46 LB instruction
2.47 BEQ instruction
2.48 Expansion 1
2.49 RAM byte access
2.50 MFC0 instruction
2.51 AND instruction
2.52 ADD instruction
2.53 Interrupt Control registers
2.54 BGTZ instruction
2.55 BLEZ instruction
2.56 LBU instruction
2.57 JALR instruction
2.58 BLTZ, BLTZAL, BGEZ and BGEZAL instructions
2.59 SLTI instruction
2.60 SUBU instruction
2.61 SRA instruction
2.62 DIV instruction
2.63 MFLO instruction
2.64 SRL instruction
2.65 SLTIU instruction
2.66 DIVU instruction
2.67 MFHI instruction
2.68 SLT instruction
2.69 Interrupt Control read
2.70 Timer registers
2.71 Exceptions
2.72 SYSCALL instruction
2.73 MTLO instruction
2.74 MTHI instruction
2.75 RFE instruction
2.76 Exceptions and branch delay slots
2.77 ADD and ADDI overflows
2.78 Store and load alignment exceptions
2.79 PC alignment exception
2.80 RAM 16bit store
2.81 DMA registers
2.82 LHU instruction
2.83 SLLV instruction
2.84 LH instruction
2.85 NOR instruction
2.86 SRAV instruction
2.87 SRLV instruction
2.88 MULTU instruction
2.89 GPU registers
2.89.1 GP0: Draw Mode Setting command
2.90 Interrupt Control 16bit access
2.91 Timer registers 32bit access
2.92 GPUSTAT “DMA ready” field
2.93 XOR instruction
2.94 BREAK instructions
2.95 MULT instruction
2.96 SUB instruction
2.97 XORI instruction
2.98 Cop1, cop2 and cop3 opcodes
2.99 Non-aligned reads
2.99.1 LWL instruction
2.99.2 LWR instruction
2.100 Non-aligned writes
2.100.1 SWL instruction
2.100.2 SWR instruction
2.101 Coprocessor loads and stores
2.101.1 LWCn instructions
2.101.2 SWCn instructions
2.102 Illegal instructions

3 The DMA: Ordering tables and the GPU
3.1 DMA Control register
3.2 DMA Interrupt register
3.3 DMA Channel Control register
3.4 DMA Base Address register
3.5 DMA Block Control register
3.6 Depth Ordering Tables
3.7 DMA Clear Ordering Table channel
3.8 DMA Block copy
3.9 DMA Linked Lists
3.10 RAM to device GPU block copy

4 The GPU: Internal state and first commands
4.1 GPUSTAT register
4.2 GP0 Draw Mode Setting command
4.3 GP0 NOP command
4.4 GP1 Soft Reset command
4.5 The GPU renderer and the video output
4.6 GPUREAD register placeholder
4.7 GP1 Display Mode command
4.8 GP1 DMA direction command
4.9 DMA GP0 commands
4.10 GP0 Set Drawing Area commands
4.11 GP0 Set Drawing Offset command
4.12 GP0 Texture Window command
4.13 GP0 Mask Bit Setting command
4.14 GP1 Display VRAM Start command
4.15 GP1 Display Range commands
4.16 GP0 Monochrome Quadrilateral command
4.17 Interleaved video deadlock workaround
4.18 GP0 Clear Cache command
4.19 GP0 Load Image command
4.20 DMA image transfer
4.21 GP1 Display Enable command
4.22 GP0 Image Store command
4.23 GP0 Shaded Quadrilateral command
4.24 GP0 Shaded Triangle command
4.25 GP0 Textured Quadrilateral With Color Blending command
4.26 GP1 Acknowledge Interrupt command
4.27 GP1 Reset Command Buffer command

5 The GPU: Basic OpenGL renderer for the boot logo
5.1 Window and OpenGL context creation
5.2 Drawing the primitives
5.3 The vertex shader
5.4 The fragment shader
5.5 Compiling and linking the shaders
5.6 Vertex array objects
5.7 OpenGL rendering and synchronization
5.8 OpenGL debugging
5.9 Drawing quadrilaterals
5.10 Draw Offset emulation
5.11 Handling SDL2 events and exiting cleanly

6 The Interconnect: Generic loads and stores
6.1 Porting the CPU code
6.2 Porting the interconnect code
6.3 Porting the RAM and BIOS
6.4 Porting the GPU code
6.5 Porting the DMA code

7 The Debugger: Breakpoints and Watchpoints
7.1 Debugger memory access
7.2 Breakpoints
7.3 Watchpoints
7.4 Code disassembly and beyond

8 The CPU: Instruction cache
8.1 Instruction cache lookup behavior
8.2 Instruction cache fetch behavior

1 Introduction
This is my attempt at documenting my implementation of a PlayStation emulator
from scratch. I’ll write the document as I go and I’ll try to explain as much as
possible along the way. You can find the complete source of the emulator itself
in my GitHub repository.
Since my favourite pastime is to reinvent the wheel and recode things that
already exist I decided that this time I might as well document it. This way
maybe something useful will come out of it and it’ll give me the motivation
to finish it.
I will be using the Rust programming language but this is not meant as a
Rust tutorial and knowledge of the language shouldn’t be necessary to follow
this guide, although it won’t hurt.

1.1 Isn’t emulation complicated?

Emulation requires some low-level knowledge about how computers work and
some basics in electronics might help for certain things. Since this doc is meant as
an introduction to emulation I’ll assume that the reader doesn’t bring anything
with them beyond some decent programming skills. So don’t worry if you’re not
familiar with registers, cache, memory mapped IO, virtual memory, interrupts
and other low level fun: I’ll try to explain everything when needed. Emulators
are a good introduction to low level programming without having to bother with
that pesky hardware in person!
Since this is supposed to be a general guide about writing PlayStation
emulators I won’t put the entire source code of the emulator here, only snippets
relevant to the matter being discussed.
Finally, keep in mind that getting a PlayStation emulator capable of running
even some games decently will require quite a lot of work. Don’t expect to play
Final Fantasy VII on your brand new emulator in two days. If you want to start
with something simpler to see if you have a taste for it you can search for Chip-8,
Game Boy or NES emulation tutorials (in order of increasing complexity).

1.2 Feedback
If some part of this document is unclear, poorly written or incomplete please
submit an issue so that I can fix or complete it. Corrections for grammar, syntax
and typos are very welcome. Thank you!
Ready? Let’s begin!

2 The CPU: Instructions and the memory

2.1 What is a CPU, anyway?
That might seem like a silly question to some but I’m sure there are plenty
of competent programmers out there who are used to programming in high-level
managed environments and haven’t seen a register in their entire life. Let me
make the introductions.
For our first version of the PlayStation CPU I’m going to make some simplifying
assumptions. I’m going to ignore the caches for instance and assume that
it directly accesses the system bus. Basically we’re going to implement a Von
Neumann architecture. As we make progress we’ll have to revisit this design to
add the missing bits when they are needed.
The objective of this section is to implement all the instructions and try to
reach the part of the BIOS where it starts to draw on the screen. As we’ll see
there’s a bunch of boring initialization code to run before we get there.
There are 67 opcodes in the Playstation MIPS CPU. Some take one line to
implement, others will give us more trouble. In order to make the process more
interactive and less tedious we’ll implement them as they’re encountered while
we’re running the original BIOS code. This way we’ll immediately be able to see
our emulator in action.
But first things first, before we start implementing instructions we need to
explain how a CPU works.

2.2 Architecture
A simple Von Neumann architecture looks like this: the CPU only sees a flat
address space: an array of bytes. The PlayStation uses 32bit addresses so the
CPU sees 1 << 32 addresses. In other words it can address 4GB of memory.
That’s why the PlayStation is said to be a 32bit console (that and the fact that
it uses 32bit registers in the CPU as we’ll see in a minute).
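To make the arithmetic concrete, here's a tiny sketch. The helper name address_space_size is made up for this example and isn't part of the emulator; it just computes how many bytes an address of a given width can reach.

```rust
/// Number of distinct byte addresses reachable with `bits`-wide addresses.
/// Hypothetical helper for illustration only; valid for `bits` < 64.
pub fn address_space_size(bits: u32) -> u64 {
    // Do the shift on a 64bit integer: `1u32 << 32` would not fit
    // in a 32bit value.
    1u64 << bits
}
```

With 32bit addresses this gives 4294967296 bytes, which is 4 * 1024 * 1024 * 1024, in other words 4GB.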
This address space contains all the external resources the CPU can access:
the RAM of course but also the various peripherals (GPU, controllers, CD drive,
BIOS...). That’s called memory mapped IO. Note that in this context “memory”
doesn’t mean RAM. Rather it means that you access peripherals as if they were
memory (instead of using dedicated instructions for instance). From the point
of view of the CPU, everything is just a big array of bytes and it doesn’t really
know what’s out there.
Of course we’ll have to figure out how the devices and RAM are mapped in
this address space to make sure the transactions end up at the right location
when the CPU starts reading and writing to the bus. But first we need to
understand how the code is executed.

2.3 The code

In this architecture the instructions live in the global address space along with
everything else. Typically in RAM but again, the CPU doesn’t care. If you
want to run code from the controller input port I’m sure the console will let you.
Probably not very useful but it’s all the same as far as the CPU is concerned.
So somewhere in this 4GB address space there’s the next instruction for the
CPU to run. How does it know the address of this instruction? By using a
register of course!

2.4 The Program Counter register

Registers are very small and very fast special purpose memories built inside
the CPU. Most CPU instructions manipulate those registers by adding them,
multiplying them, masking them, storing their content to memory or fetching it
back. . .

The Program Counter (henceforth referred to as PC) is one of the most
elementary registers: it exists in one form or another on basically all computer
architectures (although it goes by various names; on x86 for instance it’s called
the Instruction Pointer, IP). Its job is simply to hold the address of the next
instruction to be run.
As we’ve seen, the PlayStation uses 32bit addresses, so the PC register is
32bit wide (as are all other CPU registers for that matter).
A typical CPU execution cycle goes roughly like this:

1. Fetch the instruction located at address PC,

2. Increment the PC to point to the next instruction,

3. Execute the instruction,

4. Repeat

We need to know how big an instruction is in order to know how many
bytes to fetch and how much we need to increment the PC to point at the
next instruction. Some architectures have variable length instructions (x86
and derivatives are a common example) which means we’d have to decode the
instruction to know how many bytes it takes. Fortunately for us, the PlayStation
uses a fixed length instruction set (the MIPS instruction set) and all instructions
are 32bit long.
With all that in mind we can finally start writing some code!
Here’s what the CPU state looks like at that point:
    /// CPU state
    pub struct Cpu {
        /// The program counter register
        pc: u32,
    }

And here’s the implementation of our CPU cycle described above:

    impl Cpu {
        pub fn run_next_instruction(&mut self) {
            let pc = self.pc;

            // Fetch instruction at PC
            let instruction = self.load32(pc);

            // Increment PC to point to the next instruction.
            self.pc = pc.wrapping_add(4);

            self.decode_and_execute(instruction);
        }
    }

In Rust wrapping_add means that we want the PC to wrap back to 0 in case
of an overflow (i.e. 0xfffffffc + 4 => 0x00000000). We’ll see that most CPU
operations wrap on overflow (although some instructions catch those overflows
and generate an exception, we’ll see that later).
If you’re coding in C you don’t need to worry about that if you use uint32_t
since the C standard mandates that unsigned overflow wraps around in this
fashion. Rust however treats an unchecked overflow as a bug and will panic
in debug builds if one is detected (release builds wrap silently by default),
that’s why I need to write pc.wrapping_add(4) instead of pc + 4.
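Here's a small sketch of the difference between the wrapping and the checked flavours of the addition. The helper names next_pc and next_pc_checked are hypothetical (the emulator simply inlines the wrapping_add call); wrapping_add and checked_add themselves are standard methods on Rust's integer types.

```rust
/// Address of the instruction following the one at `pc`.
/// Hypothetical helper: the CPU cycle inlines this computation.
pub fn next_pc(pc: u32) -> u32 {
    // Instructions are 4 bytes long. At the very end of the 32bit
    // address space the result wraps back to 0 instead of panicking.
    pc.wrapping_add(4)
}

/// Same computation, but the overflow is reported as `None`
/// instead of wrapping around.
pub fn next_pc_checked(pc: u32) -> Option<u32> {
    pc.checked_add(4)
}
```

For the emulator the wrapping version is the one we want: the PC is just a 32bit register and nothing special happens when it rolls over.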
We now finally have some code but it doesn’t build yet.
We’re still missing 3 pieces of the puzzle before we can run this piece of code:

• What’s the initial value of PC when starting up?

• How do we implement the load32 function?
• How do we implement the decode_and_execute function?

2.4.1 Reset value of the PC

In integrated circuits reset is a state where the chip generally does nothing and
its internal state is set to some known default “factory” value. What exactly
the reset does varies from chip to chip (it’s just a convention) but it’s assumed
that a chip will restart in a clean and deterministic state after a reset cycle.
Generally the reset is a dedicated pin on the chip that’s connected to a
button or some other control logic. Sometimes you can also request a “soft”
reset through software using a specific command or sequence of instructions.
Resetting a chip does not necessitate cutting off the power (nor is power cycling
an integrated circuit a good way to reset it: if the reset signal is not asserted
the chip might not load the default values correctly).
When you power up the console or hit the reset button the hardware forces
the CPU (and other peripherals) into a reset state to initialize the logic.
Knowing this it’s pretty obvious that the reset value of the PC is very
important since it’s going to tell the CPU where it should start running the code.
It basically defines the location of the “main” function of the console’s kernel.
The docs say that the reset value of PC is 0xbfc00000. In the PlayStation
memory map that’s the beginning of the BIOS (we’ll look at the memory map
in greater detail in the next section).
Now that we know where our story starts we can write our CPU initializer:
    impl Cpu {
        pub fn new() -> Cpu {
            Cpu {
                // PC reset value at the beginning of the BIOS
                pc: 0xbfc00000,
            }
        }

        // ...
    }

2.5 The Playstation memory map

Our CPU treats all addresses the same way but at some point we’ll have to
dispatch the load/store requests to the correct peripheral. If we read the BIOS
and we get GPU data instead we’re going to run into trouble very quickly...
So how do we know what is mapped at some arbitrary address? By using
the memory map of course!
Here’s an overview of the PlayStation memory map, courtesy of the Nocash
specs:

    KUSEG       KSEG0       KSEG1       Length  Description
    0x00000000  0x80000000  0xa0000000  2048K   Main RAM
    0x1f000000  0x9f000000  0xbf000000  8192K   Expansion Region 1
    0x1f800000  0x9f800000  0xbf800000  1K      Scratchpad
    0x1f801000  0x9f801000  0xbf801000  8K      Hardware registers
    0x1fc00000  0x9fc00000  0xbfc00000  512K    BIOS ROM

Table 1: Playstation memory map

Let’s take the time to parse through this.

We can see that most peripherals in table 1 are mapped at several addresses.
For instance the PC reset value 0xbfc00000 corresponds to the beginning
of the BIOS range in region KSEG1. However we can also reach the same
location through addresses 0x1fc00000 (KUSEG) and 0x9fc00000 (KSEG0).
What’s the point of having those mirrored regions? What’s the difference
between KUSEG and KSEG1 for instance? Those are memory regions which
are used to specify certain attributes of the memory access. On the Playstation
hardware it’s mostly used to specify whether the access is cached or not.
For now we’re going to ignore regions and treat all mappings the same, we’ll
study them more closely later on.
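To see that the three columns of table 1 really are mirrors of each other, here's a rough sketch. It relies on an assumption read off the table itself: the KUSEG, KSEG0 and KSEG1 addresses only differ in their three most significant bits. The function name strip_region is made up for this example, and this is not necessarily how the emulator will end up handling regions.

```rust
/// Strip the region bits from one of the addresses listed in table 1.
/// Assumption: the KUSEG, KSEG0 and KSEG1 mirrors only differ in
/// their 3 most significant bits. Not meaningful for KSEG2.
pub fn strip_region(addr: u32) -> u32 {
    addr & 0x1fff_ffff
}
```

Feeding it 0xbfc00000, 0x9fc00000 or 0x1fc00000 gives 0x1fc00000 every time: the same BIOS location seen through three different regions.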

    KSEG2       Length  Description
    0xfffe0000  512B    I/O Ports

Table 2: KSEG2 memory map

Table 2 shows the last region: KSEG2. It’s a bit different from the others.
It doesn’t mirror the other regions, instead it gives access to a unique set of
registers. As far as I know the only important register there is the cache control
but there might be others I haven’t encountered yet.

2.5.1 Implementing the memory map

In order to implement the PlayStation memory map in our emulator we will need
an interconnect to dispatch the load/store operations to the correct peripheral.
I don’t know if the PlayStation really has a hardware interconnect. The CPU
could just “broadcast” the read/write operations on the system bus and the
peripherals would check the address and only answer if it’s for them. However this
design would be inefficient in software: we’d need to iterate over the peripherals
for each transaction until we find the correct receiver.
Instead we’re just going to implement a “switchboard” that will match the
address to the correct peripheral and forward it there.
Since the first thing the emulator will run is the BIOS we’ll use it as our first
peripheral.

2.6 The BIOS

On the PlayStation the BIOS displays the first screens (with the logos and that
memorable sweeping tune) and starts the game from the CD drive. If no CD is
present it displays a menu that can be used to manage the memory cards and
play CDs. As a player that’s probably the only time you’d know there was a
BIOS running.
But that’s just the tip of the iceberg! The BIOS remains loaded at all times
and provides a Basic Input/Output System to the running game. That means
that the game can call into the BIOS to do things like allocating memory, reading
the memory card, calling common libc functions (qsort, memset...) and many
other things.
We won’t be implementing the BIOS ourselves. It’s possible (and it’s been
done) but that’s a lot of work and probably something you’d want to do once
you have a working emulator. It might also hurt compatibility since many games
are known to patch the BIOS at runtime. The Nocash specs have more info.
We could dump the BIOS of a console but that requires access to the actual
hardware and the know-how to access the BIOS memory. Fortunately some nice
people have done it for us and these days it’s easy to find BIOS files on the web.
There are many BIOS versions: they change depending on the region, the
hardware revision and patches. Any good dump should work (after all, they all
do more or less the same thing) but if you’re following this guide it’s probably
better that we use the same file.

Algorithm Hash
MD5 924e392ed05558ffdb115408c263dccf
SHA-1 10155d8d6e6e832d6ea66db9bc098321fb5e8ebf

Table 3: SCPH1001.BIN BIOS checksums

I’ve decided to go for the version named SCPH1001.BIN. The file should be
exactly 512KB in size. Check table 3 to make sure you got the right one.

2.7 Loading the BIOS

Once we have our BIOS the rest is pretty straightforward. We just read the file
into a 512KB buffer:
    use std::fs::File;
    use std::io::{Error, ErrorKind, Read, Result};
    use std::path::Path;

    /// BIOS image
    pub struct Bios {
        /// BIOS memory
        data: Vec<u8>,
    }

    impl Bios {
        /// Load a BIOS image from the file located at `path`
        pub fn new(path: &Path) -> Result<Bios> {
            let file = File::open(path)?;

            let mut data = Vec::new();

            // Load the BIOS, reading at most BIOS_SIZE bytes
            file.take(BIOS_SIZE).read_to_end(&mut data)?;

            if data.len() == BIOS_SIZE as usize {
                Ok(Bios { data })
            } else {
                Err(Error::new(ErrorKind::InvalidInput,
                               "Invalid BIOS size"))
            }
        }
    }

    /// BIOS images are always 512KB in length
    const BIOS_SIZE: u64 = 512 * 1024;

We also need to be able to read data from the BIOS. The CPU wants to
read 32bit of data to load the instructions so let’s start by implementing load32:
    impl Bios {
        // ...

        /// Fetch the 32bit little endian word at `offset`
        pub fn load32(&self, offset: u32) -> u32 {
            let offset = offset as usize;

            let b0 = self.data[offset + 0] as u32;
            let b1 = self.data[offset + 1] as u32;
            let b2 = self.data[offset + 2] as u32;
            let b3 = self.data[offset + 3] as u32;

            b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)
        }
    }

A few things to note: offset, as its name implies, is not the absolute address
used by the CPU, it’s just the offset in the BIOS memory range. Remember
that the BIOS is mapped in multiple regions so we’ll handle that in the generic
interconnect code. Each peripheral will just handle offsets in its address range.
In the comment I mention that we read the word in little endian. That’s
important. If you’ve never had to worry about endianness issues before let me
give you the gist.
The basic unit of memory is a byte (8 bits in our case). You cannot address
anything smaller than that. However sometimes you need to store data over
multiple bytes. For instance we’ve seen that our instructions are 4 bytes long. We
have multiple ways to store 4-byte words in our “array of bytes”.
Let’s take an example: you have the 32bit word 0x12345678. You have
multiple ways to store that value in 4 consecutive bytes. We can store [0x12,
0x34, 0x56, 0x78] or [0x78, 0x56, 0x34, 0x12] for instance. The former is called
big-endian because we store the most significant byte first. The latter is little-endian
because we store the least significant byte first. There are other endian
types with weirder patterns but they’re not often used in modern computers.
Check Wikipedia if you want more details.
The PlayStation is little-endian so we’re in the 2nd case: when reading or
writing multi-byte values the least significant byte goes first. If we do it the
other way around we’ll end up with garbage.
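If you want to play with the 0x12345678 example, here's a short sketch. The three function names are made up for this illustration; to_le_bytes and to_be_bytes are standard methods on Rust's integer types, and the hand-written shifts follow the same pattern as load32 above.

```rust
/// Split a 32bit word into bytes, least significant byte first.
pub fn to_little_endian(word: u32) -> [u8; 4] {
    word.to_le_bytes()
}

/// Split a 32bit word into bytes, most significant byte first.
pub fn to_big_endian(word: u32) -> [u8; 4] {
    word.to_be_bytes()
}

/// Rebuild a word from 4 little-endian bytes by hand, using the
/// same shift-and-or pattern as `load32`.
pub fn from_little_endian(b: [u8; 4]) -> u32 {
    (b[0] as u32)
        | ((b[1] as u32) << 8)
        | ((b[2] as u32) << 16)
        | ((b[3] as u32) << 24)
}
```

Rust also provides u32::from_le_bytes, which does the same job as the hand-written version; I spell out the shifts here only to make the byte order visible.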
Now we can implement our interconnect to let the CPU communicate with
the BIOS.

2.8 The interconnect

We now have an embryo of a CPU and our first device ready to talk to each
other. We just need to figure out how to link them together.
At that point we could have the CPU talk directly to the BIOS; after all it’s
our only device. Obviously that won’t work for very long, however: we need to be
able to dispatch the CPU’s loads and stores to the correct peripheral depending
on the address range.
I’m not quite sure how this is handled on the actual hardware. For simple
buses it’s very possible that the CPU just “broadcasts” the address to all the
peripherals and each of them just checks if it’s within their address range and
simply ignores the transaction if they see it’s not for them. It’s fast in hardware
because all peripherals work in parallel so there’s no delay induced: they can all
receive and decode the address at the same moment.
Unfortunately we can’t really do that in software: the closest equivalent
would be to spawn a thread for each peripheral. The problem is that memory
transactions are very common (several million per second potentially) and having
to send data and resynchronize across threads would kill our performance.
Multithreading emulators in general is a very tough issue: for threading to be
really efficient you need to reduce data exchange and resynchronization as much
as possible to let each thread live its life. When we emulate however we want to
mimic the original hardware behaviour and speed as much as possible which
requires very frequent resynchronization and we have plenty of shared state.
The two endeavors are somewhat at odds. That’s not to say multithreading
is impossible in emulators, just that it’s hard. We can’t just spawn threads
willy-nilly.
Anyway, back to our interconnect: since threads are out it means we’ll have
to sequentially match the address against each mapping until we get a match.
Then we can let the selected peripheral handle the transaction.
Let’s do just that:
/// Global interconnect
pub struct Interconnect {
    /// Basic Input/Output memory
    bios: Bios,
}

impl Interconnect {
    pub fn new(bios: Bios) -> Interconnect {
        Interconnect {
            bios: bios,
        }
    }
}

I’ve decided to store the BIOS directly in the interconnect struct. We’ll
append the other peripherals there as we implement them. We are going to store
the interconnect inside the struct Cpu which will give us a device tree with the
CPU at the top. It makes the data paths pretty simple: everything goes from
the CPU to the peripherals. It’s easier to reason about than a full “everybody
sees everybody” architecture in my opinion but it might prove limiting as we
progress. We’ll see if we need to revise that later.
Now we can finally implement the load32 function that the CPU will be
using. I don’t like having hardcoded constants all over the place so I’m going to
tie the address ranges to nice symbolic names:
mod map {
    // The type is public since it appears in the public constants below
    pub struct Range(u32, u32);

    impl Range {
        /// Return `Some(offset)` if addr is contained in `self`
        pub fn contains(self, addr: u32) -> Option<u32> {
            let Range(start, length) = self;

            if addr >= start && addr < start + length {
                Some(addr - start)
            } else {
                None
            }
        }
    }

    pub const BIOS: Range = Range(0xbfc00000, 512 * 1024);
}

If you’re not familiar with Rust, what this does is create a new type Range
which is a tuple of two values: the start address and length of the mapping.
I also declare a contains method which takes an address and returns
Some(offset) if the address is within the range, None otherwise. You can think
of it as a form of multiple return values with some nice type-safety on top.
Finally I declare our first range for the BIOS.
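As a quick illustration of what contains returns at the edges of a mapping, here is a standalone sketch. It is a variant of the idea rather than the guide's exact code: it compares the wrapped difference addr - start against the length, which is my own choice to avoid computing start + length (that sum could overflow for a range ending at the very top of the address space).

```rust
/// Address range: (start address, length in bytes)
#[derive(Clone, Copy)]
pub struct Range(pub u32, pub u32);

impl Range {
    /// Return `Some(offset)` if `addr` is contained in `self`
    pub fn contains(self, addr: u32) -> Option<u32> {
        let Range(start, length) = self;

        // When `addr < start` the subtraction wraps around to a huge
        // value, so a single comparison checks both bounds.
        let offset = addr.wrapping_sub(start);

        if offset < length {
            Some(offset)
        } else {
            None
        }
    }
}

/// The BIOS mapping used in the text: 512KB starting at 0xbfc00000
pub const BIOS: Range = Range(0xbfc0_0000, 512 * 1024);
```

The first BIOS address 0xbfc00000 maps to Some(0), the last one 0xbfc7ffff maps to Some(0x7ffff), and both 0xbfbfffff and 0xbfc80000 fall outside the range and give None.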
Now for the load32 function:
impl Interconnect {
    // ...

    /// Load 32bit word at `addr`
    pub fn load32(&self, addr: u32) -> u32 {
        if let Some(offset) = map::BIOS.contains(addr) {
            return self.bios.load32(offset);
        }

        panic!("unhandled fetch32 at address {:08x}", addr);
    }
}

The if let syntax is another Rust nicety: if the contains function returns
Some(offset) we enter the body of the if with offset bound to a temporary
variable. If contains returns None on the other hand the if is refuted, so we
don’t enter the body and go straight to the panic! macro which will make
our emulator crash.

2.9 Gluing the interconnect to the CPU

The only thing left before we can finally build our code is gluing the Interconnect
with the Cpu.
We add an inter member to the struct Cpu and take an Interconnect
object in the constructor:
/// CPU state
pub struct Cpu {
    /// The program counter register
    pc: u32,
    /// Memory interface
    inter: Interconnect,
}

impl Cpu {
    pub fn new(inter: Interconnect) -> Cpu {
        Cpu {
            // PC reset value at the beginning of the BIOS
            pc: 0xbfc00000,
            inter: inter,
        }
    }

    // ...
}

We can also implement the load32 function for the CPU which will just call
the interconnect.
impl Cpu {
    // ...

    /// Load 32bit value from the interconnect
    fn load32(&self, addr: u32) -> u32 {
        self.inter.load32(addr)
    }
}

We’re still lacking the decode_and_execute function, let’s use a placeholder
function that just panics for now:
impl Cpu {
    // ...

    fn decode_and_execute(&mut self, instruction: u32) {
        panic!("Unhandled instruction {:08x}",
               instruction);
    }
}

Finally we can instantiate everything in our main function:

fn main() {
    let bios = Bios::new(&Path::new("roms/SCPH1001.BIN")).unwrap();

    let inter = Interconnect::new(bios);

    let mut cpu = Cpu::new(inter);

    loop {
        cpu.run_next_instruction();
    }
}

I’ve hardcoded the BIOS path for now. It would be better to read it from
the command line, a config file or even some fancy dialog window but it’ll do
nicely for now.
We should now be able to build the code. When I run it, assuming that the
BIOS file was found at the correct location, I get:

thread '<main>' panicked at 'Unhandled instruction 3c080013'

As expected the decode_and_execute function died on us but we managed
to fetch an instruction. If you’ve been using the same BIOS file as me you should
have exactly the same value of 0x3c080013. If you got another value something
is wrong with your code. In particular if you end up with 0x1300083c it means
you’re erroneously reading in big-endian.

2.10 Instruction decoding
We’ve now fetched our first instruction from the BIOS: 0x3c080013. What do
we do with this?
In order to be able to run this instruction we need to decode it to figure out
what it means. Instruction encoding is of course CPU dependent so we need
to interpret this value in the context of the Playstation R3000 processor. Once
again the Nocash specs have our back and list the format of the instruction.
MIPS is a common architecture used outside of the Playstation and you can find
plenty of resources online describing its instruction set.
Let’s decode this one by hand to see how this works. If we look at the
“Opcode/Parameter Encoding” table in Nocash’s docs we see that we need to
look at the bits [31:26] of the operation to see what type it is. In our case they
are 001111. That means the operation is a LUI or “Load Upper Immediate”.
Immediate means that the value loaded is directly in the instruction, not indirectly
somewhere else in memory. Upper means that it’s loading this immediate value
into the high 16 bits of the target register. The 16 low bits are cleared (set to 0).
But what are the register and the value used by the instruction? Well we
need to finish decoding it to figure it out: for a LUI bits [20:16] give us the
target register: in our case it’s 01000 which means it’s register 8. Finally bits
[15:0] contain the immediate value: 0000 0000 0001 0011 or 19 in decimal.
Bits [25:21] are not used and their value doesn’t matter.
In other words this instruction puts 0x13 in the 16 high bits of the register 8.
In MIPS assembly [1] it would be equivalent to:
lui $8, 0x13
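The manual decoding above can be double-checked with a few shifts and masks. This is only a sketch: decode_fields is a throwaway helper invented for the illustration, not something the emulator will keep.

```rust
/// Hypothetical helper: split a raw instruction word into the fields
/// used by LUI. Returns (opcode, target register, immediate).
fn decode_fields(word: u32) -> (u32, u32, u32) {
    // Bits [31:26]: the operation type
    let opcode = word >> 26;
    // Bits [20:16]: the target register index
    let target = (word >> 16) & 0x1f;
    // Bits [15:0]: the immediate value
    let imm = word & 0xffff;

    (opcode, target, imm)
}
```

For 0x3c080013 this gives the opcode 0b001111 (LUI), register 8 and the immediate 0x13, matching what we worked out by hand.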

Enough babbling, let’s implement decoding. First I’ll wrap the raw instruction
in a nice interface that will let us extract the fields without doing the bitshifts
and masking everywhere. If you look at the encoding for other MIPS instructions
you’ll see that it’s fairly regular, for instance immediate values are always stored
in the LSBs:
// Copy lets the accessors take `self` by value without consuming
// the instruction
#[derive(Clone, Copy)]
struct Instruction(u32);

impl Instruction {
    /// Return bits [31:26] of the instruction
    fn function(self) -> u32 {
        let Instruction(op) = self;

        op >> 26
    }

    /// Return register index in bits [20:16]
    fn t(self) -> u32 {
        let Instruction(op) = self;

        (op >> 16) & 0x1f
    }

    /// Return immediate value in bits [15:0]
    fn imm(self) -> u32 {
        let Instruction(op) = self;

        op & 0xffff
    }
}

[1] I’m using the GNU assembler syntax in this guide unless otherwise noted.

The names for the accessor functions match those I’ve seen used in the various
references to name the various fields.
We can now leverage that fancy interface in decode_and_execute:
impl Cpu {
    // ...

    fn decode_and_execute(&mut self, instruction: Instruction) {
        match instruction.function() {
            0b001111 => self.op_lui(instruction),
            _ => panic!("Unhandled instruction {:x}",
                        instruction.0),
        }
    }

    /// Load Upper Immediate
    fn op_lui(&mut self, instruction: Instruction) {
        let i = instruction.imm();
        let t = instruction.t();

        panic!("what now?");
    }
}

We’re very close to finally running our first instruction in full but we’re still
missing something: we see that the register field in this instruction is 5bits, which
means it can index 32 registers. But for now we only have one register in our
CPU: the PC. We need to introduce the rest of them.

2.11 General purpose registers

Register      Name          Conventional use

$0            $zero         Always zero
$1            $at           Assembler temporary
$2, $3        $v0, $v1      Function return values
$4...$7       $a0...$a3     Function arguments
$8...$15      $t0...$t7     Temporary registers
$16...$23     $s0...$s7     Saved registers
$24, $25      $t8, $t9      Temporary registers
$26, $27      $k0, $k1      Kernel reserved registers
$28           $gp           Global pointer
$29           $sp           Stack pointer
$30           $fp           Frame pointer
$31           $ra           Function return address

Table 4: R3000 CPU general purpose registers

Table 4 lists the registers in the Playstation MIPS R3000 CPU (ignoring the
coprocessors for now). They’re all 32bit wide.
You can see that we have 32 registers ($0 to $31) which are the general
purpose registers. They’re all given a mnemonic used when writing assembly.
For instance, by convention, $29 is the stack pointer ($sp) and $30 holds the
frame pointer ($fp).
It’s important to understand that those are just a convention between
developers, in the hardware there’s no difference between $29 and $30. The point
of those calling conventions is to keep code generated by different compilers
or written in assembly by different coders interoperable. If you write MIPS
assembly and want to call third party functions (like the BIOS functions for
instance) you’ll have to adhere to this convention.
Only two general purpose registers are given a special meaning by the
hardware itself: $zero and $ra.

2.11.1 The $zero register

$zero ($0) is always equal to 0. If an instruction attempts to load a value in this
register it doesn’t do anything, the register will still be 0 afterwards.
Having a constant 0 register is useful to reduce the size of the instruction set.
For instance if you want to move the value of the register $v0 in $a0 you can
write this:
move $a0, $v0

However this “move” instruction is not actually part of the MIPS instruction
set, it’s just a convenient shorthand understood by the assembler which will
generate the equivalent instruction:
addu $a0, $v0, $zero

We can see that it effectively does the same thing by setting $a0 to the result
of $v0 + 0 but we avoid having to implement a dedicated “move” instruction in
the CPU.

2.11.2 The $ra register

$ra ($31) is the other general purpose register given a special meaning by the
hardware since instructions like “jump and link” or “branch and link” put the
return address in this register exclusively. Therefore the following instruction
jumps to the function foo and puts the return address in $ra:
jal foo

As we’ll soon see we don’t really have to bother with the various roles assigned
to those general purpose registers when writing our emulator (with the exception
of $zero and $ra) but it’s still useful to know the convention when trying to
understand what some emulated code is doing.

2.12 Special purpose registers

Name    Description

PC      Program counter
HI      High 32bits of multiplication result; remainder of division
LO      Low 32bits of multiplication result; quotient of division

Table 5: R3000 CPU special purpose registers

Table 5 lists the three special purpose CPU registers. We’re already familiar
with the PC used to keep track of the code execution. The two others are HI and
LO which contain the results of multiplication and division instructions. Those
cannot be used as general purpose registers, instead there are special instructions
used to manipulate them. We’ll discover them as we implement them.

2.13 Implementing the general purpose registers

I’m just going to represent the 32 general purpose registers as an array of 32 u32
and use the index in the instructions to address them. I’ll even have an entry
for $zero even though it’s always 0 to avoid special cases. Of course we’ll have
to be careful to always keep its value to 0.
/// CPU state
pub struct Cpu {
    /// The program counter register
    pc: u32,
    /// General Purpose Registers.
    /// The first entry must always contain 0.
    regs: [u32; 32],
    /// Memory interface
    inter: Interconnect,
}

The registers are not initialized on reset, so they contain garbage values when
we start up. For the sake of our emulator being deterministic I won’t actually
put random values in the registers however, instead I’m going to use an arbitrary
garbage value 0xdeadbeef. We could as well initialize them to 0 but I prefer
to use a more distinguishable value which can be helpful while debugging. We
must remember to put 0 in $zero however.
impl Cpu {
    pub fn new(inter: Interconnect) -> Cpu {
        // Not sure what the reset values are...
        let mut regs = [0xdeadbeef; 32];

        // ... but R0 is hardwired to 0
        regs[0] = 0;

        Cpu {
            // PC reset value at the beginning of the BIOS
            pc: 0xbfc00000,
            regs: regs,
            inter: inter,
        }
    }

    fn reg(&self, index: u32) -> u32 {
        self.regs[index as usize]
    }

    fn set_reg(&mut self, index: u32, val: u32) {
        self.regs[index as usize] = val;

        // Make sure R0 is always 0
        self.regs[0] = 0;
    }

    // ...
}

I’ve also added a getter and a setter. They’re very straightforward but I take
care to always write 0 in $zero in case it gets overwritten. I don’t even bother
checking if the function wrote in this register or another one: writing a 32bit
value is cheap and probably cheaper than adding an if. It’s also important to
note that the BIOS does try to write to $zero; it is believed that this is useful to
discard an I/O result without having to waste a register.
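To see the $zero handling in isolation, here is a small sketch that pulls the register file out of the Cpu. The Regs type is hypothetical and only exists for this illustration; the logic mirrors the getter and setter described above.

```rust
/// Hypothetical standalone register file, only meant to illustrate
/// the `$zero` handling outside of the full `Cpu` struct.
struct Regs {
    regs: [u32; 32],
}

impl Regs {
    fn new() -> Regs {
        // Arbitrary but recognizable garbage value
        let mut regs = [0xdeadbeef_u32; 32];

        // R0 is hardwired to 0
        regs[0] = 0;

        Regs { regs }
    }

    fn reg(&self, index: u32) -> u32 {
        self.regs[index as usize]
    }

    fn set_reg(&mut self, index: u32, val: u32) {
        self.regs[index as usize] = val;

        // Unconditionally restore R0 instead of testing `index`
        self.regs[0] = 0;
    }
}
```

Writing any value to register 0 through set_reg leaves it at 0, while the other registers keep whatever was stored in them.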

2.14 LUI instruction

Now we can finally implement our first instruction in full! Here’s what op_lui
looks like now:
impl Cpu {
    // ...

    /// Load Upper Immediate
    fn op_lui(&mut self, instruction: Instruction) {
        let i = instruction.imm();
        let t = instruction.t();

        // Low 16bits are set to 0
        let v = i << 16;

        self.set_reg(t, v);
    }
}

Note that the low 16bits are set to 0. It’s important as we’ll see with the
next instruction.
The first instruction in the BIOS uses LUI to put 0x13 in the high 16bits of
$8.

2.15 ORI instruction

We can directly implement the 2nd instruction: 0x3508243f.
It decodes to:
ori $8, $8, 0x243f

In other words, it puts the result of the bitwise or of $8 and 0x243f back into
$8. The previous LUI initialized the high 16bits of $8 and set the rest to 0 so
this one will initialize the low 16bits.
That’s the simplest way to load a constant in a register with the MIPS
instruction set and that’s why it’s important for LUI to set the low 16bits to 0,
otherwise the ORI wouldn’t do the right thing.
The implementation is straightforward:
impl Cpu {
    // ...

    fn decode_and_execute(&mut self, instruction: Instruction) {
        match instruction.function() {
            0b001111 => self.op_lui(instruction),
            0b001101 => self.op_ori(instruction),
            _ => panic!("Unhandled instruction {:x}",
                        instruction.0),
        }
    }

    /// Bitwise Or Immediate
    fn op_ori(&mut self, instruction: Instruction) {
        let i = instruction.imm();
        let t = instruction.t();
        let s = instruction.s();

        let v = self.reg(s) | i;

        self.set_reg(t, v);
    }
}

After those two instructions the value of $8 should be 0x0013243f. The next
instruction is another LUI which puts 0x1f800000 in $1.
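The ORI code relies on an s accessor that the listings so far don't show. In the MIPS encoding the source register index sits in bits [25:21] (the field that LUI ignores), so a sketch of it could look like the following. The one-line accessor bodies and the lui_then_ori helper are my own shorthand for the illustration.

```rust
#[derive(Clone, Copy)]
struct Instruction(u32);

impl Instruction {
    /// Return bits [31:26] of the instruction
    fn function(self) -> u32 {
        self.0 >> 26
    }

    /// Return register index in bits [25:21]
    fn s(self) -> u32 {
        (self.0 >> 21) & 0x1f
    }

    /// Return register index in bits [20:16]
    fn t(self) -> u32 {
        (self.0 >> 16) & 0x1f
    }

    /// Return immediate value in bits [15:0]
    fn imm(self) -> u32 {
        self.0 & 0xffff
    }
}

/// Hypothetical helper: value left in the target register by a `lui`
/// followed by an `ori` on that same register.
fn lui_then_ori(lui: Instruction, ori: Instruction) -> u32 {
    // LUI clears the low 16 bits...
    let upper = lui.imm() << 16;

    // ...so the ORI can fill them in
    upper | ori.imm()
}
```

Decoding 0x3508243f this way gives the opcode 0b001101 with both s and t equal to 8 and an immediate of 0x243f. Combining it with the earlier 0x3c080013 yields 0x0013243f.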

2.16 Writing to memory

The next instruction, 0xac281010, is going to give us a little more trouble. It
decodes to the “store word” instruction:
sw $8, 0x1010($1)

If you’re not familiar with GNU assembly syntax the 0x1010($1) syntax
means “address in $1 plus offset 0x1010”. In this case the full instruction is “store
the 32bits in register $8 at the location $1 + 0x1010”. Given the current values
of the $1 and $8 registers it would store 0x0013243f at the address 0x1f801010.
We can implement the storing to memory by mirroring our load32 code:
impl Cpu {
    // ...

    /// Store 32bit value into the memory
    fn store32(&mut self, addr: u32, val: u32) {
        self.inter.store32(addr, val);
    }

    fn decode_and_execute(&mut self, instruction: Instruction) {
        match instruction.function() {
            0b001111 => self.op_lui(instruction),
            0b001101 => self.op_ori(instruction),
            0b101011 => self.op_sw(instruction),
            _ => panic!("Unhandled instruction {:x}",
                        instruction.0),
        }
    }

    // ...

    /// Store Word
    fn op_sw(&mut self, instruction: Instruction) {
        let i = instruction.imm();
        let t = instruction.t();
        let s = instruction.s();

        let addr = self.reg(s).wrapping_add(i);

        let v = self.reg(t);

        self.store32(addr, v);
    }
}

This code for op_sw is actually subtly broken, I’ll explain why in a moment.
For these values of addr and i it’ll do the right thing though. You can see that
we call into the interconnect’s store32 method that we have yet to implement.
Since the only peripheral we support so far is the BIOS ROM and we can’t write
to it there’s not much we can do at that point, let’s just log the access and panic:
impl Interconnect {
    // ...

    /// Store 32bit word `val` into `addr`
    pub fn store32(&mut self, addr: u32, val: u32) {
        panic!("unhandled store32 into address {:08x}", addr);
    }
}

2.16.1 Unaligned memory access

While we’re at it I just realized that so far we allow 32bit fetch and store from
and to any address. However the architecture won’t allow unaligned memory
accesses (i.e. 32bit accesses must have an address which is a multiple of 4 bytes).
Many architectures don’t support unaligned accesses (they generate a “bus error”)
and those that do usually implement them at a cost (unaligned accesses are slower).
I’d rather add some code in the functions to catch unaligned accesses, it could
help us catch unexpected behaviours when debugging:
impl Interconnect {
    // ...

    /// Load 32bit word at `addr`
    pub fn load32(&self, addr: u32) -> u32 {
        if addr % 4 != 0 {
            panic!("Unaligned load32 address: {:08x}", addr);
        }

        // ...
    }

    /// Store 32bit word `val` into `addr`
    pub fn store32(&mut self, addr: u32, val: u32) {
        if addr % 4 != 0 {
            panic!("Unaligned store32 address: {:08x}", addr);
        }

        // ...
    }
}

Once we implement exceptions we’ll be able to handle those conditions
properly.
The code should now compile but unsurprisingly it won’t manage to execute
the SW instruction in full:
'<main>' panicked at 'unhandled store32 into address 1f801010'
The address is not part of the BIOS and therefore we don’t support it yet.
We can figure out where we’re trying to write by going back to the memory map
in table 1. We can see that we end up in the “Hardware registers” range.

Looking at the specs we see that registers in this range are for “memory
control”. They’re mainly used to set things like access latencies to the various
peripherals. We’re going to hope we don’t need to emulate those very low level
settings so we’ll ignore the writes to those registers for now.

2.16.2 Expansion mapping

There are two memory control registers we need to be careful about however:
registers 0x1f801000 and 0x1f801004 contain the base address of the expansion
1 and 2 register maps. We could emulate dynamic mappings but apparently on
the Playstation they’re always at 0x1f000000 and 0x1f802000 respectively so
we’re just going to hardcode those addresses and raise an error if the BIOS or a
game ever attempts to remap them to something else (which hopefully shouldn’t
ever happen).
impl Interconnect {
    // ...

    /// Store 32bit word `val` into `addr`
    pub fn store32(&mut self, addr: u32, val: u32) {
        // ...

        if let Some(offset) = map::MEM_CONTROL.contains(addr) {
            match offset {
                0 => // Expansion 1 base address
                    if val != 0x1f000000 {
                        panic!("Bad expansion 1 base address: 0x{:08x}", val);
                    },
                4 => // Expansion 2 base address
                    if val != 0x1f802000 {
                        panic!("Bad expansion 2 base address: 0x{:08x}", val);
                    },
                _ =>
                    println!("Unhandled write to MEM_CONTROL register"),
            }
            return;
        }

        panic!("unhandled store32 into address {:08x}", addr);
    }
}

And of course we need to declare the MEM_CONTROL constant:

/// Memory latency and expansion mapping
pub const MEM_CONTROL: Range = Range(0x1f801000, 36);

It’s a bit hackish but at least the store will now go through.
Before we move on to the next instruction we need to address the “subtle
brokenness” in our SW implementation I was talking about earlier.

2.17 Sign extension

The reason our current “Store word” implementation is broken is that we’re not
handling the immediate value correctly. It should be interpreted as a signed
16bit value in a two’s complement representation.

In other words, if the immediate value of the SW was 0xffff it would give
an offset of -1, not +65535.

16bit value   32bit “unsigned” extended value   decimal unsigned value

0x0000        0x00000000                        0
0x0001        0x00000001                        1
0x01ad        0x000001ad                        429
0xffff        0x0000ffff                        65535
0x83c5        0x000083c5                        33733

16bit value   32bit sign-extended value         decimal signed value

0x0000        0x00000000                        0
0x0001        0x00000001                        1
0x01ad        0x000001ad                        429
0xffff        0xffffffff                        -1
0x83c5        0xffff83c5                        -31803

Table 6: 16 to 32bit conversion: influence of sign extension

In order to support this we don’t need to add any branching, we just need to
sign extend the immediate value. It means that we increase the width of the
16bit value to 32bit but instead of padding with zeroes we pad with the original
MSB (which is sometimes called the sign bit). This way the signed value remains
the same. See table 6 for some examples.
You can see that for values where the sign bit is not set if we simply pad
the 16 high bits with 0s we get the same result in both signed and unsigned
extension. However for values with the MSB set to 1 we have a big difference.
So when we extend values it’s important to know if we’re dealing with signed
or unsigned quantities. We’ll have the same problem with rightwise bitshifts: if
we’re shifting signed quantities we have to pad with the sign bit.
It might sound complicated but it’s very straightforward to implement with
most programming languages, for instance in C, C++ and Rust simply casting
from a 16bit signed integer to a 32bit integer makes the compiler sign-extend
the value. If it didn’t casting a 16bit variable containing -1 into a 32bit variable
would have the final value be 65535 which is obviously not what we want.
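Here is how those casts look in Rust, as a minimal sketch. The two helper names are made up for the illustration; the behaviour relies on the fact that casting from a narrower signed integer to a wider type sign-extends, while casting from an unsigned one zero-extends.

```rust
/// Widen a 16bit value to 32 bits, padding with zeroes
fn zero_extend(v: u16) -> u32 {
    v as u32
}

/// Widen a 16bit value to 32 bits, padding with the sign bit.
/// Going through `i16` is what makes the second cast sign-extend.
fn sign_extend(v: u16) -> u32 {
    (v as i16) as u32
}
```

These reproduce the rows of table 6: 0xffff becomes 0x0000ffff with zero_extend and 0xffffffff with sign_extend, while a value with the MSB cleared such as 0x01ad comes out as 0x000001ad either way.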
We can’t guess which instructions use signed or unsigned immediate values,
it’s described in the MIPS instruction set. For instance our ORI instruction
correctly uses an unsigned immediate value.
The nice thing with two’s complement representation is that while we need to
think about the signedness of the value when bitshifting and widening it doesn’t
matter for most arithmetic operations.
For instance the 16 bit addition 0x01ad + 0x84e0 gives the same result
whether the operands are signed or not: 0x01ad is 429, 0x84e0 is either 34016 if
it’s unsigned or -31520 if it’s a two’s complement signed value. 429 + 34016 is
34445 or 0x868d in hexadecimal. 429 - 31520 is -31091 or 0x868d in 16bit two’s
complement hexadecimal.
You can see that doing the calculation with signed or unsigned quantities
doesn’t matter: we end up with the same binary pattern.
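The same check can be written down in Rust. This is just a sketch with throwaway helper names: one function adds the two 16bit patterns as unsigned values, the other reinterprets them as signed first, and both drop any carry or overflow with wrapping_add.

```rust
/// Add two 16bit patterns as unsigned values, dropping the carry
fn add_unsigned(a: u16, b: u16) -> u16 {
    a.wrapping_add(b)
}

/// Add the same two patterns as two's complement signed values and
/// return the resulting bit pattern
fn add_signed(a: u16, b: u16) -> u16 {
    (a as i16).wrapping_add(b as i16) as u16
}
```

Both add_unsigned(0x01ad, 0x84e0) and add_signed(0x01ad, 0x84e0) return 0x868d, which reads as 34445 when unsigned and as -31091 when signed.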
Therefore we just need to care about the sign when widening the immediate
from 16 to 32 bits and then we can proceed with our usual “unsigned” addition
and we’ll get the correct result whether the offset is negative or positive:

impl Instruction {
    // ...

    /// Return immediate value in bits [15:0] as a sign-extended 32bit
    /// value
    fn imm_se(self) -> u32 {
        let Instruction(op) = self;

        let v = (op & 0xffff) as i16;

        v as u32
    }
}

Note the order of the casts from u32 to i16 back to u32. They might
look useless but that’s what’s forcing the compiler to generate instructions to
sign-extend v.

2.18 SW instruction
We can now use this function to fix op_sw, we just have to replace instruction.imm()
with the new sign-extending instruction.imm_se():
impl Cpu {
    // ...

    /// Store Word
    fn op_sw(&mut self, instruction: Instruction) {
        let i = instruction.imm_se();
        let t = instruction.t();
        let s = instruction.s();

        let addr = self.reg(s).wrapping_add(i);

        let v = self.reg(t);

        self.store32(addr, v);
    }
}

This version of SW should work correctly even if the offset i is negative.

2.19 SLL instruction

The next instruction is simply 0x00000000. Looks strange but it’s perfectly
valid. As always we start by reading the bits [31:26] which obviously gives us
0b000000. This value however can introduce a number of instructions, to figure
out which one we need to read bits [5:0] which are again full zeroes. By looking
at the instruction set reference we see that these values correspond to a “shift
left logical” (SLL). If we decode the entire instruction we end up with:
sll $zero, $zero, 0

Obviously this instruction does absolutely nothing since it stores the result
in $zero. This instruction is just the preferred way to encode a NOP [2]. There are
many instructions in the MIPS architecture that behave like NOPs, for instance
using the opcodes we’ve already encountered we can craft several other NOPs:
lui $zero, 0
ori $zero, $zero, 0
ori $zero, $4, 1234

[2] MIPS assemblers actually feature a nop pseudo-instruction that generates this
sll $zero, $zero, 0 instruction.

And there are many others since almost anything targeting $zero is a NOP [3]. I
think the SLL version is preferred simply because it has this noticeable encoding
of being all 0s.
In this case I can only assume that the NOP is used as a delay, probably
waiting for the previous SW instructions to take effect but I’m not entirely sure
why it’s needed.
In our emulator we won’t special-case this particular instruction, we can just
implement the generic SLL instruction in full. Since NOPs are pretty common
it might make some sense to special-case them but we’ll need to benchmark it
to make sure the cost of the test won’t be greater than computing a useless shift
and storing it in $zero.
Let’s start by implementing the accessors (the shift immediate is only 5bits
since it wouldn’t make sense to shift by more than 31 places and the rest of the
low bits is taken by the “subfunction” part of the instruction):
impl Instruction {
    // ...

    /// Return register index in bits [15:11]
    fn d(self) -> u32 {
        let Instruction(op) = self;

        (op >> 11) & 0x1f
    }

    /// Return bits [5:0] of the instruction
    fn subfunction(self) -> u32 {
        let Instruction(op) = self;

        op & 0x3f
    }

    /// Shift Immediate values are stored in bits [10:6]
    fn shift(self) -> u32 {
        let Instruction(op) = self;

        (op >> 6) & 0x1f
    }
}

Now that we have our fancy getters ready to parse the instruction we can
implement the opcode itself:
impl Cpu {
    // ...

    fn decode_and_execute(&mut self, instruction: Instruction) {
        match instruction.function() {
            0b000000 => match instruction.subfunction() {
                0b000000 => self.op_sll(instruction),
                _ => panic!("Unhandled instruction {:08x}",
                            instruction.0),
            },
            0b001111 => self.op_lui(instruction),
            0b001101 => self.op_ori(instruction),
            0b101011 => self.op_sw(instruction),
            _ => panic!("Unhandled instruction {:08x}",
                        instruction.0),
        }
    }

    /// Shift Left Logical
    fn op_sll(&mut self, instruction: Instruction) {
        let i = instruction.shift();
        let t = instruction.t();
        let d = instruction.d();

        let v = self.reg(t) << i;

        self.set_reg(d, v);
    }
}

[3] One exception would be memory loads which can have side effects even if the value is
discarded in $zero.

Obviously in this case it won’t do anything since it’s a NOP but it should
work correctly when we encounter a “real” SLL instruction.
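To get a feel for what a “real” SLL looks like, here is a sketch that builds an encoding and reads the fields back with the accessors above. The encode_sll helper is hypothetical and only covers this one instruction format; the accessor bodies are written in shorthand.

```rust
#[derive(Clone, Copy)]
struct Instruction(u32);

impl Instruction {
    /// Return register index in bits [20:16]
    fn t(self) -> u32 {
        (self.0 >> 16) & 0x1f
    }

    /// Return register index in bits [15:11]
    fn d(self) -> u32 {
        (self.0 >> 11) & 0x1f
    }

    /// Return the shift immediate in bits [10:6]
    fn shift(self) -> u32 {
        (self.0 >> 6) & 0x1f
    }

    /// Return bits [5:0] of the instruction
    fn subfunction(self) -> u32 {
        self.0 & 0x3f
    }
}

/// Hypothetical helper: build the encoding of `sll $d, $t, shift`.
/// Bits [31:26] and [5:0] stay at 0 for SLL.
fn encode_sll(d: u32, t: u32, shift: u32) -> Instruction {
    Instruction(((t & 0x1f) << 16) | ((d & 0x1f) << 11) | ((shift & 0x1f) << 6))
}
```

With this, encode_sll(0, 0, 0) is the all-zero NOP and encode_sll(8, 9, 4), i.e. sll $8, $9, 4, encodes to 0x00094100.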

2.20 ADDIU instruction

After that we encounter the instruction 0x24080b88 which is the “Add Immediate
Unsigned” opcode. The name is completely misleading: it seems to say
that the immediate value is treated as unsigned (i.e. zero-extended instead
of sign-extended) but it’s not the case. The only difference between ADDIU and
ADDI (“Add Immediate”) is that the latter generates an exception on overflow
while the former simply truncates the result. How they got to “unsigned” from
that I have no idea...
Knowing that it’s easy to implement it in our emulator [4]:
    impl Cpu {
        // ...

        /// Add Immediate Unsigned
        fn op_addiu(&mut self, instruction: Instruction) {
            let i = instruction.imm_se();
            let t = instruction.t();
            let s = instruction.s();

            let v = self.reg(s).wrapping_add(i);

            self.set_reg(t, v);
        }
    }

If you decode the instruction in full you should end up with:

    addiu $8, $zero, 0xb88

You can see another use of the $zero register: this time with the ADDIU
opcode it sets $8 to the immediate value 0xb88. It saves having a dedicated
“Load immediate” opcode.
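
To make the sign-extension point concrete, here is a minimal standalone sketch. The free functions `imm_se` and `addiu` are hypothetical stand-ins for the `Instruction::imm_se` and `Cpu::op_addiu` methods, working on plain u32 values instead of the emulator’s own types:

```rust
/// Sign-extend the 16-bit immediate stored in bits [15:0] of `op`.
/// (Hypothetical free-function version of `Instruction::imm_se`.)
fn imm_se(op: u32) -> u32 {
    let v = (op & 0xffff) as i16;

    // Casting a signed 16-bit value to u32 sign-extends it
    v as u32
}

/// ADDIU on plain values: the sum is truncated to 32 bits and never
/// raises an exception.
fn addiu(reg_value: u32, op: u32) -> u32 {
    reg_value.wrapping_add(imm_se(op))
}
```

With these helpers the instruction 0x24080b88 applied to $zero yields 0xb88, while an immediate of 0xffff behaves as -1 rather than 65535, which is exactly why the “unsigned” in the name is misleading.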
[4] I’ll skip the code in decode_and_execute from now on, I’m sure you can figure
it out by yourself...

2.21 RAM configuration register
This value of 0x00000b88 is then stored at address 0x1f801060.
This register is called RAM_SIZE in the NoCash specs. The exact purpose
of this register remains partially unknown but it seems to be configuring the
memory controller. I assume that this controller is capable of handling various
amounts of RAM for instance and this register lets the BIOS load the particular
configuration needed by the Playstation hardware.
At any rate, since we’re trying to emulate the Playstation and not some
generic MIPS computer we probably don’t have to handle this register in any
specific way so it’s hopefully safe to ignore it. I just add a new mapping entry,
ignore the store at this address and move along:
    /// Register that has something to do with RAM configuration,
    /// configured by the BIOS
    pub const RAM_SIZE: Range = Range(0x1f801060, 4);

After this instruction we get a few NOPs. I suppose that the RAM size
configuration takes a few cycles to take effect and the BIOS delays a bit before
continuing.

2.22 J instruction
The next instruction is 0x0bf00054 which is a jump instruction (J). This
instruction is used to change the value of the PC and have the CPU execution
pipeline jump to some other location in memory.
Jump behaves like a goto: it sets the PC to the immediate value contained
in the instruction. Since the instruction is 32 bits wide and the instruction
set uses 6 bits to encode the opcode it can only specify 26 bits of the PC at
once.
To make the most of those 26 bits the target address is stored shifted two
places to the right. It’s not a problem because instructions must be aligned to
a 32-bit boundary so the two LSBs of the PC always have to be zero. It means that
the instruction really encodes 28 bits of the target address. The remaining 4
high bits are taken from the PC’s MSBs and remain untouched. In the case of our
current instruction this makes the target address 0xbfc00150.
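
The arithmetic can be checked with a small standalone sketch. The function name `jump_target` is made up for this illustration, and the PC value used below is only assumed to lie somewhere in the BIOS region (any address whose top four bits are 0xb gives the same result):

```rust
/// Compute the target of a J instruction.
/// `pc` is the current value of the PC, `op` the raw 32-bit instruction.
/// (Illustrative helper, not part of the emulator's interface.)
fn jump_target(pc: u32, op: u32) -> u32 {
    // 26-bit immediate stored in bits [25:0]
    let imm = op & 0x3ff_ffff;

    // Keep the 4 MSBs of the PC and fill the low 28 bits with the
    // immediate shifted back two places to the left
    (pc & 0xf000_0000) | (imm << 2)
}
```

For 0x0bf00054 the immediate is 0x3f00054, which becomes 0x0fc00150 once shifted. Combined with the high nibble 0xb of the PC this gives 0xbfc00150.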
You can see that this instruction cannot jump to an arbitrary address, only to
an address within the current 256MB of addressable memory. If the CPU needs to
jump further away[5] it’ll have to use another instruction like JR which takes a
full 32-bit register containing the destination address. But we’ll see that soon
enough.
First we need to add an accessor for the 26bit immediate field:
    impl Instruction {
        // ...

        /// Jump target stored in bits [25:0]
        fn imm_jump(self) -> u32 {
            let Instruction(op) = self;

            op & 0x3ffffff
        }
    }

Now we can implement the instruction itself:
    impl Cpu {
        // ...

        /// Jump
        fn op_j(&mut self, instruction: Instruction) {
            let i = instruction.imm_jump();

            self.pc = (self.pc & 0xf0000000) | (i << 2);
        }
    }

[5] For instance to another region as we’ll see later.

Looks simple enough but unfortunately it’s broken. Why you ask?

2.23 Branch delay slots

The reason our implementation of “jump” doesn’t work properly is because one
of the simplifying assumptions we made when we started implementing the CPU
does not hold in this case.
Remember when I said that the CPU fetches and executes an instruction at
each cycle, increments the PC and repeats? Well it’s a bit more complicated
than that.
The MIPS architecture is pipelined. It means that in order to increase the
throughput of the processor it splits its execution logic across several stages.
While one stage is busy decoding an instruction the instruction fetch stage
could already be loading the next one. It works like an assembly line.
When the code executes linearly (i.e. without jumps or branches) there’s no
problem: while the CPU decodes the instruction at PC the instruction fetch
stage can start loading the value at PC + 4.
But if the instruction being decoded is a jump or a branch things get messy.
The instruction fetch stage cannot know that the previous instruction is supposed
to change the execution path. When the instruction reaches the execution stage
the value of PC gets updated, but it’s too late: a spurious instruction has
already been fetched into the pipeline.
So there you are, with an unwanted instruction in your pipeline. What do
you do?
Some architectures opt for flushing the pipeline in those cases. You restart
from the correct address. Of course that’s a costly operation: your CPU has
to wait for the fresh instructions to make it all the way through the pipeline
before getting executed. Many modern architectures do that and that’s why they
generally include complex branch predictors which do their best to guess if a
branch is about to be taken. If they make a bad prediction the pipeline has to be
flushed. That’s one of the main reasons branches are considered expensive (and
why I always overwrite regs[0] in set_reg instead of checking if the register
was 0).
MIPS however doesn’t do that. It doesn’t bother wasting time flushing the
pipeline, it just ignores the issue and runs the code anyway. What this means is
that the first instruction right after a branch always gets executed before the
branch is taken, unconditionally. This instruction is said to be in the branch
delay slot.
Consider the following assembly[6]:
    j   foo
    lui $a0, 0xf00

[6] I’m assuming that the assembler is not asked to reorder the instructions. To
get this behaviour you have to use “.set noreorder” with the GNU assembler.

The LUI instruction gets executed before the code jumps to foo. When the
function is entered $a0 will be equal to 0x0f000000.
Fortunately it’s pretty easy to emulate this behaviour: we just have to do the
same thing the processor does and load the next instruction before we execute
the current one:
    /// CPU state
    pub struct Cpu {
        /// The program counter register
        pc: u32,
        /// Next instruction to be executed, used to simulate the branch
        /// delay slot
        next_instruction: Instruction,

        // ...
    }

    impl Cpu {

        pub fn new(inter: Interconnect) -> Cpu {

            // ...

            Cpu {
                // PC reset value at the beginning of the BIOS
                pc: 0xbfc00000,
                // ...
                next_instruction: Instruction(0x0), // NOP
            }
        }

        pub fn run_next_instruction(&mut self) {
            let pc = self.pc;

            // Use previously loaded instruction
            let instruction = self.next_instruction;

            // Fetch instruction at PC
            self.next_instruction = Instruction(self.load32(pc));

            // Increment PC to point to the next instruction. All
            // instructions are 32 bit long.
            self.pc = pc.wrapping_add(4);

            self.decode_and_execute(instruction);
        }

        // ...
    }

And now our jumps should behave correctly.
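
To see the delay slot in action without the rest of the emulator, here is a toy model. Everything in it (`ToyCpu`, the flat `mem` vector based at address 0, the fact that only J and LUI are decoded) is an assumption made for this sketch and not the guide’s actual Cpu:

```rust
/// Toy CPU with just enough state to show the branch delay slot.
/// (Hypothetical type for illustration only.)
struct ToyCpu {
    pc: u32,
    next_instruction: u32,
    regs: [u32; 32],
    /// Program memory, one 32-bit word per instruction, based at address 0
    mem: Vec<u32>,
}

impl ToyCpu {
    fn new(mem: Vec<u32>) -> ToyCpu {
        // The first instruction executed is a NOP (0x0), like in `Cpu::new`
        ToyCpu { pc: 0, next_instruction: 0, regs: [0; 32], mem }
    }

    fn run_next_instruction(&mut self) {
        let pc = self.pc;

        // Use previously loaded instruction
        let instruction = self.next_instruction;

        // Fetch instruction at PC
        self.next_instruction = self.mem[(pc / 4) as usize];

        self.pc = pc.wrapping_add(4);

        match instruction >> 26 {
            // J: replace the low 28 bits of the PC
            0b000010 => {
                let target = (instruction & 0x3ff_ffff) << 2;
                self.pc = (self.pc & 0xf000_0000) | target;
            }
            // LUI: load the immediate in the upper half of register t
            0b001111 => {
                let t = ((instruction >> 16) & 0x1f) as usize;
                self.regs[t] = (instruction & 0xffff) << 16;
            }
            // Everything else is treated as a NOP in this sketch
            _ => (),
        }
    }
}
```

Loading the words 0x08000004 (j 0x10), 0x3c040f00 (lui $4, 0xf00), 0x3c05dead (lui $5, 0xdead), a NOP and 0x3c060001 (lui $6, 0x1) and running four steps leaves $4 equal to 0x0f000000 and $5 untouched: the LUI in the delay slot ran, the one after it was skipped, and execution resumed at the jump target.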

2.24 OR instruction
After the jump there’s a sequence of LUI/ORI/SW used to store a bunch of
values in the SYS_CONTROL registers that we chose to ignore. We then stumble
upon a new instruction: 0x00000825 which encodes a bitwise or operation:

    or $1, $zero, $zero

Unlike ORI which used an immediate value as a 2nd operand this one takes
two registers and stores the result in a third one. We can see that in this case
the two source registers are $zero so it just clears $1. The implementation is
fairly straightforward:
    impl Cpu {
        // ...

        /// Bitwise Or
        fn op_or(&mut self, instruction: Instruction) {
            let d = instruction.d();
            let s = instruction.s();
            let t = instruction.t();

            let v = self.reg(s) | self.reg(t);

            self.set_reg(d, v);
        }
    }

The next few instructions use OR to set all the general purpose registers to
0.

2.25 Type safety in the register interface

I’ve decided to make a modification to our Instruction interface: right now the
helper methods in Instruction return register indexes as u32, the same
type as the values contained in the registers. Therefore the compiler won’t warn
us if we mess up and use a register index instead of a register value:
    impl Cpu {
        // ...

        /// Bitwise Or
        fn op_or(&mut self, instruction: Instruction) {
            let d = instruction.d();
            let s = instruction.s();
            let t = instruction.t();

            let v = s | self.reg(t); // Oops...

            self.set_reg(d, v);
        }
    }

This code is broken: instead of OR-ing the value of the register number s it
ORs the index s itself. It’s meaningless and obviously wrong and yet it builds
without any error.
Fortunately with a small modification in our code we can have the compiler
reject such code by wrapping register indexes in a new type incompatible with
u32:
    struct RegisterIndex(u32);

Note that this is not like a typedef in C or C++: typedef just creates an
alias which remains compatible (i.e. interchangeable) with the original type.
The equivalent in C would be to wrap the u32 in a struct or something like
that.

Then we just have to update our helpers as well as the Cpu::reg and
Cpu::set_reg methods to use a RegisterIndex instead of a plain u32.
With this modification the compiler will reject the broken op_or
implementation above:

    Binary operation | cannot be applied to type cpu::RegisterIndex:

    let v = s | self.reg(t);

Hurray for type safety!
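
A minimal sketch of the updated interface could look like the following. The `Regs` struct and the free function `t` are simplified stand-ins invented for this example (the real methods live on Cpu and Instruction), and the `Clone, Copy` derive is an assumption that keeps indexes as cheap to pass around as the u32 they wrap:

```rust
/// Index of a general purpose register.
#[derive(Clone, Copy)]
struct RegisterIndex(u32);

/// Stand-in for the register part of `Cpu` (hypothetical).
struct Regs {
    regs: [u32; 32],
}

impl Regs {
    fn reg(&self, index: RegisterIndex) -> u32 {
        self.regs[index.0 as usize]
    }

    fn set_reg(&mut self, index: RegisterIndex, val: u32) {
        self.regs[index.0 as usize] = val;

        // Make sure R0 is always 0
        self.regs[0] = 0;
    }
}

/// Register index stored in bits [20:16] of a raw instruction word.
/// (Free-function stand-in for `Instruction::t`.)
fn t(op: u32) -> RegisterIndex {
    RegisterIndex((op >> 16) & 0x1f)
}
```

Getting at the raw number now requires an explicit `.0`, so mixing up an index and a value can no longer happen silently.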

2.26 CACHE_CONTROL register

The BIOS then wants to write 0x00000804 to 0xfffe0130. This address is used
for cache control. Since we won’t implement the caches yet we can just add a
log message and ignore this register for the moment:
    /// Cache control register
    pub const CACHE_CONTROL: Range = Range(0xfffe0130, 4);

2.27 The coprocessors

The next unhandled instruction, 0x408c6000, involves one of the R3000 CPU
coprocessors.
Coprocessors are pieces of hardware which live alongside the CPU and are
accessed through dedicated instructions (instead of memory mapped I/O like
external peripherals). They are used to complement and extend the capabilities
of the processor. They each have their own set of registers.
The MIPS R3000 CPU can support up to 4 coprocessors:

• The coprocessor 0 (cop0) is mandated by the MIPS architecture: it’s used
  for exception handling. Exceptions are things like hardware interrupts and
  traps (divisions by zero, integer overflows, system calls etc...). We’ll study
  them in greater detail when we implement them.
• The coprocessor 1 (cop1) is optional: when available it’s used for floating
  point arithmetic. You might expect that a videogame console would benefit
  greatly from having hardware accelerated floating point and yet cop1 is
  not implemented on the Playstation! Instead we have the coprocessor 2.
• The coprocessor 2 (cop2) is, as far as I know, custom made for the
  Playstation. At least I can’t find any reference to it outside of the
  Playstation hardware. It’s called the “Geometry Transformation Engine”, or
  GTE for short. It implements many instructions dealing with 3D transforms
  like perspective transformations, vector and matrix multiplications, color
  manipulation etc... It’s basically the first half of the rendering pipeline,
  the second half being the GPU (but that one is a memory mapped peripheral,
  not a coprocessor).
• The coprocessor 3 (cop3) is not implemented on the Playstation.

Hopefully we shouldn’t have to mess with the GTE until we start encountering
3D code.

2.28 MTC0 instruction
Back to the 0x408c6000 instruction: the opcode (bits [31:26]) is equal to
0b010000 which means that it’s an instruction for the coprocessor 0. The
generic format is 0b0100nn where nn is the coprocessor number.
    impl Cpu {
        // ...

        fn decode_and_execute(&mut self, instruction: Instruction) {
            match instruction.function() {
                // ...
                0b010000 => self.op_cop0(instruction),
                _ => panic!("Unhandled instruction {}",
                            instruction),
            }
        }

        /// Coprocessor 0 opcode
        fn op_cop0(&mut self, instruction: Instruction) {
            match instruction.cop_opcode() {
                0b00100 => self.op_mtc0(instruction),
                _ => panic!("unhandled cop0 instruction {}",
                            instruction)
            }
        }
    }

Instruction::cop_opcode returns the same bit range as Instruction::s,
however it returns it as a plain u32 instead of a RegisterIndex (since it’s not
a register in this case). You see that the current coprocessor opcode 0b00100
means MTC0 or “move to coprocessor 0”. This instruction takes two parameters:
the source register index (one of the CPU’s general registers) and the target
register (one of the coprocessor’s registers). Those parameters are respectively
in bits [20:16] and [15:11] of the instruction.
In our current instruction both of those parameters are equal to 12 so if we
decode the instruction in full it gives[7]:
    mtc0 $12, $cop0_12
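
The field extraction can be double-checked with a short standalone sketch. The `CopFields` struct and `decode_cop` function are names invented for this illustration; only the bit ranges come from the text:

```rust
/// Fields of a coprocessor instruction (illustrative names).
#[derive(Debug, PartialEq)]
struct CopFields {
    /// Bits [31:26], 0b0100nn where nn is the coprocessor number
    opcode: u32,
    /// Bits [25:21], the coprocessor opcode
    cop_opcode: u32,
    /// Bits [20:16], the CPU general purpose register index
    cpu_r: u32,
    /// Bits [15:11], the coprocessor register index
    cop_r: u32,
}

fn decode_cop(op: u32) -> CopFields {
    CopFields {
        opcode: op >> 26,
        cop_opcode: (op >> 21) & 0x1f,
        cpu_r: (op >> 16) & 0x1f,
        cop_r: (op >> 11) & 0x1f,
    }
}
```

Applied to 0x408c6000 this gives an opcode of 0b010000 (coprocessor 0), a coprocessor opcode of 0b00100 (MTC0) and the value 12 for both register fields.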

The coprocessor register $cop0_12 is very useful: it’s called the “status
register” or SR for short. Among other things it’s used to query and mask the
exceptions and to control the cache behaviour.
At this point the $12 register contains 0x00010000 so this MTC0 instruction
sets bit 16 of SR which is the “isolate cache” bit. It makes all the following
reads and writes target the cache directly instead of going through it towards
the main memory. We’re probably in the middle of the cache initialization
sequence.
At any rate since we still haven’t implemented anything cache-related we’ll
just store the value of the SR in our Cpu struct and move along:
    /// CPU state
    pub struct Cpu {
        // ...

        /// Cop0 register 12: Status Register
        sr: u32,
    }

    impl Cpu {
        // ...

        pub fn new(inter: Interconnect) -> Cpu {

            // ...

            Cpu {
                // ...
                sr: 0,
            }
        }

        fn op_mtc0(&mut self, instruction: Instruction) {
            let cpu_r = instruction.t();
            let cop_r = instruction.d().0;

            let v = self.reg(cpu_r);

            match cop_r {
                12 => self.sr = v,
                n => panic!("Unhandled cop0 register: {:08x}", n),
            }
        }
    }

[7] This is actually pseudo-assembly for the sake of clarity. The correct GNU
assembler syntax would be mtc0 $12, $12 but it’s a bit too ambiguous for my
taste.

Setting the SR to 0 on reset might not be accurate but I doubt it matters
much.
Since the cache is supposed to be isolated all “stores” should end up in the
cache and never in the main memory. Even if we don’t implement the cache we
don’t want the BIOS to start writing at random locations in main memory when
it thinks it writes to the cache so we can start by ignoring all writes when this
isolation bit is set:
    impl Cpu {
        // ...

        /// Store Word
        fn op_sw(&mut self, instruction: Instruction) {

            if self.sr & 0x10000 != 0 {
                // Cache is isolated, ignore write
                println!("ignoring store while cache is isolated");
                return;
            }

            // ...
        }
    }

2.29 BNE instruction

We now encounter the instruction 0x154bfff7. It encodes a BNE or “branch
if not equal” instruction. The difference between jumps and branches is that
branches are conditional and they use relative offsets.
The immediate value is sign extended (in order to allow for negative offsets)
and multiplied by 4 (as always, the PC must be aligned to 32bits at all times).
Therefore this instruction decodes to:

    bne $10, $11, -36

In other words the instruction will compare the values in $10 and $11 and
if they’re unequal it’ll subtract 36 from the PC. If the values are equal it’ll do
absolutely nothing.
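
Here is how the -36 falls out of the raw instruction word, as a standalone sketch. The helper names `branch_offset` and `branch_target` are made up for the example. On MIPS the offset is applied relative to the address of the instruction following the branch (its delay slot), which is what the second helper assumes:

```rust
/// Byte offset encoded in a branch instruction: the 16-bit immediate is
/// sign-extended, then multiplied by 4. (Illustrative helper.)
fn branch_offset(op: u32) -> i32 {
    let imm = (op & 0xffff) as i16;

    (imm as i32) << 2
}

/// Address reached when the branch is taken. `delay_slot_addr` is the
/// address of the instruction right after the branch.
fn branch_target(delay_slot_addr: u32, op: u32) -> u32 {
    delay_slot_addr.wrapping_add(branch_offset(op) as u32)
}
```

For 0x154bfff7 the immediate is 0xfff7, which is -9 once sign-extended, and -9 multiplied by 4 gives the -36 shown above.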
Like jumps, branches have a delay slot[8]. Fortunately our implementation in
section 2.23 already takes care of that without any more work.
I’ve decided to factor the “branching” code itself in a separate function
because we’ll have to use the same logic in the other branch instructions:
    impl Cpu {
        // ...

        /// Branch to immediate value `offset`.
        fn branch(&mut self, offset: u32) {
            // Offset immediates are always stored shifted two places to
            // the right since `PC` addresses have to be aligned on 32 bits
            // at all times, so we shift them back to the left here.
            let offset = offset << 2;

            let mut pc = self.pc;

            pc = pc.wrapping_add(offset);

            // We need to compensate for the hardcoded
            // `pc.wrapping_add(4)` in `run_next_instruction`
            pc = pc.wrapping_sub(4);

            self.pc = pc;
        }

        /// Branch if Not Equal
        fn op_bne(&mut self, instruction: Instruction) {
            let i = instruction.imm_se();
            let s = instruction.s();
            let t = instruction.t();

            if self.reg(s) != self.reg(t) {
                self.branch(i);
            }
        }
    }

Notice the wrapping_sub(4) to compensate for our pc.wrapping_add(4) in
run_next_instruction. Without it we’d branch one instruction too far.

2.30 ADDI instruction

Before we even reach the target of the branch we stumble upon unhandled
instruction 0x214a0080. This one is an ADDI which behaves exactly like the
ADDIU instruction we’ve already implemented except that it generates an
exception if the addition overflows.
The instruction decodes to:
    addi $10, $10, 128

[8] It is called a branch delay slot after all...

Since this operation checks for signed overflow I’ll cast the operands to i32
before using the checked_add provided by Rust’s standard library[9]. For now
I just panic if we encounter an overflow, we’ll change that when we actually
implement exceptions:
    impl Cpu {
        // ...

        /// Add Immediate and check for signed overflow
        fn op_addi(&mut self, instruction: Instruction) {
            let i = instruction.imm_se() as i32;
            let t = instruction.t();
            let s = instruction.s();

            let s = self.reg(s) as i32;

            let v = match s.checked_add(i) {
                Some(v) => v as u32,
                None => panic!("ADDI overflow"),
            };

            self.set_reg(t, v);
        }
    }

The cast to i32 is important because something like 0x4 + 0xffffffff is
an overflow in 32-bit unsigned arithmetic. If the operands are signed however
it’s simply 4 + -1 and that’s obviously perfectly fine. The actual result of
the operation would be the same (0x00000003) but since ADDI generates an
exception on overflow the difference in behaviour is critical.
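
The difference is easy to observe in isolation. In this sketch the free function `addi` is a hypothetical stand-in for `Cpu::op_addi` that returns `None` where the emulator would raise the overflow exception:

```rust
/// ADDI on plain values: `None` stands for the overflow exception.
/// (Hypothetical helper, the real code lives in `Cpu::op_addi`.)
fn addi(reg_value: u32, imm_se: u32) -> Option<u32> {
    // Reinterpret both operands as signed before adding
    let s = reg_value as i32;
    let i = imm_se as i32;

    s.checked_add(i).map(|v| v as u32)
}
```

Here `addi(0x4, 0xffffffff)` yields `Some(3)` whereas the unsigned `0x4u32.checked_add(0xffffffff)` reports an overflow. Conversely 0x7fffffff + 1 is harmless for unsigned integers but overflows as a signed addition, so `addi(0x7fffffff, 1)` is `None`.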

2.31 Memory loads

The next unhandled instruction, 0x8d090000 is LW or “load word”. It decodes
to:
    lw $9, 0($8)

We can reuse the load32 method to fetch the data from memory:
    impl Cpu {
        // ...

        /// Load Word
        fn op_lw(&mut self, instruction: Instruction) {

            if self.sr & 0x10000 != 0 {
                // Cache is isolated, ignore load
                println!("Ignoring load while cache is isolated");
                return;
            }

            let i = instruction.imm_se();
            let t = instruction.t();
            let s = instruction.s();

            let addr = self.reg(s).wrapping_add(i);

            let v = self.load32(addr);

            self.set_reg(t, v);
        }
    }

[9] If you’re implementing this in C or C++ and need to check for signed overflow
yourself you’ll find plenty of examples online. Welcome to the 1970s. Be careful
with your implementation though because signed integer overflow is undefined
behaviour in C.

There’s a subtle problem with this implementation however.

2.32 Load delay slots

Sounds familiar? It’s our friend the pipeline messing with us once again. What
happens is that the load instruction attempts to read from the memory, but
that takes time. At least, it takes more than a single cycle.
On the R3000 CPU it creates “load delay slots”: when you load a value from
memory the CPU will execute the next instruction before the value is fetched
into the target register[10].
Consider this sequence of instructions:
    lw   $1, 0($zero)   /* Load $1 with the value at address 0 */
    move $2, $1         /* Move $1 in $2 */
    move $3, $1         /* Move $1 in $3 */

The first MOVE instruction[11] is in the load delay slot of the previous LW.
That means that at that point the register $1 does not yet contain the value
loaded into it. So after these two instructions $2 contains the value of $1 before
the load. The 2nd MOVE however takes place after the load delay slot so $3
will contain the final, post-load value of $1.
But it gets worse. Consider the value of $1 after these two instructions:
    lw    $1, 0($zero)    /* Load $1 with the value at address 0 */
    addiu $1, $zero, 42   /* Put 42 in $1 */

We first use LW to load something in $1 and then, while the load takes place,
we change the value of $1 with an ADDIU instruction. Who wins?
You might think that since the LW finishes after the load delay slot its fetched
value will override the one set by the ADDIU. It turns out that it’s not the case
however: after those two instructions $1 will contain 42, no matter what the LW
fetched.
It’s a bit of bad news for us emulator writers. It means we can’t execute
the load before the delay slot because the instruction in the delay slot must
see the previous value of the loaded register (otherwise the first example code
above won’t work) and we can’t just execute it afterwards because it would make
the load take priority over the delay slot (thus breaking our 2nd example).
One way to see it is that the loaded value ends up in the target register after
the next instruction has fetched the input register values but before the next
instruction updates the target register values. In our first example $1 is an input
register to both MOVEs while in the 2nd it’s the output (destination) register
of the ADDIU.
[10] This behaviour is part of the MIPS I architecture. Later revisions (starting
with MIPS II) don’t have load delay slots, only branch delay slots.
[11] As I mentioned earlier MOVE is actually a pseudo-instruction that the
assembler will expand into an addu $<target>, $<source>, $zero.

We could implement it exactly that way by splitting each instruction in two:
• The first part would take the pre-load register values, compute the result
  (adding $zero and 42 in the 2nd example above),
• Then it would execute any pending load,
• Finally it would store the result of the computation in the target register
  ($1 in the ADDIU). That way the ADDIU will write last.
I don’t really like this solution however because we’ll have to handle load
delays explicitly in all instructions which seems inelegant and error-prone.
Instead I’m going to use two sets of general purpose registers: one will be
the input set and the other the output set. Each instruction will read its input
values from the former set and will write to the latter. Once the instruction is
finished we copy the output set into the input set for the next instruction.
This way we can update the output register set with the load value before
we execute the instruction and it will still see the old value from the input set.
And if the instruction writes to the same register it will overwrite the value in
the output set.
Hopefully it should be clearer in code. First let’s add a 2nd set of registers
and a (register, value) tuple containing the pending load:
    /// CPU state
    pub struct Cpu {
        // ...

        /// 2nd set of registers used to emulate the load delay slot
        /// accurately. They contain the output of the current
        /// instruction.
        out_regs: [u32; 32],
        /// Load initiated by the current instruction
        load: (RegisterIndex, u32),
    }

    impl Cpu {

        pub fn new(inter: Interconnect) -> Cpu {

            // ...

            Cpu {
                // ...
                out_regs: regs,
                load: (RegisterIndex(0), 0),
            }
        }

        // ...
    }

If no load is pending we can just target $zero since it doesn’t do anything.
Now we can update the set_reg method to target the output register set:
    impl Cpu {
        // ...

        fn set_reg(&mut self, index: RegisterIndex, val: u32) {
            self.out_regs[index.0 as usize] = val;

            // Make sure R0 is always 0
            self.out_regs[0] = 0;
        }
    }

Since all our instructions so far use this helper method to update the register
values we won’t have to modify their code at all.
The next step is to update run_next_instruction to handle pending loads
and to copy the output registers between instructions:
    impl Cpu {
        // ...

        pub fn run_next_instruction(&mut self) {
            // ...

            // Execute the pending load (if any, otherwise it will load
            // $zero which is a NOP). `set_reg` works only on
            // `out_regs` so this operation won't be visible by
            // the next instruction.
            let (reg, val) = self.load;
            self.set_reg(reg, val);

            // We reset the load to target register 0 for the next
            // instruction
            self.load = (RegisterIndex(0), 0);

            self.decode_and_execute(instruction);

            // Copy the output registers as input for the
            // next instruction
            self.regs = self.out_regs;
        }
    }

You can see that we’re copying 128 bytes worth of registers for each instruction
which might not be great performance-wise but at this point I don’t really care
about that.
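
The two-register-set trick can be exercised on its own with a toy model. In this sketch `ToyRegs`, `step` and the two example functions are all hypothetical: instructions are modelled as closures instead of decoded opcodes, and register indexes are plain usize values to keep it short:

```rust
/// Toy register file with a one-instruction load delay.
/// (Hypothetical type for illustration only.)
struct ToyRegs {
    regs: [u32; 32],     // input set, read by the current instruction
    out_regs: [u32; 32], // output set, written by the current instruction
    load: (usize, u32),  // pending load: (register, value)
}

impl ToyRegs {
    fn new() -> ToyRegs {
        ToyRegs { regs: [0; 32], out_regs: [0; 32], load: (0, 0) }
    }

    fn set_reg(&mut self, index: usize, val: u32) {
        self.out_regs[index] = val;
        self.out_regs[0] = 0;
    }

    /// Run one instruction, modelled as a closure.
    fn step<F: FnOnce(&mut ToyRegs)>(&mut self, instruction: F) {
        // Complete the pending load first...
        let (reg, val) = self.load;
        self.set_reg(reg, val);
        self.load = (0, 0);

        // ...then execute: reads see `regs`, writes go to `out_regs`
        instruction(&mut *self);

        self.regs = self.out_regs;
    }
}

/// First example: returns ($2, $3) after LW / MOVE / MOVE.
fn move_after_load() -> (u32, u32) {
    let mut r = ToyRegs::new();
    r.step(|r| r.set_reg(1, 7));                        // $1 = 7 beforehand
    r.step(|r| r.load = (1, 0x1234));                   // lw   $1, ...
    r.step(|r| { let v = r.regs[1]; r.set_reg(2, v) }); // move $2, $1
    r.step(|r| { let v = r.regs[1]; r.set_reg(3, v) }); // move $3, $1
    (r.regs[2], r.regs[3])
}

/// Second example: returns $1 after LW / ADDIU.
fn load_then_overwrite() -> u32 {
    let mut r = ToyRegs::new();
    r.step(|r| r.load = (1, 0x1234)); // lw    $1, ...
    r.step(|r| r.set_reg(1, 42));     // addiu $1, $zero, 42
    r.step(|_| ());                   // nop
    r.regs[1]
}
```

With a loaded value of 0x1234 and an initial $1 of 7, the first function returns (7, 0x1234): the MOVE in the delay slot saw the old value. The second returns 42: the write from the delay slot instruction wins over the load.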

2.33 LW instruction
We can now write the correct, load-delay friendly implementation of LW:
    impl Cpu {
        // ...

        /// Load Word
        fn op_lw(&mut self, instruction: Instruction) {

            if self.sr & 0x10000 != 0 {
                // Cache is isolated, ignore load
                println!("Ignoring load while cache is isolated");
                return;
            }

            let i = instruction.imm_se();
            let t = instruction.t();
            let s = instruction.s();

            let addr = self.reg(s).wrapping_add(i);

            let v = self.load32(addr);

            // Put the load in the delay slot
            self.load = (t, v);
        }
    }

2.34 The RAM

Unfortunately we can’t test our brand new load delay slot just yet because the
current instruction attempts to load from an unhandled address: 0xa0000000.
The memory map (table 1) tells us that this is the first address in RAM.
Adding RAM support is straightforward: it’s very similar to our BIOS
implementation except it boots up uninitialized and it’s not read-only:
    /// RAM
    pub struct Ram {
        /// RAM buffer
        data: Vec<u8>
    }

    impl Ram {

        /// Instantiate main RAM with garbage values
        pub fn new() -> Ram {

            // Default RAM contents are garbage
            let data = vec![0xca; 2 * 1024 * 1024];

            Ram { data: data }
        }

        /// Fetch the 32 bit little endian word at `offset`
        pub fn load32(&self, offset: u32) -> u32 {
            let offset = offset as usize;

            let b0 = self.data[offset + 0] as u32;
            let b1 = self.data[offset + 1] as u32;
            let b2 = self.data[offset + 2] as u32;
            let b3 = self.data[offset + 3] as u32;

            b0 | (b1 << 8) | (b2 << 16) | (b3 << 24)
        }

        /// Store the 32 bit little endian word `val` into `offset`
        pub fn store32(&mut self, offset: u32, val: u32) {
            let offset = offset as usize;

            let b0 = val as u8;
            let b1 = (val >> 8) as u8;
            let b2 = (val >> 16) as u8;
            let b3 = (val >> 24) as u8;

            self.data[offset + 0] = b0;
            self.data[offset + 1] = b1;
            self.data[offset + 2] = b2;
            self.data[offset + 3] = b3;
        }
    }

I arbitrarily chose 0xca as the poison value on startup. It’s pretty strange
that the BIOS attempts to fetch data from the RAM before writing anything to
it (and effectively reading garbage) but if you look at the following instructions
it repeatedly reads the same address (the first word in RAM) and does nothing
with it. I’m not sure what this code does but it probably initializes something.
Let’s hope it’s not too important...
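
As an aside on the byte shuffling in load32 and store32, the same little-endian conversion can be written with the standard library’s `u32::from_le_bytes` and `u32::to_le_bytes`. This is only an alternative sketch: the free functions below take a plain byte slice rather than the Ram struct, and that signature is an assumption made to keep the example self-contained:

```rust
/// Fetch the 32-bit little-endian word at `offset` in `data`.
fn load32(data: &[u8], offset: usize) -> u32 {
    let bytes = [
        data[offset],
        data[offset + 1],
        data[offset + 2],
        data[offset + 3],
    ];

    u32::from_le_bytes(bytes)
}

/// Store the 32-bit little-endian word `val` at `offset` in `data`.
fn store32(data: &mut [u8], offset: usize, val: u32) {
    data[offset..offset + 4].copy_from_slice(&val.to_le_bytes());
}
```

Reading a word from a buffer filled with the 0xca poison value returns 0xcacacaca, and storing 0x12345678 puts the bytes 0x78, 0x56, 0x34, 0x12 in memory in that order.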
We can then plug our brand new RAM in the interconnect as usual:
    pub const RAM: Range = Range(0xa0000000, 2 * 1024 * 1024);

2.35 The coprocessor 0 registers

After that the BIOS wants to initialize the remaining cop0 registers by loading
$zero into them with the MTC0 instruction.
Let’s take the time to review those registers:

• $cop0_3 is BPC, used to generate a breakpoint exception when the PC
  takes the given value.
• $cop0_5 is BDA, the data breakpoint. It’s like BPC except it breaks when
  a certain address is accessed on a data load/store instead of a PC value.
• $cop0_6: I couldn’t find a lot of information on this register or what it
  does, the consensus seems to be that it’s basically useless.
• $cop0_7 is DCIC, used to enable and disable the various hardware
  breakpoints.
• $cop0_9 is BDAM, it’s a bitmask applied when testing for BDA above.
  That way we could trigger on a range of addresses instead of a single one.
• $cop0_11 is BPCM, like BDAM but for masking the BPC breakpoint.
• $cop0_12 we’ve already encountered: it’s SR, the status register.
• $cop0_13 is CAUSE, which contains mostly read-only data describing the
  cause of an exception. Apparently only bits [9:8] are writable to force an
  exception.

You can see that most of those registers (except SR and CAUSE) deal with
hardware breakpoints. That’s generally used for debugging so we shouldn’t need
to emulate those for most games. It’s probably safe to ignore for now. You can
see that the BIOS loads $zero into all of them which disables them.
For now we’re just going to ignore writes to these registers when the value is
0. If at some point some game writes something else we’ll catch it and see what
we need to implement:
    impl Cpu {
        // ...

        /// Move To Coprocessor 0
        fn op_mtc0(&mut self, instruction: Instruction) {
            let cpu_r = instruction.t();
            let cop_r = instruction.d().0;

            let v = self.reg(cpu_r);

            match cop_r {
                3 | 5 | 6 | 7 | 9 | 11 => // Breakpoints registers
                    if v != 0 {
                        panic!("Unhandled write to cop0r{}", cop_r)
                    },
                12 => self.sr = v,
                13 => // Cause register
                    if v != 0 {
                        panic!("Unhandled write to CAUSE register.")
                    },
                _ => panic!("Unhandled cop0 register {}", cop_r),
            }
        }
    }

2.36 SLTU instruction

After that we encounter the instruction 0x0043082b which encodes the “set on
less than unsigned” (SLTU) opcode:
    sltu $1, $2, $3

This instruction compares the value of two registers ($2 and $3 in this case)
and sets the value of a third one ($1) to either 0 or 1 depending on the result of
the “less than” comparison:
    impl Cpu {
        // ...

        /// Set on Less Than Unsigned
        fn op_sltu(&mut self, instruction: Instruction) {
            let d = instruction.d();
            let s = instruction.s();
            let t = instruction.t();

            let v = self.reg(s) < self.reg(t);

            self.set_reg(d, v as u32);
        }
    }

2.37 ADDU instruction

We then stumble upon the instruction 0x03a0f021 which encodes an “Add
unsigned” (ADDU) opcode:
addu $30, $29, $zero

You can see that with $zero as the third operand it simply copies $29 into $30,
so in this case it’s really a MOVE instruction.
The instruction is implemented like ADDIU except that we add two registers
instead of a register and an immediate value:
impl Cpu {
    // ...

    /// Add Unsigned
    fn op_addu(&mut self, instruction: Instruction) {
        let s = instruction.s();
        let t = instruction.t();
        let d = instruction.d();

        let v = self.reg(s).wrapping_add(self.reg(t));

        self.set_reg(d, v);
    }
}

2.38 Regions
Our next problem is an unhandled access at address 0x00000060. If we look at
the memory map [1] we see that it’s the RAM. But we’ve already added the RAM
in our interconnect in section 2.34!
The problem is that currently we mapped the RAM at 0xa0000000, in the
KSEG1 region. But this time the BIOS attempts to access it through another
region: KUSEG. We could add multiple mappings for each peripheral in each
region but that would be a waste of code and performance.
Let’s take a closer look at how those regions are specified by the MIPS
architecture:

• KSEG0 starts at 0x80000000 and ends at 0x9fffffff. This region is
accessed through the caches but it’s not mapped through the MMU. In
order to get the physical address we just have to strip the MSB.
• KSEG1 starts at 0xa0000000 and ends at 0xbfffffff. This region is not
cached or mapped through the MMU. In order to get the physical address
we just have to strip the three MSBs.
• KSEG2 starts at 0xc0000000 and ends at 0xffffffff. This region is only
accessible in kernel mode; it is cached and mapped through the MMU.
• KUSEG starts at 0x00000000 and ends at 0x7fffffff. It’s meant for
user code and is both cached and mapped through the MMU.

All that sounds rather complicated. Fortunately for us since we’re targeting
the Playstation and not some generic MIPS architecture we’ll be able to make
some simplifications:

• The Playstation hardware does not have a MMU and therefore no virtual
memory. We won’t have to deal with memory translation.

• The Playstation CPU has 1KB of data cache and 4KB of
instruction cache. However the data cache is not used, instead its memory
is mapped as the “scratchpad” at a fixed location. In other words we don’t
need to implement the data cache.

• As far as I can tell the Playstation software doesn’t seem to use the
kernel/user privilege separation and runs everything in kernel mode.

In other words the only time we’ll need to worry about which region is in use
is when we implement the instruction cache, and only for KSEG1 since that’s
the only non-cached region. For everything else it doesn’t matter through which
region the peripherals are accessed.
In order to solve our issue of having multiple mappings at different addresses
for the same peripherals in different regions we want to compute the unique
physical address corresponding to a memory access and map that through our
interconnect code.
By the descriptions above you see that we should mask a different number
of bits depending on the region. Since KSEG2 doesn’t share anything with the
other regions we won’t touch the address here (otherwise we would allow access
to the RAM through KSEG2 for instance and that wouldn’t be accurate). In
order to avoid branches we can use a nice mask lookup table:
/// Mask array used to strip the region bits of the address. The
/// mask is selected using the 3 MSBs of the address so each entry
/// effectively matches 512MB of the address space. KSEG2 is not
/// touched since it doesn't share anything with the other
/// regions.
const REGION_MASK: [u32; 8] = [
    // KUSEG: 2048MB
    0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
    // KSEG0: 512MB
    0x7fffffff,
    // KSEG1: 512MB
    0x1fffffff,
    // KSEG2: 1024MB
    0xffffffff, 0xffffffff,
];

/// Mask a CPU address to remove the region bits.
pub fn mask_region(addr: u32) -> u32 {
    // Index address space in 512MB chunks
    let index = (addr >> 29) as usize;

    addr & REGION_MASK[index]
}

We can now use this mask_region function in our interconnect’s load and
store functions to convert any address coming from the CPU into a unique
physical address used to identify the target peripheral.
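As a quick sanity check of the lookup table, the sketch below repeats REGION_MASK and mask_region and compares the same RAM offset seen through several regions. The helper same_physical is a hypothetical name introduced only for this illustration; it is not part of the emulator.

```rust
/// Same table as in the text: one mask per 512MB chunk of the address space.
const REGION_MASK: [u32; 8] = [
    // KUSEG: 2048MB
    0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
    // KSEG0: 512MB
    0x7fffffff,
    // KSEG1: 512MB
    0x1fffffff,
    // KSEG2: 1024MB
    0xffffffff, 0xffffffff,
];

/// Mask a CPU address to remove the region bits.
pub fn mask_region(addr: u32) -> u32 {
    // The 3 MSBs select one of the eight 512MB chunks
    let index = (addr >> 29) as usize;

    addr & REGION_MASK[index]
}

/// Hypothetical helper: true if two CPU addresses reach the same
/// physical address once the region bits are stripped.
pub fn same_physical(a: u32, b: u32) -> bool {
    mask_region(a) == mask_region(b)
}
```

With this, 0x00000060 (KUSEG), 0x80000060 (KSEG0) and 0xa0000060 (KSEG1) all map to physical address 0x60, the BIOS reset vector 0xbfc00000 maps to 0x1fc00000, and a KSEG2 address such as 0xfffe0130 is returned unchanged.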
We also have to change all our current address map declarations to use
physical addresses:
pub const RAM: Range = Range(0x00000000, 2 * 1024 * 1024);

pub const BIOS: Range = Range(0x1fc00000, 512 * 1024);

/// Unknown registers. The name comes from mednafen.
pub const SYS_CONTROL: Range = Range(0x1f801000, 36);

/// Register that has something to do with RAM configuration,
/// configured by the BIOS
pub const RAM_SIZE: Range = Range(0x1f801060, 4);

/// Cache control register. Full address since it's in KSEG2
pub const CACHE_CONTROL: Range = Range(0xfffe0130, 4);

2.39 SH instruction
The next unhandled instruction is 0xa5200180 which encodes “store halfword”
(SH). It’s used to write 16 bits (a halfword) to the memory:
sh $zero, 0x180($9)

The implementation is very similar to the “store word” instruction except
we truncate the register to 16 bits and we’ll have to implement a new store16
method on our interconnect [12]:
impl Cpu {
    // ...

    /// Store 16bit value into the memory
    fn store16(&mut self, addr: u32, val: u16) {
        self.inter.store16(addr, val);
    }

    /// Store Halfword
    fn op_sh(&mut self, instruction: Instruction) {

        if self.sr & 0x10000 != 0 {
            // Cache is isolated, ignore write
            println!("Ignoring store while cache is isolated");
            return;
        }

        let i = instruction.imm_se();
        let t = instruction.t();
        let s = instruction.s();

        let addr = self.reg(s).wrapping_add(i);
        let v    = self.reg(t);

        self.store16(addr, v as u16);
    }
}
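The address computation used by SH (and by the other load and store instructions) can be tried in isolation. This is a minimal sketch rather than the emulator’s code: imm_se and effective_address are free functions introduced here for illustration (in the guide imm_se is a method on Instruction), and the base register values used below are made up.

```rust
/// Extract the low 16 bits of an instruction and sign-extend them to
/// 32 bits by going through `i16`.
pub fn imm_se(instruction: u32) -> u32 {
    let v = (instruction & 0xffff) as i16;

    v as u32
}

/// Hypothetical helper: base register value plus sign-extended
/// offset, wrapping around on overflow like the hardware does.
pub fn effective_address(base: u32, instruction: u32) -> u32 {
    base.wrapping_add(imm_se(instruction))
}
```

For 0xa5200180 the immediate is 0x180, so a base of 0x1f801c00 gives 0x1f801d80. A hand-built encoding with a negative offset, 0xa520fffc (offset -4), sign-extends to 0xfffffffc and the wrapping addition moves the address back by four bytes.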

And in the interconnect:

impl Interconnect {
    // ...

    /// Store 16bit halfword `val` into `addr`
    pub fn store16(&mut self, addr: u32, val: u16) {

        if addr % 2 != 0 {
            panic!("Unaligned store16 address: {:08x}", addr);
        }

        panic!("unhandled store16 into address {:08x}", addr);
    }
}

I start with an empty function instead of copying the store32 code because
different devices react differently when we change the transaction width. Some
will pad the value to 32 bits with zeroes, others may just set 16 bits in the
register and leave the others untouched. For this reason I’ll be conservative
and add them only when needed.

[12] Having separate functions for various widths should make the code easier
to follow for now but it does create some code duplication; later on I’ll use
generics to factor them into a single function.

2.40 SPU registers

If we run that code we see that this store16 attempts to store 0 at 0x1f801d80.
Looking at the memory map we see it’s the address of a sound processing
unit (SPU) hardware register. At that point we don’t really care for sound so
we’re going to ignore writes to these addresses for the time being:
impl Interconnect {
    // ...

    /// Store 16bit halfword `val` into `addr`
    pub fn store16(&mut self, addr: u32, _: u16) {

        if addr % 2 != 0 {
            panic!("Unaligned store16 address: {:08x}", addr);
        }

        let abs_addr = map::mask_region(addr);

        if let Some(offset) = map::SPU.contains(abs_addr) {
            println!("Unhandled write to SPU register {:x}", offset);
            return;
        }

        panic!("unhandled store16 into address {:08x}", addr);
    }
}

/// SPU registers
pub const SPU: Range = Range(0x1f801c00, 640);
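To make the offset computation concrete, here is a small standalone sketch of a range type with a contains method. The exact definition of Range is an assumption inferred from how it is used in this section (a start address and a length in bytes, returning the offset on a hit); the real type may differ in its details.

```rust
/// Assumed shape of the address range type: (start, length in bytes).
pub struct Range(pub u32, pub u32);

impl Range {
    /// Return `Some(offset)` if `addr` falls inside the range,
    /// `None` otherwise.
    pub fn contains(&self, addr: u32) -> Option<u32> {
        let start = self.0;
        let length = self.1;

        if addr >= start && addr - start < length {
            Some(addr - start)
        } else {
            None
        }
    }
}

/// SPU registers, as declared in the text
pub const SPU: Range = Range(0x1f801c00, 640);
```

The write to 0x1f801d80 therefore lands at offset 0x180 within the SPU range, while addresses just below the start or at start + 640 fall outside of it.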

2.41 JAL instruction

The next unhandled instruction should be 0x0ff00698 which is a “jump and
link” (JAL). It behaves like the regular jump instruction except that it also
stores the return address in $ra ($31):
jal 0xfc01a60

Using this instruction it’s easy to implement function calls: the function
is called with JAL and can return to the caller by jumping to the value in $ra.
Then the control returns to the calling function. The $ra register is the link
between the caller and the callee.
We can reuse the regular J opcode implementation and simply add the code
to store the return address in $31:
impl Cpu {
    // ...

    /// Jump And Link
    fn op_jal(&mut self, instruction: Instruction) {
        let ra = self.pc;

        // Store return address in $31 ($ra)
        self.set_reg(RegisterIndex(31), ra);

        self.op_j(instruction);
    }
}
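The 0xfc01a60 shown in the disassembly is only the low 28 bits of the destination. In the standard MIPS J-type encoding the 26-bit immediate is shifted left by two and the four high bits are taken from the PC. The sketch below illustrates this; jump_target is a hypothetical free function and the PC value used in the example is just a plausible BIOS address, not one taken from a trace.

```rust
/// Compute the destination of a J or JAL instruction.
///
/// `pc` supplies the 4 high bits of the target, the instruction
/// supplies bits [27:2] through its 26-bit immediate.
pub fn jump_target(pc: u32, instruction: u32) -> u32 {
    let imm26 = instruction & 0x03ff_ffff;

    (pc & 0xf000_0000) | (imm26 << 2)
}
```

For 0x0ff00698 the immediate is 0x3f00698, which becomes 0xfc01a60 after the shift. With a PC somewhere in the BIOS (0xbfc0xxxx in KSEG1) the full target is 0xbfc01a60.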

2.42 ANDI instruction
We continue with instruction 0x308400ff which is a “bitwise and immediate”
(ANDI):
andi $4, $4, 0xff

We can simply copy the implementation of ORI and replace the | with an &:
impl Cpu {
    // ...

    /// Bitwise And Immediate
    fn op_andi(&mut self, instruction: Instruction) {
        let i = instruction.imm();
        let t = instruction.t();
        let s = instruction.s();

        let v = self.reg(s) & i;

        self.set_reg(t, v);
    }
}

2.43 SB instruction
After the word and halfword store instructions we now meet 0xa1c42041 which
is a “store byte” (SB) instruction. We have to implement a third path for
accessing the memory like we did for store32 and store16:
impl Cpu {
    // ...

    /// Store 8bit value into the memory
    fn store8(&mut self, addr: u32, val: u8) {
        self.inter.store8(addr, val);
    }

    /// Store Byte
    fn op_sb(&mut self, instruction: Instruction) {

        if self.sr & 0x10000 != 0 {
            // Cache is isolated, ignore write
            println!("Ignoring store while cache is isolated");
            return;
        }

        let i = instruction.imm_se();
        let t = instruction.t();
        let s = instruction.s();

        let addr = self.reg(s).wrapping_add(i);
        let v    = self.reg(t);

        self.store8(addr, v as u8);
    }
}

2.44 Expansion 2
The address being written to is 0x1f802041 which falls in the expansion 2
memory map. As far as I can tell this expansion is only used for debugging on
development boards and doesn’t do anything useful on real hardware. Therefore
we’ll just ignore writes to this expansion:
impl Interconnect {
    // ...

    /// Store byte `val` into `addr`
    pub fn store8(&mut self, addr: u32, _: u8) {
        let abs_addr = map::mask_region(addr);

        if let Some(offset) = map::EXPANSION_2.contains(abs_addr) {
            println!("Unhandled write to expansion 2 register {:x}",
                     offset);
            return;
        }

        panic!("unhandled store8 into address {:08x}", addr);
    }
}

/// Expansion region 2
pub const EXPANSION_2: Range = Range(0x1f802000, 66);

2.45 JR instruction
A few steps later we encounter 0x03e00008 which is the “jump register” (JR)
instruction. It simply sets the PC to the value stored in one of the general
purpose registers:
jr $31

Since JAL stores the return address in $31 we can return from a subroutine
by calling jr $ra which is exactly what the BIOS is doing here:
impl Cpu {
    // ...

    /// Jump Register
    fn op_jr(&mut self, instruction: Instruction) {
        let s = instruction.s();

        self.pc = self.reg(s);
    }
}

2.46 LB instruction
The next unhandled instruction is 0x81efe288 which encodes “load byte” (LB).
As you can guess it’s like LW except that it only loads 8 bits from the
memory [13]:
lb $15, -7544($15)

[13] Note the use of a negative offset: if we hadn’t implemented proper sign
extension earlier this instruction would misbehave.

Since the general purpose registers are always 32bit LB only loads the low
8 bits of the register. The byte is treated like a signed value so it’s sign
extended to the full 32 bits. Of course like LW there’s a load delay of one
instruction. We can implement it like this [14]:
impl Cpu {
    // ...

    /// Load 8bit value from the memory
    fn load8(&self, addr: u32) -> u8 {
        self.inter.load8(addr)
    }

    /// Load Byte (signed)
    fn op_lb(&mut self, instruction: Instruction) {

        let i = instruction.imm_se();
        let t = instruction.t();
        let s = instruction.s();

        let addr = self.reg(s).wrapping_add(i);

        // Cast as i8 to force sign extension
        let v = self.load8(addr) as i8;

        // Put the load in the delay slot
        self.load = (t, v as u32);
    }
}

Next is the Interconnect implementation. The current instruction attempts
to load from an address within the BIOS so we’ll add support for it:
impl Interconnect {
    // ...

    /// Load byte at `addr`
    pub fn load8(&self, addr: u32) -> u8 {
        let abs_addr = map::mask_region(addr);

        if let Some(offset) = map::BIOS.contains(abs_addr) {
            return self.bios.load8(offset);
        }

        panic!("unhandled load8 at address {:08x}", addr);
    }
}

And the implementation of load8 in the BIOS:

impl Bios {
    // ...

    /// Fetch byte at `offset`
    pub fn load8(&self, offset: u32) -> u8 {
        self.data[offset as usize]
    }
}

[14] Note the cast from u8 to i8 and finally u32 to force the sign extension.
2.47 BEQ instruction
We then get a new branch instruction: 0x11e0000c is “branch if equal” (BEQ):
beq $15, $zero, +48

We can reuse the code of BNE by changing the condition:
impl Cpu {
    // ...

    /// Branch if Equal
    fn op_beq(&mut self, instruction: Instruction) {
        let i = instruction.imm_se();
        let s = instruction.s();
        let t = instruction.t();

        if self.reg(s) == self.reg(t) {
            self.branch(i);
        }
    }
}

2.48 Expansion 1
After that the BIOS attempts to read a byte at 0x1f000084. This is where the
first expansion port is mapped. This expansion goes to the parallel port on the
back of the early Playstation models.
If you look at the byte read by the first LB instruction above you’ll see it’s
the first byte in a C-string: “Licensed by Sony Computer Entertainment Inc”.
Apparently in order to detect and validate the expansion the BIOS compares this
hardcoded string with the values stored starting at offset 0x84 in the expansion.
We don’t really have any reason to implement an expansion at that point
so we’ll return the default value when no expansion is present. Looking at
mednafen’s source code it seems to be all ones [15]:
impl Interconnect {
    // ...

    /// Load byte at `addr`
    pub fn load8(&self, addr: u32) -> u8 {
        let abs_addr = map::mask_region(addr);

        if let Some(offset) = map::BIOS.contains(abs_addr) {
            return self.bios.load8(offset);
        }

        if let Some(_) = map::EXPANSION_1.contains(abs_addr) {
            // No expansion implemented
            return 0xff;
        }

        panic!("unhandled load8 at address {:08x}", addr);
    }
}

[15] I’m actually not sure how to test that easily since I need to have an
expansion plugged in the parallel connector to be able to run code on my
console. Maybe I could start the code and unplug it but that doesn’t sound too
great... A better way would be to burn the test code on a CD and run it on a
modchipped console.
2.49 RAM byte access
Now the BIOS wants to store a byte to the RAM but we haven’t implemented
that yet, let’s fix that by implementing store8 and let’s add load8 while we’re
at it:
impl Interconnect {
    // ...

    /// Store byte `val` into `addr`
    pub fn store8(&mut self, addr: u32, val: u8) {
        let abs_addr = map::mask_region(addr);

        if let Some(offset) = map::RAM.contains(abs_addr) {
            return self.ram.store8(offset, val);
        }

        // ...
    }

    /// Load byte at `addr`
    pub fn load8(&self, addr: u32) -> u8 {
        let abs_addr = map::mask_region(addr);

        if let Some(offset) = map::RAM.contains(abs_addr) {
            return self.ram.load8(offset);
        }

        // ...
    }
}

And then in the RAM implementation:

impl Ram {
    // ...

    /// Store the byte `val` into `offset`
    pub fn store8(&mut self, offset: u32, val: u8) {
        self.data[offset as usize] = val;
    }

    /// Fetch the byte at `offset`
    pub fn load8(&self, offset: u32) -> u8 {
        self.data[offset as usize]
    }
}

2.50 MFC0 instruction

We’ve already met MTC0, now we encounter the reciprocal instruction: 0x40026000
encodes “move from coprocessor 0” (MFC0) [16]:
mfc0 $2, $cop0_12

[16] I’m using pseudo-assembly again. The proper GNU assembler syntax would be
mfc0 $2, $12

There’s one important thing to note however: MFC instructions behave like
memory loads and have a delay slot before the value is finally stored in the target
register.
Fortunately we can simply re-use our load delay slots infrastructure:
impl Cpu {
    // ...

    /// Move From Coprocessor 0
    fn op_mfc0(&mut self, instruction: Instruction) {
        let cpu_r = instruction.t();
        let cop_r = instruction.d().0;

        let v = match cop_r {
            12 => self.sr,
            13 => // Cause register
                panic!("Unhandled read from CAUSE register"),
            _ =>
                panic!("Unhandled read from cop0r{}", cop_r),
        };

        self.load = (cpu_r, v)
    }
}
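MTC0 and MFC0 share the same layout: the CPU register index sits in bits [20:16], the coprocessor register index in bits [15:11], and bits [25:21] select the direction of the move. The following sketch decodes those fields from a raw instruction word. The Cop0Move type and decode_cop0_move function are hypothetical names used only here; the emulator dispatches on the opcode elsewhere.

```rust
/// Fields of a coprocessor 0 move instruction (MTC0 or MFC0).
#[derive(Debug, PartialEq)]
pub struct Cop0Move {
    /// true for MTC0 (CPU to cop0), false for MFC0 (cop0 to CPU)
    pub to_cop: bool,
    /// General purpose register index, bits [20:16]
    pub cpu_r: u32,
    /// Coprocessor 0 register index, bits [15:11]
    pub cop_r: u32,
}

/// Decode a cop0 move, returning `None` for any other instruction.
pub fn decode_cop0_move(instruction: u32) -> Option<Cop0Move> {
    // Bits [31:26] must be 0b010000 (COP0)
    if instruction >> 26 != 0b010000 {
        return None;
    }

    let to_cop = match (instruction >> 21) & 0x1f {
        0b00000 => false, // MFC0
        0b00100 => true,  // MTC0
        _ => return None,
    };

    Some(Cop0Move {
        to_cop,
        cpu_r: (instruction >> 16) & 0x1f,
        cop_r: (instruction >> 11) & 0x1f,
    })
}
```

Decoding 0x40026000 yields a move from coprocessor register 12 (SR) to CPU register $2, matching the disassembly above.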

2.51 AND instruction

Another easy instruction follows a few cycles later: 0x00412024 which is a
“bitwise and” (AND):
and $4, $2, $1

We’ve already implemented OR so we can reuse the code, only changing the
operator:
impl Cpu {
    // ...

    /// Bitwise And
    fn op_and(&mut self, instruction: Instruction) {
        let d = instruction.d();
        let s = instruction.s();
        let t = instruction.t();

        let v = self.reg(s) & self.reg(t);

        self.set_reg(d, v);
    }
}

2.52 ADD instruction

We already implemented ADDIU, ADDI and ADDU. We finally encounter “add”
(ADD) in instruction 0x01094020:
add $8, $8, $9

It adds the value of two registers (like ADDU) but generates an exception on
signed overflow (like ADDI):
impl Cpu {
    // ...

    /// Add and generate an exception on overflow
    fn op_add(&mut self, instruction: Instruction) {
        let s = instruction.s();
        let t = instruction.t();
        let d = instruction.d();

        let s = self.reg(s) as i32;
        let t = self.reg(t) as i32;

        let v = match s.checked_add(t) {
            Some(v) => v as u32,
            None    => panic!("ADD overflow"),
        };

        self.set_reg(d, v);
    }
}
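The difference between ADDU and ADD is easy to see on plain values. In this sketch addu and add are standalone functions written for the example (they are not the emulator’s opcode handlers), and add returns None where the real instruction would raise an overflow exception.

```rust
/// ADDU semantics: 32-bit addition that silently wraps around.
pub fn addu(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}

/// ADD semantics: the operands are treated as signed and `None`
/// stands for the overflow exception.
pub fn add(a: u32, b: u32) -> Option<u32> {
    (a as i32).checked_add(b as i32).map(|v| v as u32)
}
```

Adding 1 to 0x7fffffff wraps to 0x80000000 with ADDU but overflows with ADD, since the signed result would not fit in 32 bits. Adding 1 to 0xffffffff (-1) gives 0 in both cases because there is no signed overflow.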

2.53 Interrupt Control registers

The BIOS then attempts to write 0 at address 0x1f801074. Looking at the
memory map this is the “Interrupt Mask” register.
This register is used to activate or ignore external interrupt signals (things
like blanking interrupts from the GPU, timers, controller and memory card
interrupts, etc.).
Interrupts are signals coming from the peripherals to the CPU to notify it
that a certain event occurred (a timer reached its target value, a button was
pressed on the controller, etc.). This way the CPU doesn’t have to waste time
polling the status of the various peripherals, it can just wait for the interrupt
notification.
Writing 0 to this register masks all interrupts so it seems that the BIOS
wants to make sure it won’t get interrupted before proceeding further.
There’s another interrupt control register right before that one at 0x1f801070.
That one is called “Interrupt Status” and is used to query the status of the
various interrupts (active or not).
Since we don’t have any peripheral yet it wouldn’t make sense to implement
interrupts at that point, we’re going to ignore writes to these addresses for
now [17]:
impl Interconnect {
    // ...

    /// Store 32bit word `val` into `addr`
    pub fn store32(&mut self, addr: u32, val: u32) {
        // ...

        if let Some(offset) = map::IRQ_CONTROL.contains(abs_addr) {
            println!("IRQ control: {:x} <- {:08x}", offset, val);
            return;
        }

        panic!("unhandled store32 into address {:08x}", addr);
    }
}

/// Interrupt Control registers (status and mask)
pub const IRQ_CONTROL: Range = Range(0x1f801070, 8);

[17] IRQ is a common abbreviation for “Interrupt Request”.
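Although we ignore these writes for now, the relationship between the two registers can be sketched in a few lines. Everything here is an assumption made for illustration: the names IrqRegister, irq_register and irq_pending are hypothetical, and the rule that an interrupt is only delivered when both its status bit and its mask bit are set is inferred from the description above, not verified against hardware.

```rust
/// Physical base address of the interrupt control registers
const IRQ_CONTROL_BASE: u32 = 0x1f801070;

/// The two registers covered by the 8-byte IRQ_CONTROL range
#[derive(Debug, PartialEq)]
pub enum IrqRegister {
    Status,
    Mask,
}

/// Hypothetical helper: which register does a word offset target?
pub fn irq_register(offset: u32) -> Option<IrqRegister> {
    match offset {
        0 => Some(IrqRegister::Status),
        4 => Some(IrqRegister::Mask),
        _ => None,
    }
}

/// Assumed rule: an interrupt is delivered only if it is active in
/// the status register and enabled in the mask register.
pub fn irq_pending(status: u32, mask: u32) -> bool {
    (status & mask) != 0
}
```

The BIOS write to 0x1f801074 corresponds to offset 4, the mask register. With a mask of 0, irq_pending is false whatever the status, which matches the idea that writing 0 masks all interrupts.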

2.54 BGTZ instruction
The next unhandled instruction is 0x1ca00003 which is a “branch if greater
than zero” (BGTZ):
bgtz $5, +12

It’s similar to the BEQ and BNE we’ve already encountered but instead of
comparing two registers it compares a single general purpose register to 0.
The comparison is done using signed integers. For unsigned integers the test
would only ever be false if the register contained 0 and we can already test that
with BNE:
bne $5, $zero, +12

So we have to be careful to cast to a signed integer before the comparison in
our implementation:
impl Cpu {
    // ...

    /// Branch if Greater Than Zero
    fn op_bgtz(&mut self, instruction: Instruction) {
        let i = instruction.imm_se();
        let s = instruction.s();

        let v = self.reg(s) as i32;

        if v > 0 {
            self.branch(i);
        }
    }
}

2.55 BLEZ instruction

A few steps later we encounter the complementary instruction 0x18a00005 which
encodes “branch if less than or equal to zero” (BLEZ):
blez $5, +20

It’s the same thing as BGTZ with the opposite predicate:
impl Cpu {
    // ...

    /// Branch if Less than or Equal to Zero
    fn op_blez(&mut self, instruction: Instruction) {
        let i = instruction.imm_se();
        let s = instruction.s();

        let v = self.reg(s) as i32;

        if v <= 0 {
            self.branch(i);
        }
    }
}

2.56 LBU instruction
After that we meet instruction 0x90ae0000 which is a “load byte unsigned”
(LBU):
lbu $14, 0($5)

It’s exactly like LB but without sign extension, the high 24 bits of the target
register are set to 0:
impl Cpu {
    // ...

    /// Load Byte Unsigned
    fn op_lbu(&mut self, instruction: Instruction) {

        let i = instruction.imm_se();
        let t = instruction.t();
        let s = instruction.s();

        let addr = self.reg(s).wrapping_add(i);

        let v = self.load8(addr);

        // Put the load in the delay slot
        self.load = (t, v as u32);
    }
}
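The only difference between LB and LBU is the cast chain applied to the loaded byte, which the following sketch isolates. The function names lb_extend and lbu_extend are made up for this example.

```rust
/// LB: go through `i8` so that the cast to `u32` replicates the
/// sign bit into the high 24 bits.
pub fn lb_extend(byte: u8) -> u32 {
    (byte as i8) as u32
}

/// LBU: a direct cast to `u32` leaves the high 24 bits at 0.
pub fn lbu_extend(byte: u8) -> u32 {
    byte as u32
}
```

A byte below 0x80 gives the same result either way. For 0x80 and above the two diverge: lb_extend(0x80) is 0xffffff80 while lbu_extend(0x80) is 0x00000080.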

2.57 JALR instruction

Then we encounter instruction 0x0100f809 which encodes a “jump and link
register” (JALR):
jalr $31, $8

It’s implemented like JR except that it also stores the return address in a
general purpose register. Unlike JAL, JALR can store the return address in any
general purpose register, not just $ra:
impl Cpu {
    // ...

    /// Jump And Link Register
    fn op_jalr(&mut self, instruction: Instruction) {
        let d = instruction.d();
        let s = instruction.s();

        let ra = self.pc;

        // Store return address in `d`
        self.set_reg(d, ra);

        self.pc = self.reg(s);
    }
}

2.58 BLTZ, BLTZAL, BGEZ and BGEZAL instructions

The next unhandled instruction, 0x04800003, is a bit of a weird one: the six
MSBs are 0b000001 which can encode four different instructions:

• “branch if less than zero” (BLTZ):
bltz $4, +12
• “branch if less than zero and link” (BLTZAL):
bltzal $4, +12
• “branch if greater than or equal to zero” (BGEZ):
bgez $4, +12
• “branch if greater than or equal to zero and link” (BGEZAL):
bgezal $4, +12

In order to figure out what to do exactly we need to look at bits 16 and 20
in the instruction:
• If bit 16 is set then the instruction is BGEZ, otherwise it’s BLTZ.
• If bits [20:17] are equal to 0b1000 (0x8) then the return address is linked
in $ra.
Note that when linking is requested the return address is linked in $ra even
if the branch is not taken.
Here’s how it can be implemented:
impl Cpu {
    // ...

    /// Various branch instructions: BGEZ, BLTZ, BGEZAL, BLTZAL.
    /// Bits 16 and 20 are used to figure out which one to use.
    fn op_bxx(&mut self, instruction: Instruction) {
        let i = instruction.imm_se();
        let s = instruction.s();

        let instruction = instruction.0;

        let is_bgez = (instruction >> 16) & 1;
        let is_link = (instruction >> 17) & 0xf == 8;

        let v = self.reg(s) as i32;

        // Test "less than zero"
        let test = (v < 0) as u32;

        // If the test is "greater than or equal to zero" we need
        // to negate the comparison above since
        // ("a >= 0" <=> "!(a < 0)"). The xor takes care of that.
        let test = test ^ is_bgez;

        if is_link {
            let ra = self.pc;

            // Store return address in R31
            self.set_reg(RegisterIndex(31), ra);
        }

        if test != 0 {
            self.branch(i);
        }
    }
}
Instead of testing bit 16 directly I save a branch by xoring the value of test
(which is a boolean 0 or 1) with it.
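The decoding rules and the xor trick can be checked on raw instruction words. This is a standalone sketch: the Bxx enum and the decode_bxx and bxx_taken functions are hypothetical names, and the three encodings other than 0x04800003 were built by hand by changing bits [20:16] of that instruction.

```rust
/// The four instructions sharing opcode 0b000001
#[derive(Debug, PartialEq)]
pub enum Bxx {
    Bltz,
    Bgez,
    Bltzal,
    Bgezal,
}

/// Select the variant from bit 16 and bits [20:17]
pub fn decode_bxx(instruction: u32) -> Bxx {
    let is_bgez = (instruction >> 16) & 1 != 0;
    let is_link = (instruction >> 17) & 0xf == 8;

    match (is_bgez, is_link) {
        (false, false) => Bxx::Bltz,
        (true, false) => Bxx::Bgez,
        (false, true) => Bxx::Bltzal,
        (true, true) => Bxx::Bgezal,
    }
}

/// Branch condition computed with the same xor as in op_bxx:
/// "v < 0" is flipped when bit 16 (BGEZ) is set.
pub fn bxx_taken(instruction: u32, value: u32) -> bool {
    let is_bgez = (instruction >> 16) & 1;
    let test = ((value as i32) < 0) as u32;

    (test ^ is_bgez) != 0
}
```

For instance 0x04800003 decodes as BLTZ and is taken for a register value of 0xffffffff (-1) but not for 0, while 0x04810003 (BGEZ) behaves the other way around.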

2.59 SLTI instruction

We then encounter 0x28810010 which encodes instruction “set if less than
immediate” (SLTI):
slti $1, $4, 16

It works like SLTU except that it compares a register with an immediate
value (sign-extended) and the comparison is done using signed arithmetic:
impl Cpu {
    // ...

    /// Set if Less Than Immediate (signed)
    fn op_slti(&mut self, instruction: Instruction) {
        let i = instruction.imm_se() as i32;
        let s = instruction.s();
        let t = instruction.t();

        let v = (self.reg(s) as i32) < i;

        self.set_reg(t, v as u32);
    }
}
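To see why the signedness of the comparison matters, the sketch below puts the SLTU and SLTI comparisons side by side as free functions. The names sltu and slti are reused for convenience only, and the immediate is passed directly as an i16 instead of being extracted from an instruction.

```rust
/// SLTU comparison: both operands are treated as unsigned.
pub fn sltu(a: u32, b: u32) -> u32 {
    (a < b) as u32
}

/// SLTI comparison: the register is reinterpreted as signed and the
/// 16-bit immediate is sign-extended before comparing.
pub fn slti(a: u32, imm: i16) -> u32 {
    ((a as i32) < (imm as i32)) as u32
}
```

With a register containing 0xffffffff, slti against 16 returns 1 because the value is read as -1, whereas sltu against 16 returns 0 because the same bits are read as 4294967295.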

2.60 SUBU instruction

The next unhandled instruction is 0x01c47023 which encodes “subtract
unsigned” (SUBU):
subu $14, $14, $4

The implementation is straightforward:

impl Cpu {
    // ...

    /// Subtract Unsigned
    fn op_subu(&mut self, instruction: Instruction) {
        let s = instruction.s();
        let t = instruction.t();
        let d = instruction.d();

        let v = self.reg(s).wrapping_sub(self.reg(t));

        self.set_reg(d, v);
    }
}

2.61 SRA instruction

Next we meet instruction 0x00042603 which is “shift right arithmetic” (SRA):
sra $4, $4, 24

There are two versions of the shift right instruction: arithmetic and logical.
The arithmetic version considers that the value is signed and uses the sign bit to
fill the missing MSBs in the register after the shift.
In Rust, C and C++ we can achieve the same behavior by casting the register
value to a signed integer before doing the shift:
impl Cpu {
    // ...

    /// Shift Right Arithmetic
    fn op_sra(&mut self, instruction: Instruction) {
        let i = instruction.shift();
        let t = instruction.t();
        let d = instruction.d();

        let v = (self.reg(t) as i32) >> i;

        self.set_reg(d, v as u32);
    }
}

2.62 DIV instruction

The next unhandled instruction is 0x0061001a which is “divide” (DIV):
div $3, $1

Multiplications and divisions are a bit peculiar on the MIPS architecture: for
one, the result is not stored in general purpose registers but in two dedicated
32bit registers: HI and LO.
For a division LO will contain the quotient and HI the remainder of the
euclidean division.
The reason for this is that divisions and multiplications are typically much
slower than the other instructions we’ve implemented so far (with the exception
of loads and stores potentially, due to the memory latency). While a simple
ADD or SRA can be executed in a single CPU cycle, DIV can take as many as
36 cycles to produce its result.
In order to try and hide this delay when the CPU executes a division
instruction it does not stall the pipeline waiting for the instruction to finish.
Rather it continues executing the following instructions and when the code
decides to fetch the result of the division (using dedicated instructions to load HI
or LO) the CPU only stalls if it didn’t have the time to finish doing the division
in the background. This way if you craft your assembly cleverly you can hide
the division delay by doing some other work while the division is finishing.
For now we haven’t bothered implementing accurate timings at all so we
won’t worry about these details and consider the division takes one cycle to
execute. Later on when we implement proper timings we’ll have to revisit that
code.
An important thing to consider is what happens when we encounter a division
by zero. Perhaps surprisingly the CPU does not generate an exception, it just
gives bogus values (1 or -1 depending on the sign of the dividend).
Another bogus behaviour occurs when dividing 0x80000000 (-2147483648) by
0xffffffff (-1): the true result, 2147483648, does not fit in a 32bit
signed integer. Table 7 gives a summary of those special cases.

Numerator     Denominator   Quotient (LO)     Remainder (HI)
≥ 0           0             -1 (0xffffffff)   numerator
< 0           0             +1                numerator
0x80000000    0xffffffff    0x80000000        0

Table 7: Special cases in divisions

We should now have all we need to implement the instruction, let’s start by
adding the HI and LO registers to our Cpu:
/// CPU state
pub struct Cpu {
    // ...

    /// HI register for division remainder and multiplication high
    /// result
    hi: u32,
    /// LO register for division quotient and multiplication low
    /// result
    lo: u32,
}

impl Cpu {

    pub fn new(inter: Interconnect) -> Cpu {

        // ...

        Cpu {
            // ...
            hi: 0xdeadbeef,
            lo: 0xdeadbeef,
        }
    }

    // ...
}

And now the implementation of the DIV opcode itself:

impl Cpu {
    // ...

    /// Divide (signed)
    fn op_div(&mut self, instruction: Instruction) {
        let s = instruction.s();
        let t = instruction.t();

        let n = self.reg(s) as i32;
        let d = self.reg(t) as i32;

        if d == 0 {
            // Division by zero, results are bogus
            self.hi = n as u32;

            if n >= 0 {
                self.lo = 0xffffffff;
            } else {
                self.lo = 1;
            }
        } else if n as u32 == 0x80000000 && d == -1 {
            // Result is not representable in a 32bit
            // signed integer
            self.hi = 0;
            self.lo = 0x80000000;
        } else {
            self.hi = (n % d) as u32;
            self.lo = (n / d) as u32;
        }
    }
}
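
The special cases of table 7 can also be checked in isolation. The following is a minimal sketch, not the emulator's actual code: div_signed is a hypothetical free function that takes the two operands and returns the (LO, HI) pair. It assumes that Rust's wrapping_div and wrapping_rem can stand in for the explicit 0x80000000 / -1 branch, since both are defined for that overflow case:

```rust
/// Hypothetical helper: returns `(lo, hi)`, that is (quotient, remainder),
/// following the special cases of table 7.
fn div_signed(n: i32, d: i32) -> (u32, u32) {
    if d == 0 {
        // Division by zero: the quotient is bogus, the remainder is
        // the numerator
        let lo = if n >= 0 { 0xffff_ffff } else { 1 };
        (lo, n as u32)
    } else {
        // `wrapping_div` and `wrapping_rem` don't panic for
        // i32::MIN / -1: they yield i32::MIN and 0 respectively
        (n.wrapping_div(d) as u32, n.wrapping_rem(d) as u32)
    }
}
```

For ordinary operands this behaves like the `/` and `%` operators used in op_div: Rust truncates towards zero, so dividing -7 by 2 gives a quotient of -3 and a remainder of -1.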

2.63 MFLO instruction

We’ve seen that divisions store their results in the HI and LO registers but
we don’t know how we access those yet. Unsurprisingly the next unhandled
instruction does just that: 0x00001812 encodes “move from LO” (MFLO):
mflo $3

This instruction simply moves the contents of LO into a general purpose
register. It would also stall if the division was not yet done but
we’ll implement that later:
impl Cpu {
    // ...

    /// Move From LO
    fn op_mflo(&mut self, instruction: Instruction) {
        let d = instruction.d();

        let lo = self.lo;

        self.set_reg(d, lo);
    }
}

2.64 SRL instruction

We’ve implemented SRA not long ago, now we encounter the sister instruction
0x00057082 which is a “shift right logical” (SRL):
srl $14, $5, 2

It’s very similar to SRA except that the instruction treats the value as
unsigned and fills the missing MSBs with 0 after the shift. In Rust, C and C++
we can achieve this behavior by shifting unsigned values:
impl Cpu {
    // ...

    /// Shift Right Logical
    fn op_srl(&mut self, instruction: Instruction) {
        let i = instruction.shift();
        let t = instruction.t();
        let d = instruction.d();

        let v = self.reg(t) >> i;

        self.set_reg(d, v);
    }
}
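
The difference between the two right shifts only shows when the top bit of the value is set. As a small illustration, here is a sketch with two hypothetical free functions, srl and sra, that are not part of the emulator. The shift amount is assumed to be in the range 0 to 31, as it would be when it comes from the 5bit shift field:

```rust
/// Shift Right Logical: the vacated high bits are filled with zeroes
fn srl(value: u32, shift: u32) -> u32 {
    value >> shift
}

/// Shift Right Arithmetic: the vacated high bits are filled with
/// copies of the sign bit because the shift is done on an `i32`
fn sra(value: u32, shift: u32) -> u32 {
    ((value as i32) >> shift) as u32
}
```

Shifting 0x80000000 by 4 gives 0x08000000 with srl and 0xf8000000 with sra, while both agree on values whose top bit is clear.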

2.65 SLTIU instruction
After that we meet 0x2c410045 which is “set if less than immediate unsigned”
(SLTIU):
sltiu $1, $2, 0x45

It’s implemented like SLTI but using unsigned integers[18]:

impl Cpu {
    // ...

    /// Set if Less Than Immediate Unsigned
    fn op_sltiu(&mut self, instruction: Instruction) {
        let i = instruction.imm_se();
        let s = instruction.s();
        let t = instruction.t();

        let v = self.reg(s) < i;

        self.set_reg(t, v as u32);
    }
}
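
The sign extension of the immediate matters as soon as its top bit is set. The sketch below illustrates this with a hypothetical standalone function, sltiu, which takes the raw 16bit immediate instead of an Instruction; it is an illustration under that assumption, not the emulator's code:

```rust
/// Hypothetical helper: `imm` is the raw 16bit immediate of the instruction
fn sltiu(reg: u32, imm: u16) -> u32 {
    // Sign-extend to 32 bits first (0xffff becomes 0xffffffff)...
    let imm_se = imm as i16 as u32;

    // ...then compare both sides as unsigned values
    (reg < imm_se) as u32
}
```

With an immediate of 0xffff the comparison is made against 0xffffffff, so every register value except 0xffffffff itself yields 1.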

2.66 DIVU instruction

Now we encounter the other division instruction: 0x0064001b which encodes
“divide unsigned” (DIVU):
divu $3, $4

Since this version uses unsigned operands we only have one special case: the
division by zero (the first line in table 7). Thus the implementation is slightly
shorter than DIV:
impl Cpu {
    // ...

    /// Divide Unsigned
    fn op_divu(&mut self, instruction: Instruction) {
        let s = instruction.s();
        let t = instruction.t();

        let n = self.reg(s);
        let d = self.reg(t);

        if d == 0 {
            // Division by zero, results are bogus
            self.hi = n;
            self.lo = 0xffffffff;
        } else {
            self.hi = n % d;
            self.lo = n / d;
        }
    }
}

[18] Note that the immediate is still sign extended even though it’s then used
as an unsigned value.

2.67 MFHI instruction
We already implemented MFLO, now we meet instruction 0x0000c810 which
encodes “move from HI” (MFHI):
mfhi $25

Like MFLO it should be able to stall if the operation has not yet finished
but we’ll implement that later:
impl Cpu {
    // ...

    /// Move From HI
    fn op_mfhi(&mut self, instruction: Instruction) {
        let d = instruction.d();

        let hi = self.hi;

        self.set_reg(d, hi);
    }
}

2.68 SLT instruction

The next unhandled instruction is 0x0338082a which is “set on less than”:
slt $1, $25, $24

It’s like SLTU but with signed operands:

impl Cpu {
    // ...

    /// Set on Less Than (signed)
    fn op_slt(&mut self, instruction: Instruction) {
        let d = instruction.d();
        let s = instruction.s();
        let t = instruction.t();

        let s = self.reg(s) as i32;
        let t = self.reg(t) as i32;

        let v = s < t;

        self.set_reg(d, v as u32);
    }
}

2.69 Interrupt Control read

The BIOS then attempts to read from the Interrupt Mask register. Earlier we
just ignored writes to this register (and the Interrupt Status register) so for now
we’ll return 0. We’ll rewrite this code when we decide to implement interrupts:
impl Interconnect {
    // ...

    /// Load 32bit word at `addr`
    pub fn load32(&self, addr: u32) -> u32 {
        // ...

        if let Some(offset) = map::IRQ_CONTROL.contains(abs_addr) {
            println!("IRQ control read {:x}", offset);
            return 0;
        }

        panic!("unhandled load32 at address {:08x}", addr);
    }
}

2.70 Timer registers

After that the BIOS wants to write 0 to 0x1f801104. This address is one of the
timer registers. Timers are basically just configurable counters that can generate
interrupts at a predetermined rate. There are three independent timers on the
Playstation.
For now we won’t have to actually implement them though because the
BIOS just initializes them to a default disabled state by writing 0 to all the
configuration registers. We can just ignore those writes and move along:
impl Interconnect {
    // ...

    /// Store 16bit halfword `val` into `addr`
    pub fn store16(&mut self, addr: u32, _: u16) {
        // ...

        if let Some(offset) = map::TIMERS.contains(abs_addr) {
            println!("Unhandled write to timer register {:x}",
                     offset);
            return;
        }

        panic!("unhandled store16 into address {:08x}", addr);
    }
}

2.71 Exceptions
The next unhandled instruction is 0x0000000c which encodes a “system call”
(SYSCALL):
syscall 0

This instruction is used to explicitly trigger an exception. Exceptions occur
when peripherals trigger an (unmasked) interrupt, when certain errors occur
(unaligned memory access, checked overflow in certain instructions, etc.) or
with instructions which are meant to trigger an exception like SYSCALL or
BREAK.
When an exception occurs the following takes place in the CPU:

• The current value of the PC is stored in $cop0_14, the EPC (Exception
PC) register[19],

• The cause of the exception (syscall, overflow, interrupt...) is recorded in
$cop0_13, the CAUSE register,

• Interrupts are disabled in $cop0_12 (SR),

• The CPU jumps into the exception handler whose address is either 0x80000080
or 0xbfc00180 depending on the value of the BEV field (bit 22) in $cop0_12
(SR).

[19] This is not entirely accurate when the exception occurs in a branch delay
slot. We’ll review that case in a minute.

Unlike regular jumps and branches exceptions don’t have a branch delay slot:
the CPU jumps to the exception handler right after the current instruction.
The problem is that with my current architecture we fetch an instruction
ahead of time to emulate the branch delay slot. When an exception is triggered
we’d have to replace that instruction by the first one in the exception handler.
It’s possible of course but it’s a bit messy and I think it was a bad idea after all.
Instead I’m going to use two variables for the PC: pc will hold the address of
the current instruction and next_pc will hold the “next PC”. Normally next_pc
is always 4 bytes ahead but when a branch occurs we’ll set pc to the
instruction in the delay slot and next_pc to the branch target. In case of an
exception however we’ll set the PC to the exception handler address directly.
Let’s change our CPU state to reflect that change:
/// CPU state
pub struct Cpu {
    /// The program counter register: points to the
    /// next instruction
    pc: u32,
    /// Next value for the PC, used to simulate the
    /// branch delay slot
    next_pc: u32,
    // ...
}

impl Cpu {
    pub fn new(inter: Interconnect) -> Cpu {
        // ...

        // Reset value for the PC, beginning of BIOS memory
        let pc = 0xbfc00000;

        Cpu {
            pc: pc,
            next_pc: pc.wrapping_add(4),
            // ...
        }
    }

    // ...
}

We can then (once again) rework run_next_instruction to use our PC pair:
impl Cpu {
    // ...

    pub fn run_next_instruction(&mut self) {
        let pc = self.pc;

        // Fetch instruction at PC
        let instruction = Instruction(self.load32(pc));

        // Increment next PC to point to the next instruction.
        self.pc = self.next_pc;
        self.next_pc = self.next_pc.wrapping_add(4);

        // Execute the pending load (if any, otherwise it will load
        // `R0` which is a NOP). `set_reg` works only on `out_regs`
        // so this operation won't be visible by the next
        // instruction.
        let (reg, val) = self.load;
        self.set_reg(reg, val);

        // We reset the load to target register 0 for the next
        // instruction
        self.load = (RegisterIndex(0), 0);

        self.decode_and_execute(instruction);

        // Copy the output registers as input for the next
        // instruction
        self.regs = self.out_regs;
    }
}

Then we just need to modify our branch and jump functions so that they store
the target address in next_pc instead of pc.
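
The PC pair can be exercised on its own to see the delay slot appear. The following is a minimal sketch under stated assumptions: PcPair, step, branch and trace_branch are hypothetical names that do not appear in the emulator, and no instruction is actually fetched or decoded:

```rust
/// Hypothetical stand-in for the two PC fields of `Cpu`
struct PcPair {
    pc: u32,
    next_pc: u32,
}

impl PcPair {
    fn new(pc: u32) -> PcPair {
        PcPair { pc, next_pc: pc.wrapping_add(4) }
    }

    /// Returns the address of the instruction to execute and
    /// advances the pair, like the start of `run_next_instruction`
    fn step(&mut self) -> u32 {
        let current = self.pc;

        self.pc = self.next_pc;
        self.next_pc = self.next_pc.wrapping_add(4);

        current
    }

    /// Called while a branch executes: only `next_pc` changes, so
    /// the instruction in the delay slot (at `pc`) still runs
    fn branch(&mut self, target: u32) {
        self.next_pc = target;
    }
}

/// Runs a taken branch located at `start` followed by two more
/// instructions and returns the three addresses that were executed
fn trace_branch(start: u32, target: u32) -> [u32; 3] {
    let mut pcs = PcPair::new(start);

    let branch_addr = pcs.step();
    pcs.branch(target);
    let delay_slot_addr = pcs.step();
    let target_addr = pcs.step();

    [branch_addr, delay_slot_addr, target_addr]
}
```

For a branch at 0x100 targeting 0x200 the executed addresses are 0x100, 0x104 and 0x200: the delay slot runs before the target without any instruction having to be fetched ahead of time.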
After that we can implement our exception infrastructure. On top of pc and
next_pc we’ll also need to keep the address of the current instruction so that
we can store it in the EPC register ($cop0_14). We also need to add the CAUSE
register to store the exception code:
/// CPU state
pub struct Cpu {
    // ...

    /// Address of the instruction currently being executed. Used for
    /// setting the EPC in exceptions.
    current_pc: u32,
    /// Cop0 register 13: Cause Register
    cause: u32,
    /// Cop0 register 14: EPC
    epc: u32,
}

impl Cpu {
    // ...

    pub fn run_next_instruction(&mut self) {
        // Fetch instruction at PC
        let instruction = Instruction(self.load32(self.pc));

        // Save the address of the current instruction to save in
        // `EPC` in case of an exception.
        self.current_pc = self.pc;

        // ...
    }
}

Now that we’ve added the EPC and CAUSE registers for cop0 we can also
add them to our implementation of MFC0:
impl Cpu {
    // ...

    /// Move From Coprocessor 0
    fn op_mfc0(&mut self, instruction: Instruction) {
        // ...

        let v = match cop_r {
            12 => self.sr,
            13 => self.cause,
            14 => self.epc,
            _ =>
                panic!("Unhandled read from cop0r{}", cop_r),
        };

        self.load = (cpu_r, v)
    }
}

2.72 SYSCALL instruction

Finally we can implement our exception infrastructure and our SYSCALL opcode.
I’m going to use an exception method that will be shared by the various
exception sources:
impl Cpu {
    // ...

    /// Trigger an exception
    fn exception(&mut self, cause: Exception) {
        // Exception handler address depends on the `BEV` bit:
        let handler = match self.sr & (1 << 22) != 0 {
            true => 0xbfc00180,
            false => 0x80000080,
        };

        // Shift bits [5:0] of `SR` two places to the left.
        // Those bits are three pairs of Interrupt Enable/User
        // Mode bits behaving like a stack 3 entries deep.
        // Entering an exception pushes a pair of zeroes
        // by left shifting the stack which disables
        // interrupts and puts the CPU in kernel mode.
        // The original third entry is discarded (it's up
        // to the kernel to handle more than two recursive
        // exception levels).
        let mode = self.sr & 0x3f;
        self.sr &= !0x3f;
        self.sr |= (mode << 2) & 0x3f;

        // Update `CAUSE` register with the exception code (bits
        // [6:2])
        self.cause = (cause as u32) << 2;

        // Save current instruction address in `EPC`
        self.epc = self.current_pc;

        // Exceptions don't have a branch delay, we jump directly
        // into the handler
        self.pc = handler;
        self.next_pc = self.pc.wrapping_add(4);
    }

    /// System Call
    fn op_syscall(&mut self, _: Instruction) {
        self.exception(Exception::SysCall);
    }
}

/// Exception types (as stored in the `CAUSE` register)
enum Exception {
    /// System call (caused by the SYSCALL opcode)
    SysCall = 0x8,
}

Our op_syscall method ends up being a one-liner. All the logic is in the
generic exception method.
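
The handler selection is the easiest part of that method to check by hand. As an illustration, here is a sketch with a hypothetical free function, exception_handler_address, that takes the value of SR and is not part of the emulator:

```rust
/// Hypothetical helper: picks the exception handler address from `SR`
fn exception_handler_address(sr: u32) -> u32 {
    // BEV is bit 22 of SR
    if sr & (1 << 22) != 0 {
        0xbfc0_0180
    } else {
        0x8000_0080
    }
}
```

Only bit 22 matters here: an SR value with every other bit set still selects 0x80000080.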
With this SYSCALL instruction the BIOS enters the exception handler. The
NoCash specs tell us that we have to look at the contents of register $4 to know
what the BIOS is supposed to do. In this case $4 contains 1 so it’s supposed
to run “EnterCriticalSection”. This function is apparently supposed to disable
all interrupts. Once this is done if everything works well the exception handler
should return to the caller using an RFE instruction, let’s continue and see if we
find it as expected.

2.73 MTLO instruction

In the exception handler we stumble upon 0x00400013 which is “move to LO”
(MTLO):
mtlo $2

As its name implies it just moves the value from a general purpose register
into the LO register. Be careful though because the instruction encoding is
different from MFLO:
impl Cpu {
    // ...

    /// Move to LO
    fn op_mtlo(&mut self, instruction: Instruction) {
        let s = instruction.s();

        self.lo = self.reg(s);
    }
}

It might seem surprising to encounter this instruction: why would the BIOS
want to move something into the LO register? After all this register is for the
result of divisions and multiplications, you can’t do anything with it besides
reading it back.
The answer is that exception handlers are supposed to restore all register
values before returning to the “normal” code flow. The reason is obvious:
exceptions can be triggered by asynchronous interrupts so they can basically
happen at any time. If the exception handler changes the value of any register
before giving back the control to the interrupted code it could lead to bogus
behaviour.
For instance some game code could start a division and be interrupted before
it reads the result in LO. If the interrupt handler then computes another
division and does not restore the original value of the register before
returning control to the game, the game reads LO expecting to get the
result of its own computation but instead gets some garbage value left there
by the handler. Obviously that would be problematic.
To avoid this the prologue of the exception handler saves the value of the
registers it might modify (including HI and LO) to the RAM and then loads
them back in the epilogue.
There are two exceptions though: registers $26 and $27 are reserved for the
BIOS and are not preserved by the exception handler. In other words no code
should use those registers when exceptions can occur because their content could
change at any moment.

2.74 MTHI instruction

Unsurprisingly the MTLO is almost immediately followed by instruction 0x00400011
which is “move to HI” (MTHI):
mthi $2

The implementation is almost identical to MTLO:

impl Cpu {
    // ...

    /// Move to HI
    fn op_mthi(&mut self, instruction: Instruction) {
        let s = instruction.s();

        self.hi = self.reg(s);
    }
}

2.75 RFE instruction

As expected once the exception handler is done it executes instruction 0x42000010
which is a coprocessor 0 opcode for “return from exception” (RFE):
rfe

All this instruction does is shift the Interrupt Enable/User Mode bits two
places back to the right. This effectively undoes the opposite shift done when
entering the handler and therefore puts the CPU back in the mode it was in when
the exception triggered (unless SR itself has been modified in the handler).
It does not reset the PC however, it’s up to the BIOS to fetch the address in
EPC, increment it by 4 to point at the next instruction and jump to it. The
RFE instruction is typically in the final jump delay slot (and that’s exactly what
the Playstation BIOS handler does in this case).
The instruction encoding for RFE is a bit annoying: as usual we begin by
checking bits [31:26] which are 0b010000 and introduce a coprocessor opcode.
Then we check bits [25:21] to figure out which one it is. For RFE it’s 0b10000.
But it’s not over! There can be multiple instructions with this coprocessor
encoding, although RFE is the only one implemented on the Playstation hardware
(the others have to do with virtual memory). To make sure the requested
instruction is the one we expect we must check bits [5:0] which must be equal to
0b010000:
impl Cpu {
    // ...

    /// Coprocessor 0 opcode
    fn op_cop0(&mut self, instruction: Instruction) {
        match instruction.cop_opcode() {
            0b00000 => self.op_mfc0(instruction),
            0b00100 => self.op_mtc0(instruction),
            0b10000 => self.op_rfe(instruction),
            _ => panic!("unhandled cop0 instruction {}",
                        instruction)
        }
    }

    /// Return From Exception
    fn op_rfe(&mut self, instruction: Instruction) {
        // There are other instructions with the same encoding but all
        // are virtual memory related and the Playstation doesn't
        // implement them. Still, let's make sure we're not running
        // buggy code.
        if instruction.0 & 0x3f != 0b010000 {
            panic!("Invalid cop0 instruction: {}", instruction);
        }

        // Restore the pre-exception mode by shifting the Interrupt
        // Enable/User Mode stack back to its original position.
        let mode = self.sr & 0x3f;
        self.sr &= !0x3f;
        self.sr |= mode >> 2;
    }
}
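
To see that the two shifts undo each other, the SR manipulation can be isolated from the rest of the CPU. This is a minimal sketch: enter_exception and return_from_exception are hypothetical free functions that mirror the bit manipulation of the exception and op_rfe methods, and they ignore everything else those methods do:

```rust
/// Pushes a pair of zeroes on the 3-entry mode stack held in bits
/// [5:0] of `SR`, as done when entering an exception
fn enter_exception(sr: u32) -> u32 {
    let mode = sr & 0x3f;

    (sr & !0x3f) | ((mode << 2) & 0x3f)
}

/// Pops the mode stack, mirroring the shift done by `op_rfe`
fn return_from_exception(sr: u32) -> u32 {
    let mode = sr & 0x3f;

    (sr & !0x3f) | (mode >> 2)
}
```

Starting from SR = 0x00400003, entering an exception gives 0x0040000c and returning from it gives 0x00400003 again; the bits outside [5:0] are never touched. If all three stack entries are in use (low bits 0x3f), the round trip ends with 0x0f: the third entry has been discarded, as described in the exception code.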

2.76 Exceptions and branch delay slots

In our current implementation when an exception occurs we store the current
instruction’s address in EPC. That’s correct in most cases but there’s one
exception in the MIPS architecture: when an exception occurs in a branch delay
slot we must store the address of the branch instruction in EPC[20].
Consider the following sequence where we have a SYSCALL instruction in
a JR delay slot:
jr $ra
syscall

In this case the CPU will put the address of the jr $ra instruction in EPC
before entering the exception handler. In order to signal this condition to the
handler the CPU also sets bit 31 of the CAUSE register.
[20] This is only for branch delay slots, load delay slots behave normally
exception-wise.

In order to implement this behaviour we first need to keep track of whether
or not we’re in a branch delay slot. It’s tempting to just check whether or not
the next instruction is 4 bytes ahead of the current one but it’s technically
possible to branch 4 bytes ahead, even though it wouldn’t be very useful.
Instead I’m going to play it safe and add new variables:
pub struct Cpu {
    // ...

    /// Set by the current instruction if a branch occurred and the
    /// next instruction will be in the delay slot.
    branch: bool,
    /// Set if the current instruction executes in the delay slot
    delay_slot: bool,
}

impl Cpu {
    pub fn new(inter: Interconnect) -> Cpu {
        // ...

        Cpu {
            // ...
            branch: false,
            delay_slot: false,
        }
    }

    pub fn run_next_instruction(&mut self) {
        // ...
        let instruction = Instruction(self.load32(self.pc));

        // If the last instruction was a branch then we're in the
        // delay slot
        self.delay_slot = self.branch;
        self.branch = false;

        self.decode_and_execute(instruction);

        // ...
    }
}

Now we can simply modify (once again) all the branch and jump instructions
to set self.branch = true. In the next cycle run_next_instruction will copy
this variable to self.delay_slot.
Now that we keep track of delay slots we can modify our exception code to
handle them accurately:
impl Cpu {
    // ...

    /// Trigger an exception
    fn exception(&mut self, cause: Exception) {
        // ...

        // Update `CAUSE` register with the exception code (bits
        // [6:2])
        self.cause = (cause as u32) << 2;

        // Save current instruction address in `EPC`
        self.epc = self.current_pc;

        if self.delay_slot {
            // When an exception occurs in a delay slot `EPC` points
            // to the branch instruction and bit 31 of `CAUSE` is set.
            self.epc = self.epc.wrapping_sub(4);
            self.cause |= 1 << 31;
        }

        // ...
    }
}
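
A worked example helps to check the values that end up in the two registers. The sketch below assumes a hypothetical pure function, cause_and_epc, that takes the exception code, the address of the faulting instruction and the delay slot flag; it only reproduces the register arithmetic of the exception method:

```rust
/// Hypothetical helper returning the `(cause, epc)` pair for an exception
fn cause_and_epc(code: u32, current_pc: u32, delay_slot: bool) -> (u32, u32) {
    // The exception code goes in bits [6:2] of CAUSE
    let mut cause = code << 2;
    let mut epc = current_pc;

    if delay_slot {
        // Point EPC at the branch instruction and flag the
        // condition in bit 31 of CAUSE
        epc = epc.wrapping_sub(4);
        cause |= 1 << 31;
    }

    (cause, epc)
}
```

For a SYSCALL (code 0x8) at address 0x80001004 this gives CAUSE = 0x00000020 and EPC = 0x80001004 in the normal case, and CAUSE = 0x80000020 and EPC = 0x80001000 when the SYSCALL sits in a branch delay slot.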

With our exception handling infrastructure in place we can take the
opportunity to review some exception conditions we’ve ignored so far and
implement them accurately.

2.77 ADD and ADDI overflows

The ADD and ADDI opcodes generate an exception on signed overflow but
our current implementation is incomplete. We can use our exception method
to handle them in full:
impl Cpu {
    // ...

    /// Add and check for signed overflow
    fn op_add(&mut self, instruction: Instruction) {
        let s = instruction.s();
        let t = instruction.t();
        let d = instruction.d();

        let s = self.reg(s) as i32;
        let t = self.reg(t) as i32;

        match s.checked_add(t) {
            Some(v) => self.set_reg(d, v as u32),
            None => self.exception(Exception::Overflow),
        }
    }

    /// Add Immediate and check for signed overflow
    fn op_addi(&mut self, instruction: Instruction) {
        let i = instruction.imm_se() as i32;
        let t = instruction.t();
        let s = instruction.s();

        let s = self.reg(s) as i32;

        match s.checked_add(i) {
            Some(v) => self.set_reg(t, v as u32),
            None => self.exception(Exception::Overflow),
        }
    }
}

/// Exception types (as stored in the `CAUSE` register)
enum Exception {
    // ...

    /// Arithmetic overflow
    Overflow = 0xc,
}
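
To make the overflow condition concrete, here is a small sketch contrasting the checked addition with a wrapping one. Both add_checked and add_wrapping are hypothetical free functions, and returning None stands in for raising the overflow exception:

```rust
/// Checked signed addition: `None` stands for the overflow exception
fn add_checked(a: u32, b: u32) -> Option<u32> {
    (a as i32).checked_add(b as i32).map(|v| v as u32)
}

/// Unchecked addition: never traps, the result simply wraps around
fn add_wrapping(a: u32, b: u32) -> u32 {
    a.wrapping_add(b)
}
```

Adding 1 to 0x7fffffff overflows as a signed operation and yields None, while the wrapping version returns 0x80000000. Conversely 0xffffffff + 1 is just -1 + 1 when seen as signed values, so the checked version returns 0 without any overflow.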

2.78 Store and load alignment exceptions

When a load or store instruction targets a misaligned address (i.e. a word access
address is not a multiple of 4 or a halfword access address is not a multiple of 2)
the CPU is supposed to generate an exception:
impl Cpu {
    // ...

    /// Load Word
    fn op_lw(&mut self, instruction: Instruction) {
        // ...

        // Address must be 32bit aligned
        if addr % 4 == 0 {
            let v = self.load32(addr);

            // Put the load in the delay slot
            self.load = (t, v);
        } else {
            self.exception(Exception::LoadAddressError);
        }
    }

    /// Store Halfword
    fn op_sh(&mut self, instruction: Instruction) {
        // ...

        // Address must be 16bit aligned
        if addr % 2 == 0 {
            self.store16(addr, v as u16);
        } else {
            self.exception(Exception::StoreAddressError);
        }
    }

    /// Store Word
    fn op_sw(&mut self, instruction: Instruction) {
        // ...

        // Address must be 32bit aligned
        if addr % 4 == 0 {
            self.store32(addr, v);
        } else {
            self.exception(Exception::StoreAddressError);
        }
    }
}

/// Exception types (as stored in the `CAUSE` register)
enum Exception {
    // ...

    /// Address error on load
    LoadAddressError = 0x4,
    /// Address error on store
    StoreAddressError = 0x5,
}
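
The alignment test is the same for every access width, so it can be written once. This is only a sketch: is_aligned is a hypothetical helper that the emulator code above does not use, and size is assumed to be the access width in bytes (1, 2 or 4):

```rust
/// Hypothetical helper: `size` is the access width in bytes (1, 2 or 4)
fn is_aligned(addr: u32, size: u32) -> bool {
    addr % size == 0
}
```

For instance 0x800dee26 is a valid halfword address but not a valid word address, and byte accesses are always aligned.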

2.79 PC alignment exception
We should also generate an exception if the PC address is not correctly aligned
when we attempt to fetch an instruction. This can happen if a JR or JALR
instruction jumped to an address that was not 32bit aligned[21]:
impl Cpu {
    // ...

    pub fn run_next_instruction(&mut self) {
        // Save the address of the current instruction to save in
        // `EPC` in case of an exception.
        self.current_pc = self.pc;

        if self.current_pc % 4 != 0 {
            // PC is not correctly aligned!
            self.exception(Exception::LoadAddressError);
            return;
        }

        // Fetch instruction at PC
        let instruction = Instruction(self.load32(self.pc));

        // ...
    }
}

2.80 RAM 16bit store

If the exceptions are implemented correctly our next unhandled condition should
be a SH targeting address 0x800dee24. This address is in the RAM so we just
need to add 16bit store support for it:
impl Interconnect {
    // ...

    /// Store 16bit halfword `val` into `addr`
    pub fn store16(&mut self, addr: u32, val: u16) {
        let abs_addr = map::mask_region(addr);

        if let Some(offset) = map::RAM.contains(abs_addr) {
            return self.ram.store16(offset, val);
        }

        // ...
    }
}

And then in our RAM implementation:

impl Ram {
    // ...

    /// Store the 16bit little endian halfword `val` into `offset`
    pub fn store16(&mut self, offset: u32, val: u16) {
        let offset = offset as usize;

        let b0 = val as u8;
        let b1 = (val >> 8) as u8;

        self.data[offset + 0] = b0;
        self.data[offset + 1] = b1;
    }
}

[21] It might be more efficient to add the test in the branch and jump
instructions capable of setting an invalid PC but I don’t really care about
performance at that point and that would make the code more complicated.

As always, make sure you get the endianness right.
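
One way to double check the byte order is to compare it with the helpers of the Rust standard library. The sketch below is an illustration, not the emulator's code: store16_le, load16_le and demo are hypothetical functions working on a plain byte slice instead of the Ram structure:

```rust
/// Same layout as a hand-written little endian store: low byte first
fn store16_le(data: &mut [u8], offset: usize, val: u16) {
    data[offset..offset + 2].copy_from_slice(&val.to_le_bytes());
}

/// Reads the halfword back, low byte first
fn load16_le(data: &[u8], offset: usize) -> u16 {
    u16::from_le_bytes([data[offset], data[offset + 1]])
}

/// Stores `val` at offset 2 of a small zeroed buffer and returns
/// the whole buffer so that the byte order can be inspected
fn demo(val: u16) -> [u8; 4] {
    let mut ram = [0u8; 4];

    store16_le(&mut ram, 2, val);

    ram
}
```

Storing 0x1234 this way leaves the bytes 0x34 then 0x12 in memory, which is the layout produced by the b0 and b1 assignments in the RAM store16 method.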

2.81 DMA registers

We then stumble upon an unhandled load from address 0x1f8010f0. Looking
at the memory map this is the “DMA control register”. DMA stands for Direct
Memory Access. This is a generic term which can mean different things on
different architectures but the concept is always the same: it’s used to move
data between a peripheral and RAM without directly involving the CPU.
For instance if a game wants to load a texture to the GPU memory it can
set up the DMA to do the copy instead of doing it from the CPU with a series
of LW/SW. This is generally faster since the DMA is usually more efficient for
moving data around and while it’s working the CPU can do more interesting
things[22].
Since we still have some work to do on the CPU let’s see if we can ignore the
DMA access for now:
impl Interconnect {
    // ...

    /// Load 32bit word at `addr`
    pub fn load32(&self, addr: u32) -> u32 {
        // ...

        if let Some(_) = map::DMA.contains(abs_addr) {
            println!("DMA read: {:08x}", abs_addr);
            return 0;
        }

        panic!("unhandled load32 at address {:08x}", addr);
    }
}

/// Direct Memory Access registers
pub const DMA: Range = Range(0x1f801080, 0x80);
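
As a reminder of how such a range is matched, here is a self-contained sketch. The shape of Range is an assumption made for the illustration (a start address and a length in bytes, with a contains method returning the offset inside the range); the guide's real definition lives in the memory map module:

```rust
/// Assumed shape of a memory map range: (start address, length in bytes)
pub struct Range(u32, u32);

impl Range {
    /// Returns the offset of `addr` inside the range, or `None` if
    /// the address falls outside of it
    pub fn contains(&self, addr: u32) -> Option<u32> {
        let start = self.0;
        let length = self.1;

        if addr >= start && addr < start + length {
            Some(addr - start)
        } else {
            None
        }
    }
}

/// Direct Memory Access registers
pub const DMA: Range = Range(0x1f801080, 0x80);
```

With this definition the DMA control register at 0x1f8010f0 matches the range at offset 0x70, while 0x1f801100 is the first address past its end.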

You’ll notice that I ignore all loads from any DMA register, not just the
control. Let’s hope we’ll be able to keep the smoke screen up for a little longer.
Soon after that we encounter a SW targeting the DMA control register with
the value 0x000b0000. This value configures the DMA SPU channel priority
and enables it. This probably means the BIOS is getting ready to play some
sound. Since we don’t care about the SPU or the DMA at that point let’s ignore
those writes as well:
impl Interconnect {
    // ...

    /// Store 32bit word `val` into `addr`
    pub fn store32(&mut self, addr: u32, val: u32) {
        // ...

        if let Some(_) = map::DMA.contains(abs_addr) {
            println!("DMA write: {:08x} {:08x}", abs_addr, val);
            return;
        }

        panic!("unhandled store32 into address {:08x}: {:08x}",
               addr, val);
    }
}

[22] Although on the Playstation the CPU is seriously gimped while the DMA is
running as we’ll see later.

Hopefully we should be able to ignore the DMA for a while and keep focusing
on the CPU.

2.82 LHU instruction

The next unhandled instruction is 0x961901ae which is “load halfword unsigned”
(LHU):
lhu $25, 430($16)

It’s the 16bit counterpart to LBU and it’s our first 16bit load instruction:
impl Cpu {
    // ...

    /// Load 16bit value from the memory
    fn load16(&self, addr: u32) -> u16 {
        self.inter.load16(addr)
    }

    /// Load Halfword Unsigned
    fn op_lhu(&mut self, instruction: Instruction) {
        let i = instruction.imm_se();
        let t = instruction.t();
        let s = instruction.s();

        let addr = self.reg(s).wrapping_add(i);

        // Address must be 16bit aligned
        if addr % 2 == 0 {
            let v = self.load16(addr);

            // Put the load in the delay slot
            self.load = (t, v as u32);
        } else {
            self.exception(Exception::LoadAddressError);
        }
    }
}

We need to implement load16 in the interconnect. The current instruction
attempts to load from 0x1f801dae which is the SPU status register. Let’s lie
once again and return 0 for SPU reads:
impl Interconnect {
    // ...

    /// Load 16bit halfword at `addr`
    pub fn load16(&self, addr: u32) -> u16 {
        let abs_addr = map::mask_region(addr);

        if let Some(_) = map::SPU.contains(abs_addr) {
            println!("Unhandled read from SPU register {:08x}",
                     abs_addr);
            return 0;
        }

        panic!("unhandled load16 at address {:08x}", addr);
    }
}

If we continue the emulation we stumble on another unhandled load16, this
time at address 0x800dee24. This one is easy, it’s RAM:
impl Interconnect {
    // ...

    /// Load 16bit halfword at `addr`
    pub fn load16(&self, addr: u32) -> u16 {
        // ...

        if let Some(offset) = map::RAM.contains(abs_addr) {
            return self.ram.load16(offset);
        }

        panic!("unhandled load16 at address {:08x}", addr);
    }
}

And in our RAM implementation:

impl Ram {
    // ...

    /// Fetch the 16bit little endian halfword at `offset`
    pub fn load16(&self, offset: u32) -> u16 {
        let offset = offset as usize;

        let b0 = self.data[offset + 0] as u16;
        let b1 = self.data[offset + 1] as u16;

        b0 | (b1 << 8)
    }
}

2.83 SLLV instruction

After that we encounter 0x0078c804 which is “shift left logical variable” (SLLV):
sllv $25, $24, $3

It’s like SLL except the shift amount is stored in a register instead of an
immediate value.
The implementation is quite simple but there’s something to consider: so far
the shift amount was always a 5bit immediate value but this time it’s a 32bit
register. What happens when the register value is greater than 31?

76
It’s also important to figure out because shifting out of range is undefined
in Rust (and in C) so we have to be careful not to introduce weird undefined
behavior in our emulator.
Shifting by more than 31 places would mean shifting the 32bit value completely
out of range. Intuitively you might say that it sets it to 0 (all significant bits get
shifted outside the register) but it turns out it’s not accurate.
In reality on the R3000 CPU the shift amount is always implicitly masked
with 0x1f to only keep the low 5 bits. It means that a shift amount of 32 behaves
like 0 (i.e. it’s a NOP) while 130 behaves like 2:
    impl Cpu {
        // ...

        /// Shift Left Logical Variable
        fn op_sllv(&mut self, instruction: Instruction) {
            let d = instruction.d();
            let s = instruction.s();
            let t = instruction.t();

            // Shift amount is truncated to 5 bits
            let v = self.reg(t) << (self.reg(s) & 0x1f);

            self.set_reg(d, v);
        }
    }
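
As a quick sanity check of the masking rule, here is a minimal standalone sketch. The free functions `sllv` and `sllv_wrapping` are hypothetical helpers written for illustration, they are not part of the emulator. The second one relies on the standard `u32::wrapping_shl`, which masks the shift amount to the width of the type and should therefore behave like the explicit `& 0x1f`:

```rust
/// Hypothetical helper mirroring what `op_sllv` computes: the shift amount
/// is masked to its low 5 bits before shifting.
fn sllv(value: u32, amount: u32) -> u32 {
    value << (amount & 0x1f)
}

/// Same operation using the standard library: `wrapping_shl` masks the
/// shift amount to the bit width of the type, so it never panics.
fn sllv_wrapping(value: u32, amount: u32) -> u32 {
    value.wrapping_shl(amount)
}
```

With these helpers a shift amount of 32 leaves the value untouched and a shift amount of 130 shifts by 2, as described above.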

2.84 LH instruction
We implemented LHU not long ago and now we meet 0x87a30018 which is “load
halfword” (LH):
    lh $3, 24($29)

It’s implemented like LHU but it sign-extends the 16bit value to fit the 32bit
target register:
    impl Cpu {
        // ...

        /// Load Halfword (signed)
        fn op_lh(&mut self, instruction: Instruction) {
            let i = instruction.imm_se();
            let t = instruction.t();
            let s = instruction.s();

            let addr = self.reg(s).wrapping_add(i);

            // Cast as i16 to force sign extension
            let v = self.load16(addr) as i16;

            // Put the load in the delay slot
            self.load = (t, v as u32);
        }
    }
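
The difference between LH and LHU boils down to the intermediate cast. A minimal sketch, assuming two hypothetical free functions that are not part of the emulator:

```rust
/// Hypothetical helper: sign-extend a halfword the way LH does.
fn sign_extend16(halfword: u16) -> u32 {
    // u16 -> i16 reinterprets the bits, i16 -> u32 replicates the sign bit
    (halfword as i16) as u32
}

/// Hypothetical helper: zero-extend a halfword the way LHU does.
fn zero_extend16(halfword: u16) -> u32 {
    halfword as u32
}
```

For the halfword 0x8000 the first function gives 0xffff8000 while the second gives 0x00008000. Values with bit 15 clear are identical in both cases.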

2.85 NOR instruction

After that we stumble upon 0x0040c827 which is “bitwise not or” (NOR):

    nor $25, $2, $zero

It simply computes a bitwise OR between two registers and then complements
the result before storing it in the destination register [23]:
    impl Cpu {
        // ...

        /// Bitwise Not Or
        fn op_nor(&mut self, instruction: Instruction) {
            let d = instruction.d();
            let s = instruction.s();
            let t = instruction.t();

            let v = !(self.reg(s) | self.reg(t));

            self.set_reg(d, v);
        }
    }

2.86 SRAV instruction

The next unhandled instruction is 0x00e84007 which encodes “shift right
arithmetic variable” (SRAV):

    srav $8, $8, $7

We’ve already implemented SRA and SLLV so this one shouldn’t give us any
trouble:
    impl Cpu {
        // ...

        /// Shift Right Arithmetic Variable
        fn op_srav(&mut self, instruction: Instruction) {
            let d = instruction.d();
            let s = instruction.s();
            let t = instruction.t();

            // Shift amount is truncated to 5 bits
            let v = (self.reg(t) as i32) >> (self.reg(s) & 0x1f);

            self.set_reg(d, v as u32);
        }
    }
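
To see what the `as i32` cast changes, here is a small sketch comparing the arithmetic shift with a plain logical shift on the same inputs. Both functions are hypothetical standalone helpers, not emulator code:

```rust
/// Hypothetical helper: arithmetic right shift with a masked amount, as in
/// SRAV. The sign bit is replicated into the vacated high bits.
fn shift_right_arithmetic(value: u32, amount: u32) -> u32 {
    ((value as i32) >> (amount & 0x1f)) as u32
}

/// Hypothetical helper: logical right shift with a masked amount. The
/// vacated high bits are filled with zeroes.
fn shift_right_logical(value: u32, amount: u32) -> u32 {
    value >> (amount & 0x1f)
}
```

Shifting 0x80000000 right by 4 gives 0xf8000000 with the arithmetic version and 0x08000000 with the logical one. When the sign bit is clear both give the same result.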

2.87 SRLV instruction

We finally encounter the last shift instruction: 0x01a52806 is “shift right logical
variable” (SRLV):
    srlv $5, $5, $13

It’s implemented like SRAV without sign extension (or like SRL with a
register holding the shift amount, if you prefer):

    impl Cpu {
        // ...

        /// Shift Right Logical Variable
        fn op_srlv(&mut self, instruction: Instruction) {
            let d = instruction.d();
            let s = instruction.s();
            let t = instruction.t();

            // Shift amount is truncated to 5 bits
            let v = self.reg(t) >> (self.reg(s) & 0x1f);

            self.set_reg(d, v);
        }
    }

[23] Note that in this context ! in Rust does the same thing as ~ in C: it’s the
bitwise NOT operator.

2.88 MULTU instruction

The next unhandled instruction is 0x01240019 which encodes “multiply unsigned”
(MULTU):
    multu $9, $4

It’s our first multiplication opcode. The CPU does the multiplication using
64bit arithmetic and stores the result across the HI and LO registers:
    impl Cpu {
        // ...

        /// Multiply Unsigned
        fn op_multu(&mut self, instruction: Instruction) {
            let s = instruction.s();
            let t = instruction.t();

            let a = self.reg(s) as u64;
            let b = self.reg(t) as u64;

            let v = a * b;

            self.hi = (v >> 32) as u32;
            self.lo = v as u32;
        }
    }
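
A worked example of the HI/LO split, as a minimal sketch. The function `multu` below is a hypothetical standalone helper that returns the pair instead of writing to CPU state:

```rust
/// Hypothetical helper: returns the (HI, LO) pair produced by an unsigned
/// 32bit x 32bit multiplication carried out on 64 bits.
fn multu(a: u32, b: u32) -> (u32, u32) {
    // A u64 can hold the product of any two u32 values without overflow
    let v = (a as u64) * (b as u64);

    ((v >> 32) as u32, v as u32)
}
```

Multiplying 0xffffffff by itself gives the 64bit value 0xfffffffe00000001, so HI receives 0xfffffffe and LO receives 1.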

The timings of the multiplication instructions are similar to the divisions:
they run in the background and only stall the CPU if it attempts to read the
LO or HI registers before it’s done. Since we don’t implement accurate CPU
timings I choose to ignore that for now.

2.89 GPU registers

Our next stop will be an unhandled LW at address 0x1f801814. This register
is GPUSTAT when read and GP1 when written. In other words GPUSTAT is
read only while GP1 is write only and they share the same address. Why not.
GPUSTAT contains a whole bunch of information about the GPU status.
Things like the display’s resolution and color depth, interlacing, DMA channel
status and more.
It seems we’re entering the display initialization code, we might soon be
pushing our first pixels to the screen! Boot logo, here we come.
Well, let’s not get ahead of ourselves, for now we have zero GPU emulation
code so we’re going to use the usual deception and have the BIOS read zeroes
when it attempts to access the GPU register space. That’s easy, there are only
two registers in the GPU [24]:
    impl Interconnect {
        // ...

        /// Load 32bit word at `addr`
        pub fn load32(&self, addr: u32) -> u32 {
            // ...

            if let Some(offset) = map::GPU.contains(abs_addr) {
                println!("GPU read {}", offset);
                return 0;
            }

            panic!("unhandled load32 at address {:08x}", addr);
        }
    }

2.89.1 GP0: Draw Mode Setting command

Very soon after that we get an unhandled write32 at address 0x1f801810. This
is the other GPU register address. This one is GP0 for writing and it’s used to
queue commands.
We’ll study the GPU more closely soon but for now it suffices to say that it’s
programmed differently from the other peripherals we’ve seen so far: instead of
having dedicated registers for the various functions, the CPU (or DMA) queues
commands in one of the two ports (GP0 and GP1) which behave like FIFOs.
The GPU then executes the commands one after another.
Commands include drawing triangles, lines and sprites with various attributes
but also things like interrupt management and display configuration.
In order to interpret a GPU command we must first see to which port it
was posted (GP0 in this case). Then we must look at the value: 0xe1001000
here. The high byte (0xe1) is the “opcode”, the remaining 24bits are parameters
whose meaning depends on the command.
This particular opcode is “Draw Mode Setting”. It mostly sets a bunch of
texture-related parameters. In this particular instance only bit 12 is set which
activates “Textured Rectangle X-Flip”. It’s not exactly obvious why the BIOS is
doing this right now but I guess we’ll figure that out soon.
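
The opcode/parameter split described above can be sketched in a few lines. `split_gp0_command` is a hypothetical helper name used only for this illustration, the real GPU code will come later:

```rust
/// Hypothetical helper: split a GP0 command word into its opcode (the high
/// byte) and its 24 bits of parameters.
fn split_gp0_command(word: u32) -> (u8, u32) {
    let opcode = (word >> 24) as u8;
    let params = word & 0x00ff_ffff;

    (opcode, params)
}
```

Applied to 0xe1001000 this gives opcode 0xe1 and parameters 0x001000, which is just bit 12.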
For now we’re still working on our CPU, so let’s just ignore writes to the
GPU ports and hope we can get away with it:
    impl Interconnect {
        // ...

        /// Store 32bit word `val` into `addr`
        pub fn store32(&mut self, addr: u32, val: u32) {
            // ...

            if let Some(offset) = map::GPU.contains(abs_addr) {
                println!("GPU write {}: {:08x}", offset, val);
                return;
            }

            panic!("unhandled store32 into address {:08x}: {:08x}",
                   addr, val);
        }
    }

[24] Well, four really: two read only and two write only sharing the same
addresses.

2.90 Interrupt Control 16bit access

Unfortunately we don’t go very far, the BIOS then wants to make a 16bit read
at the Interrupt Mask address. So far we’ve only implemented 32bit access so
let’s add halfword support:
    impl Interconnect {
        // ...

        /// Load 16bit halfword at `addr`
        pub fn load16(&self, addr: u32) -> u16 {
            // ...

            if let Some(offset) = map::IRQ_CONTROL.contains(abs_addr) {
                println!("IRQ control read {:x}", offset);
                return 0;
            }

            // ...
        }
    }

Unsurprisingly it’s followed by a 16 bit write to the same address with the
value 1. This means that the BIOS wants to use the first interrupt which is the
vertical blanking interrupt generated by the GPU’s video output. As usual let’s
ignore that:
    impl Interconnect {
        // ...

        /// Store 16bit halfword `val` into `addr`
        pub fn store16(&mut self, addr: u32, val: u16) {
            // ...

            if let Some(offset) = map::IRQ_CONTROL.contains(abs_addr) {
                println!("IRQ control write {:x}, {:04x}", offset, val);
                return;
            }
        }
    }

2.91 Timer registers 32bit access

After that we get an unhandled 32bit access to the timers range.
This time the BIOS wants to store 0xffffffff at 0x1f801118 which is the
counter target value for timer 1. When the counter reaches that value it goes
back to 0 and optionally generates an interrupt. The counter is only 16bit wide
though so this write would actually set the target value to 0xffff and the upper
16bits are ignored.
Let’s add our usual placeholder code:

    impl Interconnect {
        // ...

        /// Store 32bit word `val` into `addr`
        pub fn store32(&mut self, addr: u32, val: u32) {
            // ...

            if let Some(offset) = map::TIMERS.contains(abs_addr) {
                println!("Unhandled write to timer register {:x}: {:08x}",
                         offset, val);
                return;
            }
        }
    }

After that the BIOS writes 0x148 to 0x1f801114 which sets the timer 1 mode.
Bit 0x8 clears the counter (resets it to 0), bit 0x40 sets the timer interrupt to
repeat mode which means that it will fire periodically when the counter reaches
the target. Finally bit 0x100 sets the clock source as “horizontal blanking”.
It means that the timer increments when the display reaches the horizontal
blanking period.
This doesn’t set bit 0x10 however which would actually enable the interrupt.
And it hasn’t attempted to unmask the interrupt in the Interrupt Mask register
either anyway. Not sure where the BIOS is going with this.
After that the BIOS tries to change the value of the Interrupt Mask and
enables interrupt 0x8 which is the DMA’s.
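
To make the two timer writes easier to follow, here is a small sketch that decodes them. The constant and function names are assumptions made up for this example, and the bit meanings are simply the ones given in the text above:

```rust
// Bit masks for the timer mode register, named after the description in
// the text. The names are illustrative, not taken from the emulator.
const MODE_RESET_COUNTER: u16 = 0x8;
const MODE_IRQ_ENABLE: u16 = 0x10;
const MODE_IRQ_REPEAT: u16 = 0x40;
const MODE_CLOCK_HBLANK: u16 = 0x100;

/// Hypothetical helper: true if every bit of `mask` is set in `mode`.
fn has_bits(mode: u16, mask: u16) -> bool {
    (mode & mask) == mask
}

/// Hypothetical helper: the target register is only 16 bits wide, so the
/// upper half of a 32bit write is dropped.
fn timer_target(written: u32) -> u16 {
    written as u16
}
```

With the value 0x148 the reset, repeat and horizontal blanking bits are set while the interrupt enable bit is not, and a target write of 0xffffffff ends up as 0xffff.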

2.92 GPUSTAT “DMA ready” field

At this point the BIOS enters an infinite loop: it reads the GPUSTAT register
again and again. Obviously it’s waiting for something to happen but since we
only ever return 0 it deadlocks.
If we disassemble that loop the code looks like this (it’s in the BIOS at
address 0xbfc04190):
    lw   $8, 0($6)     /* Here $6 is equal to 0x1f801814 (GPUSTAT) */
    nop                /* load delay slot */
    and  $9, $8, $4    /* Here $4 contains 0x10000000 */
    beq  $9, $0, -44   /* Loop back if $9 is zero */

There are more things in the loop but that’s the important part. We can see
that the BIOS loads GPUSTAT, masks bit 28 and loops if it’s 0.
If we look at the specs we can see that bit 28 of GPUSTAT tells if the GPU
is ready to receive a DMA block. So it seems that the BIOS is polling this bit in
GPUSTAT because it’s about to initiate a DMA transfer between the RAM and
the GPU.
Let’s modify our GPUSTAT handling code to return 0x10000000 when read:
    impl Interconnect {
        // ...

        /// Load 32bit word at `addr`
        pub fn load32(&self, addr: u32) -> u32 {
            // ...

            if let Some(offset) = map::GPU.contains(abs_addr) {
                println!("GPU read {}", offset);
                return match offset {
                    // GPUSTAT: set bit 28 to signal that the GPU is ready
                    // to receive DMA blocks
                    4 => 0x10000000,
                    _ => 0,
                };
            }

            // ...
        }
    }

This lets the BIOS continue the execution a little further.

2.93 XOR instruction

We then encounter an unhandled instruction: 0x0303c826 which encodes an
“exclusive or” (XOR):
    xor $25, $24, $3

We can implement it by copying the OR method and replacing the | operator
with ^:
    impl Cpu {
        // ...

        /// Bitwise Exclusive Or
        fn op_xor(&mut self, instruction: Instruction) {
            let d = instruction.d();
            let s = instruction.s();
            let t = instruction.t();

            let v = self.reg(s) ^ self.reg(t);

            self.set_reg(d, v);
        }
    }

With this instruction implemented the BIOS then goes on to write a bunch of
DMA registers and then gets stuck in another infinite loop, polling GPUSTAT
once again.
We could look at what the BIOS is doing once again to try and figure out
the right value to return to let it continue but that would be a bit pointless at
this point. We’ve implemented almost all the CPU instructions anyway and
we’ve reached the part of the BIOS where the bootup logo is drawn. We need to
implement the DMA to send the commands to the GPU and then emulate the
GPU itself to accept those commands and draw on the screen.
Before we move on though let’s implement the handful of CPU opcodes we
haven’t yet encountered. At this point we’ve implemented 48 opcodes and 19
remain. Fortunately most of those are variations of instructions we’ve
already implemented so let’s get this over with.

2.94 BREAK instructions

BREAK triggers an exception like SYSCALL but it sets code 9 in the CAUSE
register. This instruction is generally meant to create software breakpoints in
code for debugging purposes but I imagine some games might abuse it for other
purposes.
This instruction is encoded by setting bits [31:26] of the instruction to zero
and bits [5:0] to 0xd.
    impl Cpu {
        // ...

        /// Break
        fn op_break(&mut self, _: Instruction) {
            self.exception(Exception::Break);
        }
    }

    /// Exception types (as stored in the `CAUSE` register)
    enum Exception {
        // ...

        /// Breakpoint (caused by the BREAK opcode)
        Break = 0x9,
    }
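
Since the remaining opcodes are all introduced by their encoding, here is a minimal sketch of how the two fields are extracted from a raw instruction word. The free functions below are assumptions that mirror the `function` and `subfunction` accessors used by the decoder, they operate on a plain `u32` to stay self-contained:

```rust
/// Hypothetical helper: bits [31:26] of the instruction, the primary
/// opcode.
fn function(instruction: u32) -> u32 {
    instruction >> 26
}

/// Hypothetical helper: bits [5:0] of the instruction, only meaningful
/// when bits [31:26] are zero.
fn subfunction(instruction: u32) -> u32 {
    instruction & 0x3f
}
```

For instance a bare BREAK word, 0x0000000d, yields 0 and 0xd, and the MULTU we met earlier (0x01240019) yields 0 and 0x19.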

2.95 MULT instruction

“Multiply” (MULT) is simply the signed counterpart to MULTU. It multiplies its
operands using 64bit signed arithmetic and stores the result in HI and
LO.
This instruction is encoded by setting bits [31:26] of the instruction to zero
and bits [5:0] to 0x18.
    impl Cpu {
        // ...

        /// Multiply (signed)
        fn op_mult(&mut self, instruction: Instruction) {
            let s = instruction.s();
            let t = instruction.t();

            let a = (self.reg(s) as i32) as i64;
            let b = (self.reg(t) as i32) as i64;

            let v = (a * b) as u64;

            self.hi = (v >> 32) as u32;
            self.lo = v as u32;
        }
    }

All those casts are a bit ugly but they’re necessary to get the proper sign
extension.
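
A short worked example shows why the sign extension matters. The function `mult` below is a hypothetical standalone helper returning the (HI, LO) pair, it is not the emulator method:

```rust
/// Hypothetical helper: returns the (HI, LO) pair produced by a signed
/// 32bit x 32bit multiplication carried out on 64 bits.
fn mult(a: u32, b: u32) -> (u32, u32) {
    // u32 -> i32 reinterprets the bits, i32 -> i64 sign-extends
    let a = (a as i32) as i64;
    let b = (b as i32) as i64;

    let v = (a * b) as u64;

    ((v >> 32) as u32, v as u32)
}
```

Multiplying 0xffffffff (that is -1) by 2 gives -2, so HI is 0xffffffff and LO is 0xfffffffe. The unsigned multiplication of the same bit patterns would leave 1 in HI instead.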

2.96 SUB instruction

“Subtract” (SUB) is like SUBU but with signed arithmetic and it triggers an
exception on signed overflow.
This instruction is encoded by setting bits [31:26] of the instruction to zero
and bits [5:0] to 0x22.

    impl Cpu {
        // ...

        /// Subtract and check for signed overflow
        fn op_sub(&mut self, instruction: Instruction) {
            let s = instruction.s();
            let t = instruction.t();
            let d = instruction.d();

            let s = self.reg(s) as i32;
            let t = self.reg(t) as i32;

            match s.checked_sub(t) {
                Some(v) => self.set_reg(d, v as u32),
                None    => self.exception(Exception::Overflow),
            }
        }
    }
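
To illustrate when `checked_sub` reports an overflow, here is a minimal sketch with two hypothetical free functions. `None` stands in for the case where the CPU would raise the overflow exception:

```rust
/// Hypothetical helper: signed subtraction as done by SUB. Returns `None`
/// on signed overflow.
fn sub(s: u32, t: u32) -> Option<u32> {
    (s as i32).checked_sub(t as i32).map(|v| v as u32)
}

/// Hypothetical helper: SUBU counterpart, which wraps silently.
fn subu(s: u32, t: u32) -> u32 {
    s.wrapping_sub(t)
}
```

Subtracting 1 from 0x80000000 (the smallest signed value) overflows with `sub` but quietly wraps to 0x7fffffff with `subu`.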

2.97 XORI instruction

“Exclusive or immediate” (XORI) is the version of the XOR instruction taking
an immediate operand. We can implement it by taking the code for ORI and
changing the operator.
This instruction is encoded by setting bits [31:26] of the instruction to 0xe.
    impl Cpu {
        // ...

        /// Bitwise eXclusive Or Immediate
        fn op_xori(&mut self, instruction: Instruction) {
            let i = instruction.imm();
            let t = instruction.t();
            let s = instruction.s();

            let v = self.reg(s) ^ i;

            self.set_reg(t, v);
        }
    }

2.98 Cop1, cop2 and cop3 opcodes

We’ve implemented cop0 instructions (MTC0, RFE etc.). The three other
coprocessors can also have dedicated opcodes. On the Playstation however cop1
and cop3 are not used so any instruction targeting them will trigger an exception
with code 0xb to signal a coprocessor error.
Cop1 and cop3 opcodes are encoded by setting bits [31:26] of the instruction
to 0x11 and 0x13 respectively.
    impl Cpu {
        // ...

        /// Coprocessor 1 opcode (does not exist on the Playstation)
        fn op_cop1(&mut self, _: Instruction) {
            self.exception(Exception::CoprocessorError);
        }

        /// Coprocessor 3 opcode (does not exist on the Playstation)
        fn op_cop3(&mut self, _: Instruction) {
            self.exception(Exception::CoprocessorError);
        }
    }

    /// Exception types (as stored in the `CAUSE` register)
    enum Exception {
        // ...

        /// Unsupported coprocessor operation
        CoprocessorError = 0xb,
    }

Cop2 however is implemented on the Playstation: it’s the Geometry Transform
Engine (GTE). We don’t need to implement the GTE for now so let’s just
add a dummy implementation that will crash the emulator if a GTE instruction
is encountered.
Cop2 opcodes are encoded by setting bits [31:26] of the instruction to 0x12.
    impl Cpu {
        // ...

        /// Coprocessor 2 opcode (GTE)
        fn op_cop2(&mut self, instruction: Instruction) {
            panic!("unhandled GTE instruction: {}", instruction);
        }
    }

2.99 Non-aligned reads

So far we’ve seen that all CPU memory transactions had to be properly aligned
or they would trigger an exception. The MIPS instruction set does however have
limited support for unaligned access. For unaligned reads it provides “load word
left” (LWL) and “load word right” (LWR).
Both of those instructions work by fetching the aligned word containing the
addressed byte and then shifting the value to only update the correct portion of
the target register.
Therefore in order to load a single unaligned word you need to run both a
LWL and a LWR in sequence (the order doesn’t matter) to fetch the 32bits.
The behaviour of both these instructions changes depending on whether the
CPU is running in big or little-endian mode. Since the PSX runs exclusively in
little endian we can ignore the other case.
For a little endian architecture and assuming $2 contains the potentially
unaligned load address the sequence would look like this:
    /* Load right part of potentially unaligned word at $2 */
    lwr $1, 0($2)
    /* Load left part of potentially unaligned word at $2 */
    lwl $1, 3($2)

After this sequence $1 contains the 4byte little endian value at the address
stored in $2 regardless of its alignment.
You can see that the LWL instruction is given an offset of 3. If the address
was correctly aligned we remain within the same aligned 32bit word, otherwise
we’ve moved to the next one.

Okay, that might sound a bit complicated, hopefully everything will be clearer
when we see the code of the implementation.
Before that however it’s important to note a specificity of these unaligned
word instructions: you’ll notice that in my asm snippet above I run the two
instructions back-to-back without delay. That’s because those instructions can
merge their data with that of a pending load without having to wait for the load
to finish.
For other load instructions it wouldn’t make a lot of sense (why would you
want to load twice to the same target register without doing anything with the
first value?) but since LWL and LWR are meant to be used together to load a
single value it makes sense to spare a cycle there [25].

2.99.1 LWL instruction

The “load word left” (LWL) opcode is encoded by setting bits [31:26] of the
instruction to 0x22.
    impl Cpu {
        // ...

        /// Load Word Left (little-endian only implementation)
        fn op_lwl(&mut self, instruction: Instruction) {
            let i = instruction.imm_se();
            let t = instruction.t();
            let s = instruction.s();

            let addr = self.reg(s).wrapping_add(i);

            // This instruction bypasses the load delay restriction: this
            // instruction will merge the new contents with the value
            // currently being loaded if need be.
            let cur_v = self.out_regs[t.0 as usize];

            // Next we load the *aligned* word containing the first
            // addressed byte
            let aligned_addr = addr & !3;
            let aligned_word = self.load32(aligned_addr);

            // Depending on the address alignment we fetch the 1, 2, 3 or
            // 4 *most* significant bytes and put them in the target
            // register.
            let v = match addr & 3 {
                0 => (cur_v & 0x00ffffff) | (aligned_word << 24),
                1 => (cur_v & 0x0000ffff) | (aligned_word << 16),
                2 => (cur_v & 0x000000ff) | (aligned_word << 8),
                3 => (cur_v & 0x00000000) | (aligned_word << 0),
                _ => unreachable!(),
            };

            // Put the load in the delay slot
            self.load = (t, v);
        }
    }

[25] Interesting bit of trivia: apparently the LWL and LWR instructions were
patented. The patent expired in 2006 and some people claimed that it might also
have covered software implementations. If that’s true it means one could not
have distributed our emulator without a license from MIPS Computer Systems.

Hopefully the comments are clear enough to follow what the code is doing.
You can see that LWL updates one, two, three or all four bytes in the target
register depending on the address alignment.
Note the direct reference to self.out_regs instead of our usual helper to
make sure we ignore the load delay when the two instructions are used in
sequence.

2.99.2 LWR instruction

The “load word right” (LWR) opcode is encoded by setting bits [31:26] of the
instruction to 0x26. The implementation is very similar to LWL with a few key
changes:
    impl Cpu {
        // ...

        /// Load Word Right (little-endian only implementation)
        fn op_lwr(&mut self, instruction: Instruction) {
            let i = instruction.imm_se();
            let t = instruction.t();
            let s = instruction.s();

            let addr = self.reg(s).wrapping_add(i);

            // Like LWL, bypass the load delay restriction and merge with
            // the value currently being loaded if need be.
            let cur_v = self.out_regs[t.0 as usize];

            // Load the *aligned* word containing the first addressed byte
            let aligned_addr = addr & !3;
            let aligned_word = self.load32(aligned_addr);

            // Depending on the address alignment we fetch the 1, 2, 3 or
            // 4 *least* significant bytes and put them in the target
            // register.
            let v = match addr & 3 {
                0 => (cur_v & 0x00000000) | (aligned_word >> 0),
                1 => (cur_v & 0xff000000) | (aligned_word >> 8),
                2 => (cur_v & 0xffff0000) | (aligned_word >> 16),
                3 => (cur_v & 0xffffff00) | (aligned_word >> 24),
                _ => unreachable!(),
            };

            // Put the load in the delay slot
            self.load = (t, v);
        }
    }

You can see that like LWL we update from one to four bytes depending on
the alignment, however this time it’s the least significant bytes.
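
To check that the two halves really combine into the expected value, here is a standalone sketch that runs the LWR/LWL pair against a small byte array. Everything in it is hypothetical scaffolding for illustration: `lwl` and `lwr` are the merge steps above rewritten as pure functions, and `load32` reads from a slice instead of the interconnect.

```rust
/// Merge step of LWL (little-endian), as a pure function.
fn lwl(cur_v: u32, aligned_word: u32, addr: u32) -> u32 {
    match addr & 3 {
        0 => (cur_v & 0x00ff_ffff) | (aligned_word << 24),
        1 => (cur_v & 0x0000_ffff) | (aligned_word << 16),
        2 => (cur_v & 0x0000_00ff) | (aligned_word << 8),
        _ => aligned_word,
    }
}

/// Merge step of LWR (little-endian), as a pure function.
fn lwr(cur_v: u32, aligned_word: u32, addr: u32) -> u32 {
    match addr & 3 {
        0 => aligned_word,
        1 => (cur_v & 0xff00_0000) | (aligned_word >> 8),
        2 => (cur_v & 0xffff_0000) | (aligned_word >> 16),
        _ => (cur_v & 0xffff_ff00) | (aligned_word >> 24),
    }
}

/// Read the aligned little-endian word containing `addr` from a slice.
fn load32(mem: &[u8], addr: u32) -> u32 {
    let base = (addr & !3) as usize;

    u32::from_le_bytes([mem[base], mem[base + 1], mem[base + 2], mem[base + 3]])
}

/// Equivalent of `lwr $1, 0($2)` followed by `lwl $1, 3($2)`.
fn load_unaligned(mem: &[u8], addr: u32) -> u32 {
    let v = lwr(0, load32(mem, addr), addr);

    lwl(v, load32(mem, addr + 3), addr + 3)
}
```

With the bytes 00 11 22 33 44 55 66 77 in memory, loading from address 1 yields 0x44332211: LWR contributes the three low bytes from the first word and LWL the top byte from the second one.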

2.100 Non-aligned writes
Naturally the MIPS instruction set doesn’t only support loading non-aligned
words, it can also store them using “store word left” (SWL) and “store word
right” (SWR).
The concept is the same: to store a 32bit integer at an unaligned address one
would call SWR and SWL in sequence to update the entire word.

2.100.1 SWL instruction

The “store word left” (SWL) opcode is encoded by setting bits [31:26] of the
instruction to 0x2a. Since we only update part of the aligned target word we
have to fetch its value before we can modify it and store it back again:
    impl Cpu {
        // ...

        /// Store Word Left (little-endian only implementation)
        fn op_swl(&mut self, instruction: Instruction) {
            let i = instruction.imm_se();
            let t = instruction.t();
            let s = instruction.s();

            let addr = self.reg(s).wrapping_add(i);
            let v    = self.reg(t);

            let aligned_addr = addr & !3;
            // Load the current value for the aligned word at the target
            // address
            let cur_mem = self.load32(aligned_addr);

            let mem = match addr & 3 {
                0 => (cur_mem & 0xffffff00) | (v >> 24),
                1 => (cur_mem & 0xffff0000) | (v >> 16),
                2 => (cur_mem & 0xff000000) | (v >> 8),
                3 => (cur_mem & 0x00000000) | (v >> 0),
                _ => unreachable!(),
            };

            self.store32(aligned_addr, mem);
        }
    }

2.100.2 SWR instruction

The “store word right” (SWR) opcode is encoded by setting bits [31:26] of the
instruction to 0x2e. It’s very similar to SWL except for a few key differences:
    impl Cpu {
        // ...

        /// Store Word Right (little-endian only implementation)
        fn op_swr(&mut self, instruction: Instruction) {
            let i = instruction.imm_se();
            let t = instruction.t();
            let s = instruction.s();

            let addr = self.reg(s).wrapping_add(i);
            let v    = self.reg(t);

            let aligned_addr = addr & !3;
            // Load the current value for the aligned word at the target
            // address
            let cur_mem = self.load32(aligned_addr);

            let mem = match addr & 3 {
                0 => (cur_mem & 0x00000000) | (v << 0),
                1 => (cur_mem & 0x000000ff) | (v << 8),
                2 => (cur_mem & 0x0000ffff) | (v << 16),
                3 => (cur_mem & 0x00ffffff) | (v << 24),
                _ => unreachable!(),
            };

            self.store32(aligned_addr, mem);
        }
    }
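
The store pair can be exercised the same way as the loads. The sketch below is hypothetical test scaffolding, not emulator code: `swl` and `swr` are the merge steps rewritten as pure functions and `update_word` performs the read-modify-write on a byte slice.

```rust
/// Merge step of SWL (little-endian), as a pure function.
fn swl(cur_mem: u32, v: u32, addr: u32) -> u32 {
    match addr & 3 {
        0 => (cur_mem & 0xffff_ff00) | (v >> 24),
        1 => (cur_mem & 0xffff_0000) | (v >> 16),
        2 => (cur_mem & 0xff00_0000) | (v >> 8),
        _ => v,
    }
}

/// Merge step of SWR (little-endian), as a pure function.
fn swr(cur_mem: u32, v: u32, addr: u32) -> u32 {
    match addr & 3 {
        0 => v,
        1 => (cur_mem & 0x0000_00ff) | (v << 8),
        2 => (cur_mem & 0x0000_ffff) | (v << 16),
        _ => (cur_mem & 0x00ff_ffff) | (v << 24),
    }
}

/// Read the aligned word containing `addr`, merge `v` into it with `merge`
/// and write it back.
fn update_word(mem: &mut [u8], addr: u32, merge: fn(u32, u32, u32) -> u32, v: u32) {
    let base = (addr & !3) as usize;
    let cur = u32::from_le_bytes([mem[base], mem[base + 1], mem[base + 2], mem[base + 3]]);
    let new = merge(cur, v, addr);

    mem[base..base + 4].copy_from_slice(&new.to_le_bytes());
}

/// Equivalent of SWR at `addr` followed by SWL at `addr + 3`.
fn store_unaligned(mem: &mut [u8], addr: u32, v: u32) {
    update_word(mem, addr, swr, v);
    update_word(mem, addr + 3, swl, v);
}
```

Storing 0xaabbccdd at address 1 of a zeroed buffer leaves the bytes 00 dd cc bb aa 00 00 00: SWR fills the three upper bytes of the first word and SWL the lowest byte of the second one, without touching their neighbours.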

2.101 Coprocessor loads and stores

We’ve seen that MTC0 and MFC0 can be used to move data between the general
purpose registers and the coprocessor 0. That means that if you want to load or
store a cop0 register value from or to the memory you have to pass through the
CPU general purpose registers.
The coprocessor 2 (the GTE) supports an additional, more optimized way to
do this: “load word to coprocessor 2” (LWC2) and “store word from coprocessor
2” (SWC2). Those instructions respectively load and store a cop2 register directly
from and to memory.
Since the other coprocessors don’t support these opcodes they generate a
“coprocessor error” exception when they’re encountered.

2.101.1 LWCn instructions

“Load word coprocessor n” (LWCn) opcodes are encoded by setting bits [31:26]
of the instruction to 0x30 + n.
    impl Cpu {
        // ...

        /// Load Word in Coprocessor 0
        fn op_lwc0(&mut self, _: Instruction) {
            // Not supported by this coprocessor
            self.exception(Exception::CoprocessorError);
        }

        /// Load Word in Coprocessor 1
        fn op_lwc1(&mut self, _: Instruction) {
            // Not supported by this coprocessor
            self.exception(Exception::CoprocessorError);
        }

        /// Load Word in Coprocessor 2
        fn op_lwc2(&mut self, instruction: Instruction) {
            panic!("unhandled GTE LWC: {}", instruction);
        }

        /// Load Word in Coprocessor 3
        fn op_lwc3(&mut self, _: Instruction) {
            // Not supported by this coprocessor
            self.exception(Exception::CoprocessorError);
        }
    }

2.101.2 SWCn instructions

“Store word coprocessor n” (SWCn) opcodes are encoded by setting bits [31:26]
of the instruction to 0x38 + n.
    impl Cpu {
        // ...

        /// Store Word in Coprocessor 0
        fn op_swc0(&mut self, _: Instruction) {
            // Not supported by this coprocessor
            self.exception(Exception::CoprocessorError);
        }

        /// Store Word in Coprocessor 1
        fn op_swc1(&mut self, _: Instruction) {
            // Not supported by this coprocessor
            self.exception(Exception::CoprocessorError);
        }

        /// Store Word in Coprocessor 2
        fn op_swc2(&mut self, instruction: Instruction) {
            panic!("unhandled GTE SWC: {}", instruction);
        }

        /// Store Word in Coprocessor 3
        fn op_swc3(&mut self, _: Instruction) {
            // Not supported by this coprocessor
            self.exception(Exception::CoprocessorError);
        }
    }

2.102 Illegal instructions

We now have implemented (at least partially) all the CPU instructions! That
doesn’t mean that our CPU is complete: we still have to implement the GTE
coprocessor and the cache for instance but that will wait for later.
We can also take this opportunity to implement illegal instructions. For
instance instruction 0x50000000 doesn’t encode any valid instruction on the
Playstation CPU and is therefore illegal.
Illegal instructions simply trigger an exception on the CPU with the code
0xa in the CAUSE register.
Knowing that we can complete our decode_and_execute function, here’s
what it should look like with all instructions implemented:
    impl Cpu {
        // ...

        /// Decode `instruction`'s opcode and run the function
        fn decode_and_execute(&mut self, instruction: Instruction) {
            match instruction.function() {
                0b000000 => match instruction.subfunction() {
                    0b000000 => self.op_sll(instruction),
                    0b000010 => self.op_srl(instruction),
                    0b000011 => self.op_sra(instruction),
                    0b000100 => self.op_sllv(instruction),
                    0b000110 => self.op_srlv(instruction),
                    0b000111 => self.op_srav(instruction),
                    0b001000 => self.op_jr(instruction),
                    0b001001 => self.op_jalr(instruction),
                    0b001100 => self.op_syscall(instruction),
                    0b001101 => self.op_break(instruction),
                    0b010000 => self.op_mfhi(instruction),
                    0b010001 => self.op_mthi(instruction),
                    0b010010 => self.op_mflo(instruction),
                    0b010011 => self.op_mtlo(instruction),
                    0b011000 => self.op_mult(instruction),
                    0b011001 => self.op_multu(instruction),
                    0b011010 => self.op_div(instruction),
                    0b011011 => self.op_divu(instruction),
                    0b100000 => self.op_add(instruction),
                    0b100001 => self.op_addu(instruction),
                    0b100010 => self.op_sub(instruction),
                    0b100011 => self.op_subu(instruction),
                    0b100100 => self.op_and(instruction),
                    0b100101 => self.op_or(instruction),
                    0b100110 => self.op_xor(instruction),
                    0b100111 => self.op_nor(instruction),
                    0b101010 => self.op_slt(instruction),
                    0b101011 => self.op_sltu(instruction),
                    _        => self.op_illegal(instruction),
                },
                0b000001 => self.op_bxx(instruction),
                0b000010 => self.op_j(instruction),
                0b000011 => self.op_jal(instruction),
                0b000100 => self.op_beq(instruction),
                0b000101 => self.op_bne(instruction),
                0b000110 => self.op_blez(instruction),
                0b000111 => self.op_bgtz(instruction),
                0b001000 => self.op_addi(instruction),
                0b001001 => self.op_addiu(instruction),
                0b001010 => self.op_slti(instruction),
                0b001011 => self.op_sltiu(instruction),
                0b001100 => self.op_andi(instruction),
                0b001101 => self.op_ori(instruction),
                0b001110 => self.op_xori(instruction),
                0b001111 => self.op_lui(instruction),
                0b010000 => self.op_cop0(instruction),
                0b010001 => self.op_cop1(instruction),
                0b010010 => self.op_cop2(instruction),
                0b010011 => self.op_cop3(instruction),
                0b100000 => self.op_lb(instruction),
                0b100001 => self.op_lh(instruction),
                0b100010 => self.op_lwl(instruction),
                0b100011 => self.op_lw(instruction),
                0b100100 => self.op_lbu(instruction),
                0b100101 => self.op_lhu(instruction),
                0b100110 => self.op_lwr(instruction),
                0b101000 => self.op_sb(instruction),
                0b101001 => self.op_sh(instruction),
                0b101010 => self.op_swl(instruction),
                0b101011 => self.op_sw(instruction),
                0b101110 => self.op_swr(instruction),
                0b110000 => self.op_lwc0(instruction),
                0b110001 => self.op_lwc1(instruction),
                0b110010 => self.op_lwc2(instruction),
                0b110011 => self.op_lwc3(instruction),
                0b111000 => self.op_swc0(instruction),
                0b111001 => self.op_swc1(instruction),
                0b111010 => self.op_swc2(instruction),
                0b111011 => self.op_swc3(instruction),
                _        => self.op_illegal(instruction),
            }
        }

        /// Illegal instruction
f n o p i l l e g a l (&mut s e l f , i n s t r u c t i o n : I n s t r u c t i o n ) {
println ! ( ” I l l e g a l instruction {}! ” , instruction ) ;
s e l f . exception ( Exception : : I l l e g a l I n s t r u c t i o n ) ;
}
}

/// Exception types (as stored in the `CAUSE` register)
enum Exception {
    // ...

    /// CPU encountered an unknown instruction
    IllegalInstruction = 0xa,
}

That’s quite a milestone but it’s only the beginning. While implementing all
those instructions and stepping through the BIOS we’ve seen that it tries to use
many peripherals: the SPU, the timers, the DMA and the GPU in particular.
At this point my first objective is to display an image to the screen so I want
to start implementing the GPU as soon as possible. But we won’t be able to do
anything useful with the GPU without the DMA, so let’s start with that.

3 The DMA: Ordering tables and the GPU

The DMA is used to move data back and forth between the RAM and a peripheral
(GPU, CDROM, SPU, etc.). The CPU could achieve the same results with a
series of loads/stores but the DMA is generally much faster.
The Playstation DMA controller lives alongside the CPU and shares the
memory bus with it. It means that while the DMA is busy transferring data the
CPU is stopped: only one device can access the bus at a given time. The DMA
can only copy data between the RAM and a device, not directly between two
devices. For instance you can’t copy a texture from the CDROM directly into
the GPU with the DMA: you first have to make a transfer from the CDROM
into the main RAM and then a second one from the RAM to the GPU.
There are 7 DMA channels on the Playstation:

• Channel 0 is connected to the Media Decoder input

• Channel 1 is connected to the Media Decoder output

• Channel 2 is connected to the GPU

• Channel 3 is connected to the CDROM drive

• Channel 4 is connected to the SPU
• Channel 5 is connected to the extension port
• Channel 6 is only connected to the RAM and is used to clear an “ordering
table”

Implementing complete and accurate DMA support can be quite tricky. The
main problem is that in certain modes the DMA sporadically gives back the
control to the CPU. For instance while the GPU is busy processing a command
and won’t accept any new input the DMA has to wait. Instead of wasting time
it gives back control to the CPU to give it the opportunity to do something else.
In order to emulate this behaviour correctly we need to emulate the GPU
command FIFO, DMA timings and CPU timings correctly. Then we need to
set up the state machine to switch between the CPU and DMA when needed.
That would require quite some work to get right and we only have the BIOS
boot logo to test it at this point.
To avoid having to implement all that we’re going to make a simplifying
assumption for now: when the DMA runs it does all the transfer at once without
giving back control to the CPU. This won’t be exactly accurate but it should
suffice to run the BIOS and hopefully some games.
The reason I feel confident making this simplification is that PCSX-R seems to
do it that way and it can run quite a few games, although some comments hint
that it breaks with certain titles and it uses some hacks to improve compatibility.
Mednafen on the other hand implements a much more accurate DMA and actually
emulates the DMA giving back control to the CPU in certain situations.
We’ll probably want to do something similar later on.
For now let’s take a few steps back and revisit all the DMA register reads
and writes done by the BIOS so that we can emulate them correctly.

3.1 DMA Control register

If we look at the DMA register access in our emulator we can see that the
first one is a read at 0x1f8010f0 which is offset 0x70 in the DMA register
range. This register is the DMA Control register which sets the priority of each
channel and whether or not they’re enabled. We don’t really care about the port
priorities since we’ll be running each channel transaction entirely at once (so
we’ll never have two channels active at once for now) and I’m not entirely sure
what disabling a channel does (does it prevent accessing the channel’s register?
What happens if a game attempts to start a disabled channel?). For now we’ll
just implement a dummy register read/write access.
We’re going to wrap the DMA code in a dedicated struct to keep our code tidy.
The Nocash spec says that the reset value for the control register is 0x07654321
which means that all channels are disabled and the priority increases with the
channel number:
/// Direct Memory Access
pub struct Dma {
    /// DMA control register
    control: u32,
}

impl Dma {
    pub fn new() -> Dma {
        Dma {
            // Reset value taken from the Nocash PSX spec
            control: 0x07654321,
        }
    }

    /// Retrieve the value of the control register
    pub fn control(&self) -> u32 {
        self.control
    }
}

We can then add an instance of this struct Dma in our interconnect and
glue our new control method when the register is accessed:
/// Global interconnect
pub struct Interconnect {
    // ...

    /// DMA registers
    dma: Dma,
}

impl Interconnect {
    pub fn new(bios: Bios) -> Interconnect {
        Interconnect {
            // ...

            dma: Dma::new(),
        }
    }

    /// Load 32 bit word at `addr`
    pub fn load32(&self, addr: u32) -> u32 {
        // ...

        if let Some(offset) = map::DMA.contains(abs_addr) {
            return self.dma_reg(offset);
        }
    }

    /// DMA register read
    fn dma_reg(&self, offset: u32) -> u32 {
        match offset {
            0x70 => self.dma.control(),
            _ => panic!("unhandled DMA access")
        }
    }
}

The BIOS then writes back 0x076f4321 to the same register which means
that it enables channel 4 (the SPU) and sets its priority to 7. Let’s implement
write support for the control register:
impl Interconnect {
    // ...

    /// Store 32 bit word `val` into `addr`
    pub fn store32(&mut self, addr: u32, val: u32) {
        // ...

        if let Some(offset) = map::DMA.contains(abs_addr) {
            return self.set_dma_reg(offset, val);
        }
    }

    /// DMA register write
    fn set_dma_reg(&mut self, offset: u32, val: u32) {
        match offset {
            0x70 => self.dma.set_control(val),
            _ => panic!("unhandled DMA write access")
        }
    }
}

impl Dma {
    // ...

    /// Set the value of the control register
    pub fn set_control(&mut self, val: u32) {
        self.control = val
    }
}

Not very exciting so far.
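To make the two control values seen so far more concrete, here is a small standalone sketch that decodes them. It assumes the layout described by the Nocash spec, where each channel owns one 4-bit field of the register: the low three bits hold the priority and the top bit is the enable flag. The helper name `channel_config` is made up for this illustration and is not part of the emulator.

```rust
/// Hypothetical helper (not part of the emulator): decode the 4-bit
/// field of `channel` (0 to 6) in the DMA Control register.
/// Assumed layout: bits [2:0] of the field are the priority and
/// bit 3 is the channel enable. Returns `(enabled, priority)`.
fn channel_config(control: u32, channel: u32) -> (bool, u8) {
    assert!(channel < 7, "there are only 7 DMA channels");

    let field = (control >> (channel * 4)) & 0xf;

    let enabled = field & 0x8 != 0;
    let priority = (field & 0x7) as u8;

    (enabled, priority)
}

fn main() {
    // Reset value, then the value written back by the BIOS
    for &control in &[0x0765_4321u32, 0x076f_4321] {
        println!("control = 0x{:08x}", control);

        for channel in 0..7 {
            let (enabled, priority) = channel_config(control, channel);

            println!("  channel {}: enabled={} priority={}",
                     channel, enabled, priority);
        }
    }
}
```

With the reset value every channel reports as disabled, while the BIOS value only changes the field of channel 4.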

3.2 DMA Interrupt register

After that the BIOS writes 0 to the DMA register at offset 0x74. This one is
the DMA Interrupt register and as its name implies it is used to configure and
acknowledge the DMA interrupts.
Bits [22:16] enable the interrupt individually for each channel. Bit 23 is the
master enable: if it’s 0 then no interrupt is generated by any channel. Bit 15 on
the other hand forces the generation of an interrupt continuously when it’s set.
When a channel generates an interrupt it needs to be acknowledged to reset
it to an inactive status. This is done by writing 1 to bits [30:24] (one bit per
channel). Finally bits [5:0] are read/write but I don’t know what they do, we’ll
just preserve them and hope they’re not important.
While we’re at it we’ll also implement reading this register. When read, bits
[30:24] contain the IRQ status for each channel and bit 31 says if an interrupt is
currently active. The other fields retain the last value written to them. In code
it looks like this (footnote 26):
/// Direct Memory Access
pub struct Dma {
    // ...

    /// master IRQ enable
    irq_en: bool,
    /// IRQ enable for individual channels
    channel_irq_en: u8,
    /// IRQ flags for individual channels
    channel_irq_flags: u8,
    /// When set the interrupt is active unconditionally (even if
    /// `irq_en` is false)
    force_irq: bool,
    /// Bits [5:0] of the interrupt register are RW but I don't know
    /// what they're supposed to do so I just store them and send them
    /// back untouched on reads
    irq_dummy: u8,
}

26 You’ll notice that I split the register into individual variables. I prefer to do that when I know
I’ll have to manipulate the fields individually. It makes the code clearer and less error prone in
my experience. It has a small cost however: it takes up a little more memory and we have to
pack/unpack them when handling register reads/writes.

impl Dma {
    // ...

    /// Return the status of the DMA interrupt
    fn irq(&self) -> bool {
        let channel_irq = self.channel_irq_flags & self.channel_irq_en;

        self.force_irq || (self.irq_en && channel_irq != 0)
    }

    /// Retrieve the value of the interrupt register
    pub fn interrupt(&self) -> u32 {
        let mut r = 0;

        r |= self.irq_dummy as u32;
        r |= (self.force_irq as u32) << 15;
        r |= (self.channel_irq_en as u32) << 16;
        r |= (self.irq_en as u32) << 23;
        r |= (self.channel_irq_flags as u32) << 24;
        r |= (self.irq() as u32) << 31;

        r
    }

    /// Set the value of the interrupt register
    pub fn set_interrupt(&mut self, val: u32) {
        // Unknown what bits [5:0] do
        self.irq_dummy = (val & 0x3f) as u8;

        self.force_irq = (val >> 15) & 1 != 0;

        self.channel_irq_en = ((val >> 16) & 0x7f) as u8;

        self.irq_en = (val >> 23) & 1 != 0;

        // Writing 1 to a flag resets it. There is one flag per
        // channel so the acknowledge field is 7 bits wide
        let ack = ((val >> 24) & 0x7f) as u8;
        self.channel_irq_flags &= !ack;
    }
}

Then you’ll have to plug those accessor methods in the interconnect as
usual (footnote 27).
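The two less obvious behaviours of this register are the computed bit 31 and the "write 1 to acknowledge" flags. The following standalone sketch works on raw register values to show both. The function names `irq_active` and `acknowledge` are made up for this example, and the bit positions are simply the ones given in the description above.

```rust
/// Recompute bit 31 (interrupt active) from the other fields of a raw
/// interrupt register value, using the bit layout described above.
fn irq_active(reg: u32) -> bool {
    let force = (reg >> 15) & 1 != 0;
    let channel_en = (reg >> 16) & 0x7f;
    let master_en = (reg >> 23) & 1 != 0;
    let flags = (reg >> 24) & 0x7f;

    force || (master_en && (flags & channel_en) != 0)
}

/// Apply a register write to the current per-channel flags: every flag
/// whose bit is written as 1 gets cleared, the others are untouched.
fn acknowledge(flags: u8, written: u32) -> u8 {
    let ack = ((written >> 24) & 0x7f) as u8;

    flags & !ack
}

fn main() {
    // Flags pending for channel 2 (GPU) and channel 6 (OTC)
    let flags = 0b100_0100u8;

    // Acknowledge channel 2 only: write 1 to bit 24 + 2
    let flags = acknowledge(flags, 1 << 26);
    println!("flags after ack: {:07b}", flags);

    // Master enable + channel 2 enable + channel 2 flag
    let reg = (1 << 23) | (1 << 18) | (1 << 26);
    println!("interrupt active: {}", irq_active(reg));
}
```

Note that writing 0 to a flag bit leaves it alone, which is why a write of 0 to the whole register (what the BIOS does here) does not need any special handling for the flags.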

3.3 DMA Channel Control register

The next DMA access is at offset 0x28. This is the control register for channel 2
(the GPU). This register contains many important fields described in table 8.

Field bits   Description
0            Transfer direction: device-to-RAM (0) or RAM-to-device (1)
1            Address increment (0) or decrement (1) mode
8            Chopping mode
[10:9]       Synchronization type: Manual (0), Request (1) or Linked List (2)
[18:16]      Chopping DMA window
[22:20]      Chopping CPU window
24           Enable
28           Manual trigger
[30:29]      Unknown

Table 8: DMA Channel Control register description

27 From now on I’m not going to bother putting the glue code in the interconnect here when
it’s straightforward. If you’re having doubts you can look up the source code of the emulator
in the repository.

Bit 0 sets the transfer direction (RAM-to-device or device-to-RAM), bit 1
tells us whether the DMA must increment or decrement the address in RAM
during the transfer.
Bits [10:9] configure the type of synchronization: the DMA either copies all
the data at once (Manual sync) or it can wait for the device to raise a “ready”
flag to request more data or say that data is available when reading (Request
sync). There’s a third mode, Linked List sync, which is used with the GPU.
We’ll explain what it does when we look at ordering tables in a moment.
Bit 24 enables the channel and starts the transfer in Request or Linked List
sync mode. Bit 28 is the trigger to start the transfer in Manual sync mode.
Bit 8 enables “chopping”: when active the DMA will periodically stop to let
the CPU run for a while. Bits [18:16] and [22:20] respectively say how often
and for how long control must be given back to the CPU. At this point I’m
not entirely sure if chopping only works in Manual sync mode or all the time. It
doesn’t really matter since we won’t implement it for now.
Finally bits [30:29] are read/write but I don’t know what they do (footnote 28).
The current value of 0x401 sets the transfer direction to RAM-to-device
and the sync mode to Linked List. It doesn’t set bit 24 to enable the channel
however so nothing happens (footnote 29).
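As a quick sanity check of that reading of 0x401, here is a throwaway decoder using the bit positions given in the paragraphs above. The `ChannelControl` struct and the `decode` function are only assumptions made for this sketch; they are simpler than, and separate from, the `Channel` type built next.

```rust
/// Decoded view of a channel control value (illustration only; the
/// chopping window sizes and the unknown bits are left out).
#[derive(Debug, PartialEq)]
struct ChannelControl {
    /// Bit 0: true when the transfer goes from RAM to the device
    from_ram: bool,
    /// Bit 1: true when the address decrements during the transfer
    decrement: bool,
    /// Bit 8: chopping enabled
    chop: bool,
    /// Bits [10:9]: 0 = Manual, 1 = Request, 2 = Linked List
    sync: u32,
    /// Bit 24: channel enable
    enable: bool,
    /// Bit 28: manual trigger
    trigger: bool,
}

fn decode(val: u32) -> ChannelControl {
    ChannelControl {
        from_ram: val & 1 != 0,
        decrement: (val >> 1) & 1 != 0,
        chop: (val >> 8) & 1 != 0,
        sync: (val >> 9) & 3,
        enable: (val >> 24) & 1 != 0,
        trigger: (val >> 28) & 1 != 0,
    }
}

fn main() {
    // Value written by the BIOS to the GPU channel control register
    println!("{:?}", decode(0x401));
}
```

The decoded value has `from_ram` set, `sync` equal to 2 (Linked List) and `enable` cleared, which matches the description above.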
Since there are 7 DMA channels I’m going to factor all channel-related code
in a Channel structure:
/// Per-channel data
struct Channel {
    enable: bool,
    direction: Direction,
    step: Step,
    sync: Sync,
    /// Used to start the DMA transfer when `sync` is `Manual`
    trigger: bool,
    /// If true the DMA "chops" the transfer and lets the CPU run
    /// in the gaps.
    chop: bool,
    /// Chopping DMA window size (log2 number of words)
    chop_dma_sz: u8,
    /// Chopping CPU window size (log2 number of cycles)
    chop_cpu_sz: u8,
    /// Unknown 2 RW bits in configuration register
    dummy: u8,
}

28 The Nocash docs speculate that bit 29 might be used to pause an ongoing transfer but
that will require some more testing.

29 And it’s a good thing since at that point no start address has been set!

impl Channel {
    fn new() -> Channel {
        Channel {
            enable: false,
            direction: Direction::ToRam,
            step: Step::Increment,
            sync: Sync::Manual,
            trigger: false,
            chop: false,
            chop_dma_sz: 0,
            chop_cpu_sz: 0,
            dummy: 0,
        }
    }

    pub fn control(&self) -> u32 {
        let mut r = 0;

        // Pack the fields back at the positions used by `set_control`
        r |= (self.direction as u32) << 0;
        r |= (self.step as u32) << 1;
        r |= (self.chop as u32) << 8;
        r |= (self.sync as u32) << 9;
        r |= (self.chop_dma_sz as u32) << 16;
        r |= (self.chop_cpu_sz as u32) << 20;
        r |= (self.enable as u32) << 24;
        r |= (self.trigger as u32) << 28;
        r |= (self.dummy as u32) << 29;

        r
    }

    pub fn set_control(&mut self, val: u32) {
        self.direction = match val & 1 != 0 {
            true => Direction::FromRam,
            false => Direction::ToRam,
        };

        self.step = match (val >> 1) & 1 != 0 {
            true => Step::Decrement,
            false => Step::Increment,
        };

        self.chop = (val >> 8) & 1 != 0;

        self.sync = match (val >> 9) & 3 {
            0 => Sync::Manual,
            1 => Sync::Request,
            2 => Sync::LinkedList,
            n => panic!("Unknown DMA sync mode {}", n),
        };

        self.chop_dma_sz = ((val >> 16) & 7) as u8;
        self.chop_cpu_sz = ((val >> 20) & 7) as u8;

        self.enable = (val >> 24) & 1 != 0;
        self.trigger = (val >> 28) & 1 != 0;

        self.dummy = ((val >> 29) & 3) as u8;
    }
}

/// DMA transfer direction
#[derive(Clone, Copy)]
pub enum Direction {
    ToRam = 0,
    FromRam = 1,
}

/// DMA transfer step
#[derive(Clone, Copy)]
pub enum Step {
    Increment = 0,
    Decrement = 1,
}

/// DMA transfer synchronization mode
#[derive(Clone, Copy)]
pub enum Sync {
    /// Transfer starts when the CPU writes to the Trigger bit and
    /// transfers everything at once
    Manual = 0,
    /// Sync blocks to DMA requests
    Request = 1,
    /// Used to transfer GPU command lists
    LinkedList = 2,
}

We can then put an array of 7 Channel instances in our struct Dma with
some methods to access them in the interconnect:
/// Direct Memory Access
pub struct Dma {
    // ...

    /// The 7 channel instances
    channels: [Channel; 7],
}

impl Dma {
    // ...

    /// Return a reference to a channel by port number.
    pub fn channel(&self, port: Port) -> &Channel {
        &self.channels[port as usize]
    }

    /// Return a mutable reference to a channel by port number.
    pub fn channel_mut(&mut self, port: Port) -> &mut Channel {
        &mut self.channels[port as usize]
    }
}

/// The 7 DMA ports
#[derive(Clone, Copy)]
pub enum Port {
    /// Macroblock decoder input
    MdecIn = 0,
    /// Macroblock decoder output
    MdecOut = 1,
    /// Graphics Processing Unit
    Gpu = 2,
    /// CD-ROM drive
    CdRom = 3,
    /// Sound Processing Unit
    Spu = 4,
    /// Extension port
    Pio = 5,
    /// Used to clear the ordering table
    Otc = 6,
}

impl Port {
    pub fn from_index(index: u32) -> Port {
        match index {
            0 => Port::MdecIn,
            1 => Port::MdecOut,
            2 => Port::Gpu,
            3 => Port::CdRom,
            4 => Port::Spu,
            5 => Port::Pio,
            6 => Port::Otc,
            n => panic!("Invalid port {}", n),
        }
    }
}

That’s quite a lot of code to parse one register but it should make our life
easier later on.
Since the 7 channels have the same register layout we can rewrite our
Interconnect methods to be a little more generic:
impl Interconnect {
    // ...

    /// DMA register read
    fn dma_reg(&self, offset: u32) -> u32 {
        let major = (offset & 0x70) >> 4;
        let minor = offset & 0xf;

        match major {
            // Per-channel registers
            0..=6 => {
                let channel = self.dma.channel(Port::from_index(major));

                match minor {
                    8 => channel.control(),
                    _ => panic!("Unhandled DMA read at {:x}", offset)
                }
            },
            // Common DMA registers
            7 => match minor {
                0 => self.dma.control(),
                4 => self.dma.interrupt(),
                _ => panic!("Unhandled DMA read at {:x}", offset)
            },
            _ => panic!("Unhandled DMA read at {:x}", offset)
        }
    }

    /// DMA register write
    fn set_dma_reg(&mut self, offset: u32, val: u32) {
        let major = (offset & 0x70) >> 4;
        let minor = offset & 0xf;

        match major {
            // Per-channel registers
            0..=6 => {
                let port = Port::from_index(major);
                let channel = self.dma.channel_mut(port);

                match minor {
                    8 => channel.set_control(val),
                    _ => panic!("Unhandled DMA write {:x}: {:08x}",
                                offset, val)
                }
            },
            // Common DMA registers
            7 => {
                match minor {
                    0 => self.dma.set_control(val),
                    4 => self.dma.set_interrupt(val),
                    _ => panic!("Unhandled DMA write {:x}: {:08x}",
                                offset, val),
                }
            }
            _ => panic!("Unhandled DMA write {:x}: {:08x}",
                        offset, val),
        };
    }
}
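To see the major/minor split in isolation, the sketch below applies the same two masks to the register offsets encountered so far. The `DmaReg` enum and the `decode_offset` function are hypothetical names used only for this illustration.

```rust
/// Which DMA register an offset designates (illustration only)
#[derive(Debug, PartialEq)]
enum DmaReg {
    /// Control register of the given channel
    ChannelControl(u32),
    /// Common DMA Control register
    Control,
    /// Common DMA Interrupt register
    Interrupt,
    /// Anything not covered by this sketch
    Other { major: u32, minor: u32 },
}

fn decode_offset(offset: u32) -> DmaReg {
    // Bits [6:4] select the channel (or 7 for the common registers),
    // bits [3:0] select the register within that group
    let major = (offset & 0x70) >> 4;
    let minor = offset & 0xf;

    match (major, minor) {
        (0..=6, 8) => DmaReg::ChannelControl(major),
        (7, 0) => DmaReg::Control,
        (7, 4) => DmaReg::Interrupt,
        _ => DmaReg::Other { major, minor },
    }
}

fn main() {
    for &offset in &[0x70u32, 0x74, 0x28] {
        println!("0x{:02x} -> {:?}", offset, decode_offset(offset));
    }
}
```

Offset 0x28 decodes to the control register of channel 2, while 0x70 and 0x74 land in the common group.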

3.4 DMA Base Address register

After that the BIOS writes 0x800eb8d4 to DMA register offset 0x60. It means
that the BIOS now moved to channel 6 (Clear Ordering Table) and sets the
Base Address register. Only the low 24 bits are used since it’s plenty enough to
address the whole RAM. This one is pretty straightforward: it gives the address
of the first word to be read or written in RAM. We can add it to our struct
Channel:
/// Per-channel data
struct Channel {
    // ...

    /// DMA start address
    base: u32,
}

impl Channel {
    // ...

    fn new() -> Channel {
        Channel {
            // ...

            base: 0,
        }
    }

    /// Retrieve the channel's base address
    pub fn base(&self) -> u32 {
        self.base
    }

    /// Set channel base address. Only bits [23:0] are significant so
    /// only 16MB are addressable by the DMA
    pub fn set_base(&mut self, val: u32) {
        self.base = val & 0xffffff;
    }
}

3.5 DMA Block Control register

After plugging the base address methods in the interconnect we can proceed to
our next DMA register access: the value 0x00000400 is written at offset 0x64. This
is our last unhandled DMA channel register: the Block Control. Its meaning
depends on the synchronization type in the channel Control register (see table 8):

• In Manual sync mode only the low 16 bits are used and they contain the
number of words to transfer.
• In Request sync mode the low 16 bits contain the block size in words while
the upper 16 bits contain the number of blocks to transfer. The DMA will
transfer a block at a time and wait for the device to assert the “request”
flag before starting a new block.
• In Linked List mode this register is not used.
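The three cases above translate into a small function that computes the total number of words to copy. This is only a sketch under the assumptions of the list: the free function `transfer_size` and the local `Sync` enum are written for this example, and the real emulator may organize this differently (for instance as a method on the channel).

```rust
/// Local copy of the three sync modes, for this example only
#[derive(Clone, Copy)]
enum Sync {
    Manual,
    Request,
    LinkedList,
}

/// Total transfer size in words for a raw Block Control value, or
/// `None` in Linked List mode where the register is not used.
fn transfer_size(sync: Sync, block_control: u32) -> Option<u32> {
    let block_size = block_control & 0xffff;
    let block_count = block_control >> 16;

    match sync {
        // Only the low 16 bits matter: number of words
        Sync::Manual => Some(block_size),
        // Number of blocks times the size of one block. Both factors
        // fit in 16 bits so the product cannot overflow a u32.
        Sync::Request => Some(block_count * block_size),
        Sync::LinkedList => None,
    }
}

fn main() {
    // Value written by the BIOS for channel 6
    println!("{:?}", transfer_size(Sync::Manual, 0x0000_0400));
    // Made-up Request mode value: 16 blocks of 32 words
    println!("{:?}", transfer_size(Sync::Request, 0x0010_0020));
}
```

For the BIOS value 0x00000400 in Manual mode this gives 0x400 words.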

We can store the contents of this register in two u16s:

/// Per-channel data
#[derive(Clone, Copy)]
struct Channel {
    // ...

    /// Size of a block in words
    block_size: u16,
    /// Block count, only used when `sync` is `Request`
    block_count: u16,
}

impl Channel {
    // ...

    fn new() -> Channel {
        Channel {
            // ...

            block_size: 0,
            block_count: 0,
        }
    }

    /// Retrieve value of the Block Control register
    pub fn block_control(&self) -> u32 {
        let bs = self.block_size as u32;
        let bc = self.block_count as u32;

        (bc << 16) | bs
    }

    /// Set value of the Block Control register
    pub fn set_block_control(&mut self, val: u32) {
        self.block_size = val as u16;
        self.block_count = (val >> 16) as u16;
    }
}

We can see that the BIOS initialized a base address and block size for channel
6, it’s no surprise that it then writes 0x11000002 to the channel control register.
The configuration is Manual sync mode, towards the RAM, with decreasing
addresses and it sets the enable and trigger bits to start the transfer.
We can now implement the DMA copy itself but before we do so we must
understand what this channel does exactly.

3.6 Depth Ordering Tables

DMA channel 6 is used to clear an ordering table in RAM. To understand what
it means we need a little background on the Playstation graphics pipeline.
The Playstation is an early 3D console and as such its 3D support is a bit
spotty. In particular the GPU doesn’t handle 3D primitives at all. That might be
surprising but as we’ll see later the GPU can only draw 2 dimensional primitives
like lines, triangles and rectangles in the framebuffer. There’s no Z coordinate
and therefore no z-buffer or anything like that. If you have two overlapping
triangles whichever is drawn last will appear on top of the other.
That means that when a game wants to render a 3D scene it can’t just create
a vertex buffer with 3D coordinates and have the GPU do the projection by itself
since it can only rasterize 2D graphics. Instead the CPU must do the projection
and send the draw commands in the right order (that is, from farthest to closest
from the point of view of the camera) to the GPU. This way closer objects will
appear above more distant ones when they overlap.
In order to do those computations more efficiently the CPU has a coprocessor
called the “Geometry Transformation Engine” which can be used to project the
primitives and compute their distance to the camera.
All this code needs to be pretty efficient because in 3D games the camera’s
and objects’ positions can change at every frame which means that the position
of all primitives must potentially be recomputed every time. And in order to
do this more efficiently the Playstation hardware supports a construct called
“depth ordering tables”.
Let’s consider a concrete example: a game wants to draw a cube. In order
to do that it needs to render 6 quadrilaterals (or quads for short), one for each
side. Let’s assume that the player can move around the cube so that the game
doesn’t know ahead of time which side will be facing the camera.
We’ve seen that the game has to send the commands in the right order to the
GPU otherwise the back side might appear in front for instance. That means
that it must sort the primitives (the 6 quads) from back to front before sending
the commands to the GPU.
One possibility would be for the game to allocate a buffer big enough to
contain all the draw commands for the current scene, fill it with all the projected
primitives while sorting them in the correct order. If you want to draw a cube
that’s probably fine but for a complex scene with thousands of draw commands
the CPU load will become huge, it’ll spend its time sorting draw commands in
RAM.
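For reference, the naive approach described above boils down to a plain sort on the depth of each primitive. The sketch below uses a made-up `Primitive` type with an integer `depth` (bigger means farther from the camera); neither comes from the Playstation libraries or from the emulator.

```rust
/// Made-up primitive: a name and a distance to the camera
#[derive(Debug, Clone, PartialEq)]
struct Primitive {
    name: &'static str,
    depth: u32,
}

/// Order the primitives from farthest to closest so that closer ones
/// are drawn last and end up on top.
fn back_to_front(mut prims: Vec<Primitive>) -> Vec<Primitive> {
    // Comparing `b` with `a` gives a descending sort on depth
    prims.sort_by(|a, b| b.depth.cmp(&a.depth));

    prims
}

fn main() {
    let scene = vec![
        Primitive { name: "near", depth: 10 },
        Primitive { name: "far", depth: 90 },
        Primitive { name: "middle", depth: 50 },
    ];

    for p in back_to_front(scene) {
        println!("draw {} (depth {})", p.name, p.depth);
    }
}
```

A comparison sort like this costs on the order of n log n comparisons for every frame, which is the overhead the rest of this section tries to avoid.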
Fortunately there’s another solution: in order to keep the draw commands
ordered while not having to move things around all the time they’re stored in a
linked list. As you know inserting an entry between two elements in a linked list
is very cheap: you just rewrite the element’s list pointers and you’re done.
So here’s how a depth ordering table is implemented: each command is
stored in a “packet”, somewhere in RAM. A packet starts with a 32bit “header”
word. The low 24bits of that word are the address of the next packet in RAM
or 0xffffff if it’s the last item and the high 8bits are the number of words in
the packet.
You start with an empty table: you create an array of empty packets in RAM
(only 32bit headers with the high 8bits set to 0 to indicate they’re empty) and
you make each entry point to the address of the previous one and the last one
set to 0xffffff. So you have a linked list of empty elements stored in an array
in reverse order. Sounds silly but it’s actually very handy.
Now when the CPU wants to render a primitive it computes its distance to
the camera, normalizes it over the size of the ordering table and uses it as an
index. It can then take the value of the header at that location in the table and
insert the draw command in the list at that point. This way it doesn’t have to iterate
through the entire list to figure out where the primitive goes, the ordering table
effectively works like a lookup table.
No matter the size of the scene, no matter how many elements have already
been inserted in the list you can always insert a new draw command by creating
a packet in RAM, figuring out the depth index and updating two headers to insert
yourself in the right order. The computing cost is constant.
Of course, there can be collisions. Since there are only a finite number of
positions in the depth ordering tables two or more packets can end up sharing
the same slot. When that happens the newer element will point to the previous
one and will therefore be drawn first (regardless of whether it’s actually in front
or behind). The smaller the table the coarser the granularity. That explains
some of the visual glitches you can see in a lot of 3D games on the console, it’s
just a limitation of the hardware.
Once the game has finished projecting and sorting the scene’s draw commands
it can send them to the GPU by starting from the last entry in the depth ordering
table and then iterating through the linked list until it reaches the 0xffffff
end-of-list marker.
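The whole mechanism fits in a few lines once the RAM is modelled as an array of words. The sketch below is a simplified model and not the emulator's code: the `Ram` type and the functions `clear_table`, `insert` and `walk` are invented for the example, and it assumes that the entry at the lowest address carries the end marker while traversal starts from the entry at the highest address.

```rust
/// End-of-list marker stored in the low 24 bits of a header
const END: u32 = 0x00ff_ffff;

/// Toy RAM: word-addressed storage accessed with byte addresses
struct Ram {
    words: Vec<u32>,
}

impl Ram {
    fn load(&self, addr: u32) -> u32 {
        self.words[(addr / 4) as usize]
    }

    fn store(&mut self, addr: u32, val: u32) {
        self.words[(addr / 4) as usize] = val;
    }
}

/// Build an empty table of `len` entries at `base`: each header has a
/// size of 0 and points to the previous entry, the first one holds
/// the end marker.
fn clear_table(ram: &mut Ram, base: u32, len: u32) {
    for i in 0..len {
        let header = if i == 0 { END } else { base + (i - 1) * 4 };

        ram.store(base + i * 4, header);
    }
}

/// Link the packet at address `packet` (with `words` payload words)
/// right after table entry `slot`. Only two headers are written.
fn insert(ram: &mut Ram, base: u32, slot: u32, packet: u32, words: u32) {
    let entry = base + slot * 4;
    let old = ram.load(entry);

    // The new packet takes over the entry's "next" pointer...
    ram.store(packet, (words << 24) | (old & END));
    // ...and the entry now points to the new packet
    ram.store(entry, (old & 0xff00_0000) | packet);
}

/// Follow the list from `start` and collect the address of every
/// non-empty packet, in the order the GPU would receive them.
fn walk(ram: &Ram, start: u32) -> Vec<u32> {
    let mut packets = Vec::new();
    let mut addr = start;

    loop {
        let header = ram.load(addr);

        if header >> 24 != 0 {
            packets.push(addr);
        }

        let next = header & END;
        if next == END {
            break;
        }
        addr = next;
    }

    packets
}

fn main() {
    let mut ram = Ram { words: vec![0; 64] };
    let base = 0x10;

    clear_table(&mut ram, base, 4);
    insert(&mut ram, base, 1, 0x40, 2); // close to the camera
    insert(&mut ram, base, 3, 0x50, 1); // far from the camera
    insert(&mut ram, base, 1, 0x60, 3); // collides with the first one

    for packet in walk(&ram, base + 3 * 4) {
        println!("send packet at 0x{:02x}", packet);
    }
}
```

The far packet comes out first, and the two packets sharing slot 1 come out newest first, which is the collision behaviour described above.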

3.7 DMA Clear Ordering Table channel

Enough theory, let’s implement DMA channel 6. So far I’ve encapsulated all
DMA-related code in the Dma and Channel structs, unfortunately putting the
copy code itself in them is a bit troublesome in Rust. The problem is that
this code needs to hold a reference to the various DMA-capable peripherals
(RAM, GPU, SPU, etc.) but Rust adds a lot of constraints on references (and
especially mutable references) to make sure the code is completely memory-safe.
There are ways to work around that (using RefCells, unsafe code, etc.)
but I don’t want to bother with any of this so I’m just going to implement the
copy code directly in the Interconnect since it already has access to all the
peripherals.
First, in the set_dma_reg function I’m going to check if a write to a DMA
channel register activated it:
impl Interconnect {
    // ...

    /// DMA register write
    fn set_dma_reg(&mut self, offset: u32, val: u32) {
        let major = (offset & 0x70) >> 4;
        let minor = offset & 0xf;

        let active_port =
            match major {
                // Per-channel registers
                0..=6 => {
                    let port = Port::from_index(major);
                    let channel = self.dma.channel_mut(port);

                    match minor {
                        0 => channel.set_base(val),
                        4 => channel.set_block_control(val),
                        8 => channel.set_control(val),
                        _ =>
                            panic!("Unhandled DMA write {:x}: {:08x}",
                                   offset, val)
                    }

                    if channel.active() {
                        Some(port)
                    } else {
                        None
                    }
                },
                // Common DMA registers
                7 => {
                    // ...

                    None
                }
                _ => panic!("Unhandled DMA write {:x}: {:08x}",
                            offset, val),
            };

        if let Some(port) = active_port {
            self.do_dma(port);
        }
    }
}

impl Channel {
    // ...

    /// Return true if the channel has been started
    pub fn active(&self) -> bool {
        // In manual sync mode the CPU must set the "trigger" bit
        // to start the transfer.
        let trigger = match self.sync {
            Sync::Manual => self.trigger,
            _ => true,
        };

        self.enable && trigger
    }
}

Now the Interconnect’s do_dma method will be called when a transfer must
take place.
The Manual and Request modes both copy blocks of data from/to the RAM.
Linked List mode is a bit different since it hops around the RAM following the
pointers in the headers. For this reason making a generic function to handle all
three modes would be a bit tricky, so I prefer to handle Linked List mode separately:
impl Interconnect {
    // ...

    /// Execute DMA transfer for a port
    fn do_dma(&mut self, port: Port) {
        // DMA transfer has been started, for now let's
        // process everything in one pass (i.e. no
        // chopping or priority handling)

        match self.dma.channel(port).sync() {
            Sync::LinkedList => panic!("Linked list mode unsupported"),
            _ => self.do_dma_block(port),
        }
    }
}

3.8 DMA Block copy

We can now implement the block copy function itself. We start at the base
address, we compute how many words we must copy by looking at the block
control values. Then we enter the copy loop: depending on the copy direction we
either read a word from RAM and send it to the device or the other way around.
Since channel 6 is only used to initialize an ordering table we only need to
implement the ToRam direction for now. Also the value copied into RAM
doesn’t come from an external peripheral, it’s just generated by the DMA based
on the current address:
impl Interconnect {
    // ...

    fn do_dma_block(&mut self, port: Port) {
        let channel = self.dma.channel_mut(port);

        // Address step in bytes. The decrement is stored as a two's
        // complement u32 so that wrapping_add() below moves the
        // address backwards.
        let increment = match channel.step() {
            Step::Increment => 4u32,
            Step::Decrement => (-4i32) as u32,
        };

        let mut addr = channel.base();

        // Transfer size in words
        let mut remsz = match channel.transfer_size() {
            Some(n) => n,
            // Shouldn't happen since we shouldn't be reaching this
            // code in linked list mode
            None => panic!("Couldn't figure out DMA block transfer size"),
        };

        while remsz > 0 {
            // Not sure what happens if address is
            // bogus... Mednafen just masks addr this way, maybe
            // that's how the hardware behaves (i.e. the RAM
            // address wraps and the two LSB are ignored, seems
            // reasonable enough)
            let cur_addr = addr & 0x1ffffc;

            match channel.direction() {
                Direction::FromRam => panic!("Unhandled DMA direction"),
                Direction::ToRam => {
                    let src_word = match port {
                        // Clear ordering table
                        Port::Otc => match remsz {
                            // Last entry contains the end
                            // of table marker
                            1 => 0xffffff,
                            // Pointer to the previous entry
                            _ => addr.wrapping_sub(4) & 0x1fffff,
                        },
                        _ => panic!("Unhandled DMA source port {}",
                                    port as u8),
                    };

                    self.ram.store32(cur_addr, src_word);
                }
            }

            addr = addr.wrapping_add(increment);

            remsz -= 1;
        }

        channel.done();
    }
}

impl Channel {
    // ...

    pub fn direction(&self) -> Direction {
        self.direction
    }

    pub fn step(&self) -> Step {
        self.step
    }

    pub fn sync(&self) -> Sync {
        self.sync
    }

    /// Return the DMA transfer size in words or None for linked
    /// list mode.
    pub fn transfer_size(&self) -> Option<u32> {
        let bs = self.block_size as u32;
        let bc = self.block_count as u32;

        match self.sync {
            // For manual mode only the block size is used
            Sync::Manual => Some(bs),
            // In DMA request mode we must transfer `bc` blocks
            Sync::Request => Some(bc * bs),
            // In linked list mode the size is not known ahead of
            // time: we stop when we encounter the "end of list"
            // marker (0xffffff)
            Sync::LinkedList => None,
        }
    }

    /// Set the channel status to "completed" state
    pub fn done(&mut self) {
        self.enable = false;
        self.trigger = false;

        // XXX Need to set the correct value for the other fields
        // (in particular interrupts)
    }
}

Note the conditional that writes 0xffffff in the last iteration. It's vital:
without it the DMA won't find the end of table marker and will start jumping
randomly in RAM, sending crap to the GPU in the process. Even the "Sony Logo"
won't render correctly if this is not implemented; if you're not receiving
GP0(38h), this is probably why.
When the copy is done I call the channel.done() method which clears the
trigger and enable flags. It should probably do more than that eventually, in
particular it should trigger the interrupt if it's enabled. We'll leave that for later.
We can now finally run our first DMA transfer in full! The BIOS sets the
base address to 0x000eb8d4 and the block size to 1024 before starting channel 6
and we then initialize an empty ordering table.
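To make the shape of the resulting table concrete, here is a minimal standalone sketch of that transfer. It is not the guide's Interconnect code: the RAM is modelled as a plain slice of words, the helper names clear_ordering_table and walk_ordering_table are made up for the illustration, and it assumes the channel steps downwards (which is what the "pointer to the previous entry" logic implies).

```rust
/// Fill `ram` (indexed in 32 bit words) with an empty ordering table of
/// `count` entries. Assumption: the channel decrements, so the table
/// starts at byte address `base` and grows towards lower addresses.
fn clear_ordering_table(ram: &mut [u32], base: u32, count: u32) {
    let mut addr = base;

    for remaining in (1..=count).rev() {
        let value = if remaining == 1 {
            // Last entry: end of table marker
            0x00ff_ffff
        } else {
            // Pointer to the previous entry
            addr.wrapping_sub(4) & 0x001f_ffff
        };

        ram[((addr & 0x001f_fffc) / 4) as usize] = value;

        addr = addr.wrapping_sub(4);
    }
}

/// Follow the links starting at `base` and return the number of
/// entries visited before the end of table marker is found.
fn walk_ordering_table(ram: &[u32], base: u32) -> u32 {
    let mut addr = base;
    let mut entries = 1;

    loop {
        let link = ram[((addr & 0x001f_fffc) / 4) as usize] & 0x00ff_ffff;

        if link == 0x00ff_ffff {
            return entries;
        }

        addr = link;
        entries += 1;
    }
}
```

With the values used by the BIOS (base 0x000eb8d4, 1024 entries) the first entry points at 0x000eb8d0 and the marker ends up at 0x000ea8d8, so walking the table from the base address visits exactly 1024 entries before stopping.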
After that the BIOS enters an infinite loop on the GPUSTAT register. This
time it's waiting for bit 26 which is "ready to receive command word". We are
going to set this bit by default and while we're at it we're also going to set bit
27, "ready to send VRAM to CPU", and bit 28, "ready to receive DMA block".
This way we should avoid locking the BIOS on this register in the future:
impl Interconnect {
    // ...

    /// Load 32 bit word at `addr`
    pub fn load32(&self, addr: u32) -> u32 {
        // ...

        if let Some(offset) = map::GPU.contains(abs_addr) {
            return match offset {
                // GPUSTAT: set bits 26, 27 and 28 to signal that the
                // GPU is ready for DMA and CPU access. This way the
                // BIOS won't deadlock waiting for an event that'll
                // never come.
                4 => 0x1c000000,
                _ => 0,
            };
        }

        // ...
    }
}

With this modification the BIOS goes a little further and configures DMA
channel 2 to send a Linked List to the GPU.

3.9 DMA Linked Lists
Navigating the linked list is pretty straightforward: the BIOS puts the address
of the first list header in the DMA channel's base address. We read the high byte
of the header to know the size of the packet (in words, not counting the header).
Packets are contiguous in RAM so the data follows the header word directly.
Once the packet data has been sent to the device we look at the low 24 bits of
the header. If it's 0xffffff then we're done, otherwise it contains the address
of the next header and we loop.
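The header layout described above can be sketched as a small decoding helper. PacketHeader and decode_header are hypothetical names used only for this illustration, and masking the link with 0x1ffffc is an assumption borrowed from the way the block copy code masks RAM addresses.

```rust
/// Decoded linked list packet header (hypothetical helper type, not
/// part of the emulator code).
#[derive(Debug, PartialEq)]
struct PacketHeader {
    /// Number of data words following the header word
    size: u32,
    /// Address of the next header, or None at the end of the list
    next: Option<u32>,
}

/// Split a header word into the packet size (high byte) and the link
/// to the next header (low 24 bits).
fn decode_header(header: u32) -> PacketHeader {
    let size = header >> 24;
    let link = header & 0x00ff_ffff;

    let next = if link == 0x00ff_ffff {
        // End of list marker
        None
    } else {
        // Assumption: wrap to the RAM size and ignore the two LSBs
        Some(link & 0x001f_fffc)
    };

    PacketHeader { size, next }
}
```

For instance a header of 0x03001234 describes a packet of three words whose successor lives at 0x1234, while 0x00ffffff is an empty packet that terminates the list.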
I'm not sure whether linked list mode is supported only by channel 2 (the
GPU) or if it's available for other ports. As far as I can tell it's only ever used
to send commands to the GPU however, I'll have to remember to test that.
By the way, interesting bit of information for us emulator writers: it seems
that while the DMA offers a great deal of flexibility with a lot of options and flags
only a handful of configs are ever used for each channel. PCSX-R hardcodes
those configs and simply ignores more exotic flag combinations (even though
they're technically possible) and mednafen, while supporting most options, has
an optimized fast path for the common configs. The Nocash docs also list
those common configs (and the few odd variations in some games). It means
that we can probably go a long way even if we don't support some obscure
configurations.
Here's what my simple linked list synchronization mode implementation looks
like:
impl Interconnect {
    // ...

    /// Execute DMA transfer for a port
    fn do_dma(&mut self, port: Port) {
        // ...

        match self.dma.channel(port).sync() {
            Sync::LinkedList => self.do_dma_linked_list(port),
            _ => self.do_dma_block(port),
        }
    }

    /// Emulate DMA transfer for linked list synchronization mode.
    fn do_dma_linked_list(&mut self, port: Port) {
        let channel = self.dma.channel_mut(port);

        let mut addr = channel.base() & 0x1ffffc;

        if channel.direction() == Direction::ToRam {
            panic!("Invalid DMA direction for linked list mode");
        }

        // I don't know if the DMA even supports linked list mode
        // for anything besides the GPU
        if port != Port::Gpu {
            panic!("Attempted linked list DMA on port {}",
                   port as u8);
        }

        loop {
            // In linked list mode, each entry starts with a
            // "header" word. The high byte contains the number
            // of words in the "packet" (not counting the header
            // word)
            let header = self.ram.load32(addr);

            let mut remsz = header >> 24;

            while remsz > 0 {
                addr = (addr + 4) & 0x1ffffc;

                let command = self.ram.load32(addr);

                println!("GPU command {:08x}", command);

                remsz -= 1;
            }

            // The end-of-table marker is usually 0xffffff but
            // mednafen only checks for the MSB so maybe that's what
            // the hardware does? Since this bit is not part of any
            // valid address it makes some sense. I'll have to test
            // that at some point...
            if header & 0x800000 != 0 {
                break;
            }

            addr = header & 0x1ffffc;
        }

        channel.done();
    }
}

Since we haven't implemented the GPU yet I just display the command word
without further processing. We'll have to hook our GPU rendering code here
when it's done. Let's get a bit further in our DMA implementation before we
start working on the GPU, since we don't have anything interesting to display yet.

3.10 RAM to device GPU block copy

After this the BIOS wants to do another DMA transfer from the RAM towards
the GPU but this time in Request synchronization mode. It probably wants to
load a texture. Adding support for this in our do_dma_block function is quite
trivial:
impl Interconnect {
    // ...

    /// Emulate DMA transfer for Manual and Request synchronization
    /// modes.
    fn do_dma_block(&mut self, port: Port) {
        // ...

        while remsz > 0 {
            // ...

            match channel.direction() {
                Direction::FromRam => {
                    let src_word = self.ram.load32(cur_addr);

                    match port {
                        Port::Gpu => println!("GPU data {:08x}", src_word),
                        _ => panic!("Unhandled DMA destination port {}",
                                    port as u8),
                    }
                }
                // ...
            }

            addr = addr.wrapping_add(increment);

            remsz -= 1;
        }

        channel.done();
    }
}

We still can't do much more than printing the raw GPU data but at least the
DMA part seems to work as intended. If we try to interpret the GPU commands
sent through the linked list we can guess what it's doing (see footnote 30):

• First it displays a black quadrilateral that takes the whole screen (command
0x28000000). It does this several times.
• Then it appears to load a texture (maybe the background with the text?)
• Then it draws the same quadrilateral again but with a dark-grey color
(command 0x28030303 where 0x030303 is a 24bit BGR colour)
• Then it draws it again repeatedly, slowly changing the colour to a lighter
grey. It looks like the "fade-in" effect at the very beginning of the boot
animation (commands 0x28060606, 0x28090909 etc. . . to 0x28b4b4b4).
• Then it adds three more draw commands: 0x380000b2 which draws a
shaded quadrilateral and two 0x300000b2 commands which draw shaded
triangles.
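The quadrilateral commands in the list above pack an opcode and a 24 bit BGR colour in a single word, which the following sketch pulls apart. The function names are invented for the example, and fade_in_commands assumes that the grey level increases by exactly 3 at every step between 0x28030303 and 0x28b4b4b4, which the listed values suggest but which I haven't checked against a full dump.

```rust
/// Split a GP0 word into its opcode (high byte) and a 24 bit BGR
/// colour returned as (red, green, blue). In BGR order the red
/// component sits in the low byte.
fn decode_colour_command(word: u32) -> (u8, (u8, u8, u8)) {
    let opcode = (word >> 24) as u8;
    let red = word as u8;
    let green = (word >> 8) as u8;
    let blue = (word >> 16) as u8;

    (opcode, (red, green, blue))
}

/// Rebuild the command words of the fade-in. Assumption: the grey
/// level goes from 0x03 to 0xb4 in constant steps of 3.
fn fade_in_commands() -> Vec<u32> {
    (1..=60u32)
        .map(|i| {
            let grey = i * 3;

            0x2800_0000 | (grey << 16) | (grey << 8) | grey
        })
        .collect()
}
```

Under that assumption the fade-in is made of 60 commands, starting with 0x28030303 and ending with 0x28b4b4b4.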

Then a little while after that we stumble upon an unhandled store8 at
address 0x1f801800 which is a CDROM drive register. I'm pleased we managed
to get to that point with our bare bones emulator. We don't even support
interrupts!
But before we look at this CDROM business it’s tempting to try to implement
a basic GPU and display our first frames. After all it seems that our emulator
manages to go through the entire first boot logo. It’ll be more rewarding to see
that than the hexadecimal debug dumps we’ve become accustomed to and it’ll
validate that our CPU is working correctly.

30 I'll describe those GPU instructions in greater detail later when we implement them.

4 The GPU: Internal state and first commands

We're finally getting to the fun part: drawing on the screen. The objective of
this part is twofold:
• We want to create a reasonably accurate internal representation of the PSX
GPU. Mainly we want to update the register values to reflect the current
GPU state instead of our current hardcoded values. This will lay out a
basic GPU state machine that we'll improve later when we implement
video timings, interrupts and other delicacies.
• We'll also implement a very simple and inaccurate OpenGL renderer.
That'll give us the opportunity to implement some of the very boring low
level OpenGL boilerplate and we'll have some visual feedback for debugging
the rest of the emulator.

In order to do this we’ll start back from the beginning, review all the GPU
register accesses (both from the CPU and DMA) and attempt to implement
them as best as we can.

4.1 GPUSTAT register

The GPU only has a single status register but it's packed full of miscellaneous
information about the GPU state. It contains fields describing the texture
mapping config, the video mode, the various "ready" bits for the command
FIFOs, the color mode etc. . .
We can start by declaring the various variables holding all that state. To that
end I'm going to create a whole bunch of new types in order to manipulate nice
type-safe symbolic values instead of meaningless integers:
pub struct Gpu {
    /// Texture page base X coordinate (4 bits, 64 byte increment)
    page_base_x: u8,
    /// Texture page base Y coordinate (1 bit, 256 line increment)
    page_base_y: u8,
    /// Semi-transparency. Not entirely sure how to handle that value
    /// yet, it seems to describe how to blend the source and
    /// destination colors.
    semi_transparency: u8,
    /// Texture page color depth
    texture_depth: TextureDepth,
    /// Enable dithering from 24 to 15 bits RGB
    dithering: bool,
    /// Allow drawing to the display area
    draw_to_display: bool,
    /// Force "mask" bit of the pixel to 1 when writing to VRAM
    /// (otherwise don't modify it)
    force_set_mask_bit: bool,
    /// Don't draw to pixels which have the "mask" bit set
    preserve_masked_pixels: bool,
    /// Currently displayed field. For progressive output this is
    /// always Top.
    field: Field,
    /// When true all textures are disabled
    texture_disable: bool,
    /// Video output horizontal resolution
    hres: HorizontalRes,
    /// Video output vertical resolution
    vres: VerticalRes,
    /// Video mode
    vmode: VMode,
    /// Display depth. The GPU itself always draws 15 bit RGB, 24 bit
    /// output must use external assets (pre-rendered textures, MDEC,
    /// etc...)
    display_depth: DisplayDepth,
    /// Output interlaced video signal instead of progressive
    interlaced: bool,
    /// Disable the display
    display_disabled: bool,
    /// True when the interrupt is active
    interrupt: bool,
    /// DMA request direction
    dma_direction: DmaDirection,
}

/// Depth of the pixel values in a texture page
#[derive(Clone, Copy)]
enum TextureDepth {
    /// 4 bits per pixel
    T4Bit = 0,
    /// 8 bits per pixel
    T8Bit = 1,
    /// 15 bits per pixel
    T15Bit = 2,
}

/// Interlaced output splits each frame in two fields
#[derive(Clone, Copy)]
enum Field {
    /// Top field (odd lines).
    Top = 1,
    /// Bottom field (even lines)
    Bottom = 0,
}

/// Video output horizontal resolution
#[derive(Clone, Copy)]
struct HorizontalRes(u8);

impl HorizontalRes {
    /// Create a new HorizontalRes instance from the 2 bit field `hr1`
    /// and the one bit field `hr2`
    fn from_fields(hr1: u8, hr2: u8) -> HorizontalRes {
        let hr = (hr2 & 1) | ((hr1 & 3) << 1);

        HorizontalRes(hr)
    }

    /// Retrieve value of bits [18:16] of the status register
    fn into_status(self) -> u32 {
        let HorizontalRes(hr) = self;

        (hr as u32) << 16
    }
}

/// Video output vertical resolution
#[derive(Clone, Copy)]
enum VerticalRes {
    /// 240 lines
    Y240Lines = 0,
    /// 480 lines (only available for interlaced output)
    Y480Lines = 1,
}

/// Video Modes
#[derive(Clone, Copy)]
enum VMode {
    /// NTSC: 480i60Hz
    Ntsc = 0,
    /// PAL: 576i50Hz
    Pal = 1,
}

/// Display area color depth
#[derive(Clone, Copy)]
enum DisplayDepth {
    /// 15 bits per pixel
    D15Bits = 0,
    /// 24 bits per pixel
    D24Bits = 1,
}

/// Requested DMA direction.
#[derive(Clone, Copy)]
enum DmaDirection {
    Off = 0,
    Fifo = 1,
    CpuToGp0 = 2,
    VRamToCpu = 3,
}

This is basically a direct translation of the GPUSTAT register fields. I must
say that at this point I don't fully understand all of those variables and it's
possible that we'll have to change this implementation or maybe simply rename
some of them. It does however give us a foretaste of the various features/quirks
of the Playstation GPU that we'll have to implement eventually if we want to
make an accurate renderer. If some of those variables mean nothing to you don't
worry, we'll review them all when we actually need them.
I'm not entirely sure what the GPU state is at reset but I think the BIOS
will reconfigure everything anyway. Let's assume that all the values are 0 on
reset, except for the display_disabled field:
impl Gpu {
    pub fn new() -> Gpu {
        Gpu {
            page_base_x: 0,
            page_base_y: 0,
            semi_transparency: 0,
            texture_depth: TextureDepth::T4Bit,
            dithering: false,
            draw_to_display: false,
            force_set_mask_bit: false,
            preserve_masked_pixels: false,
            field: Field::Top,
            texture_disable: false,
            hres: HorizontalRes::from_fields(0, 0),
            vres: VerticalRes::Y240Lines,
            vmode: VMode::Ntsc,
            display_depth: DisplayDepth::D15Bits,
            interlaced: false,
            display_disabled: true,
            interrupt: false,
            dma_direction: DmaDirection::Off,
        }
    }
}

For the time being we can implement the GPUSTAT register read. It’s a
read-only register since writes to the GPUSTAT register address end up in the
GP1 register. We’ll see how the GPU config is modified in a minute.
impl Gpu {
    // ...

    /// Retrieve value of the status register
    pub fn status(&self) -> u32 {
        let mut r = 0u32;

        r |= (self.page_base_x as u32) << 0;
        r |= (self.page_base_y as u32) << 4;
        r |= (self.semi_transparency as u32) << 5;
        r |= (self.texture_depth as u32) << 7;
        r |= (self.dithering as u32) << 9;
        r |= (self.draw_to_display as u32) << 10;
        r |= (self.force_set_mask_bit as u32) << 11;
        r |= (self.preserve_masked_pixels as u32) << 12;
        r |= (self.field as u32) << 13;
        // Bit 14: not supported
        r |= (self.texture_disable as u32) << 15;
        r |= self.hres.into_status();
        r |= (self.vres as u32) << 19;
        r |= (self.vmode as u32) << 20;
        r |= (self.display_depth as u32) << 21;
        r |= (self.interlaced as u32) << 22;
        r |= (self.display_disabled as u32) << 23;
        r |= (self.interrupt as u32) << 24;

        // For now we pretend that the GPU is always ready:

        // Ready to receive command
        r |= 1 << 26;
        // Ready to send VRAM to CPU
        r |= 1 << 27;
        // Ready to receive DMA block
        r |= 1 << 28;

        r |= (self.dma_direction as u32) << 29;

        // Bit 31 should change depending on the currently drawn
        // line (whether it's even, odd or in the vblank
        // apparently). Let's not bother with it for now.
        r |= 0 << 31;

        // Not sure about that, I'm guessing that it's the signal
        // checked by the DMA when sending data in Request
        // synchronization mode. For now I blindly follow the
        // Nocash spec.
        let dma_request =
            match self.dma_direction {
                // Always 0
                DmaDirection::Off => 0,
                // Should be 0 if FIFO is full, 1 otherwise
                DmaDirection::Fifo => 1,
                // Should be the same as status bit 28
                DmaDirection::CpuToGp0 => (r >> 28) & 1,
                // Should be the same as status bit 27
                DmaDirection::VRamToCpu => (r >> 27) & 1,
            };

        r |= dma_request << 25;

        r
    }
}

You can see that I don't support bit 14: the Nocash spec says that setting this
bit on the real hardware just messes up the display in a weird way. We
can probably assume that it's not a commonly used feature for the moment.
As before I hardcode the "ready" bits to 1 since we have a long way to go
before we have the necessary infrastructure to emulate them accurately. We'll
need to emulate the various internal FIFOs and the rate at which they empty
for instance. That will come later.
In general I'm not entirely sure how the DMA state machine synchronizes
with the GPU. We'll have to hope it's not too critical for now. As we progress, if
we start to notice that our emulator seems to misbehave because of a broken
GPU DMA we'll have to investigate further.
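To see how the DMA request bit (bit 25) follows the "ready" bits, here is a small isolated sketch of that rule. with_dma_request is a made-up helper working on a raw status word rather than on the Gpu struct, and the enum is redeclared only to keep the example self-contained.

```rust
/// Requested DMA direction, redeclared here so that the sketch
/// stands on its own.
#[derive(Clone, Copy)]
enum DmaDirection {
    Off = 0,
    Fifo = 1,
    CpuToGp0 = 2,
    VRamToCpu = 3,
}

/// Return `status` with bit 25 recomputed for the given direction
/// (hypothetical helper following the rule used in the status code).
fn with_dma_request(status: u32, direction: DmaDirection) -> u32 {
    let request = match direction {
        DmaDirection::Off => 0,
        // Simplification: we pretend the FIFO is never full
        DmaDirection::Fifo => 1,
        // Mirrors "ready to receive DMA block"
        DmaDirection::CpuToGp0 => (status >> 28) & 1,
        // Mirrors "ready to send VRAM to CPU"
        DmaDirection::VRamToCpu => (status >> 27) & 1,
    };

    (status & !(1 << 25)) | (request << 25)
}
```

With the three "ready" bits hardcoded (0x1c000000) the request bit is therefore set for every direction except Off. Once bit 28 stops being hardcoded, a CpuToGp0 transfer would see the request bit drop whenever the GPU can't accept a DMA block.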

4.2 GP0 Draw Mode Setting command

All the GPU configuration and draw commands are transferred through two
registers: GP0 and GP1. GP0 is used to send drawing commands (lines, triangles,
quadrilaterals with various attributes) and to copy data between the VRAM
(the video RAM dedicated to the GPU) and the CPU/DMA.
We'll have to decode those commands like we decoded the CPU instructions.
The format is pretty simple: the most significant byte is the "opcode" and the
rest are parameters whose meaning depends on the opcode. The only difficulty
is that GP0 commands can take a variable number of parameters and therefore
span multiple words.
The first command sent by the BIOS into GP0 is 0xe1003000. The high
byte is 0xe1 which is the "Draw Mode setting" command. It sets a bunch
of texture-related values (dithering, texture depth, texture disable, etc. . . )
and two new fields we haven't already encountered in the GPUSTAT register:
rectangle_texture_x_flip and rectangle_texture_y_flip. They're used to
mirror a textured rectangle horizontally or vertically:
pub struct Gpu {
    // ...

    /// Mirror textured rectangles along the x axis
    rectangle_texture_x_flip: bool,
    /// Mirror textured rectangles along the y axis
    rectangle_texture_y_flip: bool,
}

impl Gpu {
    // ...

    /// Handle writes to the GP0 command register
    pub fn gp0(&mut self, val: u32) {
        let opcode = (val >> 24) & 0xff;

        match opcode {
            0xe1 => self.gp0_draw_mode(val),
            _ => panic!("Unhandled GP0 command {:08x}", val),
        }
    }

    /// GP0(0xE1) command
    fn gp0_draw_mode(&mut self, val: u32) {
        self.page_base_x = (val & 0xf) as u8;
        self.page_base_y = ((val >> 4) & 1) as u8;
        self.semi_transparency = ((val >> 5) & 3) as u8;

        self.texture_depth =
            match (val >> 7) & 3 {
                0 => TextureDepth::T4Bit,
                1 => TextureDepth::T8Bit,
                2 => TextureDepth::T15Bit,
                n => panic!("Unhandled texture depth {}", n),
            };

        self.dithering = ((val >> 9) & 1) != 0;
        self.draw_to_display = ((val >> 10) & 1) != 0;
        self.texture_disable = ((val >> 11) & 1) != 0;
        self.rectangle_texture_x_flip = ((val >> 12) & 1) != 0;
        self.rectangle_texture_y_flip = ((val >> 13) & 1) != 0;
    }
}
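The opcode/parameter split used by gp0 can be isolated in a tiny helper, together with a first idea of how the variable command length could be looked up. Both function names are hypothetical; the lengths listed for 0x00 and 0xe1 come from the single-word commands seen so far, while the 5 words for 0x28 (one colour word plus four vertices) is my reading of the monochrome quadrilateral format and isn't something the code above relies on.

```rust
/// Split a command word into its opcode (high byte) and its 24 bit
/// parameter field.
fn split_command(word: u32) -> (u8, u32) {
    ((word >> 24) as u8, word & 0x00ff_ffff)
}

/// Total length in words of a few GP0 commands, or None for opcodes
/// this sketch doesn't know about (hypothetical lookup).
fn gp0_command_length(opcode: u8) -> Option<usize> {
    match opcode {
        // NOP
        0x00 => Some(1),
        // Monochrome quadrilateral: colour + 4 vertices
        0x28 => Some(5),
        // Draw mode setting
        0xe1 => Some(1),
        _ => None,
    }
}
```

Splitting 0xe1003000 this way gives opcode 0xe1 with parameter 0x003000. A real implementation needs to buffer words until the whole command has been received before executing it.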

We can now call our new gp0 method from the interconnect:
impl Interconnect {
    // ...

    /// Store 32 bit word `val` into `addr`
    pub fn store32(&mut self, addr: u32, val: u32) {
        // ...

        if let Some(offset) = map::GPU.contains(abs_addr) {
            match offset {
                0 => self.gpu.gp0(val),
                _ => panic!("GPU write {}: {:08x}", offset, val),
            }
            return;
        }

        // ...
    }
}

4.3 GP0 NOP command

The next GP0 command sent by the BIOS is 0x0007920c. Apparently opcode
0x00 is a NOP so I'm not sure what the 0x7920c given as parameter means
(see footnote 31). Maybe it's just a garbage value. Let's ignore it for now and
implement the NOP:
impl Gpu {
    // ...

    /// Handle writes to the GP0 command register
    pub fn gp0(&mut self, val: u32) {
        let opcode = (val >> 24) & 0xff;

        match opcode {
            0x00 => (), // NOP
            0xe1 => self.gp0_draw_mode(val),
            _ => panic!("Unhandled GP0 command {:08x}", val),
        }
    }
}

31 I've tried quickly disassembling the surrounding code but I couldn't really figure out what
it's trying to do. I'll have to take the time to dig deeper at some point. . .

4.4 GP1 Soft Reset command

After that the BIOS writes 0x00000000 to GP1 this time. The command format
is the same: the high byte is the opcode while the low 24bits contain the
parameters. However GP1 has a different set of commands mostly used to
configure the display and the DMA. GP1 commands are always one word in
length.
GP1 opcode 0x00 is a software reset command, it resets the GPU to a default
configuration. It reconfigures most of the fields we’ve already encountered and a
few more:
pub struct Gpu {
    // ...

    /// Texture window x mask (8 pixel steps)
    texture_window_x_mask: u8,
    /// Texture window y mask (8 pixel steps)
    texture_window_y_mask: u8,
    /// Texture window x offset (8 pixel steps)
    texture_window_x_offset: u8,
    /// Texture window y offset (8 pixel steps)
    texture_window_y_offset: u8,
    /// Left-most column of drawing area
    drawing_area_left: u16,
    /// Top-most line of drawing area
    drawing_area_top: u16,
    /// Right-most column of drawing area
    drawing_area_right: u16,
    /// Bottom-most line of drawing area
    drawing_area_bottom: u16,
    /// Horizontal drawing offset applied to all vertices
    drawing_x_offset: i16,
    /// Vertical drawing offset applied to all vertices
    drawing_y_offset: i16,
    /// First column of the display area in VRAM
    display_vram_x_start: u16,
    /// First line of the display area in VRAM
    display_vram_y_start: u16,
    /// Display output horizontal start relative to HSYNC
    display_horiz_start: u16,
    /// Display output horizontal end relative to HSYNC
    display_horiz_end: u16,
    /// Display output first line relative to VSYNC
    display_line_start: u16,
    /// Display output last line relative to VSYNC
    display_line_end: u16,
}

I tried to get the reset values from my console, unfortunately some of the
values like display_horiz_* and display_line_* cannot be read directly from
any register as far as I can tell so I'm going to use the values given by the Nocash
specs instead.
impl Gpu {
    // ...

    /// Handle writes to the GP1 command register
    pub fn gp1(&mut self, val: u32) {
        let opcode = (val >> 24) & 0xff;

        match opcode {
            0x00 => self.gp1_reset(val),
            _ => panic!("Unhandled GP1 command {:08x}", val),
        }
    }

    /// GP1(0x00): soft reset
    fn gp1_reset(&mut self, _: u32) {
        self.interrupt = false;

        self.page_base_x = 0;
        self.page_base_y = 0;
        self.semi_transparency = 0;
        self.texture_depth = TextureDepth::T4Bit;
        self.texture_window_x_mask = 0;
        self.texture_window_y_mask = 0;
        self.texture_window_x_offset = 0;
        self.texture_window_y_offset = 0;
        self.dithering = false;
        self.draw_to_display = false;
        self.texture_disable = false;
        self.rectangle_texture_x_flip = false;
        self.rectangle_texture_y_flip = false;
        self.drawing_area_left = 0;
        self.drawing_area_top = 0;
        self.drawing_area_right = 0;
        self.drawing_area_bottom = 0;
        self.drawing_x_offset = 0;
        self.drawing_y_offset = 0;
        self.force_set_mask_bit = false;
        self.preserve_masked_pixels = false;

        self.dma_direction = DmaDirection::Off;

        self.display_disabled = true;
        self.display_vram_x_start = 0;
        self.display_vram_y_start = 0;
        self.hres = HorizontalRes::from_fields(0, 0);
        self.vres = VerticalRes::Y240Lines;

        self.vmode = VMode::Ntsc;
        self.interlaced = true;
        self.display_horiz_start = 0x200;
        self.display_horiz_end = 0xc00;
        self.display_line_start = 0x10;
        self.display_line_end = 0x100;
        self.display_depth = DisplayDepth::D15Bits;

        // XXX should also clear the command FIFO when we implement it
        // XXX should also invalidate GPU cache if we ever implement it
    }
}

The reset command is supposed to flush the command FIFO and the texture
cache but we don't emulate those yet so I just added a note to remember to
modify the function when we add support for one of those.
The texture_window_* parameters are used to crop a texture. The drawing_area_*
parameters are used to describe a drawing window: the GPU won't draw anything
outside of this area.
The drawing_*_offset parameters are a constant offset that's added to all
the vertices. It lets you translate a scene in VRAM without having to recompute
all the coordinates on the CPU.
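As an illustration of how the drawing offset and the drawing area interact, here is a minimal sketch. DrawEnv and its methods are hypothetical and not part of the Gpu struct above, and treating the four drawing area bounds as inclusive is an assumption based on their "left-most column" and "right-most column" descriptions.

```rust
/// Hypothetical subset of the GPU state needed to position a vertex.
struct DrawEnv {
    area_left: u16,
    area_top: u16,
    area_right: u16,
    area_bottom: u16,
    x_offset: i16,
    y_offset: i16,
}

impl DrawEnv {
    /// Apply the drawing offset to a vertex coordinate. The result is
    /// widened to i32 so that the addition cannot overflow.
    fn translate(&self, x: i16, y: i16) -> (i32, i32) {
        (x as i32 + self.x_offset as i32,
         y as i32 + self.y_offset as i32)
    }

    /// Return true if the translated position lies inside the drawing
    /// area (assumption: all four bounds are inclusive).
    fn in_drawing_area(&self, x: i32, y: i32) -> bool {
        x >= self.area_left as i32 && x <= self.area_right as i32
            && y >= self.area_top as i32 && y <= self.area_bottom as i32
    }
}
```

With a drawing area covering columns 0 to 319 and lines 0 to 239 and an offset of (16, -8), the vertex (10, 20) lands at (26, 12) and is drawn, while the vertex (0, 0) lands at (16, -8) and falls outside the area.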
The display_vram_*, display_horiz_* and display_line_* parameters are
used to describe which portion of the VRAM is drawn on the screen. If you're
not familiar with the wonderful world of analog video it might not be immediately
obvious what those parameters do so let me give a quick overview of the GPU's
video output.

4.5 The GPU renderer and the video output

You can think of the Playstation GPU as two different modules operating
asynchronously. First you have the renderer which takes the draw commands
(through GP0) and rasterizes them into the dedicated video memory used by
the GPU: the VRAM.
The VRAM is organized as a two dimensional byte array whose dimensions
are 2048x512, giving a grand total of 1MB of video memory. This VRAM is
used to store the image generated by the GPU’s rasterizer (i.e. the framebuffer)
but also any texture used to render the scene. The GPU has no direct access
to the main RAM, much less the CDROM: all the assets have to be copied in
VRAM by the CPU or DMA before the rendering can take place.
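The VRAM geometry described above translates into very little code. The following sketch is only one possible representation: the Vram type, its method names and the little-endian storage of 16 bit pixels are assumptions made for the example (two bytes per pixel gives 1024 pixels per line).

```rust
/// Width of a VRAM line in bytes
const VRAM_LINE_BYTES: usize = 2048;
/// Number of lines in VRAM
const VRAM_LINES: usize = 512;

/// Hypothetical VRAM model: a flat byte buffer of 2048x512 bytes.
struct Vram {
    data: Vec<u8>,
}

impl Vram {
    fn new() -> Vram {
        Vram { data: vec![0; VRAM_LINE_BYTES * VRAM_LINES] }
    }

    /// Byte offset of the 16 bit pixel at column `x` of line `y`
    fn pixel_offset(x: usize, y: usize) -> usize {
        assert!(x < VRAM_LINE_BYTES / 2 && y < VRAM_LINES);

        y * VRAM_LINE_BYTES + x * 2
    }

    /// Store a 16 bit pixel (assumption: little-endian byte order)
    fn store16(&mut self, x: usize, y: usize, val: u16) {
        let offset = Vram::pixel_offset(x, y);

        self.data[offset..offset + 2].copy_from_slice(&val.to_le_bytes());
    }

    /// Load the 16 bit pixel at column `x` of line `y`
    fn load16(&self, x: usize, y: usize) -> u16 {
        let offset = Vram::pixel_offset(x, y);

        u16::from_le_bytes([self.data[offset], self.data[offset + 1]])
    }
}
```

The buffer is exactly 1MB long and the last pixel, at column 1023 of line 511, occupies its final two bytes.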
Once the renderer has completed a scene it ends up somewhere in the VRAM.
Now it has to be displayed on the TV screen. That’s where the GPU’s video
output is used.
The video output (when enabled) sends the video signal continuously at 60 (NTSC)
or 50 (PAL) frames per second. It never stops because doing so would
cause a glitch on the screen. Consider the CRT displays everybody used in the
nineties: you have an electron beam sweeping the screen line by line, you can’t
jump to any random position of the screen when you want. Even on modern
LCD screens most video interfaces (VGA, DVI, HDMI, LVDS, MIPI,...) behave
in the same way.
That means that when the game wants to draw a triangle on the screen it
doesn’t directly send the triangle to the TV, rather it renders it in the framebuffer
and the video output will send it to the screen during its next pass.
For the time being we won’t bother emulating the video output, we can
directly display the contents of the framebuffer. It’s not accurate but it’s simpler
and we should be able to plug our video output emulation layer on top of it
when we’re ready to implement it.

4.6 GPUREAD register placeholder
After those commands the BIOS reads from the register at offset 0 in the GPU
(the same address where GP0 commands are written). This register is GPUREAD
and is used to retrieve data generated by certain commands, typically to read
parts of the framebuffer back in RAM. The problem is that so far no such
command has been issued so I’m not sure why the BIOS attempts to read from
there. For now let’s return 0 and we’ll implement it properly later:
impl Gpu {
    // ...

    /// Retrieve value of the "read" register
    pub fn read(&self) -> u32 {
        // Not implemented for now...
        0
    }
}

4.7 GP1 Display Mode command

The BIOS goes on by sending command 0x08000000 to GP1. Opcode 0x08
sets the display mode: video mode, screen resolution, interlacing, etc. It also
sets that weird field which we encountered as bit 14 of GPUSTAT: the one which
appears to mess up the video output. I’m going to assume this field is useless so
I’m just going to crash if it’s set, this way if a game relies on it we’re sure to
catch it:
impl Gpu {
    // ...

    /// GP1(0x08): Display Mode
    fn gp1_display_mode(&mut self, val: u32) {
        let hr1 = (val & 3) as u8;
        let hr2 = ((val >> 6) & 1) as u8;

        self.hres = HorizontalRes::from_fields(hr1, hr2);

        self.vres = match val & 0x4 != 0 {
            false => VerticalRes::Y240Lines,
            true  => VerticalRes::Y480Lines,
        };

        self.vmode = match val & 0x8 != 0 {
            false => VMode::Ntsc,
            true  => VMode::Pal,
        };

        // Bit 4 mirrors GPUSTAT bit 21: 0 is 15bit, 1 is 24bit
        self.display_depth = match val & 0x10 != 0 {
            false => DisplayDepth::D15Bits,
            true  => DisplayDepth::D24Bits,
        };

        self.interlaced = val & 0x20 != 0;

        if val & 0x80 != 0 {
            panic!("Unsupported display mode {:08x}", val);
        }
    }
}

4.8 GP1 DMA direction command
After that the BIOS issues the GP1 command 0x04000000. Opcode 0x04 simply
sets the DMA direction (to Off in this case):
impl Gpu {
    // ...

    /// GP1(0x04): DMA Direction
    fn gp1_dma_direction(&mut self, val: u32) {
        self.dma_direction =
            match val & 3 {
                0 => DmaDirection::Off,
                1 => DmaDirection::Fifo,
                2 => DmaDirection::CpuToGp0,
                3 => DmaDirection::VRamToCpu,
                _ => unreachable!(),
            };
    }
}

4.9 DMA GP0 commands

After that the CPU issues another “DMA Direction” command to set it to value
2 (CpuToGp0). The BIOS then starts sending the Linked List commands
using the DMA. Those commands are always sent to GP0 so we can update our
linked list DMA routine to call the gp0 method of our GPU:
// The linked list routine needs both the RAM and the GPU so it lives in
// the interconnect
impl Interconnect {
    // ...

    /// Emulate DMA transfer for linked list synchronization mode.
    fn do_dma_linked_list(&mut self, port: Port) {
        // ...

        loop {
            // ...

            while remsz > 0 {
                addr = (addr + 4) & 0x1ffffc;

                let command = self.ram.load32(addr);

                // Send command to the GPU
                self.gpu.gp0(command);

                remsz -= 1;
            }
            // ...
        }
        // ...
    }
}
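For reference, the elided parts of the loop above can be sketched as a standalone function that walks a list and collects the words that would reach GP0. The packet header layout (word count in the top 8 bits, address of the next packet in the low 24 bits, bit 23 set on the last packet) is assumed from the linked list DMA description, and the name collect_linked_list and the flat word array standing in for the RAM are hypothetical:

```rust
/// Walk a GPU linked list stored in `ram` (32-bit words, indexed by
/// byte address / 4) starting at byte address `start` and return the
/// command words in the order they would be sent to GP0.
pub fn collect_linked_list(ram: &[u32], start: u32) -> Vec<u32> {
    let mut commands = Vec::new();
    let mut addr = start & 0x1f_fffc;

    loop {
        // Assumed header layout: size in bits [31:24], next packet
        // address in bits [23:0]
        let header = ram[(addr >> 2) as usize];
        let mut remsz = header >> 24;

        while remsz > 0 {
            addr = (addr + 4) & 0x1f_fffc;
            commands.push(ram[(addr >> 2) as usize]);
            remsz -= 1;
        }

        // Assumed end-of-list marker: bit 23 of the header is set
        // (the usual marker value is 0xffffff)
        if header & 0x80_0000 != 0 {
            break;
        }

        addr = header & 0x1f_fffc;
    }

    commands
}
```

Returning a vector instead of calling the GPU directly is only there to make the traversal easy to check in isolation.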

4.10 GP0 Set Drawing Area commands

The first command sent through the linked list is 0xe3000400 which sets the
top-left corner of the drawing area. When the GPU renderer draws to the
framebuffer it won’t write anything outside of the drawing area even if a draw
command clips outside.
impl Gpu {
    // ...

    /// GP0(0xE3): Set Drawing Area top left
    fn gp0_drawing_area_top_left(&mut self, val: u32) {
        self.drawing_area_top = ((val >> 10) & 0x3ff) as u16;
        self.drawing_area_left = (val & 0x3ff) as u16;
    }
}

You see that the drawing_area_top value can range from 0 to 1023. It’s
strange because the GPU VRAM only has 512 lines so anything beyond that
value won’t be rendered. The horizontal coordinate, drawing_area_left, has
the same resolution but this one is normal: the VRAM has 2048 bytes per line
but since the GPU draws 16 bits per pixel (15bit RGB + mask bit) you can only
fit 1024 pixels per VRAM line.
Unsurprisingly the next command is 0xe403c27f which sets the bottom-right
corner of the drawing area, the parameter packing is the same:
impl Gpu {
    // ...

    /// GP0(0xE4): Set Drawing Area bottom right
    fn gp0_drawing_area_bottom_right(&mut self, val: u32) {
        self.drawing_area_bottom = ((val >> 10) & 0x3ff) as u16;
        self.drawing_area_right = (val & 0x3ff) as u16;
    }
}

After those two commands the top-left corner is at [0, 1] while the
bottom-right is at [639, 240]. The coordinates are inclusive so the drawing area
resolution is 640x240 which looks like a standard NTSC field resolution.
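The arithmetic above can be double-checked with a small standalone sketch that decodes the two command words sent by the BIOS. The helper names decode_drawing_area and drawing_area_size are hypothetical, they only mirror the field extraction done by the two methods:

```rust
/// Decode the packed coordinates of GP0(0xE3) or GP0(0xE4) as (x, y)
pub fn decode_drawing_area(val: u32) -> (u16, u16) {
    let x = (val & 0x3ff) as u16;
    let y = ((val >> 10) & 0x3ff) as u16;

    (x, y)
}

/// Size in pixels of the drawing area described by the two command
/// words. The corners are inclusive, hence the `+ 1`. This sketch
/// assumes the bottom-right corner is not above or left of the
/// top-left corner.
pub fn drawing_area_size(top_left: u32, bottom_right: u32) -> (u16, u16) {
    let (left, top) = decode_drawing_area(top_left);
    let (right, bottom) = decode_drawing_area(bottom_right);

    (right - left + 1, bottom - top + 1)
}
```

With the BIOS values, decode_drawing_area(0xe3000400) gives (0, 1), decode_drawing_area(0xe403c27f) gives (639, 240) and the resulting size is (640, 240).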

4.11 GP0 Set Drawing Offset command

The BIOS continues setting the drawing area with command 0xe5000800 which
sets the drawing offset. We have to be careful with that one because the x and y
parameters are 11bit signed two’s complement values. It means that the GPU
can handle negative offsets. We need to mess with a few bit shifts to get the
correct sign extension for negative values (see footnote 32):
impl Gpu {
    // ...

    /// GP0(0xE5): Set Drawing Offset
    fn gp0_drawing_offset(&mut self, val: u32) {
        let x = (val & 0x7ff) as u16;
        let y = ((val >> 11) & 0x7ff) as u16;

        // Values are 11bit two's complement signed values, we need to
        // shift the value to 16bits to force sign extension
        self.drawing_x_offset = ((x << 5) as i16) >> 5;
        self.drawing_y_offset = ((y << 5) as i16) >> 5;
    }
}

32 The reason is that Rust obviously doesn’t have an 11bit signed integer type so we have to
shift to 16bits in order to get the correct sign in an i16, then we can shift back to 11bits.

This particular command sets the offset to [0, 1] which matches the drawing
area top-left corner so everything is coherent so far. I’m not sure why the BIOS
doesn’t start at [0, 0] but I guess wasting one line doesn’t matter much for
displaying the boot logo.
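To see the sign extension trick at work outside of the GPU, here is a minimal standalone sketch. The helper names sign_extend_11 and decode_drawing_offset are hypothetical, the shifts are the same as in the method above:

```rust
/// Sign-extend an 11bit two's complement value stored in the low
/// bits of `v` into an i16
pub fn sign_extend_11(v: u16) -> i16 {
    // Move bit 10 (the 11bit sign) up to bit 15, reinterpret as
    // signed, then let the arithmetic right shift replicate the sign
    ((v << 5) as i16) >> 5
}

/// Decode the (x, y) offsets of a GP0(0xE5) command word
pub fn decode_drawing_offset(val: u32) -> (i16, i16) {
    let x = (val & 0x7ff) as u16;
    let y = ((val >> 11) & 0x7ff) as u16;

    (sign_extend_11(x), sign_extend_11(y))
}
```

For instance 0x7ff decodes to -1, 0x400 to -1024 (the most negative offset) and 0x3ff to 1023 (the most positive one).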

4.12 GP0 Texture Window command

After that we have yet another GPU config command: 0xe2000000 which
configures the texture window parameters:
impl Gpu {
    // ...

    /// GP0(0xE2): Set Texture Window
    fn gp0_texture_window(&mut self, val: u32) {
        self.texture_window_x_mask = (val & 0x1f) as u8;
        self.texture_window_y_mask = ((val >> 5) & 0x1f) as u8;
        self.texture_window_x_offset = ((val >> 10) & 0x1f) as u8;
        self.texture_window_y_offset = ((val >> 15) & 0x1f) as u8;
    }
}

4.13 GP0 Mask Bit Setting command

The BIOS continues with the last GP0 rendering attribute command: 0xe6000000
which sets the mask bit-related parameters:
impl Gpu {
    // ...

    /// GP0(0xE6): Set Mask Bit Setting
    fn gp0_mask_bit_setting(&mut self, val: u32) {
        self.force_set_mask_bit = (val & 1) != 0;
        self.preserve_masked_pixels = (val & 2) != 0;
    }
}

The mask bit behaves a bit like OpenGL’s stencil masks, it prevents the
GPU from overwriting a pixel if its mask bit is set and masking is enabled.
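As a rough illustration of how these two settings could be used once we have a rasterizer, here is a sketch of a single pixel write. It assumes the mask bit is bit 15 of the 16bit VRAM pixel (the text only says “15bit RGB + mask bit”), and the MaskSettings struct and write_pixel function are hypothetical names:

```rust
/// Hypothetical copy of the two GP0(0xE6) settings
pub struct MaskSettings {
    pub force_set_mask_bit: bool,
    pub preserve_masked_pixels: bool,
}

/// Return the value that should end up in VRAM when drawing `src`
/// over the existing pixel `dst`
pub fn write_pixel(settings: &MaskSettings, dst: u16, src: u16) -> u16 {
    // Assumption: the mask bit is the MSB of the 16bit pixel
    const MASK_BIT: u16 = 0x8000;

    if settings.preserve_masked_pixels && dst & MASK_BIT != 0 {
        // The existing pixel is masked: leave it untouched
        return dst;
    }

    if settings.force_set_mask_bit {
        src | MASK_BIT
    } else {
        src
    }
}
```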

4.14 GP1 Display VRAM Start command

The BIOS then configures the video output through the GP1 register. It starts
with 0x0503c400 which sets the display start address in VRAM.
Note that the LSB of the horizontal coordinate is ignored. It means that
we’re always aligned to a 16bit pixel.
impl Gpu {
    // ...

    /// GP1(0x05): Display VRAM Start
    fn gp1_display_vram_start(&mut self, val: u32) {
        self.display_vram_x_start = (val & 0x3fe) as u16;
        self.display_vram_y_start = ((val >> 10) & 0x1ff) as u16;
    }
}

The current command sets the start coordinates to [0, 241] which is immedi-
ately below the drawing area we configured before. I assume it’s because the
BIOS will use a form of double buffering and won’t draw directly to the displayed
area.
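The decoding of this command can be checked in isolation with the following sketch. The function name decode_display_vram_start is hypothetical, the masks are the ones used in the method above:

```rust
/// Decode the (x, y) display start coordinates of a GP1(0x05)
/// command word
pub fn decode_display_vram_start(val: u32) -> (u16, u16) {
    // The mask 0x3fe drops the LSB of the horizontal coordinate
    let x = (val & 0x3fe) as u16;
    let y = ((val >> 10) & 0x1ff) as u16;

    (x, y)
}
```

The BIOS command 0x0503c400 decodes to (0, 241). A made-up value such as 0x05000003 decodes to (2, 0), which shows the LSB of the horizontal coordinate being ignored.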

4.15 GP1 Display Range commands

After the display VRAM start address the BIOS configures the video output
timings with commands 0x06c60260 and 0x0703fc10 which respectively set the
display’s horizontal and vertical range (see footnote 33):
impl Gpu {
    // ...

    /// GP1(0x06): Display Horizontal Range
    fn gp1_display_horizontal_range(&mut self, val: u32) {
        self.display_horiz_start = (val & 0xfff) as u16;
        self.display_horiz_end = ((val >> 12) & 0xfff) as u16;
    }

    /// GP1(0x07): Display Vertical Range
    fn gp1_display_vertical_range(&mut self, val: u32) {
        self.display_line_start = (val & 0x3ff) as u16;
        self.display_line_end = ((val >> 10) & 0x3ff) as u16;
    }
}

Note that those commands use a different packing format for their parameters.
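To make the two packings explicit, here is a small sketch decoding the values sent by the BIOS. The function names are hypothetical and the results are in the video signal’s own units, not VRAM coordinates:

```rust
/// Decode GP1(0x06): two 12bit fields, returned as (start, end)
pub fn decode_horizontal_range(val: u32) -> (u16, u16) {
    ((val & 0xfff) as u16, ((val >> 12) & 0xfff) as u16)
}

/// Decode GP1(0x07): two 10bit fields, returned as (start, end)
pub fn decode_vertical_range(val: u32) -> (u16, u16) {
    ((val & 0x3ff) as u16, ((val >> 10) & 0x3ff) as u16)
}
```

Command 0x06c60260 gives a horizontal range of (608, 3168) and 0x0703fc10 gives a vertical range of (16, 255).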

4.16 GP0 Monochrome Quadrilateral command

We’re finally getting to the interesting part: the first draw command. The
BIOS sends a linked list to the GPU containing command 0x28000000. GP0
opcode 0x28 draws a monochrome quadrilateral. The low 3 bytes of the command
contain the 24bit BGR color of the polygon (black in this case).
There’s a problem however: this command takes 4 additional words as arguments
containing the coordinates of the 4 vertices needed to draw a quadrilateral.
So far we’ve only implemented single-word GP0 commands so we’ll have to
improve our code a little.
To simplify our task I’m going to start with a simple container that will
accumulate the words for the current command:
/// Buffer holding multi-word fixed-length GP0 command parameters
struct CommandBuffer {
    /// Command buffer: the longest possible command is GP0(0x3E)
    /// which takes 12 parameters
    buffer: [u32; 12],
    /// Number of words queued in buffer
    len: u8,
}

impl CommandBuffer {
    fn new() -> CommandBuffer {
        CommandBuffer {
            buffer: [0; 12],
            len: 0,
        }
    }

    /// Clear the command buffer
    fn clear(&mut self) {
        self.len = 0;
    }

    fn push_word(&mut self, word: u32) {
        self.buffer[self.len as usize] = word;

        self.len += 1;
    }
}

impl ::std::ops::Index<usize> for CommandBuffer {
    type Output = u32;

    fn index<'a>(&'a self, index: usize) -> &'a u32 {
        if index >= self.len as usize {
            panic!("Command buffer index out of range: {} ({})",
                   index, self.len);
        }

        &self.buffer[index]
    }
}

33 Those coordinates are not in VRAM but rather in the output’s video signal system of
coordinates.

It’s just a glorified array which can contain up to 12 words and keeps the
count of how many words have been pushed into it. The std::ops::Index
mumbo jumbo just overloads the [] operator to let us access CommandBuffer
elements like a regular array.
We can add an instance of this CommandBuffer to our GPU state and we’ll
also add a counter of the number of remaining parameters and a function pointer
to the method which implements the command (it will save us having to match
the opcode twice):
pub struct Gpu {
    // ...

    /// Buffer containing the current GP0 command
    gp0_command: CommandBuffer,
    /// Remaining words for the current GP0 command
    gp0_command_remaining: u32,
    /// Pointer to the method implementing the current GP0 command
    gp0_command_method: fn(&mut Gpu),
}

We can now modify our GP0 register handler to use this new infrastructure:
impl Gpu {
    // ...

    /// Handle writes to the GP0 command register
    pub fn gp0(&mut self, val: u32) {
        if self.gp0_command_remaining == 0 {
            // We start a new GP0 command
            let opcode = (val >> 24) & 0xff;

            let (len, method) =
                match opcode {
                    0x00 =>
                        (1, Gpu::gp0_nop
                         as fn(&mut Gpu)),
                    0x28 =>
                        (5, Gpu::gp0_quad_mono_opaque
                         as fn(&mut Gpu)),
                    0xe1 =>
                        (1, Gpu::gp0_draw_mode
                         as fn(&mut Gpu)),
                    0xe2 =>
                        (1, Gpu::gp0_texture_window
                         as fn(&mut Gpu)),
                    0xe3 =>
                        (1, Gpu::gp0_drawing_area_top_left
                         as fn(&mut Gpu)),
                    0xe4 =>
                        (1, Gpu::gp0_drawing_area_bottom_right
                         as fn(&mut Gpu)),
                    0xe5 =>
                        (1, Gpu::gp0_drawing_offset
                         as fn(&mut Gpu)),
                    0xe6 =>
                        (1, Gpu::gp0_mask_bit_setting
                         as fn(&mut Gpu)),
                    _ => panic!("Unhandled GP0 command {:08x}",
                                val),
                };

            self.gp0_command_remaining = len;
            self.gp0_command_method = method;

            self.gp0_command.clear();
        }

        self.gp0_command.push_word(val);
        self.gp0_command_remaining -= 1;

        if self.gp0_command_remaining == 0 {
            // We have all the parameters, we can run the command
            (self.gp0_command_method)(self);
        }
    }

    /// GP0(0x00): No Operation
    fn gp0_nop(&mut self) {
        // NOP
    }
}
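The accumulation logic above is easier to experiment with when it’s pulled out of the GPU. Here is a stripped-down, standalone sketch of the same state machine. The Gp0Decoder type is hypothetical: it returns the completed command instead of dispatching through a function pointer and it only knows the lengths of the opcodes we’ve seen so far:

```rust
/// Minimal stand-in for the GP0 decoder: accumulates the words of
/// one command and hands the complete command back to the caller
pub struct Gp0Decoder {
    words: Vec<u32>,
    remaining: u32,
}

impl Gp0Decoder {
    pub fn new() -> Gp0Decoder {
        Gp0Decoder { words: Vec::new(), remaining: 0 }
    }

    /// Total length in words (command word included) of a command
    fn command_len(opcode: u32) -> Option<u32> {
        match opcode {
            0x00 => Some(1),
            0x28 => Some(5),
            0xe1..=0xe6 => Some(1),
            _ => None,
        }
    }

    /// Feed one GP0 word. Returns the full command once its last
    /// word has been received, `None` otherwise.
    pub fn push(&mut self, val: u32) -> Option<Vec<u32>> {
        if self.remaining == 0 {
            // We start a new GP0 command
            let opcode = val >> 24;

            self.remaining = Gp0Decoder::command_len(opcode)
                .unwrap_or_else(|| {
                    panic!("Unhandled GP0 command {:08x}", val)
                });
            self.words.clear();
        }

        self.words.push(val);
        self.remaining -= 1;

        if self.remaining == 0 {
            Some(self.words.clone())
        } else {
            None
        }
    }
}
```

Feeding it 0x28000000 followed by four vertex words returns `None` four times and then the complete five-word command, which is the moment the real implementation calls the stored method.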

We’re still missing the implementation of the gp0_quad_mono_opaque function
that’s supposed to render the primitive in the framebuffer. We could start
drawing to the screen right away but since we only have a black rectangle so far
it wouldn’t be very interesting. Let’s put a placeholder for now and continue a
little further before we fire up OpenGL:
impl Gpu {
    // ...

    /// GP0(0x28): Monochrome Opaque Quadrilateral
    fn gp0_quad_mono_opaque(&mut self) {
        println!("Draw quad");
    }
}

4.17 Interleaved video deadlock workaround

Unfortunately when we attempt to continue the execution to get to the next
GPU commands we enter an infinite loop in the BIOS. That’s weird, especially
since we got way past that point before we started implementing the GPU.
If we disassemble the code at the deadlock location we discover that the
BIOS is apparently waiting for bit 31 of the GPUSTAT register to change. This
bit is supposed to alternate between odd and even lines (when the output is
interlaced). But we haven’t implemented this bit yet.
So why did it work before? After some testing it turns out it’s because this
particular piece of code is only entered when GPUSTAT returns bit 19 set, i.e.
when the vertical resolution is set to 480 lines. Paradoxically by implementing
the GPUSTAT register and improving the accuracy of our emulator we caused a
regression.
I’m not entirely sure what this particular piece of BIOS code does to be
honest. To figure it out I’d have to disassemble a bigger chunk of surrounding
code and I don’t want to go down this rabbit hole at this point. I’d rather
implement bit 31 of GPUSTAT correctly but in
order to do that we need accurate GPU timings and we don’t have that yet.
In that light and in order to keep us moving we’re going to use a temporary
hack: in the GPUSTAT register we’re going to return 0 in bit 19 no matter what.
It’s not accurate but it will sidestep that problematic piece of code. When we
implement GPU timings and emulate bit 31 correctly we’ll revert that change:
impl Gpu {
    // ...

    /// Retrieve value of the status register
    pub fn status(&self) -> u32 {
        let mut r = 0u32;

        // ...
        r |= self.hres.into_status();
        // XXX Temporary hack: if we don't emulate bit 31 correctly
        // setting `vres` to 1 locks the BIOS:
        // r |= (self.vres as u32) << 19;
        r |= (self.vmode as u32) << 20;
        // ...
    }
}

This is not very satisfactory of course but that should allow us to keep going
with our first GPU implementation. Soon after that we’ll start working on
accurate timings and we’ll be able to emulate bit 31 properly.

4.18 GP0 Clear Cache command

We can now resume the BIOS execution and we reach a new GP0 command:
0x01000000. This command is used to clear the internal texture cache. Since we
don’t implement a texture cache yet we can just ignore this command for now:
impl Gpu {
    // ...

    /// GP0(0x01): Clear Cache
    fn gp0_clear_cache(&mut self) {
        // Not implemented
    }
}

4.19 GP0 Load Image command

Next we have GP0 command 0xa0000000 which is used to load an image into
the GPU’s VRAM using the CPU or the DMA. This is how a program can load
a texture or a palette into the GPU’s dedicated memory.
The command takes two additional word parameters: the first one contains
the coordinates of the top-left corner of the target rectangle in the VRAM. The 2nd
one contains the resolution of the image (width/height) in pixels. The GPU then
expects the pixel data on the same GP0 port.
Since the GPU uses 16bit pixels and the CPU/DMA send 32bits at a time
to the GP0 port an additional 16bits of padding must be added if the total
number of pixels is odd.
Since this command is immediately followed by the image data the total
amount of data transferred can be quite big. Storing all of it in our gp0_command
buffer to copy it all in the VRAM afterwards would be wasteful. Instead we are
going to special-case image transfers to store the data directly inside the VRAM.
I add a new variable in the GPU state holding the current mode of the GP0
port:
pub struct Gpu {
    // ...

    /// Current mode of the GP0 register
    gp0_mode: Gp0Mode,
}

impl Gpu {
    pub fn new() -> Gpu {
        Gpu {
            // ...

            gp0_mode: Gp0Mode::Command,
        }
    }

    // ...
}

/// Possible states for the GP0 command register
enum Gp0Mode {
    /// Default mode: handling commands
    Command,
    /// Loading an image into VRAM
    ImageLoad,
}

I also renamed gp0_command_remaining into gp0_words_remaining since it
will also count the remaining number of image words to load.
We can then tweak the gp0 method to handle this new mode:

impl Gpu {
    // ...

    /// Handle writes to the GP0 command register
    pub fn gp0(&mut self, val: u32) {
        if self.gp0_words_remaining == 0 {
            // We start a new GP0 command
            let opcode = (val >> 24) & 0xff;

            let (len, method) =
                match opcode {
                    // ...
                    0xa0 =>
                        (3, Gpu::gp0_image_load as fn(&mut Gpu)),
                    // ...
                };

            self.gp0_words_remaining = len;
            self.gp0_command_method = method;

            self.gp0_command.clear();
        }

        self.gp0_words_remaining -= 1;

        match self.gp0_mode {
            Gp0Mode::Command => {
                self.gp0_command.push_word(val);

                if self.gp0_words_remaining == 0 {
                    // We have all the parameters, we can run
                    // the command
                    (self.gp0_command_method)(self);
                }
            }
            Gp0Mode::ImageLoad => {
                // XXX Should copy pixel data to VRAM

                if self.gp0_words_remaining == 0 {
                    // Load done, switch back to command mode
                    self.gp0_mode = Gp0Mode::Command;
                }
            }
        }
    }
}
I added the gp0_image_load command which I consider to be 3 words long.
The method uses those parameters to compute the number of words we must
expect as part of the image data and puts it back in gp0_words_remaining while
switching gp0_mode to ImageLoad:
impl Gpu {
    // ...

    /// GP0(0xA0): Image Load
    fn gp0_image_load(&mut self) {
        // Parameter 2 contains the image resolution
        let res = self.gp0_command[2];

        let width = res & 0xffff;
        let height = res >> 16;

        // Size of the image in 16bit pixels
        let imgsize = width * height;

        // If we have an odd number of pixels we must round up
        // since we transfer 32bits at a time. There'll be 16bits
        // of padding in the last word.
        let imgsize = (imgsize + 1) & !1;

        // Store number of words expected for this image
        self.gp0_words_remaining = imgsize / 2;

        // Put the GP0 state machine in ImageLoad mode
        self.gp0_mode = Gp0Mode::ImageLoad;
    }
}

Of course in my gp0 implementation above I don’t actually do anything with
the image data. When we add support for the VRAM emulation we should copy
the image data at the right location (given by the first command parameter) but
there’s no reason to bother with that at this point.
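For the curious, here is a rough sketch of what that copy could look like later on. Several things are assumed here and not confirmed by the text above: the position word is packed like the resolution word (x in the low 16 bits, y in the high 16 bits), the first pixel of each word is the low half, and coordinates wrap around the 1024x512 VRAM. The ImageLoad type is a hypothetical helper, not part of the emulator:

```rust
/// Hypothetical helper tracking the destination of a GP0(0xA0)
/// image transfer
pub struct ImageLoad {
    x: u16,
    y: u16,
    width: u16,
    height: u16,
    /// Index of the next pixel to write within the rectangle
    next: u32,
}

impl ImageLoad {
    /// `pos` and `res` are the two parameter words of the command
    pub fn new(pos: u32, res: u32) -> ImageLoad {
        ImageLoad {
            // Assumed packing: x in the low half, y in the high half
            x: (pos & 0xffff) as u16,
            y: (pos >> 16) as u16,
            width: (res & 0xffff) as u16,
            height: (res >> 16) as u16,
            next: 0,
        }
    }

    /// Number of 32bit words expected for this image, padding
    /// included
    pub fn words(&self) -> u32 {
        let pixels = self.width as u32 * self.height as u32;

        (pixels + 1) / 2
    }

    /// Store the two pixels of `word` in `vram`, a slice of
    /// 1024x512 16bit pixels
    pub fn push_word(&mut self, vram: &mut [u16], word: u32) {
        let total = self.width as u32 * self.height as u32;

        // Assumed order: low half first, then high half
        for &pixel in &[word as u16, (word >> 16) as u16] {
            if self.next >= total {
                // Padding at the end of an odd-sized image
                break;
            }

            let w = self.width as u32;
            let px = (self.x as u32 + self.next % w) % 1024;
            let py = (self.y as u32 + self.next / w) % 512;

            vram[(py * 1024 + px) as usize] = pixel;
            self.next += 1;
        }
    }
}
```

The ImageLoad branch of gp0 would then call push_word for each incoming word instead of discarding it.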

4.20 DMA image transfer

The BIOS doesn’t send the image data in a linked list like other GP0 commands,
instead it uses a regular “block” DMA transfer so we need to plug our gp0
method in it:
impl Interconnect {
    // ...

    /// Emulate DMA transfer for Manual and Request synchronization
    /// modes.
    fn do_dma_block(&mut self, port: Port) {
        // ...

        while remsz > 0 {
            // ...

            match channel.direction() {
                Direction::FromRam => {
                    let src_word = self.ram.load32(cur_addr);

                    match port {
                        Port::Gpu => self.gpu.gp0(src_word),
                        _ =>
                            panic!("Unhandled DMA destination port {}",
                                   port as u8),
                    }
                }
                // ...
            }

            // ...
        }

        // ...
    }
}

Our emulator now loads textures to the GPU and then discards them imme-
diately. Beautiful.

4.21 GP1 Display Enable command
After that the BIOS issues GP1 command 0x03000000 which is used to set the
value of our display_disabled field:
impl Gpu {
    // ...

    /// GP1(0x03): Display Enable
    fn gp1_display_enable(&mut self, val: u32) {
        self.display_disabled = val & 1 != 0;
    }
}

In this case it sets it to 0, effectively enabling the display.

4.22 GP0 Image Store command

Then the BIOS does something quite perplexing: it issues a 0xc0000000 com-
mand on GP0 which is used to copy data from the VRAM to the CPU/DMA.
The parameters are the same as the “Load Image” command but this time it’s
the GPU providing the pixel data through the GPUREAD register. The CPU
(or DMA) can then read the data 32bits at a time by reading this register until
the whole image has been transferred.
I find it perplexing because I can’t imagine what the BIOS is trying to do
here. Since it hasn’t rendered anything worthwhile yet it doesn’t have anything
interesting to recover. Maybe it’s part of a self-test of some sort.
We could try to figure it out by disassembling the code that calls this
command but I don’t want to bother with that. I’m just going to foolishly ignore
it for now. Whatever this code is doing doesn’t seem to prevent the BIOS from
continuing its execution normally (it gets through the boot animation and starts
probing the CDROM). So let’s use a placeholder implementation for now:
impl Gpu {
    // ...

    /// GP0(0xC0): Image Store
    fn gp0_image_store(&mut self) {
        // Parameter 2 contains the image resolution
        let res = self.gp0_command[2];

        let width = res & 0xffff;
        let height = res >> 16;

        println!("Unhandled image store: {}x{}", width, height);
    }
}

We don’t have to do anything more: after this command the BIOS will expect
the image data to be available through the GPUREAD register. Right now our
implementation of this register always returns 0 so it will read that as many
times as it wants.

4.23 GP0 Shaded Quadrilateral command

At long last we’re reaching the interesting part! The BIOS is starting to draw
the boot animation with the “Sony Computer Entertainment” logo. We’re only
missing a few commands before we can proceed to implement the OpenGL
renderer itself.
The first one is 0x380000b2 which draws a shaded quadrilateral. It means
that unlike the previous quad command this one takes one color per vertex and
fills the shape with a Gouraud shading which creates a gradient between those
values. We’ll see that this type of shading is trivial to implement in OpenGL.
This command takes 8 parameters: 4 vertex positions and their associated colors.
As for the other drawing commands let’s put a placeholder for the moment:
impl Gpu {
    // ...

    /// Handle writes to the GP0 command register
    pub fn gp0(&mut self, val: u32) {
        if self.gp0_words_remaining == 0 {
            // ...

            let (len, method) =
                match opcode {
                    // ...
                    0x38 =>
                        (8, Gpu::gp0_quad_shaded_opaque
                         as fn(&mut Gpu)),
                    // ...
                };
            // ...
        }
        // ...
    }

    /// GP0(0x38): Shaded Opaque Quadrilateral
    fn gp0_quad_shaded_opaque(&mut self) {
        println!("Draw quad shaded");
    }
}
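Before moving on, here is a small sketch of how the words of such a command could be decoded once we start rendering. The color packing (24bit BGR in the low 3 bytes) comes from the monochrome quad description. The position packing (signed x in the low 16 bits, signed y in the high 16 bits) is an assumption of this sketch, and lerp_color only blends along a single edge whereas real Gouraud shading interpolates across the whole primitive. All three function names are hypothetical:

```rust
/// Decode the low 24 bits of a word as a BGR color, returned as
/// (r, g, b)
pub fn decode_color(val: u32) -> (u8, u8, u8) {
    (val as u8, (val >> 8) as u8, (val >> 16) as u8)
}

/// Decode a vertex position word as signed (x, y). Assumed packing:
/// x in the low half, y in the high half.
pub fn decode_position(val: u32) -> (i16, i16) {
    (val as i16, (val >> 16) as i16)
}

/// Blend between colors `a` and `b`: `t` is 0.0 at `a` and 1.0 at
/// `b`
pub fn lerp_color(a: (u8, u8, u8), b: (u8, u8, u8), t: f32) -> (u8, u8, u8) {
    let mix = |x: u8, y: u8| {
        (x as f32 + (y as f32 - x as f32) * t).round() as u8
    };

    (mix(a.0, b.0), mix(a.1, b.1), mix(a.2, b.2))
}
```

For the command word 0x380000b2 the first vertex color decodes to (0xb2, 0, 0).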

4.24 GP0 Shaded Triangle command

After that we get to the GP0 command 0x300000b2. This command is almost
identical to the one before except that it draws a triangle instead of a quad. As
such it only takes 6 parameters (3 position/color couples):
impl Gpu {
    // ...

    /// Handle writes to the GP0 command register
    pub fn gp0(&mut self, val: u32) {
        if self.gp0_words_remaining == 0 {
            // ...

            let (len, method) =
                match opcode {
                    // ...
                    0x30 =>
                        (6, Gpu::gp0_triangle_shaded_opaque
                         as fn(&mut Gpu)),
                    // ...
                };
            // ...
        }
        // ...
    }

    /// GP0(0x30): Shaded Opaque Triangle
    fn gp0_triangle_shaded_opaque(&mut self) {
        println!("Draw triangle shaded");
    }
}

4.25 GP0 Textured Quadrilateral With Color Blending command

Our last drawing command for the moment will be 0x2c808080 which is the
fanciest yet: it draws a quadrilateral, maps a texture on it while blending it with
a solid color. It takes 9 parameters:
impl Gpu {
    // ...

    /// Handle writes to the GP0 command register
    pub fn gp0(&mut self, val: u32) {
        if self.gp0_words_remaining == 0 {
            // ...

            let (len, method) =
                match opcode {
                    // ...
                    0x2c =>
                        (9, Gpu::gp0_quad_texture_blend_opaque
                         as fn(&mut Gpu)),
                    // ...
                };
            // ...
        }
        // ...
    }

    /// GP0(0x2C): Textured Opaque Quadrilateral
    fn gp0_quad_texture_blend_opaque(&mut self) {
        println!("Draw quad texture blending");
    }
}

4.26 GP1 Acknowledge Interrupt command

Once the BIOS has finished displaying the boot logo it attempts to acknowledge
the GPU interrupt by issuing the GP1 command 0x02000000. Of course in our
current implementation we never trigger the interrupt in the first place but we
might as well add a simple implementation anyway:
impl Gpu {
    // ...

    /// GP1(0x02): Acknowledge Interrupt
    fn gp1_acknowledge_irq(&mut self) {
        self.interrupt = false;
    }
}

4.27 GP1 Reset Command Buffer command
And we finish this sequence with GP1 command 0x01000000 which clears the
command FIFO. We don’t implement the FIFO itself yet but we can at least
reset the GP0 state machine to a default state:
impl Gpu {
    // ...

    /// GP1(0x01): Reset Command Buffer
    fn gp1_reset_command_buffer(&mut self) {
        self.gp0_command.clear();
        self.gp0_words_remaining = 0;
        self.gp0_mode = Gp0Mode::Command;
        // XXX should also clear the command FIFO when we
        // implement it
    }
}

I take the opportunity to add a call to this function in gp1_reset since it should
also clear the command buffer.
And that’s it! We have our entire GPU command sequence to display the
boot logo. Now we can implement a basic OpenGL renderer to visualize it all.

5 The GPU: Basic OpenGL renderer for the boot logo

For our first renderer we’re not going to bother with the Video Display: since
the GPU’s internal video memory has a total resolution of 1024x512 we’ll just
display all of the framebuffer at once and draw the primitives directly on the
screen. We just need to take the draw commands, render them in our framebuffer
in the native internal resolution and display it all. Easy.
The first step is to create a window and retrieve an OpenGL context to draw
in it. OpenGL itself doesn’t handle things like window creation since that’s
system-specific. There are many libraries out there to take care of that (GLFW,
freeglut, etc.). For my emulator I opted for the SDL2 library, mainly because
I’m familiar with it. I’ll also use this library to handle controller input and, later
on, sound support.
If for some reason you prefer to use another library (or libraries) to handle
these system-specific interfaces rest assured that it won’t change anything in the
OpenGL code itself, just the window setup code.

5.1 Window and OpenGL context creation

Here’s the code for creating a window and recovering its OpenGL context with
the SDL2:
use sdl2;
use sdl2::video::{OPENGL, WindowPos};
use sdl2::video::GLAttr::GLContextMajorVersion;
use sdl2::video::GLAttr::GLContextMinorVersion;
use gl;
use libc::c_void;

pub struct Renderer {
    sdl_context: sdl2::sdl::Sdl,
    window: sdl2::video::Window,
    gl_context: sdl2::video::GLContext,
}

impl Renderer {

    pub fn new() -> Renderer {
        let sdl_context = sdl2::init(::sdl2::INIT_VIDEO).unwrap();

        sdl2::video::gl_set_attribute(GLContextMajorVersion, 3);
        sdl2::video::gl_set_attribute(GLContextMinorVersion, 3);

        let window = sdl2::video::Window::new(
            &sdl_context,
            "PSX",
            WindowPos::PosCentered,
            WindowPos::PosCentered,
            1024, 512,
            OPENGL).unwrap();

        let gl_context = window.gl_create_context().unwrap();

        gl::load_with(|s|
                      sdl2::video::gl_get_proc_address(s).unwrap()
                      as *const c_void);

        Renderer {
            sdl_context: sdl_context,
            window: window,
            gl_context: gl_context,
        }
    }
}

The function sdl2::init calls the global SDL2 initialization routine. For
now we’re only using the VIDEO subsystem. In the SDL2 C API this function
only returns an error code but the Rust bindings return an object that’s used
to call SDL_Quit automatically when it’s destroyed. In C you have to call
SDL_Quit explicitly when your program exits (or whenever you don’t need the
SDL anymore).
After that the two gl_set_attribute calls say that we’re going to use
OpenGL 3.3 (see footnote 34).
Then Window::new creates the window itself with a resolution of 1024x512
(the resolution of the VRAM) and OpenGL support. I named the window “PSX”
because I don’t have any imagination.
We can retrieve the window’s OpenGL context with the gl_create_context
method and then we must load the OpenGL function pointers. You don’t really
need to understand that part in detail: it’s some glue between the OpenGL
and SDL libraries, you just need to make sure it’s done before we start calling
OpenGL commands.
Finally we store the SDL context, window and OpenGL context in the newly
created Renderer object. We need to put an instance of this struct in our GPU:
pub struct Gpu {
    // ...

    /// OpenGL renderer
    renderer: opengl::Renderer,
}

impl Gpu {
    pub fn new() -> Gpu {
        Gpu {
            // ...
            renderer: opengl::Renderer::new(),
        }
    }
}

34 At the time of writing OpenGL 4.5 is the latest version but 3.3 is more widely supported
and should suffice for what we’re doing although we may end up using a couple of extensions.

If everything works well our emulator should now create a window when
starting up. The window’s contents are garbage however (on my system it
contains a chunk of the screen). We can clear it by issuing the following calls:
impl Renderer {

    pub fn new() -> Renderer {

        // ...

        // Clear the window
        unsafe {
            gl::ClearColor(0., 0., 0., 1.0);
            gl::Clear(gl::COLOR_BUFFER_BIT);
        }

        window.gl_swap_window();

        Renderer {
            // ...
        }
    }
}

The unsafe keyword is there because as far as Rust is concerned all OpenGL
calls go through a C foreign function interface and are therefore potentially
memory unsafe. The ClearColor [35] function sets the clear color (duh): the first
three parameters are the red, green and blue components and the fourth is the
alpha parameter. They are all floating point values in the range [0.0, 1.0]. In this
case all the color components are 0.0 so the color is black and alpha is set to 1.0
which means it’s fully opaque.
The Clear function then applies this color to the entire color buffer. You’ll
notice that we just give the type of buffer we want to clear as parameter, not a
handle to a specific buffer. That’s the way most of the OpenGL API works: you
“bind” various types of object to an implicit global context and the subsequent
function calls act on the currently bound object of a given type. In this case
we haven’t bound anything ourselves so by default the color buffer is the
window’s framebuffer.
The gl_swap_window call forces a window update and displays the result of
the previous commands. With this addition the window should now appear
completely black. Progress!
[35] The OpenGL C API prepends the “gl” prefix to symbols (“GL_” for macros) so in C ClearColor would be glClearColor and COLOR_BUFFER_BIT would be GL_COLOR_BUFFER_BIT. When searching for an OpenGL symbol online it’s sometimes better to use the C form.

5.2 Drawing the primitives
Now let’s do something more interesting: drawing the primitives. This is the
part where we’ll have to write a whole lot of OpenGL glue so take a deep breath
and dive in.
Let’s choose a primitive to start with. I’ve decided to use GP0(0x30), the
Gouraud-shaded triangle. It’s a simple shape with some basic shading. It has
three vertices, each having a position in VRAM and a color. Let’s create structs
to hold those attributes in a shader-friendly fashion:
/// Position in VRAM.
#[derive(Copy, Clone, Default, Debug)]
pub struct Position(pub GLshort, pub GLshort);

impl Position {
    /// Parse position from a GP0 parameter
    pub fn from_gp0(val: u32) -> Position {
        let x = val as i16;
        let y = (val >> 16) as i16;

        Position(x as GLshort, y as GLshort)
    }
}

/// RGB color
#[derive(Copy, Clone, Default, Debug)]
pub struct Color(pub GLubyte, pub GLubyte, pub GLubyte);

impl Color {
    /// Parse color from a GP0 parameter
    pub fn from_gp0(val: u32) -> Color {
        let r = val as u8;
        let g = (val >> 8) as u8;
        let b = (val >> 16) as u8;

        Color(r as GLubyte, g as GLubyte, b as GLubyte)
    }
}

I store the Position as a pair of GLshorts, OpenGL’s signed 16-bit integer
type. The color is stored as a triplet of unsigned bytes, GLubyte. Internally
OpenGL uses floats for screen coordinates and colors but we’ll be able to make
the conversion in the shaders.
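To make the bit layout concrete, here is a minimal standalone sketch of the same unpacking, using plain Rust integers in place of the GL type aliases. The names unpack_position and unpack_color are made up for this illustration and are not part of the emulator:

```rust
/// Split a GP0 position word into (x, y): x sits in bits [15:0] and
/// y in bits [31:16]. Both are reinterpreted as signed 16-bit values.
fn unpack_position(val: u32) -> (i16, i16) {
    let x = val as i16;
    let y = (val >> 16) as i16;

    (x, y)
}

/// Split a GP0 color word into (r, g, b): red sits in bits [7:0],
/// green in bits [15:8] and blue in bits [23:16]. The top byte is
/// simply discarded by the truncating casts.
fn unpack_color(val: u32) -> (u8, u8, u8) {
    (val as u8, (val >> 8) as u8, (val >> 16) as u8)
}
```

For instance the word 0x00100020 unpacks to the position (0x20, 0x10), and a word whose two halves are 0xffff and 0xfffe gives negative coordinates since the halves are sign-extended.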
We can use these new types to create two arrays: one will contain the three
vertex positions of the triangle in VRAM, the other the associated colors:
impl Gpu {
    // ...

    /// GP0(0x30): Shaded Opaque Triangle
    fn gp0_triangle_shaded_opaque(&mut self) {
        let positions = [
            Position::from_gp0(self.gp0_command[1]),
            Position::from_gp0(self.gp0_command[3]),
            Position::from_gp0(self.gp0_command[5]),
        ];

        let colors = [
            Color::from_gp0(self.gp0_command[0]),
            Color::from_gp0(self.gp0_command[2]),
            Color::from_gp0(self.gp0_command[4]),
        ];

        self.renderer.push_triangle(positions, colors);
    }
}

Now we need to implement this push_triangle method that will put the
attributes in a list of vertices to render. That’s where the fun begins.
First we need to set up some buffers to hold the data. There are several
ways to send data to the GPU, I’ve decided to go with persistently mapped
buffers. The idea is that we’re going to ask OpenGL to allocate some memory
that will be shared between the GPU and us. We’ll fill it with our data and
when we’re ready we’ll tell the GPU to use it to draw the scene. Easy.
To avoid duplicating a bunch of code let’s make a generic Buffer struct
holding an attribute buffer and its mapping:
// Write only buffer with enough size for VERTEX_BUFFER_LEN
// elements
pub struct Buffer<T> {
    /// OpenGL buffer object
    object: GLuint,
    /// Mapped buffer memory
    map: *mut T,
}

impl<T: Copy + Default> Buffer<T> {

    /// Create a new buffer bound to the current vertex array
    /// object.
    pub fn new() -> Buffer<T> {
        let mut object = 0;
        let mut memory;

        unsafe {
            // Generate the buffer object
            gl::GenBuffers(1, &mut object);

            // Bind it
            gl::BindBuffer(gl::ARRAY_BUFFER, object);

            // Compute the size of the buffer
            let element_size = size_of::<T>() as GLsizeiptr;
            let buffer_size = element_size * VERTEX_BUFFER_LEN as GLsizeiptr;

            // Write only persistent mapping. Not coherent!
            let access = gl::MAP_WRITE_BIT | gl::MAP_PERSISTENT_BIT;

            // Allocate buffer memory
            gl::BufferStorage(gl::ARRAY_BUFFER,
                              buffer_size,
                              ptr::null(),
                              access);

            // Remap the entire buffer
            memory = gl::MapBufferRange(gl::ARRAY_BUFFER,
                                        0,
                                        buffer_size,
                                        access) as *mut T;

            // Reset the buffer to 0 to avoid hard-to-reproduce bugs
            // if we do something wrong with uninitialized memory
            let s = slice::from_raw_parts_mut(memory,
                                              VERTEX_BUFFER_LEN as usize);

            for x in s.iter_mut() {
                *x = Default::default();
            }
        }

        Buffer {
            object: object,
            map: memory,
        }
    }

    /// Set entry at `index` to `val` in the buffer.
    pub fn set(&mut self, index: u32, val: T) {
        if index >= VERTEX_BUFFER_LEN {
            panic!("buffer overflow!");
        }

        unsafe {
            let p = self.map.offset(index as isize);

            *p = val;
        }
    }
}

impl<T> Drop for Buffer<T> {
    fn drop(&mut self) {
        unsafe {
            gl::BindBuffer(gl::ARRAY_BUFFER, self.object);
            gl::UnmapBuffer(gl::ARRAY_BUFFER);
            gl::DeleteBuffers(1, &self.object);
        }
    }
}

/// Maximum number of vertices that can be stored in an attribute
/// buffer
const VERTEX_BUFFER_LEN: u32 = 64 * 1024;

That’s a lot of code to simply allocate a buffer! Let’s walk through it:

• First GenBuffers creates a new buffer object. That doesn’t allocate the
buffer memory, it basically just creates a handle.
• This handle is then bound with BindBuffer; from then on the commands
targeting ARRAY_BUFFER will use this buffer.
• We must then compute the size of the buffer in bytes. I’ve decided to
hardcode the length of the buffer in VERTEX_BUFFER_LEN. Ideally it should
be big enough to hold an entire scene (otherwise we’ll have to make several
draw calls per frame), but not so big that it wastes memory. We’ll
probably want to tune that constant later.
• Once we know how much room we need we can ask OpenGL to allocate
it for us with BufferStorage. We request MAP_WRITE_BIT since we only want
write access to the buffer and MAP_PERSISTENT_BIT to be able to hold the
mapping persistently (instead of having to remap it for each frame).
• Now we can retrieve a pointer to this memory location using MapBufferRange
to map the buffer into the process’ address space.
• To make debugging easier if we mess something up I then reset the buffer’s
memory to zero. This way if we attempt to draw an unused part of the
buffer by mistake we’ll still have a well defined behaviour instead of drawing
random uninitialized data.
• The set method will be used to store an entry in the buffer.
• The Drop destructor will clean up everything when we’re done.
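
To get a feel for the amount of memory involved, the size computation from the walkthrough above can be reproduced without any OpenGL call. This is only a sketch: the two structs are stand-ins built on plain Rust integers instead of the GL aliases, and buffer_size_bytes is a helper name I made up for the example:

```rust
use std::mem::size_of;

/// Stand-in for the position attribute: two signed 16-bit values.
#[derive(Copy, Clone, Default, Debug)]
pub struct Position(pub i16, pub i16);

/// Stand-in for the color attribute: three unsigned bytes.
#[derive(Copy, Clone, Default, Debug)]
pub struct Color(pub u8, pub u8, pub u8);

/// Same capacity as above: 64 * 1024 vertices per buffer.
const VERTEX_BUFFER_LEN: u32 = 64 * 1024;

/// Size in bytes of a buffer holding VERTEX_BUFFER_LEN elements of
/// type `T`, computed the same way as in `Buffer::new`.
fn buffer_size_bytes<T>() -> usize {
    size_of::<T>() * VERTEX_BUFFER_LEN as usize
}
```

With these definitions the position buffer takes 4 bytes per vertex, so 256KiB in total, and the color buffer 3 bytes per vertex, so 192KiB.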

We can add our two buffers to the Renderer right now but creating buffers
without having any shaders to render them isn’t very useful.

5.3 The vertex shader

If you’re not familiar with the concept of shaders you should take the time to
read about them a little before we continue. Basically they’re programs executed
by various GPU stages. We’ll only need two shaders for now: the vertex shader
and the fragment shader.
The vertex shader is the first programmable stage in the OpenGL pipeline.
It will receive the vertex coordinates and the colors from our attribute buffers.
It’ll have to convert them from the Playstation VRAM representation to the one
used by OpenGL and pass them on to the next stage:
#version 330 core

in ivec2 vertex_position;
in uvec3 vertex_color;

out vec3 color;

void main() {
    // Convert VRAM coordinates (0;1023, 0;511) into
    // OpenGL coordinates (-1;1, -1;1)
    float xpos = (float(vertex_position.x) / 512) - 1.0;
    // VRAM puts 0 at the top, OpenGL at the bottom,
    // we must mirror vertically
    float ypos = 1.0 - (float(vertex_position.y) / 256);

    gl_Position.xyzw = vec4(xpos, ypos, 0.0, 1.0);

    // Convert the components from [0;255] to [0;1]
    color = vec3(float(vertex_color.r) / 255,
                 float(vertex_color.g) / 255,
                 float(vertex_color.b) / 255);
}

The OpenGL shading language, also called GLSL, looks a bit like C but don’t
let that fool you, it’s actually quite different. For one thing you can see that the
parameters and return values are not given in the main prototype, instead they’re
given at the global scope as in and out parameters.

We have two in parameters: the vertex position (a pair of signed integers)
and its color (a triplet of unsigned integers). The main function is called once
for each vertex. Our triangle has three vertices so it’ll be called 3 times.
The shader sets two output variables: color (a triplet of floats) and
gl_Position, which is a built-in GLSL variable, a vector of four floats. The last
two components of gl_Position are the z (depth) coordinate which is always
0 for us since we’re drawing in 2D and the w parameter (the homogeneous
component) which should be 1.0 for a position. This last parameter is used for
perspective correct projection [36].
You can see that the OpenGL horizontal and vertical screen coordinates go
from -1.0 to 1.0 (no matter the actual resolution of the screen) and that the
vertical coordinates go in the opposite direction to the Playstation VRAM
addressing. OpenGL colors are also floats in the range [0.0, 1.0].
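If you want to check the arithmetic without involving the GPU, the same conversions can be mirrored on the CPU. The sketch below is not part of the emulator, it just reproduces the shader’s formulas in Rust; the function names are my own:

```rust
/// CPU-side mirror of the vertex shader's position conversion:
/// VRAM x in [0, 1024] maps to [-1.0, 1.0] and VRAM y in [0, 512]
/// maps to [1.0, -1.0] (the vertical axis is flipped).
fn vram_to_opengl(x: i16, y: i16) -> (f32, f32) {
    let xpos = (x as f32 / 512.0) - 1.0;
    let ypos = 1.0 - (y as f32 / 256.0);

    (xpos, ypos)
}

/// CPU-side mirror of the color conversion from [0, 255] to
/// [0.0, 1.0].
fn color_to_float(r: u8, g: u8, b: u8) -> (f32, f32, f32) {
    (r as f32 / 255.0, g as f32 / 255.0, b as f32 / 255.0)
}
```

The top-left corner of VRAM (0, 0) ends up at (-1.0, 1.0) and the center (512, 256) at (0.0, 0.0).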
You can see that our vertex shader does all the work of converting coordinates
and colors from the Playstation internal representation to the OpenGL format.
In general we’ll want to offload as much computation as possible to the GPU
since I’m expecting the emulation bottleneck to be on the CPU.
After all Playstation graphics are extremely simple compared to modern
games, for instance modern GPUs have gigabytes worth of video RAM compared
to the Playstation’s puny 1MB of VRAM. Even if we enhance the graphics significantly
our graphics cards shouldn’t break a sweat if we’re careful not to write extremely
poorly optimized shader code.

5.4 The fragment shader

Once the primitive has passed through the vertex shader it will be rasterized [37].
In the rasterization process the triangle primitive is converted into individual
fragments. In our case the fragments will be the individual screen pixels but
with multisampling enabled you can get several fragments per pixel that get
averaged to produce the final pixel value.
For each fragment in the rasterized primitive the GPU then runs the fragment
shader whose job is to produce the fragment’s color:
#version 330 core

in vec3 color;
out vec4 frag_color;

void main() {
    frag_color = vec4(color, 1.0);
}

Pretty straightforward: the output color frag_color (the name is arbitrary)
takes the value of the input attribute color and a fourth value which is the alpha
channel to handle transparent pixels. In our case the pixels are fully opaque so
it’s hardcoded to 1.0.
If you’re not familiar with OpenGL you’re probably puzzled, what’s the value
of this color parameter exactly? A triangle has three vertices, potentially each
with a different color, so which one do we get here?
What happens is that in this case OpenGL tells the GPU to interpolate the
value of the color based on the fragment’s distance to the three vertices and
their respective colors. That means that we’ll get a smooth gradient which is
exactly what we need for Gouraud shading. OpenGL does all the hard work for us!

[36] If you’re not familiar with homogeneous coordinates don’t worry, all you have to know for now is that you have to set the w component to 1.0 for a position and 0.0 for a vector.
[37] There are actually a couple more stages before that in modern OpenGL like the tessellation and geometry shaders but we don’t need to bother with those.

Figure 1: OpenGL shaded RGB triangle

Figure 1 shows an example of a triangle rendered with our fragment shader:
each of the three vertices is colored using one of the RGB colors and we can see
that the GPU interpolates the gradient for each of the pixels inside the triangle.
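
The GPU does this interpolation in hardware but the idea can be sketched on the CPU with barycentric weights. This is only an illustration under the assumption of a flat 2D triangle (w = 1.0 everywhere, as in our case); the names edge and interpolate_color are mine and this is not how the emulator renders anything:

```rust
/// Twice the signed area of the triangle (a, b, c).
fn edge(a: (f32, f32), b: (f32, f32), c: (f32, f32)) -> f32 {
    (b.0 - a.0) * (c.1 - a.1) - (b.1 - a.1) * (c.0 - a.0)
}

/// Interpolate the three vertex colors at point `p`. Each vertex is
/// weighted by the area of the sub-triangle facing it, divided by
/// the area of the whole triangle. The triangle must not be
/// degenerate (zero area), otherwise the division yields NaN.
fn interpolate_color(p: (f32, f32),
                     verts: [(f32, f32); 3],
                     colors: [[f32; 3]; 3]) -> [f32; 3] {
    let [a, b, c] = verts;

    let area = edge(a, b, c);

    let w0 = edge(b, c, p) / area;
    let w1 = edge(c, a, p) / area;
    let w2 = edge(a, b, p) / area;

    let mut out = [0.0; 3];

    for i in 0..3 {
        out[i] = w0 * colors[0][i] + w1 * colors[1][i] + w2 * colors[2][i];
    }

    out
}
```

At a vertex the result is exactly that vertex’s color, and halfway along an edge it’s the average of the two colors at its ends.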

5.5 Compiling and linking the shaders

We can now piece our shaders together in our Renderer:
pub struct Renderer {
    // ...

    /// Vertex shader object
    vertex_shader: GLuint,
    /// Fragment shader object
    fragment_shader: GLuint,
    /// OpenGL Program object
    program: GLuint,
    /// OpenGL Vertex array object
    vertex_array_object: GLuint,
    /// Buffer containing the vertex positions
    positions: Buffer<Position>,
    /// Buffer containing the vertex colors
    colors: Buffer<Color>,
    /// Current number of vertices in the buffers
    nvertices: u32,
}

impl Renderer {

    pub fn new() -> Renderer {

        // ...

        // "Slurp" the contents of the shader files. Note: this is
        // a compile-time thing.
        let vs_src = include_str!("vertex.glsl");
        let fs_src = include_str!("fragment.glsl");

        // Compile our shaders...
        let vertex_shader = compile_shader(vs_src,
                                           gl::VERTEX_SHADER);
        let fragment_shader = compile_shader(fs_src,
                                             gl::FRAGMENT_SHADER);
        // ... Link our program...
        let program = link_program(&[vertex_shader,
                                     fragment_shader]);
        // ... And use it.
        unsafe {
            gl::UseProgram(program);
        }

        // Generate our vertex array object that will hold our
        // vertex attributes
        let mut vao = 0;
        unsafe {
            gl::GenVertexArrays(1, &mut vao);
            // Bind our VAO
            gl::BindVertexArray(vao);
        }

        // Setup the "position" attribute. First we create
        // the buffer holding the positions (this call also
        // binds it)
        let positions = Buffer::new();

        unsafe {
            // Then we retrieve the index for the attribute in the
            // shader
            let index = find_program_attrib(program,
                                            "vertex_position");

            // Enable it
            gl::EnableVertexAttribArray(index);

            // Link the buffer and the index: 2 GLshort attributes,
            // not normalized. That should send the data untouched
            // to the vertex shader.
            gl::VertexAttribIPointer(index, 2, gl::SHORT, 0, ptr::null());
        }

        // Setup the "color" attribute and bind it
        let colors = Buffer::new();

        unsafe {
            let index = find_program_attrib(program,
                                            "vertex_color");
            gl::EnableVertexAttribArray(index);

            // Link the buffer and the index: 3 GLubyte attributes,
            // not normalized. That should send the data untouched
            // to the vertex shader.
            gl::VertexAttribIPointer(index,
                                     3,
                                     gl::UNSIGNED_BYTE,
                                     0,
                                     ptr::null());
        }

        Renderer {
            sdl_context: sdl_context,
            window: window,
            gl_context: gl_context,
            vertex_shader: vertex_shader,
            fragment_shader: fragment_shader,
            program: program,
            vertex_array_object: vao,
            positions: positions,
            colors: colors,
            nvertices: 0,
        }
    }
    // ...
}

Quite a lot of code to go through here. I put the code for our two shaders
described earlier in two files named “vertex.glsl” and “fragment.glsl” respectively.
I retrieve their contents here using Rust’s include_str! macro. Then I ask
OpenGL to compile both shaders using the compile_shader helper function:
pub fn compile_shader(src: &str, shader_type: GLenum) -> GLuint {
    let shader;

    unsafe {
        shader = gl::CreateShader(shader_type);
        // Attempt to compile the shader
        let c_str = CString::new(src.as_bytes()).unwrap();
        gl::ShaderSource(shader, 1, &c_str.as_ptr(), ptr::null());
        gl::CompileShader(shader);

        // Extra bit of error checking in case we're not using a
        // DEBUG OpenGL context and check_for_errors can't do it
        // properly:
        let mut status = gl::FALSE as GLint;
        gl::GetShaderiv(shader, gl::COMPILE_STATUS, &mut status);

        if status != (gl::TRUE as GLint) {
            panic!("Shader compilation failed!");
        }
    }

    shader
}

The compilation is always done at runtime when we start the emulator.

Once the shaders are compiled we must link them together to form a complete
OpenGL “program”. This is done by the link_program helper function:
pub fn link_program(shaders: &[GLuint]) -> GLuint {
    let program;

    unsafe {
        program = gl::CreateProgram();

        for &shader in shaders {
            gl::AttachShader(program, shader);
        }

        gl::LinkProgram(program);

        // Extra bit of error checking in case we're not using a
        // DEBUG OpenGL context and check_for_errors can't do it
        // properly:
        let mut status = gl::FALSE as GLint;
        gl::GetProgramiv(program, gl::LINK_STATUS, &mut status);

        if status != (gl::TRUE as GLint) {
            panic!("OpenGL program linking failed!");
        }
    }

    program
}

Once the program is linked UseProgram activates it. We can then set up our
position and color attributes.

5.6 Vertex array objects

First we need to create a “vertex array object” (VAO) to hold the attributes.
The idea is that if you have different sets of attributes in your application and
you want to be able to switch rapidly you create one vertex array object per set
and you can then switch between them with a single call (instead of one call per
attribute).
We don’t really need more than one set at this point so we just create a
single one with GenVertexArrays and bind it with BindVertexArray.
At last we use our Buffer struct to initialize the positions buffer. We then
need to associate it with the vertex_position attribute in the vertex shader.
In order to do this we use the find_program_attrib function to recover the
attribute index in the program:
/// Return the index of attribute `attr` in `program`. Panics if
/// the index isn't found.
pub fn find_program_attrib(program: GLuint, attr: &str) -> GLuint {
    // Keep the CString alive for the duration of the call, otherwise
    // the pointer passed to OpenGL would dangle
    let cstr = CString::new(attr).unwrap();

    let index = unsafe { gl::GetAttribLocation(program, cstr.as_ptr()) };

    if index < 0 {
        panic!("Attribute \"{}\" not found in program", attr);
    }

    index as GLuint
}

We must then enable the attribute with EnableVertexAttribArray and we
describe the format of the buffer with VertexAttribIPointer. This last call is
very important to get right, otherwise the program’s behavior will be potentially
undefined:

• The first parameter is the index of the attribute in the program.
• The second parameter contains the number of elements per vertex in the
buffer. For the position we have the x and y coordinates, so that’s two. It
matches our declaration in the vertex shader since we used an ivec2 to
hold this value.
• The third parameter is the type of each element. It must match the type
we’re using to represent the values in our Rust code. In this case we’re
using GLshorts to hold the coordinates so we set it to SHORT.
• The fourth parameter is the “stride”, the number of bytes between the
start of one vertex’s value and the start of the next. A stride of 0 is a
special value meaning that the values are tightly packed. Since we don’t
have any padding in our buffer we set it to 0.
• The last parameter is, despite its pointer type, a byte offset into the
currently bound buffer telling OpenGL where the first value starts. Our
data starts at the very beginning of the buffer so we set it to NULL (an
offset of 0).

After this call our position buffer will be ready for use!
We then go through the same sequence for our color buffer, the only difference
being the parameters to the VertexAttribIPointer call: this time we have
three values per vertex and the type is UNSIGNED_BYTE.
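To see what the stride and offset parameters mean for the layout of our two buffers, here’s a small sketch that computes where the GPU would fetch the attribute of a given vertex. The function attrib_byte_offset is a made-up helper for this illustration, it doesn’t exist in OpenGL or in the emulator:

```rust
/// Byte offset of the attribute of vertex `index` inside a buffer.
/// `components` and `component_size` describe one attribute value,
/// `stride` and `base` are the stride and offset parameters given to
/// OpenGL. A stride of 0 means "tightly packed": the effective
/// stride is then the size of one attribute value.
fn attrib_byte_offset(index: usize,
                      components: usize,
                      component_size: usize,
                      stride: usize,
                      base: usize) -> usize {
    let effective_stride = if stride == 0 {
        components * component_size
    } else {
        stride
    };

    base + index * effective_stride
}
```

With our parameters the position of vertex 3 starts at byte 12 of the position buffer (2 components of 2 bytes each) and its color at byte 9 of the color buffer (3 components of 1 byte each). If we ever padded the colors to 4 bytes we’d have to pass a stride of 4 instead of 0.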
Finally I put it all in the Renderer struct along with an nvertices variable
that will hold the current number of vertices ready to be drawn in the vertex
buffers.
In order to clean everything up properly when we exit we need a destructor
to release the resources:
impl Drop for Renderer {
    fn drop(&mut self) {
        unsafe {
            gl::DeleteVertexArrays(1, &self.vertex_array_object);
            gl::DeleteShader(self.vertex_shader);
            gl::DeleteShader(self.fragment_shader);
            gl::DeleteProgram(self.program);
        }
    }
}

5.7 OpenGL rendering and synchronization

Now we have everything to finally implement our push_triangle command. It
will just push the three positions and colors into their respective buffers. However
we need to be careful not to overflow so if the buffers are full we must force an
early draw:
impl Renderer {
    // ...

    /// Add a triangle to the draw buffer
    pub fn push_triangle(&mut self,
                         positions: [Position; 3],
                         colors: [Color; 3]) {

        // Make sure we have enough room left to queue the vertices
        if self.nvertices + 3 > VERTEX_BUFFER_LEN {
            println!("Vertex attribute buffers full, forcing draw");
            self.draw();
        }

        for i in 0..3 {
            // Push
            self.positions.set(self.nvertices, positions[i]);
            self.colors.set(self.nvertices, colors[i]);
            self.nvertices += 1;
        }
    }
}
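
The overflow check is easy to get wrong by one, so it can be worth exercising it without a GPU. Here is a minimal sketch of the same batching logic with the OpenGL parts removed. MockBatcher is a hypothetical test double, not something from the emulator, and it assumes a capacity of at least 3 vertices:

```rust
/// GPU-free stand-in for the renderer's batching logic.
struct MockBatcher {
    /// Plays the role of VERTEX_BUFFER_LEN
    capacity: u32,
    /// Current number of vertices queued
    nvertices: u32,
    /// Number of vertices submitted by each draw so far
    draws: Vec<u32>,
}

impl MockBatcher {
    fn new(capacity: u32) -> MockBatcher {
        MockBatcher {
            capacity: capacity,
            nvertices: 0,
            draws: Vec::new(),
        }
    }

    /// Queue one triangle, forcing an early draw if the three new
    /// vertices wouldn't fit in the buffer.
    fn push_triangle(&mut self) {
        if self.nvertices + 3 > self.capacity {
            self.draw();
        }

        self.nvertices += 3;
    }

    /// Record the draw and reset the vertex count.
    fn draw(&mut self) {
        self.draws.push(self.nvertices);
        self.nvertices = 0;
    }
}
```

With a capacity of 7 vertices the first two triangles fit, the third one forces a draw of the 6 queued vertices and then becomes the start of the next batch.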

The draw command itself is not very complicated but we need to be careful
to synchronize ourselves properly with the GPU. That means flushing our buffers
before we ask the GPU to start drawing and then waiting for the rendering to
finish before we touch the buffers again:
impl Renderer {
    // ...

    /// Draw the buffered commands and reset the buffers
    pub fn draw(&mut self) {
        unsafe {
            // Make sure all the data from the persistent mappings
            // is flushed to the buffer
            gl::MemoryBarrier(gl::CLIENT_MAPPED_BUFFER_BARRIER_BIT);

            gl::DrawArrays(gl::TRIANGLES,
                           0,
                           self.nvertices as GLsizei);
        }

        // Wait for GPU to complete
        unsafe {
            let sync = gl::FenceSync(gl::SYNC_GPU_COMMANDS_COMPLETE,
                                     0);

            loop {
                let r = gl::ClientWaitSync(
                    sync,
                    gl::SYNC_FLUSH_COMMANDS_BIT,
                    10000000);

                if r == gl::ALREADY_SIGNALED ||
                   r == gl::CONDITION_SATISFIED {
                    // Drawing done
                    break;
                }
            }

            // The fence has signaled, release it so that we don't
            // leak one sync object per draw
            gl::DeleteSync(sync);
        }

        // Reset the buffers
        self.nvertices = 0;
    }
}

The call to MemoryBarrier makes sure the data written to the mapped buffer
is visible to the GPU instead of, say, stuck in a CPU cache. We could avoid this
call by mapping the buffer with the MAP_COHERENT_BIT access flag set but that
might make writing to the buffers slower so it’s not necessarily better.
The DrawArrays function is where the magic happens: it tells the GPU to
draw nvertices as triangles. Once this command is issued the GPU will start
working asynchronously so we must be careful: if we start pushing new data to
the buffers before the GPU is done we might overwrite attributes that are still
in use which may cause glitches.
To avoid that we simply wait for the GPU to finish by using a fence:
FenceSync creates a fence waiting for the current commands to complete and
ClientWaitSync is used to wait for completion.
Finally we reset nvertices to 0 to start anew.
This method is actually pretty suboptimal: we stall our emulator completely
while the GPU is working. We could improve this by using double buffering
for our attributes but let’s leave that for later.
This draw command will render everything but it won’t display anything
until we swap the window’s buffer. We can add a display command to do just
that:
impl Renderer {
    // ...

    /// Draw the buffered commands and display them
    pub fn display(&mut self) {
        self.draw();

        self.window.gl_swap_window();
    }
}

Now we need to figure out when to call this method. Normally we’d want
to call it at each VSYNC, so 60 or 50 times per second depending on the video
mode but we don’t support GPU timings yet. Instead for the time being we can
find a command that the BIOS calls once per frame and put the display call in
there. One such command seems to be “Set Drawing Offset” so let’s put our
call to display in there:
impl Gpu {
    // ...

    /// GP0(0xE5): Set Drawing Offset
    fn gp0_drawing_offset(&mut self) {
        // ...

        // XXX Temporary hack: force display when changing offset
        // since we don't have proper timings
        self.renderer.display();
    }
}

We should now finally be ready to draw our first triangles. If you restart the
emulator you should end up with the image in figure 2.
The two triangles start back-to-back and then move and shrink to their final
position. Since we don’t yet draw the background quad they’re all drawn on top
of each other which gives this color smearing effect. Note that the image has
a weird aspect ratio (2:1) and that the logo is not centered, it’s because we’re
displaying the entire VRAM framebuffer instead of just the 640x480 portion
configured in the video output.

5.8 OpenGL debugging

Figure 2: First output of our OpenGL renderer

You might have noticed that there’s not a whole lot of error checking in my
OpenGL code above. We could call GetError after every OpenGL function but
that’s annoying and noisy. Instead I prefer to use the debug extension.
This extension logs errors, warnings, performance notices and other messages
to an internal queue. We can then call GetDebugMessageLog to retrieve the
messages [38]:
/// Check for OpenGL errors using `gl::GetDebugMessageLog`. If a
/// severe error is encountered this function panics. If the OpenGL
/// context doesn't have the DEBUG attribute this *probably* won't do
/// anything.
pub fn check_for_errors() {
    let mut fatal = false;

    loop {
        let mut buffer = vec![0; 4096];

        let mut severity = 0;
        let mut source = 0;
        let mut message_size = 0;
        let mut mtype = 0;
        let mut id = 0;

        let count =
            unsafe {
                gl::GetDebugMessageLog(1,
                                       buffer.len() as GLsizei,
                                       &mut source,
                                       &mut mtype,
                                       &mut id,
                                       &mut severity,
                                       &mut message_size,
                                       buffer.as_mut_ptr() as *mut GLchar)
            };

        if count == 0 {
            // No messages left
            break;
        }

        buffer.truncate(message_size as usize);

        let message =
            match str::from_utf8(&buffer) {
                Ok(m) => m,
                Err(e) => panic!("Got invalid message: {}", e),
            };

        let source = DebugSource::from_raw(source);
        let severity = DebugSeverity::from_raw(severity);
        let mtype = DebugType::from_raw(mtype);

        println!("OpenGL [{:?}|{:?}|{:?}|0x{:x}] {}",
                 severity, source, mtype, id, message);

        if severity.is_fatal() {
            // Something is very wrong, don't die just yet in order to
            // display any additional error message
            fatal = true;
        }
    }

    if fatal {
        panic!("Fatal OpenGL error");
    }
}

[38] I’m leaving out the definition of the various Debug* types which are just thin wrappers around the OpenGL values, as always check the repository if you want to see the entire code.
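
To give an idea of what such a wrapper could look like, here is a guess at a DebugSeverity type. This is not the code from the repository: the variant names, the Unknown fallback and the choice of treating only the “high” severity as fatal are all assumptions of mine. The numeric values are those of the GL_DEBUG_SEVERITY_* enumerants, hardcoded here so that the sketch doesn’t depend on the gl crate:

```rust
/// Raw values of the GL_DEBUG_SEVERITY_* enumerants
const DEBUG_SEVERITY_HIGH: u32 = 0x9146;
const DEBUG_SEVERITY_MEDIUM: u32 = 0x9147;
const DEBUG_SEVERITY_LOW: u32 = 0x9148;
const DEBUG_SEVERITY_NOTIFICATION: u32 = 0x826B;

/// Hypothetical thin wrapper around the raw severity value
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DebugSeverity {
    High,
    Medium,
    Low,
    Notification,
    /// Value we don't know about, kept around for logging
    Unknown(u32),
}

impl DebugSeverity {
    /// Build a `DebugSeverity` from the raw OpenGL value
    pub fn from_raw(raw: u32) -> DebugSeverity {
        match raw {
            DEBUG_SEVERITY_HIGH => DebugSeverity::High,
            DEBUG_SEVERITY_MEDIUM => DebugSeverity::Medium,
            DEBUG_SEVERITY_LOW => DebugSeverity::Low,
            DEBUG_SEVERITY_NOTIFICATION => DebugSeverity::Notification,
            _ => DebugSeverity::Unknown(raw),
        }
    }

    /// Assumption: only high severity messages should abort
    pub fn is_fatal(&self) -> bool {
        *self == DebugSeverity::High
    }
}
```

Deriving Debug is what lets the println! above display the severity by name.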

We can then call the check_for_errors function after critical sections: in
draw for instance to check for errors in the past frame but also at the end of
new to make sure the initialization went well. There’s one caveat though: the
debug extension only works when we use a debug OpenGL context. We can get
one by setting the GL_CONTEXT_DEBUG flag before we create the window:
sdl2::video::gl_set_attribute(
    GLAttr::GLContextFlags,
    sdl2::video::GL_CONTEXT_DEBUG.bits());

A debug context might be slower than a normal one though so we’ll probably
want to only activate this for troubleshooting (via a command line flag or
something like that). For now performance doesn’t matter in the least so we can
leave it enabled at all times.
The error messages themselves are vendor specific but hopefully they should
be helpful. For instance with my Radeon card if I mess up my vertex shader by
replacing vec3 by vec4 in the color assignment I get the following message:
OpenGL [High|ShaderCompiler|Error|0x1] 0:19(10):
error: too few components to vec4

5.9 Drawing quadrilaterals

Modern OpenGL doesn’t support quads, only points, lines and triangles [39].
Fortunately for us, neither does the Playstation GPU! When a quad draw
command is received it’s interpreted as two triangles and drawn that way. This
is significant for Gouraud-shaded quadrilaterals since it means that only three
vertices are ever used to interpolate the color of any pixel in the quad. For
textured quads it shouldn’t make any difference.

[39] Although you can emulate proper quad shading in shaders if you really need to.

We can emulate that behavior in a push_quad method:
impl Renderer {
    // ...

    /// Add a quad to the draw buffer
    pub fn push_quad(&mut self,
                     positions: [Position; 4],
                     colors: [Color; 4]) {

        // Make sure we have enough room left to queue the vertices.
        // We need to push two triangles to draw a quad, so 6 vertices
        if self.nvertices + 6 > VERTEX_BUFFER_LEN {
            // The vertex attribute buffers are full, force an early
            // draw
            self.draw();
        }

        // Push the first triangle
        for i in 0..3 {
            self.positions.set(self.nvertices, positions[i]);
            self.colors.set(self.nvertices, colors[i]);
            self.nvertices += 1;
        }

        // Push the 2nd triangle
        for i in 1..4 {
            self.positions.set(self.nvertices, positions[i]);
            self.colors.set(self.nvertices, colors[i]);
            self.nvertices += 1;
        }
    }
}

We must duplicate the two vertices shared by the two triangles across one of
the quad’s diagonals so we end up with 6 vertices for a single quad. It’s possible
to avoid that duplication (for instance by using indexed rendering) but at this
point it would be premature optimization.
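For reference, here is a rough sketch of what the indexed approach could look like on the CPU side. The helper `quad_indices` is hypothetical: it assumes each quad's 4 vertices are stored consecutively in the attribute buffers and builds the index list that something like `glDrawElements` would consume, using the same [0, 1, 2] and [1, 2, 3] split as push_quad:

```rust
/// Build the index list for `quad_count` quads whose vertices are stored
/// consecutively, 4 per quad. Each quad becomes the two triangles
/// [0, 1, 2] and [1, 2, 3] relative to its first vertex.
/// (Hypothetical helper, not part of the guide's renderer.)
fn quad_indices(quad_count: u32) -> Vec<u32> {
    let mut indices = Vec::with_capacity(quad_count as usize * 6);

    for q in 0..quad_count {
        let base = q * 4;

        for &i in &[0u32, 1, 2, 1, 2, 3] {
            indices.push(base + i);
        }
    }

    indices
}
```

With this scheme we'd store 4 vertices and 6 small indices per quad instead of 6 full vertices.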
Now all that’s left to do is use push_quad to draw the monochrome and
shaded quadrilaterals:
impl Gpu {
    // ...

    /// GP0(0x28): Monochrome Opaque Quadrilateral
    fn gp0_quad_mono_opaque(&mut self) {
        let positions = [
            Position::from_gp0(self.gp0_command[1]),
            Position::from_gp0(self.gp0_command[2]),
            Position::from_gp0(self.gp0_command[3]),
            Position::from_gp0(self.gp0_command[4]),
        ];

        // Only one color repeated 4 times
        let colors = [Color::from_gp0(self.gp0_command[0]); 4];

        self.renderer.push_quad(positions, colors);
    }

    /// GP0(0x38): Shaded Opaque Quadrilateral
    fn gp0_quad_shaded_opaque(&mut self) {
        let positions = [
            Position::from_gp0(self.gp0_command[1]),
            Position::from_gp0(self.gp0_command[3]),
            Position::from_gp0(self.gp0_command[5]),
            Position::from_gp0(self.gp0_command[7]),
        ];

        let colors = [
            Color::from_gp0(self.gp0_command[0]),
            Color::from_gp0(self.gp0_command[2]),
            Color::from_gp0(self.gp0_command[4]),
            Color::from_gp0(self.gp0_command[6]),
        ];

        self.renderer.push_quad(positions, colors);
    }
}

Even though we use per-vertex colors it’s easy to draw monochrome primitives
by repeating the same color. We have encountered a third quad command,
gp0_quad_texture_blend_opaque, but since we don’t support textures we can’t
implement that correctly yet. In the meantime we can use a solid color instead:
it won’t look right but at least we’ll see something:
impl Gpu {
    // ...

    /// GP0(0x2C): Textured Opaque Quadrilateral
    fn gp0_quad_texture_blend_opaque(&mut self) {
        let positions = [
            Position::from_gp0(self.gp0_command[1]),
            Position::from_gp0(self.gp0_command[3]),
            Position::from_gp0(self.gp0_command[5]),
            Position::from_gp0(self.gp0_command[7]),
        ];

        // XXX We don't support textures for now, use a solid red
        // color instead
        let colors = [Color(0x80, 0x00, 0x00); 4];

        self.renderer.push_quad(positions, colors);
    }
}

Lo and behold, we should now have something that looks very much like the
“Sony Computer Entertainment” boot logo, minus the text which is contained in
the textures. Figure 3 shows the expected output.
As before the black area at the right and bottom of the image is due to the
fact that we display the entire framebuffer instead of just the part configured
in the video output. You can see that a single 640x480 image already takes
more than half of the entire VRAM and we’re only displaying a very simple
logo. Game developers back then had to be very careful with VRAM usage
(and memory usage in general). This is also one of the reasons most games are
rendered at lower resolutions like 640x240, but we’ll see that later.
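To put a number on the VRAM usage, here is a quick back-of-the-envelope sketch. It assumes the 1024x512 VRAM layout used in the vertex shader's coordinate conversion and 16 bits per pixel for both the VRAM and the framebuffer; the function name is just for illustration:

```rust
/// VRAM dimensions in 16 bit pixels (assumed: 1024x512, i.e. 1MB)
const VRAM_WIDTH: u32 = 1024;
const VRAM_HEIGHT: u32 = 512;

/// Fraction of the VRAM taken by a `width` x `height` framebuffer,
/// assuming it uses the same 16 bits per pixel as the VRAM itself.
fn vram_fraction(width: u32, height: u32) -> f64 {
    (width * height) as f64 / (VRAM_WIDTH * VRAM_HEIGHT) as f64
}
```

For a 640x480 framebuffer this gives about 0.59, for 640x240 about 0.29, which leaves a lot more room for textures and a second buffer.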
Note that there are two ways to split a quadrilateral into two triangles, by
cutting along either diagonal. The choice is significant: figure 4 shows the result
of splitting across the other diagonal (I modified push_quad: instead of
rendering triangles with vertex indexes [0, 1, 2] and [1, 2, 3] I used [2, 3, 0]
and [3, 0, 1]). You can see that the main “tilted square” behind the two
triangles is shaded differently. If your emulator’s output looks like this it
means that you’re not splitting the quads correctly: you need to split along the
other diagonal.

Figure 3: Playstation boot logo without textures

Figure 4: Playstation boot logo with bad quad rendering

5.10 Draw Offset emulation

Our OpenGL renderer is very basic but we can at least add the draw offset easily.
Of course the most obvious way would be to add it to the Positions before we
put them in the attribute buffer but instead we can have the vertex shader do it
for us!
In order to do this we can declare a “uniform” in the shader code:
// ...

// Drawing offset
uniform ivec2 offset;

void main() {
  ivec2 position = vertex_position + offset;

  // Convert VRAM coordinates (0;1023, 0;511) into
  // OpenGL coordinates (-1;1, -1;1)
  float xpos = (float(position.x) / 512) - 1.0;
  // VRAM puts 0 at the top, OpenGL at the bottom,
  // we must mirror vertically
  float ypos = 1.0 - (float(position.y) / 256);

  gl_Position.xyzw = vec4(xpos, ypos, 0.0, 1.0);

  // ...
}

Uniforms are inputs that are shared across all the instances of the shader. So
instead of having an offset vertex attribute with one entry per vertex we can
have a single variable that will be used for an entire batch of primitives.
To be able to modify the value of the uniform from our code we must retrieve
its index like we did for the vertex attributes. We can then set its value using
Uniform2i (the 2i part means that the function works on ivec2s, there are other
Uniform* functions for the various other types):
pub struct Renderer {
    // ...

    /// Index of the "offset" shader uniform
    uniform_offset: GLint,
}

impl Renderer {

    pub fn new() -> Renderer {

        // ...

        // Retrieve and initialize the draw offset
        let uniform_offset = find_program_uniform(program, "offset");
        unsafe {
            gl::Uniform2i(uniform_offset, 0, 0);
        }

        Renderer {
            // ...

            uniform_offset: uniform_offset,
        }
    }

    // ...
}

We can now add a method to set the value of the uniform. We need to be
careful to draw the currently buffered primitives before we change the offset
since those were supposed to be drawn with the previous value and might
otherwise end up located at the wrong place:
impl Renderer {
    // ...

    /// Set the value of the uniform draw offset
    pub fn set_draw_offset(&mut self, x: i16, y: i16) {
        // Force draw for the primitives with the current offset
        self.draw();

        // Update the uniform value
        unsafe {
            gl::Uniform2i(self.uniform_offset,
                          x as GLint,
                          y as GLint);
        }
    }
}

Finally we can get rid of our drawing_x_offset and drawing_y_offset
member variables in the GPU and call set_draw_offset directly instead.
The fact that we have to force a partial draw every time the offset is changed
means that in pathological cases this might end up being slower. For instance
if a game draws thousands of triangles, changing the offset between each one,
we’ll issue thousands of partial draw commands. In this case it would probably
be faster to simply add the offset before we push the Positions in the attribute
buffer.
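If we ever need that fallback, adding the offset on the CPU side could be sketched as follows. The `Position` type here is a simplified stand-in (a pair of signed 16 bit coordinates) and `offset_positions` is a hypothetical helper; I use wrapping additions so that out-of-range values can't panic in debug builds, which is an assumption about the desired behavior rather than something verified on hardware:

```rust
/// Simplified stand-in for the renderer's Position type
#[derive(Clone, Copy, Debug, PartialEq)]
struct Position(i16, i16);

/// Apply the draw offset to the 4 vertices of a quad before they're
/// pushed in the attribute buffer (hypothetical helper).
fn offset_positions(positions: [Position; 4], dx: i16, dy: i16) -> [Position; 4] {
    let mut out = positions;

    for p in out.iter_mut() {
        p.0 = p.0.wrapping_add(dx);
        p.1 = p.1.wrapping_add(dy);
    }

    out
}
```

The shader would then use the vertex position as-is and no partial draw would be needed when the offset changes.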

5.11 Handling SDL2 events and exiting cleanly

Before we move on I want to fix one annoying problem introduced by our brand
new SDL2 window: since we don’t handle SDL events we can’t exit the emulator
cleanly. And since SDL2 catches SIGINT by default we can’t even interrupt our
emulator with ^C anymore.
Fortunately it’s an easy fix: instead of initializing the SDL context in the
Renderer we move it all the way up in the main function and then check for
events in the top level loop. Since we need the SDL context to create the window
we also have to shuffle constructors a bit: I’ve decided to create the renderer in
the main function then move it into the Gpu constructor which is itself moved
into the Interconnect constructor:
use sdl2::event::Event;
use sdl2::keycode::KeyCode;

fn main() {
    let bios = Bios::new(&Path::new("roms/SCPH1001.BIN")).unwrap();

    // We must initialize SDL before the interconnect is created since
    // it contains the GPU and the GPU needs to create a window
    let sdl_context = sdl2::init(::sdl2::INIT_VIDEO).unwrap();

    let renderer = Renderer::new(&sdl_context);

    let gpu = Gpu::new(renderer);
    let inter = Interconnect::new(bios, gpu);
    let mut cpu = Cpu::new(inter);

    let mut event_pump = sdl_context.event_pump();

    loop {
        for _ in 0..1_000_000 {
            cpu.run_next_instruction();
        }

        // See if we should quit
        for e in event_pump.poll_iter() {
            match e {
                Event::KeyDown { keycode: KeyCode::Escape, .. } =>
                    return,
                Event::Quit { .. } => return,
                _ => (),
            }
        }
    }
}

When the Quit event is encountered (window closes, received SIGINT etc...)
we return from main, effectively exiting the program. For convenience I also quit
when the Escape key is pressed in the window.
The inner for loop is needed because checking for events before every
instruction slows everything down very significantly so I only check once for
every million instructions executed.
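The batching pattern itself doesn't depend on SDL or on the CPU. Here's a stripped-down sketch with closures standing in for `run_next_instruction` and for the event polling; `run_batched` and its parameters are names I made up for this illustration:

```rust
/// Call `step` in batches of `batch` iterations and only call the
/// (comparatively expensive) `should_quit` check between batches.
/// Returns the total number of steps executed before quitting.
fn run_batched<S, Q>(batch: u32, mut step: S, mut should_quit: Q) -> u64
where
    S: FnMut(),
    Q: FnMut() -> bool,
{
    let mut executed = 0u64;

    loop {
        for _ in 0..batch {
            step();
            executed += 1;
        }

        if should_quit() {
            return executed;
        }
    }
}
```

The bigger the batch the lower the polling overhead, at the cost of a longer delay before the emulator reacts to an event.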

6 The Interconnect: Generic loads and stores

At this point we have three load and three store methods in our interconnect to
deal with byte, halfword and word accesses. Those implementations look very
similar to each other.
When we implement the debugger and the timings you’ll see that we’ll need
special versions of those methods. At this rate we’ll end up with dozens of
memory access functions that will be very similar but for a few key differences.
This is a lot of potential code duplication. Fortunately we can avoid most of
it by making our code use generics instead of having different flavors for 8, 16
and 32bit loads and stores.
The first step is to create a generic Addressable trait:
/// Types of access supported by the Playstation architecture
#[derive(PartialEq, Eq, Debug)]
pub enum AccessWidth {
    Byte = 1,
    Halfword = 2,
    Word = 4,
}

/// Trait representing the attributes of a primitive addressable
/// memory location.
pub trait Addressable {
    /// Retrieve the width of the access
    fn width() -> AccessWidth;
    /// Build an Addressable value from an u32. If the Addressable is 8
    /// or 16 bits wide the MSBs are discarded to fit.
    fn from_u32(v: u32) -> Self;
    /// Retrieve the value of the Addressable as an u32. If the
    /// Addressable is 8 or 16 bits wide the MSBs are padded with 0s.
    fn as_u32(&self) -> u32;
}

We can then implement this trait for u8, u16 and u32:
impl Addressable for u8 {
    fn width() -> AccessWidth {
        AccessWidth::Byte
    }

    fn from_u32(v: u32) -> u8 {
        v as u8
    }

    fn as_u32(&self) -> u32 {
        *self as u32
    }
}

impl Addressable for u16 {
    fn width() -> AccessWidth {
        AccessWidth::Halfword
    }

    fn from_u32(v: u32) -> u16 {
        v as u16
    }

    fn as_u32(&self) -> u32 {
        *self as u32
    }
}

impl Addressable for u32 {
    fn width() -> AccessWidth {
        AccessWidth::Word
    }

    fn from_u32(v: u32) -> u32 {
        v
    }

    fn as_u32(&self) -> u32 {
        *self
    }
}
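
To get a feel for how generic code uses such a trait, here is a small self-contained sketch. It uses a pared-down copy of the trait (the width is a plain byte count instead of the AccessWidth enum, purely to keep the example short) and a hypothetical `round_trip` helper that shows the truncation and zero-padding behavior:

```rust
/// Pared-down version of the trait, for illustration only
trait Addressable {
    /// Width of the access in bytes
    fn width() -> u32;
    fn from_u32(v: u32) -> Self;
    fn as_u32(&self) -> u32;
}

impl Addressable for u8 {
    fn width() -> u32 { 1 }
    fn from_u32(v: u32) -> u8 { v as u8 }
    fn as_u32(&self) -> u32 { *self as u32 }
}

impl Addressable for u16 {
    fn width() -> u32 { 2 }
    fn from_u32(v: u32) -> u16 { v as u16 }
    fn as_u32(&self) -> u32 { *self as u32 }
}

impl Addressable for u32 {
    fn width() -> u32 { 4 }
    fn from_u32(v: u32) -> u32 { v }
    fn as_u32(&self) -> u32 { *self }
}

/// Convert `v` to the Addressable type `T` and back, returning the
/// access width in bytes and the (possibly truncated) value.
fn round_trip<T: Addressable>(v: u32) -> (u32, u32) {
    (T::width(), T::from_u32(v).as_u32())
}
```

The caller picks the width with the turbofish syntax, for instance `round_trip::<u8>(0x12345678)` only keeps the low byte.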

6.1 Porting the CPU code

We can now factor our various memory access functions by making them generic
over this Addressable trait. On the CPU it looks like this:
impl Cpu {
    // ...

    /// Memory read
    fn load<T: Addressable>(&self, addr: u32) -> T {
        self.inter.load(addr)
    }

    /// Memory write
    fn store<T: Addressable>(&mut self, addr: u32, val: T) {
        if self.sr & 0x10000 != 0 {
            // Cache is isolated, ignore write
            println!("Ignoring store while cache is isolated");
            return;
        }

        self.inter.store(addr, val);
    }
}

We can then replace the various load* and store* functions used in the
CPU code by the generic versions. Most of the time the compiler can’t infer the
type properly (since we’re casting all over the place to get the correct width and
sign extension) so we have to explicitly tell it which of u8, u16 or u32 to use.
For instance our LB implementation becomes:
impl Cpu {
    // ...

    /// Load Byte (signed)
    fn op_lb(&mut self,
             instruction: Instruction,
             debugger: &mut Debugger) {

        let i = instruction.imm_se();
        let t = instruction.t();
        let s = instruction.s();

        let addr = self.reg(s).wrapping_add(i);

        // Cast as i8 to force sign extension
        let v = self.load::<u8>(addr, debugger) as i8;

        // Put the load in the delay slot
        self.load = (t, v as u32);
    }
}
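
The casts used in LB are worth spelling out: in Rust casting an `i8` to `u32` sign-extends while casting a `u8` to `u32` zero-extends. A tiny sketch, with helper names made up for this illustration:

```rust
/// Sign-extend a byte to 32 bits, as LB does
fn extend_signed(raw: u8) -> u32 {
    // u8 -> i8 reinterprets the bits, i8 -> u32 sign-extends
    raw as i8 as u32
}

/// Zero-extend a byte to 32 bits, as an unsigned byte load would
fn extend_unsigned(raw: u8) -> u32 {
    raw as u32
}
```

So a byte with its top bit set, like 0x80, becomes 0xffffff80 when loaded as signed and 0x00000080 when loaded as unsigned.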

6.2 Porting the interconnect code

Then we need to port our interconnect code to use the generic interface. We
have to merge the various load functions into a single generic one, and do the
same for the store functions. First the load function:
impl Interconnect {
    // ...

    /// Interconnect: load value at `addr`
    pub fn load<T: Addressable>(&self, addr: u32) -> T {
        let abs_addr = map::mask_region(addr);

        if let Some(offset) = map::RAM.contains(abs_addr) {
            return self.ram.load(offset);
        }

        if let Some(offset) = map::BIOS.contains(abs_addr) {
            return self.bios.load(offset);
        }

        if let Some(offset) = map::IRQ_CONTROL.contains(abs_addr) {
            println!("IRQ control read {:x}", offset);
            return Addressable::from_u32(0);
        }

        if let Some(offset) = map::DMA.contains(abs_addr) {
            return self.dma_reg(offset);
        }

        if let Some(offset) = map::GPU.contains(abs_addr) {
            return self.gpu.load(offset);
        }

        if let Some(offset) = map::TIMERS.contains(abs_addr) {
            println!("Unhandled read from timer register {:x}",
                     offset);
            return Addressable::from_u32(0);
        }

        if let Some(_) = map::SPU.contains(abs_addr) {
            println!("Unhandled read from SPU register {:08x}",
                     abs_addr);
            return Addressable::from_u32(0);
        }

        if let Some(_) = map::EXPANSION_1.contains(abs_addr) {
            // No expansion implemented. Returns full ones when no
            // expansion is present
            return Addressable::from_u32(!0);
        }

        panic!("unhandled load at address {:08x}", addr);
    }
}

You can see that the Addressable::from_u32 function can be used to return
a literal value without having to know the real type being used.
The store function is pretty straightforward:
impl Interconnect {
    // ...

    /// Interconnect: store `val` into `addr`
    pub fn store<T: Addressable>(&mut self, addr: u32, val: T) {

        let abs_addr = map::mask_region(addr);

        if let Some(offset) = map::RAM.contains(abs_addr) {
            return self.ram.store(offset, val);
        }

        if let Some(offset) = map::IRQ_CONTROL.contains(abs_addr) {
            println!("IRQ control: {:x} <- {:08x}", offset,
                     val.as_u32());
            return;
        }

        if let Some(offset) = map::DMA.contains(abs_addr) {
            return self.set_dma_reg(offset, val);
        }

        if let Some(offset) = map::GPU.contains(abs_addr) {
            return self.gpu.store(offset, val);
        }

        if let Some(offset) = map::TIMERS.contains(abs_addr) {
            println!("Unhandled write to timer register");
            return;
        }

        if let Some(_) = map::SPU.contains(abs_addr) {
            println!("Unhandled write to SPU register");
            return;
        }

        if let Some(_) = map::CACHE_CONTROL.contains(abs_addr) {
            println!("Unhandled write to CACHE_CONTROL");
            return;
        }

        if let Some(offset) = map::MEM_CONTROL.contains(abs_addr) {
            // Compare the raw 32bit value: `val` itself is generic
            let val = val.as_u32();

            match offset {
                0 => // Expansion 1 base address
                    if val != 0x1f000000 {
                        panic!("Bad expansion 1 base address");
                    },
                4 => // Expansion 2 base address
                    if val != 0x1f802000 {
                        panic!("Bad expansion 2 base address");
                    },
                _ =>
                    println!("Unhandled write to MEM_CONTROL register"),
            }

            return;
        }

        if let Some(_) = map::RAM_SIZE.contains(abs_addr) {
            // We ignore writes at this address
            return;
        }

        if let Some(offset) = map::EXPANSION_2.contains(abs_addr) {
            println!("Unhandled write to expansion 2 register");
            return;
        }

        panic!("unhandled store into address {:08x}: {:08x}",
               addr, val.as_u32());
    }
}

6.3 Porting the RAM and BIOS

For the RAM we need to know how many bytes must be loaded or stored. We
can use the Addressable::width method to figure it out:
impl Ram {
    // ...

    /// Fetch the little endian value at `offset`
    pub fn load<T: Addressable>(&self, offset: u32) -> T {
        let offset = offset as usize;

        let mut v = 0;

        for i in 0..T::width() as usize {
            v |= (self.data[offset + i] as u32) << (i * 8);
        }

        Addressable::from_u32(v)
    }

    /// Store the little endian value `val` into `offset`
    pub fn store<T: Addressable>(&mut self, offset: u32, val: T) {
        let offset = offset as usize;

        let val = val.as_u32();

        for i in 0..T::width() as usize {
            self.data[offset + i] = (val >> (i * 8)) as u8;
        }
    }
}

The BIOS doesn’t have a store method since it’s read-only and we can reuse
the RAM’s load code without any change.
This looping and bit fiddling might seem a little under-optimized but LLVM
seems to handle it well and generates code which looks almost exactly like the
previous non-generic version. And we have less code duplication, so all is good.
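To check the byte ordering by hand, the same loops can be written against a plain byte slice. This is only a standalone sketch: `load_le` and `store_le` are hypothetical free functions that take the width in bytes directly instead of going through the Addressable trait:

```rust
/// Read a little endian value of `width` bytes (1, 2 or 4) at `offset`
fn load_le(data: &[u8], offset: usize, width: usize) -> u32 {
    let mut v = 0u32;

    for i in 0..width {
        // Byte `i` holds bits [8 * i, 8 * i + 7] of the value
        v |= (data[offset + i] as u32) << (i * 8);
    }

    v
}

/// Write the low `width` bytes of `val` at `offset`, least
/// significant byte first
fn store_le(data: &mut [u8], offset: usize, width: usize, val: u32) {
    for i in 0..width {
        data[offset + i] = (val >> (i * 8)) as u8;
    }
}
```

Storing 0x12345678 as a word puts 0x78 at the lowest address, and a halfword load at the same offset then returns 0x5678.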

6.4 Porting the GPU code

For the GPU I’ll be a little more conservative: at this point I’m not sure how it
behaves when we don’t use 32bit for register reads and writes. For this reason
I’ll still just support 32bit access by checking what kind of generic I’m using:
impl Gpu {
    // ...

    pub fn load<T: Addressable>(&self, offset: u32) -> T {

        if T::width() != AccessWidth::Word {
            panic!("Unhandled {:?} GPU load", T::width());
        }

        let r =
            match offset {
                0 => self.read(),
                4 => self.status(),
                _ => unreachable!(),
            };

        Addressable::from_u32(r)
    }

    pub fn store<T: Addressable>(&mut self, offset: u32, val: T) {

        if T::width() != AccessWidth::Word {
            panic!("Unhandled {:?} GPU store", T::width());
        }

        let val = val.as_u32();

        match offset {
            0 => self.gp0(val),
            4 => self.gp1(val),
            _ => unreachable!(),
        }
    }
}

6.5 Porting the DMA code

Likewise we only support 32bit access on the DMA registers so we can modify
the code to reflect that:
impl Interconnect {
    // ...

    /// DMA register read
    fn dma_reg<T: Addressable>(&self, offset: u32) -> T {

        if T::width() != AccessWidth::Word {
            panic!("Unhandled {:?} DMA load", T::width());
        }

        // ...

        Addressable::from_u32(res)
    }

    /// DMA register write
    fn set_dma_reg<T: Addressable>(&mut self, offset: u32, val: T) {
        if T::width() != AccessWidth::Word {
            panic!("Unhandled {:?} DMA store", T::width());
        }

        let val = val.as_u32();

        // ...
    }
}

Now our code should build and behave exactly like it did before. On my
system the performance is the same as far as I can tell. This more generic
infrastructure will show its usefulness soon enough.

7 The Debugger: Breakpoints and Watchpoints

This section is optional but having a good debugger can save us a lot of time
later on. Being able to disassemble the code, set breakpoints or watchpoints and
step through the assembly is invaluable when we need to understand why
some emulated game doesn’t behave properly in our emulator.
Writing a good debugger frontend can be quite some work however. For
simplicity’s sake I’ve decided to implement the GDB remote protocol over a
local TCP socket. This way I can just implement the low level debugging code
in the emulator and use a general purpose GDB binary targeting the MIPS
architecture as a frontend. Then I can debug Playstation code almost like any
program using GDB: I can disassemble the code, dump the data etc... If I run
code that I build myself I can even provide it with debugging symbols, step
through functions and use other high level niceties.
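To give an idea of what the GDB remote protocol looks like on the wire: each packet is sent as `$`, the payload, `#` and a two digit hexadecimal checksum which is the sum of the payload bytes modulo 256. A minimal framing sketch follows; the function names are mine, and it deliberately ignores the escaping of special characters and the `+`/`-` acknowledgements that a real implementation has to handle:

```rust
/// Checksum of a GDB remote protocol payload: sum of all the bytes
/// modulo 256
fn gdb_checksum(payload: &[u8]) -> u8 {
    payload.iter().fold(0u8, |acc, &b| acc.wrapping_add(b))
}

/// Frame `payload` as a packet: `$<payload>#<checksum>`.
/// Escaping of `$`, `#` and `}` in the payload is not handled here.
fn gdb_packet(payload: &str) -> String {
    format!("${}#{:02x}", payload, gdb_checksum(payload.as_bytes()))
}
```

For instance the "OK" reply is framed as `$OK#9a`.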
You might prefer to design the frontend yourself and integrate it directly
in the emulator. It’s more work but you may add Playstation-specific features
more easily (GPU debugging comes to mind). For this reason I’m just going to
describe the low level debugging interface in this guide, you’ll decide what kind
of frontend you want to build on top.

7.1 Debugger memory access

This part is easy, we already have the generic load and store functions in our
Cpu that we can use to access the memory. We can simply pass a reference to
our Cpu in the debugger code and use that directly.
One potential issue with this approach is that loads and stores may have
unintended side-effects when used from the debugger. For instance if we read
from the GPUREAD register (when we properly implement it) we “pop” a word
from the read buffer and it’ll become unavailable when the real Playstation code
wants to read it.
Later on when we implement the timings even reading from regular
RAM will take a few emulated CPU cycles which will effectively “waste” some
time for the emulated code and might result in a missed interrupt or something
similar.
Fortunately now that we have our generic load and store implementations
we’ll only have to implement a specialized version of those two functions if
the side-effects become unmanageable in debugging code. Those specialized
functions could ignore regular timings and even call specialized code in the
various peripherals to prevent any state change.
For the time being I’ll just call the regular load and store functions since
we don’t emulate enough side-effects to make a significant difference anyway.
That might change as we become more accurate.
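To illustrate what a side-effect free access could look like, here is a toy model of a read buffer with a regular "pop" read and a debugger-friendly "peek". The `ReadBuffer` type and its methods are purely hypothetical, the real GPUREAD implementation isn't written at this point in the guide:

```rust
use std::collections::VecDeque;

/// Toy model of a FIFO read register (hypothetical, for illustration)
struct ReadBuffer {
    fifo: VecDeque<u32>,
}

impl ReadBuffer {
    /// Regular read: consumes the word, the emulated code would see
    /// the next one on its following read. Returns 0 when empty.
    fn pop(&mut self) -> u32 {
        self.fifo.pop_front().unwrap_or(0)
    }

    /// Debugger read: returns the same word without changing any
    /// state, so the emulated code isn't disturbed.
    fn peek(&self) -> u32 {
        self.fifo.front().cloned().unwrap_or(0)
    }
}
```

A specialized debugger load would call the `peek` flavor on each peripheral while the CPU keeps using `pop`.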

7.2 Breakpoints
Breakpoints are triggered when a certain instruction gets executed. The
instruction is identified by its memory address. We can store the breakpoint
addresses in a vector:
pub struct Debugger {
    /// Vector containing all active breakpoint addresses
    breakpoints: Vec<u32>,
}

We then need a pair of functions for adding and deleting a breakpoint. It’s a
good idea to make sure we can’t insert the same address twice: insertions and
deletions are going to be rare while the address lookup will have to happen for
every instruction so we want to keep the list as small as possible:
impl Debugger {
    /// Add a breakpoint that will trigger when the instruction at
    /// `addr` is about to be executed.
    fn add_breakpoint(&mut self, addr: u32) {
        // Make sure we're not adding the same address twice
        if !self.breakpoints.contains(&addr) {
            self.breakpoints.push(addr);
        }
    }

    /// Delete breakpoint at `addr`. Does nothing if there was no
    /// breakpoint set for this address.
    fn del_breakpoint(&mut self, addr: u32) {
        self.breakpoints.retain(|&a| a != addr);
    }
}

Finally we can implement the method pc_change that will be called before
every instruction to look for a breakpoint at the current address. Needless to
say this code is in a very critical path and must be as fast as possible:
impl Debugger {
    // ...

    /// Called by the CPU when it's about to execute a new
    /// instruction. This function is called before *all* CPU
    /// instructions so it needs to be as fast as possible.
    pub fn pc_change(&mut self, cpu: &mut Cpu) {
        if self.breakpoints.contains(&cpu.pc()) {
            self.debug(cpu);
        }
    }
}

The debug method is where the debugging frontend should be notified that
the execution stopped and wait for the user to resume the execution.
Using a vector to store the breakpoints might seem sub-optimal since it has
linear lookup time. A tree-based collection could theoretically work in logarithmic
time. We have to consider two things however: we want to optimize for the
common case where no debugging is taking place and no breakpoint is set, and
even when we’re debugging we probably won’t be using thousands of breakpoints
simultaneously.
Iterating over an empty vector should be very cheap: a simple test of the
length of the vector and we exit the loop immediately. And even for small
non-empty vectors it will probably be faster than a more complex structure
(strong cache locality, no cache thrashing, no indirections, easy prefetching).
For these reasons I don’t think it’s necessary to bother using anything more
complicated than a good old vector, the constant cost probably matters more
than the linear complexity for our usage.
Finally we can plug pc_change in our CPU:
impl Cpu {
    // ...

    /// Run a single CPU instruction and return
    pub fn run_next_instruction(&mut self, debugger: &mut Debugger) {
        // Synchronize the peripherals
        self.inter.sync(&mut self.tk);

        // Save the address of the current instruction to save in
        // `EPC` in case of an exception.
        self.current_pc = self.pc;

        // Debugger entrypoint: used for code breakpoints and stepping
        debugger.pc_change(self);

        // ...
    }

    pub fn pc(&self) -> u32 {
        self.pc
    }
}

I pass the debugger object from the main function in order to be able to start
a debugging session at the press of a key:
fn main() {
    // ...

    let mut debugger = Debugger::new();

    let mut event_pump = sdl_context.event_pump();

    loop {
        for _ in 0..1_000_000 {
            cpu.run_next_instruction(&mut debugger);
        }

        // See if we should quit
        for e in event_pump.poll_iter() {
            match e {
                Event::KeyDown { keycode: KeyCode::Pause, .. } =>
                    debugger.debug(&mut cpu),
                Event::KeyDown { keycode: KeyCode::Escape, .. } =>
                    return,
                Event::Quit { .. } => return,
                _ => (),
            }
        }
    }
}

In a quick benchmark this debugging code causes a small (but noticeable)
performance degradation. I think it’ll probably end up being well worth
it. We could make the compilation of the debugging code optional to make it
possible to have faster binaries when we don’t want debugging, but we never
know when we might need it anyway, and having several build configurations
makes the code harder to test and could lead to code rot. The debugger could
also potentially be used for cheating in games so it might make sense to leave it
enabled even for “end user” builds.

7.3 Watchpoints
Being able to break on a specific instruction is useful but sometimes we want to
know when a certain location in memory is loaded or modified. In order to do
that we can implement read and write watchpoints that will respectively check
each load and store address and trigger the debugger when a watched address is
encountered.
As for breakpoints we’ll store the watchpoint addresses in vectors:
pub struct Debugger {
    /// Vector containing all active read watchpoints
    read_watchpoints: Vec<u32>,
    /// Vector containing all active write watchpoints
    write_watchpoints: Vec<u32>,
}

The methods for adding, removing and testing the watchpoints will therefore
look very similar to the breakpoint implementation:
impl Debugger {
    // ...

    /// Add a watchpoint that will trigger when the CPU attempts to
    /// read from `addr`
    fn add_read_watchpoint(&mut self, addr: u32) {
        // Make sure we're not adding the same address twice
        if !self.read_watchpoints.contains(&addr) {
            self.read_watchpoints.push(addr);
        }
    }

    /// Delete read watchpoint at `addr`. Does nothing if there was no
    /// watchpoint set for this address.
    fn del_read_watchpoint(&mut self, addr: u32) {
        self.read_watchpoints.retain(|&a| a != addr);
    }

    /// Called by the CPU when it's about to load a value from memory.
    pub fn memory_read(&mut self, cpu: &mut Cpu, addr: u32) {
        // XXX: how should we handle unaligned watchpoints? For
        // instance if we have a watchpoint on address 1 and the CPU
        // executes a `load32` at address 0, should we break? Also,
        // should we mask the region?
        if self.read_watchpoints.contains(&addr) {
            println!("Read watchpoint triggered at 0x{:08x}", addr);
            self.debug(cpu);
        }
    }

    /// Add a watchpoint that will trigger when the CPU attempts to
    /// write to `addr`
    fn add_write_watchpoint(&mut self, addr: u32) {
        // Make sure we're not adding the same address twice
        if !self.write_watchpoints.contains(&addr) {
            self.write_watchpoints.push(addr);
        }
    }

    /// Delete write watchpoint at `addr`. Does nothing if there was
    /// no watchpoint set for this address.
    fn del_write_watchpoint(&mut self, addr: u32) {
        self.write_watchpoints.retain(|&a| a != addr);
    }

    /// Called by the CPU when it's about to store a value to memory.
    pub fn memory_write(&mut self, cpu: &mut Cpu, addr: u32) {
        // XXX: same remark as memory_read for unaligned stores
        if self.write_watchpoints.contains(&addr) {
            println!("Write watchpoint triggered at 0x{:08x}", addr);
            self.debug(cpu);
        }
    }
}

You can see that I put a few comments about unaligned access and regions;
I’m not entirely sure what the right thing to do is here. I guess we’ll see how we
want the debugger to behave as we’re using it.
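
One possible answer to the unaligned access question, as a sketch rather than a settled design: treat every access as a byte range and trigger when any watchpoint falls inside that range. The `access_hits_watchpoint` helper, its `size` parameter and the `REGION_MASK` constant are names I made up for illustration (the watchpoint methods above only receive the address), and the blanket mask is a simplification that folds KUSEG, KSEG0 and KSEG1 together without treating KSEG2 specially.

```rust
/// Hypothetical mask that strips the region bits so that the KUSEG,
/// KSEG0 and KSEG1 mirrors of an address compare equal.
/// Simplification: KSEG2 is not handled specially here.
const REGION_MASK: u32 = 0x1fff_ffff;

/// Return true if an access of `size` bytes starting at `addr`
/// overlaps any of the watched addresses.
fn access_hits_watchpoint(watchpoints: &[u32], addr: u32, size: u32) -> bool {
    let start = addr & REGION_MASK;

    watchpoints.iter().any(|&w| {
        // Offset of the watchpoint from the start of the access.
        // `wrapping_sub` makes watchpoints located before `start`
        // wrap around to a huge value that fails the `< size` test.
        let offset = (w & REGION_MASK).wrapping_sub(start);

        offset < size
    })
}
```

With this policy a watchpoint on address 1 does trigger for a 32-bit load at address 0, since the load covers bytes 0 to 3. Whether that is the behavior we want in practice is still an open question.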
Now we just have to plug the memory_read and memory_write methods into our
generic load and store functions in the CPU:
impl Cpu {
    // ...

    /// Memory read
    fn load<T: Addressable>(&mut self,
                            addr: u32,
                            debugger: &mut Debugger) -> T {
        debugger.memory_read(self, addr);

        self.inter.load(&mut self.tk, addr)
    }

    /// Memory write
    fn store<T: Addressable>(&mut self,
                             addr: u32,
                             val: T,
                             debugger: &mut Debugger) {
        debugger.memory_write(self, addr);

        if self.sr.cache_isolated() {
            self.cache_maintenance(addr, val);
        } else {
            self.inter.store(&mut self.tk, addr, val);
        }
    }
}

We’ve added an additional debugger parameter to these two methods so we
have to pass the debugger reference from run_next_instruction to
decode_and_execute and finally to the various load and store instruction methods
that need to access memory (op_sw, op_lw, op_swr, etc.).
There are two issues with this implementation however. First, we use this
load method to fetch instructions, but we don’t want to trigger a read watchpoint
when we’re loading instructions (that’s what breakpoints are for). The fix is
easy: we just call the interconnect’s load method directly:
impl Cpu {
    // ...

    /// Run a single CPU instruction and return
    pub fn run_next_instruction(&mut self, debugger: &mut Debugger) {
        // ...

        // Fetch instruction at PC
        let pc = self.current_pc;
        let instruction = Instruction(self.inter.load(pc));

        // ...
    }
}

Another problem is that you might be using this CPU load method in your
debugger to read the memory’s contents. Obviously you don’t want to recursively
trigger the debugger when you use it to read some memory location where a
watchpoint happens to live. Instead we can create another method used for
loading data for debugging purposes[42]. I named this method examine:
impl Cpu {
    // ...

    /// Debugger memory read
    pub fn examine<T: Addressable>(&mut self, addr: u32) -> T {
        self.inter.load(&mut self.tk, addr)
    }
}

7.4 Code disassembly and beyond

I didn’t show any disassembler code since GDB does it for me, but it shouldn’t
be too difficult to implement since MIPS instructions are fixed width. Just
read the code you want to disassemble 32 bits at a time and then implement
something similar to our decode_and_execute method, but instead of executing
the instruction you return the disassembled code, in a string for instance.
If you want to be fancy and support MIPS assembler pseudo-instructions you’ll
have to handle certain instructions specifically: for instance sll $zero, $zero, 0
could be displayed as nop while addu $1, $2, $zero should be move $1, $2.
Of course it’s still correct if you keep the real instructions instead of the assembler
shorthand, but it’s generally more readable if you use the latter.
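
To give an idea of what such a disassembler could look like, here’s a minimal sketch that only knows a handful of instructions plus the two pseudo-instructions mentioned above. The `disassemble` function is hypothetical and not part of the emulator code shown so far; registers are printed by number and anything unknown falls back to a raw `.word`.

```rust
/// Disassemble one 32-bit MIPS instruction word into text.
/// Only a handful of opcodes are handled; anything else is printed
/// as a raw `.word` so that no information is lost.
fn disassemble(instr: u32) -> String {
    // Instruction fields shared by the register and immediate encodings
    let opcode = instr >> 26;
    let rs = (instr >> 21) & 0x1f;
    let rt = (instr >> 16) & 0x1f;
    let rd = (instr >> 11) & 0x1f;
    let shift = (instr >> 6) & 0x1f;
    let funct = instr & 0x3f;
    let imm = instr & 0xffff;
    // Same immediate, sign-extended for load/store offsets
    let offset = imm as u16 as i16;

    // Pseudo-instruction: `sll $zero, $zero, 0` encodes as all zeroes
    if instr == 0 {
        return "nop".to_string();
    }

    match (opcode, funct) {
        (0x00, 0x00) => format!("sll ${}, ${}, {}", rd, rt, shift),
        // Pseudo-instruction: `addu $d, $s, $zero` is a register move
        (0x00, 0x21) if rt == 0 => format!("move ${}, ${}", rd, rs),
        (0x00, 0x21) => format!("addu ${}, ${}, ${}", rd, rs, rt),
        (0x0d, _) => format!("ori ${}, ${}, 0x{:x}", rt, rs, imm),
        (0x0f, _) => format!("lui ${}, 0x{:x}", rt, imm),
        (0x23, _) => format!("lw ${}, {}(${})", rt, offset, rs),
        (0x2b, _) => format!("sw ${}, {}(${})", rt, offset, rs),
        _ => format!(".word 0x{:08x}", instr),
    }
}
```

A complete version would mirror every arm of the instruction decoder, and could map register numbers to their conventional names.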
Later on we’ll have to consider adding debugging for the GPU as well
(displaying primitives, textures, exploring linked lists, etc.).

8 The CPU: Instruction cache

Before we move on to the GPU timings let’s start by implementing the CPU
instruction cache. Without it we won’t be able to emulate the CPU speed
properly since cached code gets executed much faster than instructions that have
to be fetched from RAM. The CPU also has a data cache but it’s not used as a
proper cache so we can leave that for later.

8.1 Instruction cache lookup behavior

The Playstation’s CPU has a 4KB instruction cache that can contain up to 1024
instructions across 256 4-instruction cachelines. The cache is direct-mapped,
which means that there’s only one possible cacheline for any given memory
address.
Here’s how it works: each cacheline contains enough room for 4 instructions
plus a tag and valid bits. The tag contains the upper 20 bits of the physical
address being cached; it is used to make sure we’re really getting the data from
the correct memory location and not some other address that happens to alias
the cacheline. Then for each instruction in the cacheline a bit says if the entry
is valid or not. When fetching an instruction, if the tag is mismatched or the
entry is not valid it’ll have to be fetched from main memory, otherwise we can
directly use the cached value.

[42] It can be used as a starting point for a “side-effect free” debugging path as I mentioned in
section 7.1.

Tag [31:12]    Cacheline [11:4]    Index [3:2]    Word alignment [1:0]
0x80005        0x38                1              0

Table 9: Anatomy of cached address 0x80005384

Let’s take the concrete example shown in table 9. Suppose the CPU wants to
run code from address 0x80005384. First we need to figure out which cacheline
matches this address. For that we shift the address four bits to the right
(since each cacheline holds four 32-bit words, or 16 bytes) and then take the
8 LSBs (since we have 256 cachelines in total). In this case we end up in
cacheline number 56 (0x38).
Now that we have identified the cacheline we need to see if it already contains
data for the current address; after all, any address ending in 0x38X will match
the same cache location. In order to do that we compare the tag stored in the
line with bits [31:12] of the instruction address, in this case 0x80005. If the tag
doesn’t match we consider the line’s contents invalid for this address and we
have to fetch the instruction from RAM.
If the tag is the one we’re looking for, however, we just have to check the valid
bit for the instruction we’re looking for. Bits [3:2] give us the location in the
4-word cacheline; bits [1:0] are always 0 since all instructions are word-aligned[43].
In this case bits [3:2] are 1, so we’re looking for the 2nd word in the cacheline.
If the valid bit is set we can use the cached instruction directly, otherwise the
entry is invalid and we must fetch it from main RAM.
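
The lookup described above can be sketched as follows. The `CacheLine` layout and the helper names (`tag`, `line_index`, `word_index`, `lookup`) are assumptions made for this illustration, not the final implementation; the bit ranges come straight from table 9.

```rust
/// One cacheline: a tag plus four instruction words, each with its
/// own valid bit. This layout is an assumption made for the sketch.
#[derive(Clone, Copy)]
struct CacheLine {
    /// Bits [31:12] of the cached address
    tag: u32,
    /// One valid flag per instruction slot
    valid: [bool; 4],
    /// The cached instruction words
    words: [u32; 4],
}

/// Bits [31:12]: the tag
fn tag(addr: u32) -> u32 {
    addr >> 12
}

/// Bits [11:4]: which of the 256 cachelines the address maps to
fn line_index(addr: u32) -> usize {
    ((addr >> 4) & 0xff) as usize
}

/// Bits [3:2]: which of the 4 words within the cacheline
fn word_index(addr: u32) -> usize {
    ((addr >> 2) & 3) as usize
}

/// Return the cached instruction for `pc`, or `None` on a cache miss
/// (tag mismatch or valid bit cleared).
fn lookup(cache: &[CacheLine; 256], pc: u32) -> Option<u32> {
    let line = &cache[line_index(pc)];
    let index = word_index(pc);

    if line.tag == tag(pc) && line.valid[index] {
        Some(line.words[index])
    } else {
        None
    }
}
```

For address 0x80005384 these helpers give tag 0x80005, cacheline 0x38 and word index 1, matching table 9.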

8.2 Instruction cache fetch behavior

When an invalid instruction is encountered (either because the line has the wrong
tag or the valid bit is not set) the Playstation cache will update the tag to
match the current address and then fetch the missing instruction as well as any
following instructions in the same cacheline, but not the ones before it.
For instance, in the case of address 0x80005384, if we have a cache miss
the instructions at addresses 0x80005384, 0x80005388 and 0x8000538c will be
fetched (words at indexes 1, 2 and 3 respectively) but not 0x80005380 (the word
at index 0). I suppose that if some of the following instructions are already
valid they’re not fetched again, but I haven’t tested it and it’s probably not very
common anyway.
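
Here’s a minimal sketch of that refill behavior. The `ICacheLine` struct, the `refill` function and the `fetch` closure (standing in for whatever reads a word from main memory) are hypothetical names. Two choices below are assumptions of the sketch rather than verified hardware behavior: the words before the missed one are marked invalid, and the following words are always fetched again even if they were already valid.

```rust
/// Hypothetical cacheline layout for this sketch: a tag, four valid
/// bits and four instruction words.
struct ICacheLine {
    tag: u32,
    valid: [bool; 4],
    words: [u32; 4],
}

/// Handle a cache miss for `pc`: retag the line, then fetch the missed
/// word and every following word in the same line. Returns the
/// instruction at `pc`.
fn refill<F: FnMut(u32) -> u32>(line: &mut ICacheLine, pc: u32, mut fetch: F) -> u32 {
    // Index of the missed word within the line (bits [3:2])
    let first = ((pc >> 2) & 3) as usize;

    // The line now belongs to the new address (bits [31:12])
    line.tag = pc >> 12;

    // Assumption: the words before the missed one are not fetched, so
    // they are marked invalid. At worst this causes another miss later.
    for i in 0..first {
        line.valid[i] = false;
    }

    // Address of the first word of the 16-byte line
    let base = pc & !0xf;

    for i in first..4 {
        line.words[i] = fetch(base + (i as u32) * 4);
        line.valid[i] = true;
    }

    line.words[first]
}
```

Calling `refill` for 0x80005384 invokes `fetch` for 0x80005384, 0x80005388 and 0x8000538c in that order and leaves word 0 invalid.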

[43] Remember that we generate an exception if we ever end up with a misaligned PC so we
can always assume that it’s correctly aligned after that.

List of Tables
1 Playstation memory map
2 KSEG2 memory map
3 SCPH1001.BIN BIOS checksums
4 R3000 CPU general purpose registers
5 R3000 CPU special purpose registers
6 16 to 32bit conversion: influence of sign extension
7 Special cases in divisions
8 DMA Channel Control register description
9 Anatomy of cached address 0x80005384

List of Figures
1 OpenGL shaded RGB triangle
2 First output of our OpenGL renderer
3 Playstation boot logo without textures
4 Playstation boot logo with bad quad rendering
